Introduction to Spatial Network Forecast with R
First published 2019
Preface
Cover image by Gerd Altmann
The online version of this book is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
ISBN 978-1-9162552-0-3
Welcome to Introduction to Spatial Network Forecast with R. This tutorial book is intended to provide a comprehensive introduction to forecasting strategies for network data.
We use R throughout the book, and we intend students to learn how to forecast with R. The concepts and tools are taught within the same framework, so you will learn the theory and the implementation at the same time. R is free, and it is a broad-based environment for statistical analysis in general, not just for forecasting.
A good introduction to the R language can be found at https://cran.r-project.org/doc/manuals/r-release/R-intro.pdf. The RStudio online learning resources offer tutorials and links on various aspects of the R language and the RStudio environment: https://www.rstudio.com/online-learning/. Finally, the book R for Data Science presents an excellent introduction to data analytics with R and can be found at https://bookdown.org.
Most quantitative prediction problems use either time series data (collected at regular intervals over time) or cross-sectional data (collected at a single point in time). In this book we are concerned with forecasting future data, and we concentrate on the time series domain. Anything that is observed sequentially over time is a time series. In this book, we will only consider time series that are observed at regular intervals of time (for example, hourly, daily, weekly, monthly, quarterly, annually). Irregularly spaced time series can also occur, but are beyond the scope of this book.
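As a minimal sketch of what a regularly spaced time series looks like in R, the base function ts() attaches a start time and a frequency to a vector of observations. The values below are made up purely for illustration and are not taken from the data used in this book.

```r
# Hypothetical monthly observations (illustrative values only)
values <- c(12, 15, 14, 18, 21, 19, 23, 25, 22, 20, 17, 16)

# A regular monthly series starting in January 2019:
# frequency = 12 means twelve observations per year
monthly <- ts(values, start = c(2019, 1), frequency = 12)

# Inspect the regular time index and the number of observations per year
time(monthly)
frequency(monthly)
```

Other regular intervals follow the same pattern, for example frequency = 4 for quarterly data.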
A number of practical decision problems in business fall into a category known as network flow problems. These problems share a common characteristic: they can be described in a graphical form known as a network. This book focuses on short-term forecasting [1] for transportation network flow problems. Travel time data collected using automatic number plate recognition as part of the Transport for London Congestion Analysis Project [2] are used throughout the book.
A more advanced or detailed treatment of the methods can be found in the reference section of each chapter. For comments, feedback or any additional information, please send an email to ucesljl@ucl.ac.uk.
Motivations and Approach
Over the last decade, network science has gained popularity across a broad spectrum of areas such as engineering, economics and geography. Most approaches focus on analysing the characteristics of a network system and the opportunities for enhancing its effectiveness. Much research centres on questions of structure and dynamics, as well as their implications for the outcomes of a network system.
This book demonstrates the mechanics of modelling and solving network problems and of statistical forecasting. It guides students through the methodologies with a description of each applied method. Furthermore, by means of practical sections, the book offers a comprehensive pathway to the modelling and solving techniques required for each forecasting methodology.
The book also aims to equip the reader with an understanding of the principles underlying each methodology and of the essentials of data analytics and visualisation techniques. Additionally, a full chapter is devoted to the principles of exploratory spatio-temporal data analysis and visualisation. It lays the foundations of spatio-temporal autocorrelation and spatio-temporal modelling, which are essential for tackling machine learning and spatio-temporal series analysis and forecasting.
By the end of the book, the student will have the essential skills to apply time series modelling to forecast travel times on urban road networks, to generate insights from spatio-temporal datasets by uncovering patterns in spatial, temporal and spatio-temporal data, and to evaluate methodologies and discuss the strengths and weaknesses of each model.
Prerequisites
To get the most out of this book, we assume that you are numerically literate and already have some programming experience.
To run the code in this book, you first need to install RStudio and the tidyverse packages. RStudio is an integrated development environment, or IDE, for R programming. You can download and install it from http://www.rstudio.com/download. Further instructions on how to set up R can be found in the Appendix.
Base packages are preinstalled and can be loaded using library('package_name'). Libraries from a repository called CRAN (the Comprehensive R Archive Network, https://cran.r-project.org) need to be installed first using install.packages('package_name') before they can be loaded using library.
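The install-then-load workflow can be sketched as follows. The helper load_or_install() is a hypothetical convenience function written for this illustration; it is not part of R or of any package used in this book.

```r
# Base packages ship with R, so they can be loaded straight away
library(stats)

# Hypothetical helper: install a CRAN package only when it is missing,
# then load it
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE tells library() that pkg holds the package name
  library(pkg, character.only = TRUE)
}

# Usage (requires an internet connection the first time):
# load_or_install("tidyverse")
```

Checking with requireNamespace() first avoids reinstalling a package every time a script is run.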
Conventions
Throughout the book we use a consistent set of conventions to refer to code:
Functions are in a code font and followed by parentheses, like sum(), or mean().
Other program elements, such as variable or function names, databases, data types, environment variables, statements and keywords, are in a code font without parentheses, like data or x.
If we want to make it clear what package an object comes from, we will use the package name followed by two colons, like dplyr::mutate().
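The two-colon convention is also valid R syntax. The short sketch below uses only packages that ship with R, so it runs without installing anything; the vector x is an arbitrary example.

```r
x <- c(3, 5, 8)

# Functions from attached packages can be called by name alone
sum(x)
mean(x)

# The package::function() form states explicitly where a function comes from
base::sum(x)
stats::median(x)
```

The same form works for CRAN packages such as dplyr once they are installed, and it does not require calling library() first.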
We also employ some specific typographical conventions:
Italic indicates new terms, reference sources, filenames and file extensions, or names of R packages.
Constant width italic shows text that should be replaced with user-supplied values or by values determined by context.
For your convenience (for example, when you want to copy and run the code), we do not add prompts (> and +) to R source code in this book, and outputs are preceded by ##. These output lines are the results of running the code and should not be typed in the console or included in your scripts. You should see the same outputs in your RStudio console when you run the code.
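As an illustration of this convention, the following sketch shows code without prompts, with each output on a line starting with ##. The vector x is an arbitrary example.

```r
x <- c(2, 4, 6)
mean(x)
## [1] 4

# A command spread over several lines would show the + continuation
# prompt in the console; it is omitted here
total <- sum(
  x
)
total
## [1] 12
```

Because ## starts a comment in R, the whole listing can be pasted into the console or a script and run as it is.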
Online Version
An online version of this book is available at https://laurentlsantos.github.io/forecasting/. The book has been compiled using bookdown [3] and will continue to evolve in between reprints of the physical book.
Acknowledgements
We are particularly indebted to those at University College London (UCL) who have been generous with their time and expertise. From the Department of Civil, Environmental and Geomatic Engineering (CEGE), we are extremely grateful to professor James Haworth from SpaceTimeLab, who introduced us to, and equipped us with, the techniques of spatio-temporal data analysis and big data analytics. We are also thankful to James Haworth for the data and code applied throughout the book.
We must also thank Tao Cheng, professor in geoinformatics at UCL, for all the skills and ongoing academic mentorship in the field of spatial analysis and geocomputation. We also thank Mohamed Ibrahim, PhD researcher at SpaceTimeLab, for all his support and help with debugging R code.
We must also extend our thanks to Transport for London, which is ultimately responsible for the London Congestion Analysis Project (LCAP), a significant achievement in understanding and managing congestion in such a fascinating city.
[1] That is, the forecast of intra-day traffic.
[2] A full report on understanding and managing congestion in London is available at http://content.tfl.gov.uk/understanding-and-managing-congestion-in-london.pdf
[3] For more information, see https://bookdown.org