On its face, predicting traffic is a simple problem. In many cities, traffic lights run on basic timers, synchronized to regulate traffic flow. Because the volume of traffic is fairly predictable over the course of a day, averaging that flow by time of day and day of week is a reasonable approach.
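As a sketch of that timer-era approach, a historical average can be computed per time slot from past counts. The readings below are hypothetical numbers for a single intersection:

```python
# Toy baseline: predict flow for a time slot as the average of past readings.
# All numbers here are hypothetical counts for a single intersection.
history = {
    ("Mon", 8): [1200, 1150, 1300],   # vehicles/hour, weekday morning rush
    ("Mon", 14): [600, 650, 580],
    ("Sat", 8): [300, 280, 320],
}

def baseline_flow(day, hour):
    """Mean of past readings for this (day-of-week, hour) slot, or None."""
    readings = history.get((day, hour))
    if not readings:
        return None
    return sum(readings) / len(readings)

rush_estimate = baseline_flow("Mon", 8)  # average of the Monday 8 a.m. readings
```

This is exactly the averaging the timers rely on, and it works only as long as the past looks like the present.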
But what about unexpected incidents, such as road construction, stalled vehicles, accidents, or parades? For this reason alone, traffic modelers struggle with the complexity and sheer volume of information that goes into accurately predicting traffic and estimating travel and arrival times.
The Internet of Things and the proliferation of mobile connected devices, including GPS-enabled smartphones, provide the tools to handle this complexity and achieve new levels of accuracy. But as with other data mining and analytics challenges, there is still no silver bullet, and modelers still need ingenuity to do the job well.
Here is how it works: while traveling, GPS-enabled smartphones share their current positions with a central server in real time. These data are supplemented by whatever other sources are relevant and available: traffic cameras, Twitter streams, live traffic-copter reports and 511 calls, as well as weather feeds and live news. The server analyzes these live data streams, deducing levels of congestion and travel times on the roads. If a blockage or slowdown is detected, conditions on surrounding roads, such as feeders and connectors, can be inferred.
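As a rough sketch of that inference step, per-vehicle speeds on one road segment can be derived from consecutive position reports. The pings and the 30 km/h congestion threshold below are hypothetical, and real systems would first map-match each ping to a road:

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in km."""
    dlat, dlon = radians(lat2 - lat1), radians(lon2 - lon1)
    a = sin(dlat / 2) ** 2 + cos(radians(lat1)) * cos(radians(lat2)) * sin(dlon / 2) ** 2
    return 2 * 6371 * asin(sqrt(a))

# Hypothetical pings on one segment: (vehicle_id, timestamp_s, lat, lon).
pings = [
    ("car1", 0,  37.7749, -122.4194),
    ("car1", 60, 37.7790, -122.4194),
    ("car2", 10, 37.7749, -122.4194),
    ("car2", 70, 37.7770, -122.4194),
]

def segment_speeds_kmh(pings):
    """Per-vehicle speeds from consecutive pings on the same segment."""
    by_vehicle = {}
    for vid, t, lat, lon in pings:
        by_vehicle.setdefault(vid, []).append((t, lat, lon))
    speeds = []
    for fixes in by_vehicle.values():
        fixes.sort()
        for (t0, la0, lo0), (t1, la1, lo1) in zip(fixes, fixes[1:]):
            dist = haversine_km(la0, lo0, la1, lo1)
            speeds.append(dist / ((t1 - t0) / 3600))  # km/h
    return speeds

speeds = segment_speeds_kmh(pings)
congested = sum(speeds) / len(speeds) < 30  # flag a slow average as congestion
```

With enough vehicles reporting, the same per-segment averages feed the travel-time estimates described above.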
But as with the old traffic technology, even this new data-driven approach suffers from incomplete information: different smartphones report their data at different intervals, and, worse yet, not all vehicles carry smartphones, and even among those that do, not all share their GPS data. Weather, scheduled construction, extraordinary events and other factors confound the analysis. These gaps exacerbate what would otherwise be merely a problem of large-volume, multidimensional data.
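One common way to work around those uneven reporting intervals, sketched here with made-up numbers, is to interpolate each segment's irregular reports onto a regular time grid so that segments can be compared:

```python
def resample(reports, step_s):
    """Linearly interpolate irregular (time_s, speed_kmh) reports onto a
    regular grid, so streams reporting at different rates line up."""
    reports = sorted(reports)
    grid, i = [], 0
    t = reports[0][0]
    while t <= reports[-1][0]:
        # Advance to the pair of reports that brackets time t.
        while reports[i + 1][0] < t:
            i += 1
        (t0, v0), (t1, v1) = reports[i], reports[i + 1]
        frac = (t - t0) / (t1 - t0) if t1 > t0 else 0.0
        grid.append((t, v0 + frac * (v1 - v0)))
        t += step_s
    return grid

# Hypothetical speed reports at 0 s, 90 s and 120 s, regridded to every 30 s.
grid = resample([(0, 60), (90, 30), (120, 45)], 30)
```

Interpolation fills the gaps but cannot conjure the vehicles that never reported, which is why the statistical methods below are needed.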
It’s here that machine learning answers the challenge, using statistics to derive accurate information from otherwise imperfect time-series data. Different approaches employ regression, Bayesian, decision-tree and other algorithms, either separately or in combination. Twitter and other streams are mined much as they are for brand or sentiment analysis, but for clues that correlate with congestion at specific times in specific places.
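As a toy illustration of the regression family, here is an ordinary least-squares fit of travel time against hour of day. The observations are hypothetical, and a real pipeline would use many more features and models:

```python
# A minimal regression sketch: least squares on one feature (hour of day)
# predicting travel time on a segment. Observations are hypothetical.
def fit_line(xs, ys):
    """Return (slope, intercept) minimizing squared error."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return slope, my - slope * mx

hours = [6, 7, 8, 9]            # time of day
travel_min = [10, 13, 16, 19]   # observed travel time, minutes

slope, intercept = fit_line(hours, travel_min)

def predict(hour):
    """Estimated travel time at a given hour."""
    return slope * hour + intercept
```

The same idea scales up: more features (weather, day of week, incident reports) and richer models, but still a function fitted to noisy historical data.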
What all these methods have in common is that they benefit from more data, improving as they go.