Processing road traffic data at scale
Analysing large amounts of data and deriving valuable information from them is full of challenges. One of them is getting the results on time. It requires good resource planning, the right tools, a proven methodology and an experienced team. Still, with real data come real challenges: you might need to review your assumptions, e.g. analyse data for a much longer period or repeat your computations with fine-tuned settings. And, on top of that, your customer is pushing on the deadlines. Sounds familiar?
In short, from time to time you might be required to process much more data in a shorter time. This means you need to scale: to assign and use far more computing resources efficiently, and to do so quickly.
The mobility data provider
CE-Traffic is a provider of traffic and mobility information services for private businesses and the public sector. We use anonymised location data from hundreds of thousands of connected GPS devices to monitor and analyse road traffic. We also use anonymised signalling data from the mobile operator network to monitor and analyse people who are present in certain locations or travelling, e.g. counting visitors to selected sites, advanced tourism statistics and origin-destination analysis.
In the MELODIC project we looked first at one of the key challenges in the data analysis workflow – can MELODIC help us to automatically scale data processing so that the results are always on time?
Historical traffic information is essential for road and city authorities as well as private business owners. Depending on the use case, the customer requirements and the scope of the analysis may vary a lot: from the assessment of a single spot on a chosen road segment, to a multiyear analysis of the core city network, or even an analysis of the countrywide network of motorways. Sometimes, however, historical data is not enough, and a simulated environment is needed to perform what-if analysis. We decided to use this scenario to evaluate the capabilities of the MELODIC platform in tackling our challenge.
First, we needed an application ready for automatic deployment, which means an application that can scale horizontally. To this end, we developed a two-component application using open source technologies such as Python, Celery and Redis. It consists of a Manager, which controls the process, and a Worker, which performs the simulations and can run as multiple instances.
Second, we had to decide which information MELODIC needs to monitor in order to find the best deployment plan and to adapt it further at run time. Eventually, we decided on the following set:
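The Manager and Worker split can be illustrated with a minimal sketch. It does not use Celery or Redis: a thread pool from the Python standard library stands in for the distributed workers, and `run_simulation` is a hypothetical placeholder that returns a made-up number instead of running a real traffic simulation. The point is only the shape of the design: the Manager hands out independent simulations, and adding workers does not change the result, only how fast it arrives.

```python
from concurrent.futures import ThreadPoolExecutor
import random


def run_simulation(seed):
    """Worker side: hypothetical stand-in for one simulation run.

    Returns a fake average travel time in seconds, derived only from the seed,
    so that every run with the same seed gives the same value.
    """
    rng = random.Random(seed)
    samples = [rng.uniform(30.0, 90.0) for _ in range(100)]
    return sum(samples) / len(samples)


def run_experiment(seeds, n_workers):
    """Manager side: distribute all simulations over n_workers and collect results.

    The thread pool is an assumption made for this sketch; in the application
    described above this role is played by Celery workers.
    """
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() preserves the order of the seeds in the returned results
        return list(pool.map(run_simulation, seeds))


if __name__ == "__main__":
    results = run_experiment(range(20), n_workers=4)
    print(f"{len(results)} simulations finished")
```

Because the simulations are independent of each other, the number of workers is a free parameter, which is what makes horizontal scaling possible.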
- remaining time of the experiment,
- remaining number of simulations,
- the average time of a single simulation.
Based on these metrics, MELODIC can continuously calculate the minimum number of cores required to finish the experiment on time and, from that, decide on the size and number of worker machines.
Then, we modelled the application in Camel and defined a utility function that minimises the deployment cost.
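As an illustration of how the three metrics combine, here is a minimal sketch of such a calculation. It is not MELODIC's actual formula. It assumes that each core runs one simulation at a time and that a simulation cannot be split across cores; the function and parameter names are hypothetical.

```python
import math


def required_cores(remaining_time_s, remaining_simulations, avg_simulation_time_s):
    """Estimate the minimum number of cores needed to finish before the deadline.

    Assumption: one simulation occupies one core for avg_simulation_time_s seconds.
    """
    if remaining_simulations <= 0:
        return 0
    if avg_simulation_time_s <= 0:
        raise ValueError("average simulation time must be positive")
    # Number of whole simulations a single core can complete in the time left
    per_core = math.floor(remaining_time_s / avg_simulation_time_s)
    if per_core < 1:
        raise ValueError("not enough time left for even one simulation")
    # Spread the remaining simulations over cores, rounding up
    return math.ceil(remaining_simulations / per_core)


if __name__ == "__main__":
    # One hour left, 1000 simulations to go, 30 seconds per simulation
    print(required_cores(3600, 1000, 30))
```

In the example, one core can finish 120 simulations in the remaining hour, so 1000 simulations need 9 cores. If the average simulation time rises or the deadline gets closer, recomputing the value gives the new requirement.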
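To give a feel for what a cost-minimising choice looks like, the sketch below picks the cheapest deployment that covers a given number of cores. It is an illustration in Python, not the utility function from our Camel model. The machine names, core counts and prices are invented, and the sketch assumes that all worker machines in a deployment are of the same type.

```python
import math
from typing import NamedTuple


class VmType(NamedTuple):
    name: str
    cores: int
    price_per_hour: float


# Hypothetical catalogue; real offers and prices depend on the cloud provider
CATALOGUE = [
    VmType("small", 2, 0.10),
    VmType("medium", 4, 0.18),
    VmType("large", 8, 0.40),
]


def cheapest_deployment(required_cores, catalogue):
    """Return (vm_type, count, hourly_cost) with the lowest hourly cost.

    For every machine type, compute how many instances are needed to reach
    required_cores, then keep the option with the lowest total price.
    """
    best = None
    for vm in catalogue:
        count = math.ceil(required_cores / vm.cores)
        cost = count * vm.price_per_hour
        if best is None or cost < best[2]:
            best = (vm, count, cost)
    return best


if __name__ == "__main__":
    vm, count, cost = cheapest_deployment(16, CATALOGUE)
    print(f"{count} x {vm.name} at {cost:.2f} per hour")
```

With these invented prices, 16 cores are cheapest as four medium machines, while 9 cores are cheapest as five small ones. The best machine size therefore changes with the requirement, which is why the decision is worth automating.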
Now, we are able to use MELODIC to automatically deploy our simulation experiment to the cloud with just one click.
Up to now, all our applications were deployed in a manual or semi-automatic way. People decided on the required resources, e.g. the size and number of virtual machines, based on their experience. This is usually a time-consuming process that is prone to human error. Also, adaptation usually requires human action, or is at best limited to a predefined scenario.
Thanks to MELODIC, we can now effortlessly run experiments with various settings and with widely varying numbers of simulations per experiment. Because real-time constraints are expressed and enforced, results are delivered on time regardless of whether hundreds or thousands of simulations are required, or whether a single simulation takes 10 seconds or 10 minutes. This is possible because MELODIC continuously monitors the application deployment and automatically assigns the required resources without human intervention.
This has a great impact on the efficiency of our data analysts. They can now focus on interpreting and understanding the data rather than on providing the resources required to run their experiments. As this is a very encouraging outcome, we continue to work on other scenarios to utilise the full power of MELODIC.