
Introduction
As part of a recently published paper and Kaggle competition, Lyft has made public a dataset for building autonomous driving path prediction algorithms. The dataset includes a semantic map, ego vehicle data, and dynamic observational data for moving objects in the vehicle’s vicinity. The challenge Lyft presents with this dataset is to build a model that predicts the paths of moving objects, and the path an autonomous vehicle ("AV") should take, based on the observations made by the AV’s sensors and perception stack.
In more specific terms, the challenge is, given a set of information about the current vehicle state and its surroundings, to predict the best plan – a set of actions and behaviors – for the vehicle to safely navigate autonomously.
Lyft provides a large volume of training data in the L5 Prediction dataset; tens of thousands of 25-second sequences of data are available in over 100GB. Along with the data, Lyft has also offered a set of tools for parsing and visualizing the data.
This article will explore the details of the L5 Prediction dataset with these tools and a novel data visualization platform called VizViewer (or "VV" for short). Utilizing the VizViewer platform, we’ll uncover insights about the data while discussing the benefits of the visualization techniques for dataset tuning and feature engineering. To wrap up, we’ll preview a lane prediction visualization that could be used to solve the general path planning problem.
Dataset Structure
Within the context of autonomous driving, there are two general subsets of data to consider: the static environment and the dynamic environment. The former includes data that remain relatively fixed over time, such as road network paths, the number of lanes on a given road, traffic signs, and traffic lights. The latter includes data about varying driving conditions, such as the locations and speeds of nearby pedestrians or vehicles, or the color of an upcoming traffic light.
The L5 Dataset provides both of these data types. One form is a semantic map, sometimes referred to as an HD map [1], which encodes details about the static driving environment. The second is a voluminous "scene" database for dynamic time-series data. The L5 Kit provided by Lyft includes tools for extracting data from both of these sources.
Semantic Map Overview
Within the dataset, the static environment is defined by the semantic map. This can be thought of as a 2D map of the environment, densely annotated with information relevant to the driving context. The semantic map predefines the expected driving environment; without it, this static information would need to be continuously perceived and interpreted by the vehicle’s sensors and CPUs. Thus, a semantic map is a powerful tool for pre-computing and offloading much of the work involved in AV planning and prediction problems.

The semantic map itself contains these various attributes (a code sketch for reading them with the L5 Kit follows the list):
- a directed graph of roads and their lanes
- the physical position of the lane lines of a road, down to the centimeter
- the physical position of the stop lines, stop signs, traffic lights, crosswalks, and other traffic control elements
- speed limits
- the possible states for a given traffic light (e.g., red, green, yellow) and the lanes for which they control traffic (e.g., left turns, through lanes, right turns)
- parking zones (specifically, whether they share a lane)
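For readers who want to inspect these attributes programmatically, here is a minimal sketch using the L5 Kit's MapAPI. The file paths are placeholders, and the method names reflect recent L5 Kit releases; they may differ slightly between versions.

# sketch: iterating lanes in the semantic map with the L5 Kit's MapAPI
import json
import numpy as np
from l5kit.data import MapAPI

# the dataset's meta.json provides the world-to-ECEF transform the MapAPI expects
with open("meta.json") as f:
    world_to_ecef = np.array(json.load(f)["world_to_ecef"], dtype=np.float64)

map_api = MapAPI("semantic_map/semantic_map.pb", world_to_ecef)
for element in map_api.elements:
    if map_api.is_lane(element):
        lane_id = MapAPI.id_as_str(element.id)
        coords = map_api.get_lane_coords(lane_id)
        # coords["xyz_left"] / coords["xyz_right"] hold the lane boundary polylines in world coordinates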
Scene Database Overview
The dynamic features encoded in the dataset include the spatial information of the "ego" vehicle (the AV collecting the data), the "agents" (freely moving observed objects), and the traffic light states ("red", "green", "yellow"). Each agent also has a "class" label, describing it as a set of probabilities over common object types such as cars, pedestrians, and cyclists. These three data sources are encoded and indexed separately in tabular form.
The spatial features of the ego and agents contain the "pose" of the objects (their x, y, z Cartesian coordinates and orientation) and, for agents, their "extent" (the size of the object). Each data sample has a timestamp, and all observations sharing a common timestamp represent a "frame" of data. A "scene" consists of a contiguous sequence of observation frames over time. Each scene references its frames, and through them the records in the other tables, using index intervals into each table.
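As a concrete picture of this layout, the sketch below opens the zarr-backed scene database with the L5 Kit and slices out one scene's frames and agents. The dataset path is a placeholder, and the field names follow the L5 Kit's published schema, though they may vary by version.

# sketch: opening the scene database and walking a single scene with the L5 Kit
from l5kit.data import ChunkedDataset, LocalDataManager

dm = LocalDataManager("/path/to/lyft-prediction-dataset")     # placeholder path
zarr_dataset = ChunkedDataset(dm.require("scenes/sample.zarr")).open()

scene = zarr_dataset.scenes[0]                    # one 25-second scene
f0, f1 = scene["frame_index_interval"]            # index range into the frames table
frames = zarr_dataset.frames[f0:f1]               # timestamp, ego_translation, ego_rotation, ...

a0, a1 = frames[0]["agent_index_interval"]        # index range into the agents table
agents = zarr_dataset.agents[a0:a1]               # centroid, extent, yaw, track_id, label_probabilities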

The motivation behind this scene-centric structure is important to note. In many machine learning problems, each time-based data sample is used independently as an example to train a model. In this dataset, however, the entire scene – a collection of data samples – is the atomic unit of data used to train a machine-learned model.
The reason for this should be fairly intuitive: to make predictions about the paths a set of objects can take, the samples must be coherent and causally linked across time to build an accurate description of motion. If an amalgam of data samples from different scenes were used to generate a path, the result would most likely be inconsistent; the objects would move in ways that are impossible given the ground truth. Given the need for temporal coherence in path planning, scenes are the building blocks of data that we will examine holistically using VizViewer.
So what is VizViewer?
VizViewer is a web application and platform for collaboration and visualization of complex, multi-modal datasets. It consists of a suite of communication, data processing, and visualization components bundled into an accessible, easy-to-use dashboard UI. VV provides tools for interpreting data and accelerating productivity in data analysis workflows. It achieves these goals through a cohesive, configurable, interactive, and versatile toolset for analyzing datasets of different modalities while interoperating with Python and Jupyter Notebooks.
In short, VizViewer is a helpful extension of coding tools for exploring and gaining insight into dense datasets with different types of content. In the next sections, we’ll explore the characteristics of the L5 Prediction dataset using VizViewer to understand the data better, build improved training sets, and debug and evaluate models.

Semantic Map Visualization
The L5 Prediction Dataset Kit comes with a simple tool for visualizing the semantic map and scene data together. The tool can take a specific set of coordinates and dimensions and generate an image of the roads, lane lines, and other labeled elements. It can also render the lanes’ dynamic state by coloring specific lanes when their traffic is affected by a traffic light; for example, when a traffic light is red, the lanes it controls are also marked red. These images can be merged into a short movie clip of the scene, shown below.

As an alternative, VizViewer has an interactive 3D rendering toolkit that can render the semantic map with free-form exploration, along with a scene-specific view. The map can be zoomed similarly to other online mapping tools and supports satellite and vector map layers.
With VV, the map can be navigated and examined for details that might be interesting for training our models. For example, if we are looking for samples related to left turns onto a multilane street, we can examine the map for street intersections that fit this case and then filter our samples by the coordinates of this region of interest. To assist with the exploration, the map elements can also be selected by clicking them to expose more details about the element.

VV integrates with Python, allowing data to be aggregated and processed in Python code and then sent to VV for rendering via a Python API. For example, VV has data querying features that allow objects to be highlighted in the 3D view based on features of interest. A feature query can be defined in Python; then, with an API call, the VV dashboard will update, find, and select the elements that satisfy the query’s conditions. The image below shows semantic map search results, highlighting roads that match a decreasing minimum-lane-count criterion. This could help identify areas where samples might be gathered for specific driving scenarios (e.g., highways, residential streets, driveways, parking lots).

# example query command for marking roads with 3 to 5 lanes
vv.semantic_query({
    "where": "msg.kind == 'road' && msg.num_lanes >= 3 && msg.num_lanes <= 5"
})
To summarize, VizViewer’s interactive 3D mapping features allow a data modeler to easily examine the contextual information within the semantic map. Furthermore, visual searches for specific attributes within the semantic map can assist with training set selection and modeling workflows.
Deep Dive into Feature Augmentation
As mentioned, the scene database contains spatial and orientation coordinates for objects in a scene, organized into a time series of frames. The coordinates are absolute values relative to the origin of the semantic map. We’ll explore how this raw data can be converted into additional information that is more useful for interpreting the data and building a machine learning model.
Below is a VV chart showing various spatial features of a vehicle driving through a left curve within a particular scene. The top chart uses the raw data from the dataset, plotting X and Y positions on the primary vertical axis and yaw (orientation) on a secondary axis. The bottom chart reveals the changes in the X and Y values more clearly by plotting the delta from the first frame in the scene’s data series. The yaw values change identically in both charts, showing that the vehicle was turning through a curve during the initial 10 seconds.

In the context of machine learning, feature augmentation and data engineering are the processes of molding the data into a form that improves model convergence and accuracy. For example, models can converge faster if their feature values are rescaled into a smaller range. The example above illustrates how raw data can be transformed to accentuate underlying detail within a smaller range of values: changing the plot from absolute values to deltas made the change in the time series more obvious on a plot of the same size. Additionally, adjusting the data so it is contextual to a scene makes interpretation easier; using values relative to the initial frame of a scene yields standardized plots that are easier to compare across different scenes.
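As a minimal sketch of this transformation (the array names are illustrative, not part of the L5 Kit), rebasing a scene’s poses on its first frame takes only a couple of lines:

# sketch: re-express a scene's poses relative to its first frame
import numpy as np

def to_scene_relative(positions, yaws):
    # positions: (N, 2) world x/y per frame, yaws: (N,) orientation in radians
    rel_positions = positions - positions[0]   # deltas from the scene's first frame
    rel_yaws = yaws - yaws[0]
    return rel_positions, rel_yaws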
A few helpful features describing the motion of objects can be derived from the spatial data. These can be used to build a motion model for a given object type. For example, because cars have wheels, they should move mostly forward and backward, but not freely from side to side (non-holonomic motion). Therefore, a motion model that independently tracks orientation, longitudinal, and lateral motion is desirable. With a motion model, an object type’s dynamics can be trained, simulated, and tested for validity. Below are some of the augmented values that can be calculated to describe a motion model, followed by a sketch of how they might be computed.
- longitudinal (forward/backward) velocity, acceleration, and jerk (the 1st derivative of acceleration)
- lateral (side-to-side) velocity, acceleration, and jerk
- yaw rate (change in orientation), yaw acceleration, yaw jerk
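A rough sketch of how these motion features might be derived from a pose sequence is shown below. It assumes the roughly 10 Hz frame rate of the L5 data and uses simple finite differences rotated into the vehicle frame; the function and array names are illustrative.

# sketch: longitudinal/lateral velocity and yaw rate from a pose sequence
import numpy as np

def motion_features(positions, yaws, dt=0.1):
    # positions: (N, 2) world x/y, yaws: (N,) radians, dt: frame period (~10 Hz)
    d_xy = np.diff(positions, axis=0) / dt                  # world-frame velocity per step
    cos_y, sin_y = np.cos(yaws[:-1]), np.sin(yaws[:-1])
    v_lon = cos_y * d_xy[:, 0] + sin_y * d_xy[:, 1]         # forward/backward component
    v_lat = -sin_y * d_xy[:, 0] + cos_y * d_xy[:, 1]        # side-to-side component
    yaw_rate = np.diff(np.unwrap(yaws)) / dt
    a_lon = np.diff(v_lon) / dt                             # longitudinal acceleration
    jerk_lon = np.diff(a_lon) / dt                          # longitudinal jerk
    return v_lon, v_lat, yaw_rate, a_lon, jerk_lon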
Another set of derived features can model the relationships between two objects. These features help train the model to generate a planned path given the dynamics between objects (e.g., slow down if you are approaching an object) and the environment (e.g., slow down when approaching a turn or stop sign). A sketch of one such calculation follows the list below.
- distances and relative speed between objects (ego and agents)
- distances between objects and semantic map features (traffic lights, stop lines, crosswalks)
- the orientation of the current or desired lane lines
- position within the current or desired lane
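A sketch of the first of these, the distance and closing speed between the ego and an agent over a scene, could look like the following (the array names are illustrative, assuming position and velocity series aligned on the same frames):

# sketch: distance and closing speed between the ego and one agent, per frame
import numpy as np

def relative_features(ego_xy, ego_v, agent_xy, agent_v):
    # all inputs are (N, 2) arrays aligned on the same frames
    offset = agent_xy - ego_xy
    distance = np.linalg.norm(offset, axis=1)
    direction = offset / np.maximum(distance[:, None], 1e-6)
    # positive closing speed means the two objects are approaching each other
    closing_speed = -np.sum((agent_v - ego_v) * direction, axis=1)
    return distance, closing_speed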

Above is a plot from VV that shows some of these augmented features, such as velocity and acceleration. What is noteworthy is the use of filtering and smoothing in calculating these derived values. The dotted lines represent the unfiltered values, while the solid lines represent the smoothed values derived from spline-based interpolation methods. The smoothing is applied via Python code to assist with the convergence of a model trained on these features. The sketch below shows one way the augmented values can be smoothed.
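This sketch uses SciPy’s UnivariateSpline as the smoothing method, which is one possible implementation of the spline-based interpolation mentioned above; the exact approach used in the VV pipeline is an assumption here, and the function and parameter names are illustrative.

# sketch: spline-based smoothing of a derived series (e.g., longitudinal velocity)
import numpy as np
from scipy.interpolate import UnivariateSpline

def smooth_series(t, values, s=0.5):
    # t: (N,) timestamps in seconds, values: (N,) raw derived samples,
    # s: smoothing factor (larger values yield a smoother curve)
    spline = UnivariateSpline(t, values, k=3, s=s)
    smoothed = spline(t)
    accel = spline.derivative(1)(t)   # the spline's derivative gives a smoothed rate of change
    return smoothed, accel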
Data Exploration & Visualization with VizViewer
Having described the data features in detail, we can now explore and visualize them, a helpful step in building training and validation sets for an ML model. We’ll dive into some explorations and insights about the data, describing how VizViewer assists with these tasks.
For exploration, we’ll set up a dashboard that allows for easy viewing of the data in different modalities. VizViewer offers a configurable dashboard for building a data-viz layout best suited to a given exploration task. While this paradigm differs from inline visualization in Jupyter Notebooks, this alternative UI offers more comprehensiveness, customization, and interaction than the confines of the Notebook’s code cells. This is desirable when a task requires comparing and synthesizing multiple streams of feature data into one cohesive representation, which we’ll examine further.

Furthermore, the dashboard can be configured to arrange panels of visual components in whatever layout best suits the user. For example, to view spatial dataset patterns, 3D mapping components and charting components are combined to provide a holistic view. The charts can be configured to present data in various forms; time-series charts and histograms are used for this particular task. An example of a composite view of the map and histogram is shown below.

Examining the visualization above, the map shows the paths (magenta) taken by the ego vehicle across all the sample dataset scenes. Below it, the larger histogram view shows the distribution of the feature data across all scenes. We can see the data is concentrated along one particular predetermined path. The data follows a normal distribution for most features, but not in all cases; speed follows a bimodal distribution, with most of the samples either near zero or near 13 m/s (roughly 30 mph), a common speed limit for city streets. The graph below shows normalized histograms for multiple features across 100 bins, along with an un-normalized histogram of the distribution of speed values.
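As a small sketch of the underlying computation (the speeds array here is a hypothetical pool of per-frame ego speeds, not a dataset field), a normalized 100-bin histogram can be produced with NumPy:

# sketch: a normalized 100-bin histogram of pooled ego speed samples
import numpy as np

def speed_histogram(speeds, bins=100):
    # speeds: 1-D array of pooled per-frame ego speeds in m/s
    # density=True scales the counts so the histogram approximates a probability density;
    # for this dataset, expect a bimodal shape with peaks near 0 m/s and ~13 m/s
    density, bin_edges = np.histogram(speeds, bins=bins, density=True)
    return density, bin_edges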

Having a holistic view of the data is useful, yet it is equally useful to drill into specific scenes to explore whether our derived calculations are coherent across the dataset. With VV’s configurable selection features, a specific scene can be selected on the map by clicking its path, revealing more details about the scene’s time-series data. In the example below, the graphs of the ego vehicle’s motion on the right are updated when a section of the scene’s path is selected on the map to the left. Using this feature, a data engineer can quickly validate the consistency of these motion values across varying sections of the semantic map. For example, velocity and acceleration should drop on sharp turns, so the map helps isolate those potential scenes for validation.

We can see details about the vehicle’s longitudinal and lateral velocity for a selected scene within the image above. For scenes with data samples along straight paths, the lateral velocity and yaw rate will remain close to zero. However, if a path along a turn or a curve is selected, the expected visualized result is an increase in lateral velocity and yaw rate. The image confirms both of these outcomes across different sections of the map.
Heat Map Analysis
To examine how speed is influenced by location, aggregated data statistics can be analyzed using a heatmap feature. The heatmap collects data into a grid, then assigns a color scale to the data distribution. The heatmap shows where data samples are located by coloring the region, while the color itself represents the feature’s magnitude. For example, below is a heatmap of ego vehicle speed. We can see a pattern of high-velocity samples (brighter shades) collected along specific roads on the map, while lower-velocity samples (darker shades) are collected on smaller side streets. This can indicate regions of the map with fast-moving traffic versus slower, more regulated traffic.
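A minimal version of this gridding step (independent of VV’s own implementation, which is not shown here, and with illustrative names) can be written with NumPy’s 2D histogram:

# sketch: mean ego speed per map grid cell
import numpy as np

def speed_heatmap(xs, ys, speeds, bins=200):
    # xs, ys: sample positions in world coordinates; speeds: matching speed values
    speed_sum, xedges, yedges = np.histogram2d(xs, ys, bins=bins, weights=speeds)
    counts, _, _ = np.histogram2d(xs, ys, bins=[xedges, yedges])
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_speed = speed_sum / counts   # NaN where a cell has no samples
    return mean_speed, xedges, yedges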

An important topic to discuss is the consistency of the agent observations. Within each scene, a set of agents can be observed; however, many agent observations are short-lived or sporadic, labeled and tracked across only a short time span rather than the entire scene length. The following heatmap illustrates this point, showing a decreasing number of samples as the minimum tracked duration is raised from 0 to 9 seconds in 3-second intervals. Given the 25-second scene length, scenes with longer sequences of agent tracking will be relatively sparse; therefore, any robust prediction model will have to make inferences across non-sequential data frames.
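One way to quantify this continuity, assuming the agent observations have been flattened into a table with track_id and frame_index columns (a hypothetical schema, not the raw zarr layout), is to measure each track’s longest consecutive run of frames:

# sketch: longest consecutive run of frames per agent track
import numpy as np
import pandas as pd

def max_consecutive_frames(agent_df):
    # agent_df: DataFrame with 'track_id' and 'frame_index' columns (hypothetical schema)
    def longest_run(frame_indexes):
        f = np.sort(frame_indexes.to_numpy())
        runs = np.split(f, np.where(np.diff(f) > 1)[0] + 1)   # break runs wherever a frame is skipped
        return max(len(r) for r in runs)
    return agent_df.groupby("track_id")["frame_index"].apply(longest_run)

# usage: runs = max_consecutive_frames(agent_df)
#        long_tracks = runs[runs >= 30]   # tracked continuously for at least ~3 s at 10 Hz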

Regardless of the sparseness, scenes with higher agent frame continuity are more valuable agent examples for training: the longer an agent is observed, the more accurate the predictions will be at longer time horizons. To avoid location bias, it is important to gather these less common examples from as many sections of the map as possible, and the heatmap is helpful for this task.
Another interesting insight we can observe visually is the inverse correlation of speed with the number of observations. The image below shows two heatmaps overlaid: ego vehicle speed (blue) and agent observation density (red). Areas where speed is low tend to show an increase in the agent observation count. While this might not be obvious at first, the reason becomes clear when we look at where on the map the correlation is strongest: at intersections, where speed is likely to decrease due to traffic lights or stop signs, and where traffic crossing both streets yields a higher likelihood of additional agent observations.
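To check this impression numerically, the two gridded layers (reusing the hypothetical mean_speed grid from the earlier heatmap sketch plus an agent-count grid built the same way) can be correlated cell by cell:

# sketch: cell-wise correlation between mean speed and agent observation count
import numpy as np

def speed_count_correlation(mean_speed, agent_counts):
    # mean_speed, agent_counts: 2-D grids over the same map bins (e.g., from np.histogram2d)
    valid = ~np.isnan(mean_speed) & (agent_counts > 0)
    return np.corrcoef(mean_speed[valid], agent_counts[valid])[0, 1]

# a negative correlation would support the visual impression that busier cells tend to be slower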

To summarize, we discovered some useful insights about the data, which is an essential step in the model-building process. To review the data holistically, we can employ tools like heatmaps and histograms at different scales to identify spatial patterns that may be advantageous to capture in our models. Being able to move easily between high-level and low-level views of the data through interactive selection was also helpful. The insights learned through the exploration process lead to a better understanding of what correlations and biases may exist in the dataset, as well as the availability, distribution, and quality of particular data samples. Equipped with this knowledge, we can engineer better training sets and avoid overfitting or underfitting a certain subset of model-driven behavior.
Path Evaluation & Visualization
Transitioning from data exploration to model development, we’ll switch focus from a global view of the data to a local scene. We will explore aspects of visualizing scene data and path data for debugging and evaluation.

As shown above, the local scene view within VizViewer offers a 3D simulation of the vehicle and its semantic environment, along with labels for agents, lane states, traffic light states, annotated bounding boxes for agents (yellow and blue boxes), and the annotated planned path of travel (blue). Plots of the various features are synchronized with the model simulation in one unified layout. VV also offers a UI for controlling the simulation’s state, such as a play and pause button, rate controls, and discrete timestamp adjustments. The scene camera is fully interactive, allowing for visualizing different perspectives of the scene. These features are beneficial while debugging the model’s behavior within a scene.
As part of a path prediction model, one subproblem is determining the current lane of a given vehicle. If we can accurately detect the lane a vehicle is in, we can build a model to confidently predict where the vehicle will travel, given its pose and derived motion features. Additionally, the semantic map is needed to determine the possible lane paths in the environment through which vehicles would likely travel.
During the course of this exploration, I was able to train a neural network on various pose features and the semantic map data to estimate the current lane and possible next lane. Using VizViewer, the lane lines and lane candidates can be visualized and annotated with additional data from the network (e.g., raw regression values, confidence scores). By loading a scene, running the visual simulation, and using the interactive 3D view, the resulting paths can be examined and tested.

The blue lane segments highlight the possible prediction paths, with a darker color representing higher confidence in the path. Additionally, the paths can be clicked to expose the underlying data, such as the confidence score. In the example scene above, the vehicle approaches a 3-way stop sign. The confidence that the vehicle would continue to move forward was fairly high as it came to a stop. As the vehicle began to move forward, the prediction shifted to a left turn, and the confidence values increased as it moved through the turn. With an improved model that includes more data, such as pedestrian motion and the delay between stopping and turning, a left-turn prediction could be made much sooner.
As the model is further developed, visualization can help determine how well the planned paths are performing. Deviations from a lane can be reviewed, and potential collisions with other objects can be detected and highlighted for display. This evaluation can be performed for both ego and agents, detecting when paths intersect. Smoothness is also important for accurate behavior modeling; using the chart functionality to display attributes such as speed and acceleration helps evaluate how smooth a predicted path is. For these goals, VV offers a helpful UI for model tuning and assessment.
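As one simple way to score smoothness (a generic metric sketched here, not a built-in VV feature), peak acceleration and jerk can be computed along a predicted path with finite differences:

# sketch: peak acceleration and jerk along a predicted path as a smoothness score
import numpy as np

def path_smoothness(path_xy, dt=0.1):
    # path_xy: (N, 2) predicted positions, dt: prediction step in seconds (assumed ~10 Hz)
    vel = np.diff(path_xy, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    jerk = np.diff(acc, axis=0) / dt
    # lower peak values indicate a smoother, more plausible path
    return np.linalg.norm(acc, axis=1).max(), np.linalg.norm(jerk, axis=1).max()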
Final Points
The Lyft Prediction Dataset proved to be a massive dataset with interesting patterns for research and prediction algorithms. The inclusion of free-form agent observational data for motion prediction is particularly valuable. Additionally, the SDK offers useful tools for extracting the data, though the dataset’s structure is easy enough to navigate with only a subset of the SDK.
There are a few critiques of the dataset package. One downside is that it only includes a pre-planned path for the ego vehicle on a limited set of street types; hopefully, Lyft will expand the dataset in the future to include samples collected across a more heterogeneous set of streets. Another point is that the quality of the agent labels is poor at times; labels are incorrectly assigned or unusual agent motion is apparent, though that should be expected for a small percentage of data samples. Additionally, while useful as a starting point, the visualization tools provided in the L5 Kit are far less helpful for data exploration. The visualization tool is provided, ostensibly, as a way to generate training data for a possible convolutional neural network (CNN) based inference model, not necessarily for data exploration [2].
Through the use of the L5 Kit, VizViewer, and Jupyter Notebooks, we explored and visualized the dataset in both novel and useful ways. Specifically, VV offered the ability to create a custom dashboard that can receive data from Python code for contextual visualization. The interactive charts, maps, 3D visualizations, and simulations were synthesized and synchronized to facilitate data discovery, exploration, and model debugging. Additionally, VV’s interactive visualization features offered ways to perform these tasks with more freedom and efficacy than traditional Notebook-based UI tools like Matplotlib.
So how would you approach analyzing a large dataset such as the one reviewed here? What other interesting data points would you like to explore?
If you found this article interesting, please leave a comment and follow me for similar content. For information about VizViewer, please check out VizViewer.com, where you can sign up to request access to the chat community and receive updates about the platform.
References
[1] Lyft Level 5, Rethinking Maps for Self Driving, (2018), Lyft Level 5 Blog.
[2] Houston, Zuidhof, Bergamini, et al., One Thousand and One Hours: Self-driving Motion Prediction Dataset (2020). Baseline motion prediction solution (page 6).
Attribution: the images displayed in this article used data from the Lyft L5 Prediction Dataset. The article provides educational content about this dataset and a critical review.