This article aims to contribute to the ongoing discussion on the role of machine learning in the automotive industry and to highlight the importance of this topic in the context of autonomous vehicles. In particular, it aims to increase understanding of the capabilities and limitations of machine learning techniques.
Author: Sorin Mihai
In the first of this three-part series, the author investigates the drivers and potential applications of machine learning techniques in highly automated driving scenarios. The second part defines the theoretical background of machine learning techniques and the types of neural networks available to automotive developers. The third part evaluates these options in the context of functional safety requirements.
Machine learning has been one of the hottest topics in research and industry over the past few years. Recent advances in computational performance and in algorithms have renewed attention on a field that first emerged decades ago.
Machine learning, especially deep learning, has contributed to the impressive recent achievements of artificial intelligence. Applications include natural language processing (NLP), personal assistants, AlphaGo defeating human champions at Go, and human-level performance in learning to play Atari games.
Considering the impressive results that machine learning and deep learning can achieve on extremely complex problems, it is no surprise that researchers and engineers are also applying them to highly automated driving (HAD) scenarios for self-driving cars. NVIDIA's DAVE-2, Comma.ai, Google's self-driving car, and Tesla have achieved the first promising results in this area. Machine learning and deep learning approaches have yielded initial prototypes, but industrializing these functions presents additional challenges, for example with respect to functional safety.
First, this article discusses the design space and architectural alternatives for machine learning-based highly automated driving in the context of the EB robinos reference architecture. It then details two selected use cases that Elektrobit is currently researching and developing.
Part 2 provides the theoretical background on machine learning and deep neural networks (DNNs), which forms the basis for deriving criteria for selecting machine learning techniques for a given task. Finally, Part 3 discusses the verification and validation challenges that affect functional safety considerations.
Machine learning and highly automated driving
Developing the highly automated driving functions that lead to self-driving cars is a complex and daunting task. Engineers often apply the principle of divide and conquer to address such challenges, and for good reason: a decomposed system with well-defined interfaces can be tested and verified more thoroughly than a single black box.
Our approach to highly autonomous driving is EB robinos, shown in Figure 1. EB robinos is a functional software architecture with open interfaces and software modules that allows developers to manage the complexities of autonomous driving. The EB robinos reference architecture integrates components according to the “sense, plan, act” decomposition paradigm. Additionally, it leverages machine learning techniques in its software modules to deal with highly unstructured real-world driving environments. The following subsections contain selected examples of technologies integrated in EB robinos.
Figure 1. Open EB robinos reference architecture.
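The "sense, plan, act" decomposition can be illustrated with a minimal sketch. The module names, data fields, and the 10 m braking threshold below are hypothetical and do not reflect the actual EB robinos interfaces; the point is that each stage exchanges data only through an explicit, testable interface.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class EnvironmentModel:
    obstacle_distances_m: List[float]  # distances to detected obstacles, meters
    ego_speed_mps: float               # current ego vehicle speed, m/s

@dataclass
class Trajectory:
    target_speed_mps: float
    steering_angle_rad: float

def sense(raw: Dict) -> EnvironmentModel:
    """Fuse raw sensor readings into an environment model."""
    return EnvironmentModel(raw.get("lidar", []), raw.get("speed", 0.0))

def plan(model: EnvironmentModel) -> Trajectory:
    """Plan: brake if any obstacle is closer than 10 m, else cruise at 50 km/h."""
    if model.obstacle_distances_m and min(model.obstacle_distances_m) < 10.0:
        return Trajectory(0.0, 0.0)
    return Trajectory(13.9, 0.0)

def act(t: Trajectory) -> Dict:
    """Translate the planned trajectory into actuator commands."""
    return {"throttle": 1.0 if t.target_speed_mps > 0 else 0.0,
            "steering": t.steering_angle_rad}

# An obstacle at 8 m triggers braking.
commands = act(plan(sense({"lidar": [25.0, 8.0], "speed": 12.0})))
```

Because each stage has a defined input and output type, it can be unit-tested in isolation, which is exactly what the black-box alternative does not allow.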
End-to-end deep learning approaches also exist, covering everything from perception to action (Bojarski et al. 2016). However, with regard to handling and training on corner cases and rare events, and the exponentially growing amount of training data required, decomposition (i.e., semantic abstraction) is considered the more reasonable approach (Shalev-Shwartz et al. 2016).
However, even following the decomposition approach, it must be decided which parts are best handled individually and which in combination with others. It must also be determined whether machine learning methods can be expected to outperform traditionally engineered algorithms on the tasks performed by specific blocks. Not least, this decision may be influenced by functional safety considerations. As discussed later in this series, functional safety is a key element of autonomous driving. Traditional software components are written according to specific requirements and tested accordingly.
The main problems in testing and validating machine learning systems are their black-box nature and the stochastic behavior of the learning methods: it is essentially impossible to predict the structure a system will learn.
Informed decisions can be guided by the criteria and theoretical background presented in this series. Elektrobit is currently researching and developing use cases in which machine learning methods are considered promising. Two such use cases are described next. The first involves the generation of artificial training samples for machine learning algorithms and their application to traffic sign recognition. The second describes our self-learning car approach. Both examples use state-of-the-art deep learning techniques.
Use Case 1: Artificial Sample Generation and Traffic Sign Recognition
This project proposes a recognition system for speed-limit and end-of-speed-limit traffic signs (TS) in the context of OpenStreetMap (OSM) data used to enhance navigation systems. The goal is to run the algorithm on a standard smartphone mounted on a car's windshield. The system detects traffic signs together with their GPS locations and uploads the collected data to a backend server via the phone's mobile data connection. The method is divided into two main stages: detection and recognition. Detection is achieved with a boosting classifier cascade. Recognition is performed through a probabilistic Bayesian inference framework that incorporates information conveyed by a set of visual probabilistic filters. The theoretical background behind the algorithms used is described in the second part of this series.
Figure 2: Block diagram of a smartphone-based TSR system
The acquired color image is passed to the detector in 24-bit RGB format. The detection process is performed by evaluating the responses of cascaded classifiers computed over a detection window.
The detection window moves across the image at different scales. Potential traffic sign regions of interest (RoIs) are collected as a set of object hypotheses. For feature extraction, the classification cascade is trained on extended local binary patterns (eLBP). Each hypothesis in this set is then classified as traffic sign or background by a support vector machine (SVM).
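The multi-scale sliding-window step can be sketched as follows. The window size, stride, and scales are assumed values, and the real system's boosted eLBP cascade is stubbed by a trivial brightness score; only the scanning structure is illustrated.

```python
import numpy as np

def sliding_window_rois(image, window=24, stride=8,
                        scales=(1.0, 0.5), score_fn=None, threshold=0.5):
    """Collect (x, y, scale) hypotheses whose score exceeds the threshold."""
    rois = []
    for scale in scales:
        step = int(round(1.0 / scale))      # subsample instead of resizing
        scaled = image[::step, ::step]
        h, w = scaled.shape
        for y in range(0, h - window + 1, stride):
            for x in range(0, w - window + 1, stride):
                patch = scaled[y:y + window, x:x + window]
                # Stand-in score: mean brightness; the real system evaluates
                # a boosted eLBP classifier cascade here.
                score = score_fn(patch) if score_fn else patch.mean() / 255.0
                if score > threshold:
                    rois.append((x * step, y * step, scale))
    return rois

# A bright square on a dark background is found by the stand-in scorer.
img = np.zeros((96, 96), dtype=np.uint8)
img[32:64, 32:64] = 255
hypotheses = sliding_window_rois(img)
```

The collected hypotheses are exactly the RoI set that the downstream classifier (the SVM above, or a neural network) then accepts or rejects.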
Traffic sign recognition methods rely on manually labeled traffic signs for training the detection and recognition classifiers. Due to the wide variety of traffic sign templates used in different countries, the labeling process is tedious and error-prone.
For a traffic sign recognition method to perform well, country-specific training data is required. Creating enough manually labeled traffic sign samples is time-consuming because variations in location, lighting, and weather conditions must all be covered.
Therefore, Elektrobit created an algorithm that automatically generates training data from a single artificial template image to overcome the challenge of manually annotating large numbers of training samples. Figure 4 shows the structure of the algorithm.
Figure 4. Block diagram of the artificial sample generation algorithm for machine learning-based recognition systems.
This approach provides a way to generate artificial data for the training phase of machine learning algorithms. The method uses a reduced dataset of real images and generic traffic sign image templates for each country, and outputs a collection of images.
The appearance variations of these images are defined artificially by a series of image template warping operations. The resulting artificial images are evaluated against a reduced set of real-world images using kernel principal component analysis (KPCA). An artificial dataset is suitable for training a machine learning system, in this case for traffic sign recognition, when the features of the generated images correspond to those of real images.
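The idea of deriving many training samples from a single template can be sketched as follows. The shift, brightness, and noise perturbations and their parameter ranges are illustrative assumptions, not Elektrobit's actual warping pipeline.

```python
import numpy as np

def perturb(template, rng):
    """Apply a random shift, brightness gain, and pixel noise to a template."""
    h, w = template.shape
    dy, dx = rng.integers(-2, 3, size=2)                   # small translation
    shifted = np.roll(np.roll(template, dy, axis=0), dx, axis=1)
    gain = rng.uniform(0.6, 1.4)                           # lighting variation
    noisy = shifted * gain + rng.normal(0.0, 8.0, (h, w))  # sensor noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
template = np.full((32, 32), 200, dtype=np.uint8)  # stand-in sign template
samples = [perturb(template, rng) for _ in range(100)]
```

Each generated sample inherits the template's label for free, which is what removes the manual annotation effort.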
To improve the accuracy of the original traffic sign recognition system, Elektrobit replaces the boosting and SVM classifiers with a deep region-based convolutional neural network for detection and recognition. The network is deployed using Caffe (Jia et al. 2014), a deep neural network library developed at Berkeley and supported by NVIDIA. Caffe is a pure C++/CUDA library with Python and MATLAB interfaces. In addition to its core deep learning capabilities, Caffe also provides reference deep learning models that can be used directly in machine learning applications. Figure 5 shows the Caffe network structure for traffic sign detection and recognition. The colored blocks represent convolution (red), pooling (yellow), activation (green), and fully connected (purple) layers.
Figure 5. Convolutional Neural Networks for Deep Region-Based Detection and Recognition in Caffe.
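The layer types named in Figure 5 can be illustrated with a toy forward pass. The shapes, the single random filter, and the three-class output are arbitrary stand-ins, not the actual Caffe model.

```python
import numpy as np

def conv2d(x, kernel):
    """Valid 2-D convolution, single channel, stride 1."""
    kh, kw = kernel.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling."""
    h, w = x.shape[0] // size * size, x.shape[1] // size * size
    return x[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

rng = np.random.default_rng(1)
x = rng.normal(size=(28, 28))                       # toy input "image"
k = rng.normal(size=(5, 5))                         # one (random) filter
features = max_pool(np.maximum(conv2d(x, k), 0.0))  # conv -> ReLU -> pool
w_fc = rng.normal(size=(features.size, 3))
logits = features.ravel() @ w_fc                    # fully connected, 3 classes
```

In the real network, many such filters are stacked and their weights are learned from the training samples rather than drawn at random.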
Use Case 2: Learning How to Drive
The deep learning revolution has recently also drawn new attention to another paradigm, reinforcement learning (RL). In RL, the agent learns by itself how to perform a task through a system of rewards. The method can be viewed as semi-supervised, because designing the reward system requires domain-specific knowledge.
In contrast to supervised learning, the input data does not even need to be labeled. Much of the recent interest in RL is due to the pioneering work of the DeepMind team, which managed to combine RL with deep neural networks capable of learning action-value functions (Mnih et al. 2016). Their system learned to play multiple Atari games at human level.
We built a deep reinforcement learning system, shown in Figure 6, to experiment safely with learning autonomous driving. The system uses the open source racing simulator TORCS (Wymann et al. 2014). TORCS is widely used in the scientific community as a highly portable multi-platform racing simulator. It runs on Linux (all architectures, 32-bit and 64-bit, little-endian and big-endian), FreeBSD, OpenSolaris, MacOSX, and Windows (32-bit and 64-bit). It offers many different cars, tracks, and opponents to race against. We can collect images for object detection as well as key driving metrics from the game engine. These metrics include the speed of the car, the position of the ego car relative to the centerline of the road, and the distance to the car in front.
Figure 6. Deep reinforcement learning architecture for learning how to drive in a simulator.
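How such metrics could be combined into a scalar reward signal can be sketched as follows. The weights, the 10 m gap threshold, and the functional form are illustrative assumptions, not the reward actually used in the experiments.

```python
def driving_reward(speed_mps, lateral_offset_m, gap_to_lead_m):
    """Reward fast, centered driving; penalize tailgating the lead car."""
    reward = 0.1 * speed_mps        # encourage forward progress
    reward -= abs(lateral_offset_m) # penalize drifting off the centerline
    if gap_to_lead_m < 10.0:        # penalize an unsafe gap to the car ahead
        reward -= 5.0
    return reward
```

Under this sketch, centered driving on an open road scores higher than drifting off-center while tailgating, which is the domain knowledge the reward design encodes.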
The goal of the algorithm is to learn driving commands on its own by interacting with the virtual environment. A deep reinforcement learning paradigm is used for this purpose, in which a deep convolutional neural network (DNN) is trained by reinforcing actions a that yield a positive reward signal r(s', a). The state s is represented by the current game image displayed in the simulator window. There are four possible actions: speed up, slow down, turn left, and turn right.
The DNN computes a so-called Q-function, which predicts the optimal action a to be performed in a particular state. In other words, the DNN computes a Q-value for each state-action pair. The action with the highest Q-value is executed, which moves the simulator environment to the next state s'. In this state, the performed action is evaluated by the reward signal r(s', a).
For example, if the car manages to accelerate without a collision, the actions that made this possible are reinforced in the DNN; otherwise, they are discouraged. Reinforcement is performed in the framework by retraining the DNN with the state's reward signal. Figure 7 shows the Caffe implementation of the deep reinforcement learning algorithm. The network layers use the same color coding as in Figure 5.
Figure 7. Caffe-based deep convolutional neural network architecture for deep reinforcement learning.
The second part of this series will define the theoretical background of machine learning techniques and the types of neural networks available to automotive developers.