Article

Smart Black Box 2.0: Efficient High-Bandwidth Driving Data Collection Based on Video Anomalies

Robotics Institute, University of Michigan, Ann Arbor, MI 48109, USA
* Author to whom correspondence should be addressed.
Submission received: 15 December 2020 / Revised: 2 February 2021 / Accepted: 3 February 2021 / Published: 9 February 2021
(This article belongs to the Section Databases and Data Structures)

Abstract

Autonomous vehicles require fleet-wide data collection for continuous algorithm development and validation. The smart black box (SBB) intelligent event data recorder has been proposed as a system for prioritized high-bandwidth data capture. This paper extends the SBB by applying anomaly detection and action detection methods for generalized event-of-interest (EOI) detection. An updated SBB pipeline is proposed for the real-time capture of driving video data. A video dataset is constructed to evaluate the SBB on real-world data for the first time. SBB performance is assessed by comparing the compression of normal and anomalous data and by comparing our prioritized data recording with a first-in-first-out (FIFO) strategy. The results show that SBB data compression can increase the anomalous-to-normal memory ratio by ∼25%, while the prioritized recording strategy increases the anomalous-to-normal count ratio when compared to a FIFO strategy. We compare the real-world dataset SBB results to a baseline SBB given ground-truth anomaly labels and conclude that improved general EOI detection methods will greatly improve SBB performance.

1. Introduction

Traditional automotive data collection has focused on low-bandwidth vehicle data such as speed and brake status. However, as autonomous vehicles increasingly seem to be the future of transportation [1], these data collection methods are becoming obsolete. Modern autonomous vehicles require the large-scale collection of high-bandwidth data (e.g., video, point clouds) for algorithm development and validation and verification. Deep-learning networks for common autonomous vehicle perception tasks such as object detection [2,3,4], object tracking [5,6,7], and trajectory prediction [8,9,10,11] need significant quantities of high-bandwidth real-world data for effective training and testing.
The finite on-board storage capacity presents a challenge for high-bandwidth data capture that low-bandwidth data logging systems do not encounter. Recently, event data recorders specialized for such high-bandwidth data capture have been explored. The smart black box (SBB) [12,13] is one such system. The SBB uses pre-defined rules for event-of-interest (EOI) detection and computes the data value according to the detected EOI. The data value is then used to determine data compression factors and as the basis for prioritized data recording.
This paper expands the smart black box to record high-priority video data and applies the SBB to a real-world driving dataset. Rather than using pre-defined rules for EOI detection as in [12,13], machine learning-based methods for generalized EOI detection are applied to derive the data value. Raw data are grouped into buffers, compressed, and stored in a priority queue in order to discard low-value data as the storage capacity is filled. The SBB is assessed on real-world driving video. We focus on video data due to the ubiquity of cameras as a sensor in automotive applications.
This paper offers two primary contributions. Firstly, we apply video anomaly detection (VAD) [14,15] and online action detection (OAD) [15,16] as methods for generalized EOI detection on real-world driving video. To estimate the data value from combined VAD and OAD outputs, we introduce a hybrid value method based on a weighted sum. Secondly, we present an updated SBB pipeline incorporating VAD and OAD and designed for the real-time recording of dash camera video data. We find that while the SBB improves the collection and retention of high-value data, improved EOI detection methods are needed to realize the full potential of the SBB.
The paper is structured as follows. First, related work is explored in Section 2. Then, an overview of the original SBB [13] is presented, and the changes we made for application to real-world data are discussed in Section 3. Section 4 describes our adjusted data classification system, the updated SBB pipeline, and the new value estimation method. Section 5 presents experimental results on a combined real-world dataset and analyzes the performance of our updated SBB. Section 6 discusses limitations and future directions, and Section 7 concludes the paper.

2. Related Work

2.1. Event Data Recorders

Automotive event data recorders use low-level triggers, such as vehicle impact or engine faults, to log vehicle data leading up to and during anomalous events [17,18]. However, these systems focus on low-bandwidth data and do not sufficiently address the storage problems posed by high-bandwidth sensors. When the on-board memory fills, one of two strategies is used. The more common strategy writes data until the memory is full, then stops recording, meaning that the newest data are discarded. The second strategy uses a circular buffer equivalent to a first-in-first-out (FIFO) queue, in which the newest data overwrite the oldest data. Neither of these strategies considers the value of the data being discarded.
High-bandwidth data recorders address this by using prioritized data recording. In the case of [12,13], valuable data are identified using pre-defined rules for EOI detection. This paper seeks to build on the prioritized recording strategy in [12,13] by applying methods for general EOI detection.

2.2. Traffic Video Anomaly Detection and Classification

Several methods exist for identifying EOIs in autonomous vehicles. Pre-defined rules may be applied based on vehicle odometry [19,20,21] to identify certain EOIs. Other approaches use physiological signals from the driver [22]. In recent years, deep-learning computer vision techniques have been applied to anomaly detection in first-person driving videos [14,15,23,24]. Other works further attempt to classify the type of anomaly occurring in the video, either offline after the video is fully observed [25,26,27] or in real time [16]. These methods enable generalized EOI detection based only on dash camera video. As such, we use methods from [14,16] in the SBB to assign the data value, in accordance with our focus on video data. Table 1 below presents a summary of the reviewed anomaly detection methods.

2.3. Real-World Driving Datasets

The increasing popularity of deep-learning methods for self-driving perception tasks has created a demand for high-quality high-bandwidth datasets. Naturalistic field operation test projects such as [28,29] have been used in the past to gather large amounts of driving data. One such dataset [28] uses 100 cars to log nearly 43,000 h of video and vehicle performance data over a distance of 2,000,000 miles. The more recent Safety Pilot Model Deployment dataset [29] contains roughly 17,000,000 miles of data collected over almost 64,400 h, including 17 TB of video. However, these datasets primarily focus on the capture of low-bandwidth data; the video streams of both datasets are compressed and downsampled to low frame rates.
Recently, high-quality computer vision-oriented datasets have been published. These include general driving datasets like Cityscapes [30], KITTI [31], and BDD100K [32] and traffic anomaly datasets like A3D [14], DADA [33], and Detection of Traffic Anomaly (DoTA) [15]. Cityscapes contains 24,999 labeled images at 55 GB, while KITTI includes 7481 images at 12 GB, in addition to 29 GB of point clouds and GPS and IMU data. BDD100K is one of the largest public driving datasets, having 100,000 HD video clips (1.8 TB) for over 1100 driving hours in a variety of conditions. A3D, DADA, and DoTA focus specifically on traffic anomalies. A3D contains 1500 on-road accident clips with accident start and end times labeled. DADA released 1000 video clips with simulated driver eye-gaze. DoTA is comprised of 4677 videos with spatial, temporal, and anomaly category annotations. Table 2 summarizes some major driving video datasets.
Datasets like BDD100K and DoTA have significantly expanded the publicly available data for deep-learning methods. However, anomaly-focused datasets are still relatively small, and larger datasets like BDD100K contain very few EOIs with which to test self-driving algorithms. As a result, evaluating the SBB required the creation of a combined dataset using BDD100K and DoTA video clips in order to have sufficient quantities of both normal and anomalous driving data. The SBB aims to address this problem by providing a method to collect high-value video data across an entire fleet of vehicles.

3. Preliminaries

This work builds upon the smart black box (SBB) intelligent event data recorder proposed in [13]. This section reviews the original SBB design, its data value estimation method, and the issues that arise when applying them to real-world data.

3.1. Smart Black Box Design

The SBB aims to record high-quality high-value data through value-driven data compression and prioritized data recording. At each time step, one data frame is observed and collected. Based on event detectors, a scalar frame value $v_t \in [0,1]$ is computed for each frame. The data frame is then appended to a buffer, which caches seconds or minutes of data. The process of buffering data frames is managed by a deterministic Mealy machine (DMM), which uses the new data value, data similarity, and the current buffer size to determine when to end the current buffer and start a new one [13]. After the DMM terminates, local buffer optimization (LBO) is used to determine the optimal compression factor $d_t \in [0,1]$, called the LBO decision, for each frame in the buffer. A Gaussian data value filter can be applied over the buffered data to smooth the estimated data value. The buffered data are then compressed according to the LBO decisions and placed in long-term storage. Once on-board storage is full, a priority queue discards the lowest-value buffers to make space for higher-value buffers.
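To make this record loop concrete, the following minimal Python sketch captures the control flow under strong simplifying assumptions: `estimate_value` is a random stub standing in for the value estimator of Section 4.3, the DMM termination rule is replaced by a fixed buffer length, LBO compression is omitted, and the buffer value and cost are simplified. The helpers and constants here are illustrative, not the released implementation.

```python
import heapq
import itertools
import random

def estimate_value(frame):
    # Hypothetical stand-in for the Section 4.3 value estimator;
    # returns a scalar frame value in [0, 1].
    return random.random()

class PriorityStore:
    # Long-term storage as a binary min-heap keyed on buffer value:
    # once capacity is exceeded, the lowest-value buffers are discarded.
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0.0
        self.heap = []
        self._tie = itertools.count()  # tie-breaker so buffers never compare

    def push(self, value, cost, buffer):
        heapq.heappush(self.heap, (value, next(self._tie), cost, buffer))
        self.used += cost
        while self.used > self.capacity and self.heap:
            _, _, c, _ = heapq.heappop(self.heap)  # evict lowest-value buffer
            self.used -= c

store = PriorityStore(capacity=500.0)
buffer = []
for frame in range(10_000):          # stand-in for a stream of data frames
    v_t = estimate_value(frame)
    buffer.append((frame, v_t))
    if len(buffer) >= 100:           # stand-in for the DMM termination rule
        value = max(v for _, v in buffer)   # simplified buffer value
        cost = float(len(buffer))           # simplified buffer cost
        store.push(value, cost, buffer)
        buffer = []
```

The min-heap makes eviction of the lowest-value buffer logarithmic in the number of stored buffers, matching the priority-queue behavior described above.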

3.2. SBB Value Estimation

The SBB was previously tested only in a simulation environment, The Open Source Racing Simulator (TORCS) [34]. Experiments using TORCS in [13] classified each frame as either normal ($\epsilon_1$) or as one of four pre-defined events-of-interest (EOIs): cut-in, hard braking, conflict, or crash, denoted as $\epsilon_2$, $\epsilon_3$, $\epsilon_4$, or $\epsilon_5$, respectively. The value of each event is pre-computed from its event likelihood in Equation (1):

$$v(\epsilon_j) = -\log_2 P(\epsilon_j) \qquad (1)$$

where $P(\epsilon_j)$ is the likelihood of event $\epsilon_j$. These event values are then normalized over $[0,1]$ such that $\max_j v(\epsilon_j) = 1$. The frame value at time $t$, $v_t$, is then set according to:

$$v_t = v(\epsilon(t)) \qquad (2)$$

where $\epsilon(t)$ is the event detected at time $t$.
This data value estimation method works well in a simulation environment. However, it has two main drawbacks that affect its usability in the real world. First, the method relies entirely on a set of pre-defined rules for EOI detection. In reality, the space of traffic EOIs is large and diverse, and capturing them purely using pre-defined rules is insufficient for real-world applications. Second, the detection of the four EOIs is not always possible given only dash camera data. In simulation, the EOIs are easily detectable by tracking the cars surrounding the ego vehicle. However, limiting the available sensing to a single front-facing camera makes the identification of these EOIs significantly more challenging. In this paper, we apply an adjusted event classification system in Section 4.1.1 and a new value estimation method in Section 4.3.

4. Materials and Methods

This section introduces a new event classification system to extend the previous SBB and defines updated data frame and buffer representations in Section 4.1. Then, an updated SBB pipeline for real-world video data is presented in Section 4.2. Finally, methods for data value estimation using video anomaly detection and online action detection are discussed in Section 4.3.

4.1. Data Classification and Representation

4.1.1. Frame Classification

As mentioned in Section 3.2, the event classification system in [13] does not straightforwardly apply to real-world applications. Instead, real-world datasets such as DoTA [15] classify frames based on anomaly type and causation, e.g., an oncoming collision event. As such, we employ the classification system used in the DoTA dataset [15], which defines the eight traffic anomaly categories described in Table 3. Each of these anomaly categories can be further specified as ego or non-ego events. Including the normal event class, this results in 17 total event classes. Online action detection aims to classify frames according to these event classes.
To realize generalized EOI detection, we also utilize a binary anomalous or normal classification. Video anomaly detection is used to solve this binary classification problem.

4.1.2. Data Frame Representation

A data frame is defined as all data, both observed and computed, associated with a single video frame. In this paper, we consider only data derived from camera input. These data include:
  • Image: The video frame captured by the camera. In this paper, we use RGB images at 1280 × 720 resolution.
  • Value: The value of the frame $v_t \in [0,1]$. The value is calculated according to the value function defined in Section 4.3 and is used in the DMM, as well as in buffer value computation.
  • Cost: The normalized storage cost of the frame $c_t \in [0,1]$.
  • Anomaly score: The anomaly score $s_t \in [0,1]$ of the frame generated using video anomaly detection. More details can be found in Section 4.3.1.
  • Classification scores: The output scores $o_t = [o_{t,0}, o_{t,1}, \ldots, o_{t,16}]$ for each event class in Table 3 from online action detection. More details can be found in Section 4.3.2.
  • Object data: The tracking ID, object type, bounding box, and detector confidence of each object detected in the frame. Object data are used to support buffer tagging; details can be found in Section 4.1.3.

4.1.3. Frame Buffer Representation

A buffer is a collection of frames grouped by the DMM described in Section 3.1. The buffer value $V_k$ and cost $C_k$ of the $k$th buffer are computed as

$$V_k = (1+\lambda)^k \max_i (v_i d_i) \qquad (3)$$

$$C_k = \sum_i \hat{c}_i \qquad (4)$$

where $v_i$ is the value of the $i$th frame in the buffer, $d_i$ is its compression quality, and $\hat{c}_i$ is its post-compression storage cost. The factor $(1+\lambda)^k$ with $0 < \lambda \ll 1$ is an aging factor used to slightly favor more recent buffers.
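A direct transcription of Equations (3) and (4) is shown below; the λ value is an illustrative assumption, as the text only requires $0 < \lambda \ll 1$:

```python
import numpy as np

def buffer_value(values, decisions, k, lam=0.01):
    # Eq. (3): V_k = (1 + lambda)^k * max_i(v_i * d_i); lam = 0.01 is an
    # assumed aging factor satisfying 0 < lambda << 1.
    v = np.asarray(values, dtype=float)
    d = np.asarray(decisions, dtype=float)
    return (1.0 + lam) ** k * float(np.max(v * d))

def buffer_cost(compressed_costs):
    # Eq. (4): C_k = sum_i c_hat_i, the total post-compression storage cost.
    return float(np.sum(compressed_costs))
```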
Additionally, buffer tags are high-level descriptions of data buffers which enable buffer indexing and searching in downstream applications. These tags include:
  • Anomaly score: The mean, max, and variance of the anomaly scores of the frames in the buffer.
  • Frame classifications: A list of event classes $\epsilon$ for which there is a frame $f_t$ in the buffer with $o_{t,\epsilon} > \rho_\epsilon$, where $\rho_\epsilon$ is a user-defined threshold score for class $\epsilon$.
  • Objects: The tracking ID, object type, and bounding boxes and detector confidences over time of each object in the buffer.
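A sketch of how such tags might be assembled from the per-frame scores is given below; the dictionary layout and names are our own, and the object tags are omitted since they depend on the tracker's output format:

```python
import numpy as np

def make_buffer_tags(anomaly_scores, class_scores, thresholds):
    # anomaly_scores: (T,) per-frame VAD scores for the buffer.
    # class_scores:   (T, 17) per-frame OAD confidence vectors o_t.
    # thresholds:     (17,) user-defined per-class thresholds rho.
    s = np.asarray(anomaly_scores, dtype=float)
    o = np.asarray(class_scores, dtype=float)
    rho = np.asarray(thresholds, dtype=float)
    return {
        "anomaly_score": {"mean": float(s.mean()),
                          "max": float(s.max()),
                          "var": float(s.var())},
        # Event classes with at least one frame scoring above the threshold.
        "frame_classifications": [j for j in range(o.shape[1])
                                  if (o[:, j] > rho[j]).any()],
    }
```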

4.2. Updated SBB Design

The updated SBB is separated into four processes running in parallel: video capture, buffer management, value estimation, and prioritization. Figure 1 describes the updated SBB pipeline. Our code is available at https://github.com/rzf16/sbb2_algs (accessed on 8 February 2021).
  • Video Capture reads video input and publishes each video frame to value estimation and buffer management. This module remains unchanged from the original SBB.
  • Value Estimation assigns a value $v_t \in [0,1]$ to each frame for use in buffer management and storage prioritization. The value estimation module first executes object detection, object tracking [5], and optical flow estimation [35]. The outputs are then used in video anomaly detection and online action detection, whose scores determine the value. This differs significantly from [13], which assumed perfect EOI detection using pre-defined rules and computed the data value from the detected EOIs (see Section 3.2); here, we instead use video anomaly detection and action detection for generalized EOI detection and compute the data value from their output scores, as detailed in Section 4.3.
  • Buffer Management groups frames into buffers using the DMM from [13] after receiving each frame from Video Capture and its corresponding value from Value Estimation. The similarity of a data frame to the current buffer is computed as the percentage of object tracking IDs in the frame that have already appeared in the buffer. With $A$ the set of tracking IDs in the frame and $B$ the set of tracking IDs that have appeared in the buffer, the similarity is $\xi_t = |A \cap B| / |A|$. Once the DMM terminates, LBO solves an optimization problem over the output buffer to determine the compression quality of each frame.
According to [13], a decoupled LBO strategy can optimize the compression quality of a single frame independently of all other frames in the buffer. Given constants $\eta$ and $\zeta$, the decoupled LBO objective for frame $f_t$ is:

$$\min_{d_t} \; c_t \phi(d_t) - \eta \zeta \hat{v}_t d_t \quad \text{subject to} \quad d_t \in [0,1] \qquad (5)$$

where $\eta, \zeta \geq 0$ are weighting parameters and $\phi(d_t)$ maps the compression quality to the compression ratio. Note that $\phi(d_t)$ increases monotonically over $d_t \in [0,1]$. In this paper, we use the $\phi$ function of JPEG compression on real-world driving data following [12]. Throughout the paper, the values $\eta = 0.9$ and $\zeta = 1.7$ are used based on [13]; these parameters were selected to maximize the value-per-memory (VPM) of the recorded data. Further details on $\eta$ and $\zeta$ parameter selection can be found in [13].
DMM and LBO functionality remain the same as in [13]. However, the data similarity metric is adjusted to match our focus on dash camera data. In the previous SBB, data similarity was computed using the odometry of the host and surrounding vehicles; a single front-facing camera cannot capture sufficient information for this approach, so we instead compute data similarity from the objects detected in the frame, as mentioned above. A sketch of the similarity metric and the decoupled LBO decision is given after the module list below.
  • Prioritization maintains a buffer priority heap in order to retain high-value buffers and delete low-value buffers as the memory capacity is reached. The buffer value $V_k$ and cost $C_k$ of the $k$th buffer are computed according to Equations (3) and (4), respectively. A binary min-heap is constructed to store buffers keyed on $V_k$ following [13]. This module is also unchanged from [13].
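Below is a small sketch of the similarity metric and the decoupled LBO decision referenced above. The $\phi$ curve here is a hypothetical monotone stand-in, since the paper fits $\phi$ empirically to JPEG compression [12]; $\eta = 0.9$ and $\zeta = 1.7$ follow the text, and the grid search is our own simplification of the one-dimensional optimization:

```python
import numpy as np

def similarity(frame_track_ids, buffer_track_ids):
    # xi_t = |A ∩ B| / |A|: the fraction of the frame's object tracking IDs
    # that have already appeared in the current buffer.
    A, B = set(frame_track_ids), set(buffer_track_ids)
    return len(A & B) / len(A) if A else 1.0

def phi(d):
    # Hypothetical monotone mapping from compression quality to compression
    # ratio; the paper uses an empirically fitted JPEG curve [12].
    return 0.05 + 0.95 * d ** 2

def lbo_decision(c_t, v_hat_t, eta=0.9, zeta=1.7):
    # Decoupled LBO of Eq. (5): pick d_t in [0, 1] minimizing the storage
    # cost term minus the weighted value-retention term.
    d = np.linspace(0.0, 1.0, 101)
    objective = c_t * phi(d) - eta * zeta * v_hat_t * d
    return float(d[np.argmin(objective)])
```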

4.3. Value Estimation Method

This section introduces the data value estimation method used to group buffers (via the DMM) and to decide the optimal compression factors (via LBO). Similar to [12,13], we define the value of a data frame as a measure of data anomaly. The data value is determined by: (1) the anomaly score estimated by a video anomaly detection (VAD) module; and (2) the anomaly category detected by an online action detection (OAD) module.

4.3.1. Video Anomaly Detection

A VAD algorithm takes observed image frames and predicts an anomaly score for each frame as a description of the degree of abnormality of that frame. Existing VAD algorithms can be categorized as frame-level VAD and object-level VAD. A frame-level VAD algorithm reconstructs or predicts image frames (e.g., in RGB or grayscale) and computes the L2 error of reconstruction or prediction as the anomaly score [36,37,38]. An object-level algorithm, on the other hand, predicts object appearance and/or motions and computes the anomaly score based on prediction error [39,40] or consistency [14,15].
In this paper, we run an off-the-shelf VAD algorithm to estimate an anomaly score $s_t$ of a frame $f_t$ and use it to inform our value estimation. To be specific, we trained the TAD algorithm from [14] on the Detection of Traffic Anomaly (DoTA) dataset following [15] and applied it in our data value estimation module.
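TAD itself predicts future object bounding boxes with RNN encoder-decoders and scores frames by the consistency of those predictions [14] (see Table 1). The fragment below is a simplified sketch in that spirit, not the authors' implementation: it assumes that K box predictions of the current frame, made at different past horizons, are already available:

```python
import numpy as np

def prediction_consistency_score(predicted_boxes):
    # predicted_boxes: (K, N, 4) array holding K past-horizon predictions of
    # the current bounding boxes of N tracked objects (normalized cx, cy, w, h).
    # High disagreement across the K predictions suggests anomalous motion.
    boxes = np.asarray(predicted_boxes, dtype=float)
    if boxes.size == 0:
        return 0.0                       # no tracked objects: treat as normal
    spread = boxes.std(axis=0)           # (N, 4) std across the K predictions
    per_object = spread.mean(axis=1)     # (N,) mean spread per object
    return float(per_object.max())       # driven by the most anomalous object
```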

4.3.2. Online Action Detection

While the anomaly score from VAD provides information about the probability that an anomaly occurs in a frame, it does not assess the anomaly category, which is important information for determining the data value in long-term driving according to [13]. Categorizing anomalous events is essential to the SBB design since it allows the SBB to prioritize high-value categories when the storage limit is encountered, and it allows the SBB to focus on specific event types per a user’s request.
In this paper, we implement an off-the-shelf OAD algorithm to obtain a confidence score vector $o_t$ for a frame $f_t$, which is then combined with the anomaly score $s_t$ to estimate the data value. To be specific, we trained an OAD algorithm called the temporal recurrent network (TRN) [16] using the DoTA dataset [15]. The TRN outputs a 17D vector $o_t = [P_t(\epsilon_0), P_t(\epsilon_1), \ldots, P_t(\epsilon_{16})]$ for each frame, with $\sum_{j=0}^{16} P_t(\epsilon_j) = 1$, representing the confidence that the frame belongs to each class.

4.3.3. Hybrid Value

A hybrid value estimation method is proposed that combines the VAD and OAD scores via a weighted sum:

$$v_t = \min\left(1, \; \alpha s_t + \beta \sum_{i=1}^{16} w_i o_{t,i}\right) \qquad (6)$$

where $w_i$ is the information measure of class $i$ and $\alpha, \beta$ are weighting parameters in $[0,1]$. Because $o_{t,i}$ estimates the probability that a frame is of class $i$, the weighted sum over $o_t$ is equivalent to the expected information measure of the frame. By using the information measure of each anomaly class, rarer anomaly types are assigned a higher value. Note that Class 0, the normal class, is not included in the computation; this is equivalent to setting $w_0 = 0$. Throughout this paper, we use $\alpha = \beta = 1$ for simplicity.

The information measures $w_i$ are calculated from the class likelihoods in the DoTA dataset given in Table 4 according to Equation (7) and normalized to $[0,1]$ by dividing by the maximum information measure:

$$w_i = -\log_2 P(\text{class} = i) \qquad (7)$$
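Equations (6) and (7) translate directly into a few lines; a sketch follows, using the Table 4 likelihoods (the class ordering and helper names are ours):

```python
import numpy as np

def info_measures(class_likelihoods):
    # Eq. (7): w_i = -log2 P(class = i), normalized to [0, 1] by the maximum.
    w = -np.log2(np.asarray(class_likelihoods, dtype=float))
    return w / w.max()

def hybrid_value(s_t, o_t, w, alpha=1.0, beta=1.0):
    # Eq. (6): v_t = min(1, alpha * s_t + beta * sum_{i=1..16} w_i * o_{t,i}).
    # o_t[0] is the normal-class confidence and is excluded (w_0 = 0).
    return min(1.0, alpha * s_t + beta * float(np.dot(w, np.asarray(o_t)[1:])))

# DoTA anomaly class likelihoods from Table 4 (ST, AH, ..., OO*):
likelihoods = [0.011, 0.057, 0.054, 0.023, 0.163, 0.012, 0.010, 0.089,
               0.010, 0.091, 0.104, 0.081, 0.207, 0.010, 0.011, 0.070]
w = info_measures(likelihoods)
```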

5. Experiments

In this section, we conduct SBB data collection experiments on a purpose-built large-scale real-world video dataset and present the results. We discuss the storage requirements of SBB-compressed data to showcase its preservation of valuable data. We then compare our SBB prioritized data recording with a FIFO queue recording strategy. We examine the SBB results on each anomaly class. Finally, we evaluate the performance of the VAD-OAD hybrid value method with several parameter combinations.

5.1. Dataset

The SBB is designed for high-bandwidth data collection in long-term driving where on-board storage is limited. Therefore, SBB performance evaluation requires a large, high-quality video dataset containing both normal driving data and events-of-interest (EOIs). To the best of our knowledge, no single dataset currently satisfies all of these requirements. The BDD100K dataset [32] is one of the largest high-quality driving video datasets and contains 100,000 video clips covering ∼1100 driving hours. The DoTA dataset [15] is the largest and newest high-quality video dataset for traffic anomalies and contains 4677 anomalous video clips. We combined the 10,000 validation videos of the BDD100K dataset with 500 randomly sampled anomalous video clips from the DoTA dataset, interspersing the clips to produce a single large test video with ∼4,000,000 frames at 10 FPS. The frames from BDD100K were compressed using OpenCV with JPEG quality 85 in order to eliminate the difference in image size between BDD100K and DoTA. This combined dataset contains over 100 h of driving video at 1280 × 720 resolution with 0.5% of frames being anomalous, meeting our requirements for a large, high-quality, and mostly non-anomalous dataset. Note that the ST* anomaly class is not present in this combined dataset, as its rarity in the DoTA dataset led to no ST* clips being sampled.
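As a point of reference, the BDD100K re-encoding step can be reproduced with OpenCV roughly as follows; the paths and the frame-iteration scheme are illustrative, and only the JPEG quality of 85 is specified in the text:

```python
import cv2

def recompress_frame(src_path, dst_path, quality=85):
    # Re-encode a BDD100K frame as a JPEG at quality 85 (Section 5.1) so its
    # storage size is comparable to the DoTA frames.
    img = cv2.imread(src_path)
    if img is None:
        raise IOError(f"could not read {src_path}")
    cv2.imwrite(dst_path, img, [cv2.IMWRITE_JPEG_QUALITY, quality])
```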

5.2. Results

Experimental results for the SBB are presented and discussed below. In general, high storage sizes and high compression-quality decisions are desirable for anomalous data, while the opposite is true for normal data.
  • SBB data compression: SBB data compression statistics with no memory limit are presented in Table 5. The storage cost of normal frames is significantly reduced by the SBB (from 427.20 GB to 65.53 GB, an 85% reduction), leading to a 24.4% increase in the anomalous-to-normal data storage ratio. Both the average (avg.) and median (med.) compression factor decisions of the SBB are higher for anomalous frames, indicating that the SBB is able to identify and preserve anomalous frames over normal ones. Figure 2 displays normal frames that were highly compressed by the SBB along with preserved anomalous frames; Figure 3 shows two failure cases in which anomalous frames were mistakenly compressed. Both failures showcase a lack of robustness to cases where anomalous objects are occluded.
However, the decision difference between normal and anomalous frames is less pronounced than in the simulation experiment of [13] because anomalous event detection in simulation was 100% accurate, while VAD on real-world data is far from perfect. Moreover, the median decision for a normal frame is significantly lower than the mean, indicating outlier normal frames with unusually high value scores. The standard deviation (std.) of anomalous-frame decisions is significantly larger than in the simulation experiment (0.26 vs. ∼0.02), showing how inaccurate VAD and OAD reduce the SBB's efficiency on real-world data.
The limitations of VAD and OAD are further shown by evaluating the performance of the SBB given ground-truth labels as VAD and OAD scores. The anomalous-to-normal memory ratio increases by 1610%, driven by the massive difference in decisions between normal and anomalous frames. This upper-bound performance of the SBB indicates that as anomaly detection techniques continue to improve, the performance of the SBB will improve as well.
  • Priority queue vs. FIFO: Table 6 compares the recorded frames of a prioritized recording system against those of a FIFO queue at memory limits of M = 3.125 GB, 6.25 GB, 12.5 GB, and 25 GB. These values represent a non-trivial amount of data to upload (depending on Internet connection quality) when continuous Internet access is unavailable. In all scenarios, prioritized recording saved fewer normal frames and more anomalous frames than the FIFO strategy. We also note that while the anomaly ratio stays roughly the same at each memory limit for the FIFO queue, it increases at each level for the priority queue. The prioritization strategy of the SBB removes ∼95% of normal frames while still recording ∼10% of anomalous frames at M = 3.125 GB. Compared to the FIFO queue, the anomalous-to-normal count ratio of SBB-recorded data is ∼25% to ∼100% higher.
  • Performance per anomaly class: Figure 4 displays the decision histograms for each anomaly class. The performance of the SBB varies heavily depending on the anomaly category. For example, the decision distribution of class OC indicates very good detection of this anomaly. In OC, an ego-vehicle collision with an oncoming vehicle, the anomalous object (the oncoming vehicle) is almost always both near the camera and largely unoccluded. However, ST, VO, LA*, VO*, and OO* have notably poor performance. ST is an extremely difficult case for OAD due to its visual similarity to AH and LA anomalies, resulting in lower OAD confidence that an anomaly has occurred. VO and VO* involve vehicles hitting obstacles in the roadway; in some scenarios, such as hitting a traffic cone or a fire hydrant, the obstacle may be blocked from view by the anomalous vehicle in a non-ego incident or outside the camera's field of view in an ego incident. LA* often involves vehicles slowly moving closer together, making the collision relatively subtle. OO*, a non-ego vehicle leaving the roadway, can be challenging to detect simply due to the distance at which the anomaly occurs.
  • Value estimation method comparison: Table 7 compares the decision statistics for hybrid value estimation with several parameter combinations. We note that the VAD-only method generates the largest decision difference between normal and anomalous frames; we suspect this results from OAD's inability to consistently differentiate anomalous from normal frames. Readers are directed to [15] for an in-depth discussion of the poor performance of OAD algorithms.
For applications that value general EOIs, VAD-only value estimation (α = 1, β = 0) best distinguishes normal and anomalous data. However, users interested in specific EOIs may opt for a hybrid value in order to incorporate the EOI classification offered by OAD. In terms of hybrid value parameters, Table 7 shows that lower weights yield larger decision differences. However, in situations where retaining data quality is critical, higher α and β values may be used to achieve higher overall decision quality. Additionally, the growth of the decision difference with α shown in Figure 5 indicates once again that VAD contributes more than OAD to differentiating normal and anomalous frames.

6. Discussion

While our results indicate that the SBB improves the recording of anomalous data, the massive difference between the ground-truth and actual performance of the SBB indicates the need for improved video-based anomaly detection and action detection. Furthermore, a key limitation of our study is the extraction of the data value purely from a single video stream. Most autonomous vehicles feature extensive sensor suites, including multiple cameras, radar, LiDAR, CAN data, etc. Future research in intelligent event data recorders may tap into this wealth of sensor data to more effectively detect EOIs and assign the data value.
Additionally, further work towards high-bandwidth vehicular communication networks can serve to ease the onboard memory constraints under which the SBB works. Currently, the SBB is designed to record data over the course of a day or multiple days. However, high-speed vehicle-to-everything communication allowing for rapid data upload in real time would significantly reduce the data recording period of the SBB and possibly transform the SBB into a downstream application to be applied after data upload.
Finally, although our manuscript focuses on the application of the SBB to autonomous cars, the SBB pipeline can be adapted for intelligent data recording in other domains as well. Autonomous and semi-autonomous systems are being developed for truck, sea, and air transport to accommodate increased volume [41,42,43] and to improve safety [44]. As the onboard sensor suites in these domains continue to increase in complexity and data bandwidth, intelligent event data recorders must be developed to store and manage valuable sensor data.

7. Conclusions

This paper proposes a novel smart black box (SBB) data processing pipeline that uses video anomaly detection and online action detection to efficiently record large-scale high-value video data. We addressed the storage and value estimation problems the SBB faces with real-world data, adjusted the data classification and value estimation accordingly, and presented results on a large-scale real-world driving video dataset. Value estimation is changed from a purely information measure-based method over pre-defined EOIs to a combination of video anomaly detection and online action detection capable of detecting more general EOIs. Observed decision differences between normal and anomalous data indicate that the SBB value estimation can distinguish normal and anomalous frames. In the experiments, a 24.4% increase in the anomalous-to-normal memory ratio was achieved compared to the raw data, in addition to a ∼25% to ∼100% increase in the anomalous-to-normal count ratio. However, we also noted that the SBB's performance increases significantly given ground-truth anomaly labels, suggesting that improved methods for general EOI detection will further improve the SBB's utility. Future research in anomaly detection using sensor fusion, high-bandwidth vehicular communication networks, and intelligent event data recorders for other domains of transport can help realize prioritized data recording and storage for intelligent transportation systems.

Author Contributions

Conceptualization, Y.Y. and E.A.; methodology, R.F. and Y.Y.; software, R.F. and Y.Y.; validation, R.F. and Y.Y.; formal analysis, R.F. and Y.Y.; investigation, R.F. and Y.Y.; resources, E.A.; data curation, R.F.; writing—original draft preparation, R.F.; writing—review and editing, Y.Y. and E.A.; visualization, R.F.; supervision, E.A.; project administration, Y.Y. and E.A.; funding acquisition, E.A. All authors read and agreed to the published version of the manuscript.

Funding

This research was supported by a grant from Ford Motor Company via the Ford-UM Alliance under Award N028603 and the National Science Foundation Award Number CNS 1544844.

Data Availability Statement

The BDD100K [32] and DoTA [15] datasets are available online at https://bdd-data.berkeley.edu/ (accessed on 8 February 2021) and https://github.com/MoonBlvd/Detection-of-Traffic-Anomaly (accessed on 8 February 2021), respectively.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Enoch, M.; Cross, R.; Potter, N.; Davidson, C.; Taylor, S.; Brown, R.; Huang, H.; Parsons, J.; Tucker, S.; Wynne, E.; et al. Future local passenger transport system scenarios and implications for policy and practice. Transp. Policy 2020, 90, 52–67.
  2. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single Shot Multibox Detector. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016.
  3. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870.
  4. Tian, Z.; Shen, C.; Chen, H.; He, T. FCOS: Fully Convolutional One-Stage Object Detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Korea, 27 October–2 November 2019.
  5. Wojke, N.; Bewley, A.; Paulus, D. Simple Online and Realtime Tracking with a Deep Association Metric. arXiv 2017, arXiv:1703.07402.
  6. Choi, W. Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor. arXiv 2015, arXiv:1504.02340.
  7. Xiang, Y.; Alahi, A.; Savarese, S. Learning to Track: Online Multi-object Tracking by Decision Making. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 11–18 December 2015; pp. 4705–4713.
  8. Alahi, A.; Goel, K.; Ramanathan, V.; Robicquet, A.; Li, F.-F.; Savarese, S. Social LSTM: Human Trajectory Prediction in Crowded Spaces. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  9. Yao, Y.; Xu, M.; Choi, C.; Crandall, D.J.; Atkins, E.M.; Dariush, B. Egocentric Vision-based Future Vehicle Localization for Intelligent Driving Assistance Systems. arXiv 2019, arXiv:1809.07408.
  10. Yao, Y.; Atkins, E.; Johnson-Roberson, M.; Vasudevan, R.; Du, X. BiTraP: Bi-directional Pedestrian Trajectory Prediction with Multi-modal Goal Estimation. arXiv 2020, arXiv:2007.14558.
  11. Salzmann, T.; Ivanovic, B.; Chakravarty, P.; Pavone, M. Trajectron++: Multi-Agent Generative Trajectory Forecasting With Heterogeneous Data for Control. arXiv 2020, arXiv:2001.03093.
  12. Yao, Y.; Atkins, E. The smart black box: A value-driven automotive event data recorder. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018.
  13. Yao, Y.; Atkins, E. The Smart Black Box: A Value-Driven High-Bandwidth Automotive Event Data Recorder. IEEE Trans. Intell. Transp. Syst. 2020.
  14. Yao, Y.; Xu, M.; Wang, Y.; Crandall, D.J.; Atkins, E.M. Unsupervised Traffic Accident Detection in First-Person Videos. arXiv 2019, arXiv:1903.00618.
  15. Yao, Y.; Wang, X.; Xu, M.; Pu, Z.; Atkins, E.; Crandall, D. When, Where, and What? A New Dataset for Anomaly Detection in Driving Videos. arXiv 2020, arXiv:2004.03044.
  16. Xu, M.; Gao, M.; Chen, Y.T.; Davis, L.S.; Crandall, D.J. Temporal Recurrent Networks for Online Action Detection. arXiv 2019, arXiv:1811.07391.
  17. DaSilva, M. Analysis of Event Data Recorder Data for Vehicle Safety Improvement; National Highway Traffic Safety Administration: Washington, DC, USA, 2014; pp. 21–143.
  18. Gabler, H.C.; Hampton, C.E.; Hinch, J. Crash Severity: A Comparison of Event Data Recorder Measurements with Accident Reconstruction Estimates. In Proceedings of the SAE 2004 World Congress & Exhibition, Detroit, MI, USA, 8–11 March 2004.
  19. Takeda, K.; Miyajima, C.; Suzuki, T.; Angkititrakul, P.; Kurumida, K.; Kuroyanagi, Y.; Ishikawa, H.; Terashima, R.; Wakita, T.; Oikawa, M.; et al. Self-Coaching System Based on Recorded Driving Data: Learning From One's Experiences. IEEE Trans. Intell. Transp. Syst. 2012, 13, 1821–1831.
  20. Zhao, D.; Lam, H.; Peng, H.; Bao, S.; LeBlanc, D.J.; Nobukawa, K.; Pan, C.S. Accelerated Evaluation of Automated Vehicles Safety in Lane-Change Scenarios Based on Importance Sampling Techniques. IEEE Trans. Intell. Transp. Syst. 2017, 18, 595–607.
  21. Dingus, T.A.; Neale, V.L.; Klauer, S.G.; Petersen, A.D.; Carroll, R.J. The development of a naturalistic data collection system to perform critical incident analysis: An investigation of safety and fatigue issues in long-haul trucking. Accid. Anal. Prev. 2006, 38, 1127–1136.
  22. Li, N.; Misu, T.; Miranda, A. Driver behavior event detection for manual annotation by clustering of the driver physiological signals. In Proceedings of the 2016 IEEE 19th International Conference on Intelligent Transportation Systems (ITSC), Rio de Janeiro, Brazil, 1–4 November 2016; pp. 2583–2588.
  23. Chan, F.H.; Chen, Y.; Xiang, Y.; Sun, M. Anticipating Accidents in Dashcam Videos. In Asian Conference on Computer Vision; Springer: Cham, Switzerland, 2016.
  24. Herzig, R.; Levi, E.; Xu, H.; Gao, H.; Brosh, E.; Wang, X.; Globerson, A.; Darrell, T. Spatio-Temporal Action Graph Networks. arXiv 2019, arXiv:1812.01233.
  25. Wang, L.; Xiong, Y.; Wang, Z.; Qiao, Y.; Lin, D.; Tang, X.; Gool, L.V. Temporal Segment Networks: Towards Good Practices for Deep Action Recognition. arXiv 2016, arXiv:1608.00859.
  26. Tran, D.; Wang, H.; Torresani, L.; Ray, J.; LeCun, Y.; Paluri, M. A Closer Look at Spatiotemporal Convolutions for Action Recognition. arXiv 2018, arXiv:1711.11248.
  27. Feichtenhofer, C.; Fan, H.; Malik, J.; He, K. SlowFast Networks for Video Recognition. arXiv 2019, arXiv:1812.03982.
  28. Lewis, V.; Dingus, T.; Klauer, S.; Sudweeks, J. An Overview of the 100-Car Naturalistic Study and Findings; National Highway Traffic Safety Administration: Washington, DC, USA, 2005.
  29. Bezzina, D.; Sayer, J. Safety Pilot Model Deployment: Test Conductor Team Report; Tech. Rep. DOT HS 812 171; NHTSA: Washington, DC, USA, 2015.
  30. Cordts, M.; Omran, M.; Ramos, S.; Rehfeld, T.; Enzweiler, M.; Benenson, R.; Franke, U.; Roth, S.; Schiele, B. The Cityscapes Dataset for Semantic Urban Scene Understanding. arXiv 2016, arXiv:1604.01685.
  31. Geiger, A.; Lenz, P.; Stiller, C.; Urtasun, R. Vision meets robotics: The KITTI dataset. Int. J. Robot. Res. 2013, 32, 1231–1237.
  32. Yu, F.; Xian, W.; Chen, Y.; Liu, F.; Liao, M.; Madhavan, V.; Darrell, T. BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling. arXiv 2018, arXiv:1805.04687.
  33. Fang, J.; Yan, D.; Qiao, J.; Xue, J. DADA: A Large-scale Benchmark and Model for Driver Attention Prediction in Accidental Scenarios. arXiv 2019, arXiv:1912.12148.
  34. Espié, E.; Guionneau, C.; Wymann, B.; Dimitrakakis, C.; Coulom, R.; Sumner, A. TORCS, The Open Racing Car Simulator. 2005. Available online: http://torcs.sourceforge.net/ (accessed on 13 December 2020).
  35. Ilg, E.; Mayer, N.; Saikia, T.; Keuper, M.; Dosovitskiy, A.; Brox, T. FlowNet 2.0: Evolution of Optical Flow Estimation with Deep Networks. arXiv 2016, arXiv:1612.01925.
  36. Hasan, M.; Choi, J.; Neumann, J.; Roy-Chowdhury, A.K.; Davis, L.S. Learning temporal regularity in video sequences. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016.
  37. Chong, Y.S.; Tay, Y.H. Abnormal event detection in videos using spatiotemporal autoencoder. In International Symposium on Neural Networks; Springer: Cham, Switzerland, 2017.
  38. Liu, W.; Luo, W.; Lian, D.; Gao, S. Future frame prediction for anomaly detection—A new baseline. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018.
  39. Ionescu, R.T.; Khan, F.S.; Georgescu, M.I.; Shao, L. Object-centric auto-encoders and dummy anomalies for abnormal event detection in video. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
  40. Morais, R.; Le, V.; Tran, T.; Saha, B.; Mansour, M.; Venkatesh, S. Learning regularity in skeleton trajectories for anomaly detection in videos. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
  41. Pasha, J.; Dulebenets, M.A.; Kavoosi, M.; Abioye, O.F.; Theophilus, O.; Wang, H.; Kampmann, R.; Guo, W. Holistic tactical-level planning in liner shipping: An exact optimization approach. J. Shipp. Trade 2020, 5, 1–35.
  42. Dulebenets, M.A. An Adaptive Island Evolutionary Algorithm for the berth scheduling problem. Memetic Comput. 2020, 12, 51–72.
  43. Kağan Albayrak, M.B.; Özcan, İ.Ç.; Can, R.; Dobruszkes, F. The determinants of air passenger traffic at Turkish airports. J. Air Transp. Manag. 2020, 86, 101818.
  44. Trösterer, S.; Meneweger, T.; Meschtscherjakov, A.; Tscheligi, M. Transport companies, truck drivers, and the notion of semi-autonomous trucks: A contextual examination. In Proceedings of the 9th International Conference on Automotive User Interfaces and Interactive Vehicular Applications Adjunct, Oldenburg, Germany, 24–27 September 2017; pp. 201–205.
Figure 1. The updated smart black box (SBB) data recording pipeline. Green blocks denote new modules that were not present in the original SBB [13]. Black arrows indicate logic flow, and blue arrows indicate data flow. LBO, local buffer optimization.
Figure 2. Compressed normal frames (left) and preserved anomalies (right).
Figure 3. Compressed anomaly failure cases. * indicates non-ego.
Figure 4. Decision histograms per anomaly class (ST: collision with a starting, stopping, or stationary vehicle; AH: ahead collision; LA: lateral collision; OC: oncoming collision; TC: turning or crossing collision; VP: vehicle-pedestrian collision; VO: vehicle-obstacle collision; * indicates non-ego; OO: out-of-control leaving roadway). The x-axis represents decision d t [ 0 , 1 ] , and the y-axis represents the frame count per bin.
Figure 5. Mean normal and anomalous decisions for α + β = 1 .
Table 1. Video-based anomaly detection and action recognition methods. TRN, temporal recurrent network.
Method | Aim | Description
TAD [14] | Anomaly detection (unsupervised) | Predicts future bounding boxes using RNN encoder-decoders, then takes the standard deviation of the predictions as the anomaly score.
DSA-RNN [23] | Anomaly detection (supervised) | Uses a dynamic-spatial-attention (DSA)-RNN, which learns to distribute soft attention to objects and model the temporal dependencies of detected cues.
STAG [24] | Anomaly detection (supervised) | Uses a spatio-temporal action graph (STAG) network to model the spatial and temporal relations among objects.
TSN [25] | Action recognition (offline) | Sparsely samples video snippets and predicts action using RGB and optical flow data.
R(2+1)D [26] | Action recognition (offline) | Uses a 3D convolutional neural network with separate 2D and 1D convolutional blocks.
SlowFast [27] | Action recognition (offline) | Extracts frames from a low frame rate stream to capture spatial information and a high frame rate stream to capture motion.
TRN [16] | Action recognition (online) | Simultaneously detects the current action and predicts the action of the following frame.
Table 2. High-quality driving video datasets. DoTA, Detection of Traffic Anomaly.
Dataset | # of Frames | Data Size (GB) | Anomaly-Focused | # of Anomalous Videos
KITTI | 7481 (15 fps) | 12 | No | N/A
Cityscapes | 24,999 (17 fps) | 55 | No | N/A
BDD100K | 120,000,000 (30 fps) | ∼1800 | No | N/A
A3D | 128,174 (10 fps) | 15 | Yes | 1500
DADA | 648,476 (30 fps) | 53 | Yes | 2000
DoTA | 732,932 (10 fps) | 57 | Yes | 4677
Table 3. Event Classes in the DoTA dataset [15].
Name | ID | Description
N | 0 | No anomaly
ST | 1 | Collision with another vehicle that starts, stops, or is stationary
AH | 2 | Collision with another vehicle moving ahead or waiting
LA | 3 | Collision with another vehicle moving laterally in the same direction
OC | 4 | Collision with another oncoming vehicle
TC | 5 | Collision with another vehicle that turns into or crosses a road
VP | 6 | Collision between vehicle and pedestrian
VO | 7 | Collision with an obstacle in the roadway
OO | 8 | Out-of-control and leaving the roadway to the left or right
Table 4. DoTA Anomaly Class Probabilities and Values. An anomaly label with “*” indicates an event where the ego car is not involved (i.e., non-ego); otherwise, the event is ego-involved.
Class | Likelihood | Normalized Info. Measure
ST | 0.011 | 0.977
AH | 0.057 | 0.635
LA | 0.054 | 0.633
OC | 0.023 | 0.816
TC | 0.163 | 0.395
VP | 0.012 | 0.957
VO | 0.010 | 0.995
OO | 0.089 | 0.525
ST* | 0.010 | 1.000
AH* | 0.091 | 0.521
LA* | 0.104 | 0.491
OC* | 0.081 | 0.546
TC* | 0.207 | 0.342
VP* | 0.010 | 1.000
VO* | 0.011 | 0.990
OO* | 0.070 | 0.576
Table 5. Raw, SBB-compressed and ground-truth (GT) SBB-compressed data statistics on the BDD100K+DoTA dataset. VAD, video anomaly detection; OAD, online action detection.
Data | Statistic | Normal | Anomaly | Anomaly Ratio
Raw data | # of frames | 3,967,977 | 16,768 |
Raw data | size (GB) | 427.20 | 1.76 | 0.41%
SBB w/ VAD+OAD | size (GB) | 65.53 | 0.33 | 0.51%
SBB w/ VAD+OAD | avg. d_i | 0.51 | 0.58 |
SBB w/ VAD+OAD | med. d_i | 0.55 | 0.63 |
SBB w/ VAD+OAD | std. d_i | 0.24 | 0.26 |
SBB w/ GT VAD+OAD | size (GB) | 11.05 | 0.73 | 6.60%
SBB w/ GT VAD+OAD | avg. d_i | 0.00 | 0.92 |
SBB w/ GT VAD+OAD | med. d_i | 0.00 | 0.92 |
SBB w/ GT VAD+OAD | std. d_i | 0.03 | 0.00 |
Table 6. Comparison of prioritized recording and FIFO.
M | Strategy | Normal ↓ | Anomaly ↑ | Anomaly Ratio ↑
25 GB | FIFO | 1,739,855 | 7851 | 0.45%
25 GB | Priority | 1,487,570 | 8545 | 0.57%
12.5 GB | FIFO | 889,679 | 4335 | 0.49%
12.5 GB | Priority | 734,625 | 5154 | 0.70%
6.25 GB | FIFO | 437,673 | 2029 | 0.46%
6.25 GB | Priority | 364,666 | 2898 | 0.79%
3.125 GB | FIFO | 207,951 | 962 | 0.46%
3.125 GB | Priority | 183,951 | 1706 | 0.93%
Table 7. Compression quality decisions for hybrid value estimation.
Value Estimation | α | β | avg. d_i (Normal / Anomaly) | med. d_i (Normal / Anomaly) | std. d_i (Normal / Anomaly)
VAD only | 1.0 | 0.0 | 0.27 / 0.38 | 0.15 / 0.38 | 0.30 / 0.34
OAD only | 0.0 | 1.0 | 0.10 / 0.14 | 0.00 / 0.07 | 0.15 / 0.17
Hybrid | 1.0 | 1.0 | 0.51 / 0.58 | 0.55 / 0.63 | 0.24 / 0.26
Hybrid | 0.9 | 0.1 | 0.26 / 0.37 | 0.14 / 0.36 | 0.29 / 0.33
Hybrid | 0.5 | 0.5 | 0.20 / 0.30 | 0.10 / 0.27 | 0.24 / 0.28
Hybrid | 0.1 | 0.9 | 0.12 / 0.19 | 0.02 / 0.16 | 0.16 / 0.19
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
