Autonomous Vehicles

1. 【Autonomous Vehicles】Curriculum Domain Adaptation for Semantic Segmentation of Urban Scenes
Authors: Yang Zhang, Philip David, Boqing Gong
Link: https://arxiv.org/abs/1707.09465v5
Code: https://github.com/YangZhang4065/AdaptationSeg
Abstract: During the last half decade, convolutional neural networks (CNNs) have triumphed over semantic segmentation, which is one of the core tasks in many applications such as autonomous driving. However, training CNNs requires a considerable amount of data, which is difficult to collect and laborious to annotate. Recent advances in computer graphics make it possible to train CNNs on photo-realistic synthetic imagery with computer-generated annotations. Despite this, the domain mismatch between the real images and the synthetic data cripples the models' performance. Hence, we propose a curriculum-style learning approach to minimize the domain gap in urban scene semantic segmentation. The curriculum domain adaptation solves easy tasks first to infer necessary properties about the target domain; in particular, the first task is to learn global label distributions over images and local distributions over landmark superpixels. These are easy to estimate because images of urban scenes have strong idiosyncrasies (e.g., the size and spatial relations of buildings, streets, cars, etc.). We then train a segmentation network while regularizing its predictions in the target domain to follow those inferred properties. In experiments, our method outperforms the baselines on two datasets and two backbone networks. We also report extensive ablation studies about our approach.
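The key mechanism here is regularizing the segmentation network's target-domain predictions toward the inferred label distributions. Below is a minimal PyTorch sketch of such a global label-distribution regularizer; it illustrates the idea only and is not the authors' AdaptationSeg implementation, and the tensor shapes and the `target_label_dist` input are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def label_distribution_loss(logits, target_label_dist, eps=1e-8):
    """KL divergence between the image-level predicted label distribution
    (per-pixel softmax averaged over all pixels) and an inferred target
    distribution, used as a regularizer on target-domain predictions."""
    # logits: (B, C, H, W); target_label_dist: (B, C), each row summing to 1
    probs = F.softmax(logits, dim=1)         # per-pixel class probabilities
    pred_dist = probs.mean(dim=(2, 3))       # global label distribution per image
    pred_dist = pred_dist.clamp_min(eps)
    target = target_label_dist.clamp_min(eps)
    # KL(target || prediction), averaged over the batch
    return (target * (target.log() - pred_dist.log())).sum(dim=1).mean()

# Hypothetical usage: combine with the supervised loss on the synthetic source
# domain; `lambda_reg` is an assumed weighting hyperparameter.
# total_loss = source_seg_loss + lambda_reg * label_distribution_loss(
#     target_logits, inferred_target_dist)
```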
2. 【Autonomous Vehicles】Fast Scene Understanding for Autonomous Driving
Authors: Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool
Link: https://arxiv.org/abs/1708.02550v1
Code: https://github.com/davyneven/fastSceneUnderstanding
Abstract: Most approaches for instance-aware semantic labeling traditionally focus on accuracy. Other aspects like runtime and memory footprint are arguably as important for real-time applications such as autonomous driving. Motivated by this observation and inspired by recent works that tackle multiple tasks with a single integrated architecture, in this paper we present a real-time efficient implementation based on ENet that solves three autonomous driving related tasks at once: semantic scene segmentation, instance segmentation and monocular depth estimation. Our approach builds upon a branched ENet architecture with a shared encoder but different decoder branches for each of the three tasks. The presented method can run at 21 fps at a resolution of 1024x512 on the Cityscapes dataset without sacrificing accuracy compared to running each task separately.
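A minimal PyTorch sketch of the shared-encoder, per-task-decoder layout described in the abstract is shown below. The real model uses ENet blocks; plain convolutional stacks stand in here, and the channel sizes, class count, and embedding dimension are placeholders rather than the paper's values.

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Shared encoder with one decoder branch per task (stand-in for branched ENet)."""

    def __init__(self, num_classes=19, instance_embedding_dim=8):
        super().__init__()
        # Shared encoder (placeholder for the ENet encoder)
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

        def decoder(out_channels):  # placeholder for an ENet decoder branch
            return nn.Sequential(
                nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(inplace=True),
                nn.ConvTranspose2d(32, out_channels, 4, stride=2, padding=1),
            )

        self.semantic_head = decoder(num_classes)             # semantic segmentation
        self.instance_head = decoder(instance_embedding_dim)  # instance embedding
        self.depth_head = decoder(1)                          # monocular depth

    def forward(self, x):
        feats = self.encoder(x)  # features shared by all three tasks
        return self.semantic_head(feats), self.instance_head(feats), self.depth_head(feats)

# sem, inst, depth = MultiTaskNet()(torch.randn(1, 3, 512, 1024))
```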
3. 【Autonomous Vehicles】Deep Steering: Learning End-to-End Driving Model from Spatial and Temporal Visual Cues
Authors: Lu Chi, Yadong Mu
Link: https://arxiv.org/abs/1708.03798v1
Code: https://github.com/abhileshborode/Behavorial-Clonng-Self-driving-cars
Abstract: In recent years, autonomous driving algorithms using low-cost vehicle-mounted cameras have attracted increasing efforts from both academia and industry. There are multiple fronts to these efforts, including object detection on roads, 3-D reconstruction, etc., but in this work we focus on a vision-based model that directly maps raw input images to steering angles using deep networks. This represents a nascent research topic in computer vision. The technical contributions of this work are three-fold. First, the model is learned and evaluated on real human driving videos that are time-synchronized with other vehicle sensors. This differs from many prior models trained from synthetic data in racing games. Second, state-of-the-art models, such as PilotNet, mostly predict the wheel angles independently on each video frame, which contradicts the common understanding of driving as a stateful process. Instead, our proposed model strikes a combination of spatial and temporal cues, jointly investigating instantaneous monocular camera observations and the vehicle's historical states. In practice, this is accomplished by inserting carefully designed recurrent units (e.g., LSTM and Conv-LSTM) at proper network layers. Third, to facilitate the interpretability of the learned model, we utilize a visual back-propagation scheme for discovering and visualizing image regions crucially influencing the final steering prediction. Our experimental study is based on about 6 hours of human driving data provided by Udacity. Comprehensive quantitative evaluations demonstrate the effectiveness and robustness of our model, even under scenarios like drastic lighting changes and abrupt turning. The comparison with other state-of-the-art models clearly reveals its superior performance in predicting the due wheel angle for a self-driving car.
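To illustrate the spatio-temporal idea, here is a minimal PyTorch sketch that feeds per-frame CNN features into an LSTM and regresses a steering angle for the last frame of a clip. It is not the paper's network; the layer sizes, clip length, and input resolution are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SteeringNet(nn.Module):
    """Per-frame CNN features fed into an LSTM, regressing one steering angle."""

    def __init__(self, feat_dim=128, hidden_dim=64):
        super().__init__()
        self.cnn = nn.Sequential(                  # spatial encoder for each frame
            nn.Conv2d(3, 16, 5, stride=4), nn.ReLU(inplace=True),
            nn.Conv2d(16, 32, 5, stride=4), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim), nn.ReLU(inplace=True),
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)  # temporal cues
        self.head = nn.Linear(hidden_dim, 1)       # steering-angle regression

    def forward(self, clip):
        # clip: (B, T, 3, H, W) -- a short sequence of consecutive frames
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, t, -1)  # per-frame features
        out, _ = self.lstm(feats)                  # carry state across frames
        return self.head(out[:, -1])               # angle for the most recent frame

# angle = SteeringNet()(torch.randn(2, 5, 3, 120, 160))
```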
4. 【Autonomous Vehicles】Free Space Estimation using Occupancy Grids and Dynamic Object Detection
Authors: Raghavender Sahdev
Link: https://arxiv.org/abs/1708.04989v1
Code: https://github.com/raghavendersahdev/Free-Space
Abstract: In this paper we present an approach to estimate free space from a stereo image pair using stochastic occupancy grids, in the domain of autonomous driving on the well-known benchmark dataset KITTI. Based on the generated occupancy grids, we then match two image sequences to compute a top-view representation of the map, which we use to map the environment. We compute a transformation between the occupancy grids of two successive images and use it to compute the top-view map. Two issues that need to be addressed for mapping are discussed: computing the map and dealing with dynamic objects while computing it. Dynamic objects are detected in successive images using an idea similar to separating foreground objects from background objects based on motion flow. A novel RANSAC-based segmentation approach is proposed to address this issue.
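A minimal NumPy sketch of building a top-view occupancy grid from 3-D points recovered from a stereo pair is given below, in the spirit of the approach described above. The coordinate convention, grid extent, cell size, and ground-plane threshold are illustrative assumptions, not the paper's values.

```python
import numpy as np

def occupancy_grid(points_xyz, grid_shape=(100, 100), cell=0.2,
                   ground_z=0.2, x_range=(-10.0, 10.0), y_range=(0.0, 20.0)):
    """Count obstacle points per top-view cell.

    points_xyz: (N, 3) points in vehicle coordinates (x right, y forward, z up).
    Cells with zero counts can be treated as free space.
    """
    grid = np.zeros(grid_shape, dtype=np.int32)
    obstacles = points_xyz[points_xyz[:, 2] > ground_z]        # drop ground points
    ix = ((obstacles[:, 0] - x_range[0]) / cell).astype(int)   # column (lateral)
    iy = ((obstacles[:, 1] - y_range[0]) / cell).astype(int)   # row (forward)
    valid = (ix >= 0) & (ix < grid_shape[1]) & (iy >= 0) & (iy < grid_shape[0])
    np.add.at(grid, (iy[valid], ix[valid]), 1)                 # accumulate hits
    return grid

# Toy usage with random points; real points would come from stereo triangulation.
# free_space = occupancy_grid(np.random.rand(1000, 3) * [20, 20, 2] - [10, 0, 0]) == 0
```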
5. 【Autonomous Vehicles】Arguing Machines: Human Supervision of Black Box AI Systems That Make Life-Critical Decisions
Authors: Lex Fridman, Li Ding, Benedikt Jenik, Bryan Reimer
Link: https://arxiv.org/abs/1710.04459v2
Code: https://github.com/scope-lab-vu/deep-nn-car
Abstract: We consider the paradigm of a black box AI system that makes life-critical decisions. We propose an 'arguing machines' framework that pairs the primary AI system with a secondary one that is independently trained to perform the same task. We show that disagreement between the two systems, without any knowledge of underlying system design or operation, is sufficient to arbitrarily improve the accuracy of the overall decision pipeline given human supervision over disagreements. We demonstrate this system in two applications: (1) an illustrative example of image classification and (2) large-scale real-world semi-autonomous driving data. For the first application, we apply this framework to image classification, achieving a reduction from 8.0% to 2.8% top-5 error on ImageNet. For the second application, we apply this framework to Tesla Autopilot and demonstrate the ability to predict 90.4% of system disengagements that were labeled by human annotators as challenging and needing human supervision.
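The decision rule itself is simple to sketch: escalate to a human whenever the two independently trained systems disagree beyond a threshold. The snippet below is a hypothetical illustration with scalar outputs (e.g., steering angles); the models, threshold, and disagreement metric are assumptions, not the paper's implementation.

```python
def arguing_machines(primary, secondary, x, disagreement_threshold=0.1):
    """Return the primary system's decision, or flag the case for human review
    when the two independently trained systems disagree too much (here measured
    as the absolute difference between scalar outputs)."""
    p, s = primary(x), secondary(x)
    if abs(p - s) > disagreement_threshold:
        return {"decision": None, "needs_human": True, "primary": p, "secondary": s}
    return {"decision": p, "needs_human": False, "primary": p, "secondary": s}

# Toy usage with two stand-in steering predictors that disagree:
# result = arguing_machines(lambda frame: 0.05, lambda frame: 0.30, x=None)
# result["needs_human"] -> True
```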
AI&R is a comprehensive information platform for the artificial intelligence and robotics verticals. Our vision is to become the highway to AGI (artificial general intelligence), connecting people with people, people with information, and information with information, so that AI and robotics have no barriers to entry. AI and robotics enthusiasts are welcome to follow us for in-depth content every day.