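Li Yuebiao, Li Li, Zhang Yi
Abstract: Missing data is one of the major problems in the traffic field. To address it, researchers in China and abroad have proposed a large number of data imputation algorithms in recent years. All of these algorithms improve the accuracy of traffic data to some extent, but they differ in precision and computing speed. Selecting an algorithm with high precision and fast computation from the existing ones is of great significance for improving the performance of traffic systems. This study takes the current mainstream prediction algorithms as its object, analyzes the advantages and disadvantages of each class of algorithm, and applies typical prediction-based, interpolation-based and statistics-based imputation algorithms to PeMS loop detector data. The accuracy and computing speed results show that the principal component analysis method PPCA gives the best imputation performance. A further comparison of PPCA with its improved variants KPPCA and MPPCA on single-detector data shows that the improved algorithms are slightly more accurate than PPCA, but their computing time is also markedly higher. On this basis, the study analyzes how PPCA and KPPCA perform on multi-detector data; the results show that taking the spatial correlation of multi-detector data into account markedly improves the imputation accuracy of both algorithms. When both the temporal and the spatial correlation of multi-detector data are considered, KPPCA is more accurate than PPCA, but its computational efficiency is markedly lower. Therefore, when imputing single-detector data, or when the temporal correlation among multi-detector data is weak, PPCA achieves both high imputation accuracy and high computing speed. When computing time is not a concern, KPPCA can achieve higher imputation accuracy.
Keywords: data imputation; principal component analysis; kernel-based principal component analysis; spatio-temporal characteristics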
Comparison of Traffic Imputation Methods Based on Spatial and Temporal Characteristics
Li Yuebiao, Li Li, Zhang Yi
(Tsinghua University)
Abstract: Missing data is one of the major problems in the traffic field. To address it, numerous data imputation algorithms have been proposed in recent years. All of these algorithms improve the accuracy of the collected data to some extent, but their precision and computing speed vary greatly. Selecting an algorithm that is both accurate and fast is important for improving the performance of traffic systems. This research analyzes typical prediction, interpolation and statistical learning methods and compares their advantages and disadvantages. When these typical algorithms are used to impute data from PeMS, the results show that the PPCA algorithm performs best. A further comparison of PPCA with two improved variants, KPPCA and MPPCA, on single-detector data shows that the improved algorithms are more accurate but take longer to compute. On this basis, the study analyzes the performance of PPCA and KPPCA for multi-detector data imputation. Considering the spatial characteristics of the data reduces imputation errors for both PPCA and KPPCA. When the time lag of the data is also taken into account, however, imputation accuracy improves for KPPCA but decreases for PPCA. Therefore, for single-detector data, or for multi-detector data whose temporal correlation is weak, PPCA is the best choice, offering both high accuracy and computational efficiency. KPPCA achieves higher accuracy when computing time is not a concern.
Key Words: Data Imputation; PPCA; KPPCA; Temporal and spatial characteristics
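
To make the idea of PCA-based imputation concrete, the following is a minimal sketch under stated assumptions, not the PPCA, KPPCA or MPPCA procedure evaluated in this paper. It swaps in a simpler, related technique: missing entries are filled by repeatedly fitting a rank-k PCA reconstruction and writing it back only into the missing positions. The function name `iterative_pca_impute`, its parameters, and the days-by-time-slots matrix layout are hypothetical choices made for illustration.

```python
import numpy as np

def iterative_pca_impute(X, n_components=2, n_iter=200, tol=1e-8):
    """Fill NaN entries of X (rows = days, columns = time slots; assumed layout).

    Simplified stand-in for PPCA: alternate between a rank-k PCA fit on the
    currently filled matrix and overwriting only the missing entries with the
    reconstruction. Assumes every column has at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start from column-mean imputation of the observed values.
    col_mean = np.nanmean(X, axis=0)
    filled = np.where(missing, col_mean, X)
    for _ in range(n_iter):
        mean = filled.mean(axis=0)
        # Truncated SVD of the centred matrix gives the rank-k PCA reconstruction.
        U, s, Vt = np.linalg.svd(filled - mean, full_matrices=False)
        k = n_components
        recon = (U[:, :k] * s[:k]) @ Vt[:k] + mean
        # Observed entries are never modified; only the gaps are updated.
        new = np.where(missing, recon, X)
        change = np.max(np.abs(new - filled))
        filled = new
        if change < tol:
            break
    return filled
```

Stacking the series of several neighbouring detectors as extra columns would be one way to let such a model draw on spatial correlation, in the spirit of the multi-detector comparison described in the abstract; that layout is likewise an assumption rather than the authors' setup.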