PEI Shu-jun, KONG De-kai, MIAO Hui



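Abstract: Traditional task scheduling algorithms are inefficient overall in cloud environments. To improve the overall efficiency of task scheduling, a processing-time-based DMS task scheduling algorithm is proposed on top of Map/Reduce. First, complex tasks are preprocessed: each complex task is converted into a DAG, an optimal topological order is produced according to the size of the task dependencies, and the complex task is handed to the work nodes for processing in that order. Second, the ratio of a node's predicted processing time for a task to that node's processing capacity is taken as the processing "time" of the subtask on each node and used for quantitative modeling; a metric matrix of tasks and processing times is built and processed with the DMS algorithm to obtain the optimal task assignment scheme. Finally, the DMS algorithm is compared with the fair scheduling algorithm and a genetic algorithm in terms of task scheduling efficiency and resource utilization. The experimental results show that the DMS algorithm clearly improves the overall efficiency of task scheduling and makes full use of the computing capacity of each node to improve Map/Reduce scheduling efficiency.
Keywords: cloud computing; Map/Reduce; task scheduling; difference matrix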
DOI: 10.15938/j.jhust.2019.01.012
CLC number: TP319
Document code: A
Article ID: 1007-2683(2019)01-0071-07
Application of the DMS Algorithm to Map/Reduce Task Scheduling
PEI Shu-jun, KONG De-kai, MIAO Hui
(School of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080, China)
Abstract: Traditional task scheduling algorithms have low overall efficiency in cloud environments. To improve the overall efficiency of task scheduling, this paper presents a Difference Matrix Scheduling (DMS) algorithm, based on Map/Reduce, that schedules tasks according to processing time. Firstly, complex tasks are preprocessed: each complex task is converted into a directed acyclic graph (DAG), the tasks are topologically sorted in an optimal manner according to the size of the task dependencies, and the work nodes process the complex tasks in the sorted order. Secondly, the ratio of the predicted time a node needs to process a task to the node's processing capacity is used as the quantitative model of a subtask's time on each node; a metric matrix of tasks and processing times is then established and processed with the DMS algorithm to obtain the optimal task assignment scheme. Finally, experiments compare the DMS algorithm with the fair scheduling algorithm and a genetic algorithm in terms of task scheduling efficiency and resource utilization. The results show that the algorithm significantly improves the overall efficiency of complex task scheduling and makes full use of the capacity of the compute nodes to improve Map/Reduce scheduling efficiency.
Keywords: cloud computing; Map/Reduce; task scheduling; difference matrix
0 Introduction
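With the rapid development of the Internet of Things, the mobile Internet, and social networks, the number of data sources keeps growing, and semi-structured and unstructured data are increasing geometrically, which has accelerated the development and transformation of big data [1-2] processing technology. Cloud computing, as an emerging commercial computing model, uses parallel processing to improve the efficiency of big data processing. Task scheduling [3-5] has always been a core concern of cloud computing systems. Many factors affect task scheduling efficiency, and the quality of the scheduling model and algorithm directly affects the overall performance of a cloud computing system. Many researchers have proposed effective methods. Yi Jian, chief editor of a Chinese Hadoop [6] technology forum, and other researchers proposed the Map-Balance-Reduce model: after the Map nodes finish processing and produce intermediate tasks, a balance loop evens out the input to the Reduce phase, which addresses the problem of unbalanced input. Abhishek Verma proposed the LATE scheduling algorithm, which computes the remaining time of pending and running tasks and launches backup copies of the slowest tasks, thereby shortening Map/Reduce job execution time. Tang Zhou et al. proposed the MTSD algorithm, which takes into account data locality and cluster heterogeneity and uses task execution deadlines as its basis; it sizes the data stored on each node according to that node's computing capacity, which improves the data locality of tasks.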
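The remaining-time idea behind LATE, as summarized above, can be illustrated with a minimal sketch. It assumes the commonly described estimate in which a task's progress rate is its progress divided by its elapsed time, and its remaining time is the unfinished fraction divided by that rate. The names `RunningTask`, `estimated_time_left`, and `pick_backup_candidate` are hypothetical and used only for illustration; the sketch leaves out any thresholds or limits on backup tasks that a real scheduler would apply.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    task_id: str
    progress: float   # fraction of work completed, between 0 and 1
    elapsed_s: float  # seconds since the task started


def estimated_time_left(task: RunningTask) -> float:
    """Estimate remaining seconds as (1 - progress) / (progress / elapsed)."""
    if task.progress <= 0.0 or task.elapsed_s <= 0.0:
        # No measurable progress yet: treat the task as the slowest possible.
        return float("inf")
    rate = task.progress / task.elapsed_s
    return (1.0 - task.progress) / rate


def pick_backup_candidate(tasks):
    """Return the task expected to finish last, or None if the list is empty."""
    return max(tasks, key=estimated_time_left, default=None)


# Illustrative data only (not taken from the paper's experiments).
tasks = [
    RunningTask("map-1", progress=0.9, elapsed_s=90.0),
    RunningTask("map-2", progress=0.2, elapsed_s=80.0),
    RunningTask("map-3", progress=0.5, elapsed_s=50.0),
]
slowest = pick_backup_candidate(tasks)
print(slowest.task_id)
```

In this illustrative data, `map-2` has the largest estimated remaining time, so it would be the task selected for a backup copy under the assumptions above.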