Research on a Prisoner’s Dilemma Game Model Based on a Reward Factor

2016-04-14 06:56:44, CHEN Weichun, SHANG Lihui

Electronic Science and Technology (電子科技), 2016, Issue 3

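CHEN Weichun, SHANG Lihui

(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)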


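Abstract: To address the problem of the evolution of cooperation, a reward factor is introduced. Based on evolutionary game theory, this paper studies how the reward factor and the memory length affect strategy changes in the spatial prisoner’s dilemma game. It analyzes, in turn, the relationship between the cooperation rate and the temptation to defect, the relationship between the cooperation rate and the memory length, and the relationship between the critical value and the reward factor. The results show that, compared with the traditional prisoner’s dilemma game model, increasing the reward factor or reducing the memory length effectively promotes the evolution of cooperation.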

Keywords: prisoner’s dilemma game; reward factor; payoff coefficient

Research on Prisoner’s Dilemma Game Based on Reward Factor

CHEN Weichun, SHANG Lihui

(School of Optical-Electrical and Computer Engineering, University of Shanghai for Science and Technology, Shanghai 200093, China)

Abstract: The evolution of cooperation is discussed on the basis of evolutionary game theory. By introducing a reward factor into the spatial Prisoner’s Dilemma Game, we study how the reward factor and the memory length affect the strategy update process. We analyze, in turn, the relationship between the cooperation rate and the temptation to defect, the relationship between the cooperation rate and the memory length, and the relationship between the critical value and the reward factor. A comparison with the traditional Prisoner’s Dilemma Game model shows that increasing the reward factor or reducing the memory length effectively promotes the evolution of cooperation.

Keywords: prisoner’s dilemma game; reward factor; coefficient

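Cooperative behavior is widespread in nature and in human society [1-9]. In the traditional prisoner’s dilemma game model [10], each player has two strategy choices: cooperate or defect. If both players cooperate, each receives a relatively high payoff, denoted R. If both defect, each receives a relatively low payoff, denoted P. If one cooperates while the other defects, the cooperator receives the lowest payoff, the sucker’s payoff, denoted S, while the defector receives the highest payoff, the temptation to defect, denoted T. Hence T > R > P > S. It follows that a selfish individual will choose to defect regardless of the opponent’s choice. This dilemma therefore leads to the emergence of a large number of defectors.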
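As a small illustration of why the ordering T > R > P > S makes defection the dominant choice, the following Python sketch checks a player’s best response to each possible opponent move. The numeric payoffs and the names PAYOFF, best_response, and is_prisoners_dilemma are assumptions chosen for illustration; the text above fixes only the ordering.

```python
# Illustrative payoff values (assumed); the text states only T > R > P > S.
T, R, P, S = 5.0, 3.0, 1.0, 0.0

# PAYOFF[(my_move, opponent_move)] -> my payoff; "C" = cooperate, "D" = defect.
PAYOFF = {
    ("C", "C"): R,  # mutual cooperation
    ("C", "D"): S,  # sucker's payoff
    ("D", "C"): T,  # temptation to defect
    ("D", "D"): P,  # mutual defection
}


def is_prisoners_dilemma(t: float, r: float, p: float, s: float) -> bool:
    """Check the payoff ordering T > R > P > S."""
    return t > r > p > s


def best_response(opponent_move: str) -> str:
    """Return the move that maximizes my payoff against a fixed opponent move."""
    return max(("C", "D"), key=lambda my_move: PAYOFF[(my_move, opponent_move)])


if __name__ == "__main__":
    print(is_prisoners_dilemma(T, R, P, S))
    for move in ("C", "D"):
        print(move, "->", best_response(move))
```

Under these assumed values, defection is the best response to either opponent move, because T > R and P > S. This is the dominance argument made in the paragraph above.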

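This paper introduces the concept of a reward factor and studies how the reward factor and the memory length affect strategy changes in the spatial prisoner’s dilemma game. If a player passes its own strategy on to its offspring, generation by generation and without interruption, for M or more consecutive generations, it is given an appropriate reward. In the game……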
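The eligibility condition described above (keeping the same strategy for M or more consecutive generations) can be sketched in Python as follows. This is only an illustration. The function names, the list-based strategy history, and counting the current generation as part of the streak are assumptions. The size and form of the reward are not given in the text available here, so they are left out.

```python
from typing import Sequence


def current_streak(history: Sequence[str]) -> int:
    """Count how many of the most recent generations share the latest strategy."""
    if not history:
        return 0
    latest = history[-1]
    streak = 0
    # Walk backwards from the newest generation until the strategy changes.
    for strategy in reversed(history):
        if strategy != latest:
            break
        streak += 1
    return streak


def earns_reward(history: Sequence[str], memory_length: int) -> bool:
    """Assumed rule: eligible once the unbroken streak reaches M generations."""
    return current_streak(history) >= memory_length


if __name__ == "__main__":
    # Hypothetical strategy history, oldest generation first.
    history = ["C", "D", "D", "D"]
    print(current_streak(history), earns_reward(history, 3))
```

In this sketch a smaller M is easier to satisfy, so more players qualify for the reward.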
