999精品在线视频,手机成人午夜在线视频,久久不卡国产精品无码,中日无码在线观看,成人av手机在线观看,日韩精品亚洲一区中文字幕,亚洲av无码人妻,四虎国产在线观看 ?

Information tracing model based on PageRank

2017-09-08 09:14:53LIQianLAIJiaweiXIAOYunpengWUBin
網絡與信息安全學報 2017年8期

LI Qian, LAI Jia-wei, XIAO Yun-peng, WU Bin

(1. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China 2. Chongqing Engineering Laboratory of Network and Information Security, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

Information tracing model based on PageRank

LI Qian1,2, LAI Jia-wei2, XIAO Yun-peng2, WU Bin1

(1. Beijing Key Laboratory of Intelligent Telecommunications Software and Multimedia, Beijing University of Posts and Telecommunications, Beijing 100876, China 2. Chongqing Engineering Laboratory of Network and Information Security, Chongqing University of Posts and Telecommunications, Chongqing 400065, China)

In social network, original publisher and important nodes in the diffusion process can be traced by analyzing the spreading network of a hot topic. The participated users and spreading network structure of a hot topic build an information tracing model, which mines the source and important diffusion nodes. Firstly, it analyzed the development trend of a hot topic and extracts the users involved. Secondly, it established a user network according to the following relationship of the users involved. Thirdly, the contribution rate of users on the development of the hot topic was initialized, and the PageRank algorithm was used to construct the information tracing model. Finally, the Top k users were selected as the information publisher and important users of the hot topic according to the contribution rate. Experimental results showed that our model can effectively discover the hot topic of the publisher and important users.

social network, hot topic, information tracing, PageRank

1 Introduction

With the development and popularization of the Internet, human is entering a stage of information explosion gradually. In information society, the online social network has built a huge social function platform. It is using its power that social function to change people's way of life and form of society, and become a research platform for scholars to excavate the law of user behavior and the way of the information diffusion.

In order to find out the law of situational development of network group events, also to achieve the purpose of studying information traceability of network group event, this section first introduces the method of information traceability that based on PageRank algorithm in network group events. It can be seen from Fig 1, network group event and network structure have certain relationship. A user tends to be interested in a network group and make a certain reflection, which reflects a large probability that the person is concerned about the friend, such as the user's comment or forward behavior. Whether or not to participate in the spread of the situation of the network group events, it is also related to the user influence and user interest factors. Of course, there are also random probabilities, such as watching the online group event on a hot topic and to get involved in the hot topic.

Therefore, the development and diffusion of the situation of the network group events, greatly depend on the network structure in the social network. The information source publishes the message, which is forwarded and commented by its fans, andspread out in this way. This mode of information dissemination is similar to water waves.

Fig.1 The classical pattern of information dissemination

The structural features of this mode of transmission can be visually seen in Fig 1: each node can be traced back to the information source according to the network structure. And the shortest path to the information source is intuitively short. According to these characteristics, many scholars have proposed the method and model of information traceability, which based on centrality. The method is that to calculate the mean and variance of all the shortest paths from other nodes to one node in a network structure, and the nodes with smaller mean and variance are estimated as the source of information. The method of information traceability based on centrality has the advantages of simple principle and easy implementation. But at the same time, the complexity of time and space is relatively high, and the calculated results tend to be locally optimal or hovering around the optimum.

The characteristics of information dissemination can be found in Fig 1, sources of information is a large contribution on the trend of development of network group events, this contribution rate can be reflected by its forwarders, the contribution rate of the forwarder can also be provided by the contribution rate of its forwarder. The method of superposition of contribution rates is particularly similar to the thought of the PageRank algorithm. Therefore, the author designs a model of information traceability that based on PageRank algorithm in the process of analyzing and studying the information traceability of network group events.

The current research mainly study information tracing by the user role, related research for the late exploration laid the foundation. However, the pertinence and dynamic of information tracing presents new challenges for this research. This paper presents an information tracing model based on PageRank for hot topic, and establishes different information tracing models for different hot topics.

The highlights of this article are as follows.

1) Consider that each hot topic has different development trend, we build different information tracing model for every topic. According to the development trend of a hot topic, we extract users that promote the diffusion process. By leveraging the involved users, a following network is established, and contribution rate of the topic for every user are initialized.

2) Based on the PageRank algorithm, the information source model of hot topic is constructed, which is determined by the sorted contribution rate. This study can not only determine the source of the hot topic, but also discover the important users in the diffusion process.

The rest of the paper is organized as follows. Section 2 describe the related work. Section 3 formulates the problem. Section 4 explains the proposed model and corresponding learning algorithm. Section 5 presents the experimental data and analyzes the experimental results. Finally, section 6 summarizes the related work done in this paper.

2 Related work

At present, information tracing mainly focuses on evaluation of the node attributes and method based on statistical reasoning. The evaluation of node attributes is mainly to measure the networknodes. Previous research on this aspect focused on the network structure, for example, the size of influence can be measured by degree centrality, closeness centrality, betweenness centrality and so on[1~3]. Subsequently, some scholars have discovered the method based on the random walk, such as the Hypertext-Induced Topic Search algorithm[4~6]and the PageRank algorithm introduced by Google[7,8]. With the deepening of the research, the scholars found that the influence of the user is related to their behavior. There are four types of relational networks defined in Ref.[9], and mine the influence individual of the topic combined with the random walk. Ref.[10] measures the influence of the different hot topics of every user according to the concern network and the similarity of the users’ interest. Tang, et al. have proposed a model of topic-factor diagram based on the network structure, the similarity of topics, users and distribution of topic information. This model can measure the intensity of the effect, and it not only can be applied to the homogeneous network, but also to the heterogeneous network[11].

Based on the method of Statistical inference, the main reason is that the current situation of network communication comes from the infectious disease model[12]. On this assumption, different mathematical models are used to trace the information publishers. Antulov-Fantulin et al. Presumed that the propagation status is determined by the SIR model, using the statistical reasoning framework of the maximum likelihood estimation method to trace information according to a hot topic communication network. On this basis, the researchers also proposed the AvgTopK likelihood estimation function algorithm, AUCDF estimation function algorithm and Naive Bayesian likelihood estimation function[13, 14].

3 Problem definition

The problem to be solved in this paper as follows: confirm the publisher and the important user nodes of the hot topics through the propagation process of the hot topic. First, a few basic concepts and related definitions are presented.

1) How to represent the contribution rate of every user?

2) How to determine the publisher and important users of the hot topic?

4 Model

4.1 Framework

To solve the problems above, this paper puts forward the information tracing model for the participated users of hot topic and the relationship network. The model quantifies the contribution rate of participated users for the topic development to determine the publisher and important user. The PageRank-based information tracing model framework is shown in Fig 2.

Firstly, the users involved in the hot topic are extracted through analyzing the development trend of the hot topic; Secondly, according to the user relationship, the relationship network is built. In the relationship network the user's followee as in degree and the user's following as out degree. Then, thecontribution rate of users for the topic development is initialized. By using the PageRank algorithm, the information tracing model is established. Finally, according to the rank of contribution rate computed by the model, the publisher and important e users can be extracted.

Fig.2 Model framework

4.2 Model detail

The model can be divided into three parts, including online acquisition, information tracing model and output, which can be seen from the overall framework of information tracing model. The development of hot topics has experienced four stages: the proposed period, the rapid development period, the peak period and the declining period. According to the topic life cycle, the basic information and relationship of users can be obtained from the social network.

The analysis for user information and relationship network shows that the development of most hot topics has some obviously common features. When a tweet is posted by a user, the following users are able to comment and forward it. Then the cascade following users also can comment and forward the tweet. Then, the topic can spread like a wave.

For the first question to express the contribution of each user for the hot topic, this paper adopts the PageRank algorithm to quantify it. In the research of page rank, the PageRank value indicates the authority of the page. However, in the research of information tracing, the PageRank value means the contribution rate of users to the topic development. Further, how to build the network when applying the PageRank algorithm to information tracing? To quantify contribute of users for the development of topic, the side structure of user needs to be analyzed. In social network, user A can get the information of user B if user B is followed by user A. The direction of information flow is the opposite direction of the contribution rate flow, which is shown in Fig 3.

Fig.3 Relationship of users

The contribution rate is updated after initializing the contribution rate of each user. According to the PageRank algorithm, the update formula is as follows.

where n is the user node in the network. The first part of the above formula shows the contribution rate through the follower, and the latter part represents the contribution rate represent by any node in the network structure. The total contribution rate of user is formed of the two parts. In order to measure the importance of the two parts, the tunable parameters are introduced to control their influence according PageRank algorithm. The formula is as follows.

In order to solve second problem namely how to determine information sources user and important node users of the hot topics? Through the PageRank calculating the final PageRank value of each node, this model selects the appropriate k value to take Top k users in all user nodes. Top1 user is chosen as the information publisher of hot topic, at the same time, Top k users are important node users.

In the analysis of social network, PageRank was applied to discovery opinion leaders and evaluate influence etc. Opinion leaders discovery was oriented to all users and structure of social network, and we assume that the network is static. But in reality, the users’ relationship in social network is changing all the time. In the research of a hot topic, it is necessary to study the transmission way and characteristics of this hot topic. Therefore, considering the object of information tracing model based on PageRank is a hot topic, so the established network user nodes are also the user nodes participating in the event. The relationship among them relatively changes a little for the contribution of the situation development of hot topics in the short term.

4.3 Model learning algorithm

The input of the information tracing model is the users in hot topic and its relationship network . The purpose of this algorithm is to find the publisher of the hot topic and the important user nodes in the development of the event. The information tracing model algorithm based on PageRank is described as Fig.4.

Fig.4 Information tracing algorithm

Information tracing model is mainly divided into two parts. The first part is to calculate the contribution rate of each user in network. The second part is to confirm the information publisher and the important users through the contribution ranking. As the information tracing model algorithm described above, we can build the network with the relationship of participants. Then, initialize the parameters and the contribution rate of each user namely the value of PageRank, and use the PageRank to update the contribution rate. Finally determine the user publisher and important transmission users. If the total number of users participating in the event is n, the operation complexity of the model is T~O(N).

5 Experimental results

5.1 DataSet

In this paper, the experimental data collected from Tencent micro-blogging social platform. Tencent micro-blogging is one of the most popular online social platforms in China. In order to evaluate the information tracing model this paper takes three hot topics such as “Personal Tailor”(Topic A), “Dad where do we go”(Topic B) and ‘Panda blood girl in urgent need of help’(Topic C). We use these examples to implement the model.

Topic A is posted by “qqent”. Topic B is posted by “long story short” whose ID is “pictalks”. Topic C is posted by “liu16261”. The input of the information tracing model is the users who participate in hot topics and the relationship among them. These users are regarded as the nodes, the concerned followings and the distribution of the fans respectively as the output and the input of the chain. Fig. 5 shows the network structures build from the Topic A, Topic B and Topic C through the way mentioned above.

Fig.5 The disseminate structure of three hot topics

When the network structure constructed, the PageRank algorithm is used to realize the measurement and renewal of the contribution of each node.In other applications based on PageRank, it is found that the rationality of many effects is compared with the degree of entry or out of the node. For matching the results later better, Table 1 shows limited users’in-degree in three hot topics.

Table 1 Limited users’ in-degree in three hot topics

The purpose of the information tracing is to find the information publishers and important node users, the evaluation criteria is whether find out the information publishers and how much forward the real information publisher's rank position in all users. The metrics is mathematically defined as follows.

5.2 Performance analysis

The Fig.6 shows that the rate of algorithm convergence is faster when α changes from 0.7 to 0.9 comparing other conditions. Based on this result, when setting α=0.8 through the commonly method, the rate of algorithm convergence is fastest in the Fig. 5.

Fig.6 The value of different α

Finally, through the rank of all nodes’ contribution according to the PageRank, this article consid-ers the Top1 user as releasing the information. Table 2~4 shows the pr and number of in-links of Top10 users.

Table 2 The Top10 users’ pr value and in-degree of Topic A

Table 3 The Top10 users’ pr value and in-degree of Topic B

Table 4 The Top10 users’ pr value and in-degree of Topic C

From the Table 2~4, Topic A, Topic B, and Topic C, the number of in-links of Top10 users based on the contribution rate is large, indicating that the node users who play important role in transmitting information have been found. The publishers of Topic A and Topic B are "qqent" and "pictalks", and the actual publisher of two hot topics is also the predicted user node.

The analysis result shows that the publisher of Topic C is "QQGenius", but in fact it is "liu16261".The reason why cause such a result is "QQGenius" user has much bigger number of in-links than others when getting the number of in-links of Topic C. Therefore, it also explains a shortcoming of the information tracing model. If the gap of in-links structure is too large in information transmission, the final judgment will be affected. For the process of information transmission, if the gap of in-links is not enormous such as Topic A and Topic B, it is very suitable for information tracing model to analyze the source of information and find important node users.

Finally, the accuracy of three topics is shown in Table 5.

Table 5 Accuracy of three topics

6 Conclusion

In this paper, a hot topic information tracing model based on PageRank is proposed, which uses the users who promote the transmission of event and the structure of social network. The model finds the source of information and important nodes of hot topics. Firstly, according to the trend of a hot topic we find these users who promote the transmission of the topic. Secondly, we initialize the contribution rate of each participating user to the development ofhot topic, and use the PageRank to construct the information tracing model. Finally, through ranking the contribution rate we find the source of information and important user nodes in Top k users. This study can be applied into recommendation systems, expert discovery, advertising and other fields. Experiments show that the information tracing model based on PageRank can predict the information publishers of hot topics and the important user nodes in the process of information transmission.

[1] OPSAHL T, AGNEESSENS F, SKVORETZ J. Node centrality in weighted networks: generalizing degree and shortest paths[J]. Social networks.2010, 32: 245-251.

[2] OLSEN P W, LABOUSEUR A G, HWAN J H. Efficient top-k closeness centrality search[C]//Data Engineering (ICDE). 2014: 196-207.

[3] RIONDATO M, KORNAROPOULOS E M. Fast approximation of betweenness centrality through sampling[J]. Data Mining and Knowledge Discovery, New York, 2016, 30: 438-475.

[4] MORONE F, MAKSE H A. Influence maximization in complex networks through optimal percolation[J]. Nature, 2015, 524: 65-68.

[5] AKOGLU L, TONG H, KOUTRA D. Graph based anomaly detection and description: a survey[J]. Data Mining and Knowledge Discovery. 2015, 29: 626-688.

[6] BAO J, ZHENG Y, WILKIE D, et al. Recommendations in location-based social networks: a survey[J]. GeoInformatica, 2015, 19: 525-565.

[7] MA N, LIU Y. SuperedgeRank algorithm and its application in identifying opinion leader of online public opinion supernetwork[J]. Expert Systems with Applications, 2014(3): 1357-1368.

[8] ALEAHMAD A, KARISANI P, RAHGOZAR M, et al. OLFinder: Finding opinion leaders in online social networks[J]. Journal of Information Science, 2015, 42(9): 659-674.

[9] ROMERO D M, GALUBA W, ASUR S, et al. Influence and Passivity in Social Media[C]//Joint European Conference on Machine Learning and Knowledge Discovery in Databases. 2011: 18-33.

[10] WENG J S, LIM P E, JIANG J, et al. Twitterrank: finding topic-sensitive influential twitterers[C]//The third ACM international conference on Web search and data mining. 2011: 261-270.

[11] TANG J, SUN J, WANG C, et al. Social influence analysis in large-scale network[C]//The 15th ACM SIGKDD international conference on Knowledge discovery and data mining. 2009: 807-816.

[12] ANTULOV-FANTULIN N, LANCIC A, STEFANCIC H, et al. Statistical inference framework for source detection of contagion processes on arbitrary network structures[C]//Self-Adaptive and Self-Organizing Systems Workshops (SASOW). 2014: 78-83.

[13] JIANG J J, WEN S, YU S, et al. Identifying propagation sources in networks: State-of-the-art and comparative studies[C]//IEEE Communications Surveys & Tutorials. 2016: 165-481

[14] ALTARELLI F, BRAUNSTEIN A, DALL’ASTA L, et al. The patient-zero problem with noisy observations[J]. Journal of Statistical Mechanics: Theory and Experiment, 2014(10).

About the authors:

LI Qian (1980-), born in Chongqing. She is working on her Ph.D. degree at Beijing university of posts and telecommunications. Her research interests include social network analysis and information dissemination dynamics.

LAI Jiawei (1992-), born in Jiangxi. She is working on her master degree at Chongqing university of posts and telecommunications. Her research interests include social network analysis and machine learning.

10.11959/j.issn.2096-109x.2017.00190

eng (1979-), born in Anhui. He

his Ph.D degree of computer science from Beijing university of posts and telecommunications. He is an associate professor in Chongqing university of posts and telecommunications. His research interests include social network analysis and machine learning.

Wu Bin (1969-), born in Hunan. He received his Ph.D degree of institute of computing Chinese academy of sciences in 2002. He is a professor in Beijing university of posts and telecommunications. His research interests include complex network, data mining, massive data parallel processing, visual analysis and telecom customer relationship management.

Received Data: 2017-06-20, Revised Data: 2017-07-05. Corresponding Author: LI Qian, liqian@cqupt.edu.cn

s: National Key Basic Research Program (973 program) of China (No.2013CB329606), Chongqing Science and Technology Commission Project(No.cstc2017jcyjAX0099), Science and Technology Research Program of the Chongqing Municipal Education Committee(No.KJ1500425), Ministry of Education of China and China Mobile Research Fund (No.MCM20130351)

主站蜘蛛池模板: 成人福利视频网| 91精品小视频| 97色伦色在线综合视频| 国产综合色在线视频播放线视| 尤物午夜福利视频| 国产成人av一区二区三区| 萌白酱国产一区二区| 国产精品香蕉| 色婷婷天天综合在线| 久久国产亚洲偷自| 国产视频入口| 久久性妇女精品免费| 国产超碰一区二区三区| 国产精品尹人在线观看| 在线国产资源| 久久精品欧美一区二区| 久久不卡国产精品无码| 91国内在线观看| 日韩欧美中文在线| 国产精品久久久精品三级| 超碰aⅴ人人做人人爽欧美| 在线观看精品自拍视频| 国产精品一区在线麻豆| 国产美女无遮挡免费视频| 高潮爽到爆的喷水女主播视频| 日韩国产精品无码一区二区三区 | 无码国内精品人妻少妇蜜桃视频| 青青草国产精品久久久久| 免费无码AV片在线观看中文| 久久无码高潮喷水| 粉嫩国产白浆在线观看| 久久天天躁狠狠躁夜夜躁| 久久免费观看视频| 欧美va亚洲va香蕉在线| 58av国产精品| 制服丝袜无码每日更新| 欧美a√在线| 青青草原国产免费av观看| 99热最新在线| 动漫精品啪啪一区二区三区| 国产免费观看av大片的网站| 久久久久久久蜜桃| 国产精品分类视频分类一区| 亚洲美女视频一区| 国产精品片在线观看手机版| 日韩精品毛片| 国产精品9| 91香蕉国产亚洲一二三区| 狼友视频国产精品首页| 国产欧美在线观看精品一区污| 中文字幕天无码久久精品视频免费 | 91精品久久久久久无码人妻| 国产麻豆另类AV| 国产亚洲精品97AA片在线播放| 九九这里只有精品视频| 中文字幕日韩欧美| 91口爆吞精国产对白第三集| 四虎国产永久在线观看| 亚洲香蕉在线| 伊人久久婷婷| 青青草久久伊人| 亚洲AV无码久久天堂| 免费毛片视频| 最新国产你懂的在线网址| 国产激情无码一区二区免费| 一区二区偷拍美女撒尿视频| 成人午夜视频网站| 高潮爽到爆的喷水女主播视频| 日本中文字幕久久网站| 一区二区三区国产| 久久美女精品| 久久久久免费精品国产| 欧美在线国产| 中文字幕在线播放不卡| A级毛片无码久久精品免费| 欧美色图第一页| 蜜臀av性久久久久蜜臀aⅴ麻豆| 亚洲专区一区二区在线观看| 3344在线观看无码| 国产v精品成人免费视频71pao| 亚洲自偷自拍另类小说| 国产精品太粉嫩高中在线观看 |