







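Abstract: Autoencoders extract features only from the content information contained in individual data samples and ignore the structural information between samples. To address this problem, a deep graph clustering network based on heterogeneous fusion and discriminant loss is proposed. First, the heterogeneous information obtained by two autoencoders is fused, which resolves the information loss incurred when a single autoencoder is used for feature extraction. Second, a discriminant loss function based on the consistency of within-cluster distributions is designed in the clustering training module, so that the model can be trained end-to-end; this avoids the mismatch, seen in two-stage training methods, between the extracted features and the prior assumptions of the clustering algorithm. Finally, experiments on six commonly used datasets verify the effectiveness of the method. The experimental results show that, compared with most existing deep graph clustering models, the method clearly improves clustering performance on both non-graph and graph datasets.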
Keywords: graph clustering; deep learning; discriminant loss; heterogeneous fusion
CLC number: TP391  Document code: A  Article ID: 1671-5489(2023)04-0853-10
Graph Embedding Clustering Based on Heterogeneous Fusion and Discriminant Loss
YAO Bo, WANG Weiwei
(School of Mathematics and Statistics, Xidian University, Xi’an 710126, China)
Abstract: Aiming at the problem that autoencoders extract features only from the content information contained in individual data samples and ignore the structural information among data, we propose a deep graph clustering network based on heterogeneous fusion and discriminant loss. Firstly, the heterogeneous information obtained by two autoencoders is fused, which solves the problem of information loss when a single autoencoder is used to extract features. Secondly, a discriminant loss function is designed in the clustering training module based on the consistency of the distribution within the same cluster, so that the model can be trained end-to-end, avoiding the mismatch between feature extraction and the assumptions of the clustering algorithm that arises in two-stage training methods. Finally, experiments are carried out on six commonly used datasets to verify the effectiveness of the proposed method. The experimental results show that, compared with most existing deep graph clustering models, the proposed method significantly improves clustering performance on both non-graph and graph datasets.
Keywords: graph clustering; deep learning; discriminant loss; heterogeneous fusion
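Clustering is a fundamental unsupervised task in machine learning. Its basic idea is to partition data into different categories using different similarity measures. Classical clustering methods, such as partition-based K-means clustering [1], the density-based method DBSCAN (density-based spatial clustering of applications with noise) [2], spectral clustering (SC) [3], Gaussian mixture model (GMM) clustering [4], and non-negative matrix factorization (NMF) clustering [5], all operate directly on the data and are suited to low-dimensional data. With the development of information technology, the volume of information in real life has grown sharply and the dimensionality of data keeps increasing. High-dimensional data is often accompanied by redundant information and noise, which not only raises the computational complexity of the algorithms but also degrades clustering accuracy [6]. The methods above are no longer suitable for such data. The common solution at present is to first apply a dimensionality-reduction method……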