


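Abstract: Text classification and sentiment classification are fundamental areas of natural language processing. To help beginners learn multi-class text sentiment classification, this paper builds on machine learning and analyzes how the linear logistic regression algorithm and the naive Bayes model are applied in a text sentiment classification project. It also analyzes and implements the parts of data processing, model construction, model training, and model testing that beginners find difficult or error-prone. Using competition data from Kaggle as a worked example, a complete multi-class text sentiment classification project is implemented and analyzed in detail. The project's evaluation results are fairly good, which supports the claim that the approach helps beginners get started with multi-class text sentiment classification and machine learning. A method for solving multi-class problems based on the traditional binary classification problem is also proposed.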
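Keywords: machine learning; text classification; sentiment classification; natural language processing; multi-class classification
CLC number: TP18 Document code: A
Article ID: 1009-3044(2020)20-0181-02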
Study and Research on Text Emotion Multi-Classification Based on Machine Learning
LIU Cheng
(Central China Normal University, Wuhan 430079, China)
Abstract: Text categorization and emotion classification are basic fields in natural language processing. To help beginners learn text sentiment multi-classification, this paper, based on machine learning, analyzes the linear logistic regression algorithm and the naive Bayes model as applied to a text sentiment classification project. The parts of data processing, model building, model training, and model testing that beginners find difficult or error-prone are analyzed and implemented. Using competition data from Kaggle as an example, a complete text emotion multi-classification project is implemented and analyzed in detail. The results are fairly good, indicating that the approach can help beginners get started with text emotion classification and machine learning. A method for solving multi-class problems based on the traditional binary classification problem is also proposed.
Key words: machine learning; text categorization; emotion classification; NLP; multi-classification
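The abstracts state that the multi-class sentiment problem is solved by building on binary classification, but this excerpt does not spell out how. One common way to do this is one-vs-rest: train one binary logistic regression per sentiment class and predict the class whose classifier gives the highest score. The sketch below assumes that scheme, a plain whitespace bag-of-words, batch gradient descent, and five integer labels 0 to 4. The helper names (`build_vocab`, `featurize`, `train_one_vs_rest`, and so on) and the toy reviews are hypothetical illustrations, not the author's code or the Kaggle data.

```python
import math
from collections import Counter


def build_vocab(texts):
    """Sorted vocabulary of lower-cased whitespace tokens (hypothetical helper)."""
    return sorted({w for t in texts for w in t.lower().split()})


def featurize(text, vocab):
    """Bag-of-words count vector; tokens outside the vocabulary are ignored."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_binary_lr(X, y, lr=0.5, epochs=200):
    """Binary logistic regression (y in {0, 1}) fit by batch gradient descent on log-loss."""
    n, m = len(X[0]), len(X)
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Prediction error sigmoid(w.x + b) - y drives the gradient.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def train_one_vs_rest(X, labels):
    """One binary classifier per class: 'this class' (1) versus 'all others' (0)."""
    classes = sorted(set(labels))
    return {k: train_binary_lr(X, [1 if c == k else 0 for c in labels]) for k in classes}


def predict_one_vs_rest(models, x):
    """Return the class whose binary classifier gives the largest linear score."""
    def score(k):
        w, b = models[k]
        return sum(wj * xj for wj, xj in zip(w, x)) + b
    return max(models, key=score)


# Toy five-class data (0 = most negative ... 4 = most positive), invented for illustration.
train_texts = ["awful film", "awful plot", "bad film", "bad plot", "okay film",
               "okay plot", "good film", "good plot", "great film", "great plot"]
train_labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

vocab = build_vocab(train_texts)
X = [featurize(t, vocab) for t in train_texts]
models = train_one_vs_rest(X, train_labels)
print(predict_one_vs_rest(models, featurize("a great film", vocab)))  # 4
```

Comparing raw linear scores rather than sigmoid outputs gives the same argmax, since the sigmoid is monotonic. With five classes this trains five independent binary models, which is the price of reusing a binary learner; a softmax (multinomial) logistic regression would be the usual single-model alternative.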
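With the rapid development of artificial intelligence, natural language processing, one of its most central and most challenging fields, has become a major research focus in recent years, and more and more beginners are entering the field. NLP (Natural Language Processing) requires a broad and heterogeneous body of knowledge that is hard to master, yet there is little work on entry-level learning for beginners, so most beginners struggle to find their footing.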
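This paper takes text sentiment classification, a foundational NLP task, and departs from the traditional binary setting by refining sentiment into five classes, a harder problem that better reflects real-life use. Using competition data from Kaggle and two machine learning algorithms, linear logistic regression and naive Bayes, it implements and studies the complete multi-class sentiment classification pipeline. It gives a detailed analysis and explanation of the parts of data processing, feature selection, model construction, and model training and testing that beginners find difficult or error-prone, to help beginners learn through a project in the NLP field.……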
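The second model named here, naive Bayes, can be sketched in the same spirit. The version below is a multinomial naive Bayes with add-one (Laplace) smoothing over word counts; the smoothing choice, the whitespace tokenizer, the `MultinomialNB` class name (a from-scratch class, not scikit-learn's class of the same name), and the toy five-class data are assumptions for illustration only, not the paper's implementation.

```python
import math
from collections import Counter


class MultinomialNB:
    """Multinomial naive Bayes over word counts with add-alpha (Laplace) smoothing.

    A from-scratch sketch; not the scikit-learn class of the same name.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        n_docs = Counter(labels)
        # log P(c): class frequency among training documents.
        self.log_prior = {c: math.log(n_docs[c] / len(labels)) for c in self.classes}
        word_counts = {c: Counter() for c in self.classes}
        for text, c in zip(texts, labels):
            word_counts[c].update(text.lower().split())
        self.vocab = set().union(*word_counts.values())
        v = len(self.vocab)
        # log P(w | c) = log((count(w, c) + alpha) / (words in class c + alpha * |V|)).
        self.log_likelihood = {}
        for c in self.classes:
            denom = sum(word_counts[c].values()) + self.alpha * v
            self.log_likelihood[c] = {
                w: math.log((word_counts[c][w] + self.alpha) / denom) for w in self.vocab
            }
        return self

    def predict(self, text):
        # Words never seen in training carry no evidence and are skipped.
        tokens = [w for w in text.lower().split() if w in self.vocab]

        def log_score(c):
            # Unnormalized log posterior: log P(c) + sum of log P(w | c).
            return self.log_prior[c] + sum(self.log_likelihood[c][w] for w in tokens)

        return max(self.classes, key=log_score)


# Toy five-class data (0 = most negative ... 4 = most positive), invented for illustration.
train_texts = ["awful film", "awful plot", "bad film", "bad plot", "okay film",
               "okay plot", "good film", "good plot", "great film", "great plot"]
train_labels = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4]

model = MultinomialNB().fit(train_texts, train_labels)
print(model.predict("a great film"))  # 4
```

Working in log space avoids underflow when many small word probabilities are multiplied, and the smoothing term keeps a word that never co-occurred with a class from forcing that class's score to zero.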