短视频文本评论的话题检测和情感分析研究

Research of Topic Detection and Sentiment Analysis of Short Video Text Comments

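【Author】 李俊杰 (Li Junjie)

【Advisor】 曹晖 (Cao Hui)

【Author information】 西北民族大学 (Northwest Minzu University), Computer Technology (professional degree), 2023, Master's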

【摘要】 大数据时代,短视频平台在社会生活中的占比越来越大,其内容涉及到社会新闻、文化娱乐以及电商营销等多个方面,短视频用户数量呈指数级增长。基于短视频平台,大量用户对各种各样的短视频自由发表自己的意见与看法,产生了海量的文本评论数据。对短视频文本评论数据进行话题检测和情感分析研究,可以在一定程度上正确引导社会舆论,了解民生民意,为商业营销计划和政府相关管控措施的制定提供一定的参考价值。本文主要工作分为以下三个方面:首先,通过Python数据爬取相关技术,在限定时间域内对国内抖音短视频平台网页版中多个社会热点事件相关的短视频文本评论数据进行采集,并对收集到的数据进行清洗过滤。以谱聚类算法和成对相似度计算为核心提出TP-PS-Spectral聚类算法,在短视频文本评论数据集和篇章级中文新闻报道公开数据集上与所设置的对比算法相比表现均为最优,在ARI值上分别达到了96.83%和95.91%;并对聚类之后的每个聚类簇分别使用三种关键词提取方法LDA、TF-IDF和TextRank进行话题关键词抽取。其次,对话题检测涉及的事件开展情感倾向性分析。本文将BiLSTM模型、注意力机制、DPCNN模型进行组合优化,构建基于ERNIE预训练模型的双通道神经网络模型(DC-EBAD)作为情感极性二分类实验模型,并进一步提出了拥有双重注意力机制的DC-EB2AD改进模型,旨在进一步提高情感极性分类的准确率。实验结果表明,DC-EBAD模型在本文研究中所用的短视频文本评论数据集和中文外卖评论公开数据集中的情感倾向性判别精确率分别达到了92.50%和92.73%,表现较为良好,相比于作为对比的单通道模型在各项指标上均有所提升,且DC-EB2AD模型在两个数据集中的精确率分别为93.22%和93.15%,相比于DC-EBAD模型也有小幅度提升。最后,对短视频文本评论话题所涉事件的情感倾向性进行可视化分析。一方面以具体关键词的提取与筛选为核心,基于Python采用词云图的方式进行可视化展示,直观地了解用户对相关事件的情感观点;另一方面以正则处理和关键短句为核心,基于Neo4j构建用户情感观点图,对每个事件中的正负向情感观点进行展示和分析总结。

【Abstract】 In the era of big data, short video platforms play a growing role in social life. Their content spans social news, culture and entertainment, e-commerce marketing, and other areas, and the number of short video users is growing exponentially. On these platforms, large numbers of users freely post their opinions on all kinds of short videos, producing massive amounts of text comment data. Topic detection and sentiment analysis of short video text comments can, to some extent, help guide public opinion, reveal public sentiment and concerns, and provide a reference for commercial marketing plans and for government regulatory measures. The work in this thesis falls into three parts. First, Python web-crawling techniques are used to collect, within a limited time window, short video text comments related to multiple trending social events from the web version of the Douyin short video platform in China, and the collected data are cleaned and filtered. A clustering algorithm, TP-PS-Spectral, is proposed with spectral clustering and pairwise similarity computation at its core. On both the short video text comment dataset and a public dataset of document-level Chinese news reports, it outperforms all the comparison algorithms, reaching ARI values of 96.83% and 95.91% respectively. For each resulting cluster, three keyword extraction methods (LDA, TF-IDF, and TextRank) are used to extract topic keywords. Second, sentiment orientation analysis is carried out on the events involved in topic detection. The BiLSTM model, the attention mechanism, and the DPCNN model are combined and optimized to build a dual-channel neural network model based on the ERNIE pre-trained model (DC-EBAD), which serves as the experimental model for binary sentiment polarity classification. An improved model with a dual attention mechanism, DC-EB2AD, is further proposed with the aim of raising the accuracy of sentiment polarity classification. Experimental results show that DC-EBAD reaches a sentiment classification precision of 92.50% on the short video text comment dataset used in this thesis and 92.73% on a public Chinese food-delivery review dataset, improving on the single-channel comparison models on every metric. DC-EB2AD reaches a precision of 93.22% and 93.15% on the two datasets, a slight improvement over DC-EBAD. Third, the sentiment orientation toward the events involved in the short video comment topics is analyzed visually. On the one hand, centered on extracting and filtering specific keywords, word clouds are generated in Python to show users' sentiments and opinions on the events intuitively. On the other hand, centered on regular-expression processing and key short sentences, user sentiment-opinion graphs are built in Neo4j to display, analyze, and summarize the positive and negative opinions for each event.
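
The abstract reports clustering quality as ARI (Adjusted Rand Index) and names pairwise similarity as one ingredient of the clustering step. The sketch below illustrates those two standard building blocks only; it is not the TP-PS-Spectral algorithm, whose details the abstract does not give. The function names, the bag-of-words cosine similarity, and the toy comments are all assumptions made for illustration.

```python
from collections import Counter
from math import comb, sqrt


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label lists must have the same length")
    n = len(labels_true)
    # Contingency counts: items sharing each (true label, predicted label) pair.
    pair_counts = Counter(zip(labels_true, labels_pred))
    true_counts = Counter(labels_true)
    pred_counts = Counter(labels_pred)

    sum_cells = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in true_counts.values())
    sum_pred = sum(comb(c, 2) for c in pred_counts.values())
    total_pairs = comb(n, 2)

    # Expected agreement under random labelling with the same cluster sizes.
    expected = sum_true * sum_pred / total_pairs if total_pairs else 0.0
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Only happens when both clusterings are the same trivial partition
        # (everything in one cluster, or every item alone).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def cosine_similarity_matrix(token_lists):
    """Pairwise cosine similarity between bag-of-words count vectors.

    Hypothetical stand-in for the thesis's pairwise similarity; a matrix
    like this is the usual affinity input to spectral clustering.
    """
    vectors = [Counter(tokens) for tokens in token_lists]
    norms = [sqrt(sum(v * v for v in vec.values())) for vec in vectors]
    size = len(vectors)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(i, size):
            if norms[i] == 0 or norms[j] == 0:
                sim = 0.0  # an empty comment is similar to nothing
            else:
                dot = sum(cnt * vectors[j][term] for term, cnt in vectors[i].items())
                sim = dot / (norms[i] * norms[j])
            matrix[i][j] = matrix[j][i] = sim
    return matrix


if __name__ == "__main__":
    # Toy, already-tokenized comments (invented for this example).
    comments = [
        ["great", "video", "love"],
        ["love", "this", "video"],
        ["price", "too", "high"],
        ["high", "price"],
    ]
    for row in cosine_similarity_matrix(comments):
        print([round(value, 3) for value in row])
    # ARI ignores label names: swapping cluster ids still counts as a perfect match.
    print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

ARI is corrected for chance, so a random assignment scores near 0 and only an exact match of the partition scores 1. That correction is why it is a stricter measure than raw pair-counting agreement for figures such as the 96.83% and 95.91% quoted above.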

  • 【Classification code】TP391.1