Hate Speech Classification of Codeswitched Data - Lawrence Muchemi - Book - Eliva Press - 9781952751899 - September 23, 2020

Hate Speech Classification of Codeswitched Data

Price
437 yuan
Excluding tax

Ships from a remote warehouse

Estimated delivery: July 13 - July 29

Identifying short text messages that contain hate speech within the vast volume of user-generated content on social media is a challenging classification task. Social media data poses unprecedented challenges to conventional natural language processing techniques, particularly in extracting high-quality features from noisy, high-dimensional, codeswitched, and large unstructured data. In addition, a systematic review of previous studies found a lack of publicly available annotated datasets for comparative studies, little evidence of theoretical underpinning for the annotation schemes used, and hardly any studies on codeswitched data. To address these gaps, this book explores a data-driven approach to identifying high-quality, discriminative features in hate text messages from social media. The goal was to use these features to train a better-performing classification model that effectively captures subtle hate speech in social media messages.
Approximately 400k messages were crawled from social media over a period of one year during the 2017 general election period in Kenya, using a combination of problematic hashtags, ethnic slurs, hate patterns, and messages from pro-hate user accounts. A random sample of 50k messages was manually labeled into three classes (Hate Speech, Offensive, or Neither) by a team of 27 human annotators. Subsequently, this dataset was further reduced by extracting a psychosocial feature subset (PDC), informed by the conceptual framework, using a hierarchical probability modeling technique. To evaluate and select the best model, a grid search was performed over all combinations of features using 5-fold cross-validation, with a tenth of the data reserved for evaluation and to guard against over-fitting.
Based on the results of the experiments, the novel psychosocial feature set (PDC) was effective in identifying hate speech and outperformed conventional features in training the best classifier, a linear SVM, which reached an accuracy of 82.8%. The Passion (P) and Distance (D) components proved the most salient, with accuracies of 74.3% and 74.2%, respectively. Moreover, the psychosocial feature framework generalized better to other types of hate speech.
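
The evaluation protocol described above (a tenth of the data held out, 5-fold cross-validation on the rest, and a search over feature combinations) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the book's implementation: the function names, the group labels "P", "D", and "C", and the `score_fn` callback are hypothetical, and "all combinations of features" is read here as every non-empty subset of feature groups. In practice `score_fn` would train a classifier such as a linear SVM on the training indices and return validation accuracy; here it is a placeholder.

```python
import itertools
import random


def holdout_split(n_samples, holdout_fraction=0.1, seed=0):
    """Shuffle sample indices and reserve a fraction for final evaluation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(n_samples * holdout_fraction))
    # Returns (development indices, held-out indices).
    return indices[n_holdout:], indices[:n_holdout]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def feature_combinations(groups):
    """Enumerate every non-empty combination of feature groups."""
    for size in range(1, len(groups) + 1):
        yield from itertools.combinations(groups, size)


def grid_search(n_samples, groups, score_fn, k=5):
    """Return (best combination, its mean CV score, held-out indices)."""
    dev, held_out = holdout_split(n_samples)
    best = None
    for combo in feature_combinations(groups):
        scores = [score_fn(combo, train, val) for train, val in k_fold(dev, k)]
        mean_score = sum(scores) / len(scores)
        if best is None or mean_score > best[1]:
            best = (combo, mean_score)
    return best[0], best[1], held_out


def placeholder_score(combo, train_idx, val_idx):
    """Hypothetical stand-in for 'train a classifier, return accuracy'."""
    return len(combo) / 3


best_combo, best_score, held_out = grid_search(
    100, ["P", "D", "C"], placeholder_score
)
print(best_combo, best_score, len(held_out))
```

The held-out indices are never passed to `score_fn`, so the final evaluation set stays untouched during model selection.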

Media Book     Paperback Book   (softcover, glued binding)
Released September 23, 2020
ISBN13 9781952751899
Publisher Eliva Press
Pages 186
Dimensions 152 × 229 × 10 mm   ·   254 g
Language English
