Lawrence Muchemi
Hate Speech Classification of Codeswitched Data Lawrence Muchemi

Name: Hate Speech Classification of Codeswitched Data
Price: 438 CNY
Availability: OutOfStock
Author: Lawrence Muchemi

Price: CNY 438, excluding tax

Identifying short text messages that contain hate speech within the vast volume of user-generated content on social media is a challenging classification task. Social media data poses unprecedented challenges for conventional natural language processing techniques, which must extract high-quality features from noisy, high-dimensional, codeswitched, and large unstructured data. Moreover, a systematic review of previous studies indicated a lack of publicly available annotated datasets for comparative studies, little evidence of theoretical underpinning for the annotation schemes used, and hardly any studies on codeswitched data. To address these gaps, this book explores a data-driven approach to identifying high-quality, discriminative features in hate text messages from social media. The goal was to then use these features to train a better-performing classification model that effectively captures subtle hate speech in social media messages. Approximately 400k messages were crawled from social media over a period of one year during the 2017 general election period in Kenya, using a combination of problematic hashtags, ethnic slurs, hate patterns, and messages from pro-hate user accounts. A random sample of 50k messages was manually labeled into three classes (Hate Speech, Offensive, or Neither) by a team of 27 human annotators. Subsequently, this dataset was further reduced by extracting a psychosocial feature subset (PDC), informed by the conceptual framework, using a hierarchical probability modeling technique. To evaluate and select the best model, a grid search was performed over all combinations of features using 5-fold cross-validation, with a tenth of the data reserved for evaluation and to guard against overfitting.
Based on the results of the experiments, the novel psychosocial feature set (PDC) was effective in identifying hate speech and outperformed conventional features in training the best classifier, a linear SVM, which reached an accuracy of 82.8%. The Passion (P) and Distance (D) components proved the most salient, with accuracies of 74.3% and 74.2%, respectively. In addition, the psychosocial feature framework generalized better to other types of hate speech.
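
The evaluation protocol described above (a tenth of the data held out, then 5-fold cross-validation over every feature combination) can be illustrated with a minimal, standard-library-only sketch. Everything here is an assumption made for illustration: the data is synthetic, the feature names `f1`, `f2`, `f3` are hypothetical placeholders, and a simple nearest-centroid classifier stands in for the linear SVM used in the book, whose actual features and search grid are not given in this description.

```python
import itertools
import random
import statistics

LABELS = ["Hate Speech", "Offensive", "Neither"]  # the three classes named in the description

def make_toy_rows(n=100, seed=42):
    """Synthetic stand-in data: f1 separates the classes, f2 and f3 are noise."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        c = i % len(LABELS)
        x = {"f1": 5 * c + rng.uniform(-0.5, 0.5), "f2": rng.random(), "f3": rng.random()}
        rows.append((x, LABELS[c]))
    return rows

def holdout_split(rows, test_fraction=0.1, seed=0):
    """Shuffle the rows and reserve a fraction for the final evaluation."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, round(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]

def k_folds(rows, k=5):
    """Yield (train, validation) pairs for k-fold cross-validation."""
    for i in range(k):
        validation = rows[i::k]
        train = [r for j, r in enumerate(rows) if j % k != i]
        yield train, validation

def fit_centroids(train, feats):
    """Stand-in classifier (not an SVM): mean feature vector per class."""
    by_label = {}
    for x, y in train:
        by_label.setdefault(y, []).append([x[f] for f in feats])
    return {y: [statistics.fmean(col) for col in zip(*vs)] for y, vs in by_label.items()}

def predict(centroids, x, feats):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    v = [x[f] for f in feats]
    return min(centroids, key=lambda y: sum((a - b) ** 2 for a, b in zip(v, centroids[y])))

def accuracy(centroids, rows, feats):
    return sum(predict(centroids, x, feats) == y for x, y in rows) / len(rows)

def grid_search(rows, feature_names, k=5):
    """Score every non-empty feature combination by mean k-fold accuracy."""
    best = None
    for r in range(1, len(feature_names) + 1):
        for feats in itertools.combinations(feature_names, r):
            scores = [accuracy(fit_centroids(tr, feats), va, feats)
                      for tr, va in k_folds(rows, k)]
            mean_score = statistics.fmean(scores)
            if best is None or mean_score > best[0]:
                best = (mean_score, feats)
    return best

rows = make_toy_rows()
dev, held_out = holdout_split(rows)            # 90% for model selection, 10% reserved
best_score, best_feats = grid_search(dev, ["f1", "f2", "f3"])
final_model = fit_centroids(dev, best_feats)   # refit on all development data
held_out_acc = accuracy(final_model, held_out, best_feats)
print(best_feats, best_score, held_out_acc)
```

Keeping the held-out tenth out of the grid search means the final accuracy is measured on messages that played no part in choosing the feature set, which is the guard against overfitting that the description mentions.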

Media type	Book, Paperback (softcover, glue-bound)
Released	September 23, 2020
ISBN13	9781952751899
Publisher	Eliva Press
Pages	186
Dimensions	152 × 229 × 10 mm · 254 g
Language	English