Offensive Language Detection on Social Media Based on Text Classification
Volume: 14 - Issue: 05 - Date: 02-05-2025
Approved ISSN: 2278-1412
Published Id: IJAECESTU441
Author: Dr. K. Hari Krishna
Co-Author: VANKADARI YASASWINI
Abstract: The increasing prevalence of offensive language on social media platforms poses a significant threat to individuals and communities, often leading to bullying and emotional harm. To address this issue, the research community has explored various supervised learning approaches and developed specialized datasets for the automatic detection of offensive content. In this study, we propose a robust model for offensive language detection that integrates a modular preprocessing phase, three embedding techniques, and eight classifiers. Our model begins with a comprehensive cleaning and tokenization process to prepare the data for analysis. We then explore three embedding methods, including Term Frequency-Inverse Document Frequency (TF-IDF), to capture textual features effectively. The classification phase applies eight machine learning algorithms, with hyperparameter optimization used to maximize detection accuracy. The model was evaluated on a dataset collected from Twitter, a popular social media platform known for its diverse and often volatile user-generated content. Our experiments show that the AdaBoost, Support Vector Machine (SVM), k-Nearest Neighbors (KNN), and Convolutional Neural Network (CNN) classifiers combined with TF-IDF embeddings achieved the highest average F1-scores, indicating superior performance in detecting offensive language.
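The pipeline outlined in the abstract (cleaning and tokenization, TF-IDF features, a classifier, F1 evaluation) can be sketched in plain Python with only the standard library. This is a minimal illustration under stated assumptions, not the authors' implementation: the cleaning rules, the smoothed IDF variant, the use of k-NN with k = 3 as the example classifier (one of the algorithms the abstract names), and the six toy tweets are all hypothetical, since the abstract does not specify them. Hyperparameter optimization is omitted, and the F1-score shown is for the offensive class only.

```python
import math
import re
from collections import Counter

# Hypothetical cleaning rules: the abstract does not list the exact steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_and_tokenize(text):
    """Lowercase, strip URLs, @mentions and non-letters, split on whitespace."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    return text.split()


def fit_idf(token_docs):
    """Smoothed IDF, log((1 + N) / (1 + df)) + 1, one common variant."""
    n_docs = len(token_docs)
    df = Counter()
    for tokens in token_docs:
        df.update(set(tokens))  # count each term once per document
    return {t: math.log((1 + n_docs) / (1 + d)) + 1 for t, d in df.items()}


def tfidf_vector(tokens, idf):
    """Raw term counts times IDF, L2-normalised; unseen terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    # Both vectors are L2-normalised, so the dot product is the cosine.
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(query_vec, train_vecs, train_labels, k=3):
    """Majority vote among the k most cosine-similar training tweets."""
    ranked = sorted(range(len(train_vecs)),
                    key=lambda i: cosine(query_vec, train_vecs[i]),
                    reverse=True)
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive (offensive) class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data, NOT the paper's Twitter dataset (1 = offensive).
train = [
    ("you are a stupid idiot", 1),
    ("shut up you worthless idiot", 1),
    ("what a stupid pathetic loser", 1),
    ("have a great day friends", 0),
    ("thanks for sharing this great article", 0),
    ("lovely weather today friends", 0),
]
test_set = [("you stupid loser", 1), ("great day today", 0)]

train_tokens = [clean_and_tokenize(text) for text, _ in train]
idf = fit_idf(train_tokens)
train_vecs = [tfidf_vector(toks, idf) for toks in train_tokens]
train_labels = [label for _, label in train]

y_true = [label for _, label in test_set]
y_pred = [knn_predict(tfidf_vector(clean_and_tokenize(text), idf),
                      train_vecs, train_labels, k=3)
          for text, _ in test_set]
print(y_pred, f1_score(y_true, y_pred))  # prints [1, 0] 1.0
```

In practice, swapping in the other classifiers (AdaBoost, SVM, CNN) and running a hyperparameter search would usually be done with an established library rather than by hand; the hand-rolled version only makes each stage of the pipeline explicit.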
Key Words: Offensive language detection, supervised learning, machine learning, text classification, social media, Twitter, TF-IDF, embeddings, AdaBoost, Support Vector Machines (SVM), Multi-Layer Perceptron (MLP), tokenization, preprocessing, hyperparameter optimization
Area: Other
DOI Member: 22.81.442