
Abusive Chat Detection

Using machine learning to detect hate speech, offensive messages, and profanity in the chat space.


Description

In this project, labeled tweets are examined to find associations between the words in a tweet and hateful or offensive language. The datasets are processed and vectorized to build machine learning models that predict whether a message contains hate speech or offensive language. These models, combined with a list of profane words, form a pipeline for screening messages for inappropriate language. The pipeline is then tested on a labeled random sample from a customer service chat dataset, and a use case for this service in the customer service chat domain is described.
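As a rough illustration of how the screening step combines a trained classifier with a profanity word list, the sketch below uses scikit-learn's TfidfVectorizer and LogisticRegression on a few toy messages. The tiny dataset, the placeholder profanity list, and the screen_message helper are assumptions for illustration only; the project itself builds its models with PyCaret on the labeled tweet data.

```python
import re

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Toy labeled examples standing in for the tweet dataset (1 = hate/offensive, 0 = clean)
texts = [
    "you are garbage and everyone hates you",
    "thanks for the quick help today",
    "get lost you idiot",
    "my order arrived on time, great service",
]
labels = [1, 0, 1, 0]

# Vectorize the text and fit a simple classifier
vectorizer = TfidfVectorizer(lowercase=True)
X = vectorizer.fit_transform(texts)
classifier = LogisticRegression()
classifier.fit(X, labels)

# Placeholder profanity list; the real pipeline uses a much longer list of profane words
PROFANE_WORDS = {"idiot", "garbage"}

def screen_message(message: str) -> bool:
    """Flag a message if the classifier predicts abuse or it contains a profane word."""
    model_flag = classifier.predict(vectorizer.transform([message]))[0] == 1
    word_flag = any(tok in PROFANE_WORDS for tok in re.findall(r"[a-z']+", message.lower()))
    return bool(model_flag or word_flag)

print(screen_message("you are an idiot"))      # flagged via the profanity list
print(screen_message("thanks for your help"))  # most likely passes with this toy data
```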

Data

Labeled Tweet Data
Bitext’s Customer Support Dataset

Contents

Data
Final Datasets - Contains final datasets used for modeling
Raw Data - Raw datasets straight from the source

Saved Models - Contains the hate speech model and offensive speech model saved using PyCaret (see the loading sketch after this list)

Exploratory Data Analysis.ipynb - Initial loading of data, word clouds
Predictive Modeling.ipynb - Data preprocessing, modeling prep, modeling, evaluation, use case illustration

Report.pdf - Comprehensive technical report on the project
Presentation Deck.pdf - Slide deck for the presentation video
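The saved classifiers should be reusable through PyCaret's load_model and predict_model, as in the hedged sketch below. The file names under Saved Models and the assumption that the input DataFrame already contains the vectorized feature columns used in training are guesses, not confirmed paths or schemas from the repository.

```python
import pandas as pd
from pycaret.classification import load_model, predict_model

# load_model takes the saved path without the ".pkl" extension; names are assumed
hate_model = load_model("Saved Models/hate_speech_model")
offensive_model = load_model("Saved Models/offensive_speech_model")

def score_messages(features: pd.DataFrame) -> pd.DataFrame:
    """Score already-vectorized messages with both models and return the combined output."""
    hate_preds = predict_model(hate_model, data=features)
    offensive_preds = predict_model(offensive_model, data=features)
    # predict_model appends prediction label/score columns; exact names vary by PyCaret version
    return pd.concat([hate_preds, offensive_preds], axis=1)
```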

Tools

Python, Jupyter Notebook, PyCaret

Authors

Samuel Sears @ssears219

Acknowledgments