
Chatbot-Intent-Architecture

Creating a chatbot intent architecture using clustering methods

Chatbot Intent Architecture

Creating an intent architecture using clustering


Description

Given a dataset of user utterances, how do we determine which intents, or classifications, a chatbot should be trained on? We could manually label each utterance with its intent, but that takes too much time. We could filter the utterances by keyword, but different words may mean the same thing, and the same words may mean different things. We could deploy intents iteratively, but the bot would have a high chance of mistaking utterances it was not trained on for ones it was. This project explores an alternative: cluster the entire dataset of user utterances by similarity and use the resulting clusters as the intents in the bot.
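To make the idea concrete, here is a minimal sketch of clustering utterances by similarity. It is not the code from the notebooks: it uses only the Python standard library, bag-of-words counts instead of Word2Vec embeddings, and a simple average-linkage merge loop. The function names, the sample utterances, and the `threshold` value are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def vectorize(utterance):
    """Turn an utterance into a sparse bag-of-words count vector."""
    return Counter(re.findall(r"[a-z']+", utterance.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_utterances(utterances, threshold=0.4):
    """Average-linkage agglomerative clustering (illustrative only).

    Repeatedly merges the two most similar clusters until no pair has
    an average cosine similarity of at least `threshold`.
    """
    vectors = [vectorize(u) for u in utterances]
    clusters = [[i] for i in range(len(utterances))]

    def linkage(c1, c2):
        # Mean pairwise similarity between members of the two clusters.
        sims = [cosine(vectors[i], vectors[j]) for i in c1 for j in c2]
        return sum(sims) / len(sims)

    while len(clusters) > 1:
        best_sim, best_pair = -1.0, None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                sim = linkage(clusters[x], clusters[y])
                if sim > best_sim:
                    best_sim, best_pair = sim, (x, y)
        if best_sim < threshold:
            break  # remaining clusters are too dissimilar to merge
        x, y = best_pair
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]

    return [[utterances[i] for i in sorted(c)] for c in clusters]


# Made-up sample utterances, not rows from the Bitext dataset.
utterances = [
    "cancel my order",
    "please cancel the order",
    "track my package",
    "where is my package",
]
intents = cluster_utterances(utterances)
for members in intents:
    print(members)
```

On this toy input the two cancellation utterances end up in one cluster and the two package-tracking utterances in another; each cluster would then be reviewed and named as a candidate intent. Word counts cannot tell that different words mean the same thing, which is the limitation that motivates word embeddings such as Word2Vec.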

Data

Bitext Free Dataset

bitext_free_dataset.csv - starting data from Bitext
Training Dataset.xlsx - data used for training the AWS Lex bot
Hierarchical Clustering and Cosine Similarity.ipynb - Initial attempt at clustering
Clustering with Word2Vec word embedding.ipynb - Final clustering method notebook
Chatbot Intent Architecture.docx - Writeup
Chatbot Intent Architecture.pptx - Presentation Video
Results.xlsx - Performance Metrics from testing

Tools

Python
Gensim
scikit-learn
SciPy
NLTK

Author

Samuel Sears @ssears219