Chatbot Intent Architecture
Creating an intent architecture using clustering
Description
Given a dataset of user utterances, how do we determine the intents (classifications) a chatbot should be trained on? We could manually label the utterances one by one with their respective intents, but that takes too much time. We could filter the utterances by keyword, but different words may mean the same thing, and the same words may mean different things. We could deploy intents iteratively, but then the bot would have a high chance of misclassifying utterances that belong to not-yet-trained intents as one of the trained ones. This project explores another solution: cluster the entire dataset of user utterances by similarity and use the resulting clusters as the bot's intents.
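The core idea of clustering utterances by similarity and treating each cluster as a candidate intent can be sketched as follows. This is a minimal sketch, not the project's notebook code: it assumes TF-IDF features, average-linkage hierarchical clustering on cosine distance, and a hand-picked cut-off of 0.8. The sample utterances and the threshold are hypothetical and are not taken from the Bitext data.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical sample utterances (not from the Bitext dataset).
utterances = [
    "I want to cancel my order",
    "please cancel my order",
    "how do I cancel the order I placed",
    "where is my refund",
    "I have not received my refund",
    "when will I get my refund",
]

# Represent each utterance as a TF-IDF vector (an assumed choice of features).
vectors = TfidfVectorizer().fit_transform(utterances).toarray()

# Agglomerative clustering with average linkage on cosine distance (1 - cosine similarity).
tree = linkage(vectors, method="average", metric="cosine")

# Cut the tree at a hypothetical distance threshold; each flat cluster is a candidate intent.
labels = fcluster(tree, t=0.8, criterion="distance")

# Group utterances by cluster label so each group can be reviewed and named as an intent.
intents = {}
for label, text in zip(labels, utterances):
    intents.setdefault(int(label), []).append(text)
for label, members in sorted(intents.items()):
    print(label, members)
```

Raising `t` merges utterances into fewer, broader intents; lowering it splits them into narrower ones, so the threshold is the main knob for the granularity of the resulting intent architecture.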
Data
Contents
- bitext_free_dataset.csv - starting data from Bitext
- Training Dataset.xlsx - data used for training the AWS Lex bot
- Hierarchical Clustering and Cosine Similarity.ipynb - Initial attempt at clustering
- Clustering with Word2Vec word embedding.ipynb - Final clustering method notebook
- Chatbot Intent Architecture.docx - Writeup
- Chatbot Intent Architecture.pptx - Presentation Video
- Results.xlsx - Performance Metrics from testing
Tools
- Python
- Gensim
- scikit-learn
- SciPy
- NLTK
Author
Samuel Sears @ssears219