This repo is an attempt to solve the Clubs history challenge!
This chatbot can hold a comprehensive conversation with the user while giving them information about the history of La Liga clubs.
The app is live and public but not yet verified by Facebook, which means that only admins and testers can use it for now. I am working on this.
To install the project dependencies, run the following command:
sudo -H pip install -r requirements.txt
- Run the Flask server:
python app.py
Remember to change the port in app.run(port=8888, debug=True) to your own port number.
- Run ngrok on the same port:
ngrok http 8888
- Optionally, open the ngrok web interface at
http://127.0.0.1:4042/inspect/http
to keep an eye on the GET and POST requests.
- Start testing the La Liga Chatbot. Join this page and start chatting! Hope you enjoy it!
- Crawl the teams' data by following the link to each team's Wikipedia page from the La Liga Wikipedia page.
- Write the intent examples under each of the following seven intents: great, laliga_questions, city_questions, thankyou, stadium, stadium_capacity and club_location.
  - great: the user is greeting the bot
  - laliga_questions: the user's questions about the history of La Liga clubs
  - city_questions: the user's questions about the city
  - thankyou: the user is thanking the bot
  - stadium: the user's questions about the name of a club's stadium
  - stadium_capacity: the user's questions about the capacity of a club's stadium
  - club_location: the user's questions about the location of a club's stadium

  Note: we could add as many intents as we want; this is something you decide based on the domain of your chatbot.
- Define the configuration of the Rasa NLU pipeline in laliga_intents_config.yml. I defined my NLU pipeline, which builds the intent classifier, as follows:
  - Message tokenization
  - POS tagging
  - GloVe vectors extracted for each token
  - Concatenate those vectors to form a feature vector for each sentence
  - Build a multiclass SVM model for intent classification
  - CRF trained on message tokens and POS tags for entity extraction
- Build the user intent classifier to classify the user message as one of the given intents, e.g. great, laliga_questions, city_questions and thankyou
  - using the following command:
python -m rasa_nlu.train -c laliga_intents_config.yml --data laliga_intents.md -o models --fixed_model_name nlu --project current --verbose
  - this command creates a directory ./models/current/nlu where the model and its metadata reside
  - then, using the rasa_nlu.model.Interpreter class, I get the intent of the message and its confidence score
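As a rough illustration of the last step, the sketch below loads the trained model and pulls the intent name and confidence out of a parse result. The helper names `load_interpreter` and `intent_and_confidence` are hypothetical and not taken from this repo, and the sample result is made up to show the expected shape of the dictionary.

```python
def load_interpreter(model_dir="./models/current/nlu"):
    """Load the trained Rasa NLU model from the directory created by training."""
    # Imported lazily so the helper below works without Rasa NLU installed.
    from rasa_nlu.model import Interpreter
    return Interpreter.load(model_dir)


def intent_and_confidence(parse_result):
    """Return (intent_name, confidence) from an Interpreter.parse() result dict."""
    intent = parse_result.get("intent") or {}
    return intent.get("name"), float(intent.get("confidence", 0.0))


# Made-up parse result, for illustration only.
example_result = {
    "text": "Thanks so much!",
    "intent": {"name": "thankyou", "confidence": 0.93},
    "entities": [],
}
name, confidence = intent_and_confidence(example_result)
```

With a trained model on disk, the same helper could be fed a real result, for example `intent_and_confidence(load_interpreter().parse(u"Thanks so much!"))`.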
- Using the spaCy multi-language model, I attempt to extract entities in the user message such as LOC, ORG and PER, which, alongside intent_comfidence and intent_type, help decide on the most suitable reply to return to the user when the intent is laliga_questions, of course.
  - Download the multi-language model using the following command:
python -m spacy download xx_ent_wiki_sm
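The entity extraction step might look roughly like the sketch below. The function names are hypothetical, and the stub document only mimics the `ents`, `text` and `label_` attributes of a spaCy `Doc` so the filtering logic can be tried without downloading the model.

```python
from types import SimpleNamespace


def load_ner_model():
    """Load the multi-language NER model downloaded with the command above."""
    import spacy  # imported lazily; requires spaCy and xx_ent_wiki_sm
    return spacy.load("xx_ent_wiki_sm")


def extract_entities(doc, wanted_labels=("LOC", "ORG", "PER")):
    """Return (text, label) pairs for entities whose label is of interest."""
    return [(ent.text, ent.label_) for ent in doc.ents if ent.label_ in wanted_labels]


# Stub standing in for a spaCy Doc; the entities are made up for illustration.
fake_doc = SimpleNamespace(ents=[
    SimpleNamespace(text="Barcelona", label_="ORG"),
    SimpleNamespace(text="1899", label_="MISC"),
])
entities = extract_entities(fake_doc)
```

With the model installed, a real call would be `extract_entities(load_ner_model()(u"Tell me about Barcelona"))`.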
- crawler.py: the crawler that I used to crawl the teams' data, including the team name, the link to the team's Wikipedia page, a summary of the team and the history of the team.
  - To build this crawler I used the following tools: wikipedia, which is a Python wrapper for the MediaWiki API, and BeautifulSoup.
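A minimal sketch of how one team record could be assembled with the wikipedia package is shown below. It is not the code in crawler.py: the function names, the "History" section title and the stub page are assumptions for illustration, and the BeautifulSoup link-following step is left out.

```python
from types import SimpleNamespace


def fetch_team_page(title):
    """Fetch a Wikipedia page by title using the wikipedia package."""
    import wikipedia  # imported lazily; install with `pip install wikipedia`
    return wikipedia.page(title)


def team_record(page, history_section="History"):
    """Collect the four fields listed above from a page object."""
    return {
        "name": page.title,
        "link": page.url,
        "summary": page.summary,
        # section() returns None when the page has no section with that title.
        "history": page.section(history_section),
    }


# Stub page with made-up content, so the sketch runs without network access.
stub_page = SimpleNamespace(
    title="Example FC",
    url="https://example.org/wiki/Example_FC",
    summary="A made-up club.",
    section=lambda name: "Founded in a made-up year." if name == "History" else None,
)
record = team_record(stub_page)
```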
- The intent examples are in laliga_intents.md, where each intent example is written as in the following example:
## intent:thankyou
- Thanks so much!
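To make the file layout concrete, here is a small sketch that groups example sentences by intent from text in this format. The parser is purely illustrative (Rasa NLU reads the file itself during training), and every sample sentence other than "Thanks so much!" is made up.

```python
import re
from collections import defaultdict

# Matches headers such as "## intent:thankyou".
INTENT_HEADER = re.compile(r"^##\s*intent:\s*(\S+)\s*$")


def parse_intent_examples(markdown_text):
    """Group example sentences by intent name."""
    examples = defaultdict(list)
    current_intent = None
    for raw_line in markdown_text.splitlines():
        line = raw_line.strip()
        header = INTENT_HEADER.match(line)
        if header:
            current_intent = header.group(1)
        elif line.startswith("##"):
            # Any other section header ends the current intent block.
            current_intent = None
        elif line.startswith("- ") and current_intent is not None:
            examples[current_intent].append(line[2:].strip())
    return dict(examples)


sample = """\
## intent:thankyou
- Thanks so much!
- Thank you

## intent:great
- Hello
"""
parsed = parse_intent_examples(sample)
```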
- core.py contains the core of the chatbot in get_bot_response(message), which returns the chatbot's reply after receiving the user message.
- app.py contains the code that connects the Facebook Messenger API, through ngrok and then the Flask server, to the La Liga chatbot, which sends its responses back to the user through the Pymessenger Python wrapper.
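The wiring described for app.py could be sketched as follows. This is an assumption about the general shape of a Messenger webhook, not a copy of app.py: `iter_text_messages`, `create_app`, `send_text` and `verify_token` are hypothetical names, and the sample payload is made up.

```python
def iter_text_messages(payload):
    """Yield (sender_id, text) for each text message in a Messenger webhook payload."""
    for entry in payload.get("entry", []):
        for event in entry.get("messaging", []):
            text = event.get("message", {}).get("text")
            if text:
                yield event["sender"]["id"], text


def create_app(get_bot_response, send_text, verify_token):
    """Build a Flask app that forwards user messages to the bot and sends replies."""
    from flask import Flask, request  # imported lazily; requires Flask

    app = Flask(__name__)

    @app.route("/", methods=["GET", "POST"])
    def webhook():
        if request.method == "GET":
            # Facebook's verification handshake: echo hub.challenge back.
            if request.args.get("hub.verify_token") == verify_token:
                return request.args.get("hub.challenge", "")
            return "Invalid verification token", 403
        payload = request.get_json(silent=True) or {}
        for sender_id, text in iter_text_messages(payload):
            send_text(sender_id, get_bot_response(text))
        return "ok"

    return app


# Made-up payload: one text message and one event without text.
sample_payload = {
    "entry": [{
        "messaging": [
            {"sender": {"id": "123"}, "message": {"text": "Hi"}},
            {"sender": {"id": "123"}, "delivery": {"watermark": 1}},
        ]
    }]
}
messages = list(iter_text_messages(sample_payload))
```

Here `send_text` could be, for example, the `send_text_message` method of a Pymessenger `Bot` created with your page access token, and `get_bot_response` the function from core.py.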
- The chatbot is not resilient to spelling mistakes.
- I am planning to implement a different way to decide on the most suitable reply, one that compares the word vectors of the club names with the entities extracted from the user's message or question. Unfortunately, due to memory limitations, I was not able to go all the way with this approach.
- I am also planning to build an end-to-end deep learning system that should be able to hold a more open conversation about the La Liga clubs, including questions such as:
What team is [player] playing for?
List of players of [team]
Who is the coach of [team]?
What team is coached by [coach]?
Get fixtures of [team]
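The word-vector idea mentioned in the plans above could be sketched with cosine similarity, as below. This only illustrates the comparison step: the three-dimensional vectors and club keys are toy values, and real word vectors would come from an embedding model, which this sketch does not load.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def closest_club(entity_vector, club_vectors):
    """Return the club name whose vector is most similar to the entity vector."""
    return max(
        club_vectors,
        key=lambda name: cosine_similarity(entity_vector, club_vectors[name]),
    )


# Toy vectors, for illustration only.
toy_club_vectors = {
    "club_a": [1.0, 0.0, 0.0],
    "club_b": [0.0, 1.0, 0.0],
}
best = closest_club([0.9, 0.1, 0.0], toy_club_vectors)
```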