Added Sample Demo to README.md
AtharvaUmbarkar committed Jul 19, 2020
1 parent 6689a70 commit 047ae33
Showing 2 changed files with 25 additions and 23 deletions.
README.md: 48 changes (25 additions, 23 deletions)
@@ -18,7 +18,7 @@ Group :

## Project Brief

The main aim of this project was to build a conversational bot that takes audio input and outputs a meaningful reply.

The three main parts of this project were:

@@ -40,15 +40,15 @@ This model is implemented to convert the audio messages of the user into text.

This model implements the response-generation part of the conversational bot.

## ENCODER

![encoder](https://user-images.githubusercontent.com/56124350/85904325-0c215300-b826-11ea-9312-e8ccd9cb2ce1.png)

## DECODER

![decoder](https://user-images.githubusercontent.com/56124350/85904328-0e83ad00-b826-11ea-9f48-179de5c00319.png)
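
The encoder and decoder diagrams are images only. Below is a rough illustration of what happens at each decoding step of an attention-based encoder-decoder. It is a minimal pure-Python sketch of dot-product (Luong-style) attention, written under assumptions. The repository's TensorFlow model in `tf_attention_model.py` may use a different scoring function, such as additive (Bahdanau) attention. The helper names `softmax` and `dot_attention` are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot_attention(decoder_state: Vector, encoder_states: List[Vector]) -> Tuple[Vector, Vector]:
    """One dot-product attention step (illustrative sketch, not the repo's code).

    Returns (weights, context): weights[i] is how strongly the decoder attends
    to encoder_states[i]; context is the weighted sum of the encoder states.
    """
    # Score each encoder state by its dot product with the current decoder state.
    scores = [sum(d * h for d, h in zip(decoder_state, enc)) for enc in encoder_states]
    weights = softmax(scores)
    # Context vector: attention-weighted sum of the encoder states.
    dim = len(encoder_states[0])
    context = [sum(w * enc[j] for w, enc in zip(weights, encoder_states)) for j in range(dim)]
    return weights, context


weights, context = dot_attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print([round(w, 4) for w in weights])  # [0.7311, 0.2689]
print([round(c, 4) for c in context])  # [0.7311, 0.2689]
```

In a real decoder, the context vector is combined with the decoder state to predict the next word. The decoder then repeats this step for every output token.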

## LDA MODEL

This model adds topic awareness to the encoder-decoder model for better response generation.

@@ -60,39 +60,41 @@ This graph shows the optimal number of topics we need to set for news articles d

## Gensim LDA Model parameters

- **corpus**: Stream of document vectors or sparse matrix of shape (num_terms, num_documents).
- **id2word**: Mapping from word IDs to words. It is used to determine the vocabulary size, as well as for debugging and topic printing.
- **num_topics**: Number of latent topics to extract from the training corpus.
- **random_state**: Either a `numpy.random.RandomState` object or a seed to generate one. Useful for reproducibility.
- **update_every**: Number of documents to be iterated through for each update. Set to 0 for batch learning, > 1 for online iterative learning.
- **chunksize**: Number of documents to be used in each training chunk.
- **passes**: Number of passes through the corpus during training.
- **alpha**: A-priori belief on the document-topic distribution; `'auto'` learns an asymmetric prior from the corpus.
- **per_word_topics**: If True, the model also computes, for each word, a list of topics sorted in descending order of likelihood, along with their phi values multiplied by the feature length (i.e. word count).
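
The sketch below makes the parameter list more concrete. It assumes the input is already tokenized into lists of words. `build_corpus` and `train_lda` are hypothetical helpers, and the parameter values in `train_lda` are illustrative rather than this project's settings. `build_corpus` produces `id2word` and a bag-of-words `corpus` of `(word_id, count)` pairs, the format gensim's `LdaModel` accepts. In practice, `gensim.corpora.Dictionary` and its `doc2bow` method would normally do this step.

```python
from collections import Counter
from typing import Dict, List, Tuple

BowDoc = List[Tuple[int, int]]


def build_corpus(docs: List[List[str]]) -> Tuple[Dict[int, str], List[BowDoc]]:
    """Build an id2word mapping and a bag-of-words corpus.

    Each document becomes a list of (word_id, count) pairs sorted by id,
    the same shape that gensim's Dictionary.doc2bow produces.
    """
    word2id: Dict[str, int] = {}
    for doc in docs:
        for word in doc:
            word2id.setdefault(word, len(word2id))  # assign ids in first-seen order
    id2word = {i: w for w, i in word2id.items()}
    corpus = [sorted(Counter(word2id[w] for w in doc).items()) for doc in docs]
    return id2word, corpus


def train_lda(corpus: List[BowDoc], id2word: Dict[int, str], num_topics: int = 10):
    """Train a gensim LdaModel with the parameters listed above.

    The values are illustrative assumptions, not the project's settings.
    gensim is imported lazily so the rest of this sketch runs without it.
    """
    from gensim.models import LdaModel

    return LdaModel(
        corpus=corpus,
        id2word=id2word,
        num_topics=num_topics,
        random_state=100,      # fixed seed for reproducibility
        update_every=1,        # online learning: update the model after every chunk
        chunksize=100,         # documents per training chunk
        passes=10,             # full passes over the corpus
        alpha="auto",          # learn an asymmetric prior from the data
        per_word_topics=True,  # also compute per-word topic assignments
    )


docs = [["bot", "reply", "audio"], ["audio", "audio", "text"]]
id2word, corpus = build_corpus(docs)
print(id2word)  # {0: 'bot', 1: 'reply', 2: 'audio', 3: 'text'}
print(corpus)   # [[(0, 1), (1, 1), (2, 1)], [(2, 2), (3, 1)]]
```

Only `build_corpus` runs here. Calling `train_lda(corpus, id2word)` requires gensim to be installed.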

## About pyLDAvis

- The size of a bubble shows how prevalent that topic is across all the documents (our corpus).
- The words on the right are the keywords driving the selected topic.
- The closer two bubbles are, the more similar their topics; the farther apart, the less similar.
- Ideally, the bubbles are spread across the chart with as little overlap as possible.
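
Two quantities make the bubble chart more precise. Bubble size is roughly a topic's share of all tokens in the corpus. Bubble distance reflects how different two topics' word distributions are. The sketch below assumes the model's document-topic and topic-word distributions are available as plain lists. We believe pyLDAvis measures topic difference with Jensen-Shannon divergence by default and then projects it to 2-D, but treat that as an assumption to check against the pyLDAvis documentation. `topic_prevalence` and `jensen_shannon` are hypothetical helpers, not part of pyLDAvis.

```python
import math
from typing import List

Vector = List[float]


def topic_prevalence(doc_topic: List[Vector], doc_lengths: List[int]) -> Vector:
    """Share of all tokens attributed to each topic (roughly, bubble size).

    Each document's topic distribution is weighted by its length in tokens.
    """
    num_topics = len(doc_topic[0])
    totals = [sum(theta[k] * n for theta, n in zip(doc_topic, doc_lengths)) for k in range(num_topics)]
    grand_total = sum(totals)
    return [t / grand_total for t in totals]


def jensen_shannon(p: Vector, q: Vector) -> float:
    """Jensen-Shannon divergence (natural log) between two topic-word distributions.

    0 means identical topics; ln(2) is the maximum, reached when the topics share no words.
    """
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a: Vector, b: Vector) -> float:
        # Terms with a_i == 0 contribute nothing to KL(a || b); b_i > 0 whenever a_i > 0.
        return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


print(topic_prevalence([[0.5, 0.5], [1.0, 0.0]], [2, 2]))  # [0.75, 0.25]
print(jensen_shannon([0.5, 0.5], [0.5, 0.5]))              # 0.0
print(round(jensen_shannon([1.0, 0.0], [0.0, 1.0]), 4))    # 0.6931
```

Under this reading, well-separated bubbles correspond to topic pairs with a high divergence.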

## Text to Audio

[gTTS](https://pypi.org/project/gTTS/), a Python library, was used to write a function that converts the generated responses to audio.
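
Below is a minimal sketch of such a function. It assumes gTTS's `gTTS(text=..., lang=..., slow=...)` constructor and its `save()` method. `speak` and the default file name `reply.mp3` are hypothetical and not taken from this repository.

```python
def speak(text: str, path: str = "reply.mp3", lang: str = "en", slow: bool = False) -> str:
    """Convert a generated reply to speech with gTTS and save it as an MP3.

    gTTS calls Google's text-to-speech service, so it needs network access.
    It is imported lazily so this function can be defined without gTTS installed.
    """
    text = text.strip()
    if not text:
        raise ValueError("nothing to speak: the reply is empty")
    from gtts import gTTS  # pip install gTTS

    gTTS(text=text, lang=lang, slow=slow).save(path)
    return path
```

For example, `speak("Hello! How can I help you?")` would write `reply.mp3`, which any audio player can play. The project's own playback step may differ.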


---

## Installation and usage

The requirements are listed in `requirements.txt`. Install them and download the spaCy English model:
- `pip install -r requirements.txt`
- `python3 -m spacy download en`

The main file to focus on is `tf_attention_model.py`.

To run the bot:
`python3 tf_attention_model.py -d False`

## Demo

![Sample Demo](sample2.png "Sample")
Binary file added sample2.png
