Integrating NewsAPI to fetch Real-time, Financial, and latest data using Semantic search with SentenceTransformers #109

Open: wants to merge 6 commits into main
31 changes: 27 additions & 4 deletions PULL_REQUEST_TEMPLATE.md
@@ -1,20 +1,43 @@

## Scope
`[Add context about what this feature is about and explain why of the feature and your technical decisions.]`
Integrating NewsAPI to fetch Real-time, Financial, and latest data using Semantic search with SentenceTransformers


This pull request makes the bot more useful: it can now answer questions about real-time data, financial data, and news published after 2019, which the generic OpenAI bot cannot answer on its own.

Strategy Used:

1. I integrated the third-party NewsAPI to fetch the latest news articles as JSON.
2. After fetching the articles, I used semantic search with SentenceTransformers to find the documents most similar (most relevant) to the user's query.
3. I then ranked the documents by cosine similarity score and kept the top k most relevant ones.
4. Whenever users ask for real-time data, financial data, or the latest news that the OpenAI model cannot answer, they see the titles and URLs of those documents.
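Steps 2 and 3 boil down to scoring every article against the query and keeping the k best. The sketch below shows that ranking with plain Python lists standing in for embeddings; in this PR the vectors come from `SentenceTransformer.encode` and the scoring uses `util.cos_sim` and `torch.topk`. The helper names `cosine_similarity` and `top_k_documents` are hypothetical and only illustrate the idea.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_documents(
    query_vec: Sequence[float],
    doc_vecs: List[Sequence[float]],
    k: int = 10,
) -> List[Tuple[int, float]]:
    """Return (document index, score) pairs for the k most similar documents."""
    scores = [(i, cosine_similarity(query_vec, d)) for i, d in enumerate(doc_vecs)]
    # Highest similarity first; sort() is stable, so ties keep their original order.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    # Like min(10, len(content_list)) in the PR: never ask for more than exist.
    return scores[: min(k, len(scores))]


if __name__ == "__main__":
    query = [1.0, 0.0]
    docs = [[0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
    print([i for i, _ in top_k_documents(query, docs, k=2)])  # [2, 1]
```

Only the direction of a vector matters for cosine similarity, which is why `[2.0, 0.0]` ranks first with a score of 1.0.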

Libraries Installed:
1. Installed the SentenceTransformers library for semantic search.
2. Integrated NewsAPI using a unique API key, which can be obtained for free by signing up on the NewsAPI website.
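As a rough sketch of the NewsAPI side, the following assumes the `/v2/everything` endpoint with `q` and `apiKey` query parameters (as used in this PR) and a response whose `articles` entries carry `title`, `description`, `url`, and `content` fields, any of which may be null. It percent-encodes the whole query with `urllib.parse.urlencode` rather than only replacing spaces, reads the key from a hypothetical `NEWSAPI_KEY` environment variable so it never lands in source control, and skips incomplete articles. `build_everything_url` and `extract_articles` are illustrative names, not part of NewsAPI or Textbase.

```python
import os
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode

NEWSAPI_EVERYTHING_URL = "https://newsapi.org/v2/everything"


def build_everything_url(query: str, api_key: Optional[str] = None) -> str:
    """Build a /v2/everything URL with the query safely percent-encoded."""
    # NEWSAPI_KEY is a hypothetical environment variable name.
    key = api_key if api_key is not None else os.environ.get("NEWSAPI_KEY", "")
    return f"{NEWSAPI_EVERYTHING_URL}?{urlencode({'q': query, 'apiKey': key})}"


def extract_articles(payload: Dict[str, Any], max_chars: int = 200) -> List[Dict[str, str]]:
    """Keep articles that have a title and description; truncate their text."""
    results = []
    # Error responses have no "articles" key, so default to an empty list.
    for article in payload.get("articles") or []:
        title = article.get("title")
        description = article.get("description")
        if not title or not description:
            continue  # skip instead of padding the lists with empty strings
        # "content" can be null, so fall back to the description.
        text = article.get("content") or description
        results.append({"title": title, "url": article.get("url") or "", "text": text[:max_chars]})
    return results
```

Skipping incomplete articles, instead of appending empty placeholders, keeps blank entries from being embedded and ranked alongside real ones.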



- [ ] `[Sub task]`


### Screenshots
---
![textbase_screenshot1_latestnews](https://github.com/Jeevana38/textbase/assets/77659039/6ffa52a4-ce63-4c22-9eb3-10f34fb93aba)
![textbase_screenshot2_financialnews](https://github.com/Jeevana38/textbase/assets/77659039/0cfa7620-2d2b-46b7-a809-3dc3b7332b40)
![textbase_screenshot3_realtimeweather_1](https://github.com/Jeevana38/textbase/assets/77659039/111ca022-763a-4bb4-b92c-ffb8a636dd5b)
![textbase_screenshot4_realtime_2](https://github.com/Jeevana38/textbase/assets/77659039/7a116ebd-afdb-4804-9672-5ac5819e4a7d)

The screenshots are also attached in the `examples/openai-bot/screenshots` folder.


## Code improvements
- `[Did you add some generic like utility, component, or anything else useful outside of this PR]`
- Added the ability for the bot to answer real-time queries, financial-data questions, and requests for the latest news, through the NewsAPI integration combined with SentenceTransformers semantic search.


### Developer checklist
- [ ] I’ve manually tested that code works locally on desktop and mobile browsers.
- [ ] I’ve reviewed my code.
- [ ] I’ve removed all my personal credentials (API keys etc.) from the code.
65 changes: 62 additions & 3 deletions examples/openai-bot/main.py
@@ -1,26 +1,85 @@
```python
from typing import List

import requests
import torch
from sentence_transformers import SentenceTransformer, util
from textbase import bot, Message
from textbase.models import OpenAI

# Load your OpenAI API key
OpenAI.api_key = "<ENTER YOUR OPENAI API KEY>"

# NewsAPI key (unique per account); keep real keys out of the repository
NEWS_API_KEY = "<ENTER YOUR NEWS API KEY>"

# Prompt for GPT-3.5 Turbo
SYSTEM_PROMPT = """You are chatting with an AI. There are no specific prefixes for responses, so you can ask or talk about anything you like.
The AI will respond in a natural, conversational manner.
You can also ask about real-time topics, such as "Today's weather in India", and the AI will give you a list of relevant articles.
Start the conversation with any question or topic, and let's have a
pleasant chat!
"""

# Load the embedding model once instead of on every request
embedder = SentenceTransformer('bert-base-nli-mean-tokens')


def fetchApiData(query):
    """Return "title url" strings for the NewsAPI articles most relevant to `query`.

    Relevance is the cosine similarity between the query embedding and each
    article's content embedding: the more similar the article, the higher its score.
    """
    # Let requests percent-encode the query instead of replacing spaces by hand
    res = requests.get(
        'https://newsapi.org/v2/everything',
        params={'q': query, 'apiKey': NEWS_API_KEY},
        timeout=10,
    )
    articles = res.json().get('articles') or []  # error responses have no 'articles'

    content_list = []
    url_list = []
    title_list = []
    for article in articles:
        # Skip articles whose title or description is null
        if article['title'] and article['description']:
            url_list.append(article['url'])
            title_list.append(article['title'])
            # 'content' can be null, so fall back to the description
            content_list.append((article['content'] or article['description'])[:200])

    if not content_list:
        return []

    content_list_embeddings = embedder.encode(content_list, convert_to_tensor=True)  # article embeddings
    query_embedding = embedder.encode(query, convert_to_tensor=True)  # user query embedding
    cos_scores = util.cos_sim(query_embedding, content_list_embeddings)[0]  # query vs. each article
    top_k = min(10, len(content_list))  # top k relevant documents, if that many exist
    top_results = torch.topk(cos_scores, k=top_k)

    final_results = []
    for idx in top_results.indices:
        i = int(idx)
        final_results.append(title_list[i] + " " + url_list[i])  # most relevant titles and URLs
    return final_results


@bot()
def on_message(message_history: List[Message], state: dict = None):

    # Generate GPT-3.5 Turbo response
    bot_response = OpenAI.generate(
        system_prompt=SYSTEM_PROMPT,
        message_history=message_history,  # Assuming history is the list of user messages
        model="gpt-3.5-turbo",
    )

    # Words indicating that the user is asking for real-time data
    real_time_words = ["today", "yesterday", "tomorrow", "day before yesterday", "day after tomorrow",
                       "today's", "yesterday's", "current", "last month", "this", "live", "now",
                       "at this moment", "present", "up-to-date", "past", "future", "next year",
                       "next month", "last week", "last year", "next"]

    # Phrases the model uses when it cannot answer from its training data
    fallback_phrases = ["knowledge cutoff", "sorry", "real-time information",
                        "I'm sure there will be more information", "As of my"]
    notfound = any(phrase in bot_response for phrase in fallback_phrases)

    query = str(message_history[-1]["content"][0]["value"]).lower()

    # Real-time, financial, or recent-news question: answer with NewsAPI articles instead
    if notfound or any(word in query for word in real_time_words):
        final_results = fetchApiData(query.strip())
        if final_results:
            bot_response = "Here are the top related articles:\n\n" + "\n\n".join(final_results)

    response = {
        "data": {
            "messages": [
                # ... (remainder of the file is unchanged and collapsed in this diff)
```
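The keyword check in `on_message` uses plain substring tests (`word in query`), so "now" also matches "know" and "past" matches "pasta". A minimal sketch of a stricter check, assuming word-boundary matching is what is wanted, compiles the keyword list from this PR into one case-insensitive regular expression and combines it with the fallback phrases. `needs_live_data` and `REAL_TIME_PATTERN` are hypothetical names, and lowercasing the bot reply before matching is a deliberate change from the case-sensitive check in the PR.

```python
import re
from typing import Iterable

# Keyword list from the PR; multi-word phrases are matched as whole phrases.
REAL_TIME_WORDS = [
    "today", "yesterday", "tomorrow", "day before yesterday", "day after tomorrow",
    "today's", "yesterday's", "current", "last month", "this", "live", "now",
    "at this moment", "present", "up-to-date", "past", "future", "next year",
    "next month", "last week", "last year", "next",
]

# Fallback phrases from the PR that signal the model could not answer.
FALLBACK_PHRASES = [
    "knowledge cutoff", "sorry", "real-time information",
    "I'm sure there will be more information", "As of my",
]


def _keyword_pattern(words: Iterable[str]):
    """Compile words into one regex; \\b on both sides stops 'now' matching 'know'."""
    # Longest alternatives first so multi-word phrases are tried before their prefixes.
    alternatives = "|".join(re.escape(w) for w in sorted(words, key=len, reverse=True))
    return re.compile(rf"\b(?:{alternatives})\b", re.IGNORECASE)


REAL_TIME_PATTERN = _keyword_pattern(REAL_TIME_WORDS)


def needs_live_data(user_text: str, bot_reply: str) -> bool:
    """True if the user asks about time-sensitive topics or the model declined."""
    reply = bot_reply.lower()
    declined = any(phrase.lower() in reply for phrase in FALLBACK_PHRASES)
    return declined or REAL_TIME_PATTERN.search(user_text) is not None
```

Compiling the pattern once at import time keeps the per-message cost to a single regex search.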
18 changes: 18 additions & 0 deletions examples/openai-bot/readme.txt
@@ -0,0 +1,18 @@
Integrating NewsAPI to fetch Real-time, Financial and latest data using Semantic search with SentenceTransformers


This pull request makes the bot more useful: it can now answer questions about real-time data, financial data, and news published after 2019, which the generic OpenAI bot cannot answer on its own.

Strategy Used:

1. I integrated the third-party NewsAPI to fetch the latest news articles as JSON.
2. After fetching the articles, I used semantic search with SentenceTransformers to find the documents most similar (most relevant) to the user's query.
3. I then ranked the documents by cosine similarity score and kept the top k most relevant ones.
4. Whenever users ask for real-time data, financial data, or the latest news that the OpenAI model cannot answer, they see the titles and URLs of those documents.

Libraries Installed:
1. Installed the SentenceTransformers library for semantic search.
2. Integrated NewsAPI using a unique API key, which can be obtained for free by signing up on the NewsAPI website.