-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathapp.py
153 lines (106 loc) · 8.16 KB
/
app.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
# import functions
from functions import read_sql_query, llm_response, apply_model
# import libraries
from langchain_core.prompts import PromptTemplate
import streamlit as st
import os
# ---------------
# LangSmith Setup
# ---------------
os.environ['LANGCHAIN_TRACING_V2']="true"
os.environ['LANGCHAIN_ENDPOINT']="https://api.smith.langchain.com"
os.environ['LANGCHAIN_API_KEY']="ls__8ba8cebf388a4bbca87ffdd714f18150"
os.environ['LANGCHAIN_PROJECT']="OpenFoodTox AI Analyzer"
# ---------------
# Defining the Prompt
# ---------------
template = """
Context: You are an expert in converting English questions to SQL lite queries!
The SQL database has the name OpenFoodTox_TEST.db and has the following tables:
- Substance_Characterization with the following columns: Substance, has, Component, CASNumber, ECRefNo, MolecularFormula, and smiles.
- Genotoxicity with the following columns: Substance, Author, Year, OutputID, Genotoxicity.
- EFSAOUTPUTS with the following columns: Substance, OutputID, LegalBasis, Panel, Published, Title, OutputType, DOI, URL.
- Reference_Values with the following columns: Substance, Author, Year, OutputID, Assessment, qualfier, value, unit, Population.
- Reference_Points with the following columns: Substance, Author, Year, OutputID, Study, TestType, Species, Route, DurationDays, Endpoint, qualifier, value, unit, Effect, Toxicity
For example:
Example 1 - How many substances are available within OpenFoodTox?
The SQL command will be something like this: SELECT COUNT(DISTINCT Substance) FROM;
Example 2 - What is the CAS number of the Substance trans-3-Hexenyl hexanoate?
The SQL command will be something like this: SELECT CASNumber FROM Substance_Characterization WHERE Substance = 'trans-3-Hexenyl hexanoate';
Example 3 - How many substances are not genotoxic?
The SQL command will be something like this: SELECT COUNT(DISTINCT Substance) FROM Genotoxicity WHERE Genotoxicity = 'Negative';
Example 4 - How many substances are genotoxic?
The SQL command will be something like this: SELECT COUNT(DISTINCT Substance) FROM Genotoxicity WHERE Genotoxicity = 'Positive';
Example 5 - Is the following substance'(+)-Lupanine' genotoxic?
The SQL command will be something like this: SELECT DISTINCT Genotoxicity FROM Genotoxicity WHERE Substance = '(+)-Lupanine';
Example 6 - How many Efsa opinions are available in OpenFoodTox?
The SQL command will be something like this: SELECT COUNT(DISTINCT OutputID) FROM EFSAOUTPUTS WHERE OutputType = 'EFSA opinion';
Example 7 - How many Efsa opinions are available in OpenFoodTox for the following substance: '(-)-Alpha-elemol'?
The SQL command will be something like this: SELECT COUNT(DISTINCT OutputID) FROM EFSAOUTPUTS WHERE OutputType = 'EFSA opinion' AND Substance = '(-)-Alpha-elemol';
Example 8 - Tell me how many EFSA statements are available in OpenFoodTox?
The SQL command will be something like this: SELECT COUNT(DISTINCT OutputID) FROM EFSAOUTPUTS WHERE OutputType = 'EFSA statement';
Example 9 - What is the average reference value by each population?
The SQL command will be something like this: SELECT DISTINCT Population, AVG(value) as avg, unit FROM Reference_Values GROUP BY Population ORDER BY avg DESC;
Example 10 - What is the average reference value for the following susbtance: '(R)-(-)-Lavandulol'?
The SQL command will be something like this: SELECT AVG(value) as avg, unit FROM Reference_Values WHERE Substance='(R)-(-)-Lavandulol';
Example 11 - What is the average reference value by assessment?
The SQL command will be something like this: SELECT DISTINCT Assessment, AVG(value) as avg, unit FROM Reference_Values GROUP BY Assessment ORDER BY avg DESC;
Example 11 - Give me the average reference value by assessment for the following substance: '(Z)-Nerol'?
The SQL command will be something like this: SELECT DISTINCT Assessment, AVG(value) as avg, unit FROM Reference_Values WHERE Substance='(Z)-Nerol' GROUP BY Assessment ORDER BY avg DESC;
Example 12 - What are the reported effects for the following susbstance: '(2E)-Methylcrotonic acid' ?
The SQL command will be something like this:SELECT DISTINCT Effect FROM REFERENCE_POINTS WHERE SUBSTANCE='(2E)-Methylcrotonic acid';
Example 13 - Tell me the average endpoint values for the following substance: '(RS)-2,4-dinitro-6-(octan-2-yl)phenyl (2 E/Z)-but-2-enoate'?
The SQL command will be something like this: SELECT Substance, Endpoint, AVG(value) AS avg, unit FROM REFERENCE_POINTS WHERE SUBSTANCE=''(RS)-2,4-dinitro-6-(octan-2-yl)phenyl (2 E/Z)-but-2-enoate' GROUP BY Endpoint ORDER BY avg DESC;
Example 14 - What is the average endpoint value by each study?
The SQL command will be something like this: SELECT Study, Endpoint, AVG(value) AS avg, unit FROM REFERENCE_POINTS GROUP BY Endpoint ORDER BY avg DESC;
Example 15 - What is the average NOEL value for the following substance: '1,1-bis(Ethylthio)-ethane'?
The SQL command will be something like this: SELECT AVG(value), unit FROM REFERENCE_POINTS WHERE Study='NOEL';
Example 16 - What is the average NOAEL value for rats?
The SQL command will be something like this: SELECT AVG(value), unit FROM REFERENCE_POINTS WHERE Study='NOAEL' AND Species='Rat';
Example 18 - What is the average LD50 value for the following study: 'Human health'?
The SQL command will be something like this: SELECT AVG(value), unit FROM REFERENCE_POINTS WHERE Study='LD50' AND Study='Human health';
Example 19 - How many studies are available for 'Human health'?
The SQL command will be something like this:SELECT COUNT (DISTINCT OutputID) FROM REFERENCE_POINTS WHERE Study = 'Human health';
Example 20 - How many EFSA outputs are available for the following study: 'Human health'?
The SQL command will be something like this:SELECT COUNT (DISTINCT OutputID) FROM REFERENCE_POINTS WHERE Study = 'Human health';
Example 21 - Within OpenFoodTox how many outputs are available for 'Animal (target species) health'?
The SQL command will be something like this:SELECT COUNT (DISTINCT OutputID) FROM REFERENCE_POINTS WHERE Study = 'Animal (target species) health';
Example 21 - Within OpenFoodTox how many outputs are available for each study?
The SQL command will be something like this:SELECT Study ,COUNT (DISTINCT OutputID) AS count FROM REFERENCE_POINTS GROUP BY Study ORDER BY count DESC;
Also, the SQL command should not have ' and the () at the beginning or at the end of the SQL word in the output.Your output should be just the SQL command.
For example:
If I ask you 'What is the CAS number of the Substance trans-3-Hexenyl hexanoate?' Your output must be just the SQL command, like this: SELECT CASNumber FROM Substance_Characterization WHERE Substance = 'trans-3-Hexenyl hexanoate'.
Don't Say anything else!!!!!
Question: {query}
SQL QUERY: """
prompt = PromptTemplate(input_variables=['query'],
template = template)
# ---------------
# Streamlit APP
# ---------------
page_title="I can help you to analyse the OpenFoodTox database"
layout = 'wide'
st.set_page_config(page_title=page_title,layout=layout)
st.header("OpenFoodTox AI Analyzer")
st.write("Hello I'm your AI assistant and I'm here to help you to gain insights about the OpenFoodTox database")
st.image(image='openfoodtox_update.png', width=600)
with st.sidebar:
st.image(image='EFSA_horizon_RGB_EN [email protected]')
st.header('*What Is the OpenFoodTox AI Analyzer?*')
st.write('''OpenFoodTox is your go-to resource for understanding chemical hazards in food. It's an open-source database filled with toxicological information, accessible to everyone.
And now, with the OpenFoodTox AI Analyzer, you have an easy-to-use tool at your fingertips. It's perfect for risk assessors, managers, and anyone else involved in food safety.
Here's what it offers:
* Instant access to toxicological data through prompt responses.
* Quick identification of chemical hazards in food products.''')
st.divider()
st.caption("Author: Rafael Vieira; [email protected]")
question = st.text_input("Question", key='input')
submit = st.button('Ask the question')
# if submit is clicked
if submit:
model_name = "google/gemma-7b"
data = apply_model(model_name=model_name,question=question, prompt=prompt)
st.subheader('Answer:')
for row in data:
print(row)
st.header(row)