[ad_1]
Introduction
In the world of data analysis, extracting useful information from spreadsheet data can be a daunting task. Conventional approaches typically require manual data exploration and analysis, which can require significant effort, time, or manpower.
The emergence of advanced language models such as ChatGPT has introduced a promising and innovative approach to extracting useful information from tabular data. OpenAI has announced an official API for ChatGPT featuring gpt-3.5-turbo, OpenAI’s most advanced language model.
This blog describes the process of extracting useful information from table data using the ChatGPT API.
data set
For all the illustrations in this post, we’ll use the following data. This data is taken from: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams
Note that we only considered 30 records from the data set.
Please see the data provided below that will be used for the purposes of this blog.
gender | race | Education at the level of parents | dinner | Test preparation course | mathematics | reading score | writing score |
woman | Group B | Bachelor’s degree | standard | no one | 72 | 72 | 74 |
woman | Group C | Some college | standard | it’s over | 69 | 90 | 88 |
woman | Group B | Master | standard | no one | 92 | 95 | 93 |
male | Group A | Associate | free/reduced | no one | 47 | 57 | 44 |
male | Group C | Some college | standard | no one | 76 | 78 | 75 |
woman | Group B | Associate | standard | no one | 71 | 83 | 78 |
woman | Group B | Some college | standard | it’s over | 88 | 95 | 92 |
male | Group B | Some college | free/reduced | no one | 40 | 43 | 39 |
male | Group D | high school | free/reduced | it’s over | 64 | 64 | 67 |
woman | Group B | high school | free/reduced | no one | 38 | 60 | 50 |
male | Group C | Associate | standard | no one | 58 | 54 | 52 |
male | Group D | Associate | standard | no one | 40 | 52 | 43 |
woman | Group B | high school | standard | no one | 65 | 81 | 73 |
male | Group A | Some college | standard | it’s over | 78 | 72 | 70 |
woman | Group A | Master | standard | no one | 50 | 53 | 58 |
woman | Group C | Some high school | standard | no one | 69 | 75 | 78 |
male | Group C | high school | standard | no one | 88 | 89 | 86 |
woman | Group B | Some high school | free/reduced | no one | 18 | 32 | 28 |
male | Group C | Master | free/reduced | it’s over | 46 | 42 | 46 |
woman | Group C | Associate | free/reduced | no one | 54 | 58 | 61 |
male | Group D | high school | standard | no one | 66 | 69 | 63 |
woman | Group B | Some college | free/reduced | it’s over | 65 | 75 | 70 |
male | Group D | Some college | standard | no one | 44 | 54 | 53 |
woman | Group C | Some high school | standard | no one | 69 | 73 | 73 |
male | Group D | Bachelor’s degree | free/reduced | it’s over | 74 | 71 | 80 |
male | group A |
Master | free/reduced | no one | 73 | 74 | 72 |
male | Group B | Some college | standard | no one | 69 | 54 | 55 |
woman | Group C | Bachelor’s degree | standard | no one | 67 | 69 | 75 |
male | Group C | high school | standard | no one | 70 | 70 | 65 |
ChatGPT relies solely on Natural Language Processing (NLP) techniques to understand and extract information from spreadsheet data. It can analyze the text-based input provided by the user, interpret the request and generate a response based on the content of the tabular data.
Here is an example of how you can use the ChatGPT API to extract information from table data:
Step 1: Prepare the input
We saved our table data in a CSV file, you can read the CSV file using the “Pandas” Python library and pass the data to the ChatGPT API to retrieve the information.
import pandas as pd
read_csv=pd.read_csv("Student.csv")
Step 2: Use the ChatGPT API
Before we start using the ChatGPT API, make sure you have the OpenAI Python library installed on your system.
pip install openai
You can retrieve information by entering table data and text into the ChatGPT API. This can be done by reading the table data from a CSV file, preparing the input for the API, and passing it along with the input text. The API will then extract the relevant information from the data and provide it in the response.
import openai
import pandas as pd
openai.api_key = '<YOUR OPENAI API KEY>'
read_csv = pd.read_csv("Student.csv")
input_text='''What is the average math score for male students?'''
prompt = """Please regard the following data:\n . Answer the following question and please return only value: """.format(read_csv, input_text)
request = openai.ChatCompletion.create(
model="gpt-3.5-turbo-0301",
messages=[
"role": "user", "content": prompt,
]
)
result = request['choices'][0]['message']['content']
print("ChatGPT Response=>",result)
The following is the response received from the ChatGPT API:
Now let’s look at the answers to some questions based on the information gathered.
Read it | ChatGPT response | Analysis |
What is the gender of the student who scored 72 points in mathematics? | woman | correct |
What race/ethnicity group does the student with the highest score on the question belong to? | Group C | Wrong It should be “group B” |
Did the student who scored 47 in math complete the test prep course? | No | correct |
What is the average math score for male students? |
65.4 | Wrong It should be “62.2” |
What is the writing score for a student who belongs to the C race/ethnicity group and has a master’s degree? | The specified student’s writing score is not included in the data. | Wrong it should be “46” |
We found that ChatGPT can’t perform aggregations such as summarizing or averaging table records and sometimes struggles to answer basic questions.
You can add your table and test it immediately using the ChatGPT playground without using the API. Please visit the link below
https://chat.openai.com/
Now we can use ChatGPT again, but this time not for direct table analysis. Instead, we can use ChatGPT to generate SQL statements for the database that contains the data. SQL can handle filtering, sorting, aggregation, and summarization logic to help you perform the exact calculations you need.
We used SQLite as the database engine and used the sqlite3 Python library to interact with it.
Step 1: Create a SQLite database and table
Here’s the Python code that creates the database and table in SQLite:
import sqlite3
# Connect to SQLite database (this will create a new database file if it doesn't exist)
conn = sqlite3.connect("chatgpt.db")
cursor = conn.cursor()
# Create a table
cursor.execute("""
CREATE TABLE IF NOT EXISTS student (
gender TEXT,
race TEXT,
parentallevelofeducation TEXT,
lunch TEXT,
testpreparationcourse TEXT,
mathscore INTEGER,
readingscore INTEGER,
writingscore INTEGER
)
""")
# Commit the transaction and close the connection
conn.commit()
conn.close()
Step 2: Add data to the database
Here’s some Python code that reads data from a CSV file using pandas and inserts it into a SQLite database:
import sqlite3
import pandas as pd
df=pd.read_csv("Student.csv")
# Connect to SQLite database
conn = sqlite3.connect('chatgpt.db')
# Insert DataFrame into SQLite database
df.to_sql('student', conn, if_exists='replace', index=False)
# Close database connection
conn.close()
Step 3: Use the ChatGPT API
You can retrieve information by providing the database table name, its corresponding columns, and input text to the ChatGPT API. The ChatGPT API will generate an SQL query from the given input text and then use that query to pull from the database.
Here’s an example of how you can retrieve information using the ChatGPT API by providing a database table name, its corresponding columns, and input text:
import sqlite3
import openai
# Connect to SQLite database
conn = sqlite3.connect('chatgpt.db')
cursor = conn.cursor()
openai.api_key = '<YOUR OPENAI API KEY>'
# Function to get table columns from SQLite database
def get_table_columns(table_name):
cursor.execute("PRAGMA table_info()".format(table_name))
columns = cursor.fetchall()
print(columns)
return [column[1] for column in columns]
# Function to generate SQL query from input text using ChatGPT
def generate_sql_query(table_name,text,columns):
prompt = """You are a ChatGPT language model that can generate SQL queries. Please provide a natural language input text, and I will generate the corresponding SQL query for you.The table name is and corresponding columns are .\nInput: \nSQL Query:""".format(table_name,columns,text)
print(prompt)
request = openai.ChatCompletion.create(
model="gpt-3.5-turbo-0301",
messages=[
"role": "user", "content": prompt,
]
)
sql_query = request['choices'][0]['message']['content']
return sql_query
# Function to execute SQL query on SQLite database
def execute_sql_query(query):
cursor.execute(query)
result = cursor.fetchall()
return result
text="What is the average math score for male students?"
table_name = 'student'
columns = get_table_columns(table_name)
sql_query=generate_sql_query(table_name,text,columns)
print("Generated SQL query: ",sql_query)
if sql_query:
result=execute_sql_query(sql_query)
print("ChatGPT Response=>",result)
# Close database connection
cursor.close()
conn.close()
After sending the input text to the ChatGPT API, you will receive a response containing the generated SQL query. You can then use this query to retrieve the desired data from the database.
The answer to this approach looks like this:
Read it | SQL query generated by ChatGPT | answer | Analysis |
What is the gender of the student who scored 72 points in mathematics? | SELECT gender FROM student WHERE mathscore=72; | woman | correct |
What race/ethnicity group does the student with the highest score on the question belong to? | Select Race as “Race/Ethnicity”, MAX(Question Score) as “Highest Question Score” from the student group by race Order by “highest reading score”. LIMIT 1; |
Group B | correct |
Did the student who scored 47 in math complete the test prep course? | Choose a test preparation course from the student WHERE mathscore = 47 |
no one | correct |
What is the average math score for male students? | SELECT AVG(math grade) FROM student WHERE gender = ‘Male’ | 62.2 | correct |
What is the writing score for a student who belongs to the C race/ethnicity group and has a master’s degree? | select grade from student where race = ‘group c’ and parents’ education = ‘master’ | 46 | correct |
ChatGPT without SQL relies solely on NLP techniques to understand and generate responses based on table data, which can lead to incorrect responses. However, when ChatGPT is combined with SQL capabilities, it can perform more advanced and flexible interactions with table data by executing SQL queries directly. This allows for more accurate and precise information retrieval from databases, enhancing the overall capabilities of ChatGPT when interacting with tabular data.
[ad_2]
Source link