The AI Book
    Facebook Twitter Instagram
    The AI BookThe AI Book
    • Home
    • Categories
      • AI Media Processing
      • AI Language processing (NLP)
      • AI Marketing
      • AI Business Applications
    • Guides
    • Contact
    Subscribe
    Facebook Twitter Instagram
    The AI Book
    AI Language processing (NLP)

    Data extraction from tabular data with ChatGPT

    9 May 2023No Comments7 Mins Read

    [ad_1]

    Introduction

    In the world of data analysis, extracting useful information from spreadsheet data can be a daunting task. Conventional approaches typically require manual data exploration and analysis, which can require significant effort, time, or manpower.

    The emergence of advanced language models such as ChatGPT has introduced a promising and innovative approach to extracting useful information from tabular data. OpenAI has announced an official API for ChatGPT featuring gpt-3.5-turbo, OpenAI’s most advanced language model.

    This blog describes the process of extracting useful information from table data using the ChatGPT API.

    data set

    For all the illustrations in this post, we’ll use the following data. This data is taken from: https://www.kaggle.com/datasets/spscientist/students-performance-in-exams

    Note that we only considered 30 records from the data set.

    Please see the data provided below that will be used for the purposes of this blog.

    gender race Education at the level of parents dinner Test preparation course mathematics reading score writing score
    woman Group B Bachelor’s degree standard no one 72 72 74
    woman Group C Some college standard it’s over 69 90 88
    woman Group B Master standard no one 92 95 93
    male Group A Associate free/reduced no one 47 57 44
    male Group C Some college standard no one 76 78 75
    woman Group B Associate standard no one 71 83 78
    woman Group B Some college standard it’s over 88 95 92
    male Group B Some college free/reduced no one 40 43 39
    male Group D high school free/reduced it’s over 64 64 67
    woman Group B high school free/reduced no one 38 60 50
    male Group C Associate standard no one 58 54 52
    male Group D Associate standard no one 40 52 43
    woman Group B high school standard no one 65 81 73
    male Group A Some college standard it’s over 78 72 70
    woman Group A Master standard no one 50 53 58
    woman Group C Some high school standard no one 69 75 78
    male Group C high school standard no one 88 89 86
    woman Group B Some high school free/reduced no one 18 32 28
    male Group C Master free/reduced it’s over 46 42 46
    woman Group C Associate free/reduced no one 54 58 61
    male Group D high school standard no one 66 69 63
    woman Group B Some college free/reduced it’s over 65 75 70
    male Group D Some college standard no one 44 54 53
    woman Group C Some high school standard no one 69 73 73
    male Group D Bachelor’s degree free/reduced it’s over 74 71 80
    male group
    A
    Master free/reduced no one 73 74 72
    male Group B Some college standard no one 69 54 55
    woman Group C Bachelor’s degree standard no one 67 69 75
    male Group C high school standard no one 70 70 65

    ChatGPT relies solely on Natural Language Processing (NLP) techniques to understand and extract information from spreadsheet data. It can analyze the text-based input provided by the user, interpret the request and generate a response based on the content of the tabular data.

    Here is an example of how you can use the ChatGPT API to extract information from table data:

    Step 1: Prepare the input

    We saved our table data in a CSV file, you can read the CSV file using the “Pandas” Python library and pass the data to the ChatGPT API to retrieve the information.

    import pandas as pd
    
    read_csv=pd.read_csv("Student.csv")
    Step 2: Use the ChatGPT API

    Before we start using the ChatGPT API, make sure you have the OpenAI Python library installed on your system.

    pip install openai

    You can retrieve information by entering table data and text into the ChatGPT API. This can be done by reading the table data from a CSV file, preparing the input for the API, and passing it along with the input text. The API will then extract the relevant information from the data and provide it in the response.

    import openai
    import pandas as pd
    
    openai.api_key = '<YOUR OPENAI API KEY>'
    
    read_csv = pd.read_csv("Student.csv")
    
    
    input_text='''What is the average math score for male students?'''
    
    prompt = """Please regard the following data:\n . Answer the following question and please return only value: """.format(read_csv, input_text)
    
    request = openai.ChatCompletion.create(
        model="gpt-3.5-turbo-0301",
        messages=[
            "role": "user", "content": prompt,
        ]
    )
    result = request['choices'][0]['message']['content']
    
    print("ChatGPT Response=>",result)  

    The following is the response received from the ChatGPT API:

    Now let’s look at the answers to some questions based on the information gathered.

    Read it ChatGPT response Analysis
    What is the gender of the student who scored 72 points in mathematics? woman correct
    What race/ethnicity group does the student with the highest score on the question belong to? Group C Wrong
    It should be “group B”
    Did the student who scored 47 in math complete the test prep course? No correct

    What is the average math score for male students?

    65.4 Wrong
    It should be “62.2”
    What is the writing score for a student who belongs to the C race/ethnicity group and has a master’s degree? The specified student’s writing score is not included in the data. Wrong
    it should be “46”

    We found that ChatGPT can’t perform aggregations such as summarizing or averaging table records and sometimes struggles to answer basic questions.

    You can add your table and test it immediately using the ChatGPT playground without using the API. Please visit the link below
    https://chat.openai.com/

    Now we can use ChatGPT again, but this time not for direct table analysis. Instead, we can use ChatGPT to generate SQL statements for the database that contains the data. SQL can handle filtering, sorting, aggregation, and summarization logic to help you perform the exact calculations you need.
    We used SQLite as the database engine and used the sqlite3 Python library to interact with it.

    Step 1: Create a SQLite database and table

    Here’s the Python code that creates the database and table in SQLite:

    import sqlite3
    
    # Connect to SQLite database (this will create a new database file if it doesn't exist)
    conn = sqlite3.connect("chatgpt.db")
    cursor = conn.cursor()
    
    # Create a table
    cursor.execute("""
        CREATE TABLE IF NOT EXISTS student (
            gender TEXT,
            race TEXT,
            parentallevelofeducation TEXT,
            lunch TEXT,
            testpreparationcourse TEXT,
            mathscore INTEGER,
            readingscore INTEGER,
            writingscore INTEGER
        )
    """)
    
    # Commit the transaction and close the connection
    conn.commit()
    conn.close()
    Step 2: Add data to the database

    Here’s some Python code that reads data from a CSV file using pandas and inserts it into a SQLite database:

    import sqlite3
    import pandas as pd
    
    df=pd.read_csv("Student.csv")
    # Connect to SQLite database
    conn = sqlite3.connect('chatgpt.db')
    
    # Insert DataFrame into SQLite database
    df.to_sql('student', conn, if_exists='replace', index=False)
    
    # Close database connection
    conn.close()
    Step 3: Use the ChatGPT API

    You can retrieve information by providing the database table name, its corresponding columns, and input text to the ChatGPT API. The ChatGPT API will generate an SQL query from the given input text and then use that query to pull from the database.

    Here’s an example of how you can retrieve information using the ChatGPT API by providing a database table name, its corresponding columns, and input text:

    import sqlite3
    import openai
    
    # Connect to SQLite database
    conn = sqlite3.connect('chatgpt.db')
    cursor = conn.cursor()
    
    
    openai.api_key = '<YOUR OPENAI API KEY>'
    
    # Function to get table columns from SQLite database
    def get_table_columns(table_name):
        cursor.execute("PRAGMA table_info()".format(table_name))
        columns = cursor.fetchall()
        print(columns)
        return [column[1] for column in columns]
    
    
    # Function to generate SQL query from input text using ChatGPT
    def generate_sql_query(table_name,text,columns):
        prompt = """You are a ChatGPT language model that can generate SQL queries. Please provide a natural language input text, and I will generate the corresponding SQL query for you.The table name is  and corresponding columns are .\nInput: \nSQL Query:""".format(table_name,columns,text)
        print(prompt)
        request = openai.ChatCompletion.create(
            model="gpt-3.5-turbo-0301",
            messages=[
                "role": "user", "content": prompt,
            ]
        )
        sql_query = request['choices'][0]['message']['content']
        return sql_query    
    
    # Function to execute SQL query on SQLite database
    def execute_sql_query(query):
        cursor.execute(query)
        result = cursor.fetchall()
        return result
    
    
    text="What is the average math score for male students?"
    
    table_name = 'student'
    columns = get_table_columns(table_name)
    sql_query=generate_sql_query(table_name,text,columns)
    print("Generated SQL query: ",sql_query)
    if sql_query:
        result=execute_sql_query(sql_query)
        print("ChatGPT Response=>",result)
        
    # Close database connection
    cursor.close()
    conn.close()
    

    After sending the input text to the ChatGPT API, you will receive a response containing the generated SQL query. You can then use this query to retrieve the desired data from the database.

    The answer to this approach looks like this:

    Read it SQL query generated by ChatGPT answer Analysis
    What is the gender of the student who scored 72 points in mathematics? SELECT gender FROM student WHERE mathscore=72; woman correct
    What race/ethnicity group does the student with the highest score on the question belong to? Select Race as “Race/Ethnicity”, MAX(Question Score) as “Highest Question Score”
    from the student
    group by race
    Order by “highest reading score”.
    LIMIT 1;
    Group B correct
    Did the student who scored 47 in math complete the test prep course? Choose a test preparation course
    from the student
    WHERE mathscore = 47
    no one correct
    What is the average math score for male students? SELECT AVG(math grade) FROM student WHERE gender = ‘Male’ 62.2 correct
    What is the writing score for a student who belongs to the C race/ethnicity group and has a master’s degree? select grade from student where race = ‘group c’ and parents’ education = ‘master’ 46 correct

    ChatGPT without SQL relies solely on NLP techniques to understand and generate responses based on table data, which can lead to incorrect responses. However, when ChatGPT is combined with SQL capabilities, it can perform more advanced and flexible interactions with table data by executing SQL queries directly. This allows for more accurate and precise information retrieval from databases, enhancing the overall capabilities of ChatGPT when interacting with tabular data.



    [ad_2]

    Source link

    Previous ArticleCreative Intelligence for Influencer Marketing Delivers Higher ROI
    Next Article HuggingGPT: The secret weapon for solving complex AI tasks
    The AI Book

    Related Posts

    AI Language processing (NLP)

    The RedPajama Project: An Open Source Initiative to Democratize LLMs

    24 July 2023
    AI Language processing (NLP)

    Mastering Data Science with Microsoft Fabric: A Tutorial for Beginners

    23 July 2023
    AI Language processing (NLP)

    Will AI kill your job?

    22 July 2023
    Add A Comment

    Leave A Reply Cancel Reply

    • Privacy Policy
    • Terms and Conditions
    • About Us
    • Contact Form
    © 2025 The AI Book.

    Type above and press Enter to search. Press Esc to cancel.