Close Menu
The AI Book
    Facebook X (Twitter) Instagram
    The AI BookThe AI Book
    • Home
    • Categories
      • AI Media Processing
      • AI Language processing (NLP)
      • AI Marketing
      • AI Business Applications
    • Guides
    • Contact
    Subscribe
    Facebook X (Twitter) Instagram
    The AI Book
    AI Media Processing

    What ChatGPT Knows About You: OpenAI’s Journey to Data Privacy | By Andrea Valenzuela | May 2023

    10 May 2023No Comments6 Mins Read

    [ad_1]

    Companies that allow users to request their personal data are required to comply with the aforementioned GDPR regulation. However, there is a catch: The file format may make the data unreadable for most of the population. In this case we got both html and json files. but html can be read directly json The files can be more difficult to interpret. I personally believe that the new regulation should also implement a readable data format. But for now…

    Let’s explore the files one by one to get the most out of this new feature!

    The first file is chat.html which contains all my chat history with ChatGPT. Conversations are saved under their respective titles. User questions and ChatGPT answers are marked as assistantand userAccordingly.

    If you’ve ever trained an AI model, this labeling system will be familiar to you.

    Let’s look at a sample conversation from my story:

    Self-made screenshot from my ChatGPT history. The conversation title is highlighted in blue. The user/assistant labels are highlighted in red and green, respectively.

    Have you ever seen the thumbs up, up down (👍👎) next to any ChatGPT reply?

    This information is treated by ChatGPT as feedback for the given responsewhich will then help you train the chatbot.

    This information is stored message_feedback.json A file containing any feedback you have provided to ChatGPT using the thumbs icons. Information is stored in the following format:

    ["message_id": <MESSAGE ID>, "conversation_id": <CONVERSATION ID>, "user_id": <USER ID>, "rating": "thumbsDown", "content": "\"tags\": [\"not-helpful\"]"]

    The thumbsDown The rating takes into account incorrectly generated responses, while thumbsUp Reports generated correctly.

    There is also a file (user.json) contains the following personal data of the user:

    false], "phone_number": <USER PONE>

    Some platforms are known for creating user models based on platform usage. For example, if Google User search is mainly about programming, Google It is likely that the user is a programmer and uses this information to display personalized ads.

    ChatGPT can do the same with information obtained from conversations, but they are currently required to include this inferred information in exported data..

    ⚠️ FYI, One can access What Google knows about them begining Gmail by clicking account >> Data and Privacy >> Personalized ads >> My advertising center.

    There is another file that contains conversation history and also contains metadata. This file is named conversations.json and Includes information such as creation time, several identifiers, and the model behind ChatGPT, among others.

    ⚠️ Metadata provides information about the underlying data. It may contain information such as the origin of the data, its meaning, location, ownership and creation. Metadata includes information related to, but not part of, the underlying data.

    Let’s explore the same conversation A320 hydraulic system failure is revealed in this first example json format. The conversation itself consists of the following questions and answers:

    From this simple conversation, OpenAI stores quite a bit of information. Let’s take a look at the saved information:

    • Main areas json The file contains the following information:

    Valley moderation_results empty since then In this particular case, ChatGPT did not provide feedback. Besides, [+] in the symbol mapping A field means more information is available.

    • in fact mapping The field contains all the information about the conversation itself. Since a conversation has four interactions, Routing saves one children Login per interaction.

    again, [+] A symbol indicates that more information is available. Let’s take a look at the different entries!

    • mapping_id: contains a id For conversation, as well as information about the time of creation and the type of content, among others. As far as we can tell, it also creates a parent_id for conversation etc children_id which corresponds to the user’s next interaction with ChatGPT. Here’s an example:
    • children_idX: New children A record is created for each interaction from a user or assistant. Since conversation has four interactions, json The file shows four children records. each one children The record has the following structure:

    Პirveli children The recording is embedded within the conversation mapping_id As a parent and other interaction – Reply from ChatGP – as a second child.

    • Children which corresponds to the ChatGPT response contains additional fields. For example, for the second interaction:

    In the case of a ChatGPT response, We get information about the ChatGPT model and stop words. It also shows the first children as it parent and third children As the following interaction.

    The full file can be found at this GitHub.

    Have you ever used the “regenerate response” button when you’re not entirely sure of the response ChatGPT provided?

    Homemade screenshot of the reply regeneration button in ChatGPT.

    This feedback information is also saved!

    is the last file named model_comparisons.json that Contains snippets of conversations and follow-up attempts any time ChatGPT has updated a response. The information contains only text without a title, but contains other metadata. Here is the basic structure of this file:


    "id":"<id>",
    "user_id":"<user_id>",
    "input":[+],
    "output":[+],
    "metadata":[+],
    "create_time": "<time>"

    The metadata The field contains some important information such as the country and continent where the conversation took place and information about it https Access scheme, among others. Here comes the interesting part of this file input/output Records:

    input

    The input Contains a collection of messages from the original conversation. Interactions are labeled according to the author And, as in the previous cases, some additional information is also stored. Let’s take a look at the messages saved for our sample conversation:

    User/Assistant Entries are pending, but I’m sure we’re all wondering at this point Why is A system label?

    and moreover Why do they make such an initial statement at the beginning of every conversation?

    Does ChatGPT pre-feed the current date in any new chat?

    Yes, These records are so-called system messages.

    System messages

    System messages provide general instructions to the assistant. They help determine the behavior of the assistant. In the web interface, system messages are transparent to the user, so we cannot see them directly.

    The advantage of the system message is that it allows the developer to set up the assistant without the request itself becoming part of the conversation.. System messages can be delivered using the API. For example, if you are building a car sales assistant, one possible system message might be “You are a car sales assistant. Use a friendly tone and ask customers questions until you understand their needs. Then explain the available cars that match their preferences”. You can provide a list of cars, specifications and prices so that the assistant can provide this information as well.

    [ad_2]

    Source link

    Previous ArticleUsing conversational artificial intelligence for IT support for remote workers, part 2
    Next Article Distributed training of the Bengali ALBERT model
    The AI Book

    Related Posts

    AI Media Processing

    A new set of Arctic images will help artificial intelligence research MIT News

    25 July 2023
    AI Media Processing

    Analyzing rodent infestations using the geospatial capabilities of Amazon SageMaker

    24 July 2023
    AI Media Processing

    Using knowledge of social context for responsible use of artificial intelligence – Google Research Blog

    23 July 2023
    Add A Comment
    Leave A Reply Cancel Reply

    • Privacy Policy
    • Terms and Conditions
    • About Us
    • Contact Form
    © 2026 The AI Book.

    Type above and press Enter to search. Press Esc to cancel.