    Curious about ChatGPT: Explore the origins of generative AI and natural language processing

    18 May 2023 · 9 min read


    Editor's note: This post was co-authored by Mary Osborne and Ali Dixon and follows Curious about ChatGPT: Explore the Uses of AI in Education.

    By now most people have at least heard of ChatGPT and there are mixed opinions around it – people love it, people hate it and people fear it. It can create a recipe for chocolate chip cookies, write a Broadway-style song about your children, and generate usable code.

    With February 14th just around the corner, it can even be used to write, or at least inspire, your Valentine’s Day notes. Check out the love note below that ChatGPT wrote about SAS Software. How did we get to a place where a conversational chatbot can quickly create a personalized letter? Join us as we look at some of the key innovations of the past 50 years that help inform how we respond and what we can expect in the future.

    1966: ELIZA

    In 1966, a chatbot named ELIZA took the computer science world by storm. ELIZA was built by Joseph Weizenbaum at the MIT Artificial Intelligence Laboratory and was designed to mimic Rogerian psychotherapists. Rogerian psychotherapists are non-directive but supportive, so they often mirror what the patient is saying. ELIZA used pattern matching (think regular expressions and string substitutions) to accomplish this mirroring. Implementations of ELIZA are still available online if you want to try it for yourself.

    ELIZA was rudimentary but felt compelling and was an incredible leap forward for chatbots. Since it was one of the first chatbots ever created, it was also one of the first programs capable of attempting the Turing Test. The Turing Test is an imitation game that tests a machine’s ability to exhibit human-like intelligent behavior. When asked if ChatGPT can pass the Turing Test, it responds as follows:

    1970s – 1990s

    Methods for analyzing unstructured text data have continued to evolve. In the 1970s, bell bottoms, case grammars, semantic networks, and conceptual dependency theory were introduced. The 1980s saw the birth of big hair, glamour, ontologies, and expert systems (such as DENDRAL for chemical analysis). In the 90s we got grunge, statistical models, recurrent neural networks, and long short-term memory (LSTM) models.

    2000 – 2015

    The new millennium brought us low-rise jeans, trucker hats, and major advances in language modeling, word embeddings, and machine translation, including Google Translate. In the last dozen years, though, the real magic has happened in NLP: Word2Vec, encoder-decoder models, attention and transformers, pre-trained models, and transfer learning paved the way for what we see now, GPT and large language models with billions of parameters.

    2015 and beyond – Word2vec, GloVe and fastText

    Word2vec, GloVe and fastText focus on word embedding, or word vectorization. Word vectorization is an NLP methodology used to map words or phrases from a vocabulary to a corresponding vector of real numbers, which can then be used to predict words and to measure word similarity or semantics. The basic idea behind word vectorization is that words with similar meanings will have similar vector representations.

    Word2vec is one of the most common word vectorization methods. It uses a neural network to learn vector representations of words from a large corpus of text. Vectors are learned so that words used in similar contexts end up with similar vector representations. For example, the vector for “cat” will sit closer to the vector for “kitten” than to the vector for an unrelated word like “car”.
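    To make this concrete, here is a minimal sketch that trains word2vec on a toy corpus using the gensim library (an assumption on our part; the original post does not name a specific toolkit). With so little data the similarity scores are illustrative only; in practice the model is trained on millions of sentences.

        # Minimal word2vec sketch with gensim (assumed installed: pip install gensim).
        from gensim.models import Word2Vec

        corpus = [
            "the cat chased the kitten".split(),
            "the dog chased the cat".split(),
            "a kitten is a young cat".split(),
        ]

        # vector_size, window, and epochs are illustrative hyperparameters.
        model = Word2Vec(sentences=corpus, vector_size=50, window=3, min_count=1, epochs=50)

        print(model.wv["cat"][:5])                   # first few dimensions of the "cat" vector
        print(model.wv.similarity("cat", "kitten"))  # cosine similarity between the two vectors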

    Another technique used to create word vectors is called GloVe (Global Vectors for Word Representation). GloVe uses a different approach than word2vec and learns word vectors by training on co-occurrence matrices.
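    As a rough illustration of what “training on co-occurrence matrices” means, the hypothetical snippet below just counts how often words appear near each other within a small window; GloVe’s actual training then fits word vectors so that their dot products reflect these counts.

        # Toy co-occurrence counts (a sketch, not the real GloVe training loop).
        from collections import Counter

        corpus = [
            "the cat sat on the mat".split(),
            "the dog sat on the rug".split(),
        ]

        window = 2  # context window size, chosen arbitrarily for this sketch
        cooc = Counter()
        for sentence in corpus:
            for i, word in enumerate(sentence):
                for j in range(max(0, i - window), min(len(sentence), i + window + 1)):
                    if i != j:
                        cooc[(word, sentence[j])] += 1

        print(cooc[("cat", "sat")])  # how often "sat" occurs within 2 words of "cat"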

    After learning a set of word vectors, they can be used in a variety of natural language processing (NLP) tasks, such as text classification, language translation, and question answering.

    Transformer models of 2017

    Transformer models were introduced in the 2017 paper “Attention Is All You Need” by Google researchers, and they have truly revolutionized how we use machine learning to analyze unstructured data.

    One of the key innovations in transformer models is the self-attention mechanism, which allows the model to weigh the importance of different parts of the input when making predictions. This helps the model handle long-range dependencies in the input, which is particularly useful for tasks such as language translation, where the meaning of a word may depend on words that appear many positions earlier in the sentence. Another important feature of transformer models is multi-head attention, which allows the model to attend to different parts of the input, and different kinds of relationships, at the same time. Because attention looks at all positions at once rather than stepping through the sequence token by token, transformers can also process input in parallel, which makes them much more efficient than recurrent models.
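    Here is a minimal sketch of scaled dot-product self-attention, written in NumPy (our choice; it is not mentioned in the post). It omits the learned query/key/value projections, multiple heads, residual connections, and layer normalization that a real transformer layer adds.

        # Single-head scaled dot-product self-attention (simplified sketch).
        import numpy as np

        def self_attention(X):
            """X: (sequence_length, d_model) matrix of token embeddings."""
            d = X.shape[-1]
            # In a real transformer, Q, K and V come from learned linear projections of X.
            Q, K, V = X, X, X
            scores = Q @ K.T / np.sqrt(d)  # how strongly each token attends to every other token
            scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
            weights = np.exp(scores)
            weights /= weights.sum(axis=-1, keepdims=True)  # softmax over each row
            return weights @ V  # each output row is a context-aware mix of the input rows

        tokens = np.random.randn(5, 8)       # 5 tokens, 8-dimensional embeddings
        print(self_attention(tokens).shape)  # (5, 8)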

    ELMo

    ELMo, or Embeddings from Language Models, is not a transformer model; it is built on a bidirectional LSTM. A bidirectional LSTM is a type of recurrent neural network (RNN) that processes input sequences in both forward and backward directions, gathering contextual information from both past and future words. In ELMo, a bidirectional LSTM network is trained on large amounts of text to generate context-sensitive word embeddings that capture rich semantic and syntactic information about word usage in context. This helps it manage ambiguity, specifically polysemy. Polysemy is when a single word can have multiple meanings depending on the context. “Bank” is an example: an author can refer to the bank of a river or the bank where you keep your money. ELMo can help decipher which meaning was intended because it handles words in context. This ability offered a dramatic improvement over static embedding models such as word2vec and GloVe, which assign each word a single vector that does not take context into account.
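    The snippet below is a minimal sketch of a bidirectional LSTM layer in PyTorch (an assumed dependency; ELMo itself stacks such layers and trains them as a language model on a very large corpus, which this toy example does not attempt).

        # Bidirectional LSTM sketch in PyTorch (assumed installed: pip install torch).
        import torch
        import torch.nn as nn

        embeddings = torch.randn(1, 6, 50)  # 1 sentence, 6 tokens, 50-dimensional input embeddings
        bilstm = nn.LSTM(input_size=50, hidden_size=32, batch_first=True, bidirectional=True)

        output, _ = bilstm(embeddings)
        print(output.shape)  # (1, 6, 64): forward and backward hidden states concatenated
        # Each token's 64-dim vector now reflects the words both before and after it,
        # which is what lets ELMo-style embeddings disambiguate words like "bank".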

    BERT

    BERT uses a transformer-based architecture that allows it to efficiently handle longer input sequences and capture context from the left and right side of a token or word (the B in BERT stands for bidirectional). ELMo, on the other hand, uses a recurrent neural network (RNN) architecture, which is less efficient for processing longer input sequences.

    BERT is pre-trained on vast amounts of text data and can be fine-tuned for specific tasks such as answering questions and analyzing sentiment. ELMo, on the other hand, is pre-trained on a smaller amount of text data and is typically used as a fixed feature extractor rather than being fine-tuned.

    BERT also uses a masked language modeling objective that randomly masks some input tokens and then trains the model to predict the original values of the masked tokens. This allows BERT to learn a deeper understanding of the context in which words appear. ELMo’s training objective, on the other hand, is simply to predict the next word.
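    You can see BERT’s masked-language-modeling behavior directly with the Hugging Face transformers pipeline, sketched below (an assumed dependency that is not part of the original post; it downloads the bert-base-uncased checkpoint on first use).

        # Fill-mask sketch with Hugging Face transformers (assumed installed: pip install transformers).
        from transformers import pipeline

        fill_mask = pipeline("fill-mask", model="bert-base-uncased")

        # BERT predicts the masked token using context on both sides of it.
        for prediction in fill_mask("She deposited the money at the [MASK]."):
            print(prediction["token_str"], round(prediction["score"], 3))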

    GPT

    GPT, or Generative Pre-trained Transformer, models came to market around the same time as BERT but were designed for a different purpose. BERT was designed to understand the meaning of sentences. GPT models are designed to generate text. GPT models are general-purpose language models trained on large amounts of text data to perform a wide range of NLP tasks such as text generation, translation, summarization, and more.

    GPT-1 (2018)

    It was the first GPT model and was trained on a large corpus of text data from the Internet. It had 117 million parameters and was able to generate text that was very similar in style and content to the training data.

    GPT-2 (2019)

    This model was even larger than GPT-1, with 1.5 billion parameters, and was trained on an even larger corpus of text data. This model was able to create text that was much more coherent and human-like than its predecessor.
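    Because GPT-2 was released openly, it makes a convenient hands-on example of GPT-style text generation. The sketch below uses the Hugging Face transformers pipeline (an assumed dependency; generation is sampled, so output will differ from run to run).

        # Text-generation sketch with the openly released GPT-2 checkpoint.
        from transformers import pipeline

        generator = pipeline("text-generation", model="gpt2")

        result = generator("In 1966, a chatbot named ELIZA", max_new_tokens=30, num_return_sequences=1)
        print(result[0]["generated_text"])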

    GPT-3 (2020)

    It was the latest and largest general-purpose GPT model at the time, with 175 billion parameters. It was trained on an even larger corpus of text data and can perform a wide range of natural language processing tasks, such as translation, question answering, and summarization, at a near-human level.

    GPT-3.5 or ChatGPT (2022)

    ChatGPT is also known as GPT-3.5 and is a slightly different take on the GPT model. It’s a conversational AI model optimized for conversational tasks such as answering questions, although its answers are not always truthful. ChatGPT was trained on a smaller, more conversation-focused dataset, which allows it to generate more relevant and contextual responses than GPT-3 in a dialogue setting.
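    For readers who want to try the model programmatically rather than through the chat interface, here is a hedged sketch using the openai Python package (v1.x assumed, with an API key set in the OPENAI_API_KEY environment variable; the exact client interface may differ between package versions).

        # Minimal ChatGPT (gpt-3.5-turbo) call via the openai package (assumed: pip install openai).
        from openai import OpenAI

        client = OpenAI()  # reads the API key from the OPENAI_API_KEY environment variable

        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[
                {"role": "system", "content": "You are a helpful assistant."},
                {"role": "user", "content": "Write a two-line love note about SAS Software."},
            ],
        )
        print(response.choices[0].message.content)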

    Google Bard

    Google announced a conversational search approach called Bard on February 6, 2023, and after that Microsoft announced that they would be incorporating ChatGPT into Bing. It seems that the future will be conversational and people will try to improve their response engine optimization rather than traditional search engine optimization. The landscape is constantly evolving with OpenAI planning to release GPT-4 in the first quarter of 2023.

    In the spirit of Valentine’s Day, we asked ChatGPT to write a love note to BARD, its chatbot competitor. The answer is given below.

    Looks good, doesn’t it? However, when we directly asked ChatGPT about Google BARD, it admitted that it was not aware of it. All it really knew from the first prompt was the word BARD, and once we explained that BARD is a chatbot competitor, it was able to respond confidently. ChatGPT’s responses depend entirely on the syntax and content of the question asked. Based on the responses, you’d think ChatGPT knows about BARD, but its training data stops around 2021. Our advice? Choose your words wisely!

    This is a time of great advances in generative AI and natural language processing, and you need to be careful to make sure the information these models produce is accurate. New tools and techniques are being explored every day. As you can see from ChatGPT’s response, “Who knows, maybe one day we’ll put our differences aside and join forces to create something truly amazing.”

