The AI Book

    MosaicML launches MPT-7B-8K, a 7B-parameter open-source LLM

    19 July 2023



    MosaicML has unveiled MPT-7B-8K, an open-source large language model (LLM) with 7 billion parameters and an 8k context length. 

    According to the company, the model was trained on the MosaicML platform, with pretraining starting from the MPT-7B checkpoint. The additional pretraining ran on Nvidia H100s: three days of training on 256 H100s, covering 500 billion tokens of data.
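    As a rough illustration of the scale involved, the figures quoted above can be turned into a per-GPU throughput estimate. This is a minimal sketch, assuming all 500 billion tokens were processed within the three-day run on 256 GPUs; the function name is hypothetical and the result is a back-of-the-envelope number, not a figure published by MosaicML.

```python
# Back-of-the-envelope throughput from the figures quoted in the article.
# Assumption: all 500 billion tokens were processed during the
# three-day run on 256 H100s (the article does not break this down).
TOKENS = 500e9   # additional training tokens
DAYS = 3         # wall-clock duration of the run
GPUS = 256       # number of H100s


def tokens_per_gpu_second(tokens: float, days: float, gpus: int) -> float:
    """Average tokens processed per GPU per second over the whole run."""
    gpu_seconds = days * 24 * 60 * 60 * gpus
    return tokens / gpu_seconds


if __name__ == "__main__":
    rate = tokens_per_gpu_second(TOKENS, DAYS, GPUS)
    print(f"{rate:,.0f} tokens per GPU per second")
```

    Under that assumption, the estimate comes out in the range of several thousand tokens per GPU per second.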

    Previously, MosaicML had made waves in the AI community with its release of MPT-30B, an open-source and commercially licensed decoder-based LLM. The company claimed that, with 30 billion parameters, the model is more powerful than the 175-billion-parameter GPT-3 while having only about 17% of GPT-3’s parameter count.

    According to MosaicML, MPT-30B surpassed GPT-3’s performance across various tasks and proved more efficient to train than models of similar size. For instance, LLaMA-30B required a FLOPs budget approximately 1.44 times that of MPT-30B, while Falcon-40B’s FLOPs budget was 1.27 times that of MPT-30B.
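    The ratios quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the article; the dictionary and function names are hypothetical, and the "savings" figure is simply the inverse of the reported FLOPs multiples, not a number MosaicML published.

```python
# Figures as quoted in the article.
GPT3_PARAMS = 175e9
MPT30B_PARAMS = 30e9

# Training FLOPs budgets relative to MPT-30B (MPT-30B = 1.0).
RELATIVE_FLOPS = {"MPT-30B": 1.00, "Falcon-40B": 1.27, "LLaMA-30B": 1.44}


def param_share(small: float, large: float) -> float:
    """Fraction of the larger model's parameter count held by the smaller one."""
    return small / large


def flops_saving_vs(model: str) -> float:
    """Fraction of `model`'s FLOPs budget that MPT-30B avoids spending."""
    return 1.0 - 1.0 / RELATIVE_FLOPS[model]


if __name__ == "__main__":
    share = param_share(MPT30B_PARAMS, GPT3_PARAMS)
    print(f"MPT-30B has {share:.1%} of GPT-3's parameters")
    for name in ("Falcon-40B", "LLaMA-30B"):
        print(f"MPT-30B uses {flops_saving_vs(name):.1%} fewer FLOPs than {name}")
```

    Read this way, a 1.44 times larger budget for LLaMA-30B means MPT-30B used roughly 30% fewer training FLOPs, taking the article's multiples at face value.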


    MosaicML claims that the new model, MPT-7B-8K, is more proficient at document summarization and question answering than all previously released models.

    The company said the model is optimized for faster training and inference, and that it can be fine-tuned on domain-specific data within the MosaicML platform.

    The company has also announced commercial-use licensing for MPT-7B-8k, and highlighted that its training dataset of 1.5 trillion tokens exceeds that of similar models such as XGen, LLaMA, Pythia, OpenLLaMA and StableLM.

    MosaicML claims that through the use of FlashAttention and FasterTransformer, the model excels in rapid training and inference while benefiting from the open-source training code available through the llm-foundry repository.

    The company has released the model in three variations:

    • MPT-7B-8k-Base: This decoder-style transformer starts from the pretrained MPT-7B and is further optimized for an extended sequence length of 8k. It receives additional training on 500 billion tokens, for a total of 1.5 trillion tokens of text and code.
    • MPT-7B-8k-Instruct: This model is designed for long-form instruction tasks, including summarization and question answering. It is created by fine-tuning MPT-7B-8k on carefully curated datasets.
    • MPT-7B-8k-Chat: This variant functions as a chatbot-like model, focusing on dialogue generation. It is created by fine-tuning MPT-7B-8k on approximately 1.5 billion tokens of chat data.

    MosaicML asserts that the MPT-7B-8k models perform comparably to or better than other currently available open-source models with an 8k context length, as measured by the company’s in-context learning evaluation harness.

    The announcement coincides with Meta’s unveiling of the LLaMA 2 model, now available on Microsoft Azure. LLaMA 2 comes in three model sizes, with 7, 13 and 70 billion parameters.

    Meta says these pre-trained models were trained on two trillion tokens, a dataset 40% larger than that of LLaMA 1, and have twice the context length of LLaMA 1. LLaMA 2 outperforms its predecessor according to Meta’s benchmarks.
