    OctoML debuts self-optimizing compute services for generative AI

14 June 2023




    Seattle-based startup OctoML today released its new OctoAI self-optimizing infrastructure service to help organizations build and deploy generative AI applications.

    OctoML got its start in 2019 as a spinout from the University of Washington, with the company’s technology stack built on the open-source Apache TVM machine learning (ML) compiler framework. Its original focus was helping organizations optimize ML models for deployment, an effort that has helped the company raise a total of $131.9 million to date, including an $85 million Series C round in 2021. In June 2022, OctoML added technology to help transform ML models into software functions. Now, the company is going a step further with its OctoAI service, which optimizes the deployment of ML on infrastructure to improve performance and manage costs.

    “The demand for compute is just absurd,” Luis Ceze, OctoML CEO, told VentureBeat. “Because generative AI models use a lot of compute, making compute efficient for AI is at the very core of the value proposition for OctoML.”

    Solving the last-mile problem with AI

    With its new platform, OctoML is helping to solve AI’s last-mile problem: getting models deployed so users can benefit from the power of generative AI.


    Ceze, still a professor at the University of Washington, said that when OctoML was founded, the focus was on data scientists building ML systems. From there, the company evolved into a platform with a model optimization service that takes a model as input, optimizes it and packages it into a container.

    With model optimization alone, Ceze said, organizations still had to take the container and find the right infrastructure and hosting configuration for deployment. The new OctoAI platform addresses that challenge with a fully managed compute service.

    “We can abstract away all the complexities of optimizing the model, packaging and deploying with a fully managed infrastructure,” Ceze said.

    Part of the new service is a library of popular open-source large language models (LLMs) that developers can use to build on and extend. At launch, supported models include Stable Diffusion 2.1, Dolly v2, LLaMA 65B, Whisper, FlanUL and Vicuna.
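
    For a sense of what building on such a hosted model library could look like, here is a minimal, hypothetical sketch in Python. The endpoint URL, payload fields, and response shape are illustrative assumptions, not OctoAI’s documented API:

```python
# Hypothetical sketch of calling a managed text-generation endpoint.
# The URL, payload fields, and response shape are assumptions for
# illustration; consult OctoAI's documentation for the real API.
import os
import requests

API_URL = "https://example.octoai.cloud/v1/generate"  # hypothetical endpoint
headers = {"Authorization": f"Bearer {os.environ['OCTOAI_TOKEN']}"}

payload = {
    "model": "vicuna",  # one of the launch models named above
    "prompt": "Summarize Apache TVM in one sentence.",
    "max_tokens": 64,
}

resp = requests.post(API_URL, headers=headers, json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```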

    How the OctoAI service works

    OctoML is not the only vendor looking to help developers deploy common open-source LLMs.

    Among the vendors that have recently offered similar types of services is Anyscale, the lead commercial sponsor behind the open-source Ray ML framework for workload scaling. At the end of May, Anyscale launched its open-source Aviary project as a technology to help developers deploy and scale open-source LLMs.

    Ceze explained that the OctoAI service is not using Ray for scaling workloads; it has developed its own proprietary approach. The Apache TVM project continues to play a foundational role, helping turn a model into code that will run efficiently on GPU infrastructure.

    “We basically built an engine that for any given model, we deeply optimize the model for the hardware target and produce a deployable artifact,” Ceze said.
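
    Ceze’s description maps onto the kind of compile-for-target workflow Apache TVM supports. The sketch below is a generic TVM example, not OctoML’s actual pipeline; the ONNX file, input name, and shape are placeholder assumptions:

```python
# A minimal sketch of compiling a model for a hardware target with Apache TVM.
# The model file, input name, and shape are placeholders for illustration.
import onnx
import tvm
from tvm import relay

onnx_model = onnx.load("model.onnx")          # placeholder model file
shape_dict = {"input": (1, 3, 224, 224)}      # assumed input name and shape

# Import the model into Relay, TVM's high-level intermediate representation.
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

# Compile for a specific hardware target; "cuda" targets NVIDIA GPUs.
target = tvm.target.Target("cuda")
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=target, params=params)

# The compiled library is the deployable artifact; save it and load it
# later on the target hardware.
lib.export_library("model_artifact.so")
```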

    The service also abstracts the physical cloud infrastructure on which the models run. At launch, the OctoAI service runs on Amazon Web Services (AWS), with plans to expand to other cloud providers. Ceze said he doesn’t want users to have to deal with the underlying complexity of choosing a specific type of processor or cloud instance to run a workload.

    “We want to make sure that users tell us the expected performance, then we’re going to go and choose the right hardware that works for them and has the right cost structure,” Ceze said.
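
    In effect, the service inverts the usual workflow: the user states a performance target, and the platform picks the hardware. A toy illustration of that selection logic follows; the instance names, latencies, and prices are made up:

```python
# Illustrative only: a toy version of "tell us the expected performance and
# we pick the hardware." Instance names, latencies, and prices are invented.
from dataclasses import dataclass

@dataclass
class Instance:
    name: str
    latency_ms: float   # measured latency for the model on this hardware
    usd_per_hour: float

CANDIDATES = [
    Instance("gpu-small", latency_ms=180.0, usd_per_hour=0.60),
    Instance("gpu-medium", latency_ms=95.0, usd_per_hour=1.40),
    Instance("gpu-large", latency_ms=40.0, usd_per_hour=3.10),
]

def pick_instance(max_latency_ms: float) -> Instance:
    """Return the cheapest instance that meets the latency target."""
    viable = [i for i in CANDIDATES if i.latency_ms <= max_latency_ms]
    if not viable:
        raise ValueError("no instance meets the latency target")
    return min(viable, key=lambda i: i.usd_per_hour)

print(pick_instance(100.0).name)  # -> "gpu-medium"
```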


