Close Menu
The AI Book
    Facebook X (Twitter) Instagram
    The AI BookThe AI Book
    • Home
    • Categories
      • AI Media Processing
      • AI Language processing (NLP)
      • AI Marketing
      • AI Business Applications
    • Guides
    • Contact
    Subscribe
    Facebook X (Twitter) Instagram
    The AI Book
    Daily AI News

    DeepInfra gets $8M to make running AI inferences more affordable

    10 November 2023No Comments6 Mins Read

    [ad_1]

    VentureBeat presents: AI Unleashed – An exclusive executive event for enterprise data leaders. Hear from top industry leaders on Nov 15. Reserve your free pass


    Ok, let’s say you’re one of the company leaders or IT decision-makers who has heard enough about all this generative AI stuff — you’re finally ready to take the plunge and offer a large language model (LLM) chatbot to your employees or customers. The problem is: how do you actually launch it and how much should you pay to run it?

    DeepInfra, a new company founded by former engineers at IMO Messenger, wants to answer those questions succinctly for business leaders: they’ll get the models up and running on their private servers on behalf of their customers, and they are charging an aggressively low rate of $1 per 1 million tokens in or out compared to $10 per 1 million tokens for OpenAI’s GPT-4 Turbo or $11.02 per 1 million tokens for Anthropic’s Claude 2.

    Today, DeepInfra emerged from stealth exclusively to VentureBeat, announcing it has raised an $8 million seed round led by A.Capital and Felicis. It plans to offer a range of open source model inferences to customers, including Meta’s Llama 2 and CodeLlama, as well as variants and tuned versions of these and other open source models.

    “We wanted to provide CPUs and a low-cost way of deploying trained machine learning models,” said Nikola Borisov, DeepInfra’s Founder and CEO, in a video conference interview with VentureBeat. “We already saw a lot of people working on the training side of things and we wanted to provide value on the inference side.”

    VB Event

    AI Unleashed

    Don’t miss out on AI Unleashed on November 15! This virtual event will showcase exclusive insights and best practices from data leaders including Albertsons, Intuit, and more.

     

    Register for free here

    DeepInfra’s value prop

    While there have been many articles written about the immense GPU resources needed to train machine learning and large language models (LLMs) now in vogue among enterprises, with outpaced demand leading to a GPU shortage, less attention has been paid downstream, to the fact that these models also need hefty compute to actually run reliably and be useful to end-users, also known as inferencing.

    According to Borisov, “the challenge for when you’re serving a model is how to fit number of concurrent users onto the same hardware and model at the same time…The way that large language models produce tokens is they have to do it one token at a time, and each token requires a lot of computation and memory bandwidth. So the challenge is to kind of fit people together onto the same servers.”

    In other words: if you plan your LLM or LLM-powered app to have more than a single user, you’re going to need to think about — or someone will need to think about — how to optimize that usage and gain efficiencies from users querying the same tokens in order to avoid filling up your precious server space with redundant computing operations.

    To deal with this challenge, Borisov and his co-founders who worked at IMO Messenger with its 200 million users relied upon their prior experience “running large fleets of servers in data centers around the world with the right connectivity.”

    Top investor endorsement

    The three co-founders are the equivalent of “international programming Olympic gold medal winners,” according to Aydin Senkut, the legendary serial entrepreneur and founder and managing partner of Felicis, who joined VentureBeat’s call to explain why his firm backed DeepInfra. “They actually have an insane experience. I think other than the WhatsApp team, they are maybe first or second in the world to having the capability to build efficient infrastructure to serve hundreds of millions of people.”

    It’s this efficiency at building server infrastructure and compute resources that allow DeepInfra to keep its costs so low, and what Senkut in particular was attracted to when considering the investment.

    When it comes to AI and LLMs, “the use cases are endless, but cost is a big factor,” observed Senkut. “Everybody’s singing the praises of the potential, yet everybody’s complaining about the cost. So if a company can have up to a 10x cost advantage, it could be a huge market disrupter.”

    That’s not only the case for DeepInfra, but the customers who rely on it and seek to leverage LLM tech affordably in their applications and experiences.

    Targeting SMBs with open-source AI offerings

    For now, DeepInfra plans to target small-to-medium sized businesses (SMBs) with its inference hosting offerings, as those companies tend to be the most cost sensitive.

    “Our initial target customers are essentially people wanting to just get access to the large open source language models and other machine learning models that are state of the art,” Borisov told VentureBeat.

    As a result, DeepInfra plans to keep a close watch on the open source AI community and the advances occurring there as new models are released and tuned to achieve greater and greater and more specialized performance for different classes of tasks, from text generation and summarization to computer vision applications to coding.

    “We firmly believe there will be a large deployment and variety and in general, the open source way to flourish,” said Borisov. “Once a large good language models like Llama gets published, then there’s a ton of people who can basically build their own variants of them with not too much computation needed…that’s kind of the flywheel effect there where more and more effort is being put into same ecosystem.”

    That thinking tracks with VentureBeat’s own analysis that the open source LLM and generative AI community had a banner year, and will likely eclipse usage of OpenAI’s GPT-4 and other closed models since the costs to running them are so much lower, and there are fewer barriers built-in to the process of fine-tuning them to specific use cases.

    “We are constantly trying to onboard new models that are just coming out,” Borisov said. “One common thing is people are looking for a longer context model… that’s definitely going to be the future.”

    Borisov also believes DeepInfra’s inference hosting service will win fans among those enterprises concerned about data privacy and security. “We don’t really store or use any of the prompts people put in,” he noted, as those are immediately discarded once the model chat window closes.

    VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.

    [ad_2]

    Source link

    Previous ArticleSnap adds ChatGPT to its AR with a focus on AI at Lens Fest
    Next Article OpenAI rolls out GPTs to all subscribers despite DDoS attack
    The AI Book

    Related Posts

    Daily AI News

    Adobe Previews New GenAI Tools for Video Workflows

    16 April 2024
    Daily AI News

    Exciting Updates From Stanford HAI’s Seventh Annual AI Index Report

    15 April 2024
    Daily AI News

    8 Reasons to Make the Switch

    15 April 2024
    Add A Comment
    Leave A Reply Cancel Reply

    • Privacy Policy
    • Terms and Conditions
    • About Us
    • Contact Form
    © 2026 The AI Book.

    Type above and press Enter to search. Press Esc to cancel.