The AI Book
Daily AI News
What’s a NIM? Nvidia Inference Microservices is a new approach to gen AI model deployment that could change the industry

18 March 2024



    Nvidia is aiming to dramatically accelerate and optimize the deployment of generative AI large language models (LLMs) with a new approach to delivering models for rapid inference.

At Nvidia GTC today, the AI giant is announcing its Nvidia Inference Microservices (NIM) software technology, which packages optimized inference engines, industry-standard APIs and support for AI models into containers for easy deployment. While NIM provides prebuilt models, it also allows organizations to bring their own proprietary data, and it will support and help accelerate Retrieval Augmented Generation (RAG) deployment.

The NIM technology marks a major milestone for gen AI deployment as the foundation of Nvidia’s next-generation strategy for inference, one that will affect almost every model developer and data platform in the space. Nvidia has worked with large software vendors including SAP, Adobe, Cadence, CrowdStrike, Getty Images, ServiceNow and Shutterstock, as well as a long list of data platform vendors including Box, Cohesity, Cloudera, Databricks, Datastax, Dropbox, NetApp and Snowflake, to support NIM.

NIM is part of the Nvidia AI Enterprise software suite, which is getting its 5.0 release today at GTC.


    “We believe that Nvidia NIM is the best software package, the best runtime for developers to build on top of, so that they can focus on the enterprise applications,” Manuvir Das, VP enterprise computing at Nvidia, explained during a press pre-briefing.

    What exactly is Nvidia NIM?

    At the most basic level, a NIM is a container full of microservices. 

The container can include any type of model, from open to proprietary, and can run anywhere there is an Nvidia GPU, whether in the cloud or on a laptop. In turn, that container can be deployed anywhere containers run, which could be a Kubernetes deployment in the cloud, a Linux server or even a serverless Function-as-a-Service model. Nvidia will offer the serverless function approach on its new ai.nvidia.com website, where developers can begin working with NIMs prior to deployment.
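Because NIMs expose industry-standard APIs, querying a deployed container should look much like calling any hosted LLM endpoint. The sketch below assumes an OpenAI-style chat-completions schema served from a local container; the endpoint URL, path and model name are illustrative assumptions, not confirmed details of the NIM API.

```python
import json
import urllib.request


def build_chat_request(model: str, prompt: str, max_tokens: int = 128) -> dict:
    """Build an OpenAI-style chat-completion payload.

    The payload shape follows the widely used chat-completions convention;
    whether a given NIM accepts exactly this schema is an assumption here.
    """
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def query_nim(base_url: str, payload: dict) -> dict:
    """POST the payload to a running NIM container (hypothetical URL)."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


# Build a request for a NIM assumed to be serving locally (model name is made up).
payload = build_chat_request("example-llm", "Summarize what a NIM is.")
# query_nim("http://localhost:8000", payload)  # requires a running container
```

The point of the standard-API design is that swapping a cloud endpoint for a local container only changes `base_url`, not the application code.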

    To be clear, a NIM isn’t a replacement for any prior approach to model delivery from Nvidia. It’s a container that includes a highly optimized model for Nvidia GPUs along with the necessary technologies to improve inference.

In response to a question from VentureBeat during the press briefing, Kari Briski, VP for gen AI software product management, emphasized that Nvidia is a platform company. She noted that all the ways Nvidia has helped to support inference, with tools such as TensorRT and the Triton Inference Server, remain important technologies.

    “What we have found is that putting all these pieces together for a production environment to run gen AI at scale requires a lot of know-how and expertise, so that’s why we’ve packaged it together,” said Briski.

    NIMs will help power responsive RAG capabilities for enterprises

    A strong use case for NIMs will be in support of RAG deployment models.

    “Pretty much every customer we talk to has already implemented dozens or hundreds of these RAGs,” said Das. “The question really is about how do we go to production? How do we take the prototyping that we’ve done, and now deliver real business value by going into production with the use of these models?”

Nvidia and a number of leading data vendors are hoping that NIMs are the answer to that question. Vector database capabilities are critical to enabling RAG, and a number of vector search technologies support NIMs, including Apache Lucene, Datastax, Faiss, Kinetica, Milvus, Redis and Weaviate.
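To make the RAG pattern concrete, here is a minimal, self-contained sketch of the retrieve-then-generate loop that these vector databases accelerate. The toy bag-of-words embedding and in-memory index stand in for a real embedding model and a vector store such as Milvus or Redis; every function name here is illustrative, not part of any vendor's API.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; production RAG uses a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Rank documents by similarity to the query -- the vector-database step."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, docs: list[str]) -> str:
    """Stuff the retrieved context into the prompt sent to the LLM."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs))
    return f"Context:\n{context}\n\nQuestion: {query}"


docs = [
    "NIM packages an optimized inference engine in a container.",
    "Vector databases store embeddings for fast similarity search.",
    "RAG retrieves relevant documents before generation.",
]
prompt = build_prompt("How does RAG use a vector database?", docs)
```

In production, `embed` and the generation call would each be served by a model endpoint, and `retrieve` would be an indexed similarity search in the vector database rather than a linear scan.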

The RAG approach will benefit from the integration of Nvidia NeMo Retriever microservices inside NIM deployments. NeMo Retriever, announced in November 2023, is Nvidia's technology for enabling RAG with optimized data retrieval.

    “When you add a retriever that’s both accelerated and trained on some of the highest quality datasets, it matters,” said Briski.

