The AI Book
    Daily AI News

    Arize AI wants to improve enterprise LLMs with ‘Prompt Playground,’ new data analysis tools

30 August 2023 · 6 min read




    We all know enterprises are racing at varying speeds to analyze and reap the benefits of generative AI — ideally in a smart, secure and cost-effective way. Survey after survey over the last year has shown this.

But once an organization identifies a large language model (LLM) — or several — it wishes to use, the hard work is far from over. Deploying the LLM in a way that actually benefits the organization requires understanding which prompts employees or customers should use to generate helpful results — otherwise it is pretty much worthless — as well as which organizational or user data to include in those prompts.

    “You can’t just take a Twitter demo [of an LLM] and put it into the real world,” said Aparna Dhinakaran, cofounder and chief product officer of Arize AI, in an exclusive video interview with VentureBeat. “It’s actually going to fail. And so how do you know where it fails? And how do you know what to improve? That’s what we focus on.”

Three-year-old business-to-business (B2B) machine learning software provider Arize AI would know, as it has focused since day one on making AI more observable (less technical and more understandable) to organizations.


Today at Google’s Cloud Next ’23 conference, the VB Transform award-winning company announced industry-first capabilities for optimizing the performance of enterprise-deployed LLMs: a new “Prompt Playground” for selecting among and iterating on stored prompt templates, and a new retrieval augmented generation (RAG) workflow that helps organizations understand which of their data would be helpful to include in an LLM’s responses.

    Almost a year ago, Arize debuted its initial platform in the Google Cloud Marketplace. Now it is augmenting its presence there with these powerful new features for its enterprise customers.

    Prompt Playground and new workflows

    Arize’s new prompt engineering workflows, including the Prompt Playground, enable teams to uncover poorly performing prompt templates, iterate on them in real time, and verify improved LLM outputs before deployment.

    Screenshot of Arize AI’s Prompt Playground tool. Credit: Arize AI

Prompt analysis is an important but often overlooked part of troubleshooting an LLM’s performance; responses can often be improved simply by testing different prompt templates, or by iterating on an existing one.

    With these new workflows, teams can easily:

    • Uncover responses with poor user feedback or evaluation scores
    • Identify the underlying prompt template associated with poor responses
    • Iterate on the existing prompt template to improve coverage of edge cases
    • Compare responses across prompt templates in the Prompt Playground prior to implementation
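The article does not show Arize’s API, but the four-step loop above can be sketched in plain Python. The template names, feedback logs and scoring helper below are invented for illustration:

```python
# Hypothetical sketch of the prompt-troubleshooting loop described above.
# Template names, logs and scoring are invented; this is not Arize's API.

# Logged responses with user feedback (1 = thumbs-up, 0 = thumbs-down),
# each tagged with the prompt template that produced it.
logs = [
    {"template": "support_v1", "question": "Can I get a refund?",    "feedback": 1},
    {"template": "support_v1", "question": "Do you ship to Canada?", "feedback": 0},
    {"template": "support_v2", "question": "Do you ship to Canada?", "feedback": 1},
    {"template": "support_v1", "question": "Is my data private?",    "feedback": 0},
]

def score_by_template(logs):
    """Steps 1-2: aggregate feedback per template to surface weak ones."""
    totals = {}
    for entry in logs:
        ok, n = totals.get(entry["template"], (0, 0))
        totals[entry["template"]] = (ok + entry["feedback"], n + 1)
    return {t: ok / n for t, (ok, n) in totals.items()}

scores = score_by_template(logs)
worst = min(scores, key=scores.get)  # the template to iterate on (step 3)
print(scores)  # {'support_v1': 0.333..., 'support_v2': 1.0}
print(worst)   # support_v1
```

Step 4 — comparing a revised template against the original before rollout — would then amount to re-running the same scoring over responses generated by each candidate.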

    As Dhinakaran explained, prompt engineering is absolutely key to staying competitive with LLMs in the market today. The company’s new prompt analysis and iteration workflows help teams ensure their prompts cover necessary use cases and potential edge scenarios that may come up with real users.

    “You’ve got to make sure that the prompt you’re putting into your model is pretty damn good to stay competitive,” Dhinakaran said. “What we launched helps teams engineer better prompts for better performance. That’s as simple as it is: We help you focus on making sure that that prompt is performant and covers all of these cases that you need it to handle.”

For example, prompts for an education LLM chatbot must guard against inappropriate responses, while customer service prompts should cover edge cases and nuances around which services are and are not offered.
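As a concrete illustration of baking such guardrails into a template (the wording below is invented, not taken from Arize or any real deployment):

```python
# Illustrative prompt template with explicit edge-case guardrails.
# The template text is invented for this example.
EDU_TEMPLATE = (
    "You are a tutoring assistant for middle-school students.\n"
    "Answer the question below in age-appropriate language.\n"
    "If the question is off-topic or inappropriate, politely decline.\n"
    "Question: {question}"
)

prompt = EDU_TEMPLATE.format(question="What causes seasons on Earth?")
print(prompt.splitlines()[-1])  # Question: What causes seasons on Earth?
```

Iterating on a template then means editing these instructions and re-checking responses against the edge cases they are meant to cover.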

    Arize is also providing the industry’s first insights into the private or contextual data that influences LLM outputs — what Dhinakaran called the “secret sauce” companies provide. The company uniquely analyzes embeddings to evaluate the relevance of private data fused into prompts.

“What we rolled out is a way for AI teams to now monitor, look at their prompts, make it better, and then also, specifically understand the private data that’s now being put into those prompts,” Dhinakaran said.

Dhinakaran told VentureBeat that enterprises can deploy Arize’s solutions on premises for security reasons, and that the platform is SOC 2 compliant.

    The importance of private organizational data

    These new capabilities enable examination of whether the right context is present in prompts to handle real user queries. Teams can identify areas where they may need to add more content around common questions lacking coverage in the current knowledge base.

    “No one else out there is really focusing on troubleshooting this private data, which is really like the secret sauce that companies have to influence the prompt,” Dhinakaran noted.

    Arize also launched complementary workflows using search and retrieval to help teams troubleshoot issues stemming from the retrieval component of RAG models.

    These workflows will empower teams to pinpoint where they may need to add additional context into their knowledge base, identify cases where retrieval failed to surface the most relevant information, and ultimately understand why their LLM may have hallucinated or generated suboptimal responses.
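One common way to spot such retrieval failures — and presumably part of what an embeddings-based workflow like this automates — is to compare the embedding of the user’s query against the embeddings of the documents that were retrieved, and flag low-similarity matches. The vectors, filenames and threshold below are toy values for illustration only:

```python
import math

# Hypothetical retrieval-troubleshooting check: flag responses whose
# retrieved document is a poor embedding match for the user's query.
# Toy 3-d vectors stand in for real model embeddings.

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

query_emb = [0.9, 0.1, 0.0]
retrieved = {
    "shipping_policy.md": [0.88, 0.15, 0.02],  # close to the query
    "holiday_hours.md":   [0.05, 0.20, 0.97],  # unrelated document
}

THRESHOLD = 0.5  # similarity cutoff; would be tuned on real data
flagged = [doc for doc, emb in retrieved.items()
           if cosine(query_emb, emb) < THRESHOLD]
print(flagged)  # ['holiday_hours.md'] -- a likely retrieval failure
```

Documents flagged this way point either to a retrieval bug or to a gap in the knowledge base — exactly the distinction the workflows described above are meant to help teams make.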

    Understanding context and relevance — and where they are lacking

    Dhinakaran gave an example of how Arize looks at query and knowledge base embeddings to uncover irrelevant retrieved documents that may have led to a faulty response.

    Screenshot of Arize AI’s embeddings analysis tool. Credit: Arize AI

    “You can click on, let’s say, a user question in our product, and it’ll show you all of the relevant documents that it could have pulled, and which one it did finally pull to actually use in the response,” Dhinakaran explained. Then “you can see where the model may have hallucinated or provided suboptimal responses based on deficiencies in the knowledge base.”

    This end-to-end observability and troubleshooting of prompts, private data, and retrieval is designed to help teams optimize LLMs responsibly after initial deployment, when models invariably struggle to handle real-world variability.

    Dhinakaran summarized Arize’s focus: “We’re not just a day one solution; we help you actually ongoing get it to work.”

    The company aims to provide the monitoring and debugging capabilities organizations are missing, so they can continuously improve their LLMs post-deployment. This allows them to move past theoretical value to real-world impact across industries.

