The AI Book

    The GAIA benchmark: Next-gen AI faces off against real-world challenges

27 November 2023

    A new artificial intelligence benchmark called GAIA aims to evaluate whether chatbots like ChatGPT can demonstrate human-like reasoning and competence on everyday tasks. 

    Created by researchers from Meta, Hugging Face, AutoGPT and GenAI, the benchmark “proposes real-world questions that require a set of fundamental abilities such as reasoning, multi-modality handling, web browsing, and generally tool-use proficiency,” the researchers wrote in a paper published on arXiv.

    The researchers said GAIA questions are “conceptually simple for humans yet challenging for most advanced AIs.” They tested the benchmark on human respondents and GPT-4, finding that humans scored 92 percent while GPT-4 with plugins scored only 15 percent.

    credit: arxiv.org

    “This notable performance disparity contrasts with the recent trend of LLMs [large language models] outperforming humans on tasks requiring professional skills in e.g. law or chemistry,” the paper states.


    GAIA focuses on human-like competence, not expertise 

    Rather than focusing on tasks difficult for humans, the researchers suggest benchmarks should target tasks that demonstrate an AI system has similar robustness to the average human.

The GAIA methodology led the researchers to devise 466 real-world questions with unambiguous answers. Three hundred answers are held privately to power a public GAIA leaderboard, while the remaining 166 questions and answers were released as a development set.

    “Solving GAIA would represent a milestone in AI research,” said lead author Grégoire Mialon of Meta AI. “We believe the successful resolution of GAIA would be an important milestone towards the next generation of AI systems.”

    credit: arxiv.org

    The human vs. AI performance gap

So far, the leading GAIA score belongs to GPT-4 with manually selected plugins, at 30% accuracy. The benchmark's creators suggest that a system able to solve GAIA within a reasonable timeframe could be considered an artificial general intelligence.

    “Tasks that are difficult for humans are not necessarily difficult for recent systems,” the paper states, critiquing the common practice of testing AIs on complex math, science and law exams. 

    Instead, GAIA focuses on questions like, “Which city hosted the 2022 Eurovision Song Contest according to the official website?” and “How many images are there in the latest 2022 Lego Wikipedia article?”
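Because each GAIA question has a single unambiguous answer, scoring reduces to matching a model's response against a ground-truth string. The paper describes this as quasi-exact matching; the sketch below illustrates the idea, but the specific normalization rules (lowercasing, stripping punctuation and articles, numeric comparison) are simplified assumptions for illustration, not the official GAIA scorer.

```python
# Minimal sketch of GAIA-style answer scoring (quasi-exact match).
# The normalization rules here are simplified assumptions, not the
# benchmark's official implementation.
import re
import string


def normalize(answer: str) -> str:
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = answer.lower().strip()
    text = text.translate(str.maketrans("", "", string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def is_correct(prediction: str, ground_truth: str) -> bool:
    """Compare numerically when both parse as numbers, else as normalized text."""
    try:
        return float(prediction) == float(ground_truth)
    except ValueError:
        return normalize(prediction) == normalize(ground_truth)


def accuracy(predictions: list[str], truths: list[str]) -> float:
    """Fraction of predictions that match their ground-truth answers."""
    correct = sum(is_correct(p, t) for p, t in zip(predictions, truths))
    return correct / len(truths)
```

For instance, `is_correct("Turin", "turin.")` accepts a cosmetic mismatch, while the numeric branch treats "15" and "15.0" as the same answer, so a model is not penalized for formatting differences on an otherwise correct response.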

    “We posit that the advent of Artificial General Intelligence (AGI) hinges on a system’s capability to exhibit similar robustness as the average human does on such questions,” the researchers wrote.

    GAIA could shape the future trajectory of AI 

The release of GAIA represents an exciting new direction for AI research that could have broad implications. By focusing on human-like competence at everyday tasks rather than specialized expertise, GAIA pushes the field beyond narrower AI benchmarks.

If future systems can demonstrate human-level common sense, adaptability and reasoning as measured by GAIA, that would suggest they have achieved artificial general intelligence (AGI) in a practical sense. This could accelerate the deployment of AI assistants, services and products.

    However, the authors caution that today’s chatbots still have a long way to go to solve GAIA. Their performance shows current limitations in reasoning, tool use and handling diverse real-world situations.

As researchers rise to the GAIA challenge, their results will reveal progress in making AI systems more capable, general and trustworthy. But benchmarks like GAIA also prompt reflection on how to shape AI that benefits humanity.

    “We believe the successful resolution of GAIA would be an important milestone towards the next generation of AI systems,” the researchers wrote. So in addition to driving technical advances, GAIA could help guide AI in a direction that emphasizes shared human values like empathy, creativity and ethical judgment.

You can view the public GAIA benchmark leaderboard to see which next-generation LLM is currently performing best on this evaluation.
