Close Menu
The AI Book
    Facebook X (Twitter) Instagram
    The AI BookThe AI Book
    • Home
    • Categories
      • AI Media Processing
      • AI Language processing (NLP)
      • AI Marketing
      • AI Business Applications
    • Guides
    • Contact
    Subscribe
    Facebook X (Twitter) Instagram
    The AI Book
    Daily AI News

    Why GPT-4 is vulnerable to multimodal prompt injection image attacks

    24 October 2023No Comments5 Mins Read

    [ad_1]

    VentureBeat presents: AI Unleashed – An exclusive executive event for enterprise data leaders. Network and learn with industry peers. Learn More


    OpenAI’s new GPT-4V release supports image uploads — creating a whole new attack vector making large language models (LLMs) vulnerable to multimodal injection image attacks. Attackers can embed commands, malicious scripts and code in images, and the model will comply. 

    Multimodal prompt injection image attacks can exfiltrate data, redirect queries, create misinformation and perform more complex scripts to redefine how an LLM interprets data. They can redirect an LLM to ignore its previous safety guardrails and perform commands that can compromise an organization in ways from fraud to operational sabotage.    

    While all businesses that have adopted LLMs as part of their workflows are at risk, those that rely on LLMs to analyze and classify images as a core part of their business have the greatest exposure. Attackers using various techniques could quickly change how images are interpreted and classified, creating more chaotic outcomes due to misinformation. 

    Once an LLM’s prompt is overridden, the chances become greater that it will be even more blind to malicious commands and execution scripts. By embedding commands in a series of images uploaded to an LLM, attackers could launch fraud and operational sabotage while contributing to social engineering attacks. 

    Event

    AI Unleashed

    An exclusive invite-only evening of insights and networking, designed for senior enterprise executives overseeing data stacks and strategies.

     

    Learn More

    Images are an attack vector LLMs can’t defend against 

    Because LLMs don’t have a data sanitization step in their processing, every image is trusted. Just as it is dangerous to let identities roam free on a network with no access controls for each data set, application or resource, the same holds for images uploaded into LLMs. Enterprises with private LLMs must adopt least privilege access as a core cybersecurity strategy.

    Simon Willison detailed why GPT-4V is a primary vector for prompt injection attacks in a recent blog post, observing that LLMs are fundamentally gullible.

    “(LLMs’) only source of information is their training data combined with the information you feed them,” Willison writes. “If you feed them a prompt that includes malicious instructions — however those instructions are presented — they will follow those instructions.”

    Willison has also shown how prompt injection can hijack autonomous AI agents like Auto-GPT. He explained how a simple visual prompt injection could start with commands embedded in a single image, followed by an example of a visual prompt injection exfiltration attack. 

    According to Paul Ekwere, senior manager for data analytics and AI at BDO UK, “prompt injection attacks pose a serious threat to the security and reliability of LLMs, especially vision-based models that process images or videos. These models are widely used in various domains, such as face recognition, autonomous driving, medical diagnosis and surveillance.”

    OpenAI doesn’t yet have a solution for shutting down multimodal prompt injection image attacks — users and enterprises are on their own. An Nvidia Developer blog post provides prescriptive guidance, including enforcing least privilege access to all data stores and systems.

    How multimodal prompt injection image attacks work

    Multimodal prompt injection attacks exploit the gaps in how GPT-4V processes visual imagery to execute malicious commands that go undetected. GPT-4V relies on a vision transformer encoder to convert an image into a latent space representation. The image and text data are combined to create a response. 

    The model has no method to sanitize visual input before it’s encoded. Attackers could embed as many commands as they want and GPT-4 would see them as legitimate. Attackers automating a multimodal prompt injection attack against private LLMs would go unnoticed.

    Containing injection image attacks

    What’s troubling about images as an unprotected attack vector is that attackers could render the data LLMs train to be less credible and have lower fidelity over time.  

    A recent study provides guidelines on how LLMs can better protect themselves against prompt injection attacks. Looking to identify the extent of risks and potential solutions, a team of researchers sought to determine how effective attacks are at penetrating LLM-integrated applications, and it is noteworthy for its methodology. The team found that 31 LLM-integrated applications are vulnerable to injection.

    The study made the following recommendations for containing injection image attacks:

    Improve the sanitation and validation of user inputs

    For enterprises standardizing on private LLMs, identity-access management (IAM) and least privilege access are table stakes. LLM providers need to consider how image data can be more sanitized before passing them along for processing. 

    Improve the platform architecture and separate user input from system logic

    The goal should be to remove the risk of user input directly affecting the code and data of an LLM. Any image prompt needs to be processed so that it doesn’t impact internal logic or workflows.  

    Adopt a multi-stage processing workflow to identify malicious attacks

    Creating a multi-stage process to trap image-based attacks early can help manage this threat vector.  

    Custom defense prompts that target jailbreaking

    Jailbreaking is a common prompt engineering technique to misdirect LLMs to perform illegal behaviors. Appending prompts to image inputs that appear malicious can help protect LLMs. Researchers caution, however, that advanced attacks could still bypass this approach. 

    A fast-growing threat

    With more LLMs becoming multimodal, images are becoming the newest threat vector attackers can rely on to bypass and redefine guardrails. Image-based attacks could range in severity from simple commands to more complex attack scenarios where industrial sabotage and widespread misinformation are the goal. 

    VentureBeat’s mission is to be a digital town square for technical decision-makers to gain knowledge about transformative enterprise technology and transact. Discover our Briefings.

    [ad_2]

    Source link

    Previous ArticleGoogle Bard appears to be censoring Israel-Palestine responses
    Next Article Amazon’s AI-Powered Van Inspections Give It a Powerful New Data Feed
    The AI Book

    Related Posts

    Daily AI News

    Adobe Previews New GenAI Tools for Video Workflows

    16 April 2024
    Daily AI News

    Exciting Updates From Stanford HAI’s Seventh Annual AI Index Report

    15 April 2024
    Daily AI News

    8 Reasons to Make the Switch

    15 April 2024
    Add A Comment
    Leave A Reply Cancel Reply

    • Privacy Policy
    • Terms and Conditions
    • About Us
    • Contact Form
    © 2026 The AI Book.

    Type above and press Enter to search. Press Esc to cancel.