The AI Book

    Stable Diffusion 3.0 debuts new diffusion transformation architecture to reinvent text-to-image gen AI

22 February 2024


    Stability AI is out today with an early preview of its Stable Diffusion 3.0 next-generation flagship text-to-image generative AI model. 

Stability AI has been steadily iterating and releasing multiple image models over the past year, each showing increasing levels of sophistication and quality. The SDXL release in July dramatically improved the Stable Diffusion base model, and now the company is looking to go significantly further.

The new Stable Diffusion 3.0 model aims to provide improved image quality and better performance when generating images from multi-subject prompts. It will also provide significantly better typography than prior Stable Diffusion models, enabling more accurate and consistent spelling inside generated images. Typography has been an area of weakness for Stable Diffusion in the past, and one that rivals including DALL-E 3, Ideogram and Midjourney have also been working on in recent releases. Stability AI is building out Stable Diffusion 3.0 in multiple model sizes ranging from 800M to 8B parameters.

Stable Diffusion 3.0 isn’t just a new version of a model that Stability AI has already released; it is based on a new architecture.


    “Stable Diffusion 3 is a diffusion transformer, a new type of architecture similar to the one used in the recent OpenAI Sora model,” Emad Mostaque, CEO of Stability AI told VentureBeat. “It is the real successor to the original Stable Diffusion.”

    Diffusion transformers and flow matching will enable a new era of image generation

    Stability AI has been experimenting with multiple types of approaches for generating images.

    Earlier this month the company released a preview of Stable Cascade that uses the Würstchen architecture to improve performance and accuracy. Stable Diffusion 3.0 is taking a different approach by using diffusion transformers.

    “Stable Diffusion did not have a transformer before,” Mostaque said.

Transformers are at the foundation of much of the gen AI revolution and are widely used as the basis of text generation models, while image generation has largely been the realm of diffusion models. The research paper that details Diffusion Transformers (DiTs) explains that DiT is a new architecture for diffusion models that replaces the commonly used U-Net backbone with a transformer operating on latent image patches. The DiT approach can use compute more efficiently and can outperform other forms of diffusion image generation.
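The key structural move the DiT paper describes is "patchifying": cutting the latent image into non-overlapping patches and flattening each one into a token, so a standard transformer can process the resulting sequence in place of a U-Net. A minimal sketch of that step (pure Python, with hypothetical shapes chosen for illustration):

```python
def patchify(latent, patch_size):
    """Split an H x W x C latent (nested lists) into flattened patch tokens."""
    H, W, C = len(latent), len(latent[0]), len(latent[0][0])
    assert H % patch_size == 0 and W % patch_size == 0
    tokens = []
    for i in range(0, H, patch_size):          # walk patch rows
        for j in range(0, W, patch_size):      # walk patch columns
            token = []
            for di in range(patch_size):
                for dj in range(patch_size):
                    token.extend(latent[i + di][j + dj])  # append C channels
            tokens.append(token)  # one token per patch, length p*p*C
    return tokens

# Example: an 8x8 latent with 4 channels, 2x2 patches -> 16 tokens of length 16
latent = [[[0.0] * 4 for _ in range(8)] for _ in range(8)]
tokens = patchify(latent, 2)
print(len(tokens), len(tokens[0]))  # 16 16
```

In the real model each token is then linearly embedded and fed through transformer blocks; this sketch only shows why a diffusion model's latent can be treated as a token sequence at all.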

The other big innovation that Stable Diffusion 3.0 benefits from is flow matching. The research paper on flow matching explains that it is a new method for training Continuous Normalizing Flows (CNFs) to model complex data distributions. According to the researchers, using Conditional Flow Matching (CFM) with optimal transport paths leads to faster training, more efficient sampling, and better performance compared to diffusion paths.
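The optimal transport path the researchers describe is a straight line between a noise sample and a data sample, which makes the regression target for the learned velocity field trivially simple. A toy sketch of that training target (one-dimensional data, pure Python; the variable names are illustrative, not from the paper's code):

```python
def cfm_training_pair(x0, x1, t):
    """Return (x_t, target velocity) for the straight-line (OT) conditional path.

    x0: a noise sample, x1: a data sample, t: time in [0, 1].
    """
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight-line path at time t
    v_target = x1 - x0              # the OT path has constant velocity
    return x_t, v_target

# Example: noise x0 = 0.0, data x1 = 2.0, halfway along the path at t = 0.5
x_t, v = cfm_training_pair(0.0, 2.0, 0.5)
print(x_t, v)  # 1.0 2.0
```

Training then amounts to regressing a network's predicted velocity at (x_t, t) onto v_target, which is what makes CFM cheaper to train and sample than curved diffusion paths.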

    Credit: Stability AI (generated with Stable Diffusion 3.0)

    Stable Diffusion has learned how to spell

    The improved typography in Stable Diffusion 3.0 is the result of several improvements that Stability AI has built into the new model.

    “This is thanks to both the transformer architecture and additional text encoders,” Mostaque said. “Full sentences are now possible as is coherent style.” 

    While Stable Diffusion 3.0 is initially being demonstrated as a text-to-image gen AI technology, it will be the basis for much more. Stability AI has also been building out 3D image generation as well as video generation capabilities in recent months.

    “We make open models that can be used anywhere and adapted to any need,” Mostaque said. “This is a series of models across sizes and will underpin the development of our next generation visual models, including video, 3D, and more.”


