Gemini breaks new ground: a faster model, longer context and AI agents

We’re introducing a series of updates across the Gemini family of models, including the new 1.5 Flash, our lightweight model for speed and efficiency.

Web & Cloud

May 15, 2024 - 16:57

May 15, 2024 - 18:56

In December, we launched our first natively multimodal model Gemini 1.0 in three sizes: Ultra, Pro and Nano. Just a few months later we released 1.5 Pro, with enhanced performance and a breakthrough long context window of 1 million tokens.

Developers and enterprise customers have been putting 1.5 Pro to use in incredible ways and finding its long context window, multimodal reasoning capabilities, and impressive overall performance incredibly useful.

We know from user feedback that some applications need lower latency and a lower cost to serve. This inspired us to keep innovating, so today, we’re introducing Gemini 1.5 Flash: a model that’s lighter-weight than 1.5 Pro, and designed to be fast and efficient to serve at scale.

Both 1.5 Pro and 1.5 Flash are available in public preview with a 1 million token context window in Google AI Studio and Vertex AI. And now, 1.5 Pro is also available with a 2 million token context window via waitlist to developers using the API and to Google Cloud customers.

Sign up today to receive $350 in free credits and free usage of 20+ products on Google Cloud.

We’re also introducing updates across the Gemini family of models, announcing our next generation of open models, Gemma 2, and sharing progress on the future of AI assistants, with Project Astra.

Context lengths of leading foundation models compared with Gemini 1.5’s 2 million token capability

Gemini family of model updates

The new 1.5 Flash, optimized for speed and efficiency

1.5 Flash is the newest addition to the Gemini model family and the fastest Gemini model served in the API. It’s optimized for high-volume, high-frequency tasks at scale, is more cost-efficient to serve and features our breakthrough long context window.

While it’s a lighter weight model than 1.5 Pro, it’s highly capable of multimodal reasoning across vast amounts of information and delivers impressive quality for its size.

Illustration of icons and text explaining three key features of the new Gemini 1.5 Flash model: speed and efficiency, multimodal reasoning and a long context window.

The new Gemini 1.5 Flash model is optimized for speed and efficiency, is highly capable of multimodal reasoning and features our breakthrough long context window.

1.5 Flash excels at summarization, chat applications, image and video captioning, data extraction from long documents and tables, and more. This is because it’s been trained by 1.5 Pro through a process called “distillation,” where the most essential knowledge and skills from a larger model are transferred to a smaller, more efficient model.

Read more about 1.5 Flash on the Gemini technology page, and learn about 1.5 Flash’s availability and pricing. We’ll share more details in an updated Gemini 1.5 technical report soon.

Significantly improving 1.5 Pro

Over the last few months, we’ve significantly improved 1.5 Pro, our best model for general performance across a wide range of tasks.

Beyond extending its context window to 2 million tokens, we’ve enhanced its code generation, logical reasoning and planning, multi-turn conversation, and audio and image understanding through data and algorithmic advances. We see strong improvements on public and internal benchmarks for each of these tasks.

1.5 Pro can now follow increasingly complex and nuanced instructions, including ones that specify product-level behavior involving role, format and style. We’ve improved control over the model’s responses for specific use cases, like crafting the persona and response style of a chat agent or automating workflows through multiple function calls. And we’ve enabled users to steer model behavior by setting system instructions.

We added audio understanding in the Gemini API and Google AI Studio, so 1.5 Pro can now reason across image and audio for videos uploaded in Google AI Studio. And we’re now integrating 1.5 Pro into Google products, including Gemini Advanced and in Workspace apps.

Read more about 1.5 Pro on the Gemini technology page. More details are coming soon in our updated Gemini 1.5 technical report.

Gemini Nano understands multimodal inputs

Gemini Nano is expanding beyond text-only inputs to include images as well. Starting with Pixel, applications using Gemini Nano with Multimodality will be able to understand the world the way people do — not just through text, but also through sight, sound and spoken language.

Read more about Gemini 1.0 Nano on Android.

Next generation of open models

Today, we’re also sharing a series of updates to Gemma, our family of open models built from the same research and technology used to create the Gemini models.

We’re announcing Gemma 2, our next generation of open models for responsible AI innovation. Gemma 2 has a new architecture designed for breakthrough performance and efficiency, and will be available in new sizes.

The Gemma family is also expanding with PaliGemma, our first vision-language model inspired by PaLI-3. And we’ve upgraded our Responsible Generative AI Toolkit with LLM Comparator for evaluating the quality of model responses.

Progress developing universal AI agents

As part of Google DeepMind’s mission to build AI responsibly to benefit humanity, we’ve always wanted to develop universal AI agents that can be helpful in everyday life. That’s why today, we’re sharing our progress in building the future of AI assistants with Project Astra (advanced seeing and talking responsive agent).

To be truly useful, an agent needs to understand and respond to the complex and dynamic world just like people do — and take in and remember what it sees and hears to understand context and take action. It also needs to be proactive, teachable and personal, so users can talk to it naturally and without lag or delay.

While we’ve made incredible progress developing AI systems that can understand multimodal information, getting response time down to something conversational is a difficult engineering challenge. Over the past few years, we've been working to improve how our models perceive, reason and converse to make the pace and quality of interaction feel more natural.

Tags:

See how GPT-4o can revolutionize your world.

Web & Cloud Need help implementing innovative technology, with tech business management, or tech support/Helpdesk? Web & Cloud is here to take the heavy load off your shoulders! We’ve been serving the global market, offering top-notch tech implementation and support services since 2003. Request an obligation-free quote from My.webandcloud.com and let's discuss your challenges.

Sponsor to Give Hope, Transform, and Uplift Lives.

	Need help implementing innovative technology, with tech support or management? You can count on us.
	24-7 Press Release - Let's distribute your Press Releases to traditional and digital media outlets. Get started!
	Reliable Website Security Solutions, built for small businesses, web professionals, and enterprise organizations.
	Paternity Lab - bringing DNA Paternity Testing closer to people. We offer accurate, affordable, and easy DNA Paternity Testing. Also at home.
	Rexing USA - exclusive cameras, car gadgets, and EV accessories with unique designs, innovative technology, and in affordable price ranges.

The Rising Wave of Blockchain Technology Adop...

HackaTRON Season 7 Launches With Google Cloud...

Skybridge Founder: Kamala Harris Open-Minded ...

Auradine Ships 3nm Teraflux Bitcoin Mining Pl...

Wazirx Details Plan to Resume Withdrawals and...

Agentic AI Leaders to Showcase Latest Advance...

NVIDIA Releases NIM Microservices to Safeguar...

How AI Is Enhancing Surgical Safety and Educa...

NVIDIA and IQVIA Build Domain-Expert Agentic ...

AI Gets Real for Retailers: 9 Out of 10 Retai...

Alleged Co-Founder of Garantex Arrested in India

Feds Link $150M Cyberheist to 2022 LastPass H...

Who is the DOGE and X Technician Branden Spikes?

Notorious Malware, Spam Host “Prospero” Moves...

U.S. Soldier Charged in AT&T Hack Searched “C...

Gemini breaks new ground: a faster model, longer context and AI agents

We’re introducing a series of updates across the Gemini family of models, including the new 1.5 Flash, our lightweight model for speed and efficiency.

Gemini family of model updates

The new 1.5 Flash, optimized for speed and efficiency

Significantly improving 1.5 Pro

Gemini Nano understands multimodal inputs

Next generation of open models

Progress developing universal AI agents

Tags:

See how GPT-4o can revolutionize your world.

AiM Future Introduces Next-Generation NeuroMosAIc Processors

Marquis Who's Who Honors Moumine Ballo, PhD, for Expert...

Seamless in Seattle: NVIDIA Research Showcases Advancem...

June 2023 Patch Tuesday: 78 Vulnerabilities with 6 Rate...

Change language

SPONSORED

Recommended for you

Great Opportunity You Can't Reject! (No, Seriously...

Pause and let's talk about responsible spending an...

Experts Estimate £20 Million+ Loss from Heathrow A...

Welcome to ProtoPie

Ready to turn your innovative tech business dream ...

Gold Could Surge to $40,000 per Ounce, Strategist ...

Web & Cloud - Engineering Tech for a Better Tomorrow!

Introducing: Techatty Aerospace

Gemini breaks new ground: a faster model, longer context and AI agents

We’re introducing a series of updates across the Gemini family of models, including the new 1.5 Flash, our lightweight model for speed and efficiency.

Gemini family of model updates

The new 1.5 Flash, optimized for speed and efficiency

Significantly improving 1.5 Pro

Gemini Nano understands multimodal inputs

Next generation of open models

Progress developing universal AI agents

Tags:

Related Posts

Change language

SPONSORED

Recommended for you