Google Gemini API

Overview

With the Gemini API, you can leverage Google's latest generative models. After getting familiar with the general functionality offered by the API, try a quickstart in your preferred language to start developing.
Note: If you're new to generative AI models, visit the Concepts Guide, or start prototyping prompts in Google AI Studio.

Models

Gemini is a family of multimodal generative AI models developed by Google. Gemini models can accept text and images (depending on the model variant you choose) in a prompt and output a text response. Legacy PaLM models accept plain text and output text responses.
For more detailed model information, see the Models page. You can also use the list_models method to list all available models, and then use the get_model method to get the metadata of a specific model.
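As a minimal sketch, the model listing and metadata calls above can be wrapped in small helpers using the google-generativeai Python SDK. This assumes the package is installed and an API key is available in the GOOGLE_API_KEY environment variable; the filtering helper is illustrative, not part of the SDK.

```python
# Sketch: list available models and fetch metadata for one of them
# using the SDK's list_models and get_model methods (assumed setup:
# google-generativeai installed, GOOGLE_API_KEY set in the environment).
import os


def supports_generation(supported_methods):
    """Return True if a model's supported methods include content generation."""
    return "generateContent" in supported_methods


def list_generation_models():
    """List the names of all available models that can generate content."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    return [m.name for m in genai.list_models()
            if supports_generation(m.supported_generation_methods)]


def describe_model(name="models/gemini-pro"):
    """Fetch metadata (e.g. token limits) for a specific model."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.get_model(name)
    return {"name": model.name, "input_token_limit": model.input_token_limit}
```

Filtering on the supported generation methods is useful because not every model in the list accepts generateContent requests; embedding-only models, for example, do not.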

Prompt Data and Design

Some Gemini models accept both image and text data as inputs. This capability provides many additional possibilities for content generation, data analysis, and problem-solving. You'll need to consider some limitations and requirements, including the general input token limits of the model you are using. Refer to Gemini models for model-specific token limitations.

Image Requirements for Prompts

Prompts using image data are subject to the following limitations and requirements:
  • Images must be one of the following image data MIME types:
    • PNG - image/png
    • JPEG - image/jpeg
    • WEBP - image/webp
    • HEIC - image/heic
    • HEIF - image/heif
  • A prompt may include up to 16 images.
  • The entire prompt (including images and text) may not exceed 4 MB.
  • There are no specific limits on the number of pixels in an image; however, larger images are scaled down to fit within a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.
When using images in prompts, follow these recommendations for the best results:
  • Prompts containing a single image tend to produce better results.
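The limits above can be checked client side before sending a request. The following is an illustrative helper built only from the limits stated on this page; it is not part of the Gemini SDK.

```python
# Sketch: validate an image prompt against the documented limits
# (allowed MIME types, at most 16 images, 4 MB total for images + text).
# The constants mirror this page; the helper itself is hypothetical.
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp",
                      "image/heic", "image/heif"}
MAX_IMAGES = 16
MAX_PROMPT_BYTES = 4 * 1024 * 1024  # 4 MB, images and text combined


def validate_image_prompt(text, images):
    """images: list of (mime_type, data_bytes) pairs. Returns a list of problems."""
    problems = []
    for mime, _ in images:
        if mime not in ALLOWED_MIME_TYPES:
            problems.append(f"unsupported MIME type: {mime}")
    if len(images) > MAX_IMAGES:
        problems.append(f"too many images: {len(images)} > {MAX_IMAGES}")
    total = len(text.encode("utf-8")) + sum(len(data) for _, data in images)
    if total > MAX_PROMPT_BYTES:
        problems.append(f"prompt too large: {total} bytes > {MAX_PROMPT_BYTES}")
    return problems
```

An empty result means the prompt passes all of the documented checks; note that images are scaled down server side, so pixel dimensions need no client-side check.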

Prompt Design and Text Input

Creating effective prompts (prompt engineering) is a combination of art and science. For guidance on how to prompt, refer to the Prompting Guide; for different prompting approaches, refer to the Prompt 101 guide.

Generate Content

With the Gemini API, you can use text and image data for prompting, depending on the model variant you use. For example, you can use text prompts to generate text through the gemini-pro model, and prompt the gemini-pro-vision model with text and image data. For a detailed example covering all parameters, see the generateContent API reference documentation.
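A hedged sketch of both approaches with the google-generativeai Python SDK follows. It assumes the package is installed and GOOGLE_API_KEY is set; the inline-data form of the image part is one of the ways the SDK accepts images.

```python
# Sketch: text-only prompting with gemini-pro, and text + image prompting
# with gemini-pro-vision (assumed setup: google-generativeai installed,
# GOOGLE_API_KEY set in the environment).
import os


def generate_text(prompt):
    """Text-in, text-out generation with the gemini-pro model."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro")
    return model.generate_content(prompt).text


def generate_from_image(prompt, png_bytes):
    """Text + image prompting with the gemini-pro-vision model.

    The image is passed as an inline-data part with its MIME type.
    """
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro-vision")
    parts = [prompt, {"mime_type": "image/png", "data": png_bytes}]
    return model.generate_content(parts).text
```

Note that the multimodal call simply passes a list of parts (strings and image data) where the text-only call passes a single string; the rest of the request is unchanged.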

Embeddings

The embeddings service in the Gemini API generates advanced embeddings for words, phrases, and sentences. The resulting embeddings can then be used for NLP tasks such as semantic search, text classification, and clustering, among others. See the Embeddings Guide for an introduction to what embeddings are and some key use cases for the embeddings service to help you get started.
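As a sketch of the semantic-search use case above: one helper fetches an embedding through the SDK's embed_content method (the embedding-001 model name and the task_type parameter are assumptions based on the SDK), and a plain-Python cosine similarity ranks the results.

```python
# Sketch: generate embeddings and compare them for semantic search
# (assumed setup: google-generativeai installed, GOOGLE_API_KEY set;
# model name "models/embedding-001" is an assumption).
import math
import os


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embed_text(text, task_type="retrieval_document"):
    """Get an embedding vector for a word, phrase, or sentence."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    result = genai.embed_content(model="models/embedding-001",
                                 content=text, task_type=task_type)
    return result["embedding"]
```

For semantic search, you would embed each document once, embed the query at request time, and rank documents by cosine similarity to the query vector.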

Next Steps

  • See the Google AI Studio Quickstart to get started with the Google AI Studio interface.
  • Check out the Python, Go, or Node.js quickstarts to try server-side access to the Gemini API.
  • Refer to the Web Quickstart to start building web applications.
  • Follow the Swift Quickstart or Android Quickstart to start building mobile applications.
  • If you're already a Google Cloud user (or are looking to use Gemini on Vertex to leverage the powerful Google Cloud ecosystem), see Generative AI on Vertex AI for details.