Google Gemini API

Overview

With the Gemini API, you can leverage Google's latest generative models. After getting familiar with the general functionality offered by the API, try a quickstart in your preferred language to start developing.
Note: If you're new to generative AI models, visit the Concepts Guide, or start prototyping prompts in Google AI Studio.

Models

Gemini is a family of multimodal generative AI models developed by Google. Gemini models can accept text and images (depending on the model variant you choose) in a prompt and output a text response. Legacy PaLM models accept plain text and output text responses.
For more detailed model information, see the Models page. You can also use the list_models method to list all available models, and then use the get_model method to get the metadata of a specific model.
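As a minimal sketch, the model listing and metadata calls above can be wrapped in small helpers using the google-generativeai Python SDK. This assumes the package is installed and an API key is available in the GOOGLE_API_KEY environment variable; the filtering helper is illustrative, not part of the SDK.

```python
# Sketch: list available models and fetch metadata for one of them
# using the SDK's list_models and get_model methods (assumed setup:
# google-generativeai installed, GOOGLE_API_KEY set in the environment).
import os


def supports_generation(supported_methods):
    """Return True if a model's supported methods include content generation."""
    return "generateContent" in supported_methods


def list_generation_models():
    """List the names of all available models that can generate content."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    return [m.name for m in genai.list_models()
            if supports_generation(m.supported_generation_methods)]


def describe_model(name="models/gemini-pro"):
    """Fetch metadata (e.g. token limits) for a specific model."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.get_model(name)
    return {"name": model.name, "input_token_limit": model.input_token_limit}
```

Filtering on the supported generation methods is useful because not every model in the list accepts generateContent requests; embedding-only models, for example, do not.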

Prompt Data and Design

Some Gemini models accept both image and text data as inputs. This capability provides many additional possibilities for content generation, data analysis, and problem-solving. You'll need to consider some limitations and requirements, including the general input token limits of the model you are using. Refer to Gemini models for model-specific token limitations.

Image Requirements for Prompts

Prompts using image data are subject to the following limitations and requirements:
  • Images must be one of the following image data MIME types:
    • PNG - image/png
    • JPEG - image/jpeg
    • WEBP - image/webp
    • HEIC - image/heic
    • HEIF - image/heif
  • A prompt may include up to 16 images.
  • The entire prompt (including images and text) may not exceed 4 MB.
  • There are no specific limits on the number of pixels in an image; however, larger images are scaled down to fit within a maximum resolution of 3072 x 3072 while preserving their original aspect ratio.
When using images in prompts, follow these recommendations for the best results:
  • Prompts containing a single image tend to produce better results.
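The limits above can be checked client side before sending a request. The following is an illustrative helper built only from the limits stated on this page; it is not part of the Gemini SDK.

```python
# Sketch: validate an image prompt against the documented limits
# (allowed MIME types, at most 16 images, 4 MB total for images + text).
# The constants mirror this page; the helper itself is hypothetical.
ALLOWED_MIME_TYPES = {"image/png", "image/jpeg", "image/webp",
                      "image/heic", "image/heif"}
MAX_IMAGES = 16
MAX_PROMPT_BYTES = 4 * 1024 * 1024  # 4 MB, images and text combined


def validate_image_prompt(text, images):
    """images: list of (mime_type, data_bytes) pairs. Returns a list of problems."""
    problems = []
    for mime, _ in images:
        if mime not in ALLOWED_MIME_TYPES:
            problems.append(f"unsupported MIME type: {mime}")
    if len(images) > MAX_IMAGES:
        problems.append(f"too many images: {len(images)} > {MAX_IMAGES}")
    total = len(text.encode("utf-8")) + sum(len(data) for _, data in images)
    if total > MAX_PROMPT_BYTES:
        problems.append(f"prompt too large: {total} bytes > {MAX_PROMPT_BYTES}")
    return problems
```

An empty result means the prompt passes all of the documented checks; note that images are scaled down server side, so pixel dimensions need no client-side check.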

Prompt Design and Text Input

Creating effective prompts (prompt engineering) is a combination of art and science. For guidance on how to prompt, refer to the Prompting Guide; for different prompting approaches, refer to the Prompt 101 guide.

Generate Content

With the Gemini API, you can use text and image data for prompting, depending on the model variant you use. For example, you can use text prompts to generate text through the gemini-pro model, and prompt the gemini-pro-vision model with text and image data. For a detailed example covering all parameters, see the generateContent API reference documentation.
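A hedged sketch of both approaches with the google-generativeai Python SDK follows. It assumes the package is installed and GOOGLE_API_KEY is set; the inline-data form of the image part is one of the ways the SDK accepts images.

```python
# Sketch: text-only prompting with gemini-pro, and text + image prompting
# with gemini-pro-vision (assumed setup: google-generativeai installed,
# GOOGLE_API_KEY set in the environment).
import os


def generate_text(prompt):
    """Text-in, text-out generation with the gemini-pro model."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro")
    return model.generate_content(prompt).text


def generate_from_image(prompt, png_bytes):
    """Text + image prompting with the gemini-pro-vision model.

    The image is passed as an inline-data part with its MIME type.
    """
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro-vision")
    parts = [prompt, {"mime_type": "image/png", "data": png_bytes}]
    return model.generate_content(parts).text
```

Note that the multimodal call simply passes a list of parts (strings and image data) where the text-only call passes a single string; the rest of the request is unchanged.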

Embeddings

The embeddings service in the Gemini API generates advanced embeddings for words, phrases, and sentences. The resulting embeddings can then be used for NLP tasks such as semantic search, text classification, and clustering, among others. See the Embeddings Guide for an introduction to what embeddings are and some key use cases for the embeddings service to help you get started.
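As a sketch of the semantic-search use case above: one helper fetches an embedding through the SDK's embed_content method (the embedding-001 model name and the task_type parameter are assumptions based on the SDK), and a plain-Python cosine similarity ranks the results.

```python
# Sketch: generate embeddings and compare them for semantic search
# (assumed setup: google-generativeai installed, GOOGLE_API_KEY set;
# model name "models/embedding-001" is an assumption).
import math
import os


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def embed_text(text, task_type="retrieval_document"):
    """Get an embedding vector for a word, phrase, or sentence."""
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    result = genai.embed_content(model="models/embedding-001",
                                 content=text, task_type=task_type)
    return result["embedding"]
```

For semantic search, you would embed each document once, embed the query at request time, and rank documents by cosine similarity to the query vector.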

Next Steps

  • See the Google AI Studio Quickstart to get started with the Google AI Studio interface.
  • Check out the Python, Go, or Node.js quickstarts to try server-side access to the Gemini API.
  • Refer to the Web Quickstart to start building web applications.
  • Follow the Swift Quickstart or Android Quickstart to start building mobile applications.
  • If you're already a Google Cloud user (or are looking to use Gemini on Vertex to leverage the powerful Google Cloud ecosystem), see Generative AI on Vertex AI for details.