
[Google Cloud] Hands-on with the Vertex AI Gemini API and Python SDK


Vertex AI Gemini API

The Vertex AI Gemini API provides a unified interface for interacting with the various Gemini models.

Models currently available in the Gemini API:

  • Gemini 1.0 Pro model (gemini-1.0-pro): Designed to handle natural language tasks, multi-turn text and code chat, and code generation.
  • Gemini 1.0 Pro Vision model (gemini-1.0-pro-vision): Supports multimodal prompts; you can include text, images, and video in your prompt requests and receive responses as text or code.

 

Accessing Vertex AI Workbench
 

1. Open the Google Cloud console ☞ GCP Console

2. In your Google Cloud project, navigate to Vertex AI Workbench and launch JupyterLab

 

 

Using the Gemini 1.0 Pro and Gemini 1.0 Pro Vision Models

 

Overview

Gemini

Gemini is a family of generative AI models developed by Google DeepMind that is designed for multimodal use cases. The Gemini API gives you access to the Gemini Pro and Gemini Pro Vision models.

Vertex AI Gemini API

The Vertex AI Gemini API provides a unified interface for interacting with Gemini models. There are two Gemini 1.0 Pro models available in the Gemini API:

  • Gemini 1.0 Pro model (gemini-1.0-pro): Designed to handle natural language tasks, multi-turn text and code chat, and code generation.
  • Gemini 1.0 Pro Vision model (gemini-1.0-pro-vision): Supports multimodal prompts. You can include text, images, and video in your prompt requests and get text or code responses.

You can interact with the Gemini API using the following methods:

  • Use Vertex AI Studio for quick testing and command generation
  • Use cURL commands
  • Use the Vertex AI SDK

This notebook focuses on using the Vertex AI SDK for Python to call the Vertex AI Gemini API.

For more information, see the Generative AI on Vertex AI documentation.
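The basic SDK flow that this notebook follows can be summarized in a few lines: initialize Vertex AI, load a model, and request a completion. The sketch below is a minimal outline under assumed placeholder values (the project ID and region are yours to fill in), not a cell from this notebook:

import vertexai
from vertexai.generative_models import GenerativeModel

# Assumed placeholders -- replace with your own project and region
vertexai.init(project="[your-project-id]", location="us-central1")

model = GenerativeModel("gemini-1.0-pro")
response = model.generate_content("Say hello in one sentence.")
print(response.text)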

Objectives

In this tutorial, you will learn how to use the Vertex AI Gemini API with the Vertex AI SDK for Python to interact with the Gemini 1.0 Pro (gemini-1.0-pro) model and the Gemini 1.0 Pro Vision (gemini-1.0-pro-vision) model.

You will complete the following tasks:

  • Install the Vertex AI SDK for Python
  • Use the Vertex AI Gemini API to interact with each model
    • Gemini 1.0 Pro (gemini-1.0-pro) model:
      • Generate text from text prompts
      • Explore various features and configuration options
    • Gemini 1.0 Pro Vision (gemini-1.0-pro-vision) model:
      • Generate text from image(s) and text prompts
      • Generate text from video and text prompts

Costs

This tutorial uses billable components of Google Cloud:

  • Vertex AI

Learn about Vertex AI pricing and use the Pricing Calculator to generate a cost estimate based on your projected usage.

Getting Started

Install Vertex AI SDK for Python

In [1]:
! pip3 install --upgrade --user google-cloud-aiplatform
Requirement already satisfied: google-cloud-aiplatform in /opt/conda/lib/python3.10/site-packages (1.58.0)
Collecting google-cloud-aiplatform
  Downloading google_cloud_aiplatform-1.60.0-py2.py3-none-any.whl.metadata (31 kB)
Requirement already satisfied: google-api-core, google-auth, proto-plus, protobuf, packaging, google-cloud-storage, google-cloud-bigquery, google-cloud-resource-manager, shapely, pydantic, docstring-parser, and their transitive dependencies in /opt/conda/lib/python3.10/site-packages
Downloading google_cloud_aiplatform-1.60.0-py2.py3-none-any.whl (5.1 MB)
   ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 5.1/5.1 MB 29.6 MB/s eta 0:00:00
Installing collected packages: google-cloud-aiplatform
  WARNING: The script tb-gcp-uploader is installed in '/home/jupyter/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed google-cloud-aiplatform-1.60.0
 

Restart current runtime

To use the newly installed packages in this Jupyter runtime, you must restart the runtime. You can do this by running the cell below, which will restart the current kernel.

In [2]:
# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)
Out[2]:
{'status': 'ok', 'restart': True}
⚠️ The kernel is going to restart. Please wait until it is finished before continuing to the next step. ⚠️

Authenticate your notebook environment (Colab only)

If you are running this notebook on Google Colab, run the following cell to authenticate your environment. This step is not required if you are using Vertex AI Workbench.

In [1]:
import sys

# Additional authentication is required for Google Colab
if "google.colab" in sys.modules:
    # Authenticate user to Google Cloud
    from google.colab import auth

    auth.authenticate_user()

Set Google Cloud project information and initialize Vertex AI SDK

To get started using Vertex AI, you must have an existing Google Cloud project and enable the Vertex AI API.

Learn more about setting up a project and a development environment.

In [2]:
# Define project information
PROJECT_ID = "[your-project-id]"  # @param {type:"string"}
LOCATION = "[your-region]"  # @param {type:"string"}

# Initialize Vertex AI
import vertexai

vertexai.init(project=PROJECT_ID, location=LOCATION)
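
If you would rather not hard-code the project ID, Application Default Credentials can often supply it. A minimal sketch, assuming ADC is configured in your environment (it is by default in Vertex AI Workbench); this is an optional convenience, not a required step:

import google.auth

# Derive the project from Application Default Credentials when available
credentials, detected_project = google.auth.default()
if detected_project:
    PROJECT_ID = detected_project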

Import libraries

In [3]:
from vertexai.generative_models import GenerationConfig, GenerativeModel, Image, Part

 

Use the Gemini 1.0 Pro model

The Gemini 1.0 Pro (gemini-1.0-pro) model is designed to handle natural language tasks, multi-turn text and code chat, and code generation.

Load the Gemini 1.0 Pro model

In [4]:
model = GenerativeModel("gemini-1.0-pro")

Generate text from text prompts

Send a text prompt to the model. The Gemini 1.0 Pro (gemini-1.0-pro) model provides a streaming response mechanism. With this approach, you don't need to wait for the complete response; you can start processing fragments as soon as they're accessible.

In [5]:
responses = model.generate_content("Why is the sky blue?", stream=True)

for response in responses:
    print(response.text, end="")
The sky appears blue due to a phenomenon called Rayleigh scattering. This scattering occurs when sunlight enters the Earth's atmosphere and interacts with the tiny molecules of air, such as nitrogen and oxygen. These molecules are much smaller than the wavelength of visible light, which is the range of colors that our eyes can see. 

When sunlight hits these molecules, it is scattered in all directions. However, shorter wavelengths of light, such as blue and violet, are scattered more strongly than longer wavelengths, such as red and orange. This is why we see the blue light scattered across the sky, while the longer wavelengths pass through the atmosphere and reach our eyes directly from the sun.

Here's a breakdown of the process:

1. **Sunlight enters the Earth's atmosphere:** Sunlight is composed of all colors of the visible spectrum, including red, orange, yellow, green, blue, indigo, and violet.
2. **Interaction with air molecules:** As sunlight travels through the atmosphere, it encounters air molecules, primarily nitrogen and oxygen.
3. **Rayleigh scattering:** These molecules scatter the sunlight in all directions. However, shorter wavelengths (blue and violet) are scattered more effectively than longer wavelengths (red and orange).
4. **Blue light scattered across the sky:** The scattered blue light reaches our eyes from all directions in the sky, creating the appearance of a blue sky.
5. **Longer wavelengths pass through:** The longer wavelengths (red and orange) are scattered less and pass through the atmosphere more directly. This is why we see the sun as yellow or orange, as the red and orange light are not scattered as much as the blue light.

It's important to note that the sky doesn't actually have a blue color. It's the scattering of sunlight by air molecules that creates the illusion of a blue sky. 

Here are some additional factors that can affect the color of the sky:

* **Time of day:** The sky appears more blue at midday when the sun is high in the sky and the sunlight has to travel through less atmosphere. At sunrise and sunset, the sunlight travels through more atmosphere, scattering more of the blue light and allowing the longer wavelengths, such as red and orange, to reach our eyes. This is why we see beautiful red and orange colors during sunrise and sunset.
* **Altitude:** The higher you go, the less atmosphere there is to scatter the light. This is why the sky appears darker and more blue at higher altitudes.
* **Pollution:** Pollution particles in the atmosphere can scatter light differently, sometimes making the sky appear hazy or even reddish.

I hope this explanation helps! Let me know if you have any other questions.
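
Streaming is optional. If you would rather receive the complete answer in a single object, call generate_content without the stream argument; this sketch is the non-streaming equivalent of the call above:

# Non-streaming call: blocks until the full response is available
response = model.generate_content("Why is the sky blue?")
print(response.text)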
 

Try your own prompts

  • What are the biggest challenges facing the healthcare industry?
  • What are the latest developments in the automotive industry?
  • What are the biggest opportunities in the retail industry?
  • (Try your own prompts!)
In [6]:
prompt = """Create a numbered list of 10 items. Each item in the list should be a trend in the tech industry.

Each trend should be less than 5 words."""  # try your own prompt

responses = model.generate_content(prompt, stream=True)

for response in responses:
    print(response.text, end="")
## 10 Tech Industry Trends:

1. AI Everywhere
2. Metaverse Mania
3. Smarter Automation
4. Quantum Computing
5. Cybersecurity Renaissance
6. Green Tech Explosion
7. Hyper-Personalization
8. The Rise of No-Code
9. Democratized Space Tech
10. Biohacking Breakthroughs

Model parameters

Every prompt you send to the model includes parameter values that control how the model generates a response. The model can generate different results for different parameter values. You can experiment with different model parameters to see how the results change.

In [7]:
generation_config = GenerationConfig(
    temperature=0.9,
    top_p=1.0,
    top_k=32,
    candidate_count=1,
    max_output_tokens=8192,
)

responses = model.generate_content(
    "Why is the sky blue?",
    generation_config=generation_config,
    stream=True,
)

for response in responses:
    print(response.text, end="")
The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight, which is composed of all colors of the rainbow, travels through Earth's atmosphere. The shorter wavelengths of light, such as blue and violet, are scattered more strongly by the air molecules than the longer wavelengths, like red and orange. This scattering is what causes the blue light to reach our eyes from all directions in the sky, making it appear blue.

Here's a more detailed explanation:

1. **Sunlight:** Sunlight is a mixture of different colors, each with its own wavelength. Blue light has a shorter wavelength than red light.
2. **Atmosphere:** Earth has an atmosphere, which is a layer of gas that surrounds the planet. The air molecules in the atmosphere are constantly moving and colliding with each other.
3. **Rayleigh scattering:** When sunlight enters the atmosphere, the air molecules scatter the light in all directions. This scattering is called Rayleigh scattering.
4. **Blue light scattered more strongly:** The shorter wavelengths of light, like blue, are scattered more strongly by the air molecules than the longer wavelengths, like red.
5. **Blue sky:** As a result of Rayleigh scattering, we see blue light coming from all directions in the sky. This makes the sky appear blue.

Additionally, the blueness of the sky can vary depending on several factors, such as the time of day, the amount of pollution in the air, and the observer's altitude. For example, the sky often appears more blue at midday when the Sun is high in the sky and the sunlight has to travel through less atmosphere.
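
For comparison, lowering the temperature pushes the model toward more deterministic output. A minimal sketch using the same API as above, only with different assumed parameter values:

# Low-temperature config: more deterministic, less varied responses
deterministic_config = GenerationConfig(temperature=0.0, top_p=1.0, top_k=1)

responses = model.generate_content(
    "Why is the sky blue?",
    generation_config=deterministic_config,
    stream=True,
)

for response in responses:
    print(response.text, end="")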

Test chat prompts

The Gemini 1.0 Pro model supports natural multi-turn conversations and is ideal for text tasks that require back-and-forth interactions. The following examples show how the model responds during a multi-turn conversation.

In [9]:
chat = model.start_chat()

prompt = """My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.

Suggest another movie I might like.
"""

responses = chat.send_message(prompt, stream=True)

for response in responses:
    print(response.text, end="")
Hi Ned, It's great to meet you! 

Based on your enjoyment of "Lord of the Rings" and "The Hobbit," I would recommend checking out "The Chronicles of Narnia: The Lion, the Witch, and the Wardrobe." This movie shares many similarities with your favorite films, including:

* **Epic fantasy setting:** Both "The Chronicles of Narnia" and "The Lord of the Rings" feature sprawling, magical worlds filled with mythical creatures and breathtaking landscapes.
* **Good vs. evil storyline:** The films center around a battle between the forces of good and evil, with brave heroes fighting to overcome powerful villains.
* **Themes of friendship and courage:** The characters in both films learn the importance of friendship, loyalty, and bravery in the face of adversity.
* **Visually stunning effects:** Both films boast impressive special effects and cinematography that bring the fantasy worlds to life.

If you enjoyed the immersive world-building, epic battles, and heartwarming themes of "Lord of the Rings" and "The Hobbit," "The Chronicles of Narnia" is sure to captivate you as well. Let me know what you think! 

This follow-up prompt shows how the model responds based on the previous prompt:

In [10]:
prompt = "Are my favorite movies based on a book series?"

responses = chat.send_message(prompt, stream=True)

for response in responses:
    print(response.text, end="")
That's right! Both "The Lord of the Rings" and "The Hobbit" are based on book series written by J.R.R. Tolkien. "The Lord of the Rings" is a trilogy consisting of "The Fellowship of the Ring," "The Two Towers," and "The Return of the King." "The Hobbit" is a single book that serves as a prequel to "The Lord of the Rings." Both series are considered classics of fantasy literature and have been enjoyed by millions of readers around the world.

If you enjoyed the movies, I highly recommend checking out the books. They provide a much deeper dive into the characters, story, and world of Middle-earth. You'll discover additional details, plot points, and character motivations that weren't included in the films. 

Would you like me to find you links to purchase the books or audiobooks? 

You can also view the chat history:

In [11]:
print(chat.history)
[role: "user"
parts {
  text: "My name is Ned. You are my personal assistant. My favorite movies are Lord of the Rings and Hobbit.\n\nSuggest another movie I might like.\n"
}
, role: "model"
parts {
  text: "Hi Ned, It\'s great to meet you! \n\nBased on your enjoyment of \"Lord of the Rings\" and \"The Hobbit,\" I would recommend checking out \"The Chronicles of Narnia: The Lion, the Witch, and the Wardrobe.\" This movie shares many similarities with your favorite films, including:\n\n* **Epic fantasy setting:** Both \"The Chronicles of Narnia\" and \"The Lord of the Rings\" feature sprawling, magical worlds filled with mythical creatures and breathtaking landscapes.\n* **Good vs. evil storyline:** The films center around a battle between the forces of good and evil, with brave heroes fighting to overcome powerful villains.\n* **Themes of friendship and courage:** The characters in both films learn the importance of friendship, loyalty, and bravery in the face of adversity.\n* **Visually stunning effects:** Both films boast impressive special effects and cinematography that bring the fantasy worlds to life.\n\nIf you enjoyed the immersive world-building, epic battles, and heartwarming themes of \"Lord of the Rings\" and \"The Hobbit,\" \"The Chronicles of Narnia\" is sure to captivate you as well. Let me know what you think! \n"
}
, role: "user"
parts {
  text: "Are my favorite movies based on a book series?"
}
, role: "model"
parts {
  text: "That\'s right! Both \"The Lord of the Rings\" and \"The Hobbit\" are based on book series written by J.R.R. Tolkien. \"The Lord of the Rings\" is a trilogy consisting of \"The Fellowship of the Ring,\" \"The Two Towers,\" and \"The Return of the King.\" \"The Hobbit\" is a single book that serves as a prequel to \"The Lord of the Rings.\" Both series are considered classics of fantasy literature and have been enjoyed by millions of readers around the world.\n\nIf you enjoyed the movies, I highly recommend checking out the books. They provide a much deeper dive into the characters, story, and world of Middle-earth. You\'ll discover additional details, plot points, and character motivations that weren\'t included in the films. \n\nWould you like me to find you links to purchase the books or audiobooks? \n"
}
]
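
The history is a list of Content objects, so you can also walk it programmatically instead of printing the raw protocol buffer representation. A short sketch:

# Print each turn's role and the beginning of its text
for content in chat.history:
    print(f"{content.role}: {content.parts[0].text[:80]}...")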

 

Use the Gemini 1.0 Pro Vision model

Gemini 1.0 Pro Vision (gemini-1.0-pro-vision) is a multimodal model that supports multimodal prompts. You can include text, image(s), and video in your prompt requests and get text or code responses.

Load the Gemini 1.0 Pro Vision model

In [12]:
multimodal_model = GenerativeModel("gemini-1.0-pro-vision")

Define helper functions

Define helper functions to load and display images.

In [13]:
import http.client
import typing
import urllib.request

import IPython.display
from PIL import Image as PIL_Image
from PIL import ImageOps as PIL_ImageOps


def display_images(
    images: typing.Iterable[Image],
    max_width: int = 600,
    max_height: int = 350,
) -> None:
    for image in images:
        pil_image = typing.cast(PIL_Image.Image, image._pil_image)
        if pil_image.mode != "RGB":
            # RGB is supported by all Jupyter environments (e.g. RGBA is not yet)
            pil_image = pil_image.convert("RGB")
        image_width, image_height = pil_image.size
        if max_width < image_width or max_height < image_height:
            # Resize to display a smaller notebook image
            pil_image = PIL_ImageOps.contain(pil_image, (max_width, max_height))
        IPython.display.display(pil_image)


def get_image_bytes_from_url(image_url: str) -> bytes:
    with urllib.request.urlopen(image_url) as response:
        response = typing.cast(http.client.HTTPResponse, response)
        image_bytes = response.read()
    return image_bytes


def load_image_from_url(image_url: str) -> Image:
    image_bytes = get_image_bytes_from_url(image_url)
    return Image.from_bytes(image_bytes)


def get_url_from_gcs(gcs_uri: str) -> str:
    # converts gcs uri to url for image display.
    url = "https://storage.googleapis.com/" + gcs_uri.replace("gs://", "").replace(
        " ", "%20"
    )
    return url


def print_multimodal_prompt(contents: list):
    """
    Given contents that would be sent to Gemini,
    output the full multimodal prompt for ease of readability.
    """
    for content in contents:
        if isinstance(content, Image):
            display_images([content])
        elif isinstance(content, Part):
            url = get_url_from_gcs(content.file_data.file_uri)
            IPython.display.display(load_image_from_url(url))
        else:
            print(content)

Generate text from local image and text

Use the Image.load_from_file method to load a local file as the image to generate text for.

In [14]:
# Download an image from Google Cloud Storage
! gsutil cp "gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg" ./image.jpg

# Load from local file
image = Image.load_from_file("image.jpg")

# Prepare contents
prompt = "Describe this image?"
contents = [image, prompt]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")
Copying gs://cloud-samples-data/generative-ai/image/320px-Felis_catus-cat_on_snow.jpg...
/ [1 files][ 17.4 KiB/ 17.4 KiB]                                                
Operation completed over 1 objects/17.4 KiB.                                     
-------Prompt--------

Describe this image?
-------Response--------
 The image shows a gray cat with black stripes walking in the snow. The cat has yellow eyes and a long tail. The cat is walking towards the camera.

Generate text from text & image(s)

Images with Cloud Storage URIs

If your images are stored in Cloud Storage, you can specify the Cloud Storage URI of the image to include in the prompt. You must also specify the mime_type field. The supported MIME types for images include image/png and image/jpeg.

Note that the URI (not to be confused with URL) for a Cloud Storage object should always start with gs://.

In [15]:
# Load image from Cloud Storage URI
gcs_uri = "gs://cloud-samples-data/generative-ai/image/boats.jpeg"

# Prepare contents
image = Part.from_uri(gcs_uri, mime_type="image/jpeg")
prompt = "Describe the scene?"
contents = [image, prompt]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")
-------Prompt--------
Describe the scene?

-------Response--------
 Two pontoon boats are anchored in the Charles River in Boston, Massachusetts. In the background are two bridges and the Boston skyline.

 

You can also use direct links to images, as shown below. The helper function load_image_from_url() (declared earlier) converts the image to bytes and returns it as an Image object that can then be sent to Gemini 1.0 Pro Vision along with the text prompt.

In [16]:
# Load image from Cloud Storage URI
image_url = (
    "https://storage.googleapis.com/cloud-samples-data/generative-ai/image/boats.jpeg"
)
image = load_image_from_url(image_url)  # convert to bytes

# Prepare contents
prompt = "Describe the scene?"
contents = [image, prompt]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

-------Prompt--------

Describe the scene?

-------Response--------
 Two pontoon boats are anchored in the Charles River in Boston, Massachusetts. In the background is the Longfellow Bridge and the Boston skyline.

 

 

Combining multiple images and text prompts for few-shot prompting

You can send more than one image at a time, and also place your images anywhere alongside your text prompt.

In the example below, few-shot prompting is performed to have Gemini 1.0 Pro Vision return the city and landmark in a specific JSON format.

In [17]:
# Load images from Cloud Storage URI
image1_url = "https://storage.googleapis.com/github-repo/img/gemini/intro/landmark1.jpg"
image2_url = "https://storage.googleapis.com/github-repo/img/gemini/intro/landmark2.jpg"
image3_url = "https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg"
image1 = load_image_from_url(image1_url)
image2 = load_image_from_url(image2_url)
image3 = load_image_from_url(image3_url)

# Prepare prompts
prompt1 = """{"city": "London", "Landmark:", "Big Ben"}"""
prompt2 = """{"city": "Paris", "Landmark:", "Eiffel Tower"}"""

# Prepare contents
contents = [image1, prompt1, image2, prompt2, image3]

responses = multimodal_model.generate_content(contents, stream=True)

print("-------Prompt--------")
print_multimodal_prompt(contents)

print("\n-------Response--------")
for response in responses:
    print(response.text, end="")

-------Prompt--------

{"city": "London", "Landmark:", "Big Ben"}

{"city": "Paris", "Landmark:", "Eiffel Tower"}

-------Response--------
 {"city": "Rome", "Landmark:", "Colosseum"}

Generate text from a video file

Specify the Cloud Storage URI of the video to include in the prompt. The bucket that stores the file must be in the same Google Cloud project that's sending the request. You must also specify the mime_type field. The supported MIME types for video include video/mp4.

In [18]:
file_path = "github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4"
video_uri = f"gs://{file_path}"
video_url = f"https://storage.googleapis.com/{file_path}"

IPython.display.Video(video_url, width=450)
Out[18]:
In [19]:
prompt = """
Answer the following questions using the video only:
What is the profession of the main person?
What are the main features of the phone highlighted?
Which city was this recorded in?
Provide the answer JSON.
"""

video = Part.from_uri(video_uri, mime_type="video/mp4")
contents = [prompt, video]

responses = multimodal_model.generate_content(contents, stream=True)

for response in responses:
    print(response.text, end="")
 ```json
{
  "person": {
    "name": "Saeka Shimada",
    "profession": "photographer"
  },
  "phone": {
    "brand": "Google",
    "model": "Pixel 8",
    "features": ["Video Boost", "Night Sight"]
  },
  "city": "Tokyo",
  "country": "Japan"
}
```

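Multimodal prompts, and video in particular, can be token-heavy. Assuming your installed SDK version exposes count_tokens (recent releases such as the 1.60.x line installed above do), you can estimate a prompt's footprint before sending it:

# Estimate token usage for the multimodal prompt before sending it
token_info = multimodal_model.count_tokens(contents)
print(token_info.total_tokens, token_info.total_billable_characters)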
 

(Source: Google Cloud)
