VertexAI [Anthropic, Gemini, Model Garden]

Pre-requisites

pip install google-cloud-aiplatform (pre-installed on proxy docker image)
Authentication:
- run gcloud auth application-default login See Google Cloud Docs
- Alternatively you can set GOOGLE_APPLICATION_CREDENTIALS
  Here's how: Jump to Code
  - Create a service account on GCP
  - Export the credentials as a json
  - load the json and json.dump the json as a string
  - store the json string in your environment as GOOGLE_APPLICATION_CREDENTIALS

Sample Usage

import litellm
litellm.vertex_project = "hardy-device-38811" # Your Project ID
litellm.vertex_location = "us-central1"  # proj location

response = litellm.completion(model="gemini-pro", messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}])

Usage with LiteLLM Proxy Server

Here's how to use Vertex AI with the LiteLLM Proxy Server

Modify the config.yaml

Different location per model
One location all vertex models

Use this when you need to set a different location for each vertex model

model_list:
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id"
      vertex_location: "us-central1"
  - model_name: gemini-vision
    litellm_params:
      model: vertex_ai/gemini-1.0-pro-vision-001
      vertex_project: "project-id2"
      vertex_location: "us-east"

Use this when you have one vertex location for all models

litellm_settings: 
  vertex_project: "hardy-device-38811" # Your Project ID
  vertex_location: "us-central1" # proj location

model_list: 
  -model_name: team1-gemini-pro
  litellm_params: 
    model: gemini-pro

Start the proxy
```
$ litellm --config /path/to/config.yaml
```

Send Request to LiteLLM Proxy Server

OpenAI Python v1.0.0+
curl

import openai
client = openai.OpenAI(
    api_key="sk-1234",             # pass litellm proxy key, if you're using virtual keys
    base_url="http://0.0.0.0:4000" # litellm-proxy-base url
)

response = client.chat.completions.create(
    model="team1-gemini-pro",
    messages = [
        {
            "role": "user",
            "content": "what llm are you"
        }
    ],
)

print(response)

curl --location 'http://0.0.0.0:4000/chat/completions' \
    --header 'Authorization: Bearer sk-1234' \
    --header 'Content-Type: application/json' \
    --data '{
    "model": "team1-gemini-pro",
    "messages": [
        {
        "role": "user",
        "content": "what llm are you"
        }
    ],
}'

Specifying Safety Settings

In certain use-cases you may need to make calls to the models and pass safety settigns different from the defaults. To do so, simple pass the safety_settings argument to completion or acompletion. For example:

SDK
Proxy

response = completion(
    model="gemini/gemini-pro", 
    messages=[{"role": "user", "content": "write code for saying hi from LiteLLM"}]
    safety_settings=[
        {
            "category": "HARM_CATEGORY_HARASSMENT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_HATE_SPEECH",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
            "threshold": "BLOCK_NONE",
        },
        {
            "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
            "threshold": "BLOCK_NONE",
        },
    ]
)

Option 1: Set in config

model_list:
  - model_name: gemini-experimental
    litellm_params:
      model: vertex_ai/gemini-experimental
      vertex_project: litellm-epic
      vertex_location: us-central1
      safety_settings:
      - category: HARM_CATEGORY_HARASSMENT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_HATE_SPEECH
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_SEXUALLY_EXPLICIT
        threshold: BLOCK_NONE
      - category: HARM_CATEGORY_DANGEROUS_CONTENT
        threshold: BLOCK_NONE

Option 2: Set on call

response = client.chat.completions.create(
    model="gemini-experimental",
    messages=[
        {
            "role": "user",
            "content": "Can you write exploits?",
        }
    ],
    max_tokens=8192,
    stream=False,
    temperature=0.0,

    extra_body={
        "safety_settings": [
            {
                "category": "HARM_CATEGORY_HARASSMENT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_HATE_SPEECH",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_SEXUALLY_EXPLICIT",
                "threshold": "BLOCK_NONE",
            },
            {
                "category": "HARM_CATEGORY_DANGEROUS_CONTENT",
                "threshold": "BLOCK_NONE",
            },
        ],
    }
)

Set Vertex Project & Vertex Location

All calls using Vertex AI require the following parameters:

Your Project ID

import os, litellm 

# set via env var
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811" # Your Project ID`

### OR ###

# set directly on module 
litellm.vertex_project = "hardy-device-38811" # Your Project ID`

Your Project Location

import os, litellm 

# set via env var
os.environ["VERTEXAI_LOCATION"] = "us-central1 # Your Location

### OR ###

# set directly on module 
litellm.vertex_location = "us-central1 # Your Location

Anthropic

Model Name	Function Call
claude-3-opus@20240229	`completion('vertex_ai/claude-3-opus@20240229', messages)`
claude-3-sonnet@20240229	`completion('vertex_ai/claude-3-sonnet@20240229', messages)`
claude-3-haiku@20240307	`completion('vertex_ai/claude-3-haiku@20240307', messages)`

Usage

SDK
Proxy

from litellm import completion
import os

os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = ""

model = "claude-3-sonnet@20240229"

vertex_ai_project = "your-vertex-project" # can also set this as os.environ["VERTEXAI_PROJECT"]
vertex_ai_location = "your-vertex-location" # can also set this as os.environ["VERTEXAI_LOCATION"]

response = completion(
    model="vertex_ai/" + model,
    messages=[{"role": "user", "content": "hi"}],
    temperature=0.7,
    vertex_ai_project=vertex_ai_project,
    vertex_ai_location=vertex_ai_location,
)
print("\nModel Response", response)

1. Add to config

model_list:
    - model_name: anthropic-vertex
      litellm_params:
        model: vertex_ai/claude-3-sonnet@20240229
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-east-1"
    - model_name: anthropic-vertex
      litellm_params:
        model: vertex_ai/claude-3-sonnet@20240229
        vertex_ai_project: "my-test-project"
        vertex_ai_location: "us-west-1"

2. Start proxy

litellm --config /path/to/config.yaml

# RUNNING at http://0.0.0.0:4000

3. Test it!

curl --location 'http://0.0.0.0:4000/chat/completions' \
      --header 'Authorization: Bearer sk-1234' \
      --header 'Content-Type: application/json' \
      --data '{
            "model": "anthropic-vertex", # 👈 the 'model_name' in config
            "messages": [
                {
                "role": "user",
                "content": "what llm are you"
                }
            ],
        }'

Model Garden

Model Name	Function Call
llama2	`completion('vertex_ai/<endpoint_id>', messages)`

Using Model Garden

from litellm import completion
import os

## set ENV variables
os.environ["VERTEXAI_PROJECT"] = "hardy-device-38811"
os.environ["VERTEXAI_LOCATION"] = "us-central1"

response = completion(
  model="vertex_ai/<your-endpoint-id>", 
  messages=[{ "content": "Hello, how are you?","role": "user"}]
)

Gemini Pro

Model Name	Function Call
gemini-pro	`completion('gemini-pro', messages)`, `completion('vertex_ai/gemini-pro', messages)`

Gemini Pro Vision

Model Name	Function Call
gemini-pro-vision	`completion('gemini-pro-vision', messages)`, `completion('vertex_ai/gemini-pro-vision', messages)`

Gemini 1.5 Pro (and Vision)

Model Name	Function Call
gemini-1.5-pro	`completion('gemini-1.5-pro', messages)`, `completion('vertex_ai/gemini-pro', messages)`

Using Gemini Pro Vision

Call gemini-pro-vision in the same input/output format as OpenAI gpt-4-vision

LiteLLM Supports the following image types passed in url

Images with Cloud Storage URIs - gs://cloud-samples-data/generative-ai/image/boats.jpeg
Images with direct links - https://storage.googleapis.com/github-repo/img/gemini/intro/landmark3.jpg
Videos with Cloud Storage URIs - https://storage.googleapis.com/github-repo/img/gemini/multimodality_usecases_overview/pixel8.mp4
Base64 Encoded Local Images

Example Request - image url

Images with direct links
Local Base64 Images

import litellm

response = litellm.completion(
  model = "vertex_ai/gemini-pro-vision",
  messages=[
      {
          "role": "user",
          "content": [
                          {
                              "type": "text",
                              "text": "Whats in this image?"
                          },
                          {
                              "type": "image_url",
                              "image_url": {
                              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
                              }
                          }
                      ]
      }
  ],
)
print(response)

import litellm

def encode_image(image_path):
    import base64

    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

image_path = "cached_logo.jpg"
# Getting the base64 string
base64_image = encode_image(image_path)
response = litellm.completion(
    model="vertex_ai/gemini-pro-vision",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Whats in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "data:image/jpeg;base64," + base64_image
                    },
                },
            ],
        }
    ],
)
print(response)

Chat Models

Model Name	Function Call
chat-bison-32k	`completion('chat-bison-32k', messages)`
chat-bison	`completion('chat-bison', messages)`
chat-bison@001	`completion('chat-bison@001', messages)`

Code Chat Models

Model Name	Function Call
codechat-bison	`completion('codechat-bison', messages)`
codechat-bison-32k	`completion('codechat-bison-32k', messages)`
codechat-bison@001	`completion('codechat-bison@001', messages)`

Text Models

Model Name	Function Call
text-bison	`completion('text-bison', messages)`
text-bison@001	`completion('text-bison@001', messages)`

Code Text Models

Model Name	Function Call
code-bison	`completion('code-bison', messages)`
code-bison@001	`completion('code-bison@001', messages)`
code-gecko@001	`completion('code-gecko@001', messages)`
code-gecko@latest	`completion('code-gecko@latest', messages)`

Extra

Using `GOOGLE_APPLICATION_CREDENTIALS`

Here's the code for storing your service account credentials as GOOGLE_APPLICATION_CREDENTIALS environment variable:

def load_vertex_ai_credentials():
  # Define the path to the vertex_key.json file
  print("loading vertex ai credentials")
  filepath = os.path.dirname(os.path.abspath(__file__))
  vertex_key_path = filepath + "/vertex_key.json"

  # Read the existing content of the file or create an empty dictionary
  try:
      with open(vertex_key_path, "r") as file:
          # Read the file content
          print("Read vertexai file path")
          content = file.read()

          # If the file is empty or not valid JSON, create an empty dictionary
          if not content or not content.strip():
              service_account_key_data = {}
          else:
              # Attempt to load the existing JSON content
              file.seek(0)
              service_account_key_data = json.load(file)
  except FileNotFoundError:
      # If the file doesn't exist, create an empty dictionary
      service_account_key_data = {}

  # Create a temporary file
  with tempfile.NamedTemporaryFile(mode="w+", delete=False) as temp_file:
      # Write the updated content to the temporary file
      json.dump(service_account_key_data, temp_file, indent=2)

  # Export the temporary file as GOOGLE_APPLICATION_CREDENTIALS
  os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = os.path.abspath(temp_file.name)

Using GCP Service Account

Figure out the Service Account bound to the Google Cloud Run service

Get the FULL EMAIL address of the corresponding Service Account
Next, go to IAM & Admin > Manage Resources , select your top-level project that houses your Google Cloud Run Service

Click Add Principal

Specify the Service Account as the principal and Vertex AI User as the role

Once that's done, when you deploy the new container in the Google Cloud Run service, LiteLLM will have automatic access to all Vertex AI endpoints.

s/o @Darien Kindlund for this tutorial

VertexAI [Anthropic, Gemini, Model Garden]

Pre-requisites​

Sample Usage​

Usage with LiteLLM Proxy Server​

Specifying Safety Settings​

Set Vertex Project & Vertex Location​

Anthropic​

Usage​

Model Garden​

Using Model Garden​

Gemini Pro​

Gemini Pro Vision​

Gemini 1.5 Pro (and Vision)​

Using Gemini Pro Vision​

Chat Models​

Code Chat Models​

Text Models​

Code Text Models​

Extra​

Using GOOGLE_APPLICATION_CREDENTIALS​

Using GCP Service Account​