
Image-to-Text

Some of our models support image-to-text through an OpenAI-compatible API: they analyze an image in depth and generate a detailed text description of it.

Features

  • Image Content Recognition: Identifies objects, scenes, people, and other elements in an image
  • Detailed Description: Describes the important details of an image comprehensively
  • Multi-language Support: Produces output in English or Chinese

Usage Examples

Basic Example

```python
import os
from openai import OpenAI
import base64

client = OpenAI(
    base_url="https://api.infly.cn/v1",
    api_key=os.getenv("INF_OPENAPI_API_KEY"),
)

# Converting image file to base64
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

# Preparing image
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)

# Calling the model
response = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please describe the content of this image in detail"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
)

print(response.choices[0].message.content)
```
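The basic example always labels the data URL as `image/jpeg`, although PNG and GIF files are also supported. A minimal sketch of deriving the MIME type from the file extension with Python's standard `mimetypes` module follows. The helper names `guess_image_mime_type`, `to_data_url`, and `image_to_data_url` are hypothetical, and limiting the accepted types to JPEG, PNG, and GIF only mirrors the formats listed in the notes on this page; it is a client-side check, not a description of server behavior.

```python
import base64
import mimetypes

# Formats listed as supported on this page; this check runs client-side only.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/gif"}


def guess_image_mime_type(image_path: str) -> str:
    """Guess the MIME type from the file extension; reject unsupported types."""
    mime_type, _encoding = mimetypes.guess_type(image_path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported image type {mime_type!r} for {image_path!r}")
    return mime_type


def to_data_url(data: bytes, mime_type: str) -> str:
    """Wrap raw bytes in a base64 data URL such as data:image/png;base64,..."""
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"


def image_to_data_url(image_path: str) -> str:
    """Read a local image file and return a data URL with a matching MIME type."""
    mime_type = guess_image_mime_type(image_path)
    with open(image_path, "rb") as image_file:
        return to_data_url(image_file.read(), mime_type)
```

The result can be passed directly as the `"url"` value of an `image_url` content part, for example `{"url": image_to_data_url("path/to/your/image.png")}`.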

Multi-turn Conversation Example

```python
# Reuses `client` and `base64_image` from the basic example above

# Initiating first round of conversation
response1 = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        }
    ],
)

# Initiating second round of conversation: each request carries the full
# history, so resend the first question (with the image) and the model's reply
response2 = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    },
                },
            ],
        },
        {
            "role": "assistant",
            "content": response1.choices[0].message.content,
        },
        {
            "role": "user",
            "content": "Can you describe the facial expressions of the people in the image in detail?",
        },
    ],
)

print(response2.choices[0].message.content)
```
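Because each request in the multi-turn example carries the whole conversation (the first question with its image, the model's reply, and the new question), it can help to keep that history in a single list that grows with every turn. The sketch below assumes only the `client.chat.completions.create(model=..., messages=...)` call already used on this page; `user_message` and `Conversation` are hypothetical helpers, not part of the `openai` package.

```python
from __future__ import annotations

from typing import Any


def user_message(text: str, image_url: str | None = None) -> dict[str, Any]:
    """Build a user message, attaching an image_url part when one is given."""
    if image_url is None:
        return {"role": "user", "content": text}
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": text},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }


class Conversation:
    """Hypothetical helper that stores the message history of one chat."""

    def __init__(self, client: Any, model: str = "inf-image-chat-v1") -> None:
        self.client = client
        self.model = model
        self.messages: list[dict[str, Any]] = []

    def ask(self, text: str, image_url: str | None = None) -> str:
        """Send a new user turn together with all earlier turns."""
        self.messages.append(user_message(text, image_url))
        response = self.client.chat.completions.create(
            model=self.model,
            messages=self.messages,
        )
        reply = response.choices[0].message.content
        # Record the reply so the next request includes it as context.
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

With the `client` and `base64_image` from the basic example, the two rounds above become `conversation = Conversation(client)`, then `conversation.ask("What's in this image?", f"data:image/jpeg;base64,{base64_image}")`, followed by a plain-text `conversation.ask(...)` for the follow-up question; the image is sent only in the first user turn and stays in the history afterwards.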

Notes

  1. Image Format Support:

    • Supported formats: JPEG, PNG, GIF
    • Recommended file size: under 10 MB
    • Recommended resolution: at most 1024x1024 pixels
    • Image quality: sharp, with no obvious noise
  2. Usage Limitations:

    • Image content must comply with applicable laws and regulations
    • For more accurate results, state in the request exactly what you want analyzed
  3. Best Practices:

    • Use clear images; avoid blurry or heavily compressed ones
    • State which parts of the image the model should focus on
    • For complex scenes, break the request into smaller steps
    • Use multi-turn conversations to obtain more detailed information
    • Refine your prompts to fit your use case
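The image format recommendations in item 1 can be checked locally before a request is sent. The sketch below uses only the Python standard library: it identifies JPEG, PNG, and GIF by their file signatures and reads the pixel dimensions from the file header. The thresholds mirror the recommended limits above, reading "10 MB" as 10 × 1024 × 1024 bytes is an assumption, all function names are hypothetical, and the JPEG header walk covers common files but is not a full decoder.

```python
from __future__ import annotations

import struct

# Thresholds mirror this page's recommendations; "10 MB" is assumed
# to mean 10 * 1024 * 1024 bytes.
MAX_BYTES = 10 * 1024 * 1024
MAX_SIDE = 1024


def detect_format(data: bytes) -> str | None:
    """Identify JPEG, PNG, or GIF from the leading signature bytes."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    return None


def image_dimensions(data: bytes, fmt: str) -> tuple[int, int]:
    """Return (width, height) read from the image header."""
    if fmt == "png":
        # IHDR follows the 8-byte signature; width/height are big-endian uint32.
        return struct.unpack(">II", data[16:24])
    if fmt == "gif":
        # Logical screen width/height are little-endian uint16 after "GIF8xa".
        return struct.unpack("<HH", data[6:10])
    # JPEG: walk the marker segments until a Start Of Frame (SOFn) marker.
    i = 2  # skip the SOI marker (FF D8)
    while i + 4 <= len(data):
        if data[i] != 0xFF:
            raise ValueError("malformed JPEG: expected a marker")
        marker = data[i + 1]
        if marker == 0xFF:  # fill byte before a marker
            i += 1
            continue
        if marker == 0x01 or 0xD0 <= marker <= 0xD8:  # markers without a length
            i += 2
            continue
        (length,) = struct.unpack(">H", data[i + 2:i + 4])
        if 0xC0 <= marker <= 0xCF and marker not in (0xC4, 0xC8, 0xCC):
            # SOFn payload: precision (1 byte), height (2 bytes), width (2 bytes)
            height, width = struct.unpack(">HH", data[i + 5:i + 9])
            return width, height
        i += 2 + length  # the length field counts itself but not the marker
    raise ValueError("no SOF marker found in JPEG data")


def check_image_data(data: bytes) -> list[str]:
    """Return warnings for unmet recommendations; an empty list means OK."""
    warnings = []
    if len(data) >= MAX_BYTES:
        warnings.append(f"file is {len(data)} bytes; recommended under 10 MB")
    fmt = detect_format(data)
    if fmt is None:
        warnings.append("unsupported format; use JPEG, PNG, or GIF")
        return warnings
    width, height = image_dimensions(data, fmt)
    if width > MAX_SIDE or height > MAX_SIDE:
        warnings.append(f"{width}x{height} exceeds the recommended {MAX_SIDE}x{MAX_SIDE}")
    return warnings


def check_image(path: str) -> list[str]:
    """Read a local file and run check_image_data on its contents."""
    with open(path, "rb") as image_file:
        return check_image_data(image_file.read())
```

Returning warnings instead of raising keeps the check advisory, which matches how the limits above are phrased as recommendations rather than hard requirements.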