Image-to-Text
Some of our models support image-to-text through an OpenAI-compatible API: they analyze an image in depth and generate a detailed text description of it.
Features
- Image Content Recognition: Identifies objects, scenes, people, and other elements in images
- Detailed Description: Describes the important details of an image comprehensively
- Multi-language Support: Produces output in both English and Chinese
Usage Examples
Basic Example
import os
from openai import OpenAI
import base64
client = OpenAI(
    base_url="https://api.infly.cn/v1",
    api_key=os.getenv("INF_OPENAPI_API_KEY"),
)
# Convert an image file to a base64 string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')
# Prepare the image
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)
# Call the model
response = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please describe the content of this image in detail"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)
print(response.choices[0].message.content)
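The basic example hard-codes `image/jpeg` in the data URL, while PNG and GIF are also listed as supported formats. The following is a minimal sketch of deriving the MIME type from the file extension instead. It assumes the service accepts the standard `data:<mime>;base64,<payload>` form for every supported format; `to_data_url` is a hypothetical helper name, not part of the API.

```python
import base64
import mimetypes


def to_data_url(image_path):
    """Build a base64 data URL whose MIME type matches the file extension.

    Hypothetical helper for illustration; the MIME type is guessed from the
    file name only, the file content is not inspected.
    """
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None or not mime_type.startswith("image/"):
        raise ValueError(f"Cannot determine an image MIME type for {image_path!r}")
    with open(image_path, "rb") as image_file:
        payload = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{payload}"
```

The returned string can be used as the `url` value of the `image_url` entry in place of the f-string.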
Multi-turn Conversation Example
# Reuses `client` and `base64_image` from the basic example
# First round of the conversation
response1 = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)
# Second round: resend the first turn and the model's reply, then ask the follow-up
response2 = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": response1.choices[0].message.content
        },
        {
            "role": "user",
            "content": "Can you describe the facial expressions of the people in the image in detail?"
        }
    ]
)
print(response2.choices[0].message.content)
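In the multi-turn example the second request repeats the whole first turn by hand. One way to avoid that duplication is to keep the messages in a list and extend it after each reply. The sketch below shows only the message bookkeeping and makes no network call; `image_message` and `add_turn` are hypothetical helper names, and the data URL and reply are placeholders.

```python
def image_message(prompt, data_url):
    """First user turn: a text prompt plus one image."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


def add_turn(history, assistant_reply, next_question):
    """Return a new message list extended with the reply and a follow-up question."""
    return history + [
        {"role": "assistant", "content": assistant_reply},
        {"role": "user", "content": next_question},
    ]


# Placeholder values; in real use the reply comes from
# response.choices[0].message.content of the previous request.
history = [image_message("What's in this image?", "data:image/jpeg;base64,<payload>")]
history = add_turn(history, "A placeholder reply", "Describe the facial expressions in detail.")
print([message["role"] for message in history])  # ['user', 'assistant', 'user']
```

The resulting `history` list would be passed as `messages` to `client.chat.completions.create` for the next round.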
Notes
- Image Format Support:
  - Supported formats: JPEG, PNG, GIF
  - Recommended image size: under 10 MB
  - Recommended resolution: within 1024x1024 pixels
  - Image quality: good clarity, no obvious noise
- Usage Limitations:
  - Image content must comply with relevant laws and regulations
  - Include specific analysis requirements in the request for more accurate results
- Best Practices:
  - Provide clear images; avoid blurry or over-compressed ones
  - State the focus areas explicitly in your request
  - For complex scenes, describe them step by step
  - Use multi-turn conversations to obtain more detailed information
  - Adjust prompts to your actual needs for more accurate results
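The format, size, and resolution recommendations above can be checked locally before a request is sent. The following is a rough sketch using only the standard library. The extension list, the reading of "10 MB" as 10 MiB, and the names `check_image_file` and `fit_within` are assumptions made for illustration. The actual resizing would need an image library and is not shown.

```python
import os

SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}  # assumed mapping for JPEG, PNG, GIF
MAX_BYTES = 10 * 1024 * 1024  # "under 10 MB", read here as 10 MiB (assumption)
MAX_SIDE = 1024  # recommended maximum width and height in pixels


def check_image_file(image_path):
    """Return a list of human-readable problems; an empty list means the file passes."""
    problems = []
    extension = os.path.splitext(image_path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported format: {extension or '(none)'}")
    if os.path.getsize(image_path) >= MAX_BYTES:
        problems.append("file is 10 MB or larger")
    return problems


def fit_within(width, height, max_side=MAX_SIDE):
    """Scale (width, height) down to fit max_side x max_side, keeping the aspect ratio."""
    scale = min(1.0, max_side / width, max_side / height)
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(2048, 1024))  # (1024, 512)
print(fit_within(800, 600))    # (800, 600), already within the limit
```

`fit_within` only computes target dimensions; images that already fit are left at their original size rather than enlarged.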