Image-to-Text

Some of our models support image-to-text through an API compatible with the OpenAI protocol. They can analyze images in depth and generate detailed text descriptions.

Features

Image Content Recognition: Can identify objects, scenes, people, and other elements in images
Detailed Description: Provides comprehensive descriptions of important details in images
Multi-language Support: Supports both English and Chinese output

Usage Examples

Basic Example

import base64
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.infly.cn/v1",
    api_key=os.getenv("INF_OPENAPI_API_KEY"),
)

# Read an image file and return its contents as a base64-encoded string
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Encode the image (replace the path with your own file)
image_path = "path/to/your/image.jpg"
base64_image = encode_image(image_path)

# Send the text prompt and the image together in one user message
response = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Please describe the content of this image in detail"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

print(response.choices[0].message.content)
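
The basic example hardcodes `image/jpeg` in the data URL. Since PNG and GIF are also listed as supported formats, a small helper can pick the MIME type from the file extension instead. This is a minimal sketch using only the Python standard library; `image_to_data_url` and `SUPPORTED_MIME_TYPES` are illustrative names and not part of the API.

```python
import base64
import mimetypes

# Formats listed as supported in the Notes section of this page.
SUPPORTED_MIME_TYPES = {"image/jpeg", "image/png", "image/gif"}


def image_to_data_url(image_path):
    """Return a base64 data URL whose MIME type matches the file extension."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported or unknown image type: {image_path}")
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```

The returned string can be used as the `url` value of the `image_url` entry in place of the hand-written f-string.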

Multi-turn Conversation Example

# First turn: ask about the image (reuses client and base64_image from the basic example)
response1 = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ]
)

# Second turn: resend the full history, including the assistant's first reply, then ask a follow-up question
response2 = client.chat.completions.create(
    model="inf-image-chat-v1",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        },
        {
            "role": "assistant",
            "content": response1.choices[0].message.content
        },
        {
            "role": "user",
            "content": "Can you describe the facial expressions of the people in the image in detail?"
        }
    ]
)

print(response2.choices[0].message.content)
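
In the example above, the second request repeats the first user message by hand. One way to avoid that duplication is to keep a single `messages` list and append to it after each turn. The sketch below shows only the list handling and makes no network call; `add_assistant_reply` and `add_user_text` are hypothetical helper names, and the assistant text is a placeholder standing in for `response1.choices[0].message.content`.

```python
def add_assistant_reply(messages, reply_text):
    """Append the model's previous answer so the next request has full context."""
    messages.append({"role": "assistant", "content": reply_text})
    return messages


def add_user_text(messages, text):
    """Append a text-only follow-up question from the user."""
    messages.append({"role": "user", "content": text})
    return messages


# First turn: one user message containing the prompt and the image.
# The data URL is a placeholder; use a real base64-encoded image in practice.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": "What's in this image?"},
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,..."}},
        ],
    }
]

# After the first call returns, record its answer, then add the follow-up.
add_assistant_reply(messages, "placeholder for the first response text")
add_user_text(messages, "Can you describe the facial expressions in detail?")

# `messages` can now be passed as-is to client.chat.completions.create(...).
```

Because the image stays in the first message of the history, it does not need to be attached again to the follow-up question.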

Notes

Image Format Support:
- Supported formats: JPEG, PNG, GIF
- Recommended file size: under 10 MB
- Recommended resolution: no larger than 1024x1024 pixels
- Image quality: clear, with no obvious noise

Usage Limitations:
- Image content must comply with applicable laws and regulations
- Include specific analysis requirements in the request for more accurate results

Best Practices:
- Provide clear images; avoid blurry or heavily compressed ones
- State the areas of focus explicitly in your request
- For complex scenes, work through the description step by step
- Use multi-turn conversations to obtain more detail
- Adjust prompts to your actual needs for more accurate results
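
Some of the recommendations above can be checked locally before a request is sent. The following is a minimal sketch, assuming the recommended size is read as 10 MiB and that the file extension reflects the actual format; `check_image_file` is a hypothetical helper. Resolution is not checked because that would require an image library.

```python
import os

# Recommended size limit from the notes above, read here as 10 MiB (assumption).
MAX_RECOMMENDED_BYTES = 10 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif"}


def check_image_file(image_path):
    """Return a list of warnings; an empty list means no issue was found."""
    warnings = []
    extension = os.path.splitext(image_path)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        warnings.append(f"Extension {extension!r} is not JPEG, PNG, or GIF")
    if os.path.getsize(image_path) >= MAX_RECOMMENDED_BYTES:
        warnings.append("File is not under the recommended size")
    return warnings
```

Returning warnings instead of raising keeps the size recommendation advisory, which matches how the notes phrase it.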
