Product Pricing
Basic Concepts
Billing Unit
Tokens are the basic billing unit. For the definition of a token, see the User Guide.
Billing Logic
Both input and output are billed, based on the actual number of tokens in each request's input and output.
Model Pricing
Model | Context Length | Features | Scenarios | Price/1M tokens |
---|---|---|---|---|
inf-chat-v1 | 32k | Designed for Chinese and English conversation, with smooth and accurate interaction in both languages. Other languages are supported, but optimization focuses on Chinese and English. The model also performs well in financial and medical applications, helping professionals in these fields solve industry-specific problems and improve work efficiency and decision quality. | General conversation, finance, medical | ¥10 |
inf-chat-fin-v1 | 32k | | Finance | ¥20 |
inf-med-chat-v1 | 32k | | Medical | ¥20 |
inf-chat-int-v1 | 32k | Function calls, structured output | General | ¥20 |
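As a rough illustration of the billing logic, the sketch below estimates the charge for a single request from the prices in the table. It assumes that input and output tokens are billed at the same per-token rate, since the table lists one price per model; the function name `estimate_cost` and the `PRICE_PER_MILLION` mapping are hypothetical and not part of any official SDK.

```python
from decimal import Decimal

# Prices in CNY per 1M tokens, copied from the pricing table.
PRICE_PER_MILLION = {
    "inf-chat-v1": Decimal("10"),
    "inf-chat-fin-v1": Decimal("20"),
    "inf-med-chat-v1": Decimal("20"),
    "inf-chat-int-v1": Decimal("20"),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge in CNY for one request.

    Assumption: input and output tokens share the same per-token rate.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_tokens = input_tokens + output_tokens
    return PRICE_PER_MILLION[model] * total_tokens / Decimal(1_000_000)
```

Under these assumptions, a request to inf-chat-v1 with 1,500 input tokens and 500 output tokens would be estimated at ¥0.02.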
Account Rate Limiting
Why Rate Limiting
Rate limiting is a common practice for APIs, mainly for the following reasons:
- Prevent attacks: Rate limiting helps defend the API against malicious traffic. For example, an attacker might try to overload or disrupt the service by sending a large number of requests. Rate limits protect platform users from such attacks.
- Ensure fair access: Rate limiting ensures that all users can use the API and get quick responses. Without limits, a few users might consume excessive resources and degrade the experience for everyone else. Configuring rate limits around actual user needs gives most users the best experience.
- Ensure infrastructure stability: Rate limiting helps manage the overall load on the API infrastructure, which is crucial for maintaining service reliability and performance. Especially in cases of sudden demand spikes, controlling the frequency of user requests allows the API service provider to better manage resources and avoid performance bottlenecks or service interruptions.
Rate Limiting Concepts
- Concurrency: The maximum number of requests you can have in progress at the same time.
- RPM: The maximum number of requests you can make in one minute.
- TPM: The maximum number of tokens your requests can consume in one minute.
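To make the RPM and TPM concepts concrete, here is a minimal client-side sketch that tracks requests and tokens over the trailing 60 seconds and refuses a request that would exceed either limit. It assumes a sliding one-minute window; how the service actually measures a minute is not stated here, so treat this as an approximation. The class name `MinuteWindowLimiter` and its methods are hypothetical.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Tracks requests and tokens admitted in the trailing 60 seconds."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock  # injectable so the limiter can be tested
        self.events = deque()  # one (timestamp, tokens) pair per admitted request
        self.tokens_in_window = 0

    def _evict_expired(self, now: float) -> None:
        # Drop requests that are at least 60 seconds old.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.tokens_in_window -= tokens

    def try_acquire(self, tokens: int) -> bool:
        """Return True and record the request if it fits both limits."""
        now = self.clock()
        self._evict_expired(now)
        if len(self.events) >= self.rpm:
            return False  # would exceed RPM
        if self.tokens_in_window + tokens > self.tpm:
            return False  # would exceed TPM
        self.events.append((now, tokens))
        self.tokens_in_window += tokens
        return True
```

Concurrency is a separate limit on in-flight requests; a client could cap it independently, for example with a `threading.Semaphore` sized to the allowed concurrency.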
Rate Limiting Levels
User Level | Cumulative Recharge Amount | Concurrency | RPM | TPM |
---|---|---|---|---|
Free | ¥0 | 1 | 4 | 32,000 |
Tier1 | ¥500 | 4 | 20 | 128,000 |
Tier2 | ¥3,000 | 8 | 80 | 256,000 |
Tier3 | ¥10,000 | 24 | 240 | 384,000 |
Tier4 | ¥50,000 | 32 | 320 | 512,000 |
Tier5 | ¥100,000 | 48 | 480 | 768,000 |
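The table can be read as a lookup from cumulative recharge amount to limits. The sketch below encodes it that way, assuming each amount is the minimum cumulative recharge needed to reach that level and that reaching the threshold exactly is enough to qualify. The names `Tier`, `TIERS`, and `tier_for` are hypothetical.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    min_recharge: int  # cumulative recharge threshold in CNY (assumed minimum)
    concurrency: int
    rpm: int
    tpm: int


# Rows copied from the rate limiting levels table, in ascending order.
TIERS = [
    Tier("Free", 0, 1, 4, 32_000),
    Tier("Tier1", 500, 4, 20, 128_000),
    Tier("Tier2", 3_000, 8, 80, 256_000),
    Tier("Tier3", 10_000, 24, 240, 384_000),
    Tier("Tier4", 50_000, 32, 320, 512_000),
    Tier("Tier5", 100_000, 48, 480, 768_000),
]


def tier_for(cumulative_recharge: float) -> Tier:
    """Return the highest tier whose threshold the amount has reached."""
    if cumulative_recharge < 0:
        raise ValueError("cumulative recharge must be non-negative")
    current = TIERS[0]
    for tier in TIERS:
        if cumulative_recharge >= tier.min_recharge:
            current = tier
    return current
```

Under these assumptions, `tier_for(2999)` resolves to Tier1 and `tier_for(3000)` resolves to Tier2.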