
Product Pricing

Basic Concepts

Billing Unit

The token is the basic billing unit. For a definition of tokens, see the User Guide.

Billing Logic

Both input and output are billed: each request is charged according to the actual number of tokens in its input and in its output.
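
As a worked formula, assuming the single per-million-token price $p$ listed for a model applies equally to input and output tokens (the price table gives one price per model), the charge for one request with $N_{\text{in}}$ input tokens and $N_{\text{out}}$ output tokens would be:

```latex
\text{cost} = \frac{N_{\text{in}} + N_{\text{out}}}{10^{6}} \times p
```

For example, under that assumption a request with 1,500 input tokens and 500 output tokens on a model priced at ¥10 per 1M tokens would cost $(2{,}000 / 10^{6}) \times ¥10 = ¥0.02$.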

Model Pricing

| Model | Context length | Features | Scenarios | Price per 1M tokens |
|---|---|---|---|---|
| inf-chat-v1 | 32k | Designed for Chinese and English conversation, with smooth and accurate interactions in both languages. Other languages are supported, but optimization focuses on Chinese and English. Also performs well in financial and medical applications, helping professionals in these fields solve industry-specific problems and improve work efficiency and decision quality. | General conversation, finance, medical | ¥10 |
| inf-chat-fin-v1 | 32k | | Finance | ¥20 |
| inf-med-chat-v1 | 32k | | Medical | ¥20 |
| inf-chat-int-v1 | 32k | Function calls, structured output | General | ¥20 |
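
To estimate a bill before sending traffic, the sketch below applies the billing logic to the listed prices. `estimate_cost` and `PRICE_PER_MILLION` are hypothetical names, not part of any official SDK, and the sketch assumes the single listed price applies to both input and output tokens. Actual token counts are determined by the service, not by this function.

```python
from decimal import Decimal

# Prices in CNY per 1M tokens, copied from the pricing table.
# Assumption: the same price applies to input and output tokens.
PRICE_PER_MILLION = {
    "inf-chat-v1": Decimal("10"),
    "inf-chat-fin-v1": Decimal("20"),
    "inf-med-chat-v1": Decimal("20"),
    "inf-chat-int-v1": Decimal("20"),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Return the estimated charge in CNY for one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    price = PRICE_PER_MILLION[model]  # raises KeyError for unknown models
    total_tokens = input_tokens + output_tokens
    # Decimal avoids binary floating-point rounding on small currency amounts.
    return price * total_tokens / Decimal(1_000_000)


print(estimate_cost("inf-chat-v1", 1500, 500))  # → 0.02
```

Using `Decimal` rather than `float` keeps fractions of a yuan exact when many small requests are summed.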

Account Rate Limiting

Why Rate Limiting

Rate limiting API access is a common practice, mainly for the following reasons:

  • Prevent attacks: Rate limiting helps protect the API from malicious traffic. For example, an attacker might try to overload or disrupt the service by flooding the API with requests; rate limits shield platform users from such attacks.
  • Ensure fair access: Rate limiting ensures that all users can use the API and receive prompt responses. Without limits, a few users could consume excessive resources and degrade the experience for everyone else. Limits configured around actual user needs give most users the best experience.
  • Ensure infrastructure stability: Rate limiting helps manage the overall load on the API infrastructure, which is crucial for reliable performance. During sudden demand spikes in particular, controlling how often users send requests lets the API provider manage resources and avoid performance bottlenecks or service interruptions.

Rate Limiting Concepts

  • Concurrency: The maximum number of requests you can have in progress at the same time.
  • RPM (requests per minute): The maximum number of requests you can send in one minute.
  • TPM (tokens per minute): The maximum number of tokens you can send and receive in one minute.
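
To stay within these limits from the client side, one option is to track your own usage before sending each request. The sketch below is a hypothetical helper, not part of any official SDK. It assumes that concurrency counts requests still in progress, that RPM and TPM are measured over a rolling 60-second window, and that TPM counts input plus output tokens; the service's exact accounting is not documented here. Because output length is unknown until the response arrives, the caller passes an estimate (for example, input tokens plus the maximum output it expects).

```python
from collections import deque


class ClientRateLimiter:
    """Hypothetical client-side check for concurrency, RPM and TPM limits.

    Assumes a rolling 60-second window; the service's real accounting may differ.
    """

    WINDOW_SECONDS = 60.0

    def __init__(self, concurrency: int, rpm: int, tpm: int) -> None:
        self.concurrency = concurrency
        self.rpm = rpm
        self.tpm = tpm
        self.in_flight = 0
        self.history = deque()  # entries: (start_time, reserved_tokens)

    def _evict(self, now: float) -> None:
        # Drop requests that started more than one window ago.
        while self.history and now - self.history[0][0] >= self.WINDOW_SECONDS:
            self.history.popleft()

    def try_acquire(self, tokens: int, now: float) -> bool:
        """Reserve capacity for a request expected to use `tokens` tokens."""
        self._evict(now)
        if self.in_flight >= self.concurrency:
            return False  # too many requests already in progress
        if len(self.history) >= self.rpm:
            return False  # request budget for this window is used up
        used = sum(t for _, t in self.history)
        if used + tokens > self.tpm:
            return False  # token budget for this window would be exceeded
        self.in_flight += 1
        self.history.append((now, tokens))
        return True

    def release(self) -> None:
        """Call when a request finishes, whether it succeeded or failed."""
        self.in_flight = max(0, self.in_flight - 1)
```

A caller that gets `False` would wait and try again later instead of sending the request and having it rejected.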

Rate Limiting Levels

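| User level | Cumulative top-up amount | Concurrency | RPM | TPM |
|---|---|---|---|---|
| Free | ¥0 | 1 | 4 | 32,000 |
| Tier1 | ¥500 | 4 | 20 | 128,000 |
| Tier2 | ¥3,000 | 8 | 80 | 256,000 |
| Tier3 | ¥10,000 | 24 | 240 | 384,000 |
| Tier4 | ¥50,000 | 32 | 320 | 512,000 |
| Tier5 | ¥100,000 | 48 | 480 | 768,000 |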
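
The tier table can be read as a threshold lookup on cumulative top-up. The sketch below is a hypothetical helper (`TierLimits`, `TIERS` and `tier_for` are illustrative names, not an official API). It assumes a tier applies as soon as the cumulative top-up reaches that tier's threshold; the actual upgrade rules and timing are not described here.

```python
from typing import NamedTuple


class TierLimits(NamedTuple):
    name: str
    min_top_up_cny: int  # cumulative top-up needed to reach this tier
    concurrency: int
    rpm: int
    tpm: int


# Values copied from the rate limiting levels table, lowest tier first.
TIERS = [
    TierLimits("Free", 0, 1, 4, 32_000),
    TierLimits("Tier1", 500, 4, 20, 128_000),
    TierLimits("Tier2", 3_000, 8, 80, 256_000),
    TierLimits("Tier3", 10_000, 24, 240, 384_000),
    TierLimits("Tier4", 50_000, 32, 320, 512_000),
    TierLimits("Tier5", 100_000, 48, 480, 768_000),
]


def tier_for(cumulative_top_up_cny: float) -> TierLimits:
    """Return the highest tier whose threshold the cumulative top-up reaches.

    Assumption: a tier applies once the top-up is >= its threshold.
    """
    current = TIERS[0]
    for tier in TIERS:  # TIERS is sorted by threshold, so the last match wins
        if cumulative_top_up_cny >= tier.min_top_up_cny:
            current = tier
    return current


print(tier_for(3_500).name)  # → Tier2
```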