Product Pricing

Basic Concepts

Tokens are the basic billing unit. For the definition of a token, see the User Guide.

Both input and output are billed, based on the actual number of tokens in each request's input and output.

Model	Context Length	Features	Scenarios	Price/1M tokens
inf-chat-v1	32k	Designed for Chinese and English conversation, with smooth and accurate interaction in both languages. Other languages are supported, but optimization focuses on Chinese and English. Also performs well in financial and medical applications, helping professionals in these fields solve industry-specific problems.	General conversation, finance, medical	¥10
inf-chat-fin-v1	32k		Finance	¥20
inf-med-chat-v1	32k		Medical	¥20
inf-chat-int-v1	32k	Function calls, structured output	General	¥20
inf-image-chat-v1	32k	Image to text	Multi-modal	¥5
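
The billing rule above can be turned into a quick cost estimate. The following is a minimal sketch, not an official calculator: the function name `estimate_cost` is hypothetical, and it assumes that input and output tokens are billed at the same listed price, which the table suggests but does not state explicitly.

```python
from decimal import Decimal

# Prices in CNY per 1M tokens, copied from the pricing table.
PRICE_PER_MILLION = {
    "inf-chat-v1": Decimal("10"),
    "inf-chat-fin-v1": Decimal("20"),
    "inf-med-chat-v1": Decimal("20"),
    "inf-chat-int-v1": Decimal("20"),
    "inf-image-chat-v1": Decimal("5"),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the charge in CNY for a single request.

    Assumption: input and output tokens share the same per-token price.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total_tokens = input_tokens + output_tokens
    return PRICE_PER_MILLION[model] * total_tokens / Decimal(1_000_000)
```

Under that assumption, a request to inf-chat-v1 with 1,500 input tokens and 500 output tokens would cost about ¥0.02 (`estimate_cost("inf-chat-v1", 1500, 500)`).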

Rate Limits

Rate limiting is common practice for APIs, mainly for the following reasons:

Prevent attacks: Rate limiting helps protect the API from malicious traffic, such as an attempt to overload or disrupt the service by flooding it with requests. Rate limits shield platform users from the effects of such attacks.
Ensure fair access: Rate limiting ensures that all users can use the API and get quick responses. Without limits, a few users could consume excessive resources and degrade the experience for everyone else. Limits are configured around actual user needs so that most users get the best experience.
Ensure infrastructure stability: Rate limiting helps manage the overall load on the API infrastructure, which is crucial for service reliability and performance. During sudden demand spikes in particular, controlling request frequency lets the API service provider manage resources better and avoid performance bottlenecks or service interruptions.

Concurrency: The maximum number of requests you can have in progress at the same time.
RPM: Requests per minute, the maximum number of requests you can send in one minute.
TPM: Tokens per minute, the maximum number of tokens you can exchange with us in one minute.
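
To stay under the RPM and TPM limits defined above, a client can track its own usage before sending each request. The sketch below is one possible approach under stated assumptions: the class `MinuteWindowLimiter` is hypothetical, it uses a sliding 60-second window, and the service may count its windows differently. It does not cover the concurrency limit.

```python
import time
from collections import deque


class MinuteWindowLimiter:
    """Client-side sliding-window check for RPM and TPM (hypothetical helper)."""

    def __init__(self, rpm: int, tpm: int, clock=time.monotonic):
        self.rpm = rpm
        self.tpm = tpm
        self.clock = clock  # injectable so the limiter can be tested without sleeping
        self.events = deque()  # (timestamp, tokens) for requests in the last 60 s

    def _evict_old(self, now: float) -> None:
        # Drop requests that have left the 60-second window.
        while self.events and now - self.events[0][0] >= 60.0:
            self.events.popleft()

    def try_acquire(self, tokens: int) -> bool:
        """Return True and record the request if it fits both limits."""
        now = self.clock()
        self._evict_old(now)
        used_tokens = sum(t for _, t in self.events)
        if len(self.events) + 1 > self.rpm or used_tokens + tokens > self.tpm:
            return False
        self.events.append((now, tokens))
        return True
```

A caller would construct the limiter with the values for its user level, call `try_acquire(estimated_tokens)` before each request, and wait and retry when it returns `False`.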

User Level	Cumulative Recharge Amount	Concurrency	RPM	TPM
Free	¥0	1	4	32,000
Tier1	¥500	4	20	128,000
Tier2	¥3,000	8	80	256,000
Tier3	¥10,000	24	240	384,000
Tier4	¥50,000	32	320	512,000
Tier5	¥100,000	48	480	768,000
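
The table can also be read programmatically. The following sketch looks up the limits for a given cumulative recharge amount. The function name `limits_for` is hypothetical, and it assumes each amount in the table is an inclusive minimum threshold for its level, which the table implies but does not state.

```python
# (level, minimum cumulative recharge in CNY, concurrency, RPM, TPM),
# copied from the user level table.
TIERS = [
    ("Free", 0, 1, 4, 32_000),
    ("Tier1", 500, 4, 20, 128_000),
    ("Tier2", 3_000, 8, 80, 256_000),
    ("Tier3", 10_000, 24, 240, 384_000),
    ("Tier4", 50_000, 32, 320, 512_000),
    ("Tier5", 100_000, 48, 480, 768_000),
]


def limits_for(recharge_cny: float) -> dict:
    """Return the highest level whose threshold the recharge total has reached.

    Assumption: thresholds are inclusive minimums.
    """
    if recharge_cny < 0:
        raise ValueError("recharge amount must be non-negative")
    matched = TIERS[0]
    for tier in TIERS:  # TIERS is sorted by ascending threshold
        if recharge_cny >= tier[1]:
            matched = tier
    level, _, concurrency, rpm, tpm = matched
    return {"level": level, "concurrency": concurrency, "rpm": rpm, "tpm": tpm}
```

Under that assumption, an account with ¥2,999 in cumulative recharges is still at Tier1, and reaches Tier2 at ¥3,000.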