OpenAI Launches Flex Processing Mode: API Costs Halved in Exchange for Slower, Less Reliable Responses

As reported today by TechCrunch, OpenAI has announced a new API option called "Flex processing," an effort to compete on price with generative AI rivals such as Google. In exchange for lower costs, users accept slower response times and "occasional unavailability of resources." Flex processing is now open for testing with the newly released o3 and o4-mini reasoning models, and is aimed at lower-priority "non-production" tasks such as model evaluation, data augmentation, and asynchronous processing.

With Flex processing, API costs are exactly halved. For o3, Flex pricing is $5 per million input tokens and $20 per million output tokens, versus the standard $10 and $40. For o4-mini, the cost drops from $1.10 per million input tokens and $4.40 per million output tokens to $0.55 and $2.20.
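The savings work out as a simple per-token calculation. As a sketch, the prices below are taken from the article; the `request_cost` helper is purely illustrative and is not part of the OpenAI API:

```python
# Per-million-token prices in USD from the article: (input, output).
PRICES = {
    "o3":      {"standard": (10.00, 40.00), "flex": (5.00, 20.00)},
    "o4-mini": {"standard": (1.10, 4.40),   "flex": (0.55, 2.20)},
}

def request_cost(model: str, tier: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a request for a given model and service tier."""
    in_price, out_price = PRICES[model][tier]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

# Example: a batch job sending 2M input tokens and receiving 0.5M output tokens on o3.
standard = request_cost("o3", "standard", 2_000_000, 500_000)  # $40.00
flex = request_cost("o3", "flex", 2_000_000, 500_000)          # $20.00
```

Because Flex halves both the input and output rates, the discount is a flat 50% regardless of the input/output token mix.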
