We’re excited to announce important updates to Azure OpenAI Service that help our 60,000+ customers manage AI deployments more efficiently and cost-effectively. By introducing self-service provisioning, we aim to make your quota and provisioning processes more flexible, faster to market, and more economical. The technical value proposition remains the same – provisioned deployments continue to be the best option for latency-sensitive and high-throughput applications. Today’s announcement includes self-service provisioning, visibility into service capacity and availability, and the introduction of Provisioned (PTU) hourly pricing and reservations to support cost management and savings.
What’s new?
Self-service provisioning and model-independent quota requirements
We’re introducing self-service provisioning alongside standard (token-based) deployments, giving you more flexibility and efficiency in requesting Provisioned Throughput Units (PTUs). This new capability allows you to manage your Azure OpenAI Service quota and deployments independently, without relying on support from your account team. By decoupling quota requests from specific models, you can now allocate resources based on your immediate needs and adjust them as your requirements change. This simplifies the process and accelerates your ability to deploy and scale your applications.
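For teams that automate resource management, a minimal sketch of what self-service provisioning can look like in code is shown below, using the azure-identity and azure-mgmt-cognitiveservices Python packages. The subscription, resource group, account, deployment name, and model version are placeholders, and the SKU name and PTU count should be checked against your own environment and quota.

```python
# A minimal sketch of creating a Provisioned (PTU) deployment programmatically.
# All resource names and the model version below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.cognitiveservices import CognitiveServicesManagementClient
from azure.mgmt.cognitiveservices.models import (
    Deployment, DeploymentProperties, DeploymentModel, Sku,
)

subscription_id = "<subscription-id>"        # placeholder
resource_group = "<resource-group>"          # placeholder
account_name = "<azure-openai-resource>"     # placeholder

client = CognitiveServicesManagementClient(DefaultAzureCredential(), subscription_id)

# Request a provisioned-managed deployment; sku.capacity is the number of PTUs.
poller = client.deployments.begin_create_or_update(
    resource_group_name=resource_group,
    account_name=account_name,
    deployment_name="gpt-4o-ptu",
    deployment=Deployment(
        sku=Sku(name="ProvisionedManaged", capacity=100),
        properties=DeploymentProperties(
            model=DeploymentModel(format="OpenAI", name="gpt-4o", version="2024-05-13"),
        ),
    ),
)
print(poller.result().id)
```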

Transparency regarding service capacity and availability
Gain greater insight into service capacity and availability so you can make informed decisions about your deployments. This new feature gives you real-time access to information about service capacity across regions, allowing you to plan and manage your deployments more effectively. This visibility helps you avoid potential capacity issues and optimize the distribution of your workloads across available resources, resulting in improved performance and reliability of your applications.
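As a hedged illustration of how this visibility can be consumed programmatically, the sketch below queries the Azure Resource Manager model-capacities listing for a model across regions. The api-version string, model identifiers, and response field names are assumptions and should be verified against the current Azure documentation.

```python
# A sketch of checking regional capacity before placing a provisioned deployment.
import requests
from azure.identity import DefaultAzureCredential

subscription_id = "<subscription-id>"  # placeholder
token = DefaultAzureCredential().get_token("https://management.azure.com/.default").token

url = (
    f"https://management.azure.com/subscriptions/{subscription_id}"
    "/providers/Microsoft.CognitiveServices/modelCapacities"
)
params = {
    "api-version": "2024-04-01-preview",  # assumed; check the currently supported version
    "modelFormat": "OpenAI",
    "modelName": "gpt-4o",
    "modelVersion": "2024-05-13",
}
resp = requests.get(url, headers={"Authorization": f"Bearer {token}"}, params=params)
resp.raise_for_status()

# Each entry reports how much capacity a region/SKU combination can currently host.
for item in resp.json().get("value", []):
    props = item.get("properties", {})
    print(item.get("location"), props.get("skuName"), props.get("availableCapacity"))
```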
Provisioned hourly pricing and reservations
We are pleased to introduce two new self-service purchasing options for PTUs:
- Hourly, no-commitment purchasing
- You can now create a Provisioned deployment for as little as one hour, at a fixed rate of $2 per unit per hour. This model-agnostic pricing makes it easy to provision and tear down deployments as needed and provides maximum flexibility. It is ideal for test scenarios or transition periods without a long-term commitment. (A rough cost comparison follows this list.)
- Monthly and annual Azure reservations for provisioned deployments
- For production environments with a constant volume of requests, Azure OpenAI Service Provisioned Reservations offer significant cost savings. By committing to a monthly or annual reservation, you can save up to 82% or 85%, respectively, compared to hourly rates. Reservations are now decoupled from specific models and deployments, providing unmatched flexibility. This approach allows companies to optimize their costs while retaining the ability to switch models and adjust deployments as needed. Read our technical blog about reservations here.
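To make the pricing options concrete, here is a back-of-the-envelope comparison based only on the figures quoted above (the $2 per unit per hour rate and the up-to-82%/85% savings). The PTU count and hours per month are illustrative assumptions, not published prices.

```python
# Rough cost comparison using the figures from this announcement.
HOURLY_RATE_PER_PTU = 2.00   # USD per PTU per hour (from the announcement)
PTUS = 100                   # example deployment size (assumption)
HOURS_PER_MONTH = 730        # average hours in a month (assumption)

hourly_monthly_cost = HOURLY_RATE_PER_PTU * PTUS * HOURS_PER_MONTH
monthly_reservation_cost = hourly_monthly_cost * (1 - 0.82)  # up to ~82% savings
annual_reservation_cost = hourly_monthly_cost * (1 - 0.85)   # up to ~85% savings

print(f"Pay-as-you-go (hourly): ${hourly_monthly_cost:,.0f}/month")
print(f"Monthly reservation:    ~${monthly_reservation_cost:,.0f}/month")
print(f"Annual reservation:     ~${annual_reservation_cost:,.0f}/month equivalent")
```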
Benefits for decision makers
These updates are designed to provide flexibility, cost-effectiveness and ease of use, making it easier for decision makers to manage AI deployments.
- Flexibility: Self-service provisioning and hourly pricing let you scale your deployments up or down based on your immediate needs without making long-term commitments.
- Cost efficiency: Azure reservations provide significant savings over long-term usage and enable better budget planning and cost management.
- Ease of use: Improved visibility and simplified deployment processes reduce administrative overhead, allowing your team to focus on strategic initiatives rather than operational details.
Customer success stories
Even before self-service became available, select customers were already benefiting from these options.

- Visier Solutions: By leveraging Provisioned Throughput Units (PTUs) with Azure OpenAI Service, Visier Solutions has significantly enhanced its AI-powered workforce analytics tool Vee. With PTUs, Visier guarantees fast, consistent response times, which are critical to handling the high volume of requests from its extensive customer base. This powerful synergy between Visier’s innovative solutions and Azure’s robust infrastructure not only increases customer satisfaction through fast and accurate insights, but also underscores Visier’s commitment to driving transformative change in workforce analytics using cutting-edge technology. Read the Microsoft case study.
- An analytics and insights company: Switched from standard deployments to GPT-4 Turbo PTUs and saw a significant reduction in response times from 10-20 seconds to just 2-3 seconds.
- A chatbot service company: Reported improved stability and lower latency with Azure PTUs, improving the performance of their services.
- A visual entertainment company: A dramatic improvement in latency from 12-13 seconds to 2-3 seconds was observed, resulting in improved user engagement.
All customers can build with the Azure OpenAI Service
These new updates do not change the technical excellence of Provisioned deployments, which continue to offer low and predictable latencies. Instead, they introduce a more flexible and cost-effective procurement model, making the Azure OpenAI Service more accessible than ever. With self-service provisioning, model-agnostic units, and both hourly and reserved pricing options, the barriers to entry have been dramatically lowered.
For more information on improving the reliability, security, and performance of your cloud and AI investments, see the following additional resources.