Microsoft Dev: Azure Cosmos DB Conf 2026 Recap: Lessons from Production

You provisioned Azure Cosmos DB with ample Request Units (RUs), yet your application's P99 latency is creeping up and throttling errors are becoming more frequent. Sound familiar? This isn't a capacity problem; it's a design problem. The Azure Cosmos DB Conference 2026 made one thing brutally clear: the platform puts a harsh spotlight on your data modeling and partition key choices.

The Unseen Bottleneck: Partition Keys and Skewed Distribution

The single most impactful decision you make for Cosmos DB is the partition key. Forget throwing more RUs at the problem; if your partition key leads to skewed distribution, you’re battling hot partitions. This results in 100% RU utilization on some physical partitions while others languish, leading to relentless throttling and unacceptable latency spikes, even if your aggregate RU usage appears low.
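To see why aggregate RU usage can look healthy while one partition burns, here is a minimal, self-contained Python sketch of a skewed workload. Plain MD5 hashing stands in for Cosmos DB's actual hash partitioning, and all names (tenants, partition counts) are illustrative:

```python
import hashlib
from collections import Counter

def physical_partition(key: str, partitions: int = 4) -> int:
    """Toy stand-in for Cosmos DB's hash partitioning: map a
    partition key value to one of N physical partitions."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % partitions

# Simulated request log: one tenant generates 90% of the traffic.
requests = ["tenant-42"] * 900 + [f"tenant-{i}" for i in range(100)]

load = Counter(physical_partition(k) for k in requests)
for p in sorted(load):
    print(f"partition {p}: {load[p]} requests")
```

Only 4 partition slots exist here, but the shape of the problem is the same at any scale: the partition holding `tenant-42` absorbs the bulk of the requests and throttles at its RU share, while the others sit mostly idle.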

Diagnosing the Dreaded Hot Partition

Azure Monitor is your primary battlefield. Look for:

  • Normalized RU Consumption (%) By PartitionKeyRangeID: This metric is gold. High values on specific PartitionKeyRangeIDs scream “hot partition.”
  • PhysicalPartitionThroughput: This shows the actual RU/s being consumed by individual physical partitions.

The fundamental truth from the 2026 discussions is that your partition key MUST align with your most frequent access patterns and distribute load evenly. Common pitfalls include using a userId when one user generates a disproportionate share of the traffic, or ignoring the 20 GB size limit on each logical partition.
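One common mitigation for the hot-user pitfall is a synthetic partition key: append a bounded random suffix to the hot value so its writes spread across several logical partitions, at the cost of fanning reads out across the suffixes. A hedged sketch (the bucket count and naming are illustrative):

```python
import random

def synthetic_key(user_id: str, buckets: int = 10) -> str:
    """Spread one hot user's writes across several logical
    partitions by appending a bounded random suffix.
    Reads for that user must then query all `buckets` suffixes."""
    return f"{user_id}-{random.randrange(buckets)}"

# 1,000 writes for one hot user now land on up to 10 logical partitions.
keys = {synthetic_key("user-1") for _ in range(1000)}
print(sorted(keys))
```

The suffix must be bounded and known, so reads can enumerate it; a truly random unbounded suffix would make the user's data unqueryable by key.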

Strategic Partitioning: Beyond the Single Key

When a single partition key can't guarantee even distribution, or when dealing with massive tenants, Hierarchical Partition Keys are your savior. These allow up to three levels of subpartitioning; because the 20 GB limit applies to the full key path rather than the top-level value alone, data sharing a single prefix (say, one large tenant) can grow well past 20 GB, and you get finer-grained control over data distribution.
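As a rough illustration, hashing the full three-level path (rather than just the top-level value) is what lets one tenant's data spread across many partitions while prefix-scoped queries remain routable. This is a toy model, not Cosmos DB's actual MultiHash algorithm; with the Python SDK the real declaration is, to the best of my knowledge, along the lines of `PartitionKey(path=[...], kind="MultiHash")`:

```python
import hashlib

def route(tenant: str, user: str, session: str, partitions: int = 8) -> int:
    """Toy stand-in for hierarchical partitioning: hash the full
    three-part key path, so documents sharing a tenant (or even a
    tenant + user) prefix can still land on different partitions."""
    path = f"{tenant}/{user}/{session}"
    return int(hashlib.md5(path.encode()).hexdigest(), 16) % partitions

# One tenant, one user, many sessions: no single 20 GB ceiling.
targets = {route("contoso", "alice", f"s{i}") for i in range(100)}
print(f"one tenant's sessions span {len(targets)} of 8 partitions")
```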

Data Modeling: Embracing Denormalization and Avoiding Pitfalls

Cosmos DB thrives on flexibility, but that doesn’t mean a free-for-all.

  • Denormalize and Embed: For one-to-few or contained relationships, embed related data directly within the parent document. This minimizes costly cross-partition queries.
  • Avoid Unbounded Arrays: Large, unbounded arrays can lead to large documents and potential RU spikes during updates.
  • Multiple Containers for Specific Access Patterns: Don’t force one container to serve all purposes. Use the Change Feed to propagate data to specialized containers optimized for different query needs, drastically reducing cross-partition query overhead.
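The embed-vs-reference trade-off is easiest to see in the document shape itself. A sketch of a denormalized order document, with all field names illustrative: the bounded one-to-few relationship (line items) is embedded, so a single point read by `id` and partition key returns everything needed to render the order:

```python
# Embedded ("denormalized") shape: one point read fetches the order
# and its line items together, instead of a cross-partition query
# stitching separate item documents back onto the order.
order = {
    "id": "order-1001",
    "customerId": "cust-7",        # also the partition key value
    "status": "shipped",
    "items": [                     # bounded one-to-few: safe to embed
        {"sku": "A-100", "qty": 2, "price": 9.99},
        {"sku": "B-205", "qty": 1, "price": 24.50},
    ],
}

# Derived values come for free, with no extra request charge.
total = sum(i["qty"] * i["price"] for i in order["items"])
print(f"order total: {total:.2f}")
```

The same shape becomes an anti-pattern if `items` is unbounded (an audit log, say): the document grows without limit and every update rewrites and re-indexes the whole thing.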

Throughput Management Beyond Provisioning

Beyond basic RU provisioning, proactive management is key:

  • Monitor logical partition sizes and set alerts. When a logical partition approaches its 20 GB limit, it’s a precursor to failed writes, and splitting the data out after the fact is painful.
  • While not ideal, skewed workloads can sometimes be mitigated by manually redistributing throughput across physical partitions using PowerShell or the Azure CLI. This is a workaround, not a fix for bad design.
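The idea behind manual redistribution can be sketched in a few lines: split the container's provisioned RU/s across physical partitions in proportion to observed load, instead of the default even split. This is a hedged illustration of the arithmetic only; the real operation is performed through preview Azure CLI / PowerShell commands, and all names and numbers below are assumptions:

```python
def redistribute(total_rus: int, load: dict[str, int]) -> dict[str, int]:
    """Split a container's provisioned RU/s across physical partitions
    in proportion to each partition's observed request count."""
    observed = sum(load.values())
    return {p: round(total_rus * n / observed) for p, n in load.items()}

# Observed request counts per physical partition (skewed workload).
plan = redistribute(10_000, {"p0": 700, "p1": 200, "p2": 100})
print(plan)  # the hot partition p0 gets the lion's share of RU/s
```

Note what this does not fix: a single logical partition still cannot exceed the throughput of the one physical partition it lives on, which is why this is a mitigation rather than a cure.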

The Ecosystem and The Hard Truths

Discussions on platforms like Hacker News (even years later) reveal a mixed sentiment. Early criticisms often cited misleading “multi-model” marketing (Cosmos DB excels primarily as a document DB), sparse documentation, and cost concerns. Alternatives like Amazon DynamoDB, MongoDB Atlas, and Google Cloud Firestore are frequently mentioned.

The critical takeaway from production deployments is that the 20 GB logical partition and 10,000 RU/s per logical partition limits are non-negotiable hard limits. They necessitate careful design from day one.

When to Reconsider Cosmos DB

If your workload is inherently relational with complex, multi-table joins as a primary requirement, or if you’re seeking a bare-bones, cost-optimized NoSQL solution for extremely low-throughput scenarios, Cosmos DB might be overkill or a poor fit. It demands a deep understanding of your data and access patterns.

The honest verdict? Azure Cosmos DB is a powerhouse for globally distributed, high-scale, real-time, and AI-driven applications with flexible data. But its success is entirely dependent on disciplined data modeling and partition key design that perfectly matches your workload. Increasing RUs will only delay the inevitable confrontation with a flawed design. Cosmos DB doesn’t hide your design problems; it amplifies them. Choose your partition key wisely.

Next post

Cloudflare: Introducing Dynamic Workflows for Durable Execution