Dapr: Service Mesh Done Right?
With the recent announcement that Dapr has reached its first production-ready release, we finally see a viable response to Istio and perhaps the rest of the service mesh industry from Microsoft. If you’re not familiar, Dapr is a coding framework intended to solve the challenges of modern distributed applications. You may be asking, “But isn’t that what a service mesh is for?” Yes, except that service meshes have the wrong focus. They focus on networking infrastructure concerns; Dapr focuses on what developers need to build microservices. This shift may be what the industry needs to solve the problems of distributed architecture.
The Problems Of Distributed Architecture
As your apps modernize from a monolith to microservices, you trade the stability of the call stack for the chaos and insecurity of the network. Hand coding all the things needed to handle that just doesn’t scale: It’s cumbersome and prone to human error. Service meshes have risen as a means to solve this problem, but they’re constrained by two issues:
- Many vendors view the problem through the lens of network operations, but the problem is one of software development and architecture. Viewing it as purely a networking problem is like viewing dynamic library linking as purely a disk I/O problem. Dynamic library linking requires solving a development problem that’s much more than just loading a DLL or JAR file from disk; and similarly, distributed architecture is more than just a networking problem.
- Service meshes can be complicated, especially Istio. To support a service mesh, developers must either hand it off to an infrastructure and operations (I&O) team or become I&O experts themselves. The former defeats a key purpose of microservices: autonomous teams that can move quickly with minimal dependencies on other teams. The latter defeats the purpose of the mesh: to free developers from infrastructure so they can focus on the business logic.
Dapr follows the sidecar model of service meshes, but its abstraction exists in the application code layer above the seven-layer network stack. Although the aforementioned chaotic network is a chief concern for developers of distributed applications, it isn’t the only problem of distributed architecture. Other concerns include managing state, publishing messages between microservices, and triggering events through event bindings. Service meshes are incapable of solving these problems because they exist in the network layers. Because it’s in the application code layer, Dapr tackles these problems by abstracting away the infrastructure concerns needed for those coding patterns. Besides accelerating delivery, this abstraction should also increase cloud portability. Need to migrate from Amazon Web Services to Azure? Change your YAML config for state management from DynamoDB to Cosmos DB. Now, without any code changes, your microservice persists its state in Cosmos DB.
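To make the portability point concrete, here is a minimal sketch of what state management looks like from the application's side. It assumes a Dapr sidecar listening on its default HTTP port (3500) and a state store component that I've named "statestore" for illustration; the helper function names are my own, not part of any Dapr SDK. The code talks only to the sidecar's state endpoint and never mentions DynamoDB or Cosmos DB. Which database sits behind the name is decided in the component's YAML file.

```python
import json
import urllib.parse
import urllib.request

DAPR_HTTP_PORT = 3500        # Dapr's default sidecar HTTP port
STORE_NAME = "statestore"    # hypothetical component name, defined in YAML


def state_url(store, key=None, port=DAPR_HTTP_PORT):
    """Build the sidecar's state endpoint URL, optionally for a single key."""
    base = f"http://localhost:{port}/v1.0/state/{store}"
    if key is None:
        return base
    return f"{base}/{urllib.parse.quote(key, safe='')}"


def build_save_request(store, key, value, port=DAPR_HTTP_PORT):
    """Prepare a POST that saves one key/value pair through the sidecar."""
    body = json.dumps([{"key": key, "value": value}]).encode("utf-8")
    return urllib.request.Request(
        state_url(store, port=port),
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_get_request(store, key, port=DAPR_HTTP_PORT):
    """Prepare a GET that reads one key back through the sidecar."""
    return urllib.request.Request(state_url(store, key, port), method="GET")


def send(request):
    """Send a prepared request to a running sidecar and decode any JSON reply."""
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        return json.loads(raw) if raw else None
```

With a sidecar running, `send(build_save_request(STORE_NAME, "order-1", {"qty": 2}))` followed by `send(build_get_request(STORE_NAME, "order-1"))` would round-trip the value. Swapping the backing database means editing the component definition (for example, changing its type from `state.aws.dynamodb` to `state.azure.cosmosdb` and updating its connection metadata), while the Python above stays the same.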
Dapr Vs. Service Mesh
Below is a diagram I put together to give an idea of the overlap. Distributed tracing straddles the line because being above the network layer enables Dapr to do more. Service meshes can trace HTTP connections only; transactions that flow through event message brokers are invisible to their tracing. Dapr, by contrast, can trace through both HTTP service calls and event message brokers.
As a developer, I find the stuff in the blue box much more exciting than the stuff in the green box. Why should I even care about the stuff on the bottom unique to service meshes?
Dapr strikes me as the right response to the complexity of translating business logic to the underlying infrastructure. We may have finally found a service mesh done right, and ironically, it’s not a service mesh. Will service mesh vendors compete with Dapr above the network layer? Or will they stop trying to solve development problems in the infrastructure layer and instead focus on becoming the next generation of virtual networking? Service mesh vendors need to make a decision.