AKF Partners

Abbott, Keeven & Fisher PartnersPartners In Hyper Growth

Communicating Across Swim-lanes

We often emphasize the need to break apart complex applications into separate swim-laned services. Doing so not only improves architectural scalability but also permits the organization to scale and innovate faster as each separate team can operate almost as its own independent startup.

Ideally, there would be no need to pass information between these services and they could be easily stitched together on the client-side with a little HTML for a seamless user experience. However, in the real world, you’re likely to find that some message passing between services is needed (If only to save your users the trouble of not entering information twice).

Every cross-service call adds complexity to your application. Often times, teams will implement internal APIs when cross-service communication is needed. This is a good start and solves part of the problem by formalizing communication and discouraging unnecessary cross-service calls.

However, APIs alone don’t address the more important issue with cross-service communication — the reduction in availability. Anytime one service synchronously communicates with another you’ve effectively connected them in series and reduced overall availability. If Service-A calls Service-B synchronously, and Service-B crashes or slows to a crawl, Service-A will do the same. To avoid this trap, communication needs to take place asynchronously between services. This, in fact, is how we define “swim-laning”.

So what are the best means to facilitate asynchronous cross-service communication and what are the pro and cons of each method?

Client Side

Information that needs to be shared between services can be passed via JavaScript & JSON in the browser. This has the advantage of keeping the backend services completely separate and gives you the option to move business logic to the client side, thus reducing load on your transactional systems.

The downside, however, is the increased security risks. It doesn’t take an expert hacker to manipulate variables in JavaScript. (Just think, $price_var gets set to $0 somewhere between the shopping cart and checkout services). Furthermore, data is passed back to the server-side these cross-service calls will now suffer from the same latency and reliability issues as any TCP call over the internet.

Message Bus / Enterprise Service Bus

Message buses and enterprise service buses provide a ready means to transmit messages between services asynchronously in pub/sub model. Advantages include providing a rich set of features for tracking, manipulating, and delivering messages, as well as the ability centralized logging and monitoring. However, the potential for congestion as well as occasional message loss makes them less desirable in many cases than asynchronous point-to-point calls (discussed below).

To limit congestion, it’s best to implement several independent channels and limit message bus traffic to events that will have multiple subscribers.

Asynchronous Point-to-Point

Point-to-point communication (i.e. one service directly calling another) is another effective means of message passing. Advantages include simplicity, speed, and reliability. However, be sure to implement this asynchronously with a queuing mechanism with timeouts, retries (if needed), and exceptions handled in the event of service failure. This will prevent failures from propagating across service boundaries.

If properly implemented, asynchronous point-to-point communication is excellent for invoking a function in another service or transmitting a small amount of information. This method can be implemented for the majority of cross-service communication needs. However, for larger data transfers, you’ll need to consider one of the methods below.

ETL

ETL jobs can be used to move a large amount of data from one datastore to another. Since these are generally implemented as separate batch processes they won’t bring down the availability of your services. However, drawbacks include increased load on transactional databases (unless a read replica of the source DB is used) and the poor timeliness/consistency of data resulting from periodic batch process.

ETL processes are likely best reserved for transferring data from your OLTP to OLAP systems, where immediate consistency isn’t required. If you need both a large amount of data transferred and up to the second consistency consider a DB read replica.

DB Read-Replica

Most common databases (Oracle, MySQL, PostgreSQL) support native replication of DB clones with replication lag measured in milliseconds. By placing a read replica of one service’s DB into another service’s swim-lane, you can successfully transfer a large amount of data in near real-time. The downsides are increased IOPS requirements and a lack of abstraction between services that — in contrast to the abstraction provided by an API — fosters greater service interdependency.

Conclusion

In conclusion, you can see there are a variety of cross-service communication methods that permit fault-isolation between services. The trick is knowing the strengths and weaknesses of each and implementing the right method where appropriate.