In our last couple of blog posts, we’ve looked at how accurate clock synchronization enables better insight into latency issues. With the Sender and Receiver clocks accurately synchronized, the true one-way delay (OWD) of each packet can be measured directly rather than estimated as half the round-trip time (i.e., OWD = RTT/2), as has been the common practice until now.
In modern data centers, the actual network OWDs are small compared to the “turnaround time” of the acknowledgment at the Receiver (the time the Receiver takes to send the acknowledgment after the packet arrives), so RTT/2 turns out to be a poor estimate. When a request traverses multiple hops, as a transaction does in a microservice system, the OWD of each leg can be precisely determined.
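To make the gap concrete, here is a minimal Python sketch that compares the RTT/2 estimate with OWDs computed from timestamps. It assumes the Sender and Receiver clocks are synchronized, so timestamps taken on different hosts can be subtracted directly. The `Exchange` class, the function names, and the numbers are all hypothetical, chosen only to illustrate a case where the Receiver's acknowledgment delay dominates.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Exchange:
    """Timestamps (in microseconds) for one packet and its acknowledgment.

    Assumes both hosts share a synchronized clock.
    """
    sent: int          # Sender transmits the packet
    received: int      # Receiver gets the packet
    ack_sent: int      # Receiver transmits the acknowledgment
    ack_received: int  # Sender gets the acknowledgment


def rtt_half(x: Exchange) -> float:
    """Classic estimate: half the round-trip time seen by the Sender."""
    return (x.ack_received - x.sent) / 2


def one_way_delays(x: Exchange) -> Tuple[int, int]:
    """Forward and reverse OWDs, measured directly from the timestamps."""
    return x.received - x.sent, x.ack_received - x.ack_sent


# Made-up example: 20 us out, 200 us spent at the Receiver, 25 us back.
example = Exchange(sent=0, received=20, ack_sent=220, ack_received=245)

print(rtt_half(example))        # 122.5
print(one_way_delays(example))  # (20, 25)
```

In this made-up case the RTT/2 estimate is several times larger than either measured OWD, because the round trip also includes the time spent at the Receiver.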
This allows the total completion time of a request to be decomposed into (1) the outbound network time, (2) the compute time at the child microservice, and (3) the inbound network time. This is a significant improvement over the current practice in trace collectors like Jaeger and Zipkin, which tend to “center” the child span with respect to the parent span, thereby implicitly assuming that the outbound network time equals the inbound network time. This assumption can be way off the mark!
For example, the figure above shows the “waterfall view” of a request in a microservice mesh. The request completion time at each service is broken down into the network and compute times. Drilling down into the “checkoutservice” and “cartservice”, we see that the total request time of 4.1ms is decomposed as 2.5ms for the compute time, 1.5ms for the outbound network time, and just 0.1ms for the inbound network time.
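The decomposition can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the API of any tracing tool: the `Span` class and both functions are hypothetical, the parent and child timestamps are assumed to come from synchronized clocks, and the span boundaries are chosen so that the result reproduces the numbers quoted above (expressed in microseconds). The `centered` function shows what the equal-split assumption would report for the same request.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Start and end of a span in microseconds on a shared, synchronized clock."""
    start_us: int
    end_us: int


def decompose(parent: Span, child: Span):
    """Split the parent's request time using the child's real timestamps."""
    outbound = child.start_us - parent.start_us  # request travelling to the child
    compute = child.end_us - child.start_us      # work done at the child
    inbound = parent.end_us - child.end_us       # response travelling back
    return outbound, compute, inbound


def centered(parent: Span, child: Span):
    """Equal-split assumption: the child span is centered in the parent span."""
    compute = child.end_us - child.start_us
    network = (parent.end_us - parent.start_us) - compute
    return network / 2, compute, network / 2


# Boundaries chosen to match the example: 4.1 ms total, 2.5 ms compute.
parent = Span(start_us=0, end_us=4100)
child = Span(start_us=1500, end_us=4000)

print(decompose(parent, child))  # (1500, 2500, 100)
print(centered(parent, child))   # (800.0, 2500, 800.0)
```

Centering would attribute 0.8ms to each direction, hiding the fact that almost all of the network time in this request is spent on the outbound leg.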
With accurate OWD measurements and decomposition, you can diagnose and resolve latency issues quickly in any modern, complex multi-hop system. Contact us to learn more and schedule a demo.
Interested in solving challenging engineering problems and building the platform that powers the next generation of time-sensitive applications? Join our world-class engineering team.