My favourite use case for this: by the same derivation as in this blog, one can show that for any two independent random variables X and Y (they can have different distributions), the distribution of X+Y is the convolution of the PMFs/PDFs of X and Y.
Along similar lines, the MAX operator on independent random variables becomes the PRODUCT operator on their CDFs.
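A minimal numeric sketch of both facts (Python/numpy; the discrete distributions on small integer supports are made up for illustration):

    import numpy as np

    # Independent X on {0,1,2} and Y on {0,1}; PMFs are illustrative.
    px = np.array([0.2, 0.5, 0.3])
    py = np.array([0.6, 0.4])

    # Sum: the PMF of X+Y is the convolution of the PMFs.
    pmf_sum = np.convolve(px, py)            # [0.12, 0.38, 0.38, 0.12]

    # Max: the CDF of max(X,Y) is the product of the CDFs
    # (zero-pad to align the supports first).
    n = max(len(px), len(py))
    Fx = np.cumsum(np.pad(px, (0, n - len(px))))
    Fy = np.cumsum(np.pad(py, (0, n - len(py))))
    cdf_max = Fx * Fy                        # [0.12, 0.70, 1.00]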
It's fun to play with the (Max, +) algebra of random variables and infer the resulting distribution.
This turns out to be quite useful in estimating the completion time of dependent parallel jobs.
Spawning multiple parallel jobs becomes a Max operation and chaining sequential jobs becomes a '+' operation on the completion times. This expression tree of Maxes and Pluses can be processed algebraically to obtain bounds on the completion-time distribution.
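A small sketch of evaluating such a tree on discretized completion-time PMFs (the jobs and their PMFs here are hypothetical):

    import numpy as np

    def seq(p, q):   # sequential chaining: convolve the PMFs
        return np.convolve(p, q)

    def par(p, q):   # parallel join: multiply the CDFs, then back to a PMF
        n = max(len(p), len(q))
        Fp = np.cumsum(np.pad(p, (0, n - len(p))))
        Fq = np.cumsum(np.pad(q, (0, n - len(q))))
        return np.diff(Fp * Fq, prepend=0.0)

    a = np.array([0.0, 0.7, 0.3])   # job A takes 1 or 2 time units
    b = np.array([0.0, 0.5, 0.5])   # job B
    c = np.array([0.0, 0.9, 0.1])   # job C

    total = par(seq(a, b), c)       # PMF of max(A + B, C)

(Independence between the job times is assumed throughout; with dependence you only get bounds, as noted above.)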
One example is the straggler problem in MapReduce/Hadoop. In the naive case, the completion time is the max of the completion times of the parallel subtasks. (*)
If the task times have a heavy tail, which they sometimes do, the straggler's completion time can be really bad. This can be mitigated by a k-out-of-n setup, where you encode the problem so that only k out of n jobs need to finish to obtain the final result. One can then play with the trade-off between potentially wasted computation and expected completion time.
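To see the effect, a quick simulation sketch: the k-out-of-n completion time is just the k-th order statistic of the task times (Pareto task times and all numbers below are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    n, trials = 10, 100_000
    tasks = rng.pareto(1.5, size=(trials, n)) + 1.0   # heavy-tailed task times
    tasks.sort(axis=1)

    for k in (10, 8):   # wait for all 10 tasks vs. only the fastest 8
        t = tasks[:, k - 1]                            # k-th order statistic
        print(k, t.mean(), np.quantile(t, 0.99))

Dropping even a couple of stragglers typically collapses the 99th percentile, at the cost of the redundant encoding.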
For heavy-tailed distributions another simplification is possible: the tails of Max and + become asymptotically of the same order (for subexponential distributions, P(X+Y > t) ~ P(max(X,Y) > t) as t grows), so one can switch between convolutions and products in the tail.
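A quick Monte Carlo sketch of that equivalence (Pareto tails, illustrative only):

    import numpy as np

    rng = np.random.default_rng(1)
    x = rng.pareto(1.5, 1_000_000) + 1.0
    y = rng.pareto(1.5, 1_000_000) + 1.0

    for t in (10.0, 100.0, 1000.0):
        ratio = np.mean(x + y > t) / np.mean(np.maximum(x, y) > t)
        print(t, ratio)   # drifts toward 1 as t grows (noisy at large t)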
(*) This shows up in microservices architectures as well. The owner/maintainer of a microservice endpoint might be very happy with its tail latency. However, an end user who consumes and composes the results of many endpoints can experience really bad tail latencies for the final output.
What? I've never heard of this math. Do you mean those are literally the resulting operations in the general case, or are those approximate explanations where we need more specific conditions to make them true?
Thanks for sharing the name of that problem! I've encountered it before while optimizing batched LLM inference. The whole batch would last until all queries in the batch were done, and by changing the batch size you'd trade off per-query speed (better in a larger batch) against overall batch completion time (worse in a larger batch).
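A toy simulation of that trade-off (made-up lognormal per-query times; this only captures the straggler effect, not any hardware batching gains): batch latency is the max over the batch, while amortized time per query is that max divided by the batch size.

    import numpy as np

    rng = np.random.default_rng(2)
    for batch in (1, 8, 64):
        q = rng.lognormal(0.0, 1.0, size=(10_000, batch))
        batch_time = q.max(axis=1)   # the batch waits for its slowest query
        print(batch, batch_time.mean(), batch_time.mean() / batch)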
Nowadays I think this is solved in an entirely different way, though.
The visualizations make the concept easy to grasp.
This and complex analysis are fascinating topics in undergraduate studies.