I've got a server in which about 100 times per day a specific event happens. We now wanted to number those events using the unix time in milliseconds (the number of milliseconds that have passed since 1970-01-01). A colleague of mine said that would cause overlaps if events happen in the same millisecond, but I thought the chance of overlapping would not be very high. We can't figure out how to calculate the chance of overlap though. This is our reasoning so far:
- every 24 hours has
24 hours * 60 minutes * 60 seconds * 1000 milliseconds= 86,400,000 possible numbers. - if one event has happened the chance of the second one overlapping is
1/86,400,000. - the chance of the third event overlapping is
2/86,400,000.
And here is where we're stuck; I don't think we can simply add the first chance 1/86,400,000 to the second chance of 2/86,400,000, because that would mean repeating this pattern could lead to a chance of more than 100% with less than 86,400,000 events per day. For example. Let's say that the event has happend 43,200,000 times already (half the number of milliseconds) and we still have no overlap. Another one happening would have a chance of 50%. But that is also false, because if it already happened so many times, and the still haven't overlapped, they are all in the past and only the last one could have happened in the same millisecond.
As you can see we're math's noobs and we're totally stuck. Could anybody help us out calculating the chance of two events happening in the same millisecond if we have 100 randomly timed events per day?
And further; what is the chance of two events happening in the same millisecond calculated over a full year?
$\endgroup$3 Answers
$\begingroup$This is essentially the classical Birthday Problem, which asks for a number $n$ of people with uniformly distributed birthdays the probability that at least two share a birthday (in an annual calendar of $d$ days). In this problem, the people are replaced by events ($n_0 = 100$) and the days are replaced by the millisecond intervals ($d_0 = 8.64 \cdot 10^7$).
The probability of an overlapping event (common birthday) on any given day is $$P(n, d) = \frac{d!}{d^n (n - d)!} ,$$ and in the regime $n \ll d$ (which certainly applies for our $n_0$ and $d_0$) this quantity is well-approximated by$$P(n, d) \approx 1 - \exp\left(\frac{n^2}{2 d}\right) .$$
Substituting our particular values, the probability is $$P(n_0, d_0) \approx 5.79 \cdot 10^{-5} .$$
Since this is probability is very small, the probability of an overlap occuring in a given year of $365$ days is $$1 - (1 - P(n_0, d_0))^{365} \approx 365 P(n_0, d_0) \approx 2.11\% .$$
(Instead of using the above approximations, one can use a suitable CAS to compute exact values for these figures, which to three significant figures respectively work out to be $5.73 \cdot 10^{-5}$ and $2.07\%$.)
$\endgroup$ 7 $\begingroup$The most basic "reasonable" model for waiting times between events is the memoryless model with a constant rate. Here the waiting times between events are independent exponential random variables with a fixed rate parameter say $\lambda$, which is the reciprocal of the average time between events. In this case the number of events in a given time interval of length $t$ is Poisson distributed with mean $\lambda t$. So in your case, in units of events per millisecond,
$$\lambda=\frac{100}{24 \cdot 3600 \cdot 1000} \approx 1.2 \cdot 10^{-6}.$$ The probability that you will have one or zero events in a single time interval of length $t$ is then
$$e^{-\lambda t} \left ( 1 + \lambda t \right ).$$
The probability that this happens for $n$ non-overlapping intervals all of length $t$ is
$$e^{-n\lambda t} \left ( 1 + \lambda t \right )^n.$$
So for instance you can plug in $\lambda$ as above (you should ideally not round it),$t=1$, and $n=86,400,000$ to get the probability of no collisions in a full day. It turns out to be about $1-5.8\cdot 10^{-5}$. On the other hand the probability of no collisions in a full year is only about $0.979$.
Note that this is not the only model out there by any means. Also note that this model predicts significant fluctuations in the number of events in a given day. The maximum typical fluctuation will be about 30 (3 standard deviations). This might be undesirable for your purpose. The memoryless model has only one free parameter, so there is no way to tune an additional "variance" parameter to tighten up the distribution. But models with memory can have such additional parameters.
$\endgroup$ $\begingroup$The probability that one or more of the daily events overlaps with another is $100\% - \operatorname{P}(\text{0 events overlap})$. We will use the rule of product to help us count. Let $p=\left.1\middle/\left(8.64\times10^7\right)\right.$.
- When the first event occurs does not matter.
- The probability that the second event does not overlap with the first is $1-p$.
- The probability that the third event does not overlap with the second is $1-2p$. We subtract $2p$, because the third event has $2$ time slots it must avoid.
This pattern continues all the way up to $1-99p$. To get the probability of all of these happening—that is, that none of the events overlap—we have to multiply them. You would enter this into a graphing calculator as follows:
$$\prod_{k=1}^{99}\left( 1-kp \right)$$
When rounding, this comes out to $$\approx 0.999\,427\,099\,52$$ which means the the probability that at least one event overlaps with another is $$\approx0.000\,057\,290\,047\,521\,8$$
Say you wanted to extend this to a time period consisting of $m$ milliseconds during which $e$ of these events occur. The general formula for the probability that none of them overlap is
$$1-\prod_{k=1}^{e-1}\left( 1-\frac{k}{m} \right)$$
$\endgroup$ 2