Dice

"Rolling a fair, six-sided die" often has the implication of a uniform distribution over the set \(\{1,2,3,4,5,6\}\). But which part of that statement induced the constraint on the die faces? Are some die... more fair than others?

We begin by working through some of the mathematics used to describe dice, in pursuit of analyzing a rollercoaster of a dice tournament run by my friend.



Terminology

[A fair, six-sided, standard die]
  • A fair die entails that each face has an equal probability of being rolled.
  • An \(n\)-sided die is a die with \(n\) faces.
  • A standard die hereby refers to a die whose faces are the set \(\{1,2,\ldots,n\}\), where \(n\) is the number of faces.
Taken together, a fair, six-sided, standard die is the die we are most familiar with, with faces \(\{1,2,3,4,5,6\}\). Take notice of our insistence to use the word "standard".

[A fair, six-sided die] A fair, six-sided die is often taken as nominally standard. Defining a standard die is clunky, and is often omitted. Here though, we define a fair, six-sided die as a die with six faces \(\{a_1,\ldots,a_6\}\) such that each face has an equal \(\frac{1}{6}\) probability of being rolled. There are no restrictions on the values of \(a_1,\ldots,a_6\).
Yes, our non-standard die still define a valid probability distribution. Cry about it. (You don't need to read that.)


Properties of Dice

[Expectation] The expected value of a random variable \(X\) is the weighted average of all possible values of \(X\), weighted by their respective probabilities.

Traditionally, we represent the probability distribution of \(X\) as a function \(f_X(x)\) that maps a value \(x\) to its probability \(f_X(x)\). This allows us to write the expected value of \(X\) as \[\mathbb{E}[X]=\sum_{x\in\Omega}xf_X(x),\] where \(\Omega\) is the set of all possible values of \(X\), or the sample space.

In our case, \(\Omega\) is just the faces of the die. Moreover for an \(n\)-sided fair die, \(f_X(x)=\frac{1}{n}\), as all faces are equally likely.
The expected value of a fair, six-sided, standard die is thus \[\mathbb{E}[X]=\sum_{x=1}^{6} x\cdot \frac{1}{6} = \frac{1}{6}(1+2+3+4+5+6)=3.5\]

[Order Statistics] Given random variables \(X_1,\ldots,X_n\) all independent with identical distribution, the order statistics \(X_{(1)},\ldots,X_{(n)}\) are defined by sorting the realizations (observed values) of \(X_1,\ldots,X_n\) in ascending order.

For example, if we are to roll a fair, six-sided, standard die three times, we would have three random variables \(X_1,X_2,X_3\), each representing one roll. If our outcomes are realized to be \(x_1=3,x_2=1,x_3=5\), then the order statistics are realized as \(x_{(1)}=1,x_{(2)}=3,x_{(3)}=5\).

What is interesting though, is that the order statistics of random variables are themselves random variables. We may define their distributions and compute their expected values.

[Distribution of Order Statistics] Let \(X_1,\ldots,X_n\) be independent and identically distributed random variables with common distribution \(F(x)\) and common density \(f(x)\). Find the distribution of the \(j\)-th order statistic \(X_{(j)}\).

Proof. The distribution of a random variable \(X\) is defined by its cumulative distribution function (CDF) \(F_X(x)\), the probability that \(X\) is less than or equal to \(x\). So in this case, we may compute the CDF of \(X_{(j)}\) as \(F_{X_{(j)}}(x) = P(X_{(j)}\leq x).\)

But by definition, \(X_{(j)}\leq x\) if and only if at least \(j\) of the \(X_1,\ldots,X_n\) random variables are less than or equal to \(x\). We can compute this probability by viewing it as a binomial distribution, where the probability of success is \(F(x)\) and the number of trials is \(n\). Thus, we have \[F_{X_{(j)}}(x) = P(X_{(j)}\leq x) = \sum_{k=j}^{n}\binom{n}{k} F(x)^k (1-F(x))^{n-k}.\] If you are not familiar with binomial distributions, notice that \(F(x)^k\) is the probability that \(k\) of the \(X_1,\ldots,X_n\) random variables are less than or equal to \(x\), and \((1-F(x))^{n-k}\) is the probability that the remaining \(n-k\) random variables are greater than \(x\). Then, we sum from \(k=j\) to \(n\) for all amounts of \(k\) such that at least \(j\) random variables are less than or equal to \(x\), and \(\binom{n}{k}\) counts how many ways we can choose which \(k\) random variables were less than or equal to \(x\). \(\blacksquare\)

[Density of Order Statistics] Let \(X_1,\ldots,X_n\) be independent and identically distributed random variables with common distribution \(F(x)\) and common density \(f(x)\). Find the density of the \(j\)-th order statistic \(X_{(j)}\).

Proof. The density of a random variable \(X\) is the function discussed in the previous section, \(f_X(x)\), which maps values \(x\) in the sample space to their probability \(f_X(x)\).

We can do a similar combinatorial argument here. We want to find the probability that \(X_{(j)}\) is equal to some value \(x\). This entails that exactly \(j-1\) of the \(X_1,\ldots,X_n\) random variables are less than or equal to \(x\), while \(n-j\) random variables are greater than \(x\), and with the final remaining random variable being equal to \(x\).

Thus, we have that the density of \(X_{(j)}\) is \[f_{X_{(j)}}(x) = \binom{n}{j-1, 1, n-j} F(x)^{j-1} (1-F(x))^{n-j} f(x).\] The multinomial coefficient \(\binom{n}{j-1, 1, n-j}\) is an extension of the binomial coefficient in the section before, and counts the number of ways we can divide the \(n\) random variables into three groups:
  • \(j-1\) random variables less than or equal to \(x\)
  • 1 random variable equal to \(x\)
  • \(n-j\) random variables greater than \(x\)
The interpretations of \(F(x)^{j-1}\) and \((1-F(x))^{n-j+1}\) are analogous to before, but we also have the additional \(f(x)\) term, for the probability that the final remaining random variable is equal to \(x\). \(\blacksquare\)

[The Maximum of Dice Rolls] Suppose we roll a fair, six-sided, standard dice \(n\) times. What is the expected value of the highest roll?

Proof. We can describe the \(n\) rolls as \(n\) independent and identically distributed random variables \(X_1,\ldots,X_n\), where each \(X_i\) is a random variable representing the value of the \(i\)-th roll. Then, the maximum of the \(n\) rolls is the \(n\)-th order statistic \(X_{(n)}\).

We can compute the expected value of \(X_{(n)}\) by using the density of order statistics from the previous section. \[\mathbb{E}[X_{(n)}] = \sum_{x=1}^{6} x \binom{n}{n-1, 1, n-n} F(x)^{n-1} (1-F(x))^{n-n} f(x).\] Notice that since \(j=n\), a lot of the terms in the density of order statistics cancel out, leaving us with the much simpler \[\mathbb{E}[X_{(n)}] = \sum_{x=1}^{6} x n F(x)^{n-1} f(x).\] Also, we know that \(f(x)=\frac{1}{6}\), and it is easily thought through how \(F(x)=\frac{x}{6}\). So we can substitute these in, giving the workable \[\mathbb{E}[X_{(n)}] = \sum_{x=1}^{6} x n \left(\frac{x}{6}\right)^{n-1} \frac{1}{6}.\] Now, we can just rearrange the terms to get a simple result: \[\mathbb{E}[X_{(n)}] = \frac{n}{6^n} \sum_{x=1}^{6} x^{n}.\quad \blacksquare\]


Beginning our Descent into Madness

[Introduction to the Tournament] My friend's tournament was proposed as follows:
  • You can put any positive rational numbers on each of the 6 sides of the dice, as long as all the numbers add up to Exactly 21.
  • You may not put 3.5 on every face (to prevent a potential situation where there cannot be a winner).
"I will create a randomly seeded single elimination bracket tournament (depending on number of participants, this format might change) with all participants, and we'll see who is the luckiest along the way! The basic gameplay is that I roll each of the dice and we see who gets a higher number to see who moves on. There might be some undisclosed twists and turns along the way as we get further into the bracket! (However I will never reverse the core gameplay mechanic of higher number is better).
You may submit up to 3 dice, which may be the same or different, it's up to you! This is to prevent someone from being very unlucky and eliminated too early."

Exactly 21 is, of course, a reference to the sum of the numbers on the faces of a fair, six-sided, standard die. Is merely sharing the same sum enough to make most dice similar? How much does the game-theoretic component of the tournament matter?
I was honestly fascinated, and decided to submit my dice.

[My Submission] So, I only learned about this tournament the night before the deadline, and wasn't exactly in a place to bash out calculations. I decided to just submit off my intuition, and hope for the best. My submission was as follows:
  • \(\{4/3, 7/3, 13/3, 13/3, 13/3, 13/3\}\)
  • \(\{1/2, 3/2, 7/2, 7/2, 9/2, 15/2\}\)
  • \(\{1/5, 1, 16/5, 16/5, 26/5, 41/5\}\)
My thought process was to mostly make balanced dice, maximizing my likelihood to beat unbalanced dice that often had a high number on one face, but low numbers on the rest. In hindsight, I should have thought less about the dice themselves, and more about the tournament format... But more on this later.

[Revisiting Maximum of Dice Rolls] Let's revisit the maximum of \(n\) dice rolls, but for non-standard die. The only thing that changes are the faces and distribution \(F(x)\), which is no longer \(\frac{x}{6}\). Instead, we must compute the distribution as \[F(x) = P(X\leq x) = \sum_{a_i\leq x} \frac{a_i}{6},\] where \(a_i\) is the value on the \(i\)-th face of the die.

If we substitute this into the expected value of the maximum of \(n\) dice rolls, we get \[\mathbb{E}[X_{(n)}] = \frac{n}{6^n} \sum_{i=1}^{6} a_i \Big(\sum_{a_i\leq x} a_i\Big)^{n-1}.\quad \blacksquare\]

[Rolls] In the case of a single roll, we have \(n=1\), and this reduces to \[\mathbb{E}[X_{(1)}] = \frac{1}{6} \sum_{i=1}^{6} a_i = \mathbb{E}[X],\] which makes sense. The expected value of the maximum of one roll, is just the expected value of one roll.
Moreover, we have the constraint that the sum of the faces must be 21, so \[\mathbb{E}[X] = \frac{1}{6} \sum_{i=1}^{6} a_i = \frac{21}{6}.\quad \blacksquare\]
But we don't care about single rolls in this tournament. Even in the simplest situation, we have 2 rolls, one for each player.

Unfortunately though, the two dice do not have a common distribution. We could define the faces of the dice as \(a_i\) and \(b_i\) and proceed with computation, but this would be a mess. Things are getting out of hand fast, and we need a more efficient representation.

[Rolloff] The steps we need are:
  • Find a robust way to look at the maximum of \(n\) rolls fr arbitrary faces
  • Find a robust way to compare two dice
  • Find a robust way to consider the sum of dice
Notes: Since all dice are rational, we can reduce the problem to integers by multiplying by the least common denominator. Thank god.

[Generating Functions]
"A generating function is a clothesline on which we hang up a sequence of numbers for display."
— Herbert Wilf, Generatingfunctionology (1994)