Sums of powers of naturals

Given the natural numbers, we can do basic arithmetic. For example, we can add up all the natural numbers less than n to get

sum(n): ≡ sum(: i ←i :n); ≡ 0 +1 +… +n−1
add that to a second copy of itself, but with the terms in reverse order, and divide by two; this may seem complicated, but it clearly gets us the same answer:; = ((0 +1 +… +n−1) +(n−1 +… +1 +0))/2
now interleave the first half with the second:; = ((0 +n−1) +(1 +n−2) +… +(n−1 +0))/2
but now each pair of terms adds one drawn from the increasing sequence 0, 1, … to one drawn from the decreasing sequence, n−1, n−2, …, hence each pair adds up simply to n−1; and there are n such pairs, so the total comes to:; = n.(n−1)/2

and, since either n is even (so n/2 is whole) or n is odd (so (n−1)/2 is whole), this always is a whole number, despite dividing by 2. We can confirm this result inductively. First notice that sum(0) = sum({}) has no values to add together, so it's 0, which is indeed 0.(0−1)/2, in so far as we can make any sense of 0−1 (which isn't a natural thing to do to the natural numbers – but we can do the arithmetic in the integers, which subsume the naturals); in case that leaves anyone feeling nervous, we can separately look at sum(1) = sum({0}) = 0 because that's the single term being summed; and 1.(1−1)/2 is indeed 0. Now suppose we have some n (e.g. 0 or 1) for which sum(n) is n.(n−1)/2; then sum(1+n) = n+sum(n) = n +n.(n−1)/2 = n.(2 +n−1)/2 = n.(n+1)/2 = (n+1).((n+1)−1)/2, so our formula works for n+1 whenever it works for n; but, given that it works for 0 (and 1) this implies, inductively, that it works for any natural n.

Chosen Sums

Well, that was nice and easy; so how about summing the squares of the naturals less than n, for given n ? Well, I can't think of a trick like the duplicate in reverse, add pairwise and halve trick that got us the right formula for the sum of the naturals themselves, so let's see if we can come up with an inductive approach. (For reference, there is a beautiful way to prove it geometrically, exhibited by this video.) In fact, it turns out there's a slightly more complex-looking problem that, with a little ingenuity, is easy to solve, and from which we can infer the sum of squares; best of all, it generalizes nicely to give us the sum of any natural power. (As before for 0−1, we'll have some negative integers show up; you can chose to do arithmetic in the integers to resolve these, or you can note that they always arise in terms that have at least one factor of 0, meaning that we could have shuffled the sets over which we're summing to simply omit the terms that have a 0 factor, hence also all the terms that involve an unnatural term in them.) So, instead of sum(: i.i ←i :n), let's look at:

sum(: i.(i−1) ←i :n): = sum(: i.(i−1).(i+1 − (i−2))/3 ←i :n)
because the extra factor is just a long-winded way of writing 3/3; but we can now re-arrange this to:; = sum(: (i+1).i.(i−1) −i.(i−1).(i−2) ←i :n)/3; = (sum(: (i+1).i.(i−1) ←i :n) −sum(: j.(j−1).(j−2) ←j :n))/3
Now, the j = 0 term in the second sum is zero, so we can discard it; thus each j for which the second sum contributes a non-zero value is 1+i for some i – and look what happens if I write j = 1+i:; = (sum(: (i+1).i.(i−1) ←i :n) −sum(: (i+i).i.(i−1) ←1+i :n))/3
the two terms are summing the same values ! Of course, the second is summing over a different set of values for i, but we can at least cancel all the terms from values of i that show up for both sums; that is, every i in n for which 1+i is also in n; which is, in fact, every i in n except i = n−1, so this difference of sums just reduces to:; = n.(n−1).(n−2)/3

and we can get sum(: i.i ←i :n) as sum(: i.(i−1) +i ←i :n) = n.(n−1).(n−2)/3 +n.(n−1)/2 = n.(n−1).(2.n−1)/6. Note that, once again, we divide n.(n−1).(n−2) by 3 but necessarily get a whole number because n, n−1 and n−2 are three consecutive integers, so one of them is a multiple of three.

Well, I said we could generalize that to get the sum of any power; you might have guessed already that I'll do it by actually summing products of successive integers, instead of powers; once we know the sums for powers up to k, we can compute the sum of products of k+1 terms, i.(i−1)…(i−k), and deduce the sum of (k+1)-th powers from this, using our earlier knowledge of the sums of powers up to k. Now, because we're dealing with products of consecutive naturals, it actually simplifies matters to introduce a function (the number of subsets having m members in a set with n members; but I'll here use the transpose of the usual chose(n, m) = C(m, n)) related to Pascal's triangle:

C = (: ({naturals}: n!/(n−m)!/m! ←n |{naturals}) ←m |{naturals})

which lets us re-write the previous result as sum(: C(2) :n) = C(3, n) and (since C(1) is simply the identity on {naturals}) the initial sum(n) result as sum(: C(1) :n) = C(2, n); indeed I'll now show that, in general, sum(: C(k) :n) = C(1+k, n). Crucially, C(k+1, i+1) is the number of ways of chosing k+1 distinct members from a set with i+1 members; if we pick one of the i+1 members of the set and ask whether it's chosen, we can split this into C(k, i) ways of chosing the other k members of the set, when our picked member is one of the chosen, plus C(k+1, i) ways of chosing all k+1 when our picked member isn't one of the k+1 chosen. Thus C(k+1, i+1) = C(k, i) +C(k+1, i), from which we can infer C(k, i) = C(k+1, i+1) −C(k+1, i). Summing this:

sum(: C(k) :n): = sum(: C(k+1, i+1) − C(k+1, i) ←i :n); = sum(: C(k+1, i+1) ←i :n) − sum(: C(k+1, i) ←i :n)
and, as before, C(k+1, 0) is 0, so we can ignore the i=0 term in the second sum and replace it with sum(: C(k+1, j+1) ←j+1 :n), which cancels every term in the first sum that has i+1 in n, leaving only the i = n−1 term:; = C(k+1, n)

This formula turns out useful in a whole mess of different places. To take one example, suppose you have sets of size k that are subsets of some natural n, with 0 < k ≤ n; you can ask what's the largest member of each such subset; so you can ask what's the average of the largest, over all such subsets. For the largest to be m in n, we must have picked m as one member of our subset and k−1 members of m as the others; there are C(k−1, m) ways to do that. When we sum this over all possible values of m in n, we'll get the C(k, n) ways to pick k members of n. The average of the largest value is then simply sum(: m.C(k−1, m) ←m :n)/C(k, n). Now, (1+m).C(k−1, m) = (1+m).m!/(m+1−k)!/(k−1)! = C(k, 1+m).k, so our average can be written (averaging one more than the largest element and subtracting one at the end):

sum(: (1+m).C(k−1, m) ←m :n)/C(k, n) −1: = k.sum(: C(k, 1+m) ←m :n)/C(k, n) −1
in which the sum has C(k, i) for m+1 = i from 1 to n, omitting i = 0; but C(k, 0) is zero unless k = 0, when the fact that we multiply the sum by k makes it irrelevant; so we can extend the sum to i from 0 to n, i.e. i in 1+n:; = k.sum(: C(k) :n+1)/C(k, n) −1; = k.C(k+1, n+1)/C(k, n) −1; = k.(n+1)/(k+1) −1; = (k.n −1)/(k+1)

Likewise, whem m is the smallest member of our set of k members of n, the other k−1 members of the set are in n but > m; and there are n−1−m candidates for this, so we have C(k−1, n−m−1) ways to realise this, whose sum over m in n is again C(k, n), so the average of the smallest is sum(: m.C(k−1, n−m−1) ←m :n)/C(k, n) and the substitution j = n−1−m turns this into n−1 −sum(: j.C(k−1, j) ←j :)/C(k, n) which is just n−1 minus the average of the highest member (as should, in fact, be no surprise), which makes the average of the lowest member n−1 −(k.n −1)/(k+1) = (n−k)/(k+1) = (n+1)/(k+1) −1.

Summing Powers

As promised, we can also use our formula for summing C to obtain the sums of powers of naturals:

sum(: i ←i :n): = sum(n) = sum(:C(1):n) = C(2, n) = n.(n−1)/2
sum(: i.i ←i :n): = sum(: 2.C(2) +C(1) :n) = 2.C(3, n) +C(2, n); = n.(n−1).((n−2)/3 + 1/2); = n.(n−1).(2.n−1)/6 = sum(n).(2.n −1)/3
sum(: power(3) :n) = sum(: i.i.i ←i :n): = sum(: 6.C(3) :n) +3.sum(: i.i ←i :n) −2.sum(n); = 6.C(4, n) +n.(n−1).(2.n−1)/2 −n.(n−1); = (n.(n−1)/2).((n−2).(n−3)/2 +2.n−1 −2); = n.n.(n−1).(n−1)/4 = sum(n)²

which gets rather fiddly to work out, so naturally I wrote some code to do the job for me (the .PowerSum(n) constructor of study.maths.polynomial.Polynomial); here are the next few (asuming I've finally fixed all the bugs in the code, of course):

sum(: power(4) :n): = n.(n−1).(2.n−1).(3.n.n −3.n −1)/30; = sum(: power(2) :n).(3.n.n −3.n −1)/5
sum(: power(5) :n): = n.n.(n−1).(n−1).(2.n.n −2.n −1)/12; = sum(: power(3) :n).(2.n.n −2.n −1)/3
sum(: power(6) :n): = sum(: power(2) :n).(3.n.n.n.n −6.n.n.n +3.n +1)/7; = 3.sum(: power(2) :n).(n.n −n.(1 −√(2.√(31/3) +6)/2) +(√(31/3) +1)/4 −√(2.√(31/3) +6)/4).(n.n −n.(1 +√(2.√(31/3) +6)/2) +(√(31/3) +1)/4 +√(2.√(31/3) +6)/4)/7
sum(: power(7) :n): = sum(: power(3) :n).(3.n.n.n.n −6.n.n.n −n.n +4.n +2)/6
sum(: power(8) :n): = sum(: power(2) :n).(5.n.n.n.(n.n.n −3.n.n +n +3) −n.n −9.n −3)/15

Using these, we can of course also sum any polynomial over any arithmetic sequence (that is, any sequence in which the difference between successive entries is constant). Note that (although the way I've displayed the sum polynomials above doesn't make it obvious) the leading order term in sum(: power(k) :n) is power(k+1, n)/(k+1); this can fairly easilly be derived, independently, by observing that the difference between its values at n and 1+n is just power(n, k), the single term added to the sum.

Statistics for fair dice

We can use the results above to obtain the mean and variance for uniform discrete random variates with values ranging from 1 to n (i.e. fair n-sided die-rolls), as functions of natural n. The mean is sum(1+n)/n = (1+n)/2 and the expected value of the square is sum(: i.i ←i :1+n)/n = (n+1).(2.n+1)/6, giving a variance of

(n+1).(2.n+1)/6 −(1+n).(1+n)/4: = (4.n +2 −3 −3.n).(n+1)/12; = (n−1).(n+1)/12; = (n.n −1) / 12

yielding, for the dice commonly used in rôle-playing games:

die	d4	d6	d8	d10	d12	d20
mean	2.5	3.5	4.5	5.5	6.5	10.5
variance	5/4	35/12	21/4	33/4	143/12	133/4
standard deviation	1.12	1.71	2.29	2.87	3.45	5.77

Note that, when you add several die rolls, the sum's mean and variance are the sums of the means and variances of the individual rolls added; the standard deviation of the sum is then the square root of its variance and the typical spread is one or two standard deviations either side of the mean. Thus 3d6 has mean 3×3.5 = 10.5, variance 3×35/12 = 35/4 = 8.75 and standard deviation 2.96, so nearly always falls in the range from 5 to 16 (for 3 and 18 the probabilities are 1/216; for 4 and 17 they are 1/72; for a total of 8/216 = 1/27; so nearly always, here, is over 96% of the time) and usually falls in the range from 8 to 13. For 3d4+3, we get mean 10.5 and variance 3.75, hence standard deviation 1.94, so results are usually in the range 9 through 12 and nearly always in the range 7 through 14 (with probability 1/64 for each of 6 and 15, so nearly always is almost 97% in this case).

Sums of inverted powers

Once we have rationals, we can consider sum(: power(−n) :N) for assorted naturals n and N; in some cases, we can even allow N = {naturals}.

One related sum is 1 −1/3 +1/5 −1/7 … or sum(: power(i, −1)/(2.i +1) ←i |N) for various N. (Fun fact: for large N, this sum gets close to π/4.)

Written by Eddy.