**Combinatoric selections**

**Problem 53**

There are exactly ten ways of selecting three from five, 12345:

123, 124, 125, 134, 135, 145, 234, 235, 245, and 345

In combinatorics, we use the notation, ^{5}C_{3} = 10.

In general,

^{n}C_{r} =*n*! / !(*n−r*)!,

where *r* ≤ *n*, *n*! = *n*×(*n*−1)×…×3×2×1, and 0! = 1.

It is not until *n* = 23, that a value exceeds one-million: ^{23}C_{10} = 1144066.

How many, not necessarily distinct, values of ^{n}C_{r}, for 1 ≤ *n* ≤ 100, are greater than one-million?

Somebody just asked me “Is there any Euler solution that you are particularly proud of?”. I am sure I have written something nice in the 50+ problems that I have solved till now however, I could not recollect any solution at that moment that I could be proud of. Anyway, this is one of the solution that I am particularly proud of. Although, computationally I was not able to made it super-fast but at least I changed it to something which took infinite amount of time to something which is acceptable time.

The base R `combn`

function gives us all the combinations taken m at a time. So

combn(5, 3) gives
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 1 1 1 1 1 1 2 2 2 3
[2,] 2 2 2 3 3 4 3 3 4 4
[3,] 3 4 5 4 5 5 4 5 5 5

This is all the possible combinations taking m (3) at a time. Here, we do not need all the combinations but just the count of number of combinations which we can get by `ncol(combn(5, 3))`

which returns 10.

Now the brute force approach is to write a two for loop and calculate number of elements for which `ncol(combn(i, j)) > million`

. Just for comparison I would have even included this example but it becomes computationally very expensive as it progresses. Moreover, it cannot even count for large numbers. For eg :- `ncol(combn(30, 17))`

gives an error : `Cannot allocate vector of size 7.6 gb`

. This was just 30 and we need to calculate values till 100. I definitely needed some better approach.

I then started to find out some patterns to calculate these combinations. So If we check values for all the values of 10 i.e combn(10, 1), combn(10, 2)….

sapply(seq(10), function(x) ncol(combn(10, x)))
#[1] 10 45 120 210 252 210 120 45 10 1

So, if you observe the values follow a circular path, they start on some minimum value, keeps on increasing till it reaches maximum value and then values started decreasing again ending at 1. For 10, value[5] is max value, value[4] = value[6], value[3] = value[7] and so on.

For 12 –

sapply(seq(12), function(x) ncol(combn(12, x)))
#[1] 12 66 220 495 792 924 792 495 220 66 12 1

Similarly, for 12, value[6] is maximum, value[5] = value[7] and so on. So you see the pattern, right? This is all for even values, let’s check for some odd values.

sapply(seq(11), function(x) ncol(combn(11, x)))
#[1] 11 55 165 330 462 462 330 165 55 11 1

For 11, we get max value at two positions, value[5] and value[6], then value[4] = value[7], value[3] = value[8] and so on.

sapply(seq(13), function(x) ncol(combn(13, x)))
#[1] 13 78 286 715 1287 1716 1716 1287 715 286 78 13 1

Similarly, for 13 we get max high value at value[6] and value[7] and later it keeps on forming pair across different indexes as shown in previous cases.

In the question, we need to find number of values which take value more than a million. Now based on the above observation can we devise some formula which will calculate the number of values which can be above some threshold instead of calculating `ncol(combn(x, y))`

for each number. For simplicity, in the above examples I assumed that the threshold is 100 and tried finding number of values which are greater than the threshold.

Let x be the first index which exceeds value of the threshold. Now , find how far this value is from the middle value because as the numbers are equally distributed, the number of values greater than threshold at left of mid is going to be equal to the value right of mid.

For 10, x = 3, mid = 5, end = 7

For 12, x = 3, mid = 6, end = 9

So for even numbers, number of values greater than 100 are 5 and 7 respectively. We can make a general formula for even numbers by `2 * (mid – x) + 1`

For 10, 2 * (5-3) + 1 = 5 and

For 12, 2 * (6 – 3) + 1 = 7.

For 11, x = 3, mid = floor(11/2) = 5, end = 8

For 13, x = 3, mid = floor(13/2) = 6, end = 10

For odd numbers, number of values greater than 100 are 6 and 8 respectively. We can make a general formula for odd number by `2 * (mid – x) + 2`

For 11, 2 * (5-3) + 2 = 6

For 13, 2 * (6-3) + 2 = 8.

I tried and studied so many examples of both odd and even numbers before coming up with this formula. The 2 of them shown here are just for illustration purpose only. So for each combination, we need to find the index of that one number which turns out the output of `ncol(combn..`

to be over a million and then we can calculate the number of total values over threshold directly without calculating any other values. This was a cool trick exploiting the property of combinatorics.

get_combinatoric_selection <- function(max_number, limit) {
counter = 0
for(i in seq(23, max_number)) {
for(j in seq(i)) {
if(ncol(combn(i, j)) > limit) {
mid = floor(i/2)
if (i %% 2 == 0) {
counter = counter + (2*(mid - j) + 1)
}
else {
counter = counter + (2*(mid - j) + 2)
}
break
}
}
}
}

Made it as a separate function which accepts the `max_number`

(upper limit) and the `limit`

(threshold) value.

system.time(get_combinatoric_selection(100, 1000000))
#user system elapsed
#303.96 2.14 306.78
print(counter)
#[1] 4075