My Blog

Euler Problem 27 – Quadratic Primes

Euler published the remarkable quadratic formula:

n² + n + 41

It turns out that the formula will produce 40 primes for the consecutive values n = 0 to 39. However, when n = 40, 402 + 40 + 41 = 40(40 + 1) + 41 is divisible by 41, and certainly when n = 41, 41² + 41 + 41 is clearly divisible by 41.

Using computers, the incredible formula n² – 79n + 1601 was discovered, which produces 80 primes for the consecutive values n = 0 to 79. The product of the coefficients, -79 and 1601, is -126479.

Considering quadratics of the form:

n² + an + b, where |a| < 1000 and |b| < 1000

Where |n| is the modulus/absolute value of n
e.g. |11| = 11 and |-4| = 4

Find the product of the coefficients, a and b, for the quadratic expression that produces the maximum number of primes for consecutive values of n, starting with n = 0.

Basically, we need to find a quadratic equation of the form n2 + an + b which produces maximum number of primes for consecutive values of n. The range of values which a can take is from -999 to 999 and that b can take goes from -1000 to 1000. After we find that perfect/longest quadratic equation which generates the primes we need to find the product of a and b.

Even after solving 26 questions, I haven’t improved as such in my thinking. I mean the first approach which comes to my mind is the donkey approach (yep that’s what I call the approach which requires no thinking). So the approach would be to check all possible combinations of a, b within themselves and check which combination can reach the maximum n generating primes. Make a note of maximum of n, a and b & update them if you find another n greater than the previous one.

I took some time to think of another optimal solution but couldn’t think of any. So I just decided to proceed with this approach for now and maybe after some time I would get any idea or a better solution. So my first raw version of the solution was :

source("/Users/ronakshah/Google Drive/Project Euler/3.Check_if_Prime.R")
get_quadratic_primes = function() {
max_n = 0
for (a in seq(-999, 999)){
  for (b in seq(-1000, 1000)) {
    n = 0
    is_number_prime = TRUE
    while(is_number_prime) {
      generate_next_number = n^2 + (a * n) + b
      print(paste(a, b, n, generate_next_number))
      if(generate_next_number > 0) {
        if(is_prime(generate_next_number)) {
          n = n + 1
       }
      else {
        is_number_prime = FALSE
        if(n > max_n) {
          max_a = a
          max_b = b
          max_n = n
          print(max_n)
        }
      }
      }
      else {
        is_number_prime = FALSE
      }
    }
  }
} 
return(list(max_a = max_a, max_b = max_b, max_primes = max_n))
}

Here, I am using the is_prime function from Problem 3. It just returns whether a particular number is prime or not. Also another interesting thing which I came to know while solving this problem was that negative numbers are never prime. I was a little confused at first as these numbers also had negative numbers, I just confirmed it with a quick search. Also I received a couple of errors while solving this one in my is_prime function. I had not handled the cases for number 1 and 2 which I have added and updated.

Although I wasn’t concerned right now about the time but just to benchmark I checked, it came out to be

# user  system elapsed
#  192.892   3.167 200.767

This was expected as there were almost 2000 X 2000 iterations with a variable iteration of n . Also for all these numbers we call the is_prime function which further increases the computation. Although this isn’t optimal but it gave me the right answer which was

$max_a
[1] -61

$max_b
[1] 971

$max_primes
[1] 71

This combination of 971 and – 61 produces maximum number of primes for consecutive values of n which is 71.

Now thinking of optimizing this problem I google searched few of the approaches and turns out there is no trick or formula to solve this problem ‘intelligently’. All you can do is reduce some of the iterations which are obvious and not needed.

Case 1:

Substituting n = 0 in the expression: n2 + an + b we get 0 + 0 + b whose answer should be prime which means b is prime. So we should filter only those b’s which are prime.

Case 2:

Substituting n = 1 in the expression : n2 + an + b, we get  1 + a + b whose answer should also be prime. Now prime numbers are always odd (except 2) so for 1 + a + b to be prime when a is even, b has to be even to keep the sum as odd. Also when a is odd, b has to be odd to keep the sum odd. By this we would reduce half amount of iterations.

Incorporating these changes, we have

get_quadratic_primes_v3 = function() {
  max_n = 0
  for (a in seq(-999, 999)){
    for (b in seq(2, 1000)) {
      n = 0
      if(is_prime(b) & identical(a %% 2, b %% 2)) {
        is_number_prime = TRUE
        while(is_number_prime) {
          generate_next_number = n^2 + (a * n) + b
          print(paste(a, b, n, generate_next_number))
          if(generate_next_number > 0) {
            if(is_prime(generate_next_number)) {
              n = n + 1
            }
            else {
              is_number_prime = FALSE
              if(n > max_n) {
                max_a = a
                max_b = b
                max_n = n
                print(max_n)
              }
            }
          }
          else {
            is_number_prime = FALSE
          }
        }
      }
    } 
  }
  return(list(max_a = max_a, max_b = max_b, max_primes = max_n))
}

#user  system elapsed 
#63.822   1.088  67.377

The time reduced drastically from 200 seconds to 67 which is a good achievement in itself.

Euler Problem 26 – Reciprocal Cycles

Reciprocal cycles

Problem 26

A unit fraction contains 1 in the numerator. The decimal representation of the unit fractions with denominators 2 to 10 are given:

1/2 = 0.5
1/3 = 0.(3)
1/4 = 0.25
1/5 = 0.2
1/6 = 0.1(6)
1/7 = 0.(142857)
1/8 = 0.125
1/9 = 0.(1)
1/10 = 0.1

Where 0.1(6) means 0.166666…, and has a 1-digit recurring cycle. It can be seen that 1/7 has a 6-digit recurring cycle.Find the value of d < 1000 for which 1/d contains the longest recurring cycle in its decimal fraction part.

This was one tough question especially because it involved decimal numbers. The level of project Euler questions is increasing significantly. I am excited about the new questions coming up in the future, every question teaches so much.

I had a slight idea about computers facing precision issues with large numbers as already seen in problem 25 but this was different. For people who aren’t aware what do I mean by precision issue here is a Wikipedia article for the same, floating point arithmetic. This is not any language specific issue but every computer faces this issue. I have read a few articles and saw some videos about it though I haven’t got the entire thing but now I have a basic idea about it.

I had some basic unsuccessful attempts. I thought of using the gmp library for this problem as well. There was a function div.bigz which is used for division of big numbers. I couldn’t get the solution with it. Maybe I was not reading the document clearly or the function does something else. I also had another thought that 1/7 would give me same numbers as 1000000000000/7. So by doing this there would not be an issue of floating point precision as well. However, this wasn’t successful with base as well as with gmp library as well. It still had those floating point issues.

Finally after googling to understand the logic I came across this post which gives a better approach. To calculate recurring cycle we are taking into account the quotient instead we can also note the remainder. Since for same quotient there would be same remainder every time. For example, for 1/7 , it has quotient as 0 and remainder as 1. We then multiply the remainder with 10 and get the new remainder by dividing it by 7. This continues till we encounter a duplicate remainder in which case the cycle would just repeat after that. Also not every number has recurring cycles, some are perfectly divisible and end up getting 0 as remainder values so we can just ignore them.

find_longest_reciprocal_cycle_under_n = function(n) {
max_length = 1
reciprocal_list =  list(max_length = 0, max_num = 0)
for( i in 2:n) {
  unit_mod = 1
  latest_mod = 1
  while(!any(duplicated(unit_mod)) & latest_mod != 0)  {
    latest_mod = (latest_mod * 10) %% i
    unit_mod = c(unit_mod, latest_mod)
  }
  latest_length = length(unit_mod) - 1
  if (latest_length > reciprocal_list$max_length) {
    reciprocal_list$max_length = latest_length
    reciprocal_list$max_num = i
  }
}
return(reciprocal_list)
}

find_longest_reciprocal_cycle_under_n(1000)
#$max_length
#[1] 982

#$max_num
#[1] 983


system.time(find_longest_reciprocal_cycle_under_n(1000))
#user  system elapsed 
#0.783   0.027   0.817 

#with additional check of latest_mod != 0
system.time(find_longest_reciprocal_cycle_under_n(1000))
#user  system elapsed 
#0.841   0.018   0.866 

Euler Problem 25 – 1000-digit Fibonacci number

1000-digit Fibonacci number

Problem 25

The Fibonacci sequence is defined by the recurrence relation:

Fn = Fn−1 + Fn−2, where F1 = 1 and F2 = 1.

Hence the first 12 terms will be:

F1 = 1
F2 = 1
F3 = 2
F4 = 3
F5 = 5
F6 = 8
F7 = 13
F8 = 21
F9 = 34
F10 = 55
F11 = 89
F12 = 144

The 12th term, F12, is the first term to contain three digits.

What is the index of the first term in the Fibonacci sequence to contain 1000 digits?

This was going to be one big problem. Finding a Fibonacci number with 1000 digits in it !!!! I have been bragging about capacity of R in handling large numbers in some of the previous examples but this was big. I was aware that R can handle numbers around 320 digits but this was 1000 digits.

I wrote a function ignoring the fact that this is such a big number. I came up with this:

factorial_for_n_digits = function(no_of_digits) {

first_number = 1
second_number = 1
next_number = first_number + second_number
current_index = 3

while(nchar(next_number) != no_of_digits) {
first_number = second_number
second_number = next_number
next_number =  first_number + second_number
current_index = current_index + 1
}
return(current_index)
}

The function is simple, I keep on generating Fibonacci numbers until it is equal to no_of_digits passed as parameter in the function. I am also keeping track of the index of the number with current_index as ultimately the question is about finding the index.

As usual, I tested my function first on a smaller example. So I did factorial_for_n_digits(3) which gave me 12 which is right because 144 is the 12th Fibonacci number and it is the first number which has 3 digits. Going further, I decided to try this function for 1000 digit. So when I did factorial_for_n_digits(1000), it was running for a long time. 2 mins, 10 mins, 15, 30 an hour, couple of hours, 3 hours………..At first, I thought its OK to take so much of time because a number with 1000 digits is a big number and it might take some time for the process to complete but when it crossed 3 hour boundary I was sure there was something wrong.

Definitely must be something with that big number thing because this function was working fine with the other example. Now back to basics, when we type any number in R console it returns the same number back. For example try type 12 it will return 12 back. Now when I type
10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

in my R console, it returns Inf as output. So that is the problem. After some time when the line

next_number =  first_number + second_number

Generates a bigger number which R cannot handle. The value of next_number becomes Inf and when we check nchar(Inf) it returns 3 so the condition is not satisfied and it again goes inside the loop. The next_number is Inf again and this cycle continues and it goes in infinite loop. I had already used gmp package last time to handle large value as this. So a minor change in the code and it worked.

library(gmp)
factorial_for_n_digits = function(no_of_digits) {

first_number = 1
second_number = 1
next_number = first_number + second_number
current_index = 3

while(nchar(as.character(next_number)) != no_of_digits) {
first_number = second_number
second_number = next_number
next_number = as.bigz(first_number) + as.bigz(second_number)
current_index = current_index + 1
}
return(current_index)
}

This took a very less time.

#system.time(factorial_for_n_digits(1000))
#user  system elapsed
#0.143   0.002   0.147

While researching for this problem, I also came across a formula to calculate nth Fibonacci number. It is called as Binet’ s formula

where n is the index of the Fibonacci number. The nth Fibonacci number.

Euler Problem 24 – Lexicographic permutations

Lexicographic permutations

Problem 24

A permutation is an ordered arrangement of objects. For example, 3124 is one possible permutation of the digits 1, 2, 3 and 4. If all of the permutations are listed numerically or alphabetically, we call it lexicographic order. The lexicographic permutations of 0, 1 and 2 are:

012   021   102   120   201   210

What is the millionth lexicographic permutation of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9?

Although this looks a bit complicated but I could find a one-liner solution (although it isn’t that efficient). The key here is to follow a lexicographic order. Here, it means as alphabetic order for numbers. So consider 0 as a , 1 as b and 2 as c. So all the possible permutations of a , b and c sorted in a lexicographic order is

     abc, acb, bac, bca, cab, cba

which is the same as explained in the example above.

We need to find all permutations for numbers 0 to 9 which has 10 numbers and will have 10! list of all possible permutations, sort them lexicographically, and select the one millionth permutation. There is practically 10! – 1 possibilities to get this wrong.

I have used the function permn here from combinat package. It gives all the possible permutations of a vector. Try

library(combinat)
permn(0:2)

[[1]]
[1] 0 1 2
[[2]]
[1] 0 2 1
[[3]]
[1] 2 0 1
[[4]]
[1] 2 1 0
[[5]]
[1] 1 2 0
[[6]]
[1] 1 0 2

However, if you noticed that although it returns all the permutations of the vctor, it doesn’t give the order in lexicographic way. The expected output would be

012   021   102   120   201   210

as shown in the example.

permn function returns a list. We paste element of list together and sort them. We had discussed this in one of the previous example I think where we had used sort on strings. Sort works on numbers stored as strings as well. Try :

 
sort(c("5", "1", "6", "9", "7"))
which gives :
#[1] "1" "5" "6" "7" "9"

So when we paste the elements together and sort them it gives us the expected output :

sort(sapply(permn(0:2), paste0, collapse =""))
#[1] "012" "021" "102" "120" "201" "210"

We just do the same for 0 to 9 digits and select the millionth permutation to get the output.

library(combinat)
sort(sapply(permn(0:9), paste0, collapse = ""))[1000000]
#[1] 2783915460

system.time(sort(sapply(permn(0:9), paste0, collapse = ""))[1000000])

#user      system elapsed
#232.427   1.715 237.966

As mentioned at the beginning this isn’t the most efficient solution and there might be some better alternatives to this. Let me know if you have any.

Euler Problem 23 – Non abundant sums

Non-abundant sums

Problem 23

A perfect number is a number for which the sum of its proper divisors is exactly equal to the number. For example, the sum of the proper divisors of 28 would be 1 + 2 + 4 + 7 + 14 = 28, which means that 28 is a perfect number.

A number n is called deficient if the sum of its proper divisors is less than n and it is called abundant if this sum exceeds n.

As 12 is the smallest abundant number, 1 + 2 + 3 + 4 + 6 = 16, the smallest number that can be written as the sum of two abundant numbers is 24. By mathematical analysis, it can be shown that all integers greater than 28123 can be written as the sum of two abundant numbers. However, this upper limit cannot be reduced any further by analysis even though it is known that the greatest number that cannot be expressed as the sum of two abundant numbers is less than this limit.

Find the sum of all the positive integers which cannot be written as the sum of two abundant numbers.

I have divided this problem into two parts. The first part is finding the list of all abundant numbers below a certain number (here 28123). Now we need to find those numbers which cannot be written as sum of two abundant numbers. So first we make pair of each of these abundant number with another abundant number, sum them, take only unique values of them and finally sort them. So these are all the possible numbers which are sum of two abundant numbers under a particular number.

For example, 3, 4, 6, 8 are the abundant numbers under 15 (which they are obviously not, these are just small numbers so that it is easy to explain with an example). So we have sums as 3 + 4 = 7, 3 + 6 = 9, 3 + 8 = 11, 4 + 6 = 10, 4 + 8 = 12, 6 + 8 = 14. So the numbers are 7,9,11,10,12,14. We take unique values because we do not want repetitive same values of sum with different numbers. So if in the above list of abundant numbers there were 2 and 10 as well, their sum would have been 12 which is already included in the list (4 + 8), to avoid such repetitions we are taking only unique values.

Now as we have the list of all possible numbers that are sum of two abundant numbers, we just find out the numbers which are not part of this list and sum them. So continuing the above example. We have list of numbers (7,9,10,11,12,14)  and we have sequence of numbers 1 to 15. So the numbers 1,2,3,4,5,6,8,13 and 15 are not the part of the list. Sum of these numbers is the answer.

source("/Users/ronakshah/Google Drive/Project Euler/12.Get_Factors.R")
get_abundant_numbers = function(n) {
  abundant_numbers = numeric()
  for(i in seq(3, n)) {
    sumOfFactors = sum(head(getFactors(i), -1))
    if(sumOfFactors > i)
      abundant_numbers = c(abundant_numbers, i)
  }
  return(abundant_numbers)
}
source("/Users/ronakshah/Google Drive/23 . Abundant Numbers.R")
not_sum_of_abundant_numbers = function(num) {
abundant_numbers = get_abundant_numbers(num)
sum_of_two_abundant_numbers = sort(unique(rowSums(expand.grid(abundant_numbers, abundant_numbers))))
sum(setdiff(seq(num), sum_of_two_abundant_numbers))
}

system.time(not_sum_of_abundant_numbers(28123))
#user  system elapsed 
#34.421   1.808  36.766 

These took around 36 seconds to execute which was expected as I did not optimize it at any point. I mean this is the worst possible solution and there is nothing that could be costlier than this.
After reading a little about abundant sums I got two important points.
1) All integers greater than 20161 are expressible as the sum of two abundant numbers in at least one way.
2) All even numbers greater than 46 can be expressed as sum of two abundant numbers in at least one way.
Implementing the above two facts in the code, I did improve my time.

source("/Users/ronakshah/Google Drive/23 . Abundant Numbers.R")
not_sum_of_abundant_numbers = function(num) {
abundant_numbers = get_abundant_numbers(num)
sum_of_two_abundant_numbers = sort(unique(rowSums(expand.grid(abundant_numbers, abundant_numbers))))
sum(setdiff(c(1:47, seq(49, num, 2)), sum_of_two_abundant_numbers))
}
system.time(not_sum_of_abundant_numbers(20161))
#user  system elapsed 
#17.730   0.815  19.029

This is one of the way to solve this and I am sure there are lots of other ways as well.

Problem 22 – Names Scores

Names scores

Problem 22

Using names.txt (right click and ‘Save Link/Target As…’), a 46K text file containing over five-thousand first names, begin by sorting it into alphabetical order. Then working out the alphabetical value for each name, multiply this value by its alphabetical position in the list to obtain a name score.

For example, when the list is sorted into alphabetical order, COLIN, which is worth 3 + 15 + 12 + 9 + 14 = 53, is the 938th name in the list. So, COLIN would obtain a score of 938 × 53 = 49714.

What is the total of all the name scores in the file?

The logic is simple in this case. You read the file, sort them, assign each letter, its corresponding number, sum the name and finally sum the entire text file with the name score.

However, I was sure there is going to be some edge case or some trick in this file and it’s not going to be as simple as it seems.

I was solving it as usual in R and was thinking of various ways to read the text file. Should I read it directly as text/string? Or matrix/data frame? It would be simple to maintain and verify results if I read it as data frame. Each row would consist of a name.

I used read.table to read the text file with comma ( , ) as  a separator as each name is separated by it. The sep as a separator between columns so this divides every name as separate column, we then take transpose of it so that we get each name as separate rows.

Now as I was highly doubtful that there might be some edge cases or tricks that maybe hidden in these names which will make the calculation go wrong, just to take precautions, I verified few things

#No repetition of names
any(table(df$names) > 1)

#Check if all the values are in uppercase
all(grepl("^[[:upper:]]+$", df$names))

Both of these cases passed.

After reading it in data frame, I converted the column as sorted vector. sort command in R handles numbers as well as letters. So we now have just sorted string vector to handle. Now we use mapply and pass two parameters, one sorted vector (x) and other its position in the vector ( y ).For every word we split it into letters, get its corresponding letter number using match, sum all the letter numbers of that word, multiply it with its position in the sorted vector. After this we get array of names scores which we sum finally.   When I entered this answer on the website, it showed as wrong which was not surprising as I often miss some or the other edge case. Now, this was actually a straight forward problem. I had already tested some cases which I thought could be tricky. I was not sure what was I missing.  I went through the entire process again and while going through this, I noticed that the column in data frame (df$names ) has length 1 greater than sortedNames. Where did 1 name go? I had only done sort(df$names) . As suspected, there was an NA in the names which I could find when I did any(is.na(df$names)) . Now if you do ,

sort(c("D", "A", "C", "B", "F", NA, "E"))
You get

[1] "A" "B" "C" "D" "E" "F
this is where one NA went.  In ?read.table function you can mention which strings should be read as NA, by default it is “NA” so when it encounters “NA” it converts it into NA and then sort ignores it. To avoid this we can specifically mention na.strings as anything other than NA. I used as 0. So if there is 0 in the file it will be treated as NA. With this “NA” is treated just as another name and it performs the correct calculation and gives the answer.

calculate_names_scores = function(file_path) {
df = data.frame(names = t(read.table(file = file_path, sep = ",", na.strings = 0)), stringsAsFactors = F)
sortedNames = sort(df$names)
total_sum = sum(mapply(function(x, y) sum(match(unlist(strsplit(x, "")), LETTERS)) * y,  
                         sortedNames, seq_along(sortedNames)))
print(total_sum)
}

system.time(calculate_names_scores("C:/Users/RonakShah/Downloads/p022_names.txt"))
#[1] 871198282

#user  system elapsed 
#1.34    0.00    1.36 

From this post, going forward I am also mentioning the time the function requires to execute. If anybody has a better/optimal solution can share.

Euler Problem 21 – Amicable numbers

Amicable numbers

Problem 21

Let d(n) be defined as the sum of proper divisors of n (numbers less than n which divide evenly into n).
If d(a) = b and d(b) = a, where a ≠ b, then a and b are an amicable pair and each of a and b are called amicable numbers.

For example, the proper divisors of 220 are 1, 2, 4, 5, 10, 11, 20, 22, 44, 55 and 110; therefore d(220) = 284. The proper divisors of 284 are 1, 2, 4, 71 and 142; so d(284) = 220.

Evaluate the sum of all the amicable numbers under 10000.

After reading the question I saw some videos and read a little about amicable numbers. Amicable numbers are known as couples who are in love 😉 . The numbers 220 and 284 are most romantic numbers.

Anyway, getting back to solve this problem. The immediate approach that came to mind was for every number, we check every other number and see if the sum of their divisors satisfy the condition. However, this solution was going to be costly as its time complexity was O(n2). After some thinking I realized that for every number we need not check every other number. I mean if a number is amicable number then its pair is going to be only one and there is no point checking it with all the other numbers. So if we have a number (say x) we get all its factors, sum them (name it as y)  and then sum the divisors of only y and check if it is equal to x. So only number which has a possibility to become amicable number is y and no need to check any other number other than that. Here both x and y are amicable numbers. We need to find all such amicable numbers and sum them.

In this problem, to get the factors of a number I am using the same function as written in problem 12.

source("C:/Users/Ronak Shah/Google Drive/Git-Project-Euler/12.Get_Factors.R")
get_sum_of_amicable_number_under_n = function(n) {
 amicable_numbers = numeric()
 for(i in 3:n) {

#Excluding the number itself, as they are sorted so excluding the last one
 divisors =  head(getFactors2(i), -1)

#The sum of divisors of first would be the possible pair of amicable numbers
 opposite_number = sum(divisors)
 opposite_divisors = head(getFactors2(opposite_number), -1)

#If the sum of factors of opposite number is same as original number and 
#opposite number is not same as original number
#and both the numbers are not previously visited
if (sum(opposite_divisors) == i & i != opposite_number & !(i %in% amicable_numbers)) {
amicable_numbers =  append(amicable_numbers, c(i, opposite_number))
}
}
#returning sum of all the amicable numbers
return(sum(amicable_numbers))
}

In this problem, they have specified to exclude the number itself from the list of the divisors. So after calling factors2 function, I am neglecting the last element with -1 which is the number itself. Now we get the possible opposite pair by summing the divisors of the first number which is i in this case. We then get factors of opposite_number and sum its divisors as well. If the sum of divisors of opposite number is equal to a then these are two amicable numbers.

Now, there were some cases which need to be handled.

Case 1: When the opposite number is same as the number itself.

For eg – 6 where factors of 6 are 1, 2, 3 whose sum is 6 again which will give us wrong output as we don’t want to include 6 as both the number and opposite number are same.

Case 2 : When the number is already included.

So we go one by one for every number and check the number and its opposite number. When 220 comes its opposite number is 284 and when they both match the condition we add them to amicable_numbers array. So after 220 we start checking again 221, 222, 223 … Now we encounter 284. Now value of i is 284 and opposite_number is 220. This pair satisfies the condition but it is already entered. We don’t want duplicate entries.

I have taken care of above two cases in the if condition in the code. I think the time complexity for this would be O(2n) which is equal to O(n).

get_sum_of_amicable_number_under_n(10000)
31626

Euler Problem 20 – Factorial digit sum

Factorial digit sum

Problem 20

n! means n × (n − 1) × … × 3 × 2 × 1

For example, 10! = 10 × 9 × … × 3 × 2 × 1 = 3628800,
and the sum of the digits in the number 10! is 3 + 6 + 2 + 8 + 8 + 0 + 0 = 27.

Find the sum of the digits in the number 100!

Well, this seemed simple reading at first but it took lot of my time to find the solution. As experienced in various examples before where R was pretty good in handling large values, I thought there would not be any problem to solve this, sadly which wasn’t the case this time. Encountered a new package in process of solving this problem. Package gmp.

Logic was simple, calculate factorial 100, split every character and then sum them. So to calculate factorial I used prod

prod(seq(100))

but this gave me incorrect result (when I checked the value of factorial 100 on internet).

I was not sure why this is the case. While searching for reasons I came across a function that I had forgotten existed already in R, factorial. factorial directly gave the value of factorial of a number. When I did factorial(100) it again gave me the same answer as above.

 After a lot of research I came to know R has issues handling large numbers. So anything after factorial(22) was giving wrong answers. After lot of googling and asking experts I came across gmp package in which there is a factorialZ function which calculates the perfect factorial value for any number. The later part was simple, just to split the big number at every character and take the sum of it.

library(gmp)
sum_of_factorial_digits = function(number) {
     factorial = factorialZ(number)
     sum(as.numeric(strsplit(as.character(factorial), "")[[1]]))
}

sum_of_factorial_digits(10)
#[1] 27

sum_of_factorial_digits(100)
#[1] 648

Euler Problem 19 – Counting Sundays

Counting Sundays

Problem 19

You are given the following information, but you may prefer to do some research for yourself.

  • 1 Jan 1900 was a Monday.
  • Thirty days has September,
    April, June and November.
    All the rest have thirty-one,
    Saving February alone,
    Which has twenty-eight, rain or shine.
    And on leap years, twenty-nine.
  • A leap year occurs on any year evenly divisible by 4, but not on a century unless it is divisible by 400.

How many Sundays fell on the first of the month during the twentieth century (1 Jan 1901 to 31 Dec 2000)?

This was a simple one where I just create a sequence of dates for 1st of every month in the duration given using seq and then get the weekday of those dates and count the number of days that were Sunday.

get_number_of_days_in_timerange = function(start_date, end_date, day) {
sum(weekdays(seq(as.Date(start_date), as.Date(end_date), by = "1 month"))
                                                   == day)
}
get_number_of_days_in_timerange("1901-01-01", "2000-12-31", "Sunday")
#[1] 171

Euler Problem 18 & 67 – Maximum path sum I, II

Maximum path sum I

Problem 18

By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.

                                    3
                                   7 4
                                  2 4 6
                                 8 5 9 3

That is, 3 + 7 + 4 + 9 = 23.

Find the maximum total from top to bottom of the triangle below:

                                   75
                                 95 64
                                17 47 82
                               18 35 87 10
                              20 04 82 47 65
                             19 01 23 75 03 34
                            88 02 77 73 07 63 67
                           99 65 04 28 06 16 70 92
                          41 41 26 56 83 40 80 70 33
                         41 48 72 33 47 32 37 16 94 29
                        53 71 44 65 25 43 91 52 97 51 14
                       70 11 33 28 77 73 17 78 39 68 17 57
                      91 71 52 38 17 14 91 43 58 50 27 29 48
                     63 66 04 68 89 53 67 30 73 16 69 87 40 31
                    04 62 98 27 23 09 70 98 73 93 38 53 60 04 23

NOTE: As there are only 16384 routes, it is possible to solve this problem by trying every route. However, Problem 67, is the same challenge with a triangle containing one-hundred rows; it cannot be solved by brute force, and requires a clever method! ;o)

So another interesting problem. Reading for the first time it made me think to take the same approach as warned in the note section to not take which is calculate the sum of for every path and then find the maximum between them. However, as mentioned in the note section of the question there is also problem 67 which has 100 rows and it will take almost infinite time to solve it if we use the above method. Also, the programs should always be scalable enough. I was sure that I want to write a code which solves both the problem at one go. There is no point solving the same problems at different time with different codes.

I was trying to think of various ways I could solve this problem. My mind was kind of blocked at the same solution and could not think of any other solution. I then gave up on finding the solution and started thinking how should I store the values in them, which is the data structure that I should use. I was thinking of graph maybe because with graphs it has the parent and child concept which could be useful here as here for every node (except the leaf node) there were two children which should be considered, so somehow I need to maintain the relationship. However, it would be a little expensive to store graphs. Besides, I was not even sure if this can be done using R. I have heard of the igraph package which can be used here but have never used it before. So maybe with igraph or go to the traditional C, C++ structures.

One way, I could think of was to start from the top, take the maximum of its children and move to next children. So for the example shown, we start from 3, take the maximum of 7 and 4 which is 7, then go to 7’s children which is 2 and 4, select 4 and then select maximum from 5 and 9 which is 9. So we now have 3 + 7 + 4 + 9 = 23 which is the answer. Although this gave us the correct answer I was pretty sure this is not the correct approach as it misses so many important numbers which are not even considered. This worked for this case because it was a small example. Just to confirm that the approach is 100 % not working I used it on the bigger 10 X 10 dataset and calculated the sum manually and checked the answer which did not come correct.

Googled the question and the first link gave me the approach of this to be solved which was more than enough. The link does not contain the answer in any language it is just an approach. Now, the part remaining was to code the above requirement and I was still left with the original question of how to store the values. I thought of storing it in matrix with n X n values where n are the number of levels/rows in the pyramid. I could fill up the empty values with NA’s. It was possible to get their respective children’s easily as the structure is very descriptive and follow a specific sequence. After thinking of the pattern which they follow, i thought why do I even need a matrix for this. If they already follow a specific pattern , they should also follow the same pattern in the vector and there is no need of NA’s to fill in.

get_max_value_from_pyramid = function(vec, noOfRows) {
  #Repalcing \n with empty spaces
  vec = gsub("\n", " ", vec)
  reverseVec = rev(unlist(strsplit(vec, " ")))
  for( i in noOfRows:2) {
    vecIndex = tail(seq(sum(seq(noOfRows, i)), sum(seq(noOfRows,i-1))), -1)
    outputList = mapply(function(value, ind) {
    reverseVec <<- replace(reverseVec, ind, as.numeric(value) + 
max(as.numeric(reverseVec[ind - i]), as.numeric(reverseVec[ind - i + 1])))
    },reverseVec[vecIndex], vecIndex)
    maxOutputList = max(as.numeric(outputList))
  }
  return(maxOutputList)
}

To explain the code, let’s start from the beginning. There are two parameters which need to be send as input to the function. One is the pyramid passed as string and other is the number of rows/levels in the pyramid. The benefit of sending the pyramid as string is that you can just copy paste the pyramid from the question itself and it will run directly. Now, when we copy-paste there are some line breaks which are encountered in the text as “\n”, so first thing is I remove those line break characters and replace it with empty characters. Then as we need to perform the bottom up approach instead of top down one, we need to reverse the entire vector. Right now, all the numbers are in one string, so separating them as different numbers using strsplit .

So if you have not read the above stakoverflow link , let me explain you the approach which we are trying to follow here. In the example shown of 4 rows, we will start from second last row, add 2 with 8 and 5 respectively and keep the one which is maximum. so here we do 2 + 8 > 2 + 5 so we keep 10. Going forward we do 4 + 5 < 4 + 9 so we keep 13 and 6 + 9 > 6 + 3, so we keep 15. Now our pyramid would look like:

                      3
                     7 4
                   10 13 15

Continuing further, we reduce it to

                      3
                    20 19

and then finally we only have one number
23

The example shown is for illustration purpose. In reality we are not deleting any element in the original array, we are just incrementally replacing values in them. So what happens actually is,

Original array –                                 3 7 4 2 4 6 8 5 9 3
Reversed array –                               3 9 5 8 6 4 2 4 7 3
After first iteration –                       3 9 5 8 10 13 15 4 7 3
After second iteration –                  3 9 5 8 10 13 15 20 19 3
After third iteration –                      3 9 5 8 10 13 15 20 19 23
So this is how we get 23 as answer.

Also, I have read mostly in R, one should not use the global assignment operator ( <<- ) as it can have lot of side effects. If we want to use the same variable again which you are changing in the function, it might get corrupted with that global assignment operator. Here I am not using the original array again in the function and moreover this works fine for my example. One another option I was thinking was of using recursion, but recursion is too expensive. It would work fine with 15 levels but it would be computationally too expensive for problem 67 with 100 levels.

vec = "3
 7 4
 2 4 6
 8 5 9 3"

get_max_value_from_pyramid(vec, 4)
#[1] 23

get_max_value_from_pyramid(vec1, 15)
#[1] 1074

get_max_value_from_pyramid(vec2, 100)
#[1] 7273