My Blog

Euler Problem 46 – Goldbach’s other conjecture

Goldbach’s other conjecture

Problem 46

It was proposed by Christian Goldbach that every odd composite number can be written as the sum of a prime and twice a square.

9 = 7 + 2×12
15 = 7 + 2×22
21 = 3 + 2×32
25 = 7 + 2×32
27 = 19 + 2×22
33 = 31 + 2×12

It turns out that the conjecture was false.

What is the smallest odd composite that cannot be written as the sum of a prime and twice a square?

Couldn’t think how I would program this at first. So I started solving this manually by hand on paper. Tried to find the solution for 35, then 39 and so on. While doing this I observed myself of how I am solving the problem. After doing it for 3-4 numbers I was able to find a pattern. So now, I had an idea to write a program for it.

Every odd composite number can be represented in the form :

Odd_Composite_Number = Prime_Number + (2 X (Some_Number)^2)

So basically, the logic is we start our numbers from 35, since till 33 we already know that it is not the answer as given in the question. We first need to check if a number is a composite number. How do we check that? We already have a function which checks if a number is prime. If the number is prime then we just increase the count by 2 and move on to the next number. Now if a number is odd composite then we get all the primes under that number as the prime number component can be any number lesser than the current number. Now Some_Number component can be any number from 1. We just check that the complete sum doesn’t cross the current number we are checking. If it does, then we stop execution and move on to the next iteration.

get_answer_for_problem46_v2 <- function() {
  start_num <- 33
  flag = TRUE
  while(flag) {
    start_num = start_num + 2
    if(is_prime(start_num))
      next
    else {
      flag = FALSE
      primes <- get_primes_under_n(start_num - 2)  
      square_number = 1       
      for (i in primes) {         
        for (square_number in seq(start_num)) {    
          current_sum = i + (2 * (square_number^2))         
           if (current_sum > start_num)
            break
           else if (current_sum == start_num) {
            cat(start_num, "=", i, "+ 2 X", square_number, "^2\n")
            flag = TRUE
            break
          }
        }
        if(flag)
          break
      }
      if(!flag)
        cat("The answer is : ", start_num, "\n")
    }
  }
}

The outer while loop runs until you find an odd composite number which cannot be represented in the format given in the question i.e till the time you find an answer. If the number is a prime number, then we skip that iteration from there itself. If it is not prime, then we get all the primes under that number and run two for loops from there. First loop iterates over each prime number and the other loop for each number from 1 till the sum doesn’t exceed the current number. Certain numbers can be represented in multiple ways. For example, number 69…

69 = 19 + 2 X 5 ^2
69 = 37 + 2 X 4 ^2
69 = 61 + 2 X 2 ^2
69 = 67 + 2 X 1 ^2

We do not need to know all the possible ways, it can represent a number. Even if it can represent in one way then that is enough for us to move ahead with next statement. Added some break statements so that we can stop the loop as soon as it encounters even one representation. I received a huge performance improvement when I did that.

system.time(get_answer_for_problem46_v1())

#  user  system elapsed 
# 193.71    1.00  197.95 

system.time(get_answer_for_problem46_v2())

#user  system elapsed 
#75.73    0.27   77.06 

Also the code which I have written is a bit complicated and mind you it wasn’t written in one go. Everything was folded step by step to reach till here and print the final result.

Advertisements

Euler Problem 45 – Triangular, pentagonal, and hexagonal

Triangular, pentagonal, and hexagonal

Problem 45

Triangle, pentagonal, and hexagonal numbers are generated by the following formula:

Triangle Tn = n(n+1)/2 1, 3, 6, 10, 15, …
Pentagonal Pn = n(3n−1)/2 1, 5, 12, 22, 35, …
Hexagonal Hn = n(2n−1) 1, 6, 15, 28, 45, …

It can be verified that T285 = P165 = H143 = 40755.

Find the next triangle number that is also pentagonal and hexagonal.

This looked like a straightforward problem but as usual I was trapped thinking that there is a smart solution to this problem. However, after reading about the problem a bit now after completing the problem I don’t think there is any smart solution at least in the direction I was thinking. This time though I had a valid reason to think there exists a smart solution. When I checked the LCM of 143,165 and 285 it turned out to be 40755. Now this was something. There should be some connection between them. I then thought the next number would be 40755 * 2 would be the answer but which was not true. I didn’t spend too much of time thinking about it and continued with the brute force approach.

Anyways, the brute force approach was to keep on generating triangular, pentagonal and hexagonal numbers and find the ones which are common in all the three. I had already written function to generate pentagonal numbers in the last problem, I also wrote the same for hexagonal and triangular.

generate_nth_triangular_number <- function(n) {
   n * (n  + 1)/2
}
generate_nth_hexagonal_number <- function(n) {
    n * (2*n  - 1)
}

Now I just choose a random n number and see if I get a number greater than 40755 which is hexagonal, pentagonal as well as triangular. I chose n 1000 first and then slowly slowly kept on increasing n until I got a number which satisfied all the three conditions. Finally I was able to find the number with n = 60000. It was 1533776805.

Reduce(intersect, list(
       sapply(seq(60000), generate_nth_triangular_number),
       sapply(seq(60000), generate_nth_pentagonal_number),
       sapply(seq(60000), generate_nth_hexagonal_number)))

#[1]          1      40755 1533776805

This looks like a very compute heavy function as we are generating triangular, pentagonal  and hexagonal numbers for 60000 numbers and then finding the common elements from them but surprisingly it hardly took any time.

system.time(Reduce(intersect,list(
     sapply(seq(60000),generate_nth_triangular_number),
     sapply(seq(60000), generate_nth_pentagonal_number),
     sapply(seq(60000), generate_nth_hexagonal_number))))

#user  system elapsed
#0.31    0.00    0.31

This was good, but I did some improvements to this. I came to know that every hexagonal number is a triangular number so no need to check for triangular number when I am already looking for hexagonal numbers.

get_common_hpt <- function(n) {
  sixtykseq <- seq(n)
  Reduce(intersect, list(
         sapply(sixtykseq, generate_nth_hexagonal_number),
         sapply(sixtykseq, generate_nth_pentagonal_number)))
}

get_common_hpt(60000)
#[1]          1      40755 1533776805

This improved the performance by little.

system.time(get_common_hpt(60000))
#user  system elapsed
#0.20    0.00    0.21

Euler Problem 44 – Pentagon numbers

Pentagon numbers

Problem 44

Pentagonal numbers are generated by the formula, Pn=n(3n−1)/2. The first ten pentagonal numbers are:

1, 5, 12, 22, 35, 51, 70, 92, 117, 145, …

It can be seen that P4 + P7 = 22 + 70 = 92 = P8. However, their difference, 70 − 22 = 48, is not pentagonal.

Find the pair of pentagonal numbers, Pj and Pk, for which their sum and difference are pentagonal and D = |Pk − Pj| is minimized; what is the value of D?

I don’t know why but looking at such formula based problems I always think there is some formula to solve it but somehow with Project Euler this has never been true. After reading the question I started reading about pentagonal numbers, I got some important information but nothing which could be helpful to solve the problem here. I don’t think I have mentioned this before but OEIS is an excellent platform to get details of almost all the sequences. Till now I haven’t found a sequence which is not present in OEIS. Pentagonal Sequence number is A000326 . Besides, there is always google and Wikipedia to the rescue.

I created two separate functions. One to generate a pentagonal number using the formula in the question.

generate_nth_pentagonal_number <- function(n) {
   n * (3*n - 1)/2
}

The other one to check if the given number is a pentagonal number. There seems to be a formula to check (taken from Wikipedia).

is_pentagonal <- function(n) {
     ((sqrt(24 * n + 1) + 1) %% 6) == 0
}

Let us first write a brute force approach and see if we can get the answer. Now the approach is we start with 2 and generate a pentagonal number with n = 2, then for every other number less than n we generate the pentagonal numbers and check if their sum and difference are pentagonal as well. If they are, then we stop the loop and print the numbers and their difference.

source("/Users/ronakshah/Google Drive/Project Euler/44.Generate_nth_Pentagonal_numbers.R")
source("/Users/ronakshah/Google Drive/Project Euler/44.Is_pentagonal.R")

get_answer_for_problem_44 <- function() {
  n <- 2
  not_found <- TRUE
  while(not_found) {
    one = generate_nth_pentagonal_number(n)
    for (i in seq(n-1, 1)) {
      two = generate_nth_pentagonal_number(i)
      if(is_pentagonal(one + two) && is_pentagonal(one - two)) {
        print("Answer")
        not_found = FALSE
        print(c(n, i))
        print(one-two)
        break
       }
      }
    n = n + 1
  }
 }

In spite of a double loop, the timing was faster thanks to my Mac 😉 My windows sucks at such processes.

system.time(get_answer_for_problem_44())
#[1] "Answer"
#[1] 2167 1020
#[1] 5482660

#user  system elapsed
#2.307   0.021   2.344

I finally found an apt place where I can use && instead of & in R. In reality here & would have worked as well 😉 but just for change I used &&. Ok, now to the important part I was not sure what is going to be the point where we would stop checking for the numbers. This could have been an infinite loop if we didn’t find the number and it is kind of weird that the answer is the first number itself which satisfies this condition whereas in the question they have asked to find a number where the difference is minimized. So, I am not sure what is an ideal way to solve this problem however, I am happy that we could reach to the answer at least.

Or is this the only number which satisfies this condition? You never know…

Euler Problem 43 – Sub-string Divisibility

Sub-string divisibility

Problem 43

The number, 1406357289, is a 0 to 9 pandigital number because it is made up of each of the digits 0 to 9 in some order, but it also has a rather interesting sub-string divisibility property.

Let d1 be the 1st digit, d2 be the 2nd digit, and so on. In this way, we note the following:

  • d2d3d4=406 is divisible by 2
  • d3d4d5=063 is divisible by 3
  • d4d5d6=635 is divisible by 5
  • d5d6d7=357 is divisible by 7
  • d6d7d8=572 is divisible by 11
  • d7d8d9=728 is divisible by 13
  • d8d9d10=289 is divisible by 17

Find the sum of all 0 to 9 pandigital numbers with this property.

Pheww!!! This problem, really took a toll at me. I always considered myself as a good programmer. I mean not the best but I thought I would never have a problem at least to code up things. It would be not be the best or optimal solution but still it would be something which would give you an answer at least. Anyway, more cribbing on this later.

As I had mentioned in the previous post that Project Euler problems are becoming more and more difficult and now I am not sure how much I can go further with the skill level I have. 😛 If you keep getting more blog posts from me on project euler topic consider that I have copied them from somewhere 😀 Apart from jokes, reading the problem it looks (ahh, pandigital again!!) interesting. But again the only way I could think of solving this was using the donkey approach (brute force method). Interestingly, I was reading some (read one) blogs post about how to solve such kind of problems effectively. Nearly all the posts gave the same advice stating that don’t try to jump in to the most optimal way of solving the problem from the beginning. First solve it with a brute force approach and then try to optimize it. The good thing I have found of solving it with brute force approach is it is easier to check all the edge cases with brute force method. With optimal technique you might miss on some of the cases. So when you first solve with brute force approach you are well aware of all the edge cases and you know already that you have to take extra care of them while optimizing the problem.

In the brute force method we generate all the possible combinations of 0-9 which are 10! Numbers. These are big numbers which take a lot of time to generate.

library(combinat)
get_all_n_digital_pandigital <- function(start, end) {
  as.numeric(sapply(permn(start:end),function(x) paste0(x, collapse = "")))
 }

This function generates all the n digit pandigital numbers. So get_all_n_digital_pandigital(1, 4) generates all possible combination of 4 digits with 1, 2, 3 and 4 in it. So in our case we need get_all_n_digital_pandigital(0, 9) to generate all 10 digit pandigital numbers.

get_all_numbers <- get_all_n_digital_pandigital(0, 9)
ind2 = !as.numeric(substr(get_all_numbers, 2, 4)) %% 2
ind3 = !as.numeric(substr(get_all_numbers, 3, 5)) %% 3
ind5 = !as.numeric(substr(get_all_numbers, 4, 6)) %% 5
ind7 = !as.numeric(substr(get_all_numbers, 5, 7)) %% 7
ind11 = !as.numeric(substr(get_all_numbers, 6, 8)) %% 11
ind13 = !as.numeric(substr(get_all_numbers, 7, 9)) %% 13
ind17 = !as.numeric(substr(get_all_numbers, 8, 10)) %% 17

We break the number into different sub-strings based on their position given in the question and check if they are divisible or not.  So ind17 is a logical vector with values TRUE and FALSE in it. TRUE for those numbers which are divisible by 17 and FALSE for them which are not divisible by 17. Size of ind17 is same as get_all_numbers so it represents each number in it precisely. Same goes for ind13, ind11, ind7, ind5, ind3 and ind2. Now all these variables carry the logical indices for their respective number and we have to find out those numbers where the value is TRUE for all of them. So when we use the & operator it cancels out all those numbers that is FALSE for even one condition and returns TRUE only for those numbers which satisfies all the conditions (Divisible by 2, 3, 5, 7, 11, 13, 17). Finally we take the sum of all the TRUE numbers.

sum(get_all_numbers[ind2 & ind3 & ind5 & ind7 & ind11 & ind13 & ind17])
#[1] 16695334890

Although this gives an answer but it takes up lot of time to calculate all this. get_all_n_digital_pandigital generates 10! numbers and then you have to check these 10! numbers for all these 7 conditions one by one which takes up more amount of time.

This definitely was not the solution I was looking for. It took me more than an hour to execute all those steps. I had to do something better. Then after a lot of thinking and scratching my head I thought about an idea which originally came in mind from reading it somewhere (which I can’t recall right now).

How about we reverse engineer this problem? So instead of calculating all the possible combinations of pandigital numbers (which took the maximum of time), how about we incrementally generate the pandigital number which we want. What do I mean by that?

I know d8, d9 and d10 is divisible by 17 and it is a 3 digit number so I can generate multiples of 17 till 999 as these three numbers (d8, d9, d10) are going to be from one of these numbers from sure. Now similarly, we also generate all the 3 digit multiples of 13 for d7,d8 and d9. Now the tricky part. d8 and d9 are common here so they are going to be same for both the set of numbers which is 17 and 13. So we can find out those numbers which have the two digits common in both the lists.

Just to give you one example, multiple of 13 would be
013, 026, 039, 052, 065, 078, 091, 104, 117, 130…and so on.
Multiple of 17 would be
017, 034, 051, 068, 085, 102, 119, 136, 153, 170…and so on.

Because multiple of 17 (d8, d9, d10) and multiple of 13 (d7, d8, d9) are 3 digit numbers we keep all the digits rounded to 3 digits hence 013, 026, 017 etc. Ok so now, consider 13 X 1 = 013 and 17 X 8 = 136. 13 is common which can be one of the contender for d8 and d9 position so as they both match in both the tables of 13 and 17. From this 3 digit number we make it as 4 digit number by taking the first number from 13 multiple, then the common part and then the last number from 17 multiple. For this pair the 4 digit number would be 0136. (I actually also asked a SO question related to this here which was very well received). We make all such matches, some of them would be last 4 digit of the numbers we are trying to find out. Now in the next loop we create the same 3 digit 13 multiples from 0, 999 and match the second and third number of it to first and second number of the last 4 digits formed in the last loop and we continue this for all the prime numbers till 2.

Now comes the self-belief part. Reaching till here took lot of effort and time. The amount of time I took to reach from start till here I took an equal amount of time to reach from here to the final answer. The final output after running that loop I got had 208 numbers each of 9 digit. Also in first few of the numbers I checked there was a repetition of digits which was not expected as it is a pandigital number. Now I already knew the final answer till now, I knew the total number of pandigital numbers which satisfies the condition were 6 and something was terribly wrong and I was nowhere near the answer. I started cross checking where I might have done mistake or some condition which I have not taken care of but couldn’t find any.

Finally after lot of time I just read the question again and I realized nowhere in the question there was any mention of d1 so we need to find d1 which was easy as this was pandigital number. If we get 9 digits we could easily get the 10th digit and then I just took a chance and from those 208 numbers I excluded all those numbers which had repeating digits and guess what? There were only 6 numbers left and rest is HISTORY!! Ok, not actually!! 😛  I still need to write the next part. The rest part was simple, I knew the 9 digits so for each of those 6 numbers I find out the missing digit and add them as d1 in their respective numbers and add them together.

get_answer_for_euler_problem43 <- function() {
  #Generate 3 digit multiples of 17
  n17 <- sprintf("%03d", seq(17, 1000, 17))
  #Remove all the numbers who has any digit more than once
  final_n17 <- n17[sapply(n17, function(x) all(table(strsplit(x, "")[[1]])==1))]
  for (i in c(13, 11, 7, 5, 3, 2)) {
   #Generate 3 digit multiples of all the prime numbers above
   n13 <- sprintf("%03d", seq(i, 1000, i))
   final_n13 <- n13[sapply(n13, function(x) all(table(strsplit(x, "")[[1]])==1))]
   #Create a named vector where every number has it's common part as it's name
   #This part of the solution was taken from @Jaap's answer to this question on SO 
   #https://stackoverflow.com/questions/47431111/match-substring-of-two-vectors-and-create-a-new-vector-combining-them
   final_n17 <- setNames(final_n17, substr(final_n17,1,2))
   final_n13 <- setNames(final_n13, substr(final_n13,2,3))

   #merge the values based on common part between final_n17 and final_n13
   df <- merge(stack(final_n13), stack(final_n17), by = 'ind')
   #paste values together between
   combined_value <- paste0(substr(df$values.x,1,1), df$values.y)
   #Assigning the value back to final_n17 to continue in the loop
   final_n17 <- combined_value
  }
  #Removing all the solution who has got any digit more than once
  last_sums <- final_n17[sapply(final_n17, function(x) all(table(strsplit(x, "")[[1]])==1))]
  #To get d1 we need to fill in with the missing digit to make it pandigital
  sum(as.numeric(sapply(last_sums, function(x) 
      paste0(setdiff(0:9, strsplit(x, "")[[1]]), x))))
  }

I had no courage to calculate the time taken for the first approach because it took me more than an hour definitely to execute all those steps, while this just took such a little time. I was so happy that it finally paid off and I didn’t had to give up half way. 🙂

get_answer_for_euler_problem43()
#[1] 16695334890

system.time(get_answer_for_euler_problem43())
#user  system elapsed
#0.121   0.000   0.123

Study on R questions on Stack Overflow.

I have been waiting for doing this for so long. Evaluating a dataset and finding if there is some information we could put it to use. What better start it could have if I could start with something which is closer to me and something that I care about. STACKOVERFLOW!! At one stage, I was quite active on SO and I used to answer there quite frequently but as time passed I became less active there although I continued monitoring, reviewing , auditing and occasionally answering the questions.

I got a data set from Kaggle which had all the R question and answers on SO till 24th Sep 2017. Now let’s dig in.

Number of accepted answers from the answers tables comes out to be.

table(Answers$IsAcceptedAnswer)
#FALSE   TRUE
#140259 110529

So out of 189930 questions only 110529 answers have been accepted i.e 58% of the questions are accepted. Let us dig down further.

sum(Questions$Id %in% Answers$ParentId)
#[1] 159337

This tells us that these many questions have at least one answer which is equal to 159337.

sum(Questions$Id %in% Answers$ParentId)/nrow(Questions)
#[1] 0.8389249

So around 84% of the questions tagged with R are answered.

Let’s focus on the questions which are not answered.

Unanswered_questions <- Questions[!Questions$Id %in% Answers$ParentId, ]

#ggplot2 shiny  plot rstudio dataframe r-markdown dplyr 
# 2314    2172  1275    867       769      613     590                   

#data.table  knitr time-series
# 578        567         525

So most of the unanswered questions are related to ggplot2, shiny etc. I can understand that there are many ggplot2 and shiny questions which are unanswered because those questions are too descriptive, they are too specific which also means not everybody faces the same issue. It’s almost user-specific where every user is trying to do something different according to their needs. Also, not many have expertise in it. The 4th tag is rstudio which is quite weird. As far as I have observed most of the questions with rstudio tag are incorrectly tagged as it has got nothing to do with RStudio – the IDE. Let’s look at some of the questions tagged with R Studio.

sample(Unanswered_questions$Title[sapply(Unanswered_questions$tags, 
                     function(x) "rstudio" %in% x)], 10)

1

This seems a legit R studio question.

 2

Another legit R Studio question IMO.

3

I am confused about this one, not sure if there should be a rstudio tag or not to this one.

4

Now this isn’t a RStudio question. It has got nothing to do with rstudio and rstudio-server. I’ll just make an edit.

Now these results are little surprising to what I have observed. I usually observe people tagging rstudio to every R question they ask. Maybe because they are using RStudio as IDE and/or they are new and don’t know how the tags system work on SO.

So then it leaves us with some options.

  • Either my observation is just by chance or in reality it is not true that many tag rstudio to every R related question.
  • People tag question with rstudio tag and I observe it initially. However, someone later edits it correctly.

I’ll go with the second option as it is difficult for me to accept that I am wrong 😛

Now digging down further let’s examine how many characters each type of question contains on average.

sum(nchar(Unanswered_questions$Body))/nrow(Unanswered_questions)
#[1] 1639.612

sum(nchar(Questions$Body))/nrow(Questions)
#[1] 1456.157

sum(nchar(Answered_questions$Body))/nrow(Answered_questions)
#[1] 1420.619

Umm….not sure if anything could be understood from this observation. Let’s skip and move forward.

Let’s try to differentiate answer and unanswered question based on the sentiment. I gathered some positive and negative words from Github. and tried to compare it with the questions asked.

Let’s calculate the average sentiment score of answered questions. Here we calculate the presence of positive and negative words in each question and then subtract them from each other and take the mean.

pos_score <- sapply(strsplit(as.character(Answered_questions$Body), " "), 
             function(x) sum(x %in% pos_words$words))

neg_score <- sapply(strsplit(as.character(Answered_questions$Body), " "), 
             function(x) sum(x %in% neg_words$words))

mean(pos_score - neg_score)
#[1] 0.3726567

Let’s do the same for unanswered questions.

pos_score <- sapply(strsplit(as.character(Unanswered_questions$Body), " "),
              function(x) sum(x %in% pos_words$words))

neg_score <- sapply(strsplit(as.character(Unanswered_questions$Body), " "),
              function(x) sum(x %in% neg_words$words))

mean(pos_score - neg_score)
#[1] 0.04543523

As expected unanswered question has a low score which makes people less willing to answer to their questions (?) maybe.

This is it for now. I wanted to explore it a bit more especially using some NLP techniques but I had left this blog incomplete twice already and didn’t want to keep it in my drafts folder any longer so I would complete the analysis part here only. Another thing I realized that any analysis feels incomplete without visualization. Although the numbers represent the same story but with visualization it is easy for any person to understand what one is trying to say without explaining it with words. I’ll surely add some plots and graphs next time.

Euler Problem 42 – Coded Triangle Numbers

Coded triangle numbers

Problem 42

The nth term of the sequence of triangle numbers is given by, tn = ½n(n+1); so the first ten triangle numbers are:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, …

By converting each letter in a word to a number corresponding to its alphabetical position and adding these values we form a word value. For example, the word value for SKY is 19 + 11 + 25 = 55 = t10. If the word value is a triangle number then we shall call the word a triangle word.

Using words.txt (right click and ‘Save Link/Target As…’), a 16K text file containing nearly two-thousand common English words, how many are triangle words?

Well, although the problem was clear from the beginning but it took me lot of time to find a way to read this txt file. I want to read it in a way that I can use it later to process on it. So a kind of dataframe but the file was with txt extension. Turns out R does not have a straightforward approach to read txt file as a dataframe, if it was a csv then it would have been very easy to read and process data on it. Somehow after spending 15-20 mins searching various ways of reading the text file I could find out a way using read.table the way I wanted.

df <- read.table("/Users/ronakshah/Downloads/p042_words.txt", sep = ",")
new_df <- data.frame(t(df), stringsAsFactors = F)
rownames(new_df) <- NULL
colnames(new_df) <- c("word")

The approach I was thinking that we could first calculate sum of all the letters in the word for all the words in the text file and then take the maximum of those sums and generate a sequence of triangle number until that maximum and take the intersection of those sum of words and triangle numbers to get the count of triangle words in the text file.

In R if you use LETTERS you get all the 26 uppercase letters and when you use letters you get the lower case letters. So for each word I split them by character and match it with LETTERS (as all the words in the text file are uppercase). match returns the position of matching of the first vector in the second, so for each letter it gives its numerical equivalent and then we sum them over so that we get the actual sum for that word.

word_sum <- sapply(new_df$word, function(x) sum(match(strsplit(x, "")[[1]], LETTERS)))

word_sum is the vector of sum of all the words. Now if we do max(word_sum) we would get the maximum value any word contains from the text file. We generate triangle numbers until that number (n).

generate_triangle_numbers_until_n <- function(n) {
  current_num = 1
  current_index = 1
  numvec = numeric()
  while(current_num <= n) {
    current_num = current_index * (current_index+1)/2
    numvec = c(numvec, current_num)
    current_index = current_index + 1
   }
  return(numvec[-length(numvec)])
 }

Now, we just need to find how many of those triangle numbers are in word_sum to get the answer.

get_triangle_numbers_from_file <- function() {
  #Calculate sum of each word
  word_sum <- sapply(new_df$word, function(x) sum(match(strsplit(x, "")[[1]], LETTERS)))
  #Calculate triangle numbers until maximum of word sums
  triangle_numbers = generate_triangle_numbers_until_n(max(word_sum))
  sum(word_sum %in% triangle_numbers)
  }

The answer and time taken for the function to execute :

get_triangle_numbers_from_file()
#[1] 162

system.time(get_triangle_numbers_from_file())
#user  system elapsed
#0.007   0.000   0.007

 

Euler Problem 41 – Pandigital Prime

Pandigital prime

Problem 41

We shall say that an n-digit number is pandigital if it makes use of all the digits 1 to n exactly once. For example, 2143 is a 4-digit pandigital and is also prime.

What is the largest n-digit pandigital prime that exists?

After reading this I assumed (assumptions are bad .. :|) that the correct answer would be something starting with 9…… and followed by some other numbers. So one of the approach I could thought of was generate all the possible 9 digit pandigital numbers from the last and select the first one which is prime.  I wrote a for loop to make sure in case the pandigital number does not start with 9. If that is the scenario we generate all the possible pandigital combinations with 8.

library(combinat)
for (i in seq(9, 1)) {
all_numbers = sort(unlist(permn(1:i, function(x) 
           as.numeric(paste0(x, collapse = "")))), decreasing = T)
for (number in all_numbers) {
  if(is_prime(number)) {
     cat("Highest pandigital prime is ", number, "\n")
     flag = 1
     break
   }
 }
  if(flag) break
}

With permn function we generate all the combination using 1:i and for each of those combination we paste them together and convert it into numeric to get the complete number in numeric form. We then check if the number is prime. If it satisfies the condition we stop the loop there itself as we are sorting the numbers in descending order, so the first prime number which we get is going to be the answer.

When I ran the loop it took more time to execute than expected. It was more than 30 mins and the loop was still running. I started debugging the script and started checking where it is getting stuck. I added print steps to see which is the number currently in process. On debugging I came to know that it takes one big chunk amount of time to generate all_numbers and then it also takes good amount of time for is_prime to execute for each number where I was expecting it to pass though those numbers in a small amount of time. However, as these are big numbers (9 digit ) it took around 5-6s to check each number which was quite time consuming.

I added few filters later to reduce the numbers. I wrote a function which keeps only odd numbers from the list. This can be useful ahead in some of the upcoming problems as well.

get_odd_numbers_from_list = function(num_list) {
   num_list[which(num_list %% 2 != 0)]
}

get_even_numbers_from_list = function(num_list) {
   num_list[which(num_list %% 2 == 0)]
}

It reduced the numbers to check but still even after giving the function considerable amount of time, I was still not able to reach the answer. In the process, I just realized that the answer is just not 9…. (because I was printing the numbers, remember ;-)). It had completed all the numbers starting from 9 and was in the process of generating all the 8 digit pandigital numbers is when I stopped it and started thinking of some other solutions.

Other option which I could think of was we first generate all the primes under n (here, 987654321) and then select the first pandigital number from it. However, generating primes under 987654321 was equally time consuming. Then I started checking some resources on the internet to see if there are better ways to solve the problem. I came across one post which had discussed an important property. Calculating the sum of digits for each n-digit pandigital :

1 – 1
2 – 3
3 – 6
4 – 10
5 – 15
6 - 21
7 - 28
8 – 36
9 - 45

Now apart from 4 and 7 digit each number is divided by 3, so any pandigital number with that sum is going to be divisible by 3 and hence not a prime. Now we start generating 7 digit sequences and with 5 – 6 iterations we get the biggest pandigital prime.

source("C:\\Users\\Ronak Shah\\Google Drive\\Git-Project-Euler\\3.Check_if_Prime.R")
source("C:\\Users\\Ronak Shah\\Google Drive\\Git-Project-Euler\\41.Get_Numbers_List.R")
source("C:\\Users\\Ronak Shah\\Google Drive\\Git-Project-Euler\\32.Is_Pandigital.R")

library(combinat)
get_pandigital_prime = function() {
   max_digits = 7
   flag = 0
   for (i in seq(max_digits, 1)) {
   all_numbers = sort(unlist(permn(1:i, function(x) 
          as.numeric(paste0(x, collapse = "")))), decreasing = T)
   odd_numbers = get_odd_numbers_from_list(all_numbers)
   for (number in odd_numbers) {
     if(is_prime(number)) {
      cat("Highest pandigital prime is ", number, "\n")
      flag = 1
       break
      }
    }
   if(flag) break
  }
 }

This also takes comparatively very less time.

system.time(get_pandigital_prime())
#Highest pandigital prime is  7652413
#user  system elapsed
#2.70    0.06    2.76

 

Euler Problem 40 – Champernowne’s constant

Champernowne’s constant

Problem 40

An irrational decimal fraction is created by concatenating the positive integers:

0.123456789101112131415161718192021…

It can be seen that the 12th digit of the fractional part is 1.

If dn represents the nth digit of the fractional part, find the value of the following expression.

d1 × d10 × d100 × d1000 × d10000 × d100000 × d1000000

This looks like a straight forward problem. We need to write down numbers until they reach till millionth character and then find out the numbers at 1, 10, 100, 1000 to 1 millionth position and multiply them together to get the final answer. Now one thing to note Is we need a millionth digit and not millionth number. One number can have multiple digits. So we need to adjust accordingly.

So let’s calculate how many number we need to take into consideration for million digits. Now :

1 – 9                     – 9          1 digit numbers

10 – 99                – 90       2 digit numbers

100 – 999            – 900     3 digit numbers

1000 – 9999        – 9000   4 digit numbers and so on.

So if you observe we started with 9 and for every new digit number we keep on multiplying it by 10 to get the numbers in the next. To get the total number of digits we need to multiply the total of numbers with the number of digits in them. 9 X 1 + 90 X 2 + 900 X 3 until the count reaches to 1 million.

count_numbers = function(max_number) {
  nums = 9
  digits = 2
  sum_digits = 9
  while(TRUE) {
    nums = nums * 10
    sum_digits = sum_digits + (nums * digits)
    digits = digits + 1
    if (sum_digits > max_number) break
  }
  return(sum_digits)
}

At the end of this function we will have a number till which we need to paste the sequence of numbers together and then extract digits at specific location.

We paste the entire sequence togethert starting from 1 till sum_digit. Now after that we generate the sequence which we need to extract. (1, 10, 100, 1000, 10000, 1000000). We extract those specific digits from sum_digit and then multiply them together to get the final answer.

multiply_all_constants = function(sum_digits, max_number) {
pasted_digit = paste(seq(sum_digits), collapse = "")
n  = 1
numvec = n
while(n < max_number){
n = n *10
numvec = c(numvec, n)
}
return(prod(as.numeric(sapply(numvec, function(x) substr(pasted_digit, x, x)))))
}

We create a separate main file to call this functions.

source("/Users/ronakshah/Google Drive/Project Euler/40.Champernowne_constant.R")
sum_digits = count_numbers(1000000)
multiply_all_constants(sum_digits, 1000000)
sum_digits = count_numbers(1000000)

multiply_all_constants(sum_digits, 1000000)
#[1] 210

system.time(multiply_all_constants(sum_digits, 1000000))
#user  system elapsed
#3.702   0.150   3.860

 

Euler Problem 39 – Integer Right triangles

Integer right triangles

Problem 39

If p is the perimeter of a right angle triangle with integral length sides, {a,b,c}, there are exactly three solutions for p = 120.

{20, 48, 52}, {24, 45, 51}, {30, 40, 50}

For which value of p ≤ 1000, is the number of solutions maximized?

For this problem I thought there must be some direct answer / formula from which we can calculate all the Pythagorean triplets for each number. Turns out there are couple of formulas from where you can get the triplets however, those formulas do not give you all the combinations. They miss few of them so this was not suitable for this question where we need all the possible combinations. I will still discuss couple of them here as they are nice and useful tricks which can be helpful.

One trick to find Pythagorean triplet was: The Pythagorean triplet are a2 + b2 = c2 where a < b < c. In this trick we can take the smallest number and then based on that get the corresponding triplet. Now there are two cases in this:

1) When the smallest number is odd: If the number is odd then we take square of it and then divide it by 2. Square of an odd number is also an odd number and when divide by 2 would give us a decimal number. For example: – consider the smallest number as 5. 52 is 25 and when divide by 2 we get 12.5, now the two closest integers around that decimal number are the other two numbers of the triplet. So in this case it is 12 and 13. Hence, 5, 12 and 13 are Pythagorean triplets. Another example considering 9 as smallest number. 92  is 81 and after dividing by 2 we get 40.5. So the triplet is 9, 40 and 41.

2) When the smallest number is even: If the number we have selected is even. We divide by 2 first and then square it and the closest integers to that number would be the triplets. Consider 6 as the smallest number. Now divide 6 by 2, we get 3 and then we square it to get 9. Now the closest integers to 9 are 8 and 10. So 6, 8 and 10 are triplets. Similarly with 12, we get 35 and 37 as triplets.

This is an excellent trick except for the fact that these gives one of the many triplet including the given number and not all the triplets.

Another one is using the formula. Given two numbers m and n , such that m > n we can find a triplet using the formula:

a = m2 - n2
b = 2mn          
c = m2 + n2

This is Euclids formula of generating primes . This is nice as well but even this suffers from the same problem that it doesn’t give all the primes which is discussed in the Wikipedia page. There is also some workaround adding a constant k but now it was getting too complicated so I preferred to switch to the our donkey approach (brute force method).

Here , we have 3 loops for a, b and c. a starts from 1 to the limit (here 1000), b starts from a to limit and c starts from b to limit since a < b < c. Now , we check the condition if it satisfies the condition a2 + b2 = c2 we add them in the list of Pythagorean triplets. Now we sum all the triplets and filter out those whose sum is more than 1000 and get that sum which occurs for the maximum time.

get_max_solution_pythagorean_triplet_v1 = function(limit) {
limit = 1000
all_list = list()
for (p in seq(limit)) {
  for (q in seq(p, limit)){
    for(r in seq(q, limit)) {
      if(p^2 + q^2 == r^2) {
        all_list = append(all_list, list(c(p, q, r)))
      }
    }
  }
}
sum_list <- sapply(all_list, sum)
return(which.max(table(sum_list[sum_list <= limit])))
}

get_max_solution_pythagorean_triplet_v1(1000))
# 840
# 155

So, 840 is the number which occurs for maximum (155) times.

Now, after checking for how other people have solved this question online I came out with certain checks which can reduce some iteration. For example, when sum of three numbers moves beyond 1000 stop the iteration as our limit for this question is 10000 or also by equations a2 + b2 = c2 and a + b + c = 1000 we can make out that we are only concerned about numbers whose perimeter comes out to be even.

However, adding these checks have given me some strange system time results. Check the complete code on github .

Euler Problem 38 – Pandigital Multiples

Pandigital multiples

Problem 38

Take the number 192 and multiply it by each of 1, 2, and 3:

192 × 1 = 192
192 × 2 = 384
192 × 3 = 576

By concatenating each product we get the 1 to 9 pandigital, 192384576. We will call 192384576 the concatenated product of 192 and (1,2,3)

The same can be achieved by starting with 9 and multiplying by 1, 2, 3, 4, and 5, giving the pandigital, 918273645, which is the concatenated product of 9 and (1,2,3,4,5).

What is the largest 1 to 9 pandigital 9-digit number that can be formed as the concatenated product of an integer with (1,2, … , n) where n > 1?

The question seems to be too broad at first. I mean how do you decide which numbers would create pandigital multiples. However, after some analysis we can find out where to look. Since n has to be greater than 1 we have to restrict our number less than 5 digit because any 5 digit number when multiplied by 2 would give other 5 digit number which when concatenated would give 10 digit number but we only need a 9 digit pandigital number. If we take a two digit number then for n = 3 we will get a 8 digit number (2 + 3 + 3) and for n = 4 we will get a 11-digit number. (2 + 3 + 3 + 3). For 3 digit number, for n =2 we get 7 digit number (3 + 4) and for n = 3, we get 11 digit number (3+4+4) . So it has to be a 4 digit number starting with 9.

Also, for being pandigital number it has to contain numbers from 1 to 9 that too only once. No repetition of numbers is allowed. So the numbers would range from 9123 to 9876. n is going to be 2 here to get 9-digit number (4+5). We use the is_pandigital function from problem 32 where it returns a Boolean value if a number is pandigital or not.

So for numbers in range from 9123 to 9876 we paste the number and its second multiple together and check if it is pandigital or not and select the highest number from it.

source("C:\\Users\\Ronak Shah\\Google Drive\\Git-Project-Euler\\
       32.Is_Pandigital.R")
get_pandigital_multiples = function() {
   highest_pandigital_number = 0
   for(i in seq(9123, 9876)) {
     pasted_number = paste0(i, i*2)
     if(is_pandigital(pasted_number)) {
       highest_pandigital_number = pasted_number
     }
   }
  return(highest_pandigital_number)
}

get_pandigital_multiples()
#[1] "932718654"

system.time(get_pandigital_multiples())
#user  system elapsed
#0.14    0.00    0.14

9267 9273 and 9327 were the numbers which give a pandigital multiples out of which obviously the product with 9327 is the highest.
9327 X 1 = 9327 & 9327 X 2 = 18654