My Blog

bsetools – Get BSE share prices

In the last article I showed how to get prices from BSE’s website. As there is no package/library for bse prices I thought of creating a package and uploading it to PyPI. I decided the name of the package to be bsetools as it sounds similar to nsetools. So having found either of our package the user can easily find the other package if they wish to use the package. I had no idea how to upload packages to Pypi as I had not done it before. After googling a bit I found two posts really helpful to upload a package to PyPI which helped me a lot.

  1. Peter Downs blog  and
  2. Pycharm blog 

Apart from the code files I uploaded setup.py and LICENSE.txt files. LICENSE.txt file contains the MIT License which allows the conditions to share, distribute and reuse this code. setup.py contains all the information of the package. The install_requires part is used to mention all the package dependencies which needs to be installed to use this package. So in case if the person installing the package does not have these package, it automatically gets downloaded into their system so that they can use the package without any blocking.

Once the github repo is clean and ready for release of the package, we need to tag it.

git tag tag_name

Where tag_name is usually the version number which you are uploading.
Then do

git push –tags origin master

After the tag is pushed. We can upload the package to pypi directly by doing:

python setup.py sdist upload -r pypi

There are few changes which I included in this release.

  • I added support for python 3.5+
  • In case if wrong share price is requested which does not have bse page then it returns the error message.

The link to package on Pypi is here : PyPI Bse tools and the github repo is at
Github bsetools

So if you want to install and start using bsetools package you just need to do

 pip install bsetools

 

Advertisements

Getting BSE data

I had recently written a blog on getting NSE data and sending an email with the prices. That used the library called nsetools to get the prices. I actually needed prices of share from BSE website but there is no package called bsetools or any other one. So I decided to try to do it myself. I checked the source code for the nsetools package and basically they have scraped the prices from nseindia.com – the official website for NSE. So if you check the prices on nseindia then you would find that it has https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol= this link as constant and the part after “=” is taken based on the share price which we want.  Therefore it was necessary to get the symbol of the share for which we want the prices. So for Infosys the link becomes https://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/GetQuote.jsp?symbol=INFY and so on for all other shares as well.

I went to the bseindia.com website but unfortunately there is no direct way to get the share prices link. For example , for Infosys the link is : http://www.bseindia.com/stock-share-price/infosys-ltd/infy/500209/ , for tech Mahindra it is http://www.bseindia.com/stock-share-price/tech-mahindra-ltd/techm/532755/ so I can get the symbols infy and techm as the share symbols but there is are extra two unknowns here. For infy , it is “Infosys-ltd” and some number (500209) and same with tech Mahindra (tech-mahindra-ltd) and 532755. So this is extra and I don’t know from where to get this information for each share price. So I couldn’t dynamically create a link for BSE. I started looking for some other websites/resources that I can fetch the data from. There were couple of websites (rediff, moneycontrol etc) which gave the share prices but their URL needed modification in some way or the other as well. Another thing which I found was if you google “Infosys share prices” you would get bse share prices of Infosys there itself. Now this was easier because I was sure Google has a python API which I can use to query. However, after comparing the prices from Google website and BSE website I found that they were not equal, there was some difference in the price. It is not worth the effort if you can’t return the correct prices. So I decided to scrape the prices from bse website only. I didn’t know how but at least the prices were reliable if we select it from their original website.

So then I had an idea of using Google’s search API to get to the BSE website. We can search for share price on Google and then we will get a list of results and one of them would be bseindia’s website which would be present mostly in top 10 list of search result only. So I just filter the result and select the link which has bseindia in it. So with that we have the URL and we can go to the respective bse website’s link and then extract the prices.

After we reach to that respective page it took a little bit of time to find out how to extract the value (share price) from that page. I am using PhantomJS and BeautifulSoup to extract the data from the website. Basically there are only two classes which we need to consider to get the prices, “tbmainred” and “tbmaingreen” . So if the prices are greater than previous day then it takes class tbmaingreen and if the price is lesser than previous day then it takes tbmainred class. So I just need to search for that class and extract it’s value. I did create a class so I can use it from a third-party file to get the price.

I added an extra file (test.py) which reads the data from csv and extracts prices for all the shares listed in the csv and appends it as a new column in csv. Maybe I will use this to get email of prices in the future.

Euler Problem 53 – Combinatoric selections

Combinatoric selections

Problem 53

There are exactly ten ways of selecting three from five, 12345:

123, 124, 125, 134, 135, 145, 234, 235, 245, and 345

In combinatorics, we use the notation, 5C3 = 10.

In general,
nCr =n! / !(n−r)!,

where r ≤ nn! = n×(n−1)×…×3×2×1, and 0! = 1.

It is not until n = 23, that a value exceeds one-million: 23C10 = 1144066.

How many, not necessarily distinct, values of  nCr, for 1 ≤ n ≤ 100, are greater than one-million?

Somebody just asked me “Is there any Euler solution that you are particularly proud of?”. I am sure I have written something nice in the 50+ problems that I have solved till now however, I could not recollect any solution at that moment that I could be proud of. Anyway, this is one of the solution that I am particularly proud of. Although, computationally I was not able to made it super-fast but at least I changed it to something which took infinite amount of time to something which is acceptable time.

The base R combn function gives us all the combinations taken m at a time. So

combn(5, 3) gives

     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,]    1    1    1    1    1    1    2    2    2     3
[2,]    2    2    2    3    3    4    3    3    4     4
[3,]    3    4    5    4    5    5    4    5    5     5

This is all the possible combinations taking m (3) at a time. Here, we do not need all the combinations but just the count of number of combinations which we can get by ncol(combn(5, 3)) which returns 10.

Now the brute force approach is to write a two for loop and calculate number of elements for which ncol(combn(i, j)) >  million. Just for comparison I would have even included this example but it becomes computationally very expensive as it progresses. Moreover, it cannot even count for large numbers. For eg :-  ncol(combn(30, 17)) gives an error : Cannot allocate vector of size 7.6 gb. This was just 30 and we need to calculate values till 100. I definitely needed some better approach.

I then started to find out some patterns to calculate these combinations. So If we check values for all the values of 10 i.e combn(10, 1), combn(10, 2)….

sapply(seq(10), function(x) ncol(combn(10, x)))
#[1]  10  45 120 210 252 210 120  45  10   1

So, if you observe the values follow a circular path, they start on some minimum value, keeps on increasing till it reaches maximum value and then values started decreasing again ending at 1. For 10, value[5] is max value, value[4] = value[6], value[3] = value[7] and so on.

For 12 –

sapply(seq(12), function(x) ncol(combn(12, x)))
#[1]  12  66 220 495 792 924 792 495 220  66  12   1

Similarly, for 12, value[6] is maximum, value[5] = value[7] and so on. So you see the pattern, right? This is all for even values, let’s check for some odd values.

sapply(seq(11), function(x) ncol(combn(11, x)))
#[1]  11  55 165 330 462 462 330 165  55  11   1

For 11, we get max value at two positions, value[5] and value[6], then value[4] = value[7], value[3] = value[8] and so on.

sapply(seq(13), function(x) ncol(combn(13, x)))
#[1]   13   78  286  715 1287 1716 1716 1287  715  286   78   13    1

Similarly, for 13 we get max high value at value[6] and value[7] and later it keeps on forming pair across different indexes as shown in previous cases.

In the question, we need to find number of values which take value more than a million. Now based on the above observation can we devise some formula which will calculate the number of values which can be above some threshold instead of calculating ncol(combn(x, y)) for each number. For simplicity, in the above examples I assumed that the threshold is 100 and tried finding number of values which are greater than the threshold.

Let x be the first index which exceeds value of the threshold. Now , find how far this value is from the middle value because as the numbers are equally distributed, the number of values greater than threshold at left of mid is going to be equal to the value right of mid.

For 10, x = 3, mid = 5, end = 7
For 12, x = 3, mid = 6, end = 9

So for even numbers, number of values greater than 100 are 5 and 7 respectively. We can make a general formula for even numbers by 2 * (mid – x) + 1

For 10, 2 * (5-3) + 1 = 5 and
For 12, 2 * (6 – 3) + 1 = 7.

For 11, x = 3, mid = floor(11/2) = 5, end = 8
For 13, x = 3, mid = floor(13/2) = 6, end = 10

For odd numbers, number of values greater than 100 are 6 and 8 respectively. We can make a general formula for odd number by 2 * (mid – x) + 2

For 11, 2 * (5-3) + 2 = 6
For 13, 2 * (6-3) + 2 = 8.

I tried and studied so many examples of both odd and even numbers before coming up with this formula. The 2 of them shown here are just for illustration purpose only. So for each combination, we need to find the index of that one number which turns out the output of ncol(combn.. to be over a million and then we can calculate the number of total values over threshold directly without calculating any other values. This was a cool trick exploiting the property of combinatorics.

get_combinatoric_selection <- function(max_number, limit) {    
  counter = 0   
  for(i in seq(23, max_number)) {      
    for(j in seq(i)) {        
      if(ncol(combn(i, j)) > limit) {
        mid = floor(i/2)
        if (i %% 2 == 0) {
          counter = counter + (2*(mid - j) + 1)
        }
        else {
          counter = counter + (2*(mid - j) + 2)
        }
       break
      }
    }
  }
}

Made it as a separate function which accepts the max_number (upper limit) and the limit(threshold) value.

system.time(get_combinatoric_selection(100, 1000000))
#user  system elapsed
#303.96    2.14  306.78

print(counter)
#[1] 4075

 

Euler Problem 51 – Prime digit replacement

 Prime digit replacements

Problem 51

By replacing the 1st digit of the 2-digit number *3, it turns out that six of the nine possible values: 13, 23, 43, 53, 73, and 83, are all prime.

By replacing the 3rd and 4th digits of 56**3 with the same digit, this 5-digit number is the first example having seven primes among the ten generated numbers, yielding the family: 56003, 56113, 56333, 56443, 56663, 56773, and 56993. Consequently 56003, being the first member of this family, is the smallest prime with this property.

Find the smallest prime which, by replacing part of the number (not necessarily adjacent digits) with the same digit, is part of an eight prime value family.

Pheww!! At the end of last year (2017) I completed 50 problems from Euler and then I took a week of break from Euler and then came back to solve 51 and woaahhhh, this was so difficult. I kind of get it what the problem was but I really had tough time thinking of the way this should be coded. This problem has two unknowns. First one is we don’t know what is the upper bound when to exactly stop iterating. We know that we need to replace with similar digits but it is not mentioned how many digits we need to replace. In the first example, only 1 digit is replaced, in second example 2 digits are replaced, no such information is mentioned for the problem. As we are not even aware about the upper bound of the number to be checked so we can replace 2, 3, 4…or any number of digits. We do not have all this information beforehand.

So to start, the repeating digits can be anything between 0-9 and since we need to find a 8-member family, the smallest digit has to be either be 0, 1 or 2. I basically could not come up with the solution directly, I had to read some online stuff about it. Basically, it agrees that to find 8-member family it has to be a 6-digit number with 3 unknowns. Now we know there are 3 unknowns but we are not sure about their positions. As mentioned in the question it could be anywhere not necessary it has to be adjacent digits to be replaced. So for a 6 digit number the numbers could be 1x1x1x or 11x1xx or 111xxx or xxx111 or all other possibilities. We need to check for all the combinations it can take. For which combn seems to be a good function which does exactly the same. It creates a number of combinations of n (6) numbers taken r (3) at a time. So,

combn(3, 2) gives

     [,1] [,2] [,3]
[1,]    1    1    2
[2,]    2    3    3

And

combn(4, 2) gives

     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    1    2    2    3
[2,]    2    3    4    3    4    4

Each column represents one combination. So now for each combination of combn(6, 3) we start checking numbers from smallest six digit number (10001) to largest six digit number (999999). We first check if the number itself is prime first. If the number is not prime then there is no point replacing other numbers and checking all other things because we need all the numbers after replacing digits to be prime including the one which we are checking. By doing combn(6, 3) we are trying to shuffle the position where the replacement can be placed, now we need to decide which number actually we need to replace. As mentioned earlier, it could be any number between 0-9, as we know that these numbers are going to be repetitive so need to get their combinations within itself. Now for each of 0:9 we replace the values at that particular position (which is already decided by combn(6, 3)) by 0, 1, 2, 3…9. Check how many of those numbers are prime, if there are 8 primes then it is the answer or else we move forward.

Let us take an example to better understand this, for 1st value of combn(6, 3) which is c(1, 2, 3) and the lets take the first number as 10001.

So we replace first three digits with 0-9 generating numbers

000001, 111001, 222001, 333001, 444001, 555001, 666001, 777001, 888001 and 999001.

After these we remove all the numbers with preceding 0’s because it doesn’t make sense having 000001 when the actual number is just 1 and pass all the above numbers to is_prime function and count the number of primes in this sequence if it is equal to 8, then that is the answer and if not then we take the next number. We increment the numbers by 2 as every alternate number is an even number and is not a prime anyway so there is no point checking.

source("is_prime.R")

unknown_number = 3
consecutive_primes = 8
flag = TRUE

check_all_prime_combinations <- function(num) {
  current_number = 100001
  print(num)
  while(flag & current_number < 999999) {
    print(current_number)
    if (is_prime(current_number)) {
      all_nums = as.numeric(sapply(0:9, function(x)
      paste0(replace(strsplit(as.character(current_number), "")[[1]], num, x), collapse = "")))
      all_num_chars <- nchar(all_nums)
      all_nums <- all_nums[all_num_chars == names(which.max(table(all_num_chars)))]      
      if (sum(is_prime(all_nums)) >= consecutive_primes) {
        print(all_nums)
        flag = FALSE
      }
     }
    current_number = current_number + 2
   }
 }

combn(6, unknown_number, check_all_prime_combinations, FALSE)

#[1] 121313 222323 323333 424343 525353 626363 727373 828383 929393

I haven’t dared to check the time complexity as there is no point calculating time complexity if something you write takes more than 24 hours to execute 😛 I don’t know how I am OK with functions which takes so much of time to execute, I should have tried better approaches.

Euler problem 52 – Permuted multiples

Permuted multiples

Problem 52

It can be seen that the number, 125874, and its double, 251748, contain exactly the same digits, but in a different order.

Find the smallest positive integer, x, such that 2x, 3x, 4x, 5x, and 6x, contain the same digits.

I know, I haven’t written the solution for problem 51 yet but problem 51 was quite tricky and I have written a complicated code for it so it is running in the background(has been more than 24 hours already 😦 ) to generate the answer. In the meantime, I just had a look at problem 52 and it seemed interesting and straightforward so I proceeded with it.

The question is quite simple we need to find a number whose 2nd, 3rd, 4th, 5th and 6th multiple contain the same digit. So as per tradition, I first go with the donkey approach (brute force method) and solve it and then if possible I try to optimize it further.

So for each number we calculate its first 6 multiples and check if they have the same digits in them. I have broken this down into various functions.

get_character_in_order <- function(x) {
  sort(strsplit(as.character(x),"")[[1]])
}

get_character_in_order function breaks each number into separate digits and returns it sorted. For example, get_character_in_order(8914) would return #[1] 1 4 9 8

is_permuted_multiple <- function(vec) {
  if (length(unique(nchar(vec))) == 1) {
    ordered_vec = get_character_in_order(vec[1])
    return(all(ordered_vec == sapply(vec[-1], get_character_in_order)))
  }
  else
    return(FALSE)
 }

is_permuted_multiple function checks if a vector of numbers are permutation of each other. Before comparing their permutation we check if all the numbers in the vec have the same number of digits because if the numbers have to be permutation of each other, they should have the same digits. 154 and 1543 cannot be permutation of each other, so if the number of digits is different for any of the number, it is not a permutable multiple and we return FALSE for it directly. For the numbers who have the same digits we check if the output of get_character_in_order is same for all the numbers in the vector. We take the first number as a benchmark to compare it with all other numbers in the vector. The output for all of them should be the same for it to be permutable multiple of each other.

get_answer_problem_52 <- function() {
  flag = TRUE
  current_number = 1
  while(flag) {
    all_multiples = current_number * 1:6
    if(is_permuted_multiple(all_multiples)) {
      print(all_multiples)
      flag = FALSE
    }
    current_number = current_number + 1
  }
}

This function checks for every number one by one till the time it doesn’t find the numbers which are all permuted multiple of each other. 142857 comes out to be the answer. Although, 142857 is a big number it takes approximately 15 seconds to reach from 1 to that number. Still a good solution I would say for a brute force method.

system.time(get_answer_problem_52())
#[1] 142857 285714 428571 571428 714285 857142
#user  system elapsed
#15.27    0.00   15.30

Now, since I got the solution I was thinking of various ways to optimize it. After thinking for a while I got one way. So if a group of numbers have to be permutable combinations of each other then they also have to contain same number of digits. I have already explained this point above where we check nchar of all the numbers in the vector. For a 2-digit number all the six multiples should also be a 2-digit number. Now,

15 X 6 = 90
16 X 6 = 96
17 X 6 = 102

So for a 2-digit number to be permutable, the smallest digit cannot be greater than 16 because after 16, 17 X 6 = 102 is a 3-digit number which cannot be permutable of a 2-digit number. Similarly, for a 3 digit number

165 X 6 = 990
166 X 6 = 996
167 X 6 = 1002

After 166, all the multiples are 4-digit number.
For all 4 digit number the next limit is 1666, then 16666 and so on.

Basically, you just keep on appending 6 for each digit number. Ultimately, we can cut down the sequence to check, to

c(1:16, 100:116, 1000:1666, 10000:16666) and so on.

get_answer_problem_52_v1 <- function() {
  flag = TRUE
  start_number = 10
  end_number = 16
  current_vec = 1:16
  while(flag) {
    for (current_number in current_vec) {
      all_multiples = current_number * 1:6
      if(is_permuted_multiple(all_multiples)) {
        print(all_multiples)
        flag = FALSE
        break
       }
     }
  start_number = start_number * 10
  end_number = as.numeric(paste0(end_number, 6))
  current_vec = seq(start_number, end_number)
  }
}

The only change in this part was to create the required sequence. It showed a good performance improvement.

system.time(get_answer_problem_52_v1())
#[1] 142857 285714 428571 571428 714285 857142
#user  system elapsed
#8.57    0.00    8.59

Euler Problem 33 – Digit cancelling fractions

The fraction 49/98 is a curious fraction, as an inexperienced mathematician in attempting to simplify it may incorrectly believe that 49/98 = 4/8, which is correct, is obtained by cancelling the 9s.

We shall consider fractions like, 30/50 = 3/5, to be trivial examples.

There are exactly four non-trivial examples of this type of fraction, less than one in value, and containing two digits in the numerator and denominator.

If the product of these four fractions is given in its lowest common terms, find the value of the denominator.

After solving 50 problems, I realised that I had left problem 33 incomplete at that time because I could not understand it so I decided to continue it now. Turns out that I still do not understand the problem:P. I read the problem 4-5 times to get a better idea of it and broke it into different pieces.

So, the question is we need to find two 2-digit numbers which has one common number in between them and the ratio of those two 2-digit numbers is same as the one after we remove the common digit. As example is shown, 49/98 ratio is 0.5 and now if we remove the common digit in that number which is 9 we get numbers 4 and 8 whose ratio is 0.5 as well. We need to exclude some trivial example including 0, which are obvious. So, 30/50 is obviously going to be 3/5, same with 40/60 and so on. So we need to exclude those numbers which are completely divisible by 10.

The last line is confusing. Ok, so there are four such fractions from 10 to 99 which satisfies this condition and they have value less than 1 which means numerator < denominator. Now you need to take product of these four fractions and taking its lowest common terms find the value of denominator. Unclear to me at this moment what it means by taking lowest common terms. However, except for the last part rest everything was clear to me, so I decided to proceed anyway.

I decided to take the most basic, lazy method for the first try. The normal approach would be to use two loops from 10 to 99 and another one for 10, 99 but instead I used 4 for loops one for each digit of 2-two digit number. We now check for all the conditions which it needs to satisfy. It should have only one digit common in both the numbers and the numbers should not be repeating. We paste the numbers together and calculate the ratio1 of those two numbers. Further, we calculate the frequency of each character and keep only those digits which occurs only once, so duplicated digits get eliminated. We now calculate the ratio2 for these remaining digits and if ratio1 is equal to ratio2 then we print the numbers. Now, comes the last part which was unclear initially. So after searching few things online I realised we need to multiply all the fractions together and take reciprocal of it. We keep a note of products of numerators and denominators separately. We return the ratio 1/numerator/denominator.

The four fractions were :

[1] 16 64
[1] 19 95
[1] 26 65
[1] 49 98

The complete code can be found here. It seemed a very lengthy procedure though it didn’t take much of time.

digit_cancelling_fractionv1()
#[1] 100

system.time(digit_cancelling_fractionv1())
#user  system elapsed 
#0.14    0.00    0.14 

I also tried another approach with two for loop which almost uses the same logic and gives a slightly better performance.

system.time(digit_cancelling_fractionv2())
#user  system elapsed 
#0.08    0.00    0.08 

 

#Rstats bot on Twitter

I already have a twitter bot @wakeupwithsmile , which wishes random people Good Morning on Twitter. I have described the process, the code in some of the previous posts I published. I made another bot that which retweets #rstats stuff. I anyway love R, so doing something like this would be helpful for me and maybe for the community as well.  Since I already had most of the code, I just had to change the keyword (from #GoodMorning to #rstats). I also added the feature to auto follow/unfollow people as even this was already present and I really didn’t had to do anything extra. You can find the bot at @rstatsbot1234 and the code for it is present on GitHub.

It took me not more than 2 hours, to create an account, register an app and arrange the code with few modifications. Since I have been following this bot, it has been an interesting read to get all the #rstats news at one place.

Send email with daily stock prices

I have been working since 3-4 years now and I have saved a bit of money. As my dad is stock broker, I have invested some part of money in it. I just give him the money and he invests in whichever company he finds it appropriate. He informs me which stock he has bought, at what price and what quantity. I make a note of it in an excel file and later when we sell the stock, we also check how much profit/loss we made. However, I am not that stable kind of person who buys a stock and forget it. Every now and then I check the current price of the stocks which I have bought. Slowly I started buying more stocks and then it became increasingly tedious to check current prices of the stock every day. So I thought of automating this process.

There is an amazing library called nsetools which gives the current prices of the stocks from the nse website. I have a csv file which stores the information of the stocks which I have bought in the following format:

Date_Bought Shares Rate_Bought Quantity Amount
02-06-17 Infy 61.2 100 6120
30-10-17 HDFC 76.89 240 18453.6

*only for illustration purpose, are not real prices of the stock

So, I first read this csv and get the shares column. The thing to note is the share column should mention the stock names as symbol name as shown in below example.

email

So once we have the list of stock symbols, we send it to get_current_stock_prices function where we get the latest quote for each symbol and return the list. In case, if certain symbol is invalid, it would just return 0. According to current price of the stock, we calculate the current profit/loss of the stock and add a new column to the dataframe.

Now we have a dataframe ready to be sent via email. This dataframe is then send to email_main function which is responsible for sending the mail. I had written a post about sending a mail from python earlier this year and I realized it now that I don’t have this code on Github. I did a few modifications in that code according to the requirement here and it was ready to work.

I add the credentials and the receiver’s email in the yml file and read it. The tabulate package was helpful to represent the email in a table format. The complete code can be found here.

Euler Problem 50 – Consecutive prime sum

Consecutive prime sum

Problem 50

The prime 41, can be written as the sum of six consecutive primes:

41 = 2 + 3 + 5 + 7 + 11 + 13

This is the longest sum of consecutive primes that adds to a prime below one-hundred.

The longest sum of consecutive primes below one-thousand that adds to a prime, contains 21 terms, and is equal to 953.

Which prime, below one-million, can be written as the sum of the most consecutive primes?

This took a little time for me to understand what exactly is expected of the problem. So you have to find consecutive primes whose sum is also prime. For the example shown, sum of 2 + 3 + 5 = 8 is not prime but 2 + 3 + 5 + 7 + 9 + 11 + 13 = 41 is prime. So you don’t need all the sum of consecutive prime numbers to be prime. So here the next prime number which is 17, if we add it in 41 the sum is 58 which is not prime so there are only 6 consecutive prime. Another point to be noted is it is not necessary that the consecutive prime would start from starting i.e 2 only. It can start from any prime number. So as shown in example, instead of the consecutive primes starting from 2, the consecutive prime can be started from 29, 31, 37, 41 and so on.

First, we get all the primes below n (here 1000000) and for every prime we calculate their sum of primes incrementally, one step at a time.

get_primes_under_n(10)
#[1] 2 3 5 7

Now we, incrementally sum them, For 2

2 + 3  = 5
2 + 3 + 5 = 10
2+ 3 + 5+ 7 = 17

For 3,

3 + 5 = 8
3 + 5 + 7 = 15

And so on for all the other numbers.

Now for each of sum , we call the is_prime function and take the last index which was prime.
is_prime(5,10,17) would return TRUE FALSE TRUE. So in this case we store the index value to be 3 as this is the maximum value which is also prime. We follow the same for 3, 5 and 7. If any of the iteration exceeds the previously encountered index value then we replace the one with the new index value.

Also , we can add some conditions to reduce the comparisons. Once the sum exceeds the value of n, we should stop counting sums as we need to find a prime which is below n. In our example, then we should remove 17 from the sum_prime calculation as 17 > 10. Also in certain cases it is possible that we would not have any prime in the sum_prime, we can check length(ind) to cover that condition.

source("/Users/Ronak Shah/Google Drive/Git-Project-Euler/31-40/35. Get_Primes_Under_n.R")
source("/Users/Ronak Shah/Google Drive/Git-Project-Euler/1-10/3.Check_if_Prime.R")
n <- 1000000
cons_index <- 0
list_primes <- get_primes_under_n(n)
len_primes <- length(list_primes)
for (i in seq(len_primes - 1)) {
  sum_prime = numeric()
  for (j in seq(i+1, len_primes)) {
    sum_num <- sum(list_primes[i:j])    
    if(n > sum_num) {
      sum_prime <- c(sum_prime, sum_num)
    }
    else  
      break() 
    }
    ind = which.max(cumsum(is_prime(sum_prime)))   
    if(length(ind) > 0)
      if(ind > cons_index) {
        cons_index = ind
        cons_sum = sum_prime[ind]
     }
 }
 print(cons_sum)
 #[1] 997651

This code gives the correct answer however, it is computationally too expensive. Took more than two hours on my old laptop to execute this. I didn’t dare to calculate the time to execute this one. Complexity of O(n2) is never good

Euler Problem 49 – Prime Permuations

Prime permutations

Problem 49

The arithmetic sequence, 1487, 4817, 8147, in which each of the terms increases by 3330, is unusual in two ways: (i) each of the three terms are prime, and, (ii) each of the 4-digit numbers are permutations of one another.

There are no arithmetic sequences made up of three 1-, 2-, or 3-digit primes, exhibiting this property, but there is one other 4-digit increasing sequence.

What 12-digit number do you form by concatenating the three terms in this sequence?

There are multiple ways in which this problem can be solved. I have solved in two ways.

  • For every 4 digit number we generate all its permutation, keep only those numbers which are prime and check if there are any 3 numbers who has the same difference between them.
  • We first get the prime numbers in the range, then generate its permutations and then check if there are any 3 numbers who has the same difference between them.

The second approach is better since we are generating permutations of only the prime numbers whereas in the first approach we are generating permutations of all the 4 digit numbers.

First we generate a sequence of all the 4-digit numbers starting from 1001 to 9999 by step of 2. We exclude all the even numbers as they are not prime. We also create a variable named checked_nums which keeps a check of all the numbers which we have already encountered. For 1478, we generate all its permutation of numbers like 1478, 1748, 1847, 4718, 4817 and so on. Even for 1748 it would repeat all the same permutation of numbers so we can avoid these numbers. We then remove all the numbers which are not prime and take only the unique ones and sort them in ascending order. The next check, we should have at least 3 numbers in our set because we need 3 term set. Also we check all the digits are 4 digit numbers. For numbers which include 0, like 3103, one of the permutation is 0133 which turns to be 133 when we convert it into numeric, so we exclude such combination of numbers as well.

Now to get the three numbers, we write three for loops and each one point at one unique number from the list. For example for 1487, we set A, B and C as

1487  1847  8147  7841  8741  8471  4817 ………
  A     B     C

We calculate A – B and then find if there a number exists C such as A – B =  C – B. We keep on moving C till the end of the list, if we find the number which satisfies the above condition then we print it or else move B ahead and then A and so on. We repeat this procedure for all the numbers in the list.

Yes, this seems to be a very lengthy process with so many checks and loops but surprisingly it turns out it is not. It takes a little over a second to execute this. The code is little complicated than usual. I have tried adding comments in the code this time. You can check the complete code here.

library(combinat)
source("/Users/ronakshah/Google Drive/Project Euler/3.Check_if_Prime.R")
prime_permutations_in_range <- function(start, stop) {
  checked_nums <- numeric()
  for(i in seq(start, stop, 2)) {
    if (!i %in% checked_nums) {
      all_combns <- as.numeric(sapply(permn(strsplit(as.character(i), "")[[1]]),
                    function(x) paste0(x, collapse = "")))
      checked_nums <- c(all_combns, checked_nums)       primes  2 & all(nchar(primes) > (nchar(start) - 1))) {
        for(a in seq(length_prime - 2)) {
          for(b in seq(a + 1, length_prime - 1)) {
            for(c in seq(b + 1, length_prime)) {
              if ((primes[b] - primes[a]) == (primes[c] - primes[b])) { 
                print(c(primes[a], primes[b], primes[c]))
            }
          }
        }
      }
    }
   }
  }
 }

The timings for this approach :

system.time(prime_permutations_in_range(1001, 9999))
#[1] 1487 4817 8147
#[1] 2969 6299 9629

#user  system elapsed
#1.192   0.075   1.284

The second approach is 80% similar with few changes initially.

prime_permutation <- function() {
  all_primes <- get_primes_under_n(10000)
  all_primes  1000]
  checked_nums <- numeric()

  for(i in all_primes) {
    if (!i %in% checked_nums) {

and from here it continues same as the first approach. This gives a slight performance improvement though.

system.time(prime_permutation())
#[1] 1487 4817 8147
#[1] 2969 6299 9629

#user  system elapsed
#0.798   0.010   0.812

BTW, I also added third approach which first gets all the primes in the range and check if there are equal distance prime triplets between them. This approach seem to have least amount of checking involved with limited numbers involved however, this is the most expensive of the three approaches.