My Blog

Geometric Brownian Motion using Python.

This post is going to be a little different than what I usually post. I have recently joined a new organization in the finance domain. They are into trading and stuff. So you can trade Forex pairs, indices, commodities etc on their website. They have also generated their own underline named as volatility which can also be used for trading and to my surprise that is the most traded underline on our website. Not Forex, not metals but volatility indices are the most traded underline.

So I just thought can I also generate my own trading underline something similar. After some googling and consulting my seniors, I came across Geometric Brownian Motion  which can be used to generate such feeds.

The formula for feed generation is quite scary –

gbm

So you start with any initial value and then generate next values using the above formula. Most of which are constants.

To explain the formula –

S  is the current value of the feed.
So – is the previous value of the feed.
µ- 0
sigma – 0.01
Wt – random number between -1, 1.

So that’s it with all this ingredients, we are ready to generate our own feed. Now, this needs to be a continuous process. So we would also require a loop for that. Just for the records I am also writing each value into a text file.

from __future__ import division
import matplotlib.pyplot as plt
import math
import random
import time
#Generating random number using geometric Brownian motion
def dataListener() :
#Initializations of the constants
    startingIndex = 1000
    mu = 0
    sigma = 0.01
    flag = True
    i = 1
    try:
       while(flag):
          file = open('example.txt', 'a')
          expression = (mu - (0.5 * sigma ** 2)) * (2/(365 * 86400)) + (sigma *
          random.uniform(-1, 1))
          startingIndex = startingIndex * math.exp(expression)
          file.write(str(startingIndex) + "," + str(i) + "\n")
          print str(startingIndex) + "," + str(i)
          i = i + 1
          file.close()
          time.sleep(2)
    except KeyboardInterrupt:
           print("\nDone!!")

if __name__ == "__main__":
    dataListener()

This generates a continuous feed and writes the value in the console and revolves exactly the way I wanted. Goes positive/negative in both the ways. Now, the next target was to draw a graph which shows this continuous value. Finding an appropriate method for this one too took lot of time. Tried and tested various things and you know what, finally it was the basic matplotlib which came to my rescue.

From this  python programming tutorial, I tweaked it into something which I can use. The above code writes the data into a text file, whereas this data reads it from the same text file and plots a graph which changes real time.

import matplotlib.pyplot as plt
import matplotlib.animation as animation
from matplotlib import style
import randomNumberGeneration
style.use('fivethirtyeight')
fig = plt.figure()
ax1 = fig.add_subplot(1,1,1)

def animate(i):
    graph_data = open('example.txt','r').read()
    lines = graph_data.split('\n')
    xs = []
    ys = []
    for line in lines:
      if len(line) > 1:
        x, y = line.split(',')
        xs.append(x)
        ys.append(y)
      ax1.clear()
      ax1.plot(ys, xs)
ani = animation.FuncAnimation(fig, animate, interval=1000)
plt.show()

Twitter Bot in Python

This is going to be a short one. A quick update on something which I have already done.

I had prepared my bot in R using twitteR package and soon after it went live, I started facing issues like no function to follow the user, no function to mute them etc. I soon realized that I have to switch to Python for this. As I had started with R, I kept the first part as it is and started working for other parts in Python. The recent version was a hybrid one which was half in Python and half in R.

Search a tweet with #GoodMorning and replying it back was done in R and follow/ unfollowing people along with muting them was done in Python. I had started working on other additional features as well and that too in Python. I knew sooner or later I have to move the entire code in Python as this hybrid one would not last longer but I was just being too lazy or pretending to be busy to complete it.

Finally, one day I decided to do it. The coding part didn’t take much time but the scheduling one was a little different than the previous one and hence, it took some time to figure out.

Also, I have used both twython and tweepy package here. Functionality wise nothing has been changed, only change is in the code.

You can find the detailed code here on Github.

PS – at the time of writing this article I remember there is a new (or updated) package in R related to twitter which handles all this stuff. Not sure, though. I am python guy now. 😉

Euler Problem 16 – Power digit sum

Power digit sum

Problem 16

215 = 32768 and the sum of its digits is 3 + 2 + 7 + 6 + 8 = 26.

What is the sum of the digits of the number 21000?

I don’t know why this has been kept here after such a difficult problem 15. I mean this is simple especially with R. R never had any problem dealing with big numbers. The question is simple need to sum the digits of 2^1000.

I have written a one-line solution 😛

sum(as.numeric(unlist(strsplit(as.character(2^1000), ""))))
#[1] 1366

Here, I calculate 2 raise to 1000 th power, convert it into character, split each character separately, convert them into number and then sum them and boom – THE ANSWER!

Ok, the only tricky part is you need to run

options(scipen = 999)

before running the one-liner. This ensures that 2 ^ 1000 which is a 32 digit long number is represented as it is in complete number if this command is not run it will represent this number in scientific notation which would be something like 1.071509e+301 and then when you convert it into character and sum it , it would return NA’s.

Euler Problem 15 – Lattice Paths

Lattice paths

Problem 15

Starting in the top left corner of a 2×2 grid, and only being able to move to the right and down, there are exactly 6 routes to the bottom right corner.

PE15

How many such routes are there through a 20×20 grid

This has to be the toughest one till now. There are two types of difficult questions. One which you don’t understand at all what to do, I mean they are way out of your reach. There are other questions which you understand but then get quite stuck at it. This question was of second type.
As soon as I read it, I was sure that I don’t have to get into the graphical complexities of it and just consider this as a matrix, give them numbers and make an algorithm which will tell how you reach from point A to point B. So for 2 by 2 matrix , I made :

                  
                         1 -- 2 -- 3
                         |    |    |
                         4 -- 5 -- 6
                         |    |    |
                         7 -- 8 -- 9

So for this 2 by 2 matrix, there are 6 possible paths (as given in the question) which goes from point 1 to point 9. So there was the first generalization. For a n X n matrix there are going to be (n+1) * (n+1) points.
Now one way to solve this is writing down all the possible paths which it can take from first point to last point. So for this 2 by 2 matrix , all the possible paths are :

1 -> 2 -> 3 -> 6 -> 9
1 -> 2 -> 5 -> 6 -> 9
1 -> 2 -> 5-> 8 -> 9
1 -> 4 -> 5 -> 6 -> 9
1 -> 4 -> 5 -> 8 -> 9
1 -> 4 -> 7 -> 8 -> 9

Few observations :
• For a n X n matrix each path would contain 2n + 1 elements.
• There has to be n rights and n downs always to reach to the destination.

I was sure now, that there is formula to calculate the number of possible paths but it was difficult to arrive to the formula based only on this single matrix. So I decided to follow the same procedure for 3 by 3 matrix as well.
3 X 3 matrix also satisfied the above two observations and the number of possible paths for 3 X 3 matrix came out to be 20. It was not possible to find all the possible paths manually for 20 X 20. So I was trying to find out the formula for number of paths.

Matrix Paths
2 X 2       6
3 X 3      20

I was thinking about different kinds of generalization like 3 X 2 is 6 and 4 X 5 is 20. Ummm, but that doesn’t make any sense.

Maybe the number of paths in 3 X 3 matrix is 24 and I have missed some of them. So for n X n matrix we can have n + 1 ! paths. So for 2 X 2 is 3! And 3 X 3 is 4! . So when I checked with 21! as answer it showed as wrong answer and hence unfortunately my assumption was wrong. 3 X 3 only has 20 paths and not 24.
I thought about this for 3-4 days in various ways but finally I gave up and googled for a possible solution. As I correctly thought there was a formula to calculate but the formula was a little complicated for me to derive.

The formula was :
(2 X n) ! / n! n!

So for 2 X 2 matrix the answer is :
(2 X 2) ! / 2! 2! = 6

for 3 X 3 matrix it is :
(2 X 3)! / 3! 3! = 20

So finally, I just did the calculation in R 😛

prod(seq(40)) / (prod(seq(20)) * prod(seq(20)))
# [1] 137846528820

Euler Problem 14 – Longest Collatz sequence

Longest Collatz sequence

Problem 14

The following iterative sequence is defined for the set of positive integers:

n → n/2 (n is even)
n → 3n + 1 (n is odd)

Using the rule above and starting with 13, we generate the following sequence:

13 → 40 → 20 → 10 → 5 → 16 → 8 → 4 → 2 → 1

It can be seen that this sequence (starting at 13 and finishing at 1) contains 10 terms. Although it has not been proved yet (Collatz Problem), it is thought that all starting numbers finish at 1.

Which starting number, under one million, produces the longest chain?

NOTE: Once the chain starts the terms are allowed to go above one million.

This was one challenging question. Finding a number under one million producing the longest Collatz sequence, sounds computationally expensive? My approach was to first get to the solution and then think how to optimize it. Reaching to the solution is more important than optimizing it.
I wrote a separate function which would take a number as input and generate a collatz sequence of that number. I have a written a recursive function named generateCollatzSequence which would return if the number is 1, if the number is even it would divide it by half (n/2) and if the number is odd it would make it to 3*num + 1.

 generateCollatzSequence = function(vec, num) {
 vec = append(vec, num)
 if(num == 1) {
   return(vec)
 }
 if(num %% 2 == 0)
   num = num / 2
 else
  num = 3*num + 1
 generateCollatzSequence(vec, num)
}

So when we run
generateCollatzSequence(13)

it returns

[1] 13 40 20 10 5 16 8 4 2 1

The main function is get_highest_CNumber which accepts a number (here 1, 000, 000) and for every number from 1 to that number, it generates a collatz sequence and selects the one which is the biggest.

Source(“14. GenColSeq.R")
get_highest_CNumber = function(num) {
highestSeq = 0
for(i in seq(num))
{
  vec = numeric()
  newvec = generateCollatzSequence(vec, i)
  if(length(newvec) > highestSeq) {
    highestSeq = length(newvec)
    highestSeqNumber = i
  }
}
return(highestSeqNumber)
}
get_highest_CNumber(1000000)
#[1] 837799

and the length of the Collatz sequence of this number is 525.

Euler Problem 13 – Large Sum

Large sum

Problem 13

Work out the first ten digits of the sum of the following one-hundred 50-digit numbers.

So basically it took me a while to understand the question. I had to google a bit to understand what it meant.These are 100 numbers where each number is 50 digit long and we need to sum these 100 numbers. So each row is 1 number and we need to sum all the rows.

The trick in the question is how we handle such huge numbers. Well, fortunately R can handle very big integer value. Check .Machine$double.xmax in the R console which is 309 digit long. So using R language once again proved helpful for me.

I just read the numbers in one data frame. So it had 1 column and 100 rows and then sum the entire column. The answer we get is the complete answer and we just need first 10 digits of it. So we convert it into character and select only the 10 characters using substr. It’s just a one line solution.

Consider the dataframe as df and column as V1.

substr(as.character(sum(df$V1)), 1, 10)

You can read the dataframe in your R console by :

df = read.table(text = "
37107287533902102798797998220837590246510135740250
46376937677490009712648124896970078050417018260538
74324986199524741059474233309513058123726617309629
91942213363574161572522430563301811072406154908250
23067588207539346171171980310421047513778063246676
89261670696623633820136378418383684178734361726757
28112879812849979408065481931592621691275889832738
44274228917432520321923589422876796487670272189318
47451445736001306439091167216856844588711603153276
70386486105843025439939619828917593665686757934951
62176457141856560629502157223196586755079324193331
64906352462741904929101432445813822663347944758178
92575867718337217661963751590579239728245598838407
58203565325359399008402633568948830189458628227828
80181199384826282014278194139940567587151170094390
35398664372827112653829987240784473053190104293586
86515506006295864861532075273371959191420517255829
71693888707715466499115593487603532921714970056938
54370070576826684624621495650076471787294438377604
53282654108756828443191190634694037855217779295145
36123272525000296071075082563815656710885258350721
45876576172410976447339110607218265236877223636045
17423706905851860660448207621209813287860733969412
81142660418086830619328460811191061556940512689692
51934325451728388641918047049293215058642563049483
62467221648435076201727918039944693004732956340691
15732444386908125794514089057706229429197107928209
55037687525678773091862540744969844508330393682126
18336384825330154686196124348767681297534375946515
80386287592878490201521685554828717201219257766954
78182833757993103614740356856449095527097864797581
16726320100436897842553539920931837441497806860984
48403098129077791799088218795327364475675590848030
87086987551392711854517078544161852424320693150332
59959406895756536782107074926966537676326235447210
69793950679652694742597709739166693763042633987085
41052684708299085211399427365734116182760315001271
65378607361501080857009149939512557028198746004375
35829035317434717326932123578154982629742552737307
94953759765105305946966067683156574377167401875275
8902802571733229619176668713819931811048770190271
25267680276078003013678680992525463401061632866526
36270218540497705585629946580636237993140746255962
24074486908231174977792365466257246923322810917141
91430288197103288597806669760892938638285025333403
34413065578016127815921815005561868836468420090470
23053081172816430487623791969842487255036638784583
11487696932154902810424020138335124462181441773470
63783299490636259666498587618221225225512486764533
67720186971698544312419572409913959008952310058822
95548255300263520781532296796249481641953868218774
76085327132285723110424803456124867697064507995236
37774242535411291684276865538926205024910326572967
23701913275725675285653248258265463092207058596522
29798860272258331913126375147341994889534765745501
18495701454879288984856827726077713721403798879715
38298203783031473527721580348144513491373226651381
34829543829199918180278916522431027392251122869539
40957953066405232632538044100059654939159879593635
29746152185502371307642255121183693803580388584903
41698116222072977186158236678424689157993532961922
62467957194401269043877107275048102390895523597457
23189706772547915061505504953922979530901129967519
86188088225875314529584099251203829009407770775672
11306739708304724483816533873502340845647058077308
82959174767140363198008187129011875491310547126581
97623331044818386269515456334926366572897563400500
42846280183517070527831839425882145521227251250327
55121603546981200581762165212827652751691296897789
32238195734329339946437501907836945765883352399886
75506164965184775180738168837861091527357929701337
62177842752192623401942399639168044983993173312731
32924185707147349566916674687634660915035914677504
99518671430235219628894890102423325116913619626622
73267460800591547471830798392868535206946944540724
76841822524674417161514036427982273348055556214818
97142617910342598647204516893989422179826088076852
87783646182799346313767754307809363333018982642090
10848802521674670883215120185883543223812876952786
71329612474782464538636993009049310363619763878039
62184073572399794223406235393808339651327408011116
66627891981488087797941876876144230030984490851411
60661826293682836764744779239180335110989069790714
85786944089552990653640447425576083659976645795096
66024396409905389607120198219976047599490197230297
64913982680032973156037120041377903785566085089252
16730939319872750275468906903707539413042652315011
94809377245048795150954100921645863754710598436791
78639167021187492431995700641917969777599028300699
15368713711936614952811305876380278410754449733078
40789923115535562561142322423255033685442488917353
44889911501440648020369068063960672322193204149535
41503128880339536053299340368006977710650566631954
81234880673210146739058568557934581403627822703280
82616570773948327592232845941706525094512325230608
22918802058777319719839450180888072429661980811197
77158542502016545090413245809786882778948721859617
72107838435069186155435662884062257473692284509516
20849603980134001723930671666823555245252804609722
53503534226472524250874054075591789781264330331690")

substr(as.character(sum(df$V1)), 1, 10)

# [1] "5537376230"

Fetching twitter conversations via DM’s

Fetching twitter DM messages can be quite tricky especially when the conversation goes on for more than 200 messages. What this post helps in doing is getting an entire conversation of a user (say X) with you on twitter in a csv. Note that, the conversation is in the form of direct messages and not tweets.

The most challenging part is to get more than 200 messages. Twitter API allows only to fetch the latest 200 messages. Latest 200 which are received by you and latest 200 which are sent by you. So irrespective of how many times you call the API it would always return the same set of 200 messages. What can be done if we need to extract more than 200 messages? (The trick follows 😉 )

Well, the first thing is to have a twitter account and register your app on https://apps.twitter.com/ . We had already covered a basic tutorial on the same before. I have used the python Tweepy package which is a python wrapper to all the API calls to twitter.

api.direct_messages() and api.sent_direct_messages() are the two functions which will return the messages sent to you and sent by you respectively. As we are trying to fetch a conversation only with a particular user (User X) after getting the messages from the above functions we filter them based on the username we are looking for. So to get the next set of messages we need to delete these messages. (this is the trick ) Irrespective if it belongs to user X or not we should delete them. Its only when we delete these messages we would get the new latest messages.

This delete, deletes the messages permanently from your account and there is no way you can get them back so please be cautious while you are deleting them. Also, this is obvious but just to clarify although we are filtering messages only for user X but the other messages would also be deleted if they fall in the range.

Sharing a snippet of a code and I am not responsible for any of the goof ups in your account. 😀 😉

import sys
import tweepy
import pandas as pd
import pdb
import random


def setupTwitterAutorization() :

    api_key = "api_key"
    api_secret = "api_secret"
    access_token = "access_token"
    access_token_secret = "access_token_secret"

    auth = tweepy.OAuthHandler(api_key, api_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    return api

#Extract Direct messages which are recieved and sent

def GetDirectMessagesInOrder (username) :

    receivedMsgs = api.direct_messages(count = 200)
    sentMsgs = api.sent_direct_messages(count = 200)
    if len(receivedMsgs) == 0 and len(sentMsgs) == 0 :
	return 0
  #Now extract the convesation only between the two users which we 
  #are interested in
    receiveText = []; recieveTime = []; saidBy = [];tweetId = [];

    for eachRDM in receivedMsgs :
        if eachRDM.sender_screen_name == username :
            receiveText.append(eachRDM.text)
            recieveTime.append(eachRDM.created_at)
            tweetId.append(eachRDM.id)
            saidBy.append(username)
        else :
            api.destroy_direct_message(eachRDM.id)

    for eachSDM in sentMsgs :
        if eachSDM.recipient_screen_name == username:
            receiveText.append(eachSDM.text)
            recieveTime.append(eachSDM.created_at)
            tweetId.append(eachSDM.id)
            saidBy.append('You')
        else :
            api.destroy_direct_message(eachSDM.id)

   abc = pd.DataFrame({'Tweet Id' : tweetId,'Timestamp' : recieveTime, 'Text' : receiveText, 'Said By' : saidBy})
    #Sort by timestamp so you could get an exact conversation
    if(abc.shape[0] > 0) :
        abc.sort_values('Timestamp', inplace = True)
     return abc

def DeleteRetrievedMessages(tweetIDs) :
    try :
        for id in tweetIDs :
            api.destroy_direct_message(id)
    except :
        pass

if __name__ == "__main__":
    completeList = []
    dataLeft = True
    #Setting up twitter authorization
    api = setupTwitterAutorization()
    while(dataLeft) :

        # Get one batch of msgs
        newDF = GetDirectMessagesInOrder(‘twitter_handle’)
	  if newDF == 0 :
	     dataLeft = False
        if (newDF.shape[0] > 0):
            completeList.append(newDF)
            # Delete those msgs
            DeleteRetrievedMessages(newDF['Tweet Id'])
        
    df = pd.concat(completeList)
    df.sort_values('Timestamp', inplace=True)
    df.to_csv("F:\Final.csv", encoding='utf-8')

The csv which is written is sorted by time so when you read it, it feels as if you are reading the conversation in the same flow. You can modify the script to get the answer based on your requirement.

Euler Problem 12 – Highly divisible triangular number

Highly divisible triangular number

Problem 12

The sequence of triangle numbers is generated by adding the natural numbers. So the 7th triangle number would be 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28. The first ten terms would be:

1, 3, 6, 10, 15, 21, 28, 36, 45, 55, …

Let us list the factors of the first seven triangle numbers:

 1: 1
 3: 1,3
 6: 1,2,3,6
10: 1,2,5,10
15: 1,3,5,15
21: 1,3,7,21
28: 1,2,4,7,14,28

We can see that 28 is the first triangle number to have over five divisors.

What is the value of the first triangle number to have over five hundred divisors?

This problem is a perfect example of why one should read the question properly. A simple example and it took me so much of time to complete it. I’ll let you know my blunder later in the post.

For this I thought of writing one separate function for calculating factors of a number. So at first I wrote this

getFactors = function(num) {
numvec = numeric()
for (i in seq(floor(num/2)))
if (num %% i == 0)
numvec = append(numvec, i)
#Also append the number itself
numvec = append(numvec, num)
return(numvec)
}

A function which goes from 1 to half of the number since any number cannot be divided by any other number which is more than half of it. For example, 10 has the biggest factor as 5, 18 has the biggest factor as 9 and so on. However, this wasn’t giving me the best efficiency for bigger numbers. So I googled a bit and found another efficient approach.

 getFactors2 = function(num) {
numvec = numeric()
for (i in seq(ceiling(sqrt(num))))
if (num %% i == 0)
numvec =  append(numvec, c(i, num/i))
return(numvec)
}

This basically goes from 1 to the square root of the number and appends two number at a time. The factors are always in pairs. So for example for 24, we have factors as 1, 2, 3, 4, 6, 8, 12 and 24. So here pairs are 1 and 24, 2 and 12, 3 and 8, 4 and 6. This approach uses this method to its advantage and finds such pairs together. Also with this pair methodology we need to only loop till the square root of the number instead of half of it. So, if you want to find factors for 1000 you only need to loop for 32 times instead of 500. This saves whole lot of computation.

The questions reads :

What is the value of the first triangle number to have over five hundred divisors?

As in the example shown for 28, it is the number which is the first triangle number over five divisors. 28 has 6. So, I (wrongly) ‘assumed’ that over 5 divisors has 6 divisors so over 500 should mean find a first triangular number with 501 divisors. So I wrote a code which was checking if the number of divisors was 501 and it took me forever to run the script. With the getFactors function I ran it for 9 hours and it didn’t still produce the result. I was hoping this would run fast with getFactors2 function however, after running the script for 3 hours I realised the mistake.

Over 500 divisors can mean anything which is over 500 i.e 501, 502 anything. By checking only for 501 Is wrong. After that correction, it took me 10 seconds to get the answer. 😉

highly_divisible_triangle_number = function(factorLength) {
  num = 2
  while(TRUE) {
    triangleNumber = sum(seq(num))
    triangleFactors = getFactors2(triangleNumber)
#This is the place where I was making mimstake, using == instead of >
    if(length(unique(triangleFactors)) > factorLength) 
      break
    else
      num = num + 1
  }
  return(triangleNumber)
}

And as correctly thought of the answer 76576500 has 576 divisors which is greater than 500 and is not 501.

EULER PROBLEM 11 – Largest Product in a grid

Largest product in a grid

Problem 11

In the 20×20 grid below, four numbers along a diagonal line have been marked in red.

08 02 22 97 38 15 00 40 00 75 04 05 07 78 52 12 50 77 91 08
49 49 99 40 17 81 18 57 60 87 17 40 98 43 69 48 04 56 62 00
81 49 31 73 55 79 14 29 93 71 40 67 53 88 30 03 49 13 36 65
52 70 95 23 04 60 11 42 69 24 68 56 01 32 56 71 37 02 36 91
22 31 16 71 51 67 63 89 41 92 36 54 22 40 40 28 66 33 13 80
24 47 32 60 99 03 45 02 44 75 33 53 78 36 84 20 35 17 12 50
32 98 81 28 64 23 67 10 26 38 40 67 59 54 70 66 18 38 64 70
67 26 20 68 02 62 12 20 95 63 94 39 63 08 40 91 66 49 94 21
24 55 58 05 66 73 99 26 97 17 78 78 96 83 14 88 34 89 63 72
21 36 23 09 75 00 76 44 20 45 35 14 00 61 33 97 34 31 33 95
78 17 53 28 22 75 31 67 15 94 03 80 04 62 16 14 09 53 56 92
16 39 05 42 96 35 31 47 55 58 88 24 00 17 54 24 36 29 85 57
86 56 00 48 35 71 89 07 05 44 44 37 44 60 21 58 51 54 17 58
19 80 81 68 05 94 47 69 28 73 92 13 86 52 17 77 04 89 55 40
04 52 08 83 97 35 99 16 07 97 57 32 16 26 26 79 33 27 98 66
88 36 68 87 57 62 20 72 03 46 33 67 46 55 12 32 63 93 53 69
04 42 16 73 38 25 39 11 24 94 72 18 08 46 29 32 40 62 76 36
20 69 36 41 72 30 23 88 34 62 99 69 82 67 59 85 74 04 36 16
20 73 35 29 78 31 90 01 74 31 49 71 48 86 81 16 23 57 05 54
01 70 54 71 83 51 54 69 16 92 33 48 61 43 52 01 89 19 67 48

The product of these numbers is 26 × 63 × 78 × 14 = 1788696.

What is the greatest product of four adjacent numbers in the same direction (up, down, left, right, or diagonally) in the 20×20 grid?

This by far has been the most challenging problem. Took lot of time to solve and really had to think hard to get to the solution.

It was clear from the problem that we need to use rollapply like we used in the problem 8 . We need to use it once for rows and once for columns. The other option remaining was diagonal. In R, we could extract diagonal of matrices using a function called diag().

     [,1] [,2] [,3]

[1,]    1    4    7

[2,]    2    5    8

[3,]    3    6    9

So for above matrix when you do diag(mat) it returns 1, 5 , 9. So I can use rollapply on this diagonal elements as well.

My plan was to get 3 max values :

1) a – max for row-wise products

2) b – max for column-wise products

3) c- max for diagonal-wise products

and then max of these 3 values would be the answer.

However, it turns out the answer which I got wasn’t the right answer. I cross checked the commands which I was running on a smaller matrix and it did give the right answer.

Upon closely looking at the question again, I realised the example which was given with values 26, 63, 78 and 14 aren’t the elements of primary diagonals. So we had to check the product for every diagonal in the matrix. So for the same matrix as above

     [,1] [,2] [,3]

[1,]    1    4    7

[2,]    2    5    8

[3,]    3    6    9

Along with 1, 5 and 9 we also need to check the product for 2, 6 and 4, 8. Getting the primary diagonal was easy with diag in R but how can I get all the diagonals? While trying to search for it I came across an excellent answer on Stackoverflow which returns a list of all the diagonals.

j = row(temp) – col(temp)

This assigned same number to all the diagonal element which we can then pass to our split function which splits the matrix into diagonals.

So with this I get a list of all diagonals elements, we remove those lists elements with length less than 4 (as we need four consecutive elements) and then use rollapply on each list and get the max value from that list which would be our c now.  Surprisingly, even the answer which I got from this wasn’t the right answer. This was strange as I had checked my output on a smaller matrix and it worked.  I had covered all the cases (at least I thought so, but definitely I was missing something). There was no calculation error as well.

After lots and lots of thinking about which case I was missing. I finally found out.

     [,1] [,2] [,3]
[1,]    1    4    7

[2,]    2    5    8

[3,]    3    6    9

This was my matrix. For window = 2

a = rowwise maximum product (1*4, 4*7, 5*8, 6*9 etc…)

b = columnwise maximum product (1*2, 2*3, 4*5, 8*9 etc…)

c = diagonal wise maximum product (1 * 5, 5*9, 2*6, 4*8 )

and the case which i was missing was

d = anti-diagonal (or whatever it is called) wise product (7*5, 5*3, 4*2, 8*6)

Sigh!! That was tricky and interesting. The tricky part was only to figure out this anti diagonal thing. Once I fugured that out there was nothing challenging as such.

Those anti diagonal patterns could be formed easily by using

j = row(temp) + col(temp)

and then split the matrix accordingly. Got an output and tada…that was the correct answer!

Here’s the code

library(zoo)
getMaxProduct = function(mat, windowSize) {
#check for column
a = max(rollapply(mat, windowSize, prod))
#check for row
b = max(rollapply(t(mat), windowSize, prod))
#check for diagonal
# create an indicator for all diagonals in the matrix
d = row(mat) - col(mat)
k = row(mat) + col(mat)
# use split to group on these values
diags1 = split(mat, d)
diags2 = split(mat, k)
prodList1 = unlist(lapply(diags1[lapply(diags1, length) > (windowSize - 1)], function(x) rollapply(x, windowSize, prod)), use.names = FALSE)
prodList2 = unlist(lapply(diags2[lapply(diags2, length) > (windowSize - 1)], function(x) rollapply(x, windowSize, prod)), use.names = FALSE)
c = max(prodList1)
d = max(prodList2)
max(a, b, c, d)
}

getMaxProduct(mat, 4)
[1] 70600674

Euler Problem 10 -Summation of primes

Summation of primes

Problem 10

The sum of the primes below 10 is 2 + 3 + 5 + 7 = 17
Find the sum of all the primes below two million.

In this problem we are trying to find sum of all the prime numbers below a certain number. As in given example, the prime numbers below 10 are 2, 3, 5, and 7 and when we sum it the answer comes out ot be 17. The logic which I tried to apply was to generate an odd-valued sequence from 3 to that number and check if the number is prime of not and finally, sum those numbers which are prime.

To find out if a number is prime or not, I am using the is_prime function which is wrote in for example 3 previously. The is_prime function returns TRUE or FALSE, if a particular number is prime or not. We generate a sequence from 3 to the number and increment it by 2. Then using sapply we call is_prime function on every number. This would return a vector of logical values based on if a number is prime of not. We subset only those numbers which return TRUE and sum oer it. Finally, we add 2 in the final answer as 2 is the only even prime number and we haven’t considered it in the sequence.

source("Check_if_Prime.R")
getSumOfPrimes = function(x) {
additionSum = 0
numSeq = seq(3, x,by = 2)
additionSum = sum(numSeq[sapply(numSeq, is_prime)])
return(additionSum + 2)
}

Now, to test,

getSumOfPrimes(10)
#[1] 17
getSumOfPrimes(2000000)
#[1] 142913828922