Overview
In this lesson we work through an example to illustrate some basic programming strategies.
Objectives
Readings
None
Let’s say we want to write a program to check a famously simple mathematical conjecture. The Collatz Conjecture is very easy to state, but surprisingly has resisted nearly a century of efforts to prove that it is true. Consider the following algorithm: start with any number \(n\) and if it’s even, halve it, and if it’s odd, triple it and add 1; and repeat that process over and over. The conjecture is that, whatever number you start with, you will eventually end up at 1.
Now, it may have turned out that this is for an trivial reason, in the same way that the “divisibility by 9” rule turns out to be trivial (where a number is divisible by 9 if the sum of its digits is divisible by 9). But in this case, it’s not at all trivial why you end up at 1, and in fact mathematicians have been puzzled over how to prove it for decades.
But of couse, we can computationally test it easy for any given number. So let’s write a program that tests it for some number \(n\), saving the somewhat random-seeming sequence of numbers you get starting from \(n\), and stops when it gets to 1.
Here are a few tips for coding:
So how do we do this? The best way to start is in English, with numbered steps. We’ll write out in comments in R what we want our program to do, in as much detail as we can manage, and then later we’ll translate each bit of English into R code, as best we can.
So we want a program that takes some \(n\), does the Collatz algorithm on it repeatedly, saves each number along the way, and stops when it gets to 1, and returns the sequence of numbers.
# 1. Take some n as an input
# 2a. If it is even, halve it
# 2b. If it is odd, triple it and add 1
# 3. Repeat 2 over and over, saving each number along the way
# 4. If the current number is 1, stop
# 5. If we've done this, say, 10,000 times and never got to 1, stop and say we gave up
# 1. Take some n as an input
collatzf <- function(n){
# 2a. If it is even, halve it
# 2b. If it is odd, triple it and add 1
# 3. Repeat 2 over and over, saving each number along the way
# 4. If the current number is 1, stop
}
if
statement that we need. We know how to measure even and odd using the %%
modulo function, where x %% y
gives the remainder from dividing \(x\) by \(y\). So depending on whether it is even or odd, we create a new n
that is the transformation of the original n
.# 1. Take some n as an input
collatzf <- function(n){
# 2a. If it is even, halve it
# 2b. If it is odd, triple it and add 1
if(n %% 2 == 0){
n <- n/2
}else{
n <- 3*n + 1
}
# 3. Repeat 2 over and over, saving each number along the way
# 4. If the current number is 1, stop
}
c()
to paste a number number onto the end of the vector.# 1. Take some n as an input
collatzf <- function(n){
collatzv <- c(n)
# 2a. If it is even, halve it
# 2b. If it is odd, triple it and add 1
if(n %% 2 == 0){
n <- n/2
}else{
n <- 3*n + 1
}
collatzv <- c(collatzv,n)
# 3. Repeat 2 over and over, saving each number along the way
# 4. If the current number is 1, stop
}
Now we need to add the loop that repeats 2 until the current. We use while
and indent again because of the curly brackets.
# 1. Take some n as an input
collatzv <- c()
collatzf <- function(n){
collatzv <- c(n)
# 3. Repeat 2 over and over, saving each number along the way
# 4. If the current number is 1, stop
while(n != 1){
# 2a. If it is even, halve it
# 2b. If it is odd, triple it and add 1
if(n %% 2 == 0){
n <- n/2
}else{
n <- 3*n + 1
}
collatzv <- c(collatzv,n)
}
}
And finally, we want to return our vector so we can inspect it or use it later.
# 1. Take some n as an input
collatzv <- c()
collatzf <- function(n){
collatzv <- c(n)
# 3. Repeat 2 over and over, saving each number along the way
# 4. If the current number is 1, stop
while(n != 1){
# 2a. If it is even, halve it
# 2b. If it is odd, triple it and add 1
if(n %% 2 == 0){
n <- n/2
}else{
n <- 3*n + 1
}
collatzv <- c(collatzv,n)
}
return(collatzv)
}
So now let’s test it, with a few options, one of which we won’t show here.
collatzf(10)
## [1] 10 5 16 8 4 2 1
collatzf(100)
## [1] 100 50 25 76 38 19 58 29 88 44 22 11 34 17 52 26 13 40 20
## [20] 10 5 16 8 4 2 1
And here’s a couple we won’t run. Try to guess what would go wrong with each:
# Not a good idea
# collatzf(9.7)
# Very bad idea
# collatzf(-1)
# Bad but at least just produces an error
# collatzf("fish")
If you want to make the program more robust, you can put a step at the beginning that checks to make sure n
is a positive integer and breaks the loop otherwise, before it gets to the while
, which runs indefinitely given an negative input. (Interestingly, the input of 9.7 ought to never finish either, but if you run it, what you will see is that it does actually converge to 1: the number gets bigger and bigger until it is so big it is rounded to an integer in R’s memory, at which point the algorithm works to bring it down to 1 in about 444 total steps.)
Say instead we had wanted the Collatz sequence for every number between 1 and 100. Then we would start in much the same way, building our function up, and only then embed the function in a loop which adds each sequence to a growing list of sequences. So again, you would work from the inside out, doing the easier part first (the function) before adding the loop.
collatzlist <- list()
for(i in 2:100){
collatzlist[[i]] <- collatzf(i)
}
If you are curious, different numbers seem to take longer to converge than others. Here is how long they take to converge:
stepsto1 <- sapply(collatzlist,length)
And, getting ahead of ourselves, here is a plot of how long they take:
plot(1:length(stepsto1),stepsto1)
Why do some take a long time and others not very long? I have no idea! Maybe the Wikipedia article has some insights…