Overview

In this lesson we work through an example to illustrate some basic programming strategies.

Objectives

  1. Write code methodically by starting in English and adding lots of explanatory comments.
  2. Check each step of a program before writing the next.
  3. Test your finished program with various inputs to make sure it is operating correctly.

Readings

None

1 Example: The Collatz Conjecture

Let’s say we want to write a program to check a famously simple mathematical conjecture. The Collatz Conjecture is very easy to state, but surprisingly has resisted nearly a century of efforts to prove that it is true. Consider the following algorithm: start with any number \(n\) and if it’s even, halve it, and if it’s odd, triple it and add 1; and repeat that process over and over. The conjecture is that, whatever number you start with, you will eventually end up at 1.

Now, it may have turned out that this is for an trivial reason, in the same way that the “divisibility by 9” rule turns out to be trivial (where a number is divisible by 9 if the sum of its digits is divisible by 9). But in this case, it’s not at all trivial why you end up at 1, and in fact mathematicians have been puzzled over how to prove it for decades.

But of couse, we can computationally test it easy for any given number. So let’s write a program that tests it for some number \(n\), saving the somewhat random-seeming sequence of numbers you get starting from \(n\), and stops when it gets to 1.

2 Tips for coding

Here are a few tips for coding:

  1. Write it out in English first as best you can.
  2. Use lots of comments so it will be readable by you and others.
  3. Indent contents within curly brackets; RStudio does this automatically.
  4. Always include the close parenthesis or bracket before filling in the contents so you don’t forget; RStudio helps with this too.
  5. Check that each step of your program works before going to the next. Sometimes this requires writing temporary code (eg, printing out each step) that you erase later, just to make sure eveything is working.
  6. It’s often easiest to do the easy stuff first and the hard stuff later, even if the hard stuff comes earlier in your program; you can often leave out the hard stuff or use a place-holder while you work on the easier things.
  7. After you are done and it seems to work, try to stress it a bit by giving it odd inputs just to test it out. But for casual scripts, you don’t have make it robust to other users who might deliberately try to break it (eg, by feeding a numeric function a string or something else that’s problematic).

3 Start in English, then translate to R

So how do we do this? The best way to start is in English, with numbered steps. We’ll write out in comments in R what we want our program to do, in as much detail as we can manage, and then later we’ll translate each bit of English into R code, as best we can.

So we want a program that takes some \(n\), does the Collatz algorithm on it repeatedly, saves each number along the way, and stops when it gets to 1, and returns the sequence of numbers.

# 1. Take some n as an input
# 2a. If it is even, halve it
# 2b. If it is odd, triple it and add 1
# 3. Repeat 2 over and over, saving each number along the way
# 4. If the current number is 1, stop
# 5. If we've done this, say, 10,000 times and never got to 1, stop and say we gave up
  1. To do part 1, we see it’s a function, and so we wrap everything inside a function, putting the close-bracket at the end. We’ll preserve all the comments and keep them above each part of the code as best we can. Note how we indent the comments, as we will for all the content inside a curly bracket.
# 1. Take some n as an input
collatzf <- function(n){

  # 2a. If it is even, halve it
  # 2b. If it is odd, triple it and add 1
  # 3. Repeat 2 over and over, saving each number along the way
  # 4. If the current number is 1, stop

}
  1. For this, it’s clearly an if statement that we need. We know how to measure even and odd using the %% modulo function, where x %% y gives the remainder from dividing \(x\) by \(y\). So depending on whether it is even or odd, we create a new n that is the transformation of the original n.
# 1. Take some n as an input
collatzf <- function(n){
  
  # 2a. If it is even, halve it
  # 2b. If it is odd, triple it and add 1
  if(n %% 2 == 0){
    n <- n/2
  }else{
    n <- 3*n + 1
  }
  
  # 3. Repeat 2 over and over, saving each number along the way
  # 4. If the current number is 1, stop

}
  1. Ok, now we need some loop that keeps repeating the process in 2, and also saves each step along the way. This actually goes around the step in 2, so we’ll have to shuffle things a bit. First let’s create a vector to store our numbers, and update it with each new number. Note how we create a vector with just one element to begin, and then build it up by using c() to paste a number number onto the end of the vector.
# 1. Take some n as an input
collatzf <- function(n){
  collatzv <- c(n)
  
  # 2a. If it is even, halve it
  # 2b. If it is odd, triple it and add 1
  if(n %% 2 == 0){
    n <- n/2
  }else{
    n <- 3*n + 1
  }
  collatzv <- c(collatzv,n)
  
  # 3. Repeat 2 over and over, saving each number along the way
  # 4. If the current number is 1, stop
  
}

Now we need to add the loop that repeats 2 until the current. We use while and indent again because of the curly brackets.

# 1. Take some n as an input
collatzv <- c()
collatzf <- function(n){
  collatzv <- c(n)
  
  # 3. Repeat 2 over and over, saving each number along the way
  # 4. If the current number is 1, stop
  while(n != 1){
    # 2a. If it is even, halve it
    # 2b. If it is odd, triple it and add 1
    if(n %% 2 == 0){
      n <- n/2
    }else{
      n <- 3*n + 1
    }
    collatzv <- c(collatzv,n)
  }
  
}

And finally, we want to return our vector so we can inspect it or use it later.

# 1. Take some n as an input
collatzv <- c()
collatzf <- function(n){
  collatzv <- c(n)
  
  # 3. Repeat 2 over and over, saving each number along the way
  # 4. If the current number is 1, stop
  while(n != 1){
    # 2a. If it is even, halve it
    # 2b. If it is odd, triple it and add 1
    if(n %% 2 == 0){
      n <- n/2
    }else{
      n <- 3*n + 1
    }
    collatzv <- c(collatzv,n)
  }
  
  return(collatzv)
}

4 Test it

So now let’s test it, with a few options, one of which we won’t show here.

collatzf(10)
## [1] 10  5 16  8  4  2  1
collatzf(100)
##  [1] 100  50  25  76  38  19  58  29  88  44  22  11  34  17  52  26  13  40  20
## [20]  10   5  16   8   4   2   1

And here’s a couple we won’t run. Try to guess what would go wrong with each:

# Not a good idea
# collatzf(9.7)
# Very bad idea
# collatzf(-1)
# Bad but at least just produces an error
# collatzf("fish") 

If you want to make the program more robust, you can put a step at the beginning that checks to make sure n is a positive integer and breaks the loop otherwise, before it gets to the while, which runs indefinitely given an negative input. (Interestingly, the input of 9.7 ought to never finish either, but if you run it, what you will see is that it does actually converge to 1: the number gets bigger and bigger until it is so big it is rounded to an integer in R’s memory, at which point the algorithm works to bring it down to 1 in about 444 total steps.)

5 Extension

Say instead we had wanted the Collatz sequence for every number between 1 and 100. Then we would start in much the same way, building our function up, and only then embed the function in a loop which adds each sequence to a growing list of sequences. So again, you would work from the inside out, doing the easier part first (the function) before adding the loop.

collatzlist <- list()
for(i in 2:100){
  collatzlist[[i]] <- collatzf(i)
}

If you are curious, different numbers seem to take longer to converge than others. Here is how long they take to converge:

stepsto1 <- sapply(collatzlist,length)

And, getting ahead of ourselves, here is a plot of how long they take:

plot(1:length(stepsto1),stepsto1)

Why do some take a long time and others not very long? I have no idea! Maybe the Wikipedia article has some insights…