# Complexity

We need a new tool: we need a formal way of comparing the efficiency of different pieces of code.

To say that another way, let's say we come up with two different ways to solve a problem. Which one is better? How do we know? Better for which methods?

We call this new tool ** Complexity**. Another word for complexity might be

**: how difficult is a particular solution. Note that we're entirely uninterested in how difficult it was for us humans, we only care about our computer. How difficult is it for the computer to run our code?**

*difficulty*## Add Five to a Number

Let's say we have to write a method that adds five to its input and returns the output. We come up with three solutions to this problem: `add_five_a`

, `add_five_b`

, and `add_five_c`

. Take a moment to read all of them.

```
# This option is about as short as we can get it.
def add_five_a(x):
return x + 5
# What if we introduce a variable for output?
def add_five_b(x):
output = x + 5
return output
# What if we take as many steps as possible?
def add_five_c(x):
output = x
output += 1
output += 1
output += 1
output += 1
output += 1
return output
```

Which solution do you prefer? Why?*Think:*

### A Strategy: Count Lines of Code

One thing you might have thought of is that `add_five_a`

is only 1 line of code: a return statement, whereas `add_five_b`

takes two lines of code and `add_five_c`

is 7 lines of code.

As a counter-argument to this strategy, consider the following re-writing of `addFiveA`

and `addFiveB`

.

```
def add_five_a2(x):
# did you know you can have semicolons to add blank statements?
;
;
;
;
return x + 5
# ..or to squeeze them all onto one line?
def add_five_c2(int x):
output = x; output += 1; output += 1; output += 1; output += 1; output += 1; return output
```

I just cheated your measure (lines of code)! Now `add_five_a2`

is five lines of code and `add_five_c2`

is only one. The computer has to do the same amount of work as the original `add_five_a`

but now we have a comment, a blank line, and a `return`

statement that's all spread out, and it uses five times as many lines as `add_five_c2`

.

Lines of code is not a robust measure of coding complexity. If you ever have a manager who requires you to write a certain number of lines of code each day; now you can explain why that's not a fair measure!

### Strategy: Count Statements

"Eureka!", you say. "Your evil tricks have led me to the solution. What if we count statements instead of lines. Better than that: what if we count non-empty statements."

This turns out to be a pretty good strategy. Now we can compare these methods together. You diligently create the following table.

Method Name | Useful Semicolons |
---|---|

`add_five_a` |
1 |

`add_five_b` |
2 |

`add_five_c` |
7 |

Clearly, `add_five_a`

is the best of these methods, because it has the fewest useful statements. (If you've absorbed how to convert code to assembly, you could imagine counting ** instructions** as a good measure of complexity!)

## Count the sum of the numbers from 1 to N

Suppose you want to count all the numbers from 1 to N, including N. You quickly code up the following ** iterative** solution:

```
def sum_sequence_1(n):
out = 0
for i in range(1,n):
out += i
return out
```

- Think: New Problem: How do we count the statements in a for-loop?

This is a classical problem, as it turns out, for which there is an easier algorithm. This is asking for a Triangular Number, and so given any value `n`

we can compute this by directly calcuating it, rather than summing the intermediate numbers.

```
# Formula for triangular numbers: https://en.wikipedia.org/wiki/Triangular_number
def sum_sequence_2(n):
return (n * (n - 1)) / 2
# Again, with steps broken out.
def sum_sequence_3(n):
int result = n * (n - 1)
return result / 2
```

Writing some quick code lets us check that the behavior indeed appears to be the same:

```
# 1 + 2 + 3 == 6
print("S1(3) = ", sum_sequence_1(3));
print("S2(3) = ", sum_sequence_2(3));
# 1 + 2 + .. + 10 == 55
print("S1(10) = ", sum_sequence_1(10));
print("S2(10) = ", sum_sequence_2(10));
```

This causes some problems for our notion of complexity. When we simulate the for-loop from `sum_sequence_1`

in our head, we find out that it depends on `n`

, the size of the input.

```
# The for-loop from sum_sequence_1, but spread-out with labels (as a while-loop)
i = 1 # A: the initializer
while i <= n: # B: the loop check
out += i # C: the loop body
i += 1 # D: the step-statement
```

We then track each of the loop pieces independently:

- The initializer: A, runs only once.

- The loop-check: B, runs \(n+1\) times. It runs before the loop, and then at every step through the loop.

- The body of the loop: C, runs \(n\) times.

- The step-statment: D, runs \(n-1\) times. It only runs when the loop-check returns true after the first time through the loop body.

Woah. That's a lot of work. There are two other statements in the original method, so that brings us to: \( (n+1) + (n-1) + n + 2 \) or \( 3n + 2 \)

Method Name | Useful Statements |
---|---|

`sum_sequence_1` |
\( 3n + 2 \) |

`sum_sequence_2` |
\( 1 \) |

`sum_sequence_3` |
\( 2 \) |

In this instance, no matter what value \( n \) takes, we should always prefer `sum_sequence_2`

. But this gives us a hint for comparisons that really matter: `sum_sequence_1`

is dramatically worse than `sum_sequence_2`

(from a complexity point-of-view) because it depends on \( n \), the input itself. `sum_sequence_3`

is worse than `sum_sequence_2`

, but not significantly, when \(n\) is large.

## Big-O notation: What does significant mean?

The ** time complexity** of a method or procedure is always defined in terms of the

**, which is notated as \(n\). And it is always a function, e.g., \(f(n) = 3n + 2\) or \(f(n) = n^3 + n\).**

*size of its input*You may have noticed that actually counting the statements in a particular example is quite difficult and tedious. Little stylistic differences, (e.g., do you create a helper variable) appear to matter (`sum_sequence_2`

vs. `sum_sequence_3`

).

For this reason, we typically use so-called Big-O Notation when we discuss complexity. It allows us to remove the insignificant differences from our notation.

Consider the following examples and high-level discussion of what Big-O let's us do.

\(O(f(n)) = g(n)\) where \(g(n)\) captures the behavior of \(f(n)\) for really big \(n\).

When powers compete, the biggest power wins, and you ignore coefficients.

- \(O(n^3 + n^2 + 7) = O(n^3)\)

- \(O(n^3 + 9000n^2) = O(n^3)\)

- \(O(n^3 + 9000n^{3.14}) = O(n^{3.14})\)

It may help to remember that numbers, like \(7\) are equivalent to \(7n^0\), and so they have the lowest power, usually. You don't often see negative powers: \(O(n^{-k})\) ; that would be code that takes less time as the input gets bigger, which isn't very common (for realistic problems).

Some functions you have seen before have an ordering, bigger on the left, smaller on the right.

### What about calling methods?

We have defined complexity by counting semicolons, but what happens when you call another method?

Nothing changes: the complexity of a method hidden by a call is still part of the behavior of the method calling it -- those semicolons/statements are still used.

### What about sometimes?

Assuming random number generation is \(O(1)\), what happens if sometimes we use a good algorithm and sometimes we use a bad algorithm?

```
import random
def sum_sequence_maybe(n):
# 50% of the time, use the closed form, else use the loop.
if random.randint(0,1) == 0:
# O(1)
return (n * (n - 1)) / 2
else:
# O(n)
out = 0
for i in range(1,n):
out += i
return out
```

50% of the time, we have an \(O(1)\) solution to this problem, and 50% of the time we have an \(O(n)\) solution to this problem. Giving that information precisely is the clearest way to describe this method.

But, usually computer scientists are concerned with the worst-case scenario: we would mark this method as \(O(n)\). If you're a glass-half-empty kind of person, this makes you happy. If not, sometimes we also discuss best-case or average-case complexity.

Big-O notation can refer to worst-case, best-case, or average-case complexity. If not stated, we will be discussing worst-case complexity, but everyone has their own standards that matter. When in doubt, just ask!

## Conclusion

Theoretical Complexity is a broad topic and is still an open a research area. We have described complexity as being "like counting statements" along with Big-O notation to provide a general tool to discuss which functions or methods are most efficient, but we have done so quite informally here.