I’m currently taking a class on algorithms. While practising a bit, I found a cool trick: Most algorithms are easier to read when written in Python instead of pseudocode.

Why? Pseudocode doesn’t follow exact rules. Python’s syntax is very similar to pseudocode, but strictly regulated. Also, most style guides and best practices focus on readability. Python code is predictable; you get exactly what you expect. That’s why Python is perfect for studying algorithms.

You don’t believe me? Take a look at the Euclidean algorithm:

function gcd(a, b) while b ≠ 0 t := b; b := a mod b; a := t; return a;

Here’s the same algorithm in Python:

def gcd(a, b): while b: a, b = b, a % b return a

The second one requires less thinking, doesn’t it?

Of course, this trick also works on complicated, less famous algorithms. I wrote down a quick step-by-step guide for converting pseudocode into (more or less) “pythonic” Python:

- Translate pseudocode into Python
- Always
**count from 0**, not 1 - Replace expressions with pythonic alternatives:
**Swap**elements directly without helper variables- Choose fitting
**data structures** - Remove redundant
**assignments**caused by implementation details - When applicable (and more readable), write
`if x`

instead of**checking for 0, None or emptiness**. - Make use of pythonic ways to
**iterate**

**Rename**variables and methods.- Simplify
**conditions**

## An Example

Here’s an example to give you a better idea of what I’m doing.

I found an interesting algorithm on Wikipedia:

function LCSubstr(S[1..m], T[1..n]) L := array(1..m, 1..n) z := 0 ret := {} for i := 1..m for j := 1..n if S[i] == T[j] if i == 1 or j == 1 L[i,j] := 1 else L[i,j] := L[i-1,j-1] + 1 if L[i,j] > z z := L[i,j] ret := {S[i-z+1..i]} else if L[i,j] == z ret := ret ∪ {S[i-z+1..i]} else L[i,j] := 0 return ret

How does this algorithm work? What does it do? I have no idea.

Let’s apply my guidelines to it. First, we need to implement the algorithm in Python:

def LCSubstr(S, T): m = len(S) n = len(T) S = " " + S # relevant string needs to start at index '1' T = " " + T # relevant string needs to start at index '1' L = [[0 for x in range(n + 1)] for y in range(m + 1)] z = 0 ret = [] for i in range(1, m + 1): for j in range(1, n + 1): if S[i] == T[j]: if i == 1 or j == 1: L[i][j] = 1 else: L[i][j] = L[i - 1][j - 1] + 1 if L[i][j] > z: z = L[i][j] ret = [S[i - z + 1:i + 1]] else: if L[i][j] == z: ret = ret + [S[i - z + 1:i + 1]] else: L[i][j] = 0 return ret

Next, we need to take a closer look at the code. It’s weird to start counting with 1, isn’t it? Better fix that:

def LCSubstr(S, T): m = len(S) n = len(T) L = [[0 for x in range(n)] for y in range(m)] z = 0 ret = [] for i in range(m): for j in range(n): if S[i] == T[j]: if i == 0 or j == 0: L[i][j] = 1 else: L[i][j] = L[i - 1][j - 1] + 1 if L[i][j] > z: z = L[i][j] ret = [S[i - z + 1:i + 1]] else: if L[i][j] == z: ret = ret + [S[i - z + 1:i + 1]] else: L[i][j] = 0 return ret

That was surprisingly easy. What’s next? Replacing expressions with more pythonic ones:

def LCSubstr(S, T): L = [[0 for x in T] for y in S] z = 0 ret = [] for i, char1 in enumerate(S): for j, char2 in enumerate(T): if char1 == char2: if i == 0 or j == 0: # more readable than 'not i or not j' L[i][j] = 1 else: L[i][j] = L[i - 1][j - 1] + 1 if L[i][j] > z: z = L[i][j] ret = [S[i - z + 1:i + 1]] else: if L[i][j] == z: ret.append(S[i - z + 1:i + 1]) return ret

As you might have noticed, I didn’t replace the 0-checks in line 8. I decided to keep them this way because 0 is an actual, valid value for `i`

or `j`

, not just a termination condition.

Let’s take a look at the variables. Apparently, `ret`

is a list of found substrings and `z`

is the length of the found longest substring. `L`

stores information about all substring lengths. `S`

and `T`

are strings.

Let’s give them better names:

def find_lc_substrings(string1, string2): lengths = [[0 for _ in string2] for _ in string1] max_length = 0 substrings = [] for i, char1 in enumerate(string1): for j, char2 in enumerate(string2): if char1 == char2: if i == 0 or j == 0: length = 1 lengths[i][j] = length else: length = lengths[i - 1][j - 1] + 1 lengths[i][j] = length if length > max_length: max_length = length substrings = [string1[i - max_length + 1:i + 1]] else: if length == max_length: substrings.append(string1[i - max_length + 1:i + 1]) return substrings

We are almost done. There’s just one more step: The `if...else`

branches can be simplified. Obviously, this step depends on personal taste. I like to reduce nesting depth, so I went with this:

def find_lc_substrings(string1, string2): lengths = [[0 for _ in string2] for _ in string1] max_length = 0 substrings = [] for i, char1 in enumerate(string1): for j, char2 in enumerate(string2): if char1 != char2: continue length = 1 if i != 0 and j != 0: length += lengths[i - 1][j - 1] lengths[i][j] = length if length > max_length: max_length = length substrings = [string1[i - max_length + 1:i + 1]] elif length == max_length: substrings.append(string1[i - max_length + 1:i + 1]) return substrings

Much better. What do you think?

#### Pseudocode

function LCSubstr(S[1..m], T[1..n]) L := array(1..m, 1..n) z := 0 ret := {} for i := 1..m for j := 1..n if S[i] == T[j] if i == 1 or j == 1 L[i,j] := 1 else L[i,j] := L[i-1,j-1] + 1 if L[i,j] > z z := L[i,j] ret := {S[i-z+1..i]} else if L[i,j] == z ret := ret ∪ {S[i-z+1..i]} else L[i,j] := 0 return ret

#### Python

def find_lc_substrings(string1, string2): lengths = [[0 for _ in string2] for _ in string1] max_length = 0 substrings = [] for i, char1 in enumerate(string1): for j, char2 in enumerate(string2): if char1 != char2: continue length = 1 if i != 0 and j != 0: length += lengths[i - 1][j - 1] lengths[i][j] = length if length > max_length: max_length = length substrings = [string1[i - max_length + 1:i + 1]] elif length == max_length: substrings.append(string1[i - max_length + 1:i + 1]) return substrings