Understanding Algorithms

I’m currently taking a class on algorithms. While practising a bit, I found a cool trick: Most algorithms are easier to read when written in Python instead of pseudocode.

Why? Pseudocode doesn’t follow exact rules. Python’s syntax is very similar to pseudocode, but strictly regulated. Also, most style guides and best practices focus on readability. Python code is predictable; you get exactly what you expect. That’s why Python is perfect for studying algorithms.

You don’t believe me? Take a look at the Euclidean algorithm:

function gcd(a, b)
    while b ≠ 0
       t := b;
       b := a mod b;
       a := t;
    return a;

Here’s the same algorithm in Python:

def gcd(a, b):
    while b:
        a, b = b, a % b
    return a

The second one requires less thinking, doesn’t it?

Of course, this trick also works on complicated, less famous algorithms. I wrote down a quick step-by-step guide for converting pseudocode into (more or less) “pythonic” Python:

  • Translate pseudocode into Python
  • Always count from 0, not 1
  • Replace expressions with pythonic alternatives:
    • Swap elements directly without helper variables
    • Choose fitting data structures
    • Remove redundant assignments caused by implementation details
    • When applicable (and more readable), write if x instead of checking for 0, None or emptiness.
    • Make use of pythonic ways to iterate
  • Rename variables and methods.
  • Simplify conditions

An Example

Here’s an example to give you a better idea of what I’m doing.
I found an interesting algorithm on Wikipedia:

function LCSubstr(S[1..m], T[1..n])
    L := array(1..m, 1..n)
    z := 0
    ret := {}
    for i := 1..m
        for j := 1..n
            if S[i] == T[j]
                if i == 1 or j == 1
                    L[i,j] := 1
                else
                    L[i,j] := L[i-1,j-1] + 1
                if L[i,j] > z
                    z := L[i,j]
                    ret := {S[i-z+1..i]}
                else
                if L[i,j] == z
                    ret := ret ∪ {S[i-z+1..i]}
            else
                L[i,j] := 0
    return ret

How does this algorithm work? What does it do? I have no idea.
Let’s apply my guidelines to it. First, we need to implement the algorithm in Python:

def LCSubstr(S, T):
    m = len(S)
    n = len(T)
    S = " " + S  # relevant string needs to start at index '1'
    T = " " + T  # relevant string needs to start at index '1'
    L = [[0 for x in range(n + 1)] for y in range(m + 1)]
    z = 0
    ret = []
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if S[i] == T[j]:
                if i == 1 or j == 1:
                    L[i][j] = 1
                else:
                    L[i][j] = L[i - 1][j - 1] + 1
                if L[i][j] > z:
                    z = L[i][j]
                    ret = [S[i - z + 1:i + 1]]
                else:
                    if L[i][j] == z:
                        ret = ret + [S[i - z + 1:i + 1]]
            else:
                L[i][j] = 0
    return ret

Next, we need to take a closer look at the code. It’s weird to start counting with 1, isn’t it? Better fix that:

def LCSubstr(S, T):
    m = len(S)
    n = len(T)
    L = [[0 for x in range(n)] for y in range(m)]
    z = 0
    ret = []
    for i in range(m):
        for j in range(n):
            if S[i] == T[j]:
                if i == 0 or j == 0:
                    L[i][j] = 1
                else:
                    L[i][j] = L[i - 1][j - 1] + 1
                if L[i][j] > z:
                    z = L[i][j]
                    ret = [S[i - z + 1:i + 1]]
                else:
                    if L[i][j] == z:
                        ret = ret + [S[i - z + 1:i + 1]]
            else:
                L[i][j] = 0
    return ret

That was surprisingly easy. What’s next? Replacing expressions with more pythonic ones:

def LCSubstr(S, T):
    L = [[0 for x in T] for y in S]
    z = 0
    ret = []
    for i, char1 in enumerate(S):
        for j, char2 in enumerate(T):
            if char1 == char2:
                if i == 0 or j == 0:  # more readable than 'not i or not j'
                    L[i][j] = 1
                else:
                    L[i][j] = L[i - 1][j - 1] + 1
                if L[i][j] > z:
                    z = L[i][j]
                    ret = [S[i - z + 1:i + 1]]
                else:
                    if L[i][j] == z:
                        ret.append(S[i - z + 1:i + 1])
    return ret

As you might have noticed, I didn’t replace the 0-checks in line 8. I decided to keep them this way because 0 is an actual, valid value for i or j, not just a termination condition.

Let’s take a look at the variables. Apparently, ret is a list of found substrings and z is the length of the found longest substring. L stores information about all substring lengths. S and T are strings.
Let’s give them better names:

def find_lc_substrings(string1, string2):
    lengths = [[0 for _ in string2] for _ in string1]
    max_length = 0
    substrings = []
    for i, char1 in enumerate(string1):
        for j, char2 in enumerate(string2):
            if char1 == char2:
                if i == 0 or j == 0:
                    length = 1
                    lengths[i][j] = length
                else:
                    length = lengths[i - 1][j - 1] + 1
                    lengths[i][j] = length
                if length > max_length:
                    max_length = length
                    substrings = [string1[i - max_length + 1:i + 1]]
                else:
                    if length == max_length:
                        substrings.append(string1[i - max_length + 1:i + 1])
    return substrings

We are almost done. There’s just one more step: The if...else branches can be simplified. Obviously, this step depends on personal taste. I like to reduce nesting depth, so I went with this:

def find_lc_substrings(string1, string2):
    lengths = [[0 for _ in string2] for _ in string1]
    max_length = 0
    substrings = []
    for i, char1 in enumerate(string1):
        for j, char2 in enumerate(string2):
            if char1 != char2:
                continue

            length = 1
            if i != 0 and j != 0:
                length += lengths[i - 1][j - 1]
            lengths[i][j] = length

            if length > max_length:
                max_length = length
                substrings = [string1[i - max_length + 1:i + 1]]
            elif length == max_length:
                substrings.append(string1[i - max_length + 1:i + 1])
    return substrings

Much better. What do you think?


Pseudocode

function LCSubstr(S[1..m], T[1..n])
    L := array(1..m, 1..n)
    z := 0
    ret := {}
    for i := 1..m
        for j := 1..n
            if S[i] == T[j]
                if i == 1 or j == 1
                    L[i,j] := 1
                else
                    L[i,j] := L[i-1,j-1] + 1
                if L[i,j] > z
                    z := L[i,j]
                    ret := {S[i-z+1..i]}
                else
                if L[i,j] == z
                    ret := ret ∪ {S[i-z+1..i]}
            else
                L[i,j] := 0
    return ret

Python

def find_lc_substrings(string1, string2):
    lengths = [[0 for _ in string2] for _ in string1]
    max_length = 0
    substrings = []

    for i, char1 in enumerate(string1):
        for j, char2 in enumerate(string2):
            if char1 != char2:
                continue

            length = 1
            if i != 0 and j != 0:
                length += lengths[i - 1][j - 1]
            lengths[i][j] = length

            if length > max_length:
                max_length = length
                substrings = [string1[i - max_length + 1:i + 1]]
            elif length == max_length:
                substrings.append(string1[i - max_length + 1:i + 1])

    return substrings
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s