Interesting Data Compression Algorithm: Run Length

I stumbled upon something called the Run-length algorithm.  Basically, it compresses a file based on the frequency of each unique character.  It stored one segment of data for each character: { char:freq }.  For example, a file might contain: "wwwrrraaa" and the compressed version would be (after the Run-length algorithm) "w3r3a3."  It's fairly efficient, with a computational complexity of $$\ O(n) \$$.  I found it on someone's blog.  He implemented the algorithm in Python.  Check it out:

#!/usr/bin/env python  
# http://acm.zhihua-lai.com

def runlen(s):  
    r = ""
    l = len(s)
    if l  0:
        return ""
    if l  1:
        return s + "1"
    last = s[0]
    cnt = 1
    i = 1
    while i < l:
        if s[i]  s[i - 1]: # check it is the same letter
            cnt += 1
        else:
            r = r + s[i - 1] + str(cnt) # if not, store the previous data
            cnt = 1
        i += 1
    r = r + s[i - 1] + str(cnt)
    return r

if __name__  "__main__":  
    print runlen("aaabbccccddddd")
    print runlen("a")
    print runlen("")
    print runlen("abcdefg")
    print runlen("eeeeeaaaff")

According to the Wikipedia article on this particular algorithm, this is a lossless data compression algorithm.  What that means is that when the file is reduced, you can easily transform the compressed file back to its original form.

Not much to it.  Just wanted to point this out.  I never looked into data compression.

Greg

Software Engineer

Subscribe to GregBlogs