Systems design for Advanced Beginners

You’ve started yet another company with your good friend, Steve Steveington. It’s an online marketplace where people can buy and sell things and where no one asks too many questions. It’s basically a rip-off of Craigslist, but with Steve’s name instead of Craig’s.

You’re going to be responsible for building the entire Steveslist technical platform, including all of its websites, mobile apps, databases, and other infrastructure. You’re excited, but also very nervous. You figure that you can probably cobble together a small website, since you’ve done that a few times before as part of your previous entertaining-if-morally-questionable escapades with the Stevester. But you have no idea how to even start building out all of the other infrastructure and tools that you assume lie behind large, successful online platforms.

You are in desperate need of a detailed yet concise overview of how real companies do this. How do they store their data? How do their different applications talk to each other? How do they scale their systems to work for millions of users? How do they keep them secure? How do they make sure nothing goes wrong? What are APIs, webhooks and client libraries, when you really get down to it?


You send a quick WhatsApp to your other good friend, Kate Kateberry, to see if she can help. You’ve worked together very effectively in the past, and she has decades of experience creating these types of systems at Silicon Valley’s biggest and most controversial companies.

She instantly accepts your job offer. You had actually only been ringing for some rough guidance and a good gossip, but you nonetheless instantly accept her acceptance. No point looking a gift horse in the mouth, even when you don’t have any money to pay her. Kate proposes that her first day be 5 weeks ago in order to help her smooth over some accounting irregularities. She can come into the office sometime next week. You feel encouraged and threatened by her eagerness.


Kate bounces into your offices in the 19th Century Literature section of the San Francisco Public Library. “OK let’s do this!” she shouts quietly. “What have we got so far? How are all our systems set up? What’s the plan?” You lean back in your chair and close your laptop, which was not turned on because you had left your charger at home. You steeple your fingers in a manner that you hope can be described as “thoughtful”.

“Let me flip that question around, Kate. What do you think the plan should be?”

Kate takes a deep breath and paints an extremely detailed vision of the Steveslist platform five years into the future and the infrastructure that will power it.


Read Kate’s in-depth vision in full on my blog. It’s long and detailed and covers topics from databases to webhooks, and by the end you’ll understand the infrastructure behind almost all modern online companies.

Read it now.

Rob

Programming Videos for Advanced Beginners

The onset of a global pandemic hasn't left me as much time as I would like for writing blog posts, but if you're looking for a great way to spend 2 or 3 hours of quarantine time then have a watch of Programming Videos for Advanced Beginners, a 9-part video series I published last year. We design and build the game "Battleships" in the command line, and learn a lot about writing clean, modular code along the way.

Stay safe.

PFAB #11: Separating logic and data

Feedback? Suggestions? Please let me know, I’d love to hear from you.

You can also read this post on my blog.

This week on Programming Feedback for Advanced Beginners we're going to start analyzing a brand new project. It's an interactive website called Eat/Play/Drink, created by PFAB reader Sophia Li. Eat/Play/Drink presents the user with a curated map of places to eat, play, and drink, and allows the user to filter the map by category.

As with almost every program that we analyze in PFAB, Sophia's program is already entirely functional and fit for purpose. You can even view it in action at https://sophiali.dev/eat-play-drink. The changes that we're going to suggest would only start to become particularly important if the codebase grew larger or was being worked on by multiple programmers. For a small, personal side-project the changes should therefore be considered somewhat optional, but they're still good opportunities for practice and learning.

Over the coming editions of PFAB we're going to look at how Sophia could restructure her code so as to make it more "defensive". This means anticipating the types of bugs that future programmers could accidentally introduce (Sophia-in-a-month counts as a future programmer!), and programming in a style that mitigates or prevents these bugs altogether.

But first I want to start our journey by discussing a design choice that Sophia made that I particularly like: enforcing the separation of logic and data.

Separating logic and data

Let's start with a quick tour of the code (you can also read it in full on GitHub).

Eat/Play/Drink is built using HTML and JavaScript. If you aren't familiar with these technologies then don't worry - the concepts that we're going to talk about are applicable to all other languages too. Roughly speaking, HTML is how you describe what a website looks like, and JavaScript is how you dynamically update it (for example, expanding a menu when a user clicks on a button).

HTML looks like this:

<!-- A simple website with an image and a button -->
<html>
  <body>
    <img id="main-pic" src="/images/photo-of-me.jpg" />
    <button id="btn-change-pic" value="Change the pic!" />
  </body>
</html>

JavaScript looks like this:

// When the user clicks on the button, change the image
$("#btn-change-pic").click(function() {
  $("#main-pic").attr("src", "/images/photo-of-you.jpg");
});

The HTML for Eat/Play/Drink lives in a file called index.html. We're not going to talk about this file much this week. The JavaScript lives in a file called app.js. It is responsible for creating Google Maps markers and "info boxes", and for showing and hiding these components whenever a user clicks on a filter button.

At the top of app.js is an array called markers. This array contains details about each place on the Eat/Play/Drink map, including a place's co-ordinates, category, and description:

const markers = [
  {
    coords: { lat: 37.7638, lng: -122.469 },
    content:
      "<h3>San Tung</h3><p>Chinese restaurant with delicious dry fried chicken wings.</p><a href='https://goo.gl/maps/wiqva8qzW9bBoJRJ6' target='_blank'>Get directions</a>",
    category: "eat"
  },
  // ...etc...
];

I really like this array. It encapsulates all of the website's data, keeping it completely isolated from the logic that is responsible for displaying it. This separation is valuable, for two reasons.

First, it's almost always good to keep different components of your code separate from each other, regardless of what those components are. As we've talked about at length in previous PFABs, code that is well-separated into discrete components will, as a natural, inevitable consequence, tend to be easier to understand and work with.

Second, separating logic and data allows us to add new places to the Eat/Play/Drink app without needing to go anywhere near the code that is responsible for displaying those new places. Instead, we can add our new places to the markers array using a simple, well-defined format. We can hit save and refresh the page, and the new markers and info windows are immediately available to be clicked on.

As well as improving our lives in the short-term, separating logic and data opens up extra possibilities for the future too.

Future benefits of separating data and logic

We can make data updates even simpler

With only a little extra work, we could make it even easier for anyone to update our app's code with new places, even if they aren't a programmer. Non-programmers could probably already add a new element to the markers array in a pinch, but it might be intimidating to require them to edit a full JavaScript file that also contains functions, loops, if-statements, and so on. To alleviate this, we could move our places data to an entirely separate file (perhaps called places.json). This file would contain nothing but data, formatted as JSON. JSON stands for JavaScript Object Notation, and is a simple format for representing data. For example:

{
  "places": [
    {
      "coords": { "lat": 37.7638, "lng": -122.469 },
      "content":
        "<h3>San Tung</h3><p>Chinese restaurant with delicious dry fried chicken wings.</p><a href='https://goo.gl/maps/wiqva8qzW9bBoJRJ6' target='_blank'>Get directions</a>",
      "category": "eat"
    }
    // ...etc...
  ]
}

JSON files only represent data, and don't contain any logic whatsoever. app.js would read and load the data from the JSON file, and then use it to perform all the presentation and other heavy lifting, just as it does today.

+-----------+    +------+    +----------+
|places.json+--->+app.js+--->+index.html|
+-----------+    +------+    +----------+
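To make this concrete, here's a minimal sketch of what that loading step in app.js might look like, assuming the browser's built-in fetch API and a hypothetical createMarker function that wraps the existing marker-building logic:

// Sketch only: load the data file, then hand each place to the
// existing presentation code. createMarker is a stand-in name for
// whatever function ends up owning the Google Maps marker logic.
fetch("places.json")
  .then(response => response.json())
  .then(data => {
    data.places.forEach(place => createMarker(place));
  });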

Changing the code in this way would further hide the complexity that turns raw data into maps and markers. Adding new places by appending a few lines to a simple config file should be much less daunting than editing a file of JavaScript.

Next, let's take this principle even further.

We can make E/P/D reusable in different cities

Eat/Play/Drink is a simple application that is deliberately and sensibly laser-focussed on things to do in San Francisco. But what if Sophia's cousin, who lives in Austin (assume that Sophia has a cousin and that this cousin lives in Austin), wanted to make an Austin version of Eat/Play/Drink? He could copy and paste our source code, but in order to localize it for use in Austin he would need to edit a long list of specific files and lines. This would be fiddly and error-prone. On top of this, if we made improvements to the core Eat/Play/Drink codebase, he would have to manually copy these changes over and make sure that they didn't accidentally overwrite his Austin-focussed updates to the code.

Sophia's hypothetical cousin is probably very intelligent, and could probably figure out what to do given enough time. However, the more complex the app gets, the harder this migration will become. We should therefore go one step better. It would not take us much work to refactor the Eat/Play/Drink codebase into a generic, customizable framework that could trivially be reused by anyone to create an app to display their favorite places in their own city.

To do this, we would need to move all of the San-Francisco-specific pieces of the application into a config file (similar to places.json, described above), and have the rest of the application read all of its data from this file. Adapting the app for Austin should then require only replacing the San Francisco config file with one for Austin, with no changes to the core code required. This means that if Sophia were to make improvements to the core code, her cousin would be able to pull them in to his website without stomping over any of his Austin-based changes. Since all of his location-specific data is stored in the config file, updating the core code is entirely safe.

Which parts of the app are currently San-Francisco-specific? The places on the map (which we discussed above) are, of course. Add to this the co-ordinates where the map should be centred, the initial zoom-level of the map, the title of the home page, and the description text above the map. We also have to decide whether to stipulate that our app can only be used with the existing categories of eat, play, and drink, or whether to allow someone repurposing the app to choose their own categories, such as run, bike, and swim. If we made the latter choice, we would need to specify the categories in the config file too so that our app could generate the appropriate filter buttons.

The config file might look something like this:

{
    "title": "EAT PLAY DRINK",
    "description": "A list of the best places to eat, play, and drink in San Francisco, curated by a San Franciso native. Choose from the buttons below to filter your search.",
    "categories": ["eat", "play", "drink"],
    "map": {
        "center": { "lat": 37.7749, "lng": -122.4194 },
        "zoom": 12
    },
    "places": [
        {
            "coords": { "lat": 37.7638, "lng": -122.469 },
            "content":
            "<h3>San Tung</h3><p>Chinese restaurant with delicious dry fried chicken wings.</p><a href='https://goo.gl/maps/wiqva8qzW9bBoJRJ6' target='_blank'>Get directions</a>",
            "category": "eat"
        },
        // ...etc...
    ]
}

After moving all the city-specific data into a config file, we would update app.js to read all of this extra data from it. We would also need to move responsibility for some pieces of data from index.html into app.js. For example, index.html currently has the description of the Eat/Play/Drink app hardcoded into it. We would instead need to make app.js responsible for dynamically inserting the description that it reads from the config file.
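As a rough sketch (the element IDs and the createMarker helper here are my own inventions, not Sophia's actual code), app.js might end up looking something like this:

// Sketch: read the whole config file, then let it drive both the page
// text and the map setup. The element IDs are invented for this example.
fetch("config.json")
  .then(response => response.json())
  .then(config => {
    document.getElementById("title").textContent = config.title;
    document.getElementById("description").textContent = config.description;

    const map = new google.maps.Map(document.getElementById("map"), {
      center: config.map.center,
      zoom: config.map.zoom,
    });

    config.places.forEach(place => createMarker(map, place));
  });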

Now that we've seen how to extract more data into configuration files, let's consider ways in which we could enhance the markers data structure that we already have. I'm not actually certain that this next suggestion is a good idea, but it's definitely educational so let's discuss it anyway.

Extracting extra fields

You may notice that the content field of each element in the markers array always follows the same structure. For example:

<h3>Manna</h3><p>Hole-in-the-wall family-owned
Korean restaurant with the best tofu soup.</p>
<a href='https://goo.gl/maps/TkAFCrGrswH76hmT8'
target='_blank'>Get directions</a>

This and all other content values contain a heading, a description, and a link to directions. Can we exploit this consistency to improve our code? Maybe.

To answer this question, let's start by making sure we understand the possible problems we are trying to solve. Suppose that we wanted to change the Get directions link to instead say simply Directions. At the moment we'd have to go through each place's content field and manually update its value with the new link text. Or suppose that we wanted to allow someone not familiar with programming to add and update our data (as above). This person could probably copy-and-paste an example description and deduce that the words between <h3> and </h3> are probably a heading. However, this is a lot of unnecessary thinking to require of a person who only wants to make a note of a cool new coffee shop. There's also a significant risk that they might accidentally delete a closing </p> tag or make some other typo. Come to think of it there's a significant risk that I would accidentally delete a closing </p> tag too.

To solve these problems, we could extract each component of the content field into its own field, and write code to join these new fields together into a fully rendered block. So in our markers data structure, instead of this:

{
  content: "<h3>Manna</h3><p>Hole-in-the-wall family-owned Korean restaurant with the best tofu soup.</p><a href='https://goo.gl/maps/TkAFCrGrswH76hmT8' target='_blank'>Get directions</a>",
  // ...etc...
}

We would write this:

{
  name: "Manna",
  description: "Hole-in-the-wall family-owned Korean restaurant with the best tofu soup.",
  directions_url: "https://goo.gl/maps/TkAFCrGrswH76hmT8",
  // ...etc...
}

We would then write code to join the above parameters and turn them into the same fully-rendered output (<h3>Manna</h3><p>...etc...) that was previously stored wholesale in the content field.
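That joining code might look something like the following sketch (the function name is invented for illustration):

// Sketch: rebuild the info window HTML from the separate fields, so that
// presentation decisions (h3 vs h2, link text) live in exactly one place.
function buildContent(place) {
  return (
    "<h3>" + place.name + "</h3>" +
    "<p>" + place.description + "</p>" +
    "<a href='" + place.directions_url + "' target='_blank'>Get directions</a>"
  );
}

With this in place, changing "Get directions" to "Directions" becomes a one-line edit, no matter how many places we have.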

This approach has several advantages. First, it further separates data from logic and presentation. Currently the markers array is responsible for raw information, but also some elements of presentation, such as whether the title should be displayed in <h3> or <h2> tags. However, markers should ideally contain only raw information, and the components of our program responsible for displaying this data (i.e. the rest of the app) should be responsible for making cosmetic choices.

Second, as discussed above, it makes adding and editing data simpler. It means that editors don't have to know anything about HTML, and makes it harder for them to make silly typos.

However, the approach is not without its tradeoffs. By standardizing on a fixed structure made up of a set of fixed parameters, we make it easy to create info windows that follow this fixed structure. However, we also make it harder to create info windows that don't. Currently the structure of an info window is to start with a title, then one paragraph of description, and finish with a link to directions. If you want two paragraphs of description, or an image, or don't want a link to any directions, then you're out of luck. You'll have to update the code that turns our parameters into info window text so that it also accepts additional, optional parameters, such as description_2, image_url, hide_directions_link, and so on. If there aren't many of these extra customizations then this might be a reasonable solution. But if there are a lot of them and you end up with twenty different configurable fields that sometimes interact with each other in surprising and complex ways (what should happen if hide_directions_link and directions_url are both set?) then you might find yourself wishing you had simply stuck with verbose, custom HTML instead of trying to make your life "easier".

In situations like this I usually start off with whichever approach requires writing the least amount of code. Then I add specialized features only when I feel that I understand the problem that I'm trying to solve. One way to decide when to invest in writing more code is to wait until the problems that you think it will solve have really become problems. For example, I've claimed that splitting out the content field into multiple sub-fields will reduce the number of silly mistakes and bugs in the info window content. I've also claimed that it will make it easier to update the appearance of every info window at once. But how often have you actually introduced silly mistakes and bugs as a result of messing up HTML? And how often have you had to update the appearance of every info window at once? Was it really that annoying?

Don't solve problems that you don't have.

Trying to be too clever

Suppose that we did decide to refactor our data file in this way. We split out the content field into separate fields for name, description, and directions_url. At this point we notice that every directions_url is of the form https://goo.gl/maps/$UNIQUE_ID, and that we could potentially save a few characters by replacing the directions_url field with one called directions_google_id, which contains only the unique ID in the goo.gl/maps URL. For example:

{
  name: "Manna",
  description: "Hole-in-the-wall family-owned Korean restaurant with the best tofu soup.",
  directions_google_id: "TkAFCrGrswH76hmT8",
  // ...etc...
}

If we went down this path then our info-window-building code would be responsible for turning TkAFCrGrswH76hmT8 into https://goo.gl/maps/TkAFCrGrswH76hmT8 before inserting it into the directions link.
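In code, that responsibility might amount to a one-line helper like this (the function name is made up for illustration):

// Hypothetical helper: rebuild the full directions URL from the stored ID.
function directionsUrl(place) {
  return "https://goo.gl/maps/" + place.directions_google_id;
}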

Making this change would certainly reduce the number of characters in our config file, and would indisputably reduce duplication of the string https://goo.gl/maps/. However, in my opinion these are very small benefits that don't nearly outweigh the problems that it introduces.

For one thing, this change would make our config file less clear to read and update. "Create a goo.gl link but then chop off everything except the random characters at the end" is a more complicated instruction than "create a goo.gl link and copy it". Worse, it would unnecessarily encode in our program the assumption that we always want to get our directions from Google Maps. Since we are still early in the lifecycle of the Eat/Play/Drink app, this could easily change, and we want to stay as flexible as possible. Suppose that we decided that for some places, Bing Maps gave better directions. We could add yet another field called directions_bing_id and write even more code to turn this into a Bing Maps URL, but now we're starting to really add a lot more complexity for a nebulous gain. Wanting to save keystrokes is a good instinct, but isn't always a good idea.

There are no hard rules about how data should be decomposed into parameters. It's even easy to imagine further constraints that would make the above change a good idea. For example, maybe we're being paid by Google to showcase Google Maps, and maybe it's better for us to work with IDs instead of full URLs because this is the form in which Google sends us click statistics. You have to consider the tradeoffs for each individual case, and if in doubt err towards doing nothing.


Separating out your data and your logic can make your code cleaner and your life easier, and we haven't even begun to fully exhaust the benefits. Suppose that the Eat/Play/Drink website keeps growing and adding more and more places. Eventually it will reach the point where storing all of our location data in JavaScript files, or even in separate config files, becomes unfeasible. Once this happens, we will want to switch to storing our data in a purpose-built database. This change will require a significant rearchitecting of our code, but having kept a clear separation between code and data will make the change substantially more straightforward. This is because we will only have to change the pieces of our code that deal with loading data from a data source. We can rip out the code that loads data from files and replace it with code that loads data from a database, and the code that is responsible for formatting and presenting the data won't have to know or care what we have done. So long as this code gets passed data that describes places, it doesn't much care where that data came from.
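For example, if all of our data loading already goes through a single function, then the file-to-database switch only touches that one function. A hypothetical sketch (the URLs and function name are invented for illustration):

// Today: load places from a static JSON file.
async function loadPlaces() {
  const response = await fetch("/places.json");
  // Later, perhaps: const response = await fetch("/api/places");
  const data = await response.json();
  return data.places;
}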

So next time you spot the opportunity to draw lines between your data and your logic, remember what Sophia did and give it a go.

Until next time:

PFAB 10: First-class functions and dependency injection

In the previous edition of PFAB we began looking at a program that analyzes WhatsApp message logs, written by Adarsh Rao. We discussed some of the tradeoffs between taking a batch approach - where we load and process many records at once - and a streaming approach - where we load one record at a time, fully process it, and only then load and process the next record. We focussed in particular on how to write a clean batch pipeline. This week we're going to look at how to make a modular streaming pipeline by using first-class functions and dependency injection.

This edition contains advanced, challenging material - this series isn't called "Programming Feedback for Advanced Beginners" for nothing. If after reading this post you think "OK, that's nice and all but I have no idea how to use any of it in my own code" then that's fine and normal. My hope is that the next time you come across some code written in this style then you'll think "oh I see what they're doing here." Then the time after you'll start to really see what's going on, and eventually you'll be writing this type of code yourself in your sleep.

If you haven't read the previous PFAB then make sure to read it before going any further. This project is written in Ruby, but should be broadly understandable even if you haven't come across Ruby before.

Moaning about our un-modular streaming code

At the end of last week's PFAB, we were attempting to update our streaming code to deal with the fact that WhatsApp messages can sometimes spread over multiple lines in a log file, like so:

7/28/19, 11:07 PM - Kimi: Here's an idea for the next time you return. 
Sample Message that is split over two-lines. 
7/30/19, 11:03 PM - Gaurav: More messages

We were lamenting the fact that our updated streaming code was starting to look quite gnarly and un-modular. Our code that parses log lines into messages was becoming deeply entwined with our code that processes messages once they had been parsed. Look at the call to update_stats buried deep inside our log-parsing code:

stats = {}
current_message = nil

# File.foreach reads the given file and runs
# the given "block" of code on each line in turn.
File.foreach("samplelog.txt") do |l|
  # Assume that we've written a function called
  # parse_raw_log_line that parses out the date, time,
  # message, and whether this line is the start of a new
  # message.
  parsed_line = parse_raw_log_line(l)

  # If this line is a new message, add the previous
  # message to our stats and reset current_message
  # to be this line.
  if parsed_line[:start_of_new_message]
    if !current_message.nil?
      stats = update_stats(current_message, stats)
    end
    current_message = parsed_line

  # If this line is not a new message, it must be a
  # continuation of the previous line's message. We
  # therefore add its contents to the current message.
  else
    current_message[:contents] += parsed_line[:contents]
  end
end

# We need to manually add the final message to our stats,
# since it won't be added by the above loop (urgh)
stats = update_stats(current_message, stats)

By contrast, the code for our batch version of the program remained nicely separated, with all of its logic wrapped up inside well-separated functions:

raw_log = File.read("samplelog.txt")
parsed_messages = parse_raw_log(raw_log)
message_stats = calculate_stats(parsed_messages)

I promised that we would refactor our streaming code so that it looked as aesthetically pleasing as our batch code. Let's see how.

Start from the end

When refactoring code to make it cleaner and tidier, I like to start from the end. How do I want my code to look to another programmer who is using it?

For the batch case this is easy. Our code is a pipeline with clearly separated steps - load data, parse data, analyze data. We want the results of one step to be passed directly into the next, and we want the code for each step to be entirely separate and isolated in different functions. When I was sketching out my version of the batch program, the first thing that I wrote was the high-level structure of the program:

raw_log = File.read("samplelog.txt")
parsed_messages = parse_raw_log(raw_log)
message_stats = calculate_stats(parsed_messages)

This helped me see what functions I would need to write, the arguments that each function would need to accept, and the values that each function would need to return. With this chunked-up specification in hand, the rest of the program was just a matter of filling in the pre-designed blanks.

Designing our streaming code

However, I found that coming up with this kind of high-level layout for streaming code was rather harder. Have a quick think about how you might do it. Don't worry if you can't come up with anything.

The reason that streaming is harder is that we don't have the simple, sequential execution that we had in the batch case. We have to read a line, parse it, check if we have finished composing a new message, add it to our stats if so, then go back, read another line, add it to our stats, and so on. We can't simply assemble a line of functions that feed data into each other.

The root cause of my discontent with our current streaming code is that our update_stats function is buried deep inside our code that parses log lines. This troubles me because if we wanted to reuse our log-parsing code to parse a different log file and perform a different action on each message (for example, write it to a message database), we would have to go inside our log-parsing code and conduct deep invasive surgery to change the call to update_stats. By contrast, to make a similar update to our batch code we would just have to write a new write_to_db function and swap it in for our calculate_stats function, leaving the code that parses logs from files untouched.

Some plausible but unsatisfactory solutions to this problem include:

  • Having two almost-duplicate copies of the same function, called (for example) parse_log_file_and_calculate_stats and parse_log_file_and_copy_to_database. However, copy-pasting code is always a shame, and it's particularly unfortunate that we'll have to make a new function for every single new processing action that we may want to perform in the future too.

  • A slightly better alternative would be to pull our log-parsing code into a function called (say) parse_log_file, and to pass a string into parse_log_file that describes the processing action that we want to perform. Inside parse_log_file we would use a long if-statement to match the string to a function. For example:

def parse_log_file(filename, processing_action)
  # <lots of parsing code goes here>

  case processing_action
  when 'update_stats'
    stats = update_stats(message, stats)
  when 'write_to_db'
    write_to_db(message)
  # ...etc...
  end

  # <lots more parsing code goes here>
end

However, this approach still has the disadvantage of requiring us to update parse_log_file every time we want to add a new processing action.

The solution to this conundrum is to use first-class functions and dependency injection.

First-class functions

First-class functions are a feature of many (but not all) programming languages. If a language has first-class functions, this means that you can pass its functions around as variables. To demonstrate what this means and why it matters, let's switch for a second from Ruby to Python, where the syntax for first-class functions is simpler:

def say_hello():
  print("HELLO!")

x = say_hello
x()
# => prints "HELLO!"

In Python, the line x = say_hello means "assign the function say_hello to the variable x". It is very different to the more familiar x = say_hello() (note the () after say_hello), which means "evaluate the function say_hello, then assign the result to the variable x."

After executing x = say_hello, x points to the say_hello function. This means that on the next line, x can be called like any other function by writing x(). This executes the code inside say_hello, printing "HELLO!".

Dependency injection

Variables that point to functions can be passed around like any other type of variable. This allows you to pass around instructions and logic without having to immediately execute it. Passing around logic in this way is known as dependency injection, because you're injecting logic that the receiving function depends on. Take a look at the following snippet, which demonstrates a function that takes another function as an argument, and then runs that function twice:

def say_hello():
  print("HELLO!")

def run_function_twice(my_function):
  """run_function_twice takes another function
  as an argument and runs that function twice"""
  my_function()
  my_function()

x = say_hello
run_function_twice(x)
# => prints "HELLO!" "HELLO!"

# Or simply:
run_function_twice(say_hello)
# => prints "HELLO!" "HELLO!"

Translating first-class functions into Ruby

(If you don't care about the specifics of Ruby and only care about the general principles of first-class functions, you can safely skip this section)

Strictly speaking, Ruby doesn't have first-class functions. You can't assign a method to a variable in the same way that you can in Python:

def say_hello()
  puts("HELLO!")
end

# In Python this would point `x` at the
# `say_hello` function:
x = say_hello
# But in Ruby it executes `say_hello`
# immediately and prints "HELLO!"

However, even though Ruby doesn't have true first-class functions, it does have a concept called a Proc, which is a different, slightly more verbose way of achieving the same thing (you don't need to worry too much about the details of Procs unless you're interested and want to look them up). In order to perform dependency injection and produce the same behavior as our Python example above, we need to:

  • Wrap the function we want to pass around inside Proc.new

  • Invoke the Proc using .call(), instead of simply ()

For example:

def say_hello()
  puts("HELLO!")
end

def run_proc_twice(my_proc)
  my_proc.call()
  my_proc.call()
end

x = Proc.new { say_hello() }
x.call()
# => prints "HELLO!"

run_proc_twice(x)
# => prints "HELLO!" "HELLO!"

Don't worry too much about the syntactic specifics for now - if the general idea of dependency injection and passing around logic as variables makes sense then you're doing great. If it doesn't then please do let me know.

How does any of this help us?

Recall the second band-aid solution that we proposed to our problem: pass parse_log_file a string describing the processing action that we want to perform, and then use a big if-statement inside parse_log_file to pick out and execute the logic that corresponds to this string.

This solution was OK, but how about we skip out the middleman? Instead of passing in a string that tells our function which logic to run, how about we pass in the logic itself in the form of a Proc?

At the top-level, this might look like this:

update_stats_proc = Proc.new { |m| update_stats(m) }
parse_log_file(fname, update_stats_proc)

The parse_log_file function now accepts 2 arguments - a filename, and a Proc containing the logic that parse_log_file should perform whenever it finishes parsing a message. This means that if we want to add new functionality to our system, for example to parse messages and add them to a database, we can reuse the existing parse_log_file method and pass in a different Proc. Excitingly, we won't have to update a single line inside parse_log_file.

write_to_db_proc = Proc.new { |m| write_to_db(m) }
parse_log_file(fname, write_to_db_proc)

Side-note: Ruby has a more elegant "block" syntax that we can use to write the same logic in a more stylish way:

parse_log_file(fname) do |m|
  update_stats(m)
end

(Google "ruby block" and "ruby yield" for more details)
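If you'd like a tiny, self-contained illustration of blocks and yield before we carry on (the names here are made up for the example and aren't part of Adarsh's program):

def each_greeting
  # Passes each greeting to the block supplied by the caller.
  ["HELLO!", "GOODBYE!"].each do |greeting|
    yield(greeting)
  end
end

each_greeting do |g|
  puts(g)
end
# => prints "HELLO!" then "GOODBYE!"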

Now that we've fleshed out how we want our program to look, all that remains is to implement the individual pieces that we've designed. We want parse_log_file to be a function that takes 2 arguments: a filename and a message-processing Proc. It should read the given file, parse WhatsApp messages from it, and invoke the message-processing Proc on every message as it is parsed.

Something like this ought to do the trick:

def parse_log_file(filename, message_processing_proc)
  current_message = nil
  File.foreach(filename) do |l|
    parsed_line = parse_log_line(l)

    if parsed_line[:new_message]
      if !current_message.nil?
        # If the line starts a new message, invoke
        # message_processing_proc on current_message
        # to process it.
        message_processing_proc.call(current_message)
      end
      current_message = parsed_line
    else
      current_message[:body] += parsed_line[:body]
    end
  end

  # Manually invoke message_processing_proc on the
  # final message.
  message_processing_proc.call(current_message)
end

# Use a global variable so that update_stats can see and update the
# count (a plain local variable wouldn't be visible inside the method).
$message_count = 0
def update_stats(message)
  $message_count += 1
end

update_stats_proc = Proc.new { |m| update_stats(m) }
parse_log_file(fname, update_stats_proc)

And there we have it - a streaming pipeline with the different steps of the pipeline as cleanly separated as they were in the batch pipeline.

Future work

We can use first-class functions and dependency injection to make our code even more powerful and more modular.

Configurable batch sizes

In the previous edition of PFAB, we mentioned that it is possible to combine streaming and batch approaches. We've so far thought of "streaming" as loading and processing one record at a time, and "batch" as loading and processing every record at once. However, we can also load and process "some" records at a time, in batches of a configurable size.

There may be interesting tensions at play in our choice of batch size. We might want to keep our batches small to reduce the amount of memory that we use (see last week's PFAB). But we might also want to keep our batches large to reduce the number of separate times we have to load data from our data source. In between these competing priorities there is likely a happy middle-ground. Where? That all depends.

Let's once again take our top-down approach, and consider how we would like our batch-size code to look to other programmers who are using it. I would like it if our users simply passed in a batch_size parameter to our existing functions, and then we take care of the rest:

parse_log_file(fname, batch_size) do |m|
  update_stats(m)
end

parse_log_file's contract then becomes "you give me a filename, a message-processing function, and a batch size. I'll load messages from your file in batches of batch_size, and run each record through the message-processing function."

Go back to the previous definition of parse_log_file and think about how you might update it to work with a batch-size.

More modularity

The parse_log_file function is currently responsible for both:

  • Loading log data from a file

  • Parsing log data into messages

For our current purposes, it's not a problem that parse_log_file has two jobs. However, suppose that we wanted to extend our system to work with additional data sources. In this expanded system, users can load their log data from not just a file, but also from a database or a website. The data in these new sources is formatted in the same way in each new source type as it is in the log file. We therefore want to be able to reuse our log-parsing code, and just want to be able to configure the source from which the logs are originally loaded.

We could write a band-aid solution to this problem, similar to those proposed for our earlier problems. We could make multiple functions called parse_log_database, parse_log_website, and parse_log_file, although this would require us to duplicate some code. Or we could pass a source_type argument into parse_log_file that indicated whether the function should load its data from a database, website, or file.

However, I would prefer to create additional modular building blocks that users of our code can string together themselves, in the same way that users can already string together parsing and processing. I'd like our streaming code to look like this:

load_data_from_db(dbname) do |line|
  parse_and_transform_line(line) do |message|
    update_stats(message)
  end
end

The new component in the mix is load_data_from_db. Recall how our line-parsing function (now called parse_and_transform_line) works - it parses lines one-by-one, and passes each line into a user-provided Proc for further processing. In the same way, load_data_from_db loads lines from a database, and passes each line into a user-provided Proc for further processing.

This allows us to use the same line-parsing function, whether we're reading the original data from databases or websites. At the same time, we can seamlessly swap out the piece of the pipeline that deals with loading data. In fact, now every step of our pipeline can be replaced and customized, independent of all the other steps. For example, if we wanted to load our data from a website instead of a database, and to translate the messages into French instead of calculating statistics, we could swap out the first and last components of our pipeline and write:

load_data_from_website(url) do |line|
  parse_and_transform_line(line) do |message|
    translate_message(message, 'french')
  end
end

Once again, implementing these postulated functions is left as a difficult exercise for another day or an especially motivated reader. This process is nonetheless a good example of the power of top-down design.

Should you actually write code like this?

Probably not. At least, not yet.

I've been somewhat less than honest in my presentation. The top-level code that I've sketched out is all very clean and delightful, but the lunch that it gives you is not free. Despite my hand-waving, actually implementing all of those methods will get complex quickly.

If you're writing a library that will be used by many other programmers then this is just part of your job. It's worth stuffing as much complexity as you can inside your library in order to make it as easy as possible for people on the outside to use. But if you're writing a small project that is currently only worked on by you and maybe one or two other people, embellishments like this might be more trouble than they're worth.

In particular, one of the biggest risks in erecting a framework inside a project is that it might turn out to be the wrong one. Adding new functionality to a framework that isn't designed to handle it is much harder than adding functionality to a project with no framework at all.

For example, all of the above code has assumed that message records are processed one at a time, in a complete vacuum. This sounds like a reasonable assumption. But suppose that Adarsh (the original author of this program) wanted to be able to measure the time difference between messages. This will require us to be able to peek back at past messages in order to read their timestamps, which is not a feature that our current framework has contemplated. Adding it to our now-opinionated framework will be substantially harder than adding it to a more freeform, unstructured program.

Don't undervalue flexibility. Every clever flourish that you add to your code makes some use-cases easier to handle, but probably also makes others much harder. When you know for sure(ish) the types of tasks that your system will and won't need to handle, then you can start to make assumptions and specialize. Until then, keep your options open, even if it makes your code a little more ragged around the edges for the time being.

Keep reading:

PFAB #9: Batch vs Stream processing

This post is part of my "Programming Feedback for Advanced Beginners" series, which helps you make the leap from knowing syntax to writing clean, elegant code. Subscribe now to receive PFAB in your inbox, every fortnight, entirely free.

(You can also read this post on my blog)

How many WhatsApp messages do you think you've exchanged with your best friend? Adarsh Rao, PFAB reader, wanted to find out. He downloaded his WhatsApp message logs and wrote a program to analyze his conversations. He found that he and his best friend had sent each other a respectable 110,000 words. "We've essentially written 2 novels between us if my Google search for 'average novel length' is at all accurate," he notes.

Adarsh's program already works perfectly (here's the code). But he wants to know how he can make it cleaner and tidier, and I've got a couple of suggestions for him. This week, I'd like to talk about the high-level structure of Adarsh's program. In particular I'd like to discuss a common tradeoff in data processing: batch or streaming? Today we'll learn what these words mean, how we can use them to construct data processing pipelines, and how to evaluate the tradeoffs between different ways of solving problems.

Batch or streaming?

A WhatsApp message log file looks like this:

7/28/19, 10:56 PM - Kimi: Hi. This is a business enquiry requesting a picture of your Dog.
7/28/19, 10:56 PM - Gaurav: Hi, yes
7/28/19, 10:57 PM - Gaurav: Thank you for getting in touch with us
7/28/19, 10:57 PM - Gaurav: We are assigning a representative to fulfil your request
7/28/19, 10:57 PM - Gaurav: Please hold

You can export your own WhatsApp chat history by using WhatsApp's "export" feature. In order to analyze a log line from his file, Adarsh needs to:

  1. Load a log line from the file (we'll start by assuming that each log line corresponds to a single message, although we'll see later that this is not always the case)

  2. Parse the line to pull out the name, date, time, and contents of the message (a rough sketch of this step appears just after this list)

  3. Calculate some statistics about the message - how long is its body, roughly how long did it take to type, etc?

  4. Combine these individual message statistics into aggregate statistics about the total number, average length, etc. of messages in the conversation
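Here's the rough sketch of step 2 promised above. The regex and field names are my own illustration rather than Adarsh's actual code; the important detail is that lines without a timestamp prefix are flagged as continuations of the previous message:

LOG_LINE_REGEX = %r{\A(\d+/\d+/\d+), (\d+:\d+ [AP]M) - ([^:]+): (.*)\z}

def parse_raw_log_line(line)
  match = LOG_LINE_REGEX.match(line.chomp)
  if match
    {
      date: match[1],
      time: match[2],
      name: match[3],
      contents: match[4],
      start_of_new_message: true
    }
  else
    # No timestamp prefix, so this must be a continuation
    # of the previous message.
    { contents: line.chomp, start_of_new_message: false }
  end
end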

Adarsh has a choice to make for how he processes his log lines: batch or streaming?

In a batch approach, Adarsh would perform each of these 4 steps on a large number of lines in his log file - possibly even all of them - before moving onto the next step. He would load all log lines from the log file. Then he would parse all of the messages into their name, date, time and contents. Then he would calculate statistics for every parsed message, and then finally combine them all into aggregate statistics.

+-------+       +-------+       +-------+
|       |       |       |       |       |
|       |       |       |       |       |
|       |       |       |       |       |
| Load  +------>+ Parse +------>+Analyze|
|       |       |       |       |       |
|       |       |       |       |       |
|       |       |       |       |       |
+-------+       +-------+       +-------+

In a streaming approach, Adarsh would do the opposite. He would load one line from the input file at a time. After loading a single line, he would immediately parse that line into its name, date, time, and contents. Then he would calculate statistics for just that line, and fold those statistics into his running aggregate statistics. Then he would go back to the start and start fully processing the next line.

+-------+       +-------+       +-------+
| Load  +------>+ Parse +------>+Analyze|
+-------+       +-------+       +-------+
+-------+       +-------+       +-------+
| Load  +------>+ Parse +------>+Analyze|
+-------+       +-------+       +-------+
+-------+       +-------+       +-------+
| Load  +------>+ Parse +------>+Analyze|
+-------+       +-------+       +-------+

As is so often the case, neither batch nor streaming is intrinsically better than the other, and the appropriate approach depends entirely on the context. For example, you're more likely to prefer a streaming system when you're processing new data in realtime as it comes in, and are more likely to prefer a batch system for background systems that need to process big chunks of data at once. As we'll see next week, you often want to take a hybrid approach that gives you the best of both worlds.

Let's look at some of the advantages, disadvantages, and general properties of batch and streaming pipelines.

Batch

A big advantage of a batch approach is that (in my opinion) the code is usually simpler and easier to read and write. Each step in the pipeline can be pulled out into its own function, and the resulting code largely explains itself. For example, in Ruby:

raw_log = File.read("samplelog.txt")
parsed_messages = parse_raw_log(raw_log)
message_stats = calculate_stats(parsed_messages)

Each step in this pipeline can easily be unit-tested individually:

# (these variables would be filled in with actual data)
raw_log = "..."
expected_parsed_messages = [{...}, {...}]
actual_parsed_messages = parse_raw_log(raw_log)

if actual_parsed_messages == expected_parsed_messages
  puts("TEST PASSED!")
else
  puts("TEST FAILED!")
end

A second advantage of a batch approach is that it requires fewer trips to the original source of the data. This is not particularly relevant for our situation, since reading data from a file is a relatively fast operation that we don't need to worry about optimizing until our system reaches a much larger scale. However, suppose that we were instead loading our raw log data from a database. Suppose also that every query to this database takes at least 1 second to execute, since the program has to go through the niceties of establishing a new connection to the database, as well as sending each query on a roundtrip to the database and back. A full streaming approach, where we repeatedly query for a single record at a time (the equivalent of reading a single line from a file at a time) would make us pay this fixed overhead of 1 second for every single record, making our program very inefficient. By contrast, a batch process, in which we query for and process big chunks of data all at once, would allow us to amortize the fixed query cost over a large number of rows.
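A quick back-of-the-envelope illustration of that amortization, using the hypothetical 1-second-per-query figure from above:

records        = 1_000
query_overhead = 1.0 # seconds of fixed cost per query

one_at_a_time = records * query_overhead                 # 1 record per query
in_batches    = (records / 100.0).ceil * query_overhead  # 100 records per query

puts "one record per query:  #{one_at_a_time}s of overhead" # => 1000.0s
puts "100 records per query: #{in_batches}s of overhead"    # => 10.0s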

We don't even have to load and process every row at once in order to use a batch approach. Maybe we process our data by loading it in batches of 10, 100, 1000, however many records. In fact, when you think about it, streaming is just batch with a batch size of 1 (woah). We'll blur these lines even more next week, but for this week we'll stick with the pure dichotomy between batch and streaming.

One disadvantage of a batch approach is that it typically uses more RAM, or memory, to run than a streaming approach does. To see why, look again at the schematic batch pseudocode I laid out earlier:

raw_log = File.read("samplelog.txt")
parsed_messages = parse_raw_log(raw_log)
message_stats = calculate_stats(parsed_messages)

When this (or any other) program runs, it stores its internal state and data in memory, including the data that is stored in variables. In this program, we start by reading the entire log file all at once. We assign the result of this read to a variable called raw_log. If the log file is 10MB in size then it's reasonable to estimate that this variable will take up roughly 10MB of memory.

Next, we parse all of these raw log lines and assign the result to another variable, parsed_messages. Since the data in parsed_messages is essentially the same as that in raw_log but in a different form, parsed_messages probably takes up about the same amount of memory again as raw_log. We're therefore using at least 20MB of memory to process a 10MB file. This isn't the end of the world, but it is a shame.

Let's therefore see how streaming works and how it is able to use less memory.

Streaming

A pseudo-code sketch for a streaming approach might look something like this:

stats = {}
File.open("samplelog.txt").each_line do |l|
  message = parse_raw_log_line(l)
  stats = add_message_to_stats(message, stats)
end

Ruby's each_line method reads the open file one line at a time, and passes each line into the block (the code between do and end) to be processed. each_line doesn't read another line from the file until the block has finished executing for the previous one.

Inside the block we use parse_raw_log_line to parse the single line into a single message, and then use add_message_to_stats to add the message into our aggregate stats-so-far. Once the block has finished executing, the Ruby interpreter (the program that executes our Ruby code) is able to throw away (or garbage collect) the data for both the raw line and processed message, since it can see that the program won't reference them again. This means that the Ruby interpreter can reuse the piece of memory in which they were stored to store future values instead.

Our streaming program therefore requires much less memory to run than our batch one. It doesn't need to simultaneously hold in memory the entire raw contents of the file, plus the entire processed output. Instead, at any one time it only needs to store a single raw line and a single processed message.

Note that my claims of memory optimization are a high-level simplification, and the exact behavior of the Ruby interpreter is complex and hard to predict. However, we can still say with confidence that the streaming program will use less memory than the batch one, even if we can't say exactly how much less without measuring.

Code simplicity

"You said that streaming code is often more complex than batch code, but that streaming code snippet you wrote just now didn't look too bad," I hear you say. No, but that was on easy mode. As I mentioned briefly at the start of this post, WhatsApp message logs aren't arranged neatly, one per line. Instead, if a message body contains a newline, the message logs will contain a newline too.

7/28/19, 11:07 PM - Gaurav: I'm going to write a message split over many lines.
This is a new line but the same message.
This is another new line.
I'm out of control.
7/30/19, 11:03 PM - Kimi: Well I never.

This unfortunate fact messes up our nice simple rule of "one log line per message", and makes both our batch and streaming approaches more complex. Now when we're processing a log line, we don't immediately know whether it is the end of the message or whether the message will be continued on the next line. We only know that a message has ended when we process the next log line and find that a new message has started. Then we have to go back to the message that we were previously assembling, and add it to our stats and do whatever other processing we want to do on it, before resuming our parsing of our new message.

(It may be possible to simplify this logic somewhat by going through the file backwards. Why this might help is left as an exercise for the reader)

Nonetheless, it is entirely possible to incorporate this new logic into a streaming approach. Here's a reasonable, if still somewhat gnarly, attempt:

stats = {}
current_message = nil
File.readlines("samplelog.txt") do |l|
  # This method defined elsewhere
  parsed_line = parse_raw_log_line(l)

  # If this line is a new message, add the previous
  # message to our stats and reset current_message
  # to be this line.
  if parsed_line[:start_of_new_message]
    if !current_message.nil?
      stats = add_message_to_stats(current_message, stats)
    end
    current_message = parsed_line

  # If this line is not a new message, it must be a
  # continuation of the previous line's message. We
  # therefore add its contents to the current message.
  else
    current_message[:contents] += parsed_line[:contents]
  end
end

# We need to manually add the final message to our stats,
# since it won't be added by the above loop (urgh)
stats = add_message_to_stats(current_message, stats)

I'm starting to get nervous. Previously the sections of our code that parse messages and calculate message statistics were entirely separate. This made them easy to understand and to write unit tests for. However, in this new version of our code, they have become deeply entwined. add_message_to_stats is now buried inside a double-nested if-statement inside the code that parses messages. It's now hard to assess at a glance whether our code is doing approximately the right thing, and hard to write neat unit tests that just test message parsing.

As we will see next week, it is in fact possible to write streaming code that handles multi-line messages without any sacrifice of modularity. It just requires a bit more work and imagination.

By contrast, updating our batch code to handle multi-line logs is easy. As a reminder, here's our current, high-level batch structure again:

raw_log = File.read("samplelog.txt")
parsed_messages = parse_raw_log(raw_log)
message_stats = calculate_stats(parsed_messages)

In order to get this code to handle multi-line messages we will have to make roughly the same updates that we made to our streaming code. However, this time we can make all of the updates inside our parse_raw_log function. The interface of the function won't need to change at all - it will still accept the full raw log text, and still return an array of parsed messages. The only thing that will change is the internal logic that does the parsing. The rest of our program, in particular the calculate_stats function, won't have to know or care that anything has changed.
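For the curious, here is a hedged sketch of how parse_raw_log's internals might change (parse_raw_log_line is the same helper assumed elsewhere; callers of parse_raw_log don't change at all):

def parse_raw_log(raw_log)
  messages = []
  raw_log.each_line do |l|
    parsed_line = parse_raw_log_line(l)
    if parsed_line[:start_of_new_message]
      messages << parsed_line
    else
      # Continuation line: glue its contents onto the most recent message.
      messages.last[:contents] += parsed_line[:contents]
    end
  end
  messages
end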

So what should Adarsh do?

If I were writing this program, I would start with a batch approach. The code is simpler and easier to get up and running, and even though it may be somewhat less memory-efficient, I don't believe that this actually matters. I assume that we're processing the WhatsApp message logs for a single person with, at most, a few hundred thousand messages. I also assume that we're running the program on a modern, relatively powerful computer. If both of these assumptions are correct then the fact that our code is slightly inefficient won't cause us any problems. Our computer will have plenty of memory to spare, and finishing our calculations in 28.5 seconds rather than 28.3 seconds won't materially affect our lives. In this situation I am happy to make my code somewhat less efficient if this also makes it easier to write, read, and understand.

However, suppose that this script continues to evolve. Eventually it becomes the backbone of a system inside a squillion-dollar company, responsible for processing and analyzing billions of message logs from millions of users. Suddenly resource efficiency becomes critical. Speeding up the code and cutting down its memory usage could save us hundreds of hours and thousands of dollars of computation time. Faced with this new set of priorities, we may well want to switch to a more efficient streaming approach, or (more likely) a batch-streaming hybrid. So how could we modify our streaming code so as to make it as pleasant to work with as possible?

Find out in next week's edition of Programming Feedback for Advanced Beginners.
