Learn something new about programming

It’s been a busy week and I haven’t had a chance to write anything, so instead I’ve scoured my archives and put together a menu of past posts that help teach you something new about computers. All of these posts are intended to be accessible to advanced beginners, so if you’ve ever wanted to know more about how the systems inside a large company work, or what exactly Tor is, or how to snoop on the apps and websites that snoop on you, then settle down and have a read.

Project ideas

Programming inside a large company

Using an HTTP proxy to snoop on apps and websites

Networking and security

Programming odds and ends

Questions? Comments? I’d love to hear from you.

Rob

PFAB #15: Don't overwork your functions


Last time on Programming Feedback for Advanced Beginners, we analyzed a data processing script written by PFAB reader Frankie Frankleberry. Frankie's program helps his company scrutinize its product ranges to see if any items are mispriced. The program loads a big CSV of product data and flags any products meeting certain criteria, such as those with particularly low sales prices or profit margins. The company presumably uses the program's output to try to charge more money for the same stuff.

In that episode we looked at how Frankie could use first-class functions to avoid having to use the evil eval function. This week we're going to look at some of his functions that have become overworked and responsible for too many different tasks. We'll see how we can take some weight off of them by migrating them into classes, and by the end of the episode Frankie's program will be looking positively cromulent.

You can read both Frankie's original program and my refactored version on GitHub (and here's a commit showing just the changes). The code is written in Python, but the lessons are applicable to any language. If you haven't read the previous episode of PFAB in which we started analyzing Frankie's program, start there.

Describing filters

Frankie's program uses "filters" to select the subset of his input data that matches certain criteria. For example, a filter could select all products in the luxury category with a price below $50. In order to document his filters for his users, Frankie wants to associate each filter function with a plain-English description of what the function does. This might be used as follows:

$ python3 filter.py list-filters
######################
## 7 active filters ##
######################

1. low_price_products: Find all products that cost less than $2
2. low_selling_items: Find all items that have not sold any
    copies in the last week
3. ...etc...

In his code, Frankie wants each filter's description to be directly attached to the function containing its logic. He doesn't want the functions to live in one part of his code and the descriptions in another. He doesn't want to have to write code like the following snippet, in which one function called run_filter is responsible for executing filters, and another completely separate function called get_description is responsible for keeping track of their descriptions:

# `run_filter` runs the filter with the given filter name
# on the given input data.
def run_filter(filter_name, input_data):
    if filter_name == "low_price_products":
        return low_price_products(input_data)
    elif filter_name == "low_selling_items":
        return low_selling_items(input_data)
    # ...etc...

# `get_description` returns the description of the filter
# with the given name.
def get_description(filter_name):
    if filter_name == "low_price_products":
        return "Find all products costing less than $2"
    elif filter_name == "low_selling_items":
        return "Find all items that have not sold any copies in the last week"
    # ...etc...

input_data = load_data()
filter_name = "low_price_products"

print(f"Running function: {filter_name}")
print(get_description(filter_name))
run_filter(filter_name, input_data)

This code is unpleasant for at least two reasons. First, future programmers have to be careful to keep the two if/elif blocks in run_filter and get_description exactly in sync. This is not being checked or enforced, and it will be easy to forget to update a branch if a filter name changes or a new one is added.

Second, it's difficult for a programmer working on the code to see which description is associated with which function, because they have to hop back and forth between run_filter and get_description. It would be clearer and more robust if the description and logic for a function lived right next to each other and didn't need to be matched up by strings and parallel if-statements.

Uniting a filter's description and its logic is a worthy aim. But as we'll see, the way in which Frankie achieved this goal created new problems for him. Let's look at Frankie's solution, the problems it created, and how we can fix them.

Functions that do too much

Frankie bound his logic and descriptions together by requiring each filter function to accept a description_only boolean flag argument. If the function is called with description_only=False, the function filters the data as normal and returns the results as a dataset of some sort. But if it is called with description_only=True, it instead returns the description of the filter as a string. For example:

def low_price_products(input, description_only):
    if description_only:
        return "Find all products costing less than $2"
    else:
        # ... do some filtering of `input` ...
        return output_data

print(low_price_products(data, description_only=False))
# Prints the filtered dataset
# => [{"product_name": "Banana", ...

print(low_price_products(data, description_only=True))
# Prints the description of `low_price_products`
# => Find all products costing less than $2

This approach tightly links logic and description, as desired, but at a high cost. The filter functions (like low_price_products) have become overworked. You should aim to have each component of your code be responsible for a single thing. Frankie's functions return either the description string or the output dataset, depending on a flag. A function that returns different types of data depending on the input it is given is by definition responsible for too much.
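To make the cost concrete, here is a small, runnable sketch of Frankie's pattern. The filtering logic, the "price" field, and the sample products are made up for illustration. Because the function can return either a list or a string, a caller who passes the wrong flag gets no error at all, just a quietly wrong answer: len() happily counts the characters in the description instead of the matching products.

```python
def low_price_products(products, description_only):
    if description_only:
        return "Find all products costing less than $2"
    # Hypothetical filtering logic: assumes each product has a "price" key.
    return [p for p in products if p["price"] < 2]


# Made-up sample data.
products = [
    {"product_name": "Banana", "price": 0.5},
    {"product_name": "Caviar", "price": 90.0},
]

result = low_price_products(products, description_only=False)
print(len(result))  # 1: the number of matching products

# Pass the wrong flag by mistake and nothing complains...
result = low_price_products(products, description_only=True)
print(len(result))  # 38: the number of characters in the description!
```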

Functions should always return the same datatype

A rule of thumb:

Functions should always return the same data type (list, dictionary, integer, string, custom class, etc.), regardless of the arguments that they are given.

As with all things in life and software, there are exceptions. A function that takes a Twitter username and searches for their profile information could reasonably return either:

  • A dictionary of information if the username exists

  • Or None if the username doesn't exist

Anything fancier than this is likely a bad idea, because it means that your function is doing too much.
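In Python, this one acceptable exception can be spelled out with a type hint. The sketch below uses a hypothetical in-memory PROFILES dictionary and a hypothetical find_profile function in place of a real Twitter lookup; the point is that callers only ever have to handle two cases, a dict or None.

```python
from typing import Optional

# Hypothetical stand-in for a real profile lookup service.
PROFILES = {
    "example_user": {"display_name": "Example User", "followers": 42},
}


def find_profile(username: str) -> Optional[dict]:
    """Return the profile dict for `username`, or None if there is no such user."""
    # dict.get returns None when the key is missing.
    return PROFILES.get(username)


for name in ["example_user", "no_such_user"]:
    profile = find_profile(name)
    if profile is None:
        print(f"{name}: not found")
    else:
        print(f"{name}: {profile['display_name']}")
```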

An aside on dynamically- and statically-typed languages

How easy or hard it is to write functions that return different data types depends on the programming language you are using. Dynamically-typed languages like Python and Ruby make it easy (arguably too easy). A loose but still helpful definition of a dynamically-typed language is one in which you don't specify in your code the data types of variables and function inputs and outputs. For example, in Python you write code that looks like:

# We don't have to specify the type of `a` and `b`,
# or the type that `multiply` returns.
#
# New versions of Python do allow you to specify
# *type-hints*, but we can safely ignore type-hints
# for the sake of this discussion.
def multiply(a, b):
    return a * b

x = 3
y = 4
z = multiply(x, y)

By contrast, statically-typed languages like Go and Java require you to specify data types in your code (again, this is not the strict, textbook definition, but it's good enough). For example, you might rewrite the above Python code in Go as follows:

package main

import "fmt"

// We specify that `a` and `b` are integers, and that
// the return value of `multiply` is an integer too.
func multiply(a int, b int) int {
    return a * b
}

func main() {
    // Sometimes the language's compiler can infer
    // the type of a variable on its own.
    x := 3
    y := 4
    z := multiply(x, y)
    fmt.Println(z)
}

If we tried to write a filter function in Go that returned either a string or a dataset, the Go compiler would refuse to compile our program:

// XXX: we have to specify in advance the data type that
// `lowPriceItems` will return. This means that we
// can't have it return either a string or a Dataset -
// we have to choose one!
func lowPriceItems(input Dataset, descriptionOnly bool) ????? {
    if descriptionOnly {
        return "This filter returns low priced items"
    } else {
        // ... do some filtering of `input` ...
        return outputDataset
    }
}

There are ways around this restriction if we were determined enough, but they are unlikely to be a good idea. For example, we could make a type called FilterOutput with two fields called Description and Dataset. We could tell the Go compiler that the lowPriceItems function will return a FilterOutput object and set only the field corresponding to the type of data that we want to return:

// Define the FilterOutput type with 2 fields.
type FilterOutput struct {
    Description string
    Dataset Dataset
}

// Now `lowPriceItems` will always return a `FilterOutput`
// object, so we can set this `FilterOutput` as the
// `lowPriceItems` return type.
func lowPriceItems(input Dataset, descriptionOnly bool) FilterOutput {
    if descriptionOnly {
        // Return a `FilterOutput` with only the `Description`
        // field set.
        return FilterOutput{
            Description: "Find all products costing less than $2",
        }
    } else {
        // ... do some filtering of `input` ...
        //
        // Return a `FilterOutput` with only the `Dataset`
        // field set.
        return FilterOutput{
            Dataset: outputDataset,
        }
    }
}

This approach would work, but I would not recommend it, for the reasons already discussed. Instead, we should take the hint that the compiler is giving us and work out something cleaner.

Binding data together in a class

Frankie wants to tightly bind together a filter's logic and its description. Instead of squeezing them both into the same function, let's hang them off of a class.

We'll define a class called Filter. This class's constructor will take two arguments:

  • A first-class function containing the filter's logic (see PFABs #10 and #14 for more on first-class functions)

  • A description string

Filter will have a single method called apply. This will take a dataset, run it through the filter function that was given to the constructor, and return the result.

class Filter(object):

    def __init__(self, filter_f, description):
        self.filter_f = filter_f
        self.description = description

    def apply(self, input):
        return self.filter_f(input)

At the end of the previous PFAB, our code had a list of first-class functions through which it passed our input dataset:

functions = [
    low_profit_products,
    high_price_products,
    low_selling_products,
]

input_data = load_data()
for f in functions:
    print(f(input_data))

To migrate this code to use our Filter class, we'll convert the list of functions to a list of Filters, add descriptions, and have the code pass our input dataset through each filter in turn:

filters = [
    Filter(
        filter_f=low_price_products,
        description="Find all products costing less than $2"
    ),
    Filter(
        filter_f=low_selling_items,
        description="Find all items that have not sold any copies in the last week"
    ),
    # ...etc...
]

input_data = load_data()
for f in filters:
    print(f.description)
    print(f.apply(input_data))
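Here is a self-contained, runnable version of the pieces above. The two filter functions, their "price" and "sales_last_week" fields, and the sample products are hypothetical stand-ins for Frankie's real filters and for load_data(). Notice that apply always returns a list and description is always a string, whichever filter you pick.

```python
class Filter:
    """Binds a filter function to a plain-English description."""

    def __init__(self, filter_f, description):
        self.filter_f = filter_f
        self.description = description

    def apply(self, input_data):
        # Always returns whatever filter_f returns: a filtered list.
        return self.filter_f(input_data)


# Hypothetical filter functions; each product is assumed to be a dict
# with "price" and "sales_last_week" keys.
def low_price_products(products):
    return [p for p in products if p["price"] < 2]


def low_selling_items(products):
    return [p for p in products if p["sales_last_week"] == 0]


filters = [
    Filter(
        filter_f=low_price_products,
        description="Find all products costing less than $2",
    ),
    Filter(
        filter_f=low_selling_items,
        description="Find all items that have not sold any copies in the last week",
    ),
]

# Made-up sample data standing in for load_data().
input_data = [
    {"product_name": "Banana", "price": 0.5, "sales_last_week": 120},
    {"product_name": "Gold-plated stapler", "price": 250.0, "sales_last_week": 0},
]

for f in filters:
    print(f.description)
    print([p["product_name"] for p in f.apply(input_data)])
```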

This approach has several advantages. First, it splits up a filter's description and logic so that they are no longer squashed into the same function. filter_f always returns a dataset; description is always a string. Second, it gives us a framework on which we can hang additional properties of filters in the future, such as:

  • The name of the filter

  • The name of the team who owns the filter

  • Who to email if the filter breaks

  • Where the filter should load its input data from

Eventually we may find ourselves with so many extra properties on Filter that we split them off into smaller classes like DataSource and FailureNotifier. But that's a story for another day.

In summary

Don't write functions that return drastically different data types. If you need to tightly couple several concepts or pieces of data together, consider hanging them off of a class instead.

Questions? Suggestions?

I’d love to hear from you - feel free to email me or DM me on Twitter.

PFAB #14: Evil `eval`

Programming Feedback for Advanced Beginners reader Frankie Frankleberry writes:

Here's a program I wrote recently. It works, but I'm really not sure if my code is "proper". I use Python's eval function, which never feels like a good idea. There might be a nicer way to do it...?

Frankie is absolutely correct; using the eval function is never a good idea. Fortunately, he's also correct that you almost never have to use it, and there are almost always better options available. In this post we'll learn what the eval function does, why it's wonderful and amazing, and why you should never, ever use it. We'll also see how we can rewrite Frankie's code using first-class functions to expunge the evil eval altogether. Frankie's code is written in Python, but the lessons are applicable to code written in many other languages.

What does Frankie's program do?

Frankie's program is a data processing script that helps his business analyze its product ranges to see if any of them are mispriced. The program loads a big CSV of product data and flags any products meeting certain criteria, such as those with particularly low sales prices or profit margins. The company presumably uses the program's output to try to charge more money for the same stuff.

                    +---------------------+
+------------+  >-->+low_profit_products  +-->  +------------+
|            |  |   +---------------------+  |  |            |
|Product Data+--+-->+high_price_products  +--+->+   Output   |
|            |  |   +---------------------+  |  |            |
+------------+  >-->+low_selling_products +-->  +------------+
                    +---------------------+

You can read Frankie's code on GitHub, as well as my refactored version (here's his original code, here's the updated version, and here's a commit showing just the changes). In order to understand the changes that I made, we need to first understand Python's eval function.

What does the eval function do?

Python's eval function takes a string and evaluates it as a Python expression. For example:

# ==== EXAMPLE 1 ====
inp1 = "5"
inp2 = "8"
operator = "+"

# `eval` evaluates the string "5+8"
# and returns the result.
x = eval(inp1 + operator + inp2)

print("x is: " + str(x))
# => x is: 13

# ==== EXAMPLE 2 ====
# Python has no built-in `reverse` function,
# so we define a simple one for this example.
def reverse(items):
  return items[::-1]

function_name = "reverse"
l = [1, 2, 3, 4]

# `eval` evaluates the string "reverse([1, 2, 3, 4])"
# and returns the result.
y = eval(function_name + "(" + str(l) + ")")

print("y is: " + str(y))
# => y is: [4, 3, 2, 1]

Here's a rough outline of Frankie's code. It uses eval to loop through a list of filter functions:

def low_profit_margin_products(data):
  # ...do some stuff and return a subset of data...

def low_sales_price_products(data):
  # ...do some other stuff and return another subset of data...

function_names = [
  "low_profit_margin_products",
  "low_sales_price_products",
]
dataset = load_data()

# This calls each function in function_names
# on our dataset.
outputs = []
for fn in function_names:
  # Evaluates strings like "low_profit_margin_products(dataset)"
  # and adds the result to `outputs`.
  outputs.append(eval(fn + "(dataset)"))

print(outputs)

eval-like methods exist in most other interpreted languages too, like Ruby and JavaScript. They allow you to dynamically construct the code of your program. They are flexible, powerful, and fun to work with, and you should never ever use them.

Why is eval dangerous?

eval is dangerous because it can make your code insecure. The above eval example snippet is, in the exact form that it is currently written, technically fine. If you used it as part of a real website or other system, it would not introduce any immediate vulnerabilities. But the eval would still be lurking there, waiting for an innocuous-seeming change to turn it into a gaping flaw.

Here's a plausible story about the future. Suppose that Frankie's system keeps growing and adding new features. It becomes so useful that his company releases it as a standalone product that other organizations can use to analyze their own data. Frankie adds a UI in which users can select the filters that they want to run on their data. He asks users for the list of function_names that they want to run, and swaps that list in for the current, hard-coded function_names variable. His new code looks something like this:

function_names = get_function_names_from_user_input()
dataset = load_data()

outputs = []
for fn in function_names:
  outputs.append(eval(fn + "(dataset)"))

Very elegant, but very, very insecure. To see why, think about what would happen if a user passed in a function name of:

print('hello world') or low_profit_margin_products

The code would assemble and then run the following string as code:

print('hello world') or low_profit_margin_products(dataset)

This line would return the low profit margin products, as per usual, but before it did so it would execute print('hello world'). (Because print returns None, the or falls through to the real filter call, so the output looks perfectly normal.) Printing hello world isn't going to bring down Frankie's company, but an attacker could use the same technique with a function name of:

exec('import shutil; shutil.rmtree("/")') or low_profit_margin_products

to erase Frankie's server's hard drive. That would ruin quite a few people's days.
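Here is a harmless, self-contained reproduction of the first attack. The filter function, its "margin" field, and the dataset are hypothetical; the loop is the same eval-based loop as above. The injected string joins the two calls with or: print returns None, so Python goes on to evaluate the real filter call (with and, it would stop at the None and never run the filter at all).

```python
# Hypothetical filter: products whose profit margin is under 10%.
def low_profit_margin_products(data):
    return [row for row in data if row["margin"] < 0.1]


# Made-up sample data.
dataset = [
    {"product_name": "Banana", "margin": 0.05},
    {"product_name": "Caviar", "margin": 0.6},
]

# Pretend this list arrived from a user-facing form.
function_names = ["print('hello world') or low_profit_margin_products"]

outputs = []
for fn in function_names:
    # Runs "print('hello world') or low_profit_margin_products(dataset)",
    # which prints "hello world" and then returns the filtered rows.
    outputs.append(eval(fn + "(dataset)"))

print(outputs)
```

Nothing in the output hints that user input just ran arbitrary code, which is exactly what makes this class of bug so dangerous.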

The problem with eval is that it risks allowing attackers to craft malicious input (such as the above) that tricks your program into executing harmful code. This is not a theoretical threat; an attacker trying to exploit your system will often try feeding it a long list of sneaky inputs, designed to take advantage of insecure usages of tools like eval. Even if a program uses eval in a way that is technically safe today, it adds a subtle booby trap that future programmers might unwittingly stumble into when they update the code. You want your code to be secure, robust, and difficult to accidentally break.

As well as being a security risk, eval makes your code difficult to understand and work with. For example, suppose that you write several methods to work with "reports" called create_report, delete_report, and update_report. To reduce duplication in your code, you decide to use eval to wrap the functions up inside a single perform_report_action method, like so:

def perform_report_action(action_type):
  """
  action_type is either "create", "delete" or "update".
  """
  # Debug statement
  print("Performing report action: " + action_type)

  # Check that the current user is allowed to
  # perform this action
  if not current_user_has_permission_for_action_type(action_type):
    raise Exception("You are not authorized to perform this action!")

  # Save a database record saying that the action was performed
  # for auditing purposes.
  record_action_audit_log_in_database(action_type)

  # Use `eval` to actually execute the appropriate
  # action method
  return eval(action_type + "_report()")

c = perform_report_action("create")
d = perform_report_action("delete")
u = perform_report_action("update")

This fancy code works and saves you from repeating the code that performs the permission check and audit log for each report action. However, a few months later you decide to add some extra arguments to the create_report method. In order to make this change you'll need to update every existing usage of create_report(). You search through your project for the string "create_report". However, because of your previous cleverness, the string doesn't appear at any of the places where the method is actually called, so your search turns up nothing but the method's own definition. This makes it difficult for you to figure out where create_report is used, or even whether it is still used at all. You either give up and move onto something else, or make your change and accidentally break your system.
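One eval-free way to keep the shared permission check and audit log without hiding the method names is a dictionary that maps each action type to the function itself. In this sketch the three report functions are stubs, the REPORT_ACTIONS name is hypothetical, and the permission and audit helpers are left out. Because every method is referenced by name, a search for create_report now finds the place where it is used, and an unknown action type fails loudly instead of being turned into code.

```python
# Hypothetical stubs standing in for the real report methods.
def create_report():
    return "created"


def delete_report():
    return "deleted"


def update_report():
    return "updated"


# Each method is named explicitly here, so a project-wide search
# for "create_report" finds this line.
REPORT_ACTIONS = {
    "create": create_report,
    "delete": delete_report,
    "update": update_report,
}


def perform_report_action(action_type):
    """action_type is either "create", "delete" or "update"."""
    action = REPORT_ACTIONS.get(action_type)
    if action is None:
        raise ValueError(f"Unknown report action: {action_type!r}")
    # The permission check and audit log from the original version
    # would go here, unchanged.
    return action()


print(perform_report_action("create"))
```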

In summary, never use eval or any method like it.

Why did Frankie use eval?

Frankie is a smart guy. In his email to me he even noted that he didn't like the eval function. So why did he use it?

Frankie had good intentions. He wanted to avoid writing repetitive code like this:

fn = 'example-data.csv'
data = load_data(fn)

results1 = low_profit_products(data)
analysis1 = do_analysis(results1)

results2 = high_price_products(data)
analysis2 = do_analysis(results2)

results3 = low_selling_products(data)
analysis3 = do_analysis(results3)

# ...and so on...

He didn't like the way that this approach would require him to copy and paste several lines every time he wanted to add a new filter function to his program. Think about a similar situation - it's easy to pass multiple inputs through one function using a for-loop:

# Multiple inputs
animals = ["cat", "dog", "horse", "monkey"]
for a in animals:
  # One function
  process_animal(a)

Shouldn't it be just as easy to pass one input through multiple functions?

It is, but it doesn't require the use of eval or anything like it. Instead, we can use first-class functions. We've talked about first-class functions in a previous PFAB, but here's a brief refresher.

First-class functions

You learn very early on in your programming career that you can use a variable to store the output of a function:

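def reverse(items):
  # Python has no built-in `reverse` function,
  # so we define a simple one for these examples.
  return items[::-1]

reversed_list = reverse([1,3,5,7,9])
print(reversed_list)
# => [9, 7, 5, 3, 1]

However, in Python you can also use a variable to store a function itself:

f = reverse
reversed_list = f([1,3,5,7,9])
print(reversed_list)
# => [9, 7, 5, 3, 1]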

Frankie's code contains a list of function name strings. He uses these names to call the corresponding functions using eval. Here are the relevant lines again:

function_names = [
  "low_profit_products",
  "high_price_products",
  "low_selling_products",
]

outputs = []
for fn in function_names:
  outputs.append(eval(fn + "(dataset)"))

We can remove the need for eval by storing a list, not of function names, but of references to the functions themselves. We can iterate through this list using a for-loop, exactly as above, passing our dataset into each function in turn. This might look something like this:

functions = [
  low_profit_products,
  high_price_products,
  low_selling_products,
]

outputs = []
for f in functions:
  outputs.append(f(dataset))

This version is much safer, and is even easier to read too. If we want to add a new filter function that performs a new analysis, all we have to do is add it to our list of functions. The for-loop takes care of the rest, no eval-ing or copy-pasting required.
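Here is a runnable sketch of that loop. The three filter bodies, their "price", "cost" and "sales_last_week" fields, and the sample dataset are all hypothetical. As a bonus, every Python function object carries its own name in its __name__ attribute, which makes it easy to label each output without keeping a separate list of strings.

```python
# Hypothetical filters; each product is assumed to be a dict with
# "price", "cost" and "sales_last_week" keys.
def low_profit_products(data):
    return [p for p in data if p["price"] - p["cost"] < 1]


def high_price_products(data):
    return [p for p in data if p["price"] > 100]


def low_selling_products(data):
    return [p for p in data if p["sales_last_week"] == 0]


functions = [
    low_profit_products,
    high_price_products,
    low_selling_products,
]

# Made-up sample data.
dataset = [
    {"product_name": "Banana", "price": 0.5, "cost": 0.3, "sales_last_week": 120},
    {"product_name": "Caviar", "price": 150.0, "cost": 60.0, "sales_last_week": 0},
]

outputs = {}
for f in functions:
    # f is the function itself, so we can call it directly and
    # read its name for labeling.
    outputs[f.__name__] = [p["product_name"] for p in f(dataset)]

print(outputs)
```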


Any time you think you need to use eval or any other method that evaluates a string as code, stop and think. There will almost certainly be another way to do what you want that is safer and clearer. You could go through an entire 40-year career as a programmer without using any methods like this in production code and you'd almost certainly have been doing it right.

First-class functions are wonderful. Passing around logic in the same way as any other value opens up a whole new world of elegant code. If you read my full refactored version of Frankie's program, you'll see that I took this concept even further and wrapped up each filter function inside a Filter object. Next time on PFAB we'll talk about why.


PFAB #13: When code is too clever to be clean

I just finished reading Robert Martin's Clean Code, one of the better-selling programming books of all time. I agreed with the vast majority of its recommendations, but that's not interesting. If you want to hear about the parts that I agreed with then you might as well just read it yourself. In this post I'm going to quibble with both its overall tone and some of its specific recommendations.

I wished that Martin had talked more about why and when it can be correct to compromise on code cleanliness. It's one thing to know how to write good code given plenty of time and clear requirements. It's another to know what to do when the clock is ticking and you have no idea what tomorrow will bring. You should never call your variables x and asdf instead of preTaxSubTotal and authorName in order to save a few milliseconds of typing, but it can sometimes be right to skip over whole libraries of best practices in the name of speed or simplicity. Martin often feels like a clean-code dogmatist, fond of highfalutin words like "craftsmanship". But sometimes being a craftsman or an artisan or any other word that makes being a salaried employee sound cooler requires making tradeoffs.

Maybe Martin is sensibly leaving this aspect of programming as a story for another day. I'm sure that he has very nuanced opinions about technical debt and compromises, but this particular book is called Clean Code, not Clean Enough For Now Code. Maybe it's better to learn the best approach given infinite time, and make concessions from there.

Philosophy aside, I also disagreed with some of the book's specifics. Let's look at what I think is a clear example of trying to be too clever, on p128 of my paperback version. See who you agree with.

Trying to be too clever

In a past life, Martin worked on an environment monitoring system. He doesn't go into detail about its specifics, but it seems to have been responsible for measuring and controlling the temperature of buildings. It was written in Java, although this isn’t important for the story. Martin quotes a code snippet from one of the system's unit tests (pieces of code that run the main program and use assertions to verify that it behaves the way the programmer expects). Martin does not like the test's code:

@Test
public void turnOnLoTempAlarmAtThreshold() throws Exception {
  hw.setTemp(WAY_TOO_COLD);
  controller.tic();
  assertTrue(hw.heaterState());
  assertTrue(hw.blowerState());
  assertFalse(hw.coolerState());
  assertFalse(hw.hiTempAlarm());
  assertTrue(hw.loTempAlarm());
}

Martin writes:

Notice, as you read the test, that your eye needs to bounce back and forth between the name of the state being checked, and the sense of the state being checked. You see heaterState, and then your eyes glissade left to assertTrue. You see coolerState and your eyes must track left to assertFalse. This is tedious and unreliable. It makes the test hard to read.

I'm not offended by these problems in the same way that Martin is. On the other hand, I didn't have to work in the codebase, and I could certainly believe that after reading and debugging 30 other near-identical tests, the style could begin to grate. Either way, I think that the cure that Martin proposes is worse than the disease:

I improved the reading of this test greatly by transforming it into [the following code]:

@Test
public void turnOnLoTempAlarmAtThreshold() throws Exception {
  wayTooCold();
  assertEquals("HBchL", hw.getState());
}

He continues:

[...] the thing to note is the strange string in the assertEquals. Upper case means "on", lower case means "off", and the letters are always in the following order: {heater, blower, cooler, hi-temp-alarm, lo-temp-alarm}. [...] Notice, once you know the meaning, your eyes glide across that string and you can quickly interpret the results. Reading the test becomes almost a pleasure.

This refactored version is much more compact than the original. But this terseness comes at the cost of converting all of those clear, explicit assert statements into a too-cryptic secret language. To the uninitiated, it is impossible to understand what this test is doing. Even the already-initiated don't have it much easier. Is hiTemp the first or second h? What does that B stand for again?

The situation gets worse if the system ever evolves. What if we add another property to the state, like hiHumidity? What if we remove one? Or rename it? Going through each HBchL-like string and removing, adding, or updating the exact right character will be a chore. Pretend that you're a new programmer making a small change to the system. You know roughly how it works, but don't have any deep experience working with it. You make your change, run the tests, and see:

turnOnLoTempAlarmAtThreshold FAILED

> assertEquals("HBchL", hw.getState())
Expected: HBchL
Actual:   hBcHL

Even if you were familiar with roughly what the error message is getting at, it would require a distressingly keen eye to quickly understand the specific problem at hand. Contrast this error message with one you might expect from the original version:

turnOnLoTempAlarmAtThreshold FAILED

> assertTrue(hw.heaterState());
Expected: True
Actual:   False 

We expected the heater to be on, but it was off. We can immediately start work figuring out why.

So what do you suggest, genius?

As I've already said, I'm not particularly offended by the original code. Without meaning to be snarky, in the real world I'd probably just leave it as it is and try to find something more important to work on. But if I were writing a book or a blog about clean code and wanted to spend as much time as necessary to make it as pleasant as possible, I'd represent the state of the environment using a more explicit data structure than a string.

I'd like the state data structure to be self-documenting, meaning that a reader can immediately see what the state is by reading the code. This requirement instantly rules out something trivial like an array of booleans - I don't like HBchL, but [true, true, false, false, true] would be much worse.

Much more reasonable would be to represent the state using a Map, giving us something like the following:

@Test
public void turnOnLoTempAlarmAtThreshold() throws Exception {
  wayTooCold();
  assertEquals(Map.of(
    "heater", true,
    "blower", true,
    "cooler", false,
    "hiTemp", false,
    "loTemp", true
  ), hw.getState());
}

The eye jumping that bothered Martin in the original version is gone. We use more lines and characters than his refactored version, but I claim that we get more than enough extra clarity to justify our troubles. However, I'm bothered by the lack of pro-active defence against a typo in the map's keys. I'd like us to warn the programmer if they accidentally write haeter or bowler instead of heater or blower. At the moment if the programmer makes this type of blunder then the test will fail - which is good - but it might take them a second (or an afternoon if they didn't sleep well last night) to figure out why.

To guard against typos, I'd like to wrap the state of the environment in a simple class called something like EnvironmentState. This class will expose properties for heater, blower, and so on, and will give us a single, centralized place in which to validate that any state we construct is valid. I'd ideally like to use named parameters to guard against typos. Calling a function with named parameters means that you explicitly say which parameter each argument you pass should be assigned to, instead of relying on their order. Java doesn't support named parameters, but in Python we might write:

state = EnvironmentState(
  heater=True,
  blower=True,
  cooler=False,
  hi_temp=False,
  lo_temp=True,
)

Compare this to the alternative:

state = EnvironmentState(
  True,
  True,
  False,
  False,
  True,
)

The former is easier to read, and doesn't require us to remember the exact order of the parameters. If someone tries to pass in haeter=True, Python will say "there's no such argument as haeter". The programmer immediately knows exactly what has gone wrong.
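One way to write such a class in Python is sketched below; the class body is an assumption, since the post only shows how it is called. The bare * in the signature makes every parameter keyword-only, so Python rejects the positional version above as well as the haeter typo, and both fail with a TypeError whose message points at the problem.

```python
class EnvironmentState:
    """Snapshot of the environment controller's outputs (hypothetical sketch)."""

    # The bare `*` makes every parameter keyword-only, so callers must
    # write heater=..., blower=..., and so on.
    def __init__(self, *, heater, blower, cooler, hi_temp, lo_temp):
        self.heater = heater
        self.blower = blower
        self.cooler = cooler
        self.hi_temp = hi_temp
        self.lo_temp = lo_temp

    def __eq__(self, other):
        # Two states are equal when all of their properties match.
        return isinstance(other, EnvironmentState) and vars(self) == vars(other)

    def __repr__(self):
        fields = ", ".join(f"{k}={v}" for k, v in vars(self).items())
        return f"EnvironmentState({fields})"


state = EnvironmentState(
    heater=True, blower=True, cooler=False, hi_temp=False, lo_temp=True
)

try:
    EnvironmentState(
        haeter=True, blower=True, cooler=False, hi_temp=False, lo_temp=True
    )
except TypeError as err:
    # The message names the bad argument ("unexpected keyword argument 'haeter'").
    print(err)
```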

Since Java does not support named parameters, we need to find another way of helping the programmer realize when they have made a typo. I see two main ways to do this, depending on whether we want to represent our state using a Map or an EnvironmentState class.

If we want to stick with a Map, we could wrap the call to assertEquals inside our own method called assertStateEquals. assertStateEquals begins by checking that the given Map only contains valid properties. If it finds any invalid properties (e.g. haeter) it immediately throws an exception with a helpful error message. If it doesn't, it calls assertEquals on the states, as before. The code would look like this:

@Test
public void turnOnLoTempAlarmAtThreshold() throws Exception {
  wayTooCold();
  // Throws a helpful error if any key is misspelled, e.g. "haeter"
  assertStateEquals(Map.of(
    "heater", true,
    "blower", true,
    "cooler", false,
    "hiTemp", false,
    "loTemp", true
  ), hw.getState());
}
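Here's one way assertStateEquals itself might be written. This is a minimal sketch rather than a definitive implementation: the StateAssertions class name is hypothetical, it assumes that hw.getState() returns a Map<String, Boolean>, and, to keep the example self-contained, it throws a plain AssertionError where a real test would call JUnit's assertEquals.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class; the key names match the tests above.
final class StateAssertions {
    // The only keys that a valid environment state may contain.
    static final Set<String> VALID_KEYS =
        Set.of("heater", "blower", "cooler", "hiTemp", "loTemp");

    private StateAssertions() {}

    static void assertStateEquals(Map<String, Boolean> expected,
                                  Map<String, Boolean> actual) {
        // Fail fast, with a helpful message, if the test author made a typo.
        Set<String> unknown = new TreeSet<>(expected.keySet());
        unknown.removeAll(VALID_KEYS);
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException(
                "Unknown state keys " + unknown + "; valid keys are "
                + new TreeSet<>(VALID_KEYS));
        }
        // Stand-in for JUnit's assertEquals, to keep the sketch self-contained.
        if (!expected.equals(actual)) {
            throw new AssertionError("expected " + expected + " but was " + actual);
        }
    }
}
```

Because the list of valid keys lives in one place, a typo like haeter fails with a message naming the bad key, instead of showing up as a confusing mismatch between two maps.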

Alternatively, if we wanted to go the route of an EnvironmentState class, we could add a method called something like EnvironmentState.fromMap. This method would take a Map, validate that it contains only valid keys, and use those keys to construct an EnvironmentState object. If EnvironmentState.fromMap found any invalid keys (like bowler), it would throw an exception. The code might look like this:

@Test
public void turnOnLoTempAlarmAtThreshold() throws Exception {
  wayTooCold();
  // fromMap throws if any key is misspelled, e.g. "bowler"
  assertEquals(EnvironmentState.fromMap(Map.of(
    "heater", true,
    "blower", true,
    "cooler", false,
    "hiTemp", false,
    "loTemp", true
  )), hw.getState());
}
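A sketch of EnvironmentState.fromMap might look something like this. It assumes that hw.getState() returns an EnvironmentState, and it makes one extra choice that the description above doesn't specify: it also rejects maps that leave out a property, rather than silently defaulting it. The field layout and error messages are illustrative assumptions, not taken from Martin's code or mine.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical value class for the five outputs used in the tests above.
final class EnvironmentState {
    // The single, centralized list of valid property names.
    static final Set<String> VALID_KEYS =
        Set.of("heater", "blower", "cooler", "hiTemp", "loTemp");

    final boolean heater;
    final boolean blower;
    final boolean cooler;
    final boolean hiTemp;
    final boolean loTemp;

    EnvironmentState(boolean heater, boolean blower, boolean cooler,
                     boolean hiTemp, boolean loTemp) {
        this.heater = heater;
        this.blower = blower;
        this.cooler = cooler;
        this.hiTemp = hiTemp;
        this.loTemp = loTemp;
    }

    // Builds a state from a Map, failing loudly on typos such as "bowler".
    static EnvironmentState fromMap(Map<String, Boolean> values) {
        Set<String> unknown = new TreeSet<>(values.keySet());
        unknown.removeAll(VALID_KEYS);
        if (!unknown.isEmpty()) {
            throw new IllegalArgumentException(
                "Unknown state keys " + unknown + "; valid keys are "
                + new TreeSet<>(VALID_KEYS));
        }
        // Assumption: every property must be given explicitly.
        Set<String> missing = new TreeSet<>(VALID_KEYS);
        missing.removeAll(values.keySet());
        if (!missing.isEmpty()) {
            throw new IllegalArgumentException("Missing state keys " + missing);
        }
        return new EnvironmentState(values.get("heater"), values.get("blower"),
            values.get("cooler"), values.get("hiTemp"), values.get("loTemp"));
    }

    // Value equality lets assertEquals compare an expected state with hw.getState().
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof EnvironmentState)) return false;
        EnvironmentState other = (EnvironmentState) o;
        return heater == other.heater && blower == other.blower
            && cooler == other.cooler && hiTemp == other.hiTemp
            && loTemp == other.loTemp;
    }

    @Override
    public int hashCode() {
        return Objects.hash(heater, blower, cooler, hiTemp, loTemp);
    }

    // Readable output so that a failing assertEquals shows both states clearly.
    @Override
    public String toString() {
        return "EnvironmentState{heater=" + heater + ", blower=" + blower
            + ", cooler=" + cooler + ", hiTemp=" + hiTemp + ", loTemp=" + loTemp + "}";
    }
}
```

Overriding equals and hashCode is what lets assertEquals compare the expected state with the one the hardware reports by value rather than by identity.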

This approach is my favorite. The resulting code is easy to read and has well-placed guardrails to catch and guide programmers when they make mistakes. Everything about what it means to be a valid state is wrapped up inside the EnvironmentState class, giving us a single place to go when we want to understand or modify the system. It's easy to add new properties to the state (like humidityOk). It's even easy to change those properties to have value-types other than booleans (like tempCelsius: 34.5), although that wasn't part of the specification so it's perhaps unfair to count it as a benefit. On the other hand, my approach is more verbose and more boilerplate-heavy than Martin's. If you preferred his to mine then I would think you were wrong, but I wouldn't think you were crazy. What makes code clean can vary according to taste.


Clean Code is packed with good ideas and examples. I found the writing style a little domineering at times, and I don't think I'd want to be friends with the author. But that's fine - I'm looking for professional advice, not a new golfing buddy. I'd recommend the book to anyone.


Until next time:

Rob

Systems design for Advanced Beginners

You’ve started yet another company with your good friend, Steve Steveington. It’s an online marketplace where people can buy and sell things and where no one asks too many questions. It’s basically a rip-off of Craigslist, but with Steve’s name instead of Craig’s.

You’re going to be responsible for building the entire Steveslist technical platform, including all of its websites, mobile apps, databases, and other infrastructure. You’re excited, but also very nervous. You figure that you can probably cobble together a small website, since you’ve done that a few times before as part of your previous entertaining-if-morally-questionable escapades with the Stevester. But you have no idea how to even start building out all of the other infrastructure and tools that you assume lie behind large, successful online platforms.

You are in desperate need of a detailed yet concise overview of how real companies do this. How do they store their data? How do their different applications talk to each other? How do they scale their systems to work for millions of users? How do they keep them secure? How do they make sure nothing goes wrong? What are APIs, webhooks and client libraries, when you really get down to it?


You send a quick WhatsApp to your other good friend, Kate Kateberry, to see if she can help. You’ve worked together very effectively in the past, and she has decades of experience creating these types of systems at Silicon Valley’s biggest and most controversial companies.

She instantly accepts your job offer. You had actually only been ringing for some rough guidance and a good gossip, but you nonetheless instantly accept her acceptance. No point looking a gift horse in the mouth, even when you don’t have any money to pay her. Kate proposes that her first day be 5 weeks ago in order to help her smooth over some accounting irregularities. She can come into the office sometime next week. You feel encouraged and threatened by her eagerness.


Kate bounces into your offices in the 19th Century Literature section of the San Francisco Public Library. “OK let’s do this!” she shouts quietly. “What have we got so far? How are all our systems set up? What’s the plan?” You lean back in your chair and close your laptop, which was not turned on because you have left your charger at home. You steeple your fingers in a manner that you hope can be described as “thoughtful”.

“Let me flip that question around, Kate. What do you think the plan should be?”

Kate takes a deep breath and paints an extremely detailed vision of the Steveslist platform five years into the future and the infrastructure that will power it.


Read Kate’s in-depth vision in full on my blog. It’s long and detailed and covers topics from databases to webhooks, and by the end you’ll understand the infrastructure behind almost all modern online companies.

Read it now.

Rob
