A brief history of Nothing
How various modern languages deal with missing data
Almost all programs encounter the need to represent the absence of a value, whether it's an empty form field or an empty database column. But programming languages vary in the tools they provide. This post explores different approaches that language designers have taken in attempting to account for missing values.
In the beginning, there was null
Null references have been called the "billion dollar mistake" by their inventor, Tony Hoare. In languages like Java, any reference can point either to an instance of its declared type or to null, and which of the two it holds is only 'discovered' at runtime.
MyEnterpriseFactoryBean foo = null;
foo.help(); // NullPointerException!
This lack of compile-time safety is the cause of many bugs, tests and 'defensive coding' boilerplate – surely, by now, more than a billion dollars' worth.
Static null-checking
Until quite recently, C# suffered from the same lack of compile-time null checking for reference types. However, C# 8.0 introduced warnings on nullable references, and a compile-time "null state analysis" which determines a variable's null state (maybe-null or not-null) at any point in the program. In practice this means that comparing a variable to null in a conditional will prevent warnings about that variable's nullability:
void FindRoot(Node node, Action<Node> processNode)
{
    for (var current = node; current != null; current = current.Parent)
    {
        processNode(current);
    }
}
This code doesn't generate a warning for processNode(current) because current != null is part of the for loop's condition. Kotlin has the same feature.
In these languages, nulls and null-checking are specifically a feature of the compiler, with various operators defined to help programmers work with them.
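To make the operators mentioned above a little more concrete, here is a minimal sketch in TypeScript, whose ?. (optional chaining) and ?? (nullish coalescing) operators closely resemble the ones C# offers; Kotlin has ?. as well and spells the fallback operator ?:. The TreeNode type and the parentLabel function are made up for illustration.

```typescript
// Hypothetical tree node type, loosely mirroring the Node/Parent shape above.
interface TreeNode {
  label: string;
  parent: TreeNode | null;
}

function parentLabel(node: TreeNode | null): string {
  // ?. stops and yields undefined as soon as the value to its left is
  // null or undefined; ?? then supplies a fallback for that case.
  return node?.parent?.label ?? "(no parent)";
}
```

Without these operators, the same function would need two nested null checks before it could safely read label.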
Flow-based analysis is a fundamental shift in typed languages' ability to provide static type guarantees. Now, variables' known types can change over the course of a function. In C#, that's currently restricted to null checking, but I expect eventually it will expand to include...
Union types
A union type is better thought of as an "either" type, usually denoted with a |. For example, a value of type A | B may be either an A or a B. Languages with union types have an easy way to incorporate nullability: for any given type Foo, the union type Foo | null is the nullable equivalent. Several languages, including TypeScript and MyPy (typechecked Python), provide compile-time flow analysis features to complement their union types, allowing refinement of a variable's type over the course of a function body in the same way as C#'s null checking.
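As a rough sketch of what this refinement looks like in TypeScript (the User type and the function names are hypothetical, and the null check is only enforced when the strictNullChecks compiler option is enabled):

```typescript
// Hypothetical type and lookup function, for illustration only.
type User = { name: string };

function findUser(id: number): User | null {
  return id === 1 ? { name: "Ada" } : null;
}

function greeting(id: number): string {
  const user = findUser(id); // type here: User | null
  if (user === null) {
    return "Hello, stranger";
  }
  // Past the null check, the compiler has narrowed `user` to User,
  // so reading .name needs no cast.
  return "Hello, " + user.name;
}

// The same narrowing applies to unions that do not involve null.
function summarise(value: string | number): string {
  if (typeof value === "string") {
    return "string of length " + value.length; // value: string
  }
  return "number " + value.toFixed(1); // value: number
}
```

Removing the null check in greeting makes the compiler reject user.name, which is exactly the guarantee that plain nullable references lack.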
Interestingly, while Kotlin does provide flow analysis (known as 'Smart casts'), it does not provide generalised union types, so programmers are restricted to inheritance and explicit nullables for this feature.
Other options
Conspicuous by their absence until this point are languages which do not feature null at all. In these languages, optional values are an opt-in feature, expressed at the type level as an Option (or Maybe). This simple structure is similar to a union type, with some additional features.
Confusingly, different languages have opted for different names:
- In Scala (and Rust etc), an Option[A] can be Some[A] or None
- In Haskell (and Elm etc), a Maybe a can be Just a or Nothing
What these languages have in common are ways to manipulate the optional value, with map and bind/flatMap functions defined on the type. It is difficult to talk about all of these languages as a whole, but the key difference here is that there is no special treatment of absent values by the compiler. Options are not a language feature; they are simply part of the standard library, which differentiates them from the nullable features found in C# and Kotlin. By treating optional values as just another type, common functional abstractions such as functor (for map) and monad (for bind/flatMap) come for free.
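To illustrate the "just another type" point, here is a hand-rolled sketch of such a type in TypeScript, written entirely as ordinary library code with no compiler support for absence. Everything in it (Option, some, none, map, flatMap, parseNumber, reciprocal) is a hypothetical name chosen for this example, not part of any standard library.

```typescript
// An optional value as a plain tagged union: no compiler magic involved.
type Option<A> = { kind: "some"; value: A } | { kind: "none" };

function some<A>(value: A): Option<A> {
  return { kind: "some", value: value };
}

function none<A>(): Option<A> {
  return { kind: "none" };
}

// map: apply f to the value if there is one, otherwise stay empty.
function map<A, B>(opt: Option<A>, f: (a: A) => B): Option<B> {
  return opt.kind === "some" ? some(f(opt.value)) : none<B>();
}

// flatMap (bind): like map, but f may itself report absence.
function flatMap<A, B>(opt: Option<A>, f: (a: A) => Option<B>): Option<B> {
  return opt.kind === "some" ? f(opt.value) : none<B>();
}

function parseNumber(text: string): Option<number> {
  const n = parseFloat(text);
  return isNaN(n) ? none<number>() : some(n);
}

function reciprocal(n: number): Option<number> {
  return n === 0 ? none<number>() : some(1 / n);
}

const quarter = flatMap(parseNumber("4"), reciprocal);   // some(0.25)
const noInput = flatMap(parseNumber("abc"), reciprocal); // none: parsing failed
const noResult = flatMap(parseNumber("0"), reciprocal);  // none: division by zero
const doubled = map(parseNumber("21"), (n) => n * 2);    // some(42)
```

Each step either produces a value or short-circuits to none, and the caller never writes an explicit null check.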
In these languages, a Foo cannot be substituted for an Option[Foo] – it must be explicitly wrapped in a Some, and later unwrapped in order to be converted back to Foo.
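The contrast with union types can be sketched in TypeScript as follows. The Maybe type here is a hypothetical stand-in with the same shape as Option (using Haskell-style tag names), and Foo, nullableId and maybeId are likewise invented for the example.

```typescript
type Foo = { id: number };

// Hypothetical wrapper type, equivalent in shape to Option/Some/None.
type Maybe<A> = { tag: "just"; value: A } | { tag: "nothing" };

// With a union type, a plain Foo is already a valid Foo | null.
function nullableId(foo: Foo | null): number {
  return foo === null ? -1 : foo.id;
}

// With a wrapper type, the value has to be unwrapped before use.
function maybeId(foo: Maybe<Foo>): number {
  return foo.tag === "just" ? foo.value.id : -1;
}

const plainFoo: Foo = { id: 7 };

const viaUnion = nullableId(plainFoo);                      // accepted as-is
const viaMaybe = maybeId({ tag: "just", value: plainFoo }); // must be wrapped
// maybeId(plainFoo); // rejected by the type checker: a Foo is not a Maybe<Foo>
```

The extra wrapping is more verbose, but it also means a Maybe<Maybe<Foo>> is distinguishable from a Maybe<Foo>, whereas Foo | null | null collapses to Foo | null.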
The Rust compiler does provide extra syntactic sugar for Option (and its more complex cousin, Result) with the ? operator. This does not reach the level of flow analysis found in TypeScript and Kotlin, but it does provide a handy way to propagate the None value in a typesafe way:
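struct Foo {}

// Stub bodies so that the example compiles; the real logic is elided.
fn get_foo() -> Option<Foo> {
    Some(Foo {})
}

fn use_foo(_: &Foo) {}

fn get_and_use_foo() -> Option<Foo> {
    // `?` returns None from this function early if get_foo() yields None
    let x = get_foo()?;
    // although get_foo returns an Option<Foo>,
    // the type of x below is simply Foo:
    use_foo(&x);
    Some(x)
}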
In summary, modern languages employ a few different techniques to deal with missing data. The trend is very clearly away from runtime checking and towards compile-time safety. C#, Kotlin, TypeScript and MyPy have steered towards union types and compiler features; Haskell, Scala, and Elm provide a dedicated type in the standard library. Rust goes a step further, providing both a dedicated type and compiler features for ease of use. Who knew such a simple-seeming problem could have so many possible solutions?
Image credits via Unsplash:
- Cover by David Ragusa