Tech Blog
Maps in Go
Maps in Go
Introduction
C++ was the first general purpose programming language I learnt (after MATLAB) and where I was introduced to the concept of hash tables. Hash tables have many different names but here we will refer to them as a map
. Maps
are a fundamental data structure as they provide a simple interface (and constant-time lookup) for retrieving specific elements of stored data. Each element in the map
is a key-value pair. Maps
may also be iterated over to view each key-value pair.
After having segfaults in some C++ code, I learnt the hard way about “iterator invalidation” when an element is deleted from the map during iteration.
Later, coming to Go two things struck me immediately about the in-built map
type that were different compared to C++:
- Deleting during an iteration didn’t require any special thought
- The iteration order is non-deterministic.
This blog explores the implementations in both languages to explain these differences. When source code is presented, comments added for this blog are prefaced [Blog]
.
C++ implementation
C++ provides the unordered_map
type for storing key-value pairs. There are a number of different implementations of the C++ Standard Libary, here we will be looking at the libc++
implementation.
The libc++
unordered_map
implementation uses the hash
of the key is to index into an array, known as a bucket. Each bucket is a linked list which stores the individual key-value pairs.
Internally the unordered_map
defers to the __hash_table
class, where we can see this structure in the __bucket_list
type:
The __bucket_list
is therefore an array of pointers, each of these points is to the first element in the linked list that holds the key-value pairs, as shown below.
There is some interesting discussion in the proposal for introducing the type into the C++ Standard Library on the design choices.
We can see this structure in the find
method (comments mine):
A crucial part of the C++ implementation is that final element in the linked list points to the next element in the __bucket_list
. This explains why in find
there is a check to make sure we stay in the same bucket as we follow the __next
pointer in the linked list.
This means that for iteration we just keep following the __next
pointer until we reach the end of the _bucket_list
and, therefore, the iteration order is identical (assuming no elements are removed or added) between iterations.
This underlying data structure also explains why the following program segfaults. When the current element (it
) is removed, the chain of __next
pointers is broken.
The fix is a simple one, as of C++11, the erase
method returns the __next
pointer of the deleted element to allow the iteration to continue.
Go implementation
The implementation in Go follows a similar strategy but has some fundamental differences. There is still an array with a pointer to a list of key-value pairs (a bucket) but in Go’s case each bucket holds eight elements. If a particular bucket needs to store more than eight elements, initially an “overflow bucket” is used until a threshold is hit, which triggers twice as many buckets to be allocated and all the existing elements are rearranged into the new buckets.
Another interesting facet is that within the bucket all the keys are stored first, followed by the values to be maximally space efficient. Vincent Blanchon has a nice overview of the Go map type in a two part series which covers the underlying data structure, including the overflow buckets and key-value packing.
In the mapAccess1
function we can see a similar routine for finding a given element. First the key is hashed and converted to an index in the bucket array:
and then the bucket is iterated to find the corresponding key,
Although the data structure is slightly different, so far the C++ and Go maps have a similar approach for finding the value for a given key. The iteration behaviour is quite different and can be a suprise to those new to Go; in the Section “For Statemets with range clause“, we have
- The iteration order over maps is not specified and is not guaranteed to be the same from one iteration to the next.
This is a deliberate design choice in order to stop developers relying on a particular iteration order. If the Go Team wanted to change the underlying implementation of the map
type (but keep the same API), a change in the iteration order could then, perhaps silently, break existing code. Early on, the order was deterministic if the map had fewer than eight elements (ie 1 bucket) but this too was made non-deterministic (Issue 6719).
This iteration order begins in the mapiterinit
function
This means we start the iteration at a random bucket and at a random point within the bucket. In essence we start with a random key-pair. Afterwards each key-pair is visited until we reach the end of the bucket list, at which point we go back to the start of the bucket list and continue to visit the elements until we are back where we started (it.startBucket
).
The logic in the mapiternext
function is quite involved so some of the code is elided.
With this in mind, it’s not true to say that the iteration order is “random”. We proceed through the buckets in a specific order, it is just that from one iteration to the next, the starting bucket and the order through each bucket will be different.
This iteration strategy also answers the question we had at the start of this post about why deletion does not require any special thought. We know that each of the buckets is always eight elements large (with potentially an overflow bucket) which means that we do not have the chain of next
pointers to keep a track of. The Spec also says
If a map entry that has not yet been reached is removed during iteration, the corresponding iteration value will not be produced. If a map entry is created during iteration, that entry may be produced during the iteration or may be skipped.
This aligns with what we have seen. If an element is created during iteration it will appear in a particular bucket depending on the hash of the key. Whether we see it not during iteration depends on whether we have already visited that bucket. The same logic applies for deletion.
Conclusion
We have looked in detail at the implementations of a map
data structure in both C++ and Go. They both have similar functionality — insert, delete, iterate — but the underlying implementation yields different behaviour. The non-deterministic iteration order in Go is particularly interesting and shows the length to which the Go team will go to provide backwards compatibilty. In their blog post on maps, an example is given for providing a deterministic interation order.