~/posts/small_string_optimisation

Many years ago I left a `TODO` comment in `anton/string.hpp` telling me to eventually implement the Small String Optimisation (SSO). Well, took me years, but finally I had a university assignment I did not particularly want to do, so I decided to instead implement the SSO.

The theory of the SSO

Containers managing dynamic memory must store 3 things: the pointer to the memory (data pointer), the number of elements (size) and the maximum number of elements we are able to store without reallocation (capacity). The latter two may be stored as either numbers or pointers. For our purposes, it is easier to store them as numbers, but with aligned allocations we could also use pointers.

The three members may be stored in any order (the layout may or may not affect performance). We will store the cpacity first, then the size and finally the data pointer.

```cpp i64 capacity; i64 size; char8* data; ```

The total size of those is 24 bytes on a system where `sizeof(void*)` is 8 bytes. This will be our "large" layout.

24 bytes is a significant amount of memory we could use to store a string without allocating any additional memory. What we need is a way to indicate that the string is "small" or not, the size of the stored string and a null-terminator. We may store the small flag in a single bit, the remaining 7 bits of the byte will store the size of the string and we always devote one byte to the null-terminator. That leaves us with 22 bytes of memory for storing the actual contents of the string.

```cpp u8 small: 1; u8 size: 7; char8 buffer[23]; ```

If we put both layouts in a union, we are almost done with the SSO.

```cpp union { struct Large { i64 capacity; i64 size; char8* data; } large; struct Small { u8 small: 1; u8 size: 7; char8 buffer[sizeof(Large) - 1]; } small; }; ```

There is one caveat here, namely how do we preserve the small flag when the string is large? We simply adjust the capacity to always be a multiple of 2. This way the lowest bit, corresponding to the small flag, will always be 0.

What remains is the implementation of the functions exposed by the string. Whenever the string shrinks to our SSO capacity, we turn it into a small string. We do the opposite, that is turn it into a large string, when we exceed the SSO capcity.

Performance results

I also want to test whether adding additional 8 bytes, for a total of 32 bytes, has any benefit in terms of speed.

|#: tab:sso-results | |!: The results of running vushc on a 75MB generated test | |!: file. | |----------------|----------|-----------------------|---------| | |*: Memory |*: Time | Speedup | |----------------|----------|-----------------------|---------| |*>: No SSO | 4608MB | 5.420s ± 0.08s | x1.00 | |*>: 24 byte SSO | 4265MB | 4.832s ± 0.09s | x1.11 | |*>: 32 byte SSO | 4319MB | 4.833s ± 0.03s | x1.11 | |----------------|----------|-----------------------|---------|

As may be seen, the SSO does indeed save memory and speed up the overall program execution despite introducing additional branches in all of string's functions. In this particular test case the savings amount to approximately 350MB (7.5%) and 600ms (11%), which are significant numbers. The 32 byte SSO performs roughly similarly, which is due to the fact that the identifier in this particular file all fit in the 22 byte buffer.

|#: tab:sso-results | |!: The results of running vushc on a 91MB | |!: generated test file with identifiers exceeding | |!: 22 bytes. | |----------------|----------|-----------------------| | |*: Memory |*: Time | |----------------|----------|-----------------------| |*>: 24 byte SSO | 4309MB | 5.108s ± 0.09s | |*>: 32 byte SSO | 4336MB | 5.108s ± 0.09s | |----------------|----------|-----------------------|

When run on a similar 91MB file with roughly half the identifiers larger than 22 bytes, there does not appear to be any difference in the performance.

Notes

- The results are based on Vush (41e7dbe7)[codeberg.org/kociap/vush/commit/41e7dbe75b745a65c7b232009935a3fdde3d182a]. - The SSO has been implemented in anton_core [10597d87](codeberg.org/kociap/anton_core/commit/10597d871321f89b8962d1b7fed2d09b872fea0f).