Optimizing Memory Consumption with String Pools: Part III
By Tobias Gurock, March 2nd, 2009
In the first two posts, I introduced the basic concepts and technical details of string pools. Today I will show you how we implemented the string pool, integrated it into our SmartInspect Console and why the effort was worth it.
Implementation
I already wrote in the previous posting that I’m a big fan of hash tables. They have a great constant-time asymptotic run-time performance and consume only a reasonable amount of memory at the same time. A very rare combination. At first look, it’s not that obvious how you can achieve constant-time lookups, inserts and deletes on a set of items. Let me explain how this works. A hash table, in its simplest form, usually consists of:
- A table / array
The array is responsible for holding the items. The items are usually key/value pairs. For our string pool, we are only interested in the keys and can therefore ignore the values. The hash table can keep this array fixed in size or dynamically resize it depending on the current load factor (i.e. the ratio of the used array elements and the capacity of the array). - A hash function
The hash function is a function which calculates a value, the hash, for a given item. This hash is then mapped directly to the index in the array. - Collision lists
Since a hash table has only limited space but usually deals with an unlimited set of possible items, it cannot have a simple 1:1 relationship between items and the indices in the array. It may happen that two different items are mapped to the same index in the array. This is called a collision. Collisions are handled by allowing more than one item per array element, usually in the form of collision lists but sometimes also with collision trees or even inner hash tables.
For our string pool, we chose a configurable but fixed-size array and standard collision lists as its secondary storage. We decided against a variable-size array to avoid occasional large memory moves and the need for re-indexing the items. It also helped to keep the implementation simple. Picking a sufficiently-large array and, more importantly, a good hash function was therefore crucial:
“A good hash function is essential for good hash table performance. A poor choice of a hash function is likely to lead to clustering, in which probability of keys mapping to the same hash bucket (i.e. a collision) is significantly greater than would be expected from a random function. A nonzero collision probability is inevitable in any hash implementation, but usually the number of operations required to resolve a collision scales linearly with the number of keys mapping to the same bucket, so excess collisions will degrade performance significantly.”
I personally can recommend the djb2 hash function. We have used it with great success in several occasions over the years. It’s fast, easy to implement and provides a good hash distribution:
function HashOf(const S: String): Cardinal;
begin
Result := 5381;
for I := 1 to Length(S) do
begin
Result := ((Result shl 5) + Result) + Ord(S[I]);
end;
end;
Side note: I also experimented with other hash functions such as the Bob Jenkins hash or several SBox variants. I found that, although these functions are said to provide a better hash distribution as djb2, the djb2-based string pool outperformed all other string pool variants. In several tests with our particular string pool implementation and varying inputs and array sizes, the djb2 solution sometimes even won by a factor of two or three. Not sure why this happens actually. I will investigate this, maybe that’s something for another article.
So, when putting all these pieces together and combining it with the basic string pool algorithm I gave in the previous post, interning a string now roughly looks like this:
function InternString(const S: String): String;
var
LIndex: Integer;
P: ...
begin
{ Hash the string and map it to the array }
LIndex := IndexOf(S);
{ Lookup the record at the given array index }
P := Find(LIndex);
if not Assigned(P) then
begin
{ Allocate a new record and add it to the table }
...
end;
{ Return a reference to our interned string }
Result := P.Value;
end;
For the threading related aspects of the string pool, we opted for the thread-local solution. This may not be right decision for all applications but makes perfect sense for the Console. Because its string input and distribution are heavily thread-local as well, spreading the string pool over all application threads wouldn’t bring many benefits. It could possibly even decrease the overall application performance. We added support for thread-locality directly to the string pool by storing the internal hash table in thread-local-storage.
Integration
Since we chose the thread-local option, we needed to add one string pool to each application thread which calls InternString for the various strings it processes and loads. The Console uses a multi-threaded approach for its log clients, so we integrated the string pool into the main thread and into each of the listener classes for the clients (TcpListener and PipeListener for handling dynamic threads and clients via TCP/IP and named pipes, respectively). The integration consists of just a few lines of code for each thread pool instance to dynamically initialize and shutdown the string pool for the particular thread (required for setting and cleaning up the hash table array). The string pools are either cleared when a dynamic client thread exits, or when a user clears the log in the main thread.
Our string pool is plugged right into the reading and loading process of log packets. Every time a string property is read from a log stream, we route the result through InternString. It basically works as follows:
function TLogEntry.ReadFromStream(const AStream: TStream);
var
LHostNameLength: Integer;
begin
LHostNameLength := ...
...
{ Read the host name from the stream and get the equivalent string
from our string pool }
FHostName := InternString(ReadUtf8String(AStream, LHostNameLength));
...
end;
The following screenshot of the Console illustrates duplicates that are avoided with this technique. There are several additional strings we intern that aren’t obvious from the screenshot, but it should give you a good overview nonetheless (for example, the watches toolbox on the lower left corner of the screenshot displays only the latest name/value pair for a given watch, but the names and values can be logged and stored multiple times):
Results
Besides improving the memory consumption of the Console, one design goal of the string pool and integration was to keep the performance penalty of the changes to a minimum. We therefore tested the hash table with several different array sizes and chose a size that offers both a good overall performance and uses a reasonable amount of memory. It turned out that a relative large array with 216 elements would be a good fit.
The following table shows some benchmarks we conducted on my current development machine (an Intel 3GHz Core 2 Duo). Each row lists the interning of one million strings with varying contents. The time column specifies the time spent for the loop to intern the strings. The memory column specifies the memory consumption of the interned strings compared to the consumed memory without a string pool (not including the size of the fixed-size array).
| Time | Memory | |
|---|---|---|
| Random, length 1 to 100 | 702 ms | 103.47% |
| Random, fixed length of 25 | 639 ms | 106.89% |
| Numbers, all the same | 187 ms | 0.01% |
| Numbers, 10% different | 203 ms | 11.45% |
| Numbers, 50% different | 249 ms | 56.43% |
| Numbers, 90% different | 297 ms | 102.71% |
| Numbers, no duplicates | 312 ms | 112.67% |
As you can see, performance and memory consumption improve if you have more duplicates in your data. This is expected since more duplicates result in a smaller load factor of the hash table and less memory overhead. This also means that the memory consumption increases if you have less duplicates (which then leads to a decrease in performance because of the collisions and resulting overhead).
It’s therefore important to decide which strings should be interned. If a string has a small probability of being duplicated somewhere in your application or thread, it’s likely that interning such a string does more harm than good. Interning a few additional strings isn’t a lot of overhead of course, but if you unnecessarily intern thousands of thousands of strings, this may negatively influence your application performance and even slightly increase your memory usage. So the rule should be to only intern strings that are likely to be duplicated.
I think we found a very good trade-off for the Console. We expect that only 30-40% of strings in a typical log file are different, which means a reduction of memory consumption of around 50% for strings. But even in the worst case (a log file with no duplicates whatsoever), the overhead is not really noticeable. We are very pleased with these results and are happy that we took the time to implement and integrate the string pool into the SmartInspect Console. Optimizing and reducing the memory consumption of the Console means that customers are now able to open and analyze even larger log files, and we are already planning additional improvements to reduce the memory consumption even further.
Optimizing Memory Consumption with String Pools: Part II
By Tobias Gurock, February 17th, 2009
As promised in the first part of this series, today I will delve a bit more into the technical details of string pools.
From an abstract point of view, a string pool can be seen as a simple black box which, on a given input string, returns a string with the same value. That’s all you can expect from a string pool. In particular, this means that it provides no real upfront information about the relationship between the memory of the input string and the memory of the output string. It may return the same string object (or, to be more precise, a reference to the same memory location) as passed in but may also do something completely different such as returning a different existing string object or even allocating a new string. It only guarantees that will you receive a content-wise equivalent string:
![]()
And that’s the whole point of a string pool. By giving this minimal guarantee and letting you worry about the memory of the input string (in languages without garbage collection at least), it can drastically optimize the string management and allocation. In the best case (memory-wise), you may end up with an application with no duplicate strings whatsoever. Let’s see how a string pool can do this.
On a given string:
- it checks whether a string with the same value is already interned
- if so, it returns a reference to this interned string, thus reusing an existing string and memory area
- otherwise, it stores the passed string and returns the reference to it
That’s the whole magic if you can call it like that. There may be small variations here and there but that’s the usual way you would implement a string pool.
In languages with manual memory management, a string pool may give the additional guarantee of always returning a string that is unrelated to the input string (from a memory perspective). This drastically simplifies the memory management for your input string because you can always free it safely after you got your equivalent string from your string pool. A string pool may provide this guarantee by always creating a copy of the given string in step three instead of returning the original string reference:
“Software written to minimize memory usage can conserve memory by keeping all allocated strings in a hash table. If an already existing string is found a pointer to that string is returned; otherwise, a new string is allocated and added to the hash table.” (emphasize mine)
That’s all there is to it?
In an ideal world, the presented properties and guarantees of a string pool should be sufficient to make use of it. In real-world applications, though, you unfortunately need some additional tidbits of information. As we all know, abstractions are leaky and the string pool is no exception. In particular, we need to know:
- whether it can be safely used from multiple threads
- which factors influence its performance and behavior
Threading and thread-safety
Let’s begin with the threading issue. From a threading perspective, a string pool can basically be modeled in one of the following ways:
- Thread-unsafe
The simplest form of a string pool doesn’t pay any attention to things like threading, locking and race conditions. Although such a thread-unsafe implementation may look useless for a data structure which is apparently best used in an application-wide fashion, it has a few very interesting properties: it is easy to develop, scales perfectly and contains no threading-related overhead. - Thread-safe with locks
The most obvious implementation for a string pool with safe support for multi-threaded access. However, the most obvious solution is not always the best one. On the pro side, we have a straightforward and not too difficult implementation, and the support for thread-independence. On the con side, however, we have a huge performance and scalability bottleneck when used massively from multiple threads. - Thread-safe without locks
Also a thread-safe but much more scalable and unobtrusive solution that uses a lock-free data structure underneath. Lock-free string pools can safely be used from multiple threads, are reasonable fast and sufficiently scalable. The problem is that they are very difficult to implement correctly without introducing obscure hidden bugs, something even experts cannot get right every time.
So, if you want to use a single string pool from multiple threads at the same time, you are essentially stuck with the most difficult option that is really easy to mess up. In this case, you can hopefully reuse some existing and proven string pool implementation and don’t need to write your own (I suspect, or hope, that String.Intern for .NET and String.intern for Java are implemented that way). But what if you wouldn’t need the support for an application-wide string pool? In this case, we may take a look at an additional fourth option:
- Thread-local
Inherently thread-safe by managing a different string set for each thread. This string pool type has the same positive properties as the thread-unsafe string pool: it’s fast and reasonable easy to implement. Managing string sets on a per-thread basis may, for instance, be implemented by adding some kind of thread-locale-storage support to the thread-unsafe approach.
Thread-local string pools are the perfect alternative if your string distribution is heavily thread-local as well or if you do not have access to a thread-safe lock-free string pool. Otherwise, the thread-safe lock-free option should be your string pool of choice.
Performance and scalability
As we have seen, the supported thread model is not only an important factor for the thread-safety of a string pool, but also influences its performance and scalability. In addition to this threading related performance, it’s also important to know the expected runtime performance from a string pool in isolation (i.e. without taking threading related aspects into consideration). In particular, we need to know whether interning a string is a constant-time operation or somehow depends on the number of previously interned strings. You definitely want to avoid the latter as it’s getting slower and slower the more strings you add to the pool (logarithmic scale may be acceptable depending on your amount of strings, but linear or even worse isn’t):

If you recall from the introduction of this post, a string pool has two internal operations for finding and adding string entries. These two operations determine the overall performance of the string pool and if one of the operations is slow, this means that the whole interning is slow. So, essentially, we need a data structure which can represent both operations efficiently and in constant time. One data structure which meets these requirements is the hash table:
“Hash tables support the efficient lookup, insertion and deletion of elements in constant time on average (O(1) in the number of accessed memory cells); this bound is independent of the number of elements stored [...]. These operations are all performed in amortized constant time, which makes maintaining and accessing a huge hash table very efficient.”
A hash table is something like the Swiss knife for a developer. If you need a data structure with an excellent average performance for inserts, deletes and lookups, the hash table is definitely the way to go.
In the last part of this series, I will explain how a hash table is usually build and what to consider when designing a hash table for a string pool. I will also reveal which thread model we chose for our own string pool implementation for the SmartInspect Console (Delphi doesn’t have a built-in string pool), how we implemented it and why integrating it into the Console was well worth it.
Optimizing Memory Consumption with String Pools: Part I
By Tobias Gurock, February 9th, 2009
While working on the latest version of SmartInspect, we brainstormed on how to reduce the memory consumption of the SmartInspect Console. The background for this is that we recently got one or two reports of Out Of Memory exceptions when loading really huge log files into the Console (huge begins at several 100MB depending on the installed amount of memory). While there’s a good solution for this problem (splitting log files, either by enabling log file rotation in the logging library or by using our splitting tool), it would be nicer if the Console could handle bigger log files by itself.
Before I go into details explaining what we’ve already done and what we are hoping to do in the future to improve the memory behavior of the Console, let me first give you some background information about SmartInspect log files and how they are represented in memory. The format of a SmartInspect log file is a simple binary format which is optimized for performance and size. It’s basically a sequence of packets each consisting of a simple packet header and packet body. The header describes its type and size and the packet body contains the actual logging data (if you are interested in all the gory details, see SmartInspect Log Formats and Protocols). While it’s not a super flexible file format, it can be written and read extremely efficiently and stores the data in a very compact way. It’s so compact that the Console needs around three to four times the memory of the log file size.
Here’s why. In a log file, the data is stored sequentially and nearly every byte of data in the log file on disk maps to one byte of real information. There is practically no overhead involved. This is completely different from the memory representation in the Console where:
- log packets are strongly-typed objects with the usual overhead associated with them
- strings and attached log entry data are accessed indirectly through pointers
- packets have several additional object fields to implement some Console features efficiently
- the packets require some additional management structures such as lists and queues
We’ve already reduced the memory usage of the Console by implementing several lazy loading mechanisms and by choosing smaller data types wherever possible (2 instead of 4 byte integers). We also drastically improved the log file loading process by avoiding temporary memory peeks. However, the real issue which we haven’t adressed yet are strings. One big problem is that strings are stored in log files with the UTF-8 Unicode encoding, but use the standard Windows UTF-16 encoding in the Console. This results in using twice as much as memory as needed in most cases, plus the additional management overhead that is always associated with strings. Here’s an illustration of how a string looks like in a SmartInspect log file and how the same string is represented in the Console’s memory (the Console is written in Delphi, but a string has a similar or even worse overhead in Java or .NET):
![]()
If you have a string variable somewhere in a Delphi application, it’s basically just a pointer to its first character similar to char pointers in C (actually, its a pointer to the first Unicode code point, but let’s ignore the whole surrogate thing for a moment). Prepended to this array there’s a header record storing the reference count and the string length. Compared to how a string is stored in the log file, the memory representation has the overhead of the pointer, the reference count, an additional null character for C/Windows API compability and the different string encoding. All combined, a string represented in memory uses more than twice as much as space. Here’s certainly some room for improvement.
An obvious and quite effective solution would be to change the string encoding and use UTF-8 the same way it’s used in log files. But this would have a negative impact on performance as we would have to convert a string to UTF-16 every time we want to display it (all Windows APIs expect strings to be encoded with UTF-16). We’ve tested this and while it looked like a viable solution, we didn’t implement it because we came up with a better idea: a string pool.
Log files have the property that, although they usually contain thousands of strings, a large percentage of these strings are actually the same. Besides the actual data passed to log methods (such as log messages, variable names and values), the libraries also collect some context information such as the host name, application name and session name (sessions can be thought of as log categories). These strings never or rarely change inside a log, so we end up with thousands and thousands of copies of the same strings. Now the idea is to implement some kind of abstraction, the string pool, that makes it easy to avoid duplicate strings by storing them in a central location and reusing them. Implemented correctly, this has the potential of reducing the memory usage in the Console by so much that it uses less space than required for an (uncompressed) log file on disk.
The idea is actually quite simple and the concept is nothing new (it’s usually called string interning and is supported out-of-the-box by several modern programming environments, including Java and .NET):
“In computer science, string interning is a method of storing only one copy of each distinct string value, which must be immutable. Interning strings makes some string processing tasks more time- or space-efficient at the cost of requiring more time when the string is created or interned. The distinct values are stored in a string intern pool. String interning is supported by some modern object-oriented programming languages, including Python, Ruby (with its symbols), Java and .NET languages.”
Although string interning/pooling is pretty much a known secret, we haven’t seen it used with Delphi yet and it’s a great solution for a not so untypical problem. Although the Console might be a special case and would benefit more from string interning than other applications, it’s a very useful concept every developer should be familiar with and add to his toolbox. That’s why you shouldn’t miss part II and part III of this blog series which will introduce the technical details of string pools, explain the several different usage scenarios you can optimize for and give a detailed implementation guide.

