Announcing the Microsoft.Toolkit HighPerformance 7.0 release

Windows Community Toolkit — background and logo by .NET Foundation

As a small recap on the HighPerformance package itself: this is a library part of the Windows Community Toolkit and targeting a number of different runtimes and profiles (ranging from .NET Standard 1.4 to .NET 5), that includes a collection of extensions and helper methods that can help when working on performance and memory usage critical applications. A brief introduction and the official docs are available in the MS docs website.

In this blog post I wanted to through some of the new features being included in this new version of the HighPerformance package, explain how they work and how they can be used to help in performance oriented scenarios 🚀

Memory2D<T> and Span2D<T> types

The Memory2D<T> and Span2D<T> types are meant to be a natural extension for Memory<T> and Span<T>, but working in 2 dimensions. They include several key features such as offering a flexible abstraction of discontiguous buffers due to their ability to slice 2D memory areas, meaning that each row might not be adjacent to the end of the previous one. They also do not share the same int.MaxValue limitation as standard BCL types — while the length of each axis is still restricted to int for consistency with .NET arrays, the total size is exposed as a nint value. All the internal indexing operations are performed using nint values too, so that on 64 bit architectures it is possible to index much larger memory areas than it was before.

Since one line of code is worth a thousand words, here is an example of how all these various APIs can be used to perform different operations:

These are just some of the features enabled by these new types, and there’s plenty more available APIs that can be used to easily work with these new types as well as to interoperate with the existing types from the BCL!

StringPool

We put this to the test with a benchmark parsing the taxi-fare-train.csv file, a 24MB CSV database from dotnet/machinelearning containing many duplicated text labels. The full benchmark gist can be found here, but the important bit is the difference in code to parse each string value:

Here StringPool will take care of transcoding the input UTF-8 sequence into UTF-16 (working on buffers rented from a pool), and then it will try to find an existing string instance matching that content. If it is found, its timestamp is updated and that instance is returned, otherwise a new one is created and added to the internal map. This is all completely transparent to users, that just need to change a single line in this case. This is just one of the available StringPool APIs — there are overloads taking a ReadOnlySpan<char> as input (and doing no transcoding), or a string (similarly to how string.Intern(string) works), and more. For now, let’s look at a benchmark to see how this change changed the performance and memory usage of our code (using both the two possible solutions displayed above):

We can see that the performance is just a bit worse when using Encoding.UTF8, which makes perfect sense considering both versions of the code still need to do the same transcoding from UTF-8 to UTF-16, but the StringPool one also needs to deal with the additional logic to update its internal data structures — despite this the difference is still just 8%. On the other hand, doing the decoding manually with Utf8.ToUtf16 lets us reduce the overhead and gives us better performance than the original version, even including the extra work needed to manage the pooling. And in both cases, if we look at the memory usage, we can see that using StringPool reduced the total allocations from 67 MB to just 512 B: this new implementation is using over 100,000x less memory! If we use the profiler to inspect the allocations, we can verify that this new version is only ever allocating the unique string instances one for each label we encountered, and then working with no additional allocations from that point on.

How does it work?

There are 3 main components that make up the StringPool type:

  • Each StringPool instance contains a number of buckets, that is dynamically calculated upon construction based on the maximum capacity of the pool. Each of these buckets is independent from the others and implements the actual LRU pool logic. The reason for these buckets is to provide a more fine-grained locking ability and reduce thread contention when multiple threads are trying to work on a given StringPool instance. Depending on the input sequence of characters being used to index the pool, a target bucket is selected and a lock is taken on just that bucket, so that other threads will be able to still work concurrently on other buckets. This is possible because each unique ReadOnlySpan<char> value will only map to a single bucket.
  • Each bucket contains a hashset, which is implemented using the standard approach of using hashcode-indexed buckets and chained sequences, but then specifically tine-tuned to work within the StringPool type. For instance, each entry also stores the index of the associated min-heap item (this is needed to perform the downheap operation when an existing map instance is accessed), and some operations have also been heavily optimized by leveraging how the StringPool type work.
  • Each bucket also contains a min-heap, implemented as a 0-indexed array. Each heap entry contains a timestamp, which is incremented every time a string instance is retrieved from the pool, and the index of the relative index in the chained list in the map.

As for the Memory2D<T> and Span2D<T> types, feel free to have a look at the source code if you’re interested in more details about all this!

Memory<T> casting

I originally had the idea for this feature after seeing this issue in the amazing library ImageSharp, and after adding (limited) support for this there I decided to also expand on this feature and make it part of the HighPerformance package, given the number of other developers that expressed interest in using this API as well. The usage is extremely straightforward and mirrors how you can use extensions to cast a Span<T> instance as well:

These extensions work by first retrieving the underlying data storage for the input Memory<T> instance (so that they can work on it directly for better efficiency), and then creating a custom MemoryManager<T> instance that is responsible for creating Span<T> instances of the target type when requested. There are three different types of custom memory managers (for string, T[] arrays or MemoryManager<T>), so that the extension can automatically select the right one and reduce overhead (as opposed to having a single one wrapping the entire input Memory<T> instance). Additionally, chained casts are optimized in such a way that if you attempt to cast a Memory<T> that itself was the result of a previous casting operation, the new cast will first retrieve the original data storage and wrap that one directly. This means that even if you try casting a Memory<T> instance multiple times, each resulting Memory<T> instance will only be using a single custom MemoryManager<T> instance to retrieve data, as opposed to just chaining all the previous instances as well for each cast operation.

IBufferWriter<byte>.AsStream() extension

As a practical example of where this new API could be used, here is a simple snippet of code that I found in a codebase I was recently browsing:

This code is simply compressing and decompressing data passed in a byte[] array, using the DeflateStream API. We can see how this code is not particularly efficient for a number of reasons, from the MemoryStream type not using array pooling to those final ToArray() calls to get the new array to return. Some of these issues could be worked around, such as replacing the ToArray() calls with GetBuffer(), and changing the return type to a Memory<byte> instead, but the end result would still not be ideal. For one, we have no way to transfer ownership of the returned buffer, so we have no way of making it a pooled array. Here is how we could rewrite the two methods above using some of the APIs in the HighPerformance package:

We can immediately see a number of differences in this second snippet. The return types have now been changed to IMemoryOwner<byte>, so we can transfer ownership and allow consumers to also dispose the buffer so that it can be returned to the pool. This means that we can amortize the memory allocations for repeated calls to this method. Similarly, we’re not using the ArrayPoolBufferWriter<T> type instead of MemoryStream, and turning it into a Stream to pass it to the DeflateStream APIs using this new AsStream() extension. This effectively lets us treat it as a pooled stream, so that as the DeflateStream type writes data to it, the internal buffer will just be expanded by renting a new larger array from the shared pool. This again lets us amortize the memory allocations over a larger number of calls. We also tweaked the input types to allow consumers to have more flexibility over the exact data source they use when invoking these two methods.

And here are some benchmarks to compare these two versions:

The pooled version is faster in both cases, especially when decompressing data, and more importantly it uses a fraction of the allocated memory every time! This is just an example of how all these APIs can be used together with the existing types and APIs from the BCL to write faster and more efficient code.

There’s more!

If you’re interested in learning more about the HighPerformance package, check out the docs in the .NET API browser, or the source code in the Windows Community Toolkit repo on GitHub! The new 7.0 releases many other new features, improvements and bug fixes that I couldn’t fit in a single post, and there’s more things planned for the future still! If you have any feedbacks or ideas, please feel free to open an issue as well!

Here’s a few useful links:

Happy coding! 🚀

.NET and UWP dev, Win Dev MVP, .NET Foundation member, author of the Microsoft.Toolkit Mvvm, HighPerformance, Diagnostics packages. C. Eng. master’s student.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store