When we present information about the system to users, we want to hide the system's inner workings. Sometimes, however, the user needs some of those details, for example:
Username - Unique user-defined string (or sometimes a system-defined value, e.g., an integer) that connects the system's inner workings to a specific user.
- The user has to know it to log in.
OrderId - Unique system-defined id that connects the system's inner workings to a specific order.
- Needed for contacting support, order tracking, etc.
Url - Unique system-defined value (it can be a date, an int, or a string, but the browser always presents it as a string) that connects the system's inner workings to a specific web page.
- Necessary for modern SEO.
One can argue that every part of the system is open for possible exploitation, but unique value-based inputs are even more vulnerable.
Example: a local courier with an unnecessary sequential Id in the URL, https://www.dexpress.rs/rs/pracenje-posiljaka/PD0000193715. The name and address are partially hidden. There are hundreds of thousands of people whose address starts with Bul (Bulevar, i.e., boulevard), and probably thousands whose surname starts with Pav. However, it was easy to pinpoint me and my house at my old address on a small street.
From the system's perspective, when a user has to know something uniquely identifiable, a standard solution is to assign an additional unique value, a secondary id, alongside the entity's primary id. The problem with this approach is that you now have two unique values per entity. Both are essential: users will search your data by the secondary id, but the system will use the primary id. And you can have only one clustered index.
Sometimes this is the norm. For example, you present users with their unique usernames, but the system does not use them for its inner workings (unless you are Sony PlayStation Network). For SEO-optimized websites, this is standard as well. Google (and "other" search engines) treat URLs as a significant factor when calculating keyword relevance, so a descriptive URL is much better than a URL with a generic Id.
Using natural keys outside of the system is not a good idea either. For example, in the countries of the former SFRY, we use the Unique Master Citizen Number (aka JMBG), and I have seen systems that put the Unique Master Citizen Number in the URL. What will you do about the (generic URL) logs and the GDPR?
Using the primary Id outside the scope of the system has its disadvantages: the system is more open to hacking. With a secondary id, by contrast, a malicious caller who knows only the secondary id can attack only the part of the system that converts the secondary id to the primary id.
One quick and easy alternative is to use a guid as the secondary id. A (version 4) guid is randomly generated, so the chance of guessing a valid guid in the system is minimal. It is reasonably small (16 bytes), at least compared to storing it as characters, but (even I have to admit that) something like OrderId: 1a82927a-aeab-4e5b-9929-0e55f3714e10 does not look that nice; it is hard to read and write, and slow to say aloud.
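To put numbers on the size trade-off, here is a minimal sketch using only the .NET base library. The class and method names (GuidSecondaryId, NewSecondaryId, and so on) are made up for this illustration and are not part of any library.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GuidSecondaryId
{
    // Creates a new guid that could serve as a public, secondary id.
    public static Guid NewSecondaryId() => Guid.NewGuid();

    // Binary size of a guid: always 16 bytes.
    public static int SizeInBytes(Guid id) => id.ToByteArray().Length;

    // Length of the default text form (32 hex digits + 4 hyphens),
    // e.g. 1a82927a-aeab-4e5b-9929-0e55f3714e10.
    public static int TextLength(Guid id) => id.ToString().Length;
}
```

So the value is 16 bytes in storage, but the user has to read, type, or dictate 36 characters.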
Library: HashIds
Hashids is an open-source library for generating unique, non-sequential ids from integers. It is a small codebase with no dependencies on external libraries, and it is available in more than 40 languages.
The basic principle behind the library is simple:
- Create a secret string (the salt) and store it.
- Use the salt to convert (encode) the integer into a Base62 (0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz) message.
- Present the user with the encoded message, not the integer.
- Use the salt to convert (decode) the message back to the integer.
The .NET implementation is straightforward. It is possible to encode a single integer or an array of integers.
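The steps above can be sketched in a few lines of C#. This is a simplified illustration of the idea (a salt-dependent alphabet plus a base conversion), not the real Hashids algorithm, which does more (for example, padding to a minimum length and encoding several numbers at once). The class SaltedBase62Sketch and its methods are hypothetical names used only here.

```csharp
using System;
using System.Text;

// Simplified illustration only: NOT the real Hashids algorithm.
public static class SaltedBase62Sketch
{
    private const string Base62 =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    // Step 1: the salt determines a fixed, repeatable ordering of the alphabet.
    public static string ShuffleAlphabet(string salt)
    {
        char[] chars = Base62.ToCharArray();
        if (salt.Length == 0) return new string(chars);
        int sum = 0;
        for (int i = chars.Length - 1, s = 0; i > 0; i--, s = (s + 1) % salt.Length)
        {
            sum += salt[s];
            int j = (salt[s] + s + sum) % i;           // index in [0, i - 1]
            (chars[i], chars[j]) = (chars[j], chars[i]);
        }
        return new string(chars);
    }

    // Step 2: write the number in base 62 using the shuffled alphabet.
    public static string Encode(long number, string salt)
    {
        if (number < 0) throw new ArgumentOutOfRangeException(nameof(number));
        string alphabet = ShuffleAlphabet(salt);
        var sb = new StringBuilder();
        do
        {
            sb.Insert(0, alphabet[(int)(number % alphabet.Length)]);
            number /= alphabet.Length;
        } while (number > 0);
        return sb.ToString();
    }

    // Step 4: the same salt rebuilds the alphabet and reverses the conversion.
    public static long Decode(string message, string salt)
    {
        string alphabet = ShuffleAlphabet(salt);
        long number = 0;
        foreach (char c in message)
        {
            int digit = alphabet.IndexOf(c);
            if (digit < 0) throw new FormatException($"Unexpected character '{c}'.");
            number = checked(number * alphabet.Length + digit);
        }
        return number;
    }
}
```

Decoding with the same salt returns the original number, while a different salt rebuilds a different alphabet and generally decodes to a different value. That is the whole trick: the salt is the only secret.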
Simple example with a single integer
using System;
using HashidsNet; // namespace of the Hashids library for .NET
int exampleId = 1234567890;
string salt = "the original salt";
int minHashLength = 8;
Hashids hashids = new Hashids(salt, minHashLength);
string resultEncode = hashids.Encode(exampleId);
// Decode always returns an array, even for a single encoded integer.
int[] resultDecodeArray = hashids.Decode(resultEncode);
Console.WriteLine($"Start value is {exampleId}");
Console.WriteLine($"Encode Value is {resultEncode}");
Console.WriteLine($"Decode Value is {string.Join(", ", resultDecodeArray)}");
Console.WriteLine();
with the result
Start value is 1234567890
Encode Value is qdoXRBg9
Decode Value is 1234567890
and an example with an array of integers
var exampleIDs = new int[] { 1, 2, 3, exampleId };
// Several integers are encoded into a single string.
string resultEncodeB = hashids.Encode(exampleIDs);
int[] resultDecodeArrayB = hashids.Decode(resultEncodeB);
Console.WriteLine($"Start values are {string.Join(", ", exampleIDs)}");
Console.WriteLine($"Encode Value is {resultEncodeB}");
Console.WriteLine($"Decode Values are {string.Join(", ", resultDecodeArrayB)}");
and the results are
Start values are 1, 2, 3, 1234567890
Encode Value is 5BtDIkUa0Xgw4
Decode Values are 1, 2, 3, 1234567890
Not a Hash function
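HashIds is not a hash function! A hash function is one-way, while HashIds are designed to be decoded back to the original integers. HashIds are a type of substitution cipher.
But I'm guessing SubstitutionCipherIds is a lot less brandable than HashIds.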
No curse words (in theory)
The algorithm has an interesting approach to avoiding (English) curse word generation: it simply does not allow some letters to appear next to each other. By default, these letters are:
c, s, f, h, u, i, t
You can change this set manually with the separators argument (the last constructor parameter).
Optimizing
If necessary, to make the ids more practical for reading, writing, and saying aloud, it is possible to limit the characters used in the encoded message, e.g., to just digits + uppercase letters. This makes messages longer but easier for humans to work with when copy/paste is not an option.
Security and Cryptanalysis
Hashids is not a "true encryption algorithm" and should not be used for encryption. This is more of a "masking tool" that can help with security. If you need encryption - use a proper, modern encryption algorithm.
Check Christopher Riley's cryptanalysis for an approach more interesting than an elementary brute-force attack.
Library: AspNetCore.Hashids
AspNetCore.Hashids is an add-on library for Hashids that simplifies using Hashids with ASP.NET Core. You can automate encoding and decoding at the API level without direct manipulation. The main advantage is complete isolation from business logic.
For straightforward solutions, this works fantastically. For more complex solutions, however, do you really want to define your IDs in the ASP.NET Core API project rather than on a lower layer?
Conclusions
The Solution to all problems?
No. Not really.
HashIds only works on integers. The team made that decision to discourage people from using HashIds for encryption. However, even if it could encode strings, how long would the encoded result have to be for input made of letters (26x2) + digits (10)? Converting from a lower base to a higher base yields fewer characters, so a shorter encoded message. But if we convert from 62 chars to 62 chars, the encoded message has to be at least as long as the input (and the curse-word rules add further constraints). Not that practical.
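The length argument can be checked with a bit of arithmetic. The sketch below computes the worst-case number of output characters for an input of a given length when converting between alphabets of different sizes. It is plain base conversion, not Hashids itself, and the names EncodedLength and MaxOutputLength are hypothetical.

```csharp
using System.Numerics;

// Hypothetical helper, for illustration only.
public static class EncodedLength
{
    // Worst-case number of output characters needed to represent any input
    // of `inputLength` symbols from an alphabet of `inputBase` symbols,
    // using an output alphabet of `outputBase` symbols.
    public static int MaxOutputLength(int inputLength, int inputBase, int outputBase)
    {
        // Largest value an input of that length can represent.
        BigInteger max = BigInteger.Pow(inputBase, inputLength) - 1;

        // Count how many digits that value needs in the output base.
        int digits = 1;
        while (max >= outputBase)
        {
            max /= outputBase;
            digits++;
        }
        return digits;
    }
}
```

For example, MaxOutputLength(10, 10, 62) is 6 (ten decimal digits shrink to six Base62 characters), MaxOutputLength(8, 62, 62) is 8 (no gain at all), and MaxOutputLength(8, 62, 36) is 10 (a smaller output alphabet makes the message longer).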
Conversion from a guid is also a problem. It is possible to convert a guid to a number, but it is a 128-bit number or, in the case of C#, two longs. Possible, but again not practical.
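For the curious, this is roughly what the "two longs" conversion looks like, using only the .NET base library. GuidSplitSketch, ToLongs, and FromLongs are hypothetical names for this illustration.

```csharp
using System;

// Hypothetical helper, for illustration only.
public static class GuidSplitSketch
{
    // Splits the 16 bytes of a guid into two 64-bit integers.
    // Note: either half can come out negative, and Hashids is meant for
    // non-negative integers, so a real solution would need an extra step.
    public static (long Low, long High) ToLongs(Guid id)
    {
        byte[] bytes = id.ToByteArray();
        return (BitConverter.ToInt64(bytes, 0), BitConverter.ToInt64(bytes, 8));
    }

    // Reassembles the guid from the two halves produced by ToLongs.
    public static Guid FromLongs(long low, long high)
    {
        byte[] bytes = new byte[16];
        BitConverter.GetBytes(low).CopyTo(bytes, 0);
        BitConverter.GetBytes(high).CopyTo(bytes, 8);
        return new Guid(bytes);
    }
}
```

It round-trips, but you end up encoding two large numbers per id, so the resulting message would be no shorter or friendlier than the guid itself.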
A perfect solution to some problems
HashIds is very good for a particular scenario:
You have sequential numbers, and you want to hide the order from users without storing unnecessary data.
Great tool for a specific purpose.
Links
- Hashids - Official Site
- GitHub Hashids for .Net - 2.2k Stars (only for .Net version)
- Hashids Nuget - Total Downloads: 2.9M