MurmurHash is a non-cryptographic hash function created by Austin Appleby. [1]
The hash
MurmurHash 2.0 is a relatively simple algorithm, which compiles down to about 52 instruction on the x86. It is noted[2] for being exceptionally fast, often two to four times faster than comparable[3] aglorithms such as FNV[4], Jenkins' lookup3[5][6] and Hsieh's SuperFastHash[7], with excellent distribution [8], avalanche behavior [9] and overall collision resistance [10]. A survey of hash functions[11] concluded that "Murmur2 is the only of the complex hash functions that provides good performance for all kinds of keys. It can be recommended as a general-purpose hashing function."
Variations
The first version was MurmurHash 1.0, but it was superceded by 2.0, which Appleby says is "much faster & mixes better." [12]. All have been released into the public domain.
The fastest implementation[13] reads four bytes at a time and generates canonical values only on a processor with Intel [little-endian] byte. For processors that require aligned reads[14] or have Motorola big-endian byte order[15], there are versions that work correctly but at approximately half speed. The algorithm generates a 32-bit hash value and is optimized for processors with 32-bit or larger registers.
Implementation
The algorithm was originally expressed in C++, and has been ported to a number of popular languages, including Python[16], C# and Java[17].
//----------------------------------------------------------------------------- // MurmurHash2, by Austin Appleby // Note - This code makes a few assumptions about how your machine behaves - // 1. We can read a 4-byte value from any address without crashing // 2. sizeof(int) == 4 // And it has a few limitations - // 1. It will not work incrementally. // 2. It will not produce the same results on little-endian and big-endian // machines. unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed ) { // 'm' and 'r' are mixing constants generated offline. // They're not really 'magic', they just happen to work well. const unsigned int m = 0x5bd1e995; const int r = 24; // Initialize the hash to a 'random' value unsigned int h = seed ^ len; // Mix 4 bytes at a time into the hash const unsigned char * data = (const unsigned char *)key; while(len >= 4) { unsigned int k = *(unsigned int *)data; k *= m; k ^= k >> r; k *= m; h *= m; h ^= k; data += 4; len -= 4; } // Handle the last few bytes of the input array switch(len) { case 3: h ^= data[2] << 16; case 2: h ^= data[1] << 8; case 1: h ^= data[0]; h *= m; }; // Do a few final mixes of the hash to ensure the last few // bytes are well-incorporated. h ^= h >> 13; h *= m; h ^= h >> 15; return h; }