MurmurHash

MurmurHash is a non-cryptographic hash function created by Austin Appleby. [1]

The hash

MurmurHash 2.0 is a relatively simple algorithm, which compiles down to about 52 instruction on the x86. It is noted[2] for being exceptionally fast, often two to four times faster than comparable[3] aglorithms such as FNV [4], Jenkins' lookup3 [5][6] and Hsieh's SuperFastHash [7], with excellent distribution [8], avalanche behavior [9] and overall collision resistance [10]. A survey of hash functions[11] concluded that "Murmur2 is the only of the complex hash functions that provides good performance for all kinds of keys. It can be recommended as a general-purpose hashing function."

Variations

The first version was MurmurHash 1.0, but it was superceded by 2.0, which Appleby says is "much faster & mixes better." [12]. All have been released into the public domain.

The fastest implementation[13] of MurmurHash2 reads four bytes at a time and generates canonical values only on a processor with Intel little-endian byte order. For processors that require aligned reads[14] or have Motorola big-endian byte order[15], there are versions that work correctly but at approximately half speed. The algorithm generates a 32-bit hash value and is optimized for processors with 32-bit or larger registers.

There are also two versions[16] available which generate 64-bit hash values, suitable for differentiating among tens of thousands of items without false positives caused by the birthday paradox. The first, MurmurHash64A, is MurmurHash2 scaled up directly to 64 bits, for maximum performance on 64-bit processors. The second, MurmurHash64B, is designed for 32-bit processors and essentially runs a pair of MurmurHash2 algorithms in parallel and combines the results. Both of these generate different hash values from each other and Murmur2, and have the same byte order and alignment requirements, except that MurmurHash64A alignment is on 8-byte, not 4-byte, boundaries.

Optimized MurmurHash2 implementation

The algorithm was originally expressed in C++, and has been ported to a number of popular languages, including Python[17], C#[18] and Java[19].

   //-----------------------------------------------------------------------------
   // MurmurHash2, by Austin Appleby
   
   // Note - This code makes a few assumptions about how your machine behaves -
   
   // 1. We can read a 4-byte value from any address without crashing
   // 2. sizeof(int) == 4
   
   // And it has a few limitations -
   
   // 1. It will not work incrementally.
   // 2. It will not produce the same results on little-endian and big-endian
   //    machines.
   unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
   {
   	// 'm' and 'r' are mixing constants generated offline.
   	// They're not really 'magic', they just happen to work well.
   
   	const unsigned int m = 0x5bd1e995;
   	const int r = 24;
   
   	// Initialize the hash to a 'random' value
   
   	unsigned int h = seed ^ len;
   
   	// Mix 4 bytes at a time into the hash
   
   	const unsigned char * data = (const unsigned char *)key;
   
   	while(len >= 4)
   	{
   		unsigned int k = *(unsigned int *)data;
   
   		k *= m; 
   		k ^= k >> r; 
   		k *= m; 
   		
   		h *= m; 
   		h ^= k;
   
   		data += 4;
   		len -= 4;
   	}
   	
   	// Handle the last few bytes of the input array
   
   	switch(len)
   	{
   	case 3: h ^= data[2] << 16;
   	case 2: h ^= data[1] << 8;
   	case 1: h ^= data[0];
   	        h *= m;
   	};
   
   	// Do a few final mixes of the hash to ensure the last few
   	// bytes are well-incorporated.
   
   	h ^= h >> 13;
   	h *= m;
   	h ^= h >> 15;
   
   	return h;
   }

MurmurHash64A/B

    typedef unsigned __int64 uint64_t;
    
    // 64-bit hash for 64-bit platforms
    
    uint64_t MurmurHash64A ( const void * key, int len, unsigned int seed )
    {
    	const uint64_t m = 0xc6a4a7935bd1e995;
    	const int r = 47;
    
    	uint64_t h = seed ^ len;
    
    	const uint64_t * data = (const uint64_t *)key;
    	const uint64_t * end = data + (len/8);
    
    	while(data != end)
    	{
    		uint64_t k = *data++;
    
    		k *= m;
    		k ^= k >> r;
    		k *= m;
    
    		h ^= k;
    		h *= m;
    	}
    
    	const unsigned char * data2 = (const unsigned char*)data;
    
    	switch(len & 7)
    	{
    	case 7: h ^= uint64_t(data2[6]) << 48;
    	case 6: h ^= uint64_t(data2[5]) << 40;
    	case 5: h ^= uint64_t(data2[4]) << 32;
    	case 4: h ^= uint64_t(data2[3]) << 24;
    	case 3: h ^= uint64_t(data2[2]) << 16;
    	case 2: h ^= uint64_t(data2[1]) << 8;
    	case 1: h ^= uint64_t(data2[0]);
    	        h *= m;
    	};
    
    	h ^= h >> r;
    	h *= m;
    	h ^= h >> r;
    
    	return h;
    }
    
    
    
    // 64-bit hash for 32-bit platforms
    
    uint64_t MurmurHash64B ( const void * key, int len, unsigned int seed )
    {
    	const unsigned int m = 0x5bd1e995;
    	const int r = 24;
    
    	unsigned int h1 = seed ^ len;
    	unsigned int h2 = 0;
    
    	const unsigned int * data = (const unsigned int *)key;
    
    	while(len >= 8)
    	{
    		unsigned int k1 = *data++;
    		k1 *= m; k1 ^= k1 >> r; k1 *= m;
    		h1 *= m; h1 ^= k1;
    		len -= 4;
    
    		unsigned int k2 = *data++;
    		k2 *= m; k2 ^= k2 >> r; k2 *= m;
    		h2 *= m; h2 ^= k2;
    		len -= 4;
    	}
    
    	if(len >= 4)
    	{
    		unsigned int k1 = *data++;
    		k1 *= m; k1 ^= k1 >> r; k1 *= m;
    		h1 *= m; h1 ^= k1;
    		len -= 4;
    	}
    
    	switch(len)
    	{
    	case 3: h2 ^= ((unsigned char*)data)[2] << 16;
    	case 2: h2 ^= ((unsigned char*)data)[1] << 8;
    	case 1: h2 ^= ((unsigned char*)data)[0];
    			h2 *= m;
    	};
    
    	h1 ^= h2 >> 18; h1 *= m;
    	h2 ^= h1 >> 22; h2 *= m;
    	h1 ^= h2 >> 17; h1 *= m;
    
    
    	uint64_t h = h1;
    
    	h = (h << 32) | h2;
    
    	return h;
    }

External links

Austin Appleby's MurmurHash page