(Go: >> BACK << -|- >> HOME <<)

MurmurHash

This is an old revision of this page, as edited by OliverTwisted (talk | contribs) at 04:12, 9 April 2009 (trading templates). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

MurmurHash is a non-cryptographic hash function created by Austin Appleby. [1]

The hash

MurmurHash 2.0 is a relatively simple algorithm, which compiles down to about 52 instruction on the x86. It is noted[2] for being exceptionally fast, often two to four times faster than comparable[3] aglorithms such as FNV[4], Jenkins' lookup3[5][6] and Hsieh's SuperFastHash[7], with excellent distribution [8], avalanche behavior [9] and overall collision resistance [10]. A survey of hash functions[11] concluded that "Murmur2 is the only of the complex hash functions that provides good performance for all kinds of keys. It can be recommended as a general-purpose hashing function."

Variations

The first version was MurmurHash 1.0, but it was superceded by 2.0, which Appleby says is "much faster & mixes better." [12]. All have been released into the public domain.

The fastest implementation[13] reads four bytes at a time and generates canonical values only on a processor with Intel [little-endian] byte. For processors that require aligned reads[14] or have Motorola big-endian byte order[15], there are versions that work correctly but at approximately half speed. The algorithm generates a 32-bit hash value and is optimized for processors with 32-bit or larger registers.

Implementation

The algorithm was originally expressed in C++, and has been ported to a number of popular languages, including Python[16], C# and Java[17].

   //-----------------------------------------------------------------------------
   // MurmurHash2, by Austin Appleby
   
   // Note - This code makes a few assumptions about how your machine behaves -
   
   // 1. We can read a 4-byte value from any address without crashing
   // 2. sizeof(int) == 4
   
   // And it has a few limitations -
   
   // 1. It will not work incrementally.
   // 2. It will not produce the same results on little-endian and big-endian
   //    machines.
   unsigned int MurmurHash2 ( const void * key, int len, unsigned int seed )
   {
   	// 'm' and 'r' are mixing constants generated offline.
   	// They're not really 'magic', they just happen to work well.
   
   	const unsigned int m = 0x5bd1e995;
   	const int r = 24;
   
   	// Initialize the hash to a 'random' value
   
   	unsigned int h = seed ^ len;
   
   	// Mix 4 bytes at a time into the hash
   
   	const unsigned char * data = (const unsigned char *)key;
   
   	while(len >= 4)
   	{
   		unsigned int k = *(unsigned int *)data;
   
   		k *= m; 
   		k ^= k >> r; 
   		k *= m; 
   		
   		h *= m; 
   		h ^= k;
   
   		data += 4;
   		len -= 4;
   	}
   	
   	// Handle the last few bytes of the input array
   
   	switch(len)
   	{
   	case 3: h ^= data[2] << 16;
   	case 2: h ^= data[1] << 8;
   	case 1: h ^= data[0];
   	        h *= m;
   	};
   
   	// Do a few final mixes of the hash to ensure the last few
   	// bytes are well-incorporated.
   
   	h ^= h >> 13;
   	h *= m;
   	h ^= h >> 15;
   
   	return h;
   }