(Go: >> BACK << -|- >> HOME <<)

MurmurHash

This is an old revision of this page, as edited by 208.80.104.2 (talk) at 15:14, 10 September 2009 (→‎Usage and availability: added Hadoop. Note that this is yet another reliable, non-blog reference. In fact, it's a reference manual for a notable project.). The present address (URL) is a permanent link to this revision, which may differ significantly from the current revision.

MurmurHash is a non-cryptographic hash function created by Austin Appleby. [1]

The hash

MurmurHash 2.0 is a relatively simple algorithm, which compiles down to about 52 instruction on the x86. It is noted[2] for being exceptionally fast, often two to four times faster than comparable[3] algorithms such as FNV[4], Jenkins' lookup3[5][6] and Hsieh's SuperFastHash[7], with excellent distribution [8], avalanche behavior [9] and overall collision resistance [10].

A survey of hash functions[11] concluded that "Murmur2 is the only [one] of the complex hash functions that provides good performance for all kinds of keys. It can be recommended as a general-purpose hashing function."

Variations

The first version was MurmurHash 1.0, but it was superseded by 2.0, which Appleby says is "much faster & mixes better." [12]. All have been released into the public domain.

The fastest implementation[13] of MurmurHash2 reads four bytes at a time and generates canonical values only on a processor with little-endian byte order. For processors that require aligned reads[14] or have big-endian byte order[15], there are versions that work correctly but at approximately half speed. The algorithm generates a 32-bit hash value and is optimized for processors with 32-bit or larger registers.

There are also two versions[16] available which generate 64-bit hash values, suitable for differentiating among tens of thousands of items without false positives caused by the birthday paradox. The first, MurmurHash64A, is MurmurHash2 scaled up directly to 64 bits, for maximum performance on 64-bit processors. The second, MurmurHash64B, is designed for 32-bit processors, so it essentially runs a pair of MurmurHash2 algorithms in parallel and combines the results. Both of these generate different hash values from each other and MurmurHash2, and have the same byte order and alignment requirements, except that MurmurHash64A alignment is on 8-byte, not 4-byte, boundaries.

Usage and availability

Murmur was originally expressed in C++, and has been ported to a number of popular languages, including Python[17], C#[18], Perl[19], Java[20][21] and Delphi[22].

It has been adopted into libmemcached, which is the C driver for Memcached, is used by maatkit and in Hadoop, and may be found in hashing utilities such as Easy Hash.

See also