(Go: >> BACK << -|- >> HOME <<)

MurmurHash: Difference between revisions

Content deleted Content added
Diablo-D3 (talk | contribs)
m Changed Java reference to Derek Young's, much easier to reuse than Hadoop's
→‎Variants: logical order
 
(261 intermediate revisions by more than 100 users not shown)
Line 1:
{{Short description|Computer function}}
'''MurmurHash''' is a non-[[Cryptographic hash function|cryptographic]] [[hash function]] suitable for general hash-based lookup.<ref name="Hadoop">[http://hadoop.apache.org/hbase/docs/r0.19.1/api/org/apache/hadoop/hbase/util/MurmurHash.html Hadoop in Java]</ref><ref>[http://laboratorios.fi.uba.ar/lsi/chouza-tesisingenieriainformatica.pdf Chouza et al].</ref><ref>[http://www.inesc-id.pt/ficheiros/publicacoes/5453.pdf Coceiro et al.]</ref> It was created by Austin Appleby,<ref>[http://murmurhash.googlepages.com/ MurmurHash on GooglePages]</ref> and exists in a number of variants, all of which have been released into the public domain.
{{Use dmy dates|date=January 2021}}
 
'''MurmurHash''' is a [[non-cryptographic hash function]] suitable for general hash-based lookup.<ref name="Hadoop">{{cite web |url=http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |title=Hadoop in Java |publisher=Hbase.apache.org |date=24 July 2011 |accessdate=13 January 2012 |url-status=dead |archiveurl=https://web.archive.org/web/20120112023407/http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |archivedate=12 January 2012}}
</ref><ref>
[http://materias.fi.uba.ar/7500/chouza-tesisingenieriainformatica.pdf Chouza et al].
</ref><ref>
{{cite web|url=http://www.inesc-id.pt/ficheiros/publicacoes/5453.pdf |title=Couceiro et al. |language=pt |accessdate=13 January 2012
|page= 14
}}
</ref>
It was created by Austin Appleby in 2008<ref>{{cite web|author=Tanjent (tanjent) wrote,3 March 2008 13:31:00 |url=http://tanjent.livejournal.com/756623.html |title=MurmurHash first announcement |publisher=Tanjent.livejournal.com |accessdate=13 January 2012}}</ref> and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants,<ref name="Murmur160">{{cite web|url=http://simonhf.wordpress.com/2010/09/25/murmurhash160/ |title=MurmurHash2-160 |publisher=Simonhf.wordpress.com |date=25 September 2010 |accessdate=13 January 2012}}</ref> all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
 
Unlike [[cryptographic hash function]]s, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
 
==Variants==
 
===MurmurHash1===
The current version is MurmurHash2, which yields a 32-bit hash value and is optimized for Intel processors. Slower versions of MurmurHash2 are available for little-endian and aligned-only machines. The MurmurHash2A variant adds the [[Merkle-Damgard construction]] so that it can be called incrementally. There are two variants which generate 64-bit values; MurmurHash64A, which is optimized for 64-bit processors, and MurmurHash64B, for 32-bit ones. MurmurHash1 is obsolete.
The original MurmurHash was created as an attempt to make a faster function than [[Jenkins hash function#lookup3|Lookup3]].<ref>{{cite web |title=MurmurHash1 |url=https://github.com/aappleby/smhasher/wiki/MurmurHash1 |accessdate=12 January 2019}}</ref> Although successful, it hadn't been tested thoroughly and wasn't capable of providing 64-bit hashes as in Lookup3. It had a rather elegant design, that would be later built upon in MurmurHash2, combining a multiplicative hash (similar to [[Fowler–Noll–Vo hash function]]) with a [[Xorshift]].
 
===MurmurHash2===
MurmurHash2<ref>{{cite web|title=MurmurHash2 on Github|url=https://github.com/aappleby/smhasher/blob/master/src/MurmurHash2.cpp}}</ref> yields a 32-bit or 64-bit value. It came in multiple variants, including some that allowed incremental hashing and aligned or neutral versions.
 
* MurmurHash2 (32-bit, x86) - The original version; contains a flaw that weakens collision in some cases.<ref>{{cite web |title=MurmurHash2Flaw |url=https://github.com/aappleby/smhasher/wiki/MurmurHash2Flaw |accessdate=15 January 2019}}</ref>
* MurmurHash2A (32-bit, x86) - A fixed variant using [[Merkle–Damgård construction]]. Slightly slower.
* CMurmurHash2A (32-bit, x86) - MurmurHash2A but works incrementally.
* MurmurHashNeutral2 (32-bit, x86) - Slower, but endian and alignment neutral.
* MurmurHashAligned2 (32-bit, x86) - Slower, but does aligned reads (safer on some platforms).
* MurmurHash64A (64-bit, x64) - The original 64-bit version. Optimized for 64-bit arithmetic.
* MurmurHash64B (64-bit, x86) - A 64-bit version optimized for 32-bit platforms. It is not a true 64-bit hash due to insufficient mixing of the stripes.<ref>{{cite web |title=MurmurHash3 (see note on MurmurHash2_x86_64) |url=https://github.com/aappleby/smhasher/wiki/MurmurHash3#bulk-speed-test-hashing-an-8-byte-aligned-256k-block |accessdate=15 January 2019}}</ref>
 
The person who originally found the flaw{{what|date=July 2021}} in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160.<ref>{{cite web |title=MurmurHash2_160 |url=https://simonhf.wordpress.com/2010/09/25/murmurhash160/ |accessdate=12 January 2019}}</ref>
 
===MurmurHash3===
The current version is MurmurHash3,<ref>{{cite web|title=MurmurHash3 on Github|url=https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp}}</ref><ref name="Horvath">{{cite web | first = Adam | last = Horvath | url = http://blog.teamleadnet.com/2012/08/murmurhash3-ultra-fast-hash-algorithm.html | title = MurMurHash3, an ultra fast hash algorithm for C# / .NET | date = 10 August 2012 }}</ref> which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher—a hash function test suite.
 
==Implementations==
The canonical implementation is in [[C++]], but there are efficient ports for a variety of popular languages, including [[Python (programming language)|Python]],<ref>{{cite web|url=https://code.google.com/p/pyfasthash/ |title=pyfasthash in Python |accessdate=13 January 2012}}</ref> [[C (programming language)|C]],<ref>{{cite web|url=https://github.com/wolkykim/qlibc |title=C implementation in qLibc by Seungyoung Kim}}</ref> [[Go (programming language)|Go]],<ref>{{cite web|url=https://github.com/spaolacci/murmur3 |title=murmur3 in Go}}</ref> [[C Sharp (programming language)|C#]],<ref name="Horvath"/><ref>{{cite web|last=Landman |first=Davy |url=http://landman-code.blogspot.com/2009/02/c-superfasthash-and-murmurhash2.html |title=Davy Landman in C# |publisher=Landman-code.blogspot.com |accessdate=13 January 2012}}</ref> [[D (programming language)|D]],<ref>{{Cite web|url=https://dlang.org/phobos/std_digest_murmurhash.html|title=std.digest.murmurhash - D Programming Language|website=dlang.org|access-date=2016-11-05}}</ref> [https://github.com/tkaemming/lua-murmurhash3 Lua], [[Perl]],<ref>{{cite web|url=http://metacpan.org/module/Digest::MurmurHash |title=Toru Maesaka in Perl |publisher=metacpan.org |accessdate=13 January 2012}}</ref> [[Ruby (programming language)|Ruby]],<ref>{{cite web|author=Yuki Kurihara |url=https://github.com/ksss/digest-murmurhash |title=Digest::MurmurHash |publisher=GitHub.com |date=16 October 2014 |accessdate=18 March 2015}}</ref> [[Rust (programming language)|Rust]],<ref>{{Cite web|title = stusmall/murmur3|url = https://github.com/stusmall/murmur3|website = GitHub|accessdate = 2015-11-29}}</ref> [[PHP]],<ref>{{cite web|url=https://github.com/lastguest/murmurhash-php |title=PHP userland implementation of MurmurHash3 |publisher=github.com |accessdate=18 December 2017}}</ref><ref>{{cite web|url=https://php.watch/versions/8.1/MurmurHash3 |title=PHP 8.1 with MurmurHash3 support}}</ref> [[Common Lisp]],<ref>{{cite web|url=https://bitbucket.org/tarballs_are_good/murmurhash3|title=tarballs_are_good / murmurhash3|accessdate=7 February 2015}}</ref> [[Haskell (programming language)|Haskell]],<ref>{{cite web|url=http://hackage.haskell.org/package/murmur-hash |title=Haskell |publisher=Hackage.haskell.org |accessdate=13 January 2012}}</ref> [[Elm (programming language)|Elm]],<ref>{{cite web|url=https://package.elm-lang.org/packages/Skinney/murmur3/latest/ |title=Elm |publisher=package.elm-lang.org |accessdate=12 June 2019}}</ref> [[Clojure]],<ref>{{cite web|url=https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Murmur3.java|title=Murmur3.java in Clojure source code on Github|publisher=clojure.org|accessdate=2014-03-11}}</ref> [[Scala (programming language)|Scala]],<ref>{{cite web|url=https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/hashing/MurmurHash3.scala|title=Scala standard library implementation|date=26 September 2014}}</ref> [[Java (programming language)|Java]],<ref>[https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/hash/Hashing.html Murmur3], part of Guava</ref><ref>{{cite web|url=https://github.com/greenrobot/greenrobot-common/blob/master/hash-functions.md|title=Murmur3A and Murmur3F Java classes on Github|publisher=greenrobot|accessdate=5 November 2014}}</ref> [[Erlang (programming language)|Erlang]],<ref>{{cite web|url=https://github.com/bipthelin/murmerl3|title=bipthelin/murmerl3|work=GitHub|accessdate=21 October 2015}}</ref> [[Swift (programming language)|Swift]],<ref>{{cite web|author=Daisuke T |url=https://github.com/daisuke-t-jp/MurmurHash-Swift |title=MurmurHash-Swift |publisher=GitHub.com |date=7 February 2019 |accessdate=10 February 2019}}</ref> [[Object Pascal]],<ref>[https://github.com/Xor-el/HashLib4Pascal GitHub - Xor-el/HashLib4Pascal: Hashing for Modern Object Pascal<!-- Bot generated title -->]</ref> [[Kotlin (programming language)|Kotlin]],<ref>{{cite web|url=https://github.com/goncalossilva/kotlinx-murmurhash |title=goncalossilva/kotlinx-murmurhash |publisher=GitHub.com |date=10 December 2021 |accessdate=14 December 2021}}</ref> [[JavaScript]].<ref>{{cite web|author=raycmorgan (owner) |url=https://gist.github.com/588423 |title=Javascript implementation by Ray Morgan |publisher=Gist.github.com |accessdate=13 January 2012}}</ref> and [[OCaml]], <ref>{{cite web|author=INRIA |url=https://github.com/ocaml/ocaml/blob/trunk/runtime/hash.c |title=OCaml Source |publisher=GitHub.com}}</ref>
 
It has been adopted into a number of open-source projects, most notably [[libstdc++]] (ver 4.6), [[nginx]] (ver 1.0.1),<ref>{{cite web|url=http://nginx.org/en/CHANGES |title=nginx |accessdate=13 January 2012}}</ref> [[Rubinius]],<ref>{{cite web|url=https://github.com/rubinius/rubinius/commit/1d69526c484cc9435a7198e41b8995db6c3acf1a |title=Rubinius |accessdate=29 February 2012}}</ref> libmemcached (the [[C (programming language)|C]] driver for [[Memcached]]),<ref>{{cite web|url=http://libmemcached.org/libMemcached.html|title=libMemcached|work=libmemcached.org|accessdate=21 October 2015}}</ref> [[Npm (software)|npm]] (nodejs package manager),<ref>{{cite web|url=https://github.com/npm/write-file-atomic/commit/22dd8759076fc8d0327d50693283060e479afafc |title=switch from MD5 to murmur}}</ref> [[maatkit]],<ref>{{cite web|url=https://code.google.com/p/maatkit/source/detail?r=3273 |title=maatkit |date=24 March 2009 |accessdate=13 January 2012}}</ref> [[Hadoop]],<ref name="Hadoop"/> Kyoto Cabinet,<ref>{{cite web|url=http://fallabs.com/kyotocabinet/spex.html |title=Kyoto Cabinet specification |publisher=Fallabs.com |date=4 March 2011 |accessdate=13 January 2012}}</ref> [[Apache Cassandra|Cassandra]],<ref>{{cite web|url=http://wiki.apache.org/cassandra/Partitioners|title=Partitioners|publisher=apache.org |date=2013-11-15|accessdate=2013-12-19}}</ref><ref>{{cite web |url=https://www.youtube.com/watch?v=d7o6a75sfY0 |title=Introduction to Apache Cassandra™ + What’s New in 4.0 by Patrick McFadin. DataStax Presents |work=YouTube |date=April 10, 2019}}</ref> [[Apache Solr|Solr]],<ref>{{cite web|url=http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/codecs/bloom/MurmurHash2.html|title=Solr MurmurHash2 Javadoc}}</ref> [[Vowpal Wabbit|vowpal wabbit]],<ref>{{cite web|url=https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/explore/hash.h|title=hash.cc in vowpalwabbit source code}}</ref> [[Elasticsearch]],<ref>{{cite web|url=https://www.elastic.co/guide/en/elasticsearch/reference/2.0/breaking_20_crud_and_routing_changes.html#_routing_hash_function|title=Elasticsearch 2.0 - CRUD and routing changes}}</ref> [[Google Guava|Guava]],<ref>{{cite web|url=https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/Hashing.java|title=Guava Hashing.java}}</ref> [[Apache Kafka|Kafka]]<ref>{{cite web|url=https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java|title=Kafka DefaultPartitioner.java}}</ref> and [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/vdo-integration RedHat Virtual Data Optimizer (VDO)].<ref>Virtual Data Optimizer [https://github.com/dm-vdo/kvdo source code]</ref>
 
==Vulnerabilities==
Hash functions can be vulnerable to [[collision attack]]s, where a user can choose input data in such a way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and [[Daniel J. Bernstein]] were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called [[HashDoS]] attacks.<ref>{{cite web|url=https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/|title=Breaking Murmur: Hash-flooding DoS Reloaded}}</ref> With the use of [[differential cryptanalysis]] they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend to use their own [[SipHash]] instead.
 
==Algorithm==
'''algorithm''' Murmur3_32 '''is'''
''// Note: In this version, all arithmetic is performed with unsigned 32-bit integers.''
''// In the case of overflow, the result is reduced modulo {{math|2<sup>32</sup>}}.''
'''input:''' ''key'', ''len'', ''seed''
c1 ← 0xcc9e2d51
c2 ← 0x1b873593
r1 ← 15
r2 ← 13
m ← 5
n ← 0xe6546b64
hash ← ''seed''
'''for each''' fourByteChunk of ''key'' '''do'''
k ← fourByteChunk
k ← k × c1
k ← k ROL r1
k ← k × c2
hash ← hash XOR k
hash ← hash ROL r2
hash ← (hash × m) + n
'''with any''' remainingBytesInKey '''do'''
remainingBytes ← SwapToLittleEndian(remainingBytesInKey)
''// Note: Endian swapping is only necessary on big-endian machines.''
''// The purpose is to place the meaningful digits towards the low end of the value,''
''// so that these digits have the greatest potential to affect the low range digits''
''// in the subsequent multiplication. Consider that locating the meaningful digits''
''// in the high range would produce a greater effect upon the high digits of the''
''// multiplication, and notably, that such high digits are likely to be discarded''
''// by the modulo arithmetic under overflow. We don't want that.''
remainingBytes ← remainingBytes × c1
remainingBytes ← remainingBytes ROL r1
remainingBytes ← remainingBytes × c2
hash ← hash XOR remainingBytes
hash ← hash XOR ''len''
hash ← hash XOR (hash >> 16)
hash ← hash × 0x85ebca6b
hash ← hash XOR (hash >> 13)
hash ← hash × 0xc2b2ae35
hash ← hash XOR (hash >> 16)
 
A sample C implementation follows (for little-endian CPUs):
 
<syntaxhighlight lang="c">
The canonical implementations are in [[C++]], but there are efficient ports for a variety of popular languages, including [[Python (programming language)|Python]]<ref>[http://code.google.com/p/pyfasthash/ pyfasthash in Python]</ref>, [[C Sharp (programming language)|C#]]<ref>[http://landman-code.blogspot.com/2009/02/c-superfasthash-and-murmurhash2.html Davy Landman in C#]</ref>, [[Perl]]<ref>[http://search.cpan.org/~tmaesaka/Digest-MurmurHash-0.10/lib/Digest/MurmurHash.pm Toru Maesaka in Perl]</ref>, [[Ruby (programming language)|Ruby]]<ref>http://rubyforge.org/projects/murmurhash Jeff Hodges
static inline uint32_t murmur_32_scramble(uint32_t k) {
in Ruby</ref>, [[Haskell (programming language)|Haskell]]<ref>[http://hackage.haskell.org/package/murmur-hash]</ref>, and [[Java (programming language)|Java]]<ref>[http://dmy999.com/article/50/murmurhash-2-java-port Derek Young in Java], public domain</ref>.
k *= 0xcc9e2d51;
k = (k << 15) | (k >> 17);
k *= 0x1b873593;
return k;
}
uint32_t murmur3_32(const uint8_t* key, size_t len, uint32_t seed)
{
uint32_t h = seed;
uint32_t k;
/* Read in groups of 4. */
for (size_t i = len >> 2; i; i--) {
// Here is a source of differing results across endiannesses.
// A swap here has no effects on hash properties though.
memcpy(&k, key, sizeof(uint32_t));
key += sizeof(uint32_t);
h ^= murmur_32_scramble(k);
h = (h << 13) | (h >> 19);
h = h * 5 + 0xe6546b64;
}
/* Read the rest. */
k = 0;
for (size_t i = len & 3; i; i--) {
k <<= 8;
k |= key[i - 1];
}
// A swap is *not* necessary here because the preceding loop already
// places the low bytes in the low places according to whatever endianness
// we use. Swaps only apply when the memory is copied in a chunk.
h ^= murmur_32_scramble(k);
/* Finalize. */
h ^= len;
h ^= h >> 16;
h *= 0x85ebca6b;
h ^= h >> 13;
h *= 0xc2b2ae35;
h ^= h >> 16;
return h;
}
</syntaxhighlight>
{| class="wikitable"
|+Tests
!Test string
!Seed value
!Hash value (hexadecimal)
!Hash value (decimal)
|-
|
|<code>0x00000000</code>
|<code>0x00000000</code>
|<code>0</code>
|-
|
|<code>0x00000001</code>
|<code>0x514E28B7</code>
|<code>1,364,076,727</code>
|-
|
|<code>0xffffffff</code>
|<code>0x81F16F39</code>
|<code>2,180,083,513</code>
|-
|test
|<code>0x00000000</code>
|<code>0xba6bd213</code>
|<code>3,127,628,307</code>
|-
|test
|<code>0x9747b28c</code>
|<code>0x704b81dc</code>
|<code>1,883,996,636</code>
|-
|Hello, world!
|<code>0x00000000</code>
|<code>0xc0363e43</code>
|<code>3,224,780,355</code>
|-
|Hello, world!
|<code>0x9747b28c</code>
|<code>0x24884CBA</code>
|<code>612,912,314</code>
|-
|The quick brown fox jumps over the lazy dog
|<code>0x00000000</code>
|<code>0x2e4ff723</code>
|<code>776,992,547</code>
|-
|The quick brown fox jumps over the lazy dog
|<code>0x9747b28c</code>
|<code>0x2FA826CD</code>
|<code>799,549,133</code>
|}
 
== See also ==
It has been adopted into a number of open-source projects, most notably libmemcached<ref>[http://tangent.org/552/libmemcached.html libmemcached]</ref> (the [[C (programming language)|C]] driver for [[Memcached]]), maatkit<ref>[http://code.google.com/p/maatkit/source/detail?r=3273 maatkit]</ref>, and [[Hadoop]]<ref name="Hadoop"/>.
* [[Non-cryptographic hash functions]]
 
==References==
{{reflist|colwidth=30em}}
<references/>
 
==SeeExternal alsolinks==
* [https://github.com/aappleby/smhasher Official SMHasher site]
*[[Fowler-Noll-Vo hash function]]
*[[Jenkins hash function]]
 
{{DEFAULTSORT:Murmurhash}}
[[Category:Hash functionsfunction (non-cryptographic)]]
[[Category:Articles with example pseudocode]]
[[Category:Articles with example C code]]