(Go: >> BACK << -|- >> HOME <<)

MurmurHash: Difference between revisions

Content deleted Content added
m Added missing spaces and comma.
→‎Variants: logical order
 
(43 intermediate revisions by 31 users not shown)
Line 1:
{{Short description|Computer function}}
{{Use dmy dates|date=January 20122021}}
'''MurmurHash''' is a non-[[Cryptographic hash function|cryptographic]] [[hash function]] suitable for general hash-based lookup.<ref name="Hadoop">{{cite web |url=http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |title=Hadoop in Java |publisher=Hbase.apache.org |date=24 July 2011 |accessdate=13 January 2012 |url-status=dead |archiveurl=https://web.archive.org/web/20120112023407/http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |archivedate=12 January 2012 |df=dmy-all }}</ref> It was created by Austin Appleby in 2008<ref>{{cite web|author=Tanjent (tanjent) wrote,3 March 2008 13:31:00 |url=http://tanjent.livejournal.com/756623.html |title=MurmurHash first announcement |publisher=Tanjent.livejournal.com |accessdate=13 January 2012}}</ref> and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants,<ref name="Murmur160">{{cite web|url=http://simonhf.wordpress.com/2010/09/25/murmurhash160/ |title=MurmurHash2-160 |publisher=Simonhf.wordpress.com |date=25 September 2010 |accessdate=13 January 2012}}</ref> all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
 
'''MurmurHash''' is a [[non-cryptographic hash function]] suitable for general hash-based lookup.<ref name="Hadoop">{{cite web |url=http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |title=Hadoop in Java |publisher=Hbase.apache.org |date=24 July 2011 |accessdate=13 January 2012 |url-status=dead |archiveurl=https://web.archive.org/web/20120112023407/http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |archivedate=12 January 2012}}
</ref><ref>
[http://materias.fi.uba.ar/7500/chouza-tesisingenieriainformatica.pdf Chouza et al].
</ref><ref>
{{cite web|url=http://www.inesc-id.pt/ficheiros/publicacoes/5453.pdf |title=Couceiro et al. |language=pt |accessdate=13 January 2012
|page= 14
}}
</ref>
'''MurmurHash''' is a non-[[Cryptographic hash function|cryptographic]] [[hash function]] suitable for general hash-based lookup.<ref name="Hadoop">{{cite web |url=http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |title=Hadoop in Java |publisher=Hbase.apache.org |date=24 July 2011 |accessdate=13 January 2012 |url-status=dead |archiveurl=https://web.archive.org/web/20120112023407/http://hbase.apache.org/docs/current/api/org/apache/hadoop/hbase/util/MurmurHash.html |archivedate=12 January 2012 |df=dmy-all }}</ref> It was created by Austin Appleby in 2008<ref>{{cite web|author=Tanjent (tanjent) wrote,3 March 2008 13:31:00 |url=http://tanjent.livejournal.com/756623.html |title=MurmurHash first announcement |publisher=Tanjent.livejournal.com |accessdate=13 January 2012}}</ref> and is currently hosted on GitHub along with its test suite named 'SMHasher'. It also exists in a number of variants,<ref name="Murmur160">{{cite web|url=http://simonhf.wordpress.com/2010/09/25/murmurhash160/ |title=MurmurHash2-160 |publisher=Simonhf.wordpress.com |date=25 September 2010 |accessdate=13 January 2012}}</ref> all of which have been released into the public domain. The name comes from two basic operations, multiply (MU) and rotate (R), used in its inner loop.
 
Unlike [[cryptographic hash function]]s, it is not specifically designed to be difficult to reverse by an adversary, making it unsuitable for cryptographic purposes.
Line 6 ⟶ 16:
==Variants==
 
===MurmurHash3MurmurHash1===
The original MurmurHash was created as an attempt to make a faster function than [[Jenkins hash function#lookup3|Lookup3]].<ref>{{cite web |title=MurmurHash1 |url=https://github.com/aappleby/smhasher/wiki/MurmurHash1 |accessdate=12 January 2019}}</ref> Although successful, it hadn't been tested thoroughly and wasn't capable of providing 64-bit hashes as in Lookup3. It had a rather elegant design, that would be later built upon in MurmurHash2, combining a multiplicative hash (similar to [[Fowler–Noll–Vo hash function]]) with a [[Xorshift]].
The current version is MurmurHash3,<ref>{{cite web|title=MurmurHash3 on Github|url=https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp}}</ref><ref name="Horvath">{{cite web | first = Adam | last = Horvath | url = http://blog.teamleadnet.com/2012/08/murmurhash3-ultra-fast-hash-algorithm.html | title = MurMurHash3, an ultra fast hash algorithm for C# / .NET | date = Aug 10, 2012 }}</ref> which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher—a hash function test suite.
 
===MurmurHash2===
Line 18 ⟶ 28:
* MurmurHashAligned2 (32-bit, x86) - Slower, but does aligned reads (safer on some platforms).
* MurmurHash64A (64-bit, x64) - The original 64-bit version. Optimized for 64-bit arithmetic.
* MurmurHash64B (64-bit, x86) - A 64-bit version optimized for 32-bit platforms. Unfortunately itIt is not a true 64-bit hash due to insufficient mixing of the stripes.<ref>{{cite web |title=MurmurHash3 (see note on MurmurHash2_x86_64) |url=https://github.com/aappleby/smhasher/wiki/MurmurHash3#bulk-speed-test-hashing-an-8-byte-aligned-256k-block |accessdate=15 January 2019}}</ref>
 
The person who originally found the flaw{{what|date=July 2021}} in MurmurHash2 created an unofficial 160-bit version of MurmurHash2 called MurmurHash2_160.<ref>{{cite web |title=MurmurHash2_160 |url=https://simonhf.wordpress.com/2010/09/25/murmurhash160/ |accessdate=12 January 2019}}</ref>
 
===MurmurHash1MurmurHash3===
The current version is MurmurHash3,<ref>{{cite web|title=MurmurHash3 on Github|url=https://github.com/aappleby/smhasher/blob/master/src/MurmurHash3.cpp}}</ref><ref name="Horvath">{{cite web | first = Adam | last = Horvath | url = http://blog.teamleadnet.com/2012/08/murmurhash3-ultra-fast-hash-algorithm.html | title = MurMurHash3, an ultra fast hash algorithm for C# / .NET | date = Aug 10, August 2012 }}</ref> which yields a 32-bit or 128-bit hash value. When using 128-bits, the x86 and x64 versions do not produce the same values, as the algorithms are optimized for their respective platforms. MurmurHash3 was released alongside SMHasher—a hash function test suite.
The original MurmurHash was created as an attempt to make a faster function than [[Jenkins hash function#lookup3|Lookup3]].<ref>{{cite web |title=MurmurHash1 |url=https://github.com/aappleby/smhasher/wiki/MurmurHash1 |accessdate=12 January 2019}}</ref> Although successful, it hadn't been tested thoroughly and wasn't capable of providing 64-bit hashes as in Lookup3. It had a rather elegant design, that would be later built upon in MurmurHash2, combining a multiplicative hash (similar to [[Fowler–Noll–Vo hash function]]) with a [[Xorshift]].
 
==Implementations==
The canonical implementation is in [[C++]], but there are efficient ports for a variety of popular languages, including [[Python (programming language)|Python]],<ref>{{cite web|url=https://code.google.com/p/pyfasthash/ |title=pyfasthash in Python |publisher=Google |accessdate=13 January 2012}}</ref> [[C (programming language)|C]],<ref>{{cite web|url=https://github.com/wolkykim/qlibc |title=C implementation in qLibc by Seungyoung Kim}}</ref> [[Go (programming language)|Go]],<ref>{{cite web|url=https://github.com/spaolacci/murmur3 |title=murmur3 in Go}}</ref> [[C Sharp (programming language)|C#]],<ref name="Horvath"/><ref>{{cite web|last=Landman |first=Davy |url=http://landman-code.blogspot.com/2009/02/c-superfasthash-and-murmurhash2.html |title=Davy Landman in C# |publisher=Landman-code.blogspot.com |accessdate=13 January 2012}}</ref> [[D (programming language)|D]],<ref>{{Cite web|url=https://dlang.org/phobos/std_digest_murmurhash.html|title=std.digest.murmurhash - D Programming Language|website=dlang.org|access-date=2016-11-05}}</ref> [https://github.com/tkaemming/lua-murmurhash3 Lua], [[Perl]],<ref>{{cite web|url=http://metacpan.org/module/Digest::MurmurHash |title=Toru Maesaka in Perl |publisher=metacpan.org |accessdate=13 January 2012}}</ref> [[Ruby (programming language)|Ruby]],<ref>{{cite web|author=Yuki Kurihara |url=https://github.com/ksss/digest-murmurhash |title=Digest::MurmurHash |publisher=GitHub.com |date=16 OctOctober 2014 |accessdate=18 March 2015}}</ref> [[Rust (programming language)|Rust]],<ref>{{Cite web|title = stusmall/murmur3|url = https://github.com/stusmall/murmur3|website = GitHub|accessdate = 2015-11-29}}</ref> [[PHP]],<ref>{{cite web|url=https://github.com/lastguest/murmurhash-php |title=PHP userland implementation of MurmurHash3 |publisher=github.com |accessdate=18 December 2017}}</ref><ref>{{cite web|url=https://php.watch/versions/8.1/MurmurHash3 |title=PHP 8.1 with MurmurHash3 support}}</ref> [[Common Lisp]],<ref>{{cite web|url=https://bitbucket.org/tarballs_are_good/murmurhash3|title=tarballs_are_good / murmurhash3|accessdate=7 February 2015}}</ref> [[Haskell (programming language)|Haskell]],<ref>{{cite web|url=http://hackage.haskell.org/package/murmur-hash |title=Haskell |publisher=Hackage.haskell.org |accessdate=13 January 2012}}</ref> [[Elm (programming language)|Elm]],<ref>{{cite web|url=https://package.elm-lang.org/packages/Skinney/murmur3/latest/ |title=Elm |publisher=package.elm-lang.org |accessdate=12 June 2019}}</ref> [[Clojure]],<ref>{{cite web|url=https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/Murmur3.java|title=Murmur3.java in Clojure source code on Github|publisher=clojure.org|accessdate=2014-03-11}}</ref> [[Scala (programming language)|Scala]],<ref>{{cite web|url=https://github.com/scala/scala/blob/2.12.x/src/library/scala/util/hashing/MurmurHash3.scala|title=Scala standard library implementation|date=26 September 2014}}</ref> [[Java (programming language)|Java]],<ref>[https://google.github.io/guava/releases/snapshot/api/docs/com/google/common/hash/Hashing.html Murmur3], part of Guava</ref><ref>{{cite web|url=https://github.com/greenrobot/greenrobot-common/blob/master/hash-functions.md|title=Murmur3A and Murmur3F Java classes on Github|publisher=greenrobot|accessdate=5 November 2014}}</ref> [[Erlang (programming language)|Erlang]],<ref>{{cite web|url=https://github.com/bipthelin/murmerl3|title=bipthelin/murmerl3|work=GitHub|accessdate=21 October 2015}}</ref> [[Swift (programming language)|Swift]],<ref>{{cite web|author=Daisuke T |url=https://github.com/daisuke-t-jp/MurmurHash-Swift |title=MurmurHash-Swift |publisher=GitHub.com |date=7 FebFebruary 2019 |accessdate=10 FebFebruary 2019}}</ref> and[[Object Pascal]],<ref>[https://github.com/Xor-el/HashLib4Pascal GitHub - Xor-el/HashLib4Pascal: Hashing for Modern Object Pascal<!-- Bot generated title -->]</ref> [JavaScript[Kotlin (programming language)|Kotlin]],<ref>{{cite web|author=raycmorgan (owner) |url=https://gist.github.com/588423goncalossilva/kotlinx-murmurhash |title=Javascript implementation by Ray Morgangoncalossilva/kotlinx-murmurhash |publisher=Gist.githubGitHub.com |date=10 December 2021 |accessdate=1314 JanuaryDecember 20122021}}</ref> [[JavaScript]].<ref>{{cite web|author=garycourtraycmorgan (owner) |url=https://gist.github.com/garycourt/murmurhash-js588423 |publishertitle=Github.comJavascript implementation by Ray Morgan |titlepublisher=MurmurHashGist.js ongithub.com Github|accessdate=13 January 2012}}</ref> togetherand with[[OCaml]], an online version.<ref>{{cite web|author=INRIA |url=httphttps://murmurhash.shorelabsgithub.com/ocaml/ocaml/blob/trunk/runtime/hash.c |title=OnlineOCaml version of MurmurHashSource |publisher=shorelabsGitHub.com |accessdate=12 August 2014}}</ref>
 
It has been adopted into a number of open-source projects, most notably [[libstdc++]] (ver 4.6), [[nginx]] (ver 1.0.1),<ref>{{cite web|url=http://nginx.org/en/CHANGES |title=nginx |accessdate=13 January 2012}}</ref> [[Rubinius]],<ref>{{cite web|url=https://github.com/rubinius/rubinius/commit/1d69526c484cc9435a7198e41b8995db6c3acf1a |title=Rubinius |accessdate=29 February 2012}}</ref> libmemcached (the [[C (programming language)|C]] driver for [[Memcached]]),<ref>{{cite web|url=http://libmemcached.org/libMemcached.html|title=libMemcached|work=libmemcached.org|accessdate=21 October 2015}}</ref> [[Npm (software)|npm]] (nodejs package manager),<ref>{{cite web|url=https://github.com/npm/write-file-atomic/commit/22dd8759076fc8d0327d50693283060e479afafc |title=switch from MD5 to murmur}}</ref> [[maatkit]],<ref>{{cite web|url=https://code.google.com/p/maatkit/source/detail?r=3273 |title=maatkit |publisher=Google |date=24 March 2009 |accessdate=13 January 2012}}</ref> [[Hadoop]],<ref name="Hadoop"/> Kyoto Cabinet,<ref>{{cite web|url=http://fallabs.com/kyotocabinet/spex.html |title=Kyoto Cabinet specification |publisher=Fallabs.com |date=4 March 2011 |accessdate=13 January 2012}}</ref> [[RaptorDBApache Cassandra|Cassandra]],<ref>{{cite web|last=Gholam |first=Mehdi |url=http://wwwwiki.codeprojectapache.comorg/KBcassandra/database/RaptorDB.aspx Partitioners|title=RaptorDB CodeProject page Partitioners|publisher=Codeprojectapache.comorg |date=13 November 2011 2013-11-15|accessdate=13 January 20122013-12-19}}</ref> [[OlegDB]],<ref>{{cite web |url=https://olegdbwww.orgyoutube.com/documentation.htmlwatch?v=d7o6a75sfY0 |title=OlegDBIntroduction Documentationto |accessdate=24Apache JanuaryCassandra™ 2013}}</ref>+ [[ApacheWhat’s Cassandra|Cassandra]],<ref>{{citeNew web|url=http://wikiin 4.apache0 by Patrick McFadin.org/cassandra/Partitioners DataStax Presents |titlework=Partitioners|publisher=apache.orgYouTube |date=2013-11-15|accessdate=2013-12-19April 10, 2019}}</ref> [[Apache Solr|Solr]],<ref>{{cite web|url=http://www.solr-start.com/javadoc/solr-lucene/org/apache/lucene/codecs/bloom/MurmurHash2.html|title=Solr MurmurHash2 Javadoc}}</ref> [[Vowpal Wabbit|vowpal wabbit]],<ref>{{cite web|url=https://github.com/VowpalWabbit/vowpal_wabbit/blob/master/explore/hash.h|title=hash.cc in vowpalwabbit source code}}</ref> [[Elasticsearch]],<ref>{{cite web|url=https://www.elastic.co/guide/en/elasticsearch/reference/2.0/breaking_20_crud_and_routing_changes.html#_routing_hash_function|title=Elasticsearch 2.0 - CRUD and routing changes}}</ref> [[Google Guava|Guava]],<ref>{{cite web|url=https://github.com/google/guava/blob/master/guava/src/com/google/common/hash/Hashing.java|title=Guava Hashing.java}}</ref> [[Apache Kafka|Kafka]]<ref>{{cite web|url=https://github.com/apache/kafka/blob/trunk/clients/src/main/java/org/apache/kafka/clients/producer/internals/DefaultPartitioner.java|title=Kafka DefaultPartitioner.java}}</ref> and [https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/storage_administration_guide/vdo-integration RedHat Virtual Data Optimizer (VDO)]<sup>[.<ref>Virtual Data Optimizer [https://github.com/dm-vdo/kvdo source code]</ref>]</sup>
 
==Vulnerabilities==
Hash functions can be vulnerable to [[collision attack]]s, ifwhere a user can choose input data in such asa way so as to intentionally cause hash collisions. Jean-Philippe Aumasson and [[Daniel J. Bernstein]] were able to show that even implementations of MurmurHash using a randomized seed are vulnerable to so-called [[HashDoS]] attacks.<ref>{{cite web|url=https://emboss.github.io/blog/2012/12/14/breaking-murmur-hash-flooding-dos-reloaded/|title=Breaking Murmur: Hash-flooding DoS Reloaded}}</ref> With the use of [[differential cryptanalysis]] they were able to generate inputs that would lead to a hash collision. The authors of the attack recommend to use their own [[SipHash]] instead.
 
==Algorithm==
Line 38 ⟶ 48:
''// In the case of overflow, the result is reduced modulo {{math|2<sup>32</sup>}}.''
'''input:''' ''key'', ''len'', ''seed''
c1 ← 0xcc9e2d51
c2 ← 0x1b873593
Line 45 ⟶ 55:
m ← 5
n ← 0xe6546b64
hash ← ''seed''
Line 68 ⟶ 78:
''// multiplication, and notably, that such high digits are likely to be discarded''
''// by the modulo arithmetic under overflow. We don't want that.''
remainingBytes ← remainingBytes × c1
remainingBytes ← remainingBytes ROL r1
Line 74 ⟶ 84:
hash ← hash XOR remainingBytes
hash ← hash XOR ''len''
Line 126 ⟶ 136:
}
</syntaxhighlight>
{| class="wikitable"
|+Tests
!Test string
!Seed value
!Hash value (hexadecimal)
!Hash value (decimal)
|-
|
|<code>0x00000000</code>
|<code>0x00000000</code>
|<code>0</code>
|-
|
|<code>0x00000001</code>
|<code>0x514E28B7</code>
|<code>1,364,076,727</code>
|-
|
|<code>0xffffffff</code>
|<code>0x81F16F39</code>
|<code>2,180,083,513</code>
|-
|test
|<code>0x00000000</code>
|<code>0xba6bd213</code>
|<code>3,127,628,307</code>
|-
|test
|<code>0x9747b28c</code>
|<code>0x704b81dc</code>
|<code>1,883,996,636</code>
|-
|Hello, world!
|<code>0x00000000</code>
|<code>0xc0363e43</code>
|<code>3,224,780,355</code>
|-
|Hello, world!
|<code>0x9747b28c</code>
|<code>0x24884CBA</code>
|<code>612,912,314</code>
|-
|The quick brown fox jumps over the lazy dog
|<code>0x00000000</code>
|<code>0x2e4ff723</code>
|<code>776,992,547</code>
|-
|The quick brown fox jumps over the lazy dog
|<code>0x9747b28c</code>
|<code>0x2FA826CD</code>
|<code>799,549,133</code>
|}
 
== See also ==
* [[Non-cryptographic hash functions]]