(Go: >> BACK << -|- >> HOME <<)

SlideShare a Scribd company logo
Mathematics for programmers (early draft)
Dennis Yurichev <math@yurichev.com>
September 22, 2020
2
Contents
1 Prime numbers 1
1.1 Integer factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.1 Using composite number as a container . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Using composite number as a container (another example) . . . . . . . . . . . . . . . . . . . . 4
1.2 Coprime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3 Semiprime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4 How RSA1 works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.1 Fermat little theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.2 Euler’s totient function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.4.3 Euler’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.4 RSA example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.5 So how it works? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.6 Breaking RSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4.7 The difference between my simplified example and a real RSA algorithm . . . . . . . . . . . . 9
1.4.8 The RSA signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.4.9 Hybrid cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Modulo arithmetics 11
2.1 Quick introduction into modular arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Modular arithmetic on CPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Remainder of division by modulo 2n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.1.3 Getting random numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Modulo inverse, part I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.2.1 No remainder? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3 Modulo inverse, part II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.4 Reversible linear congruential generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.5 Getting magic number using extended Euclidean algorithm . . . . . . . . . . . . . . . . . . . . . . . . 19
3 Probability 23
3.1 Text strings right in the middle of compressed data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2 Autocomplete using Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Dissociated press . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.2 Autocomplete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.3 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2.4 The files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
3.2.5 Read more . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.3 random.choices() in Python 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
1
Rivest–Shamir–Adleman cryptosystem
3
4 Combinatorics 41
4.1 Soldering a headphones cable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.2 Vehicle license plate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.3 Forgotten password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.4 Executable file watermarking/steganography using Lehmer code and factorial number system . . . . . 49
4.5 De Bruijn sequences; leading/trailing zero bits counting . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5.2 Trailing zero bits counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.5.3 Leading zero bits counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.5.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.5.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.5.6 Generation of De Bruijn sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
4.5.7 Other articles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
5 Galois Fields, GF(2) and yet another explanation of CRC2 61
5.1 What is wrong with checksum? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 Division by prime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.3 (Binary) long divison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4 (Binary) long division, version 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.5 Shortest possible introduction into GF(2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.6 CRC32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.7 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.8 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6 Logarithms 69
6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.1.1 Children’s approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.1.2 Scientists’ and engineers’ approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2 Logarithmic scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.1 In human perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
6.2.2 In electronics engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
6.2.3 In IT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.2.4 Web 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Multiplication and division using addition and subtraction . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3.1 Logarithmic slide rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3.2 Logarithmic tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.3.3 Working with very small and very large numbers . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.3.4 IEEE 754: adding and subtracting exponents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.4 Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.5 Square root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.6 Base conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.7 Binary logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.7.1 Denoting a number of bits for some value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.7.2 Calculating binary logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.7.3 O(log n) time complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.8 Common (base 10) logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.9 Natural logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.9.1 Savings account in your bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.9.2 Exponential decay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
2
Cyclic redundancy check
4
7 Symbolic computation 93
7.1 Rational data type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8 Graph theory 97
8.1 Clique in graph theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.1.1 Social graph: simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.1.2 Social graph: IRC network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
8.1.3 Attempt to find communities in IRC social graph . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.4 Social graph: social networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.1.5 Links graph: Wikipedia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.1.6 Social graph: LiveJournal spammers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.1.7 Links graph: link farms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.1.8 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
9 GCD and LCM 105
9.1 GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.2 Oculus VR Flicks and GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
9.3 LCM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
9.3.1 File copying routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
10 Linear algebra 111
10.1 Gaussian elimination: Kirchhoff’s circuit laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
10.1.1 Gaussian elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
11 Acronyms used 117
Thanks
Thanks to those who helped: Slava “Avid” Kazakov, Serhiy Matviychuk.
5
6
Chapter 1
Prime numbers
Prime numbers are the numbers which has no divisors except itself and 1. This can be represented graphically.
Let’s take 9 balls or some other objects. 9 balls can be arranged into rectangle:
ooo
ooo
ooo
So are 12 balls:
oooo
oooo
oooo
Or:
ooo
ooo
ooo
ooo
So 9 and 12 are not prime numbers. 7 is prime number:
ooooooo
Or:
o
o
o
o
o
o
o
It’s not possible to form a rectangle using 7 balls, or 11 balls or any other prime number.
1
The fact that balls can be arranged into rectangle shows that the number can be divided by the number which is rep-
resented by height and width of rectangle. Balls of prime number can be arranged vertically or horizontally, meaning,
there are only two divisors: 1 and the prime number itself.
1.1 Integer factorization
Natural number can be either prime or composite number. Composite number is a number which can be breaked up
by product of prime numbers. Let’s take 100. It’s not a prime number.
By the fundamental theorem of arithmetic, any number can be represented as product of prime numbers, in only one
single way.
So the composite number phrase means that the number is composed of prime numbers.
Let’s factor 100 in Wolfram Mathematica:
Listing 1.1: Wolfram Mathematica
In[]:= FactorInteger[100]
Out[]= {{2, 2}, {5, 2}}
This mean that 100 can be constructed using 2 and 5 prime numbers (22 · 52):
Listing 1.2: Wolfram Mathematica
In[]:= 2^2*5^2
Out[]= 100
1.1.1 Using composite number as a container
Even more than that, it’s possible to encode some information in prime numbers using factoring. Let’s say, we would
encode the “Hello” text string. First, let’s find ASCII codes of each character in the string:
Listing 1.3: Wolfram Mathematica
In[]:= ToCharacterCode["Hello"]
Out[]= {72, 101, 108, 108, 111}
Let’s find a first 5 prime numbers, each number for each character:
Listing 1.4: Wolfram Mathematica
In[]:= Map[Prime[#] &, Range[5]]
Out[]= {2, 3, 5, 7, 11}
Build a huge number using prime numbers as bases and ASCII codes as exponents, then get a product of all them
(272 · 3101 · 5108 · 7108 · 11111):
Listing 1.5: Wolfram Mathematica
In[]:= tmp = 2^72*3^101*5^108*7^108*11^111
Out[]= 
1649465578065933994718255257642275679479006861206428830641826551739434
9344066214616222018844835866267141943107823334187149334898562231349428
5708281252457614466981636618940129599457183300076472809826225406689893
2
5837524252859632074600687844523389231265776082000229507684707641601562
5000000000000000000000000000000000000000000000000000000000000000000000
000
It’s a big number, but Wolfram Mathematica is able to factor it back:
Listing 1.6: Wolfram Mathematica
In[]:= FactorInteger[tmp]
Out[]= {{2, 72}, {3, 101}, {5, 108}, {7, 108}, {11, 111}}
A first number in each pair is prime number and the second is exponent. Get the text string back:
Listing 1.7: Wolfram Mathematica
In[]:= FromCharacterCode[Map[#[[2]] &, tmp]]
Out[]= "Hello"
That allows to have some fun. Let’s add exclamation point to the end of string by manipulating only the big number.
ASCII code of exlamation point is 33. The next prime number after 11 is 13. So add it (by multiplying by 1333):
Listing 1.8: Wolfram Mathematica
In[]:= tmp = tmp*13^33
Out[]= 
9494539005656577744061615691556750598033024729435332190254469113536733
9032823543118405499931761589928052797992206631285822671397023217541663
5920521812548793623881568510051214975599793760307837993570818136014139
9497958680836430182400405525832564875875193876694267121604212637095253
0725145452728611417114734649658203125000000000000000000000000000000000
000000000000000000000000000000000000000
So we got new number. Let’s factor it back and decode:
Listing 1.9: Wolfram Mathematica
In[]:= factored = FactorInteger[tmp]
Out[]= {{2, 72}, {3, 101}, {5, 108}, {7, 108}, {11, 111}, {13, 33}}
In[]:= FromCharacterCode[Map[#[[2]] &, factored]]
Out[]= "Hello!"
Wow, that works. Will it be possible to remove one ’l’ character from the string at the third position? The ’l’ has the
ASCII code of 108 and it is exponent for two prime numbers in our expression: 5 (first ’l’) and 7 (second ’l’).
To knock out the character, we divide the big number by the corresponding prime number with the exponent of 108
(divide by 5108):
Listing 1.10: Wolfram Mathematica
In[]:= tmp = tmp/5^108
Out[]= 
3081154065769189664244341216329094565621009415122099836376732969546063
1079164051611808432546107410277501678916823138724630810880390384343750
1196528030610615786507542545262118293483878711112407171889948257893463
3
8494741216231004109210436295299274515484540190050751059821909485854359
9630924207126074604240892753608704
In[]:= factored = FactorInteger[tmp]
Out[]= {{2, 72}, {3, 101}, {7, 108}, {11, 111}, {13, 33}}
In[]:= FromCharacterCode[Map[#[[2]] &, factored]]
Out[]= "Helo!"
1.1.2 Using composite number as a container (another example)
Let’ssay, theinitialcontainer numberis1. Let’sincrementthenumberatthesecondpositionwithinitbymultiplicating
by the first prime number (3):
Listing 1.11: Wolfram Mathematica
In[]:= tmp = 1*3
Out[]= 3
Then let’s set the number at fourth posistion to 123. The fourth prime number is 7 (the percent sign in Mathematica
denotes the last result):
Listing 1.12: Wolfram Mathematica
In[]:= tmp = tmp*7^123
Out[]= 26557071110804040505330743411815438275701018334410643480070773
5780279761186999642944265644421128096489029
Then let’s set the number at fifth position to 456. The fifth prime number is 11:
Listing 1.13: Wolfram Mathematica
In[]:= tmp = tmp*11^456
Out[]= 19917948660639605307938425372554395433764512138284060223646519
1257621966825293455339751080188144910510813322192288287162176499976800
9147068591160798243308591883294649069355015558472564457422829073938118
4396204999051856879940101934339913600942451006747982291524910653185084
4057972896537894301213735252418639782974592077393028390193060138936503
0125465578958567377627063815620261557939484036628536230966158222960762
8509899641574477547658142457598168497006309608599830554758951672484533
9216105863754463712957143732563375990198901073698626584903164027850451
8825659837114080589573087269
Then let’s decrement the number at fourth position, the fourth prime number is 7:
Listing 1.14: Wolfram Mathematica
In[]:= tmp = tmp/7
Out[]= 28454212372342293297054893389363422048235017340405800319495027
3225174238321847793342501543125921300729733317417554695945966428538287
0210097987372568919012274118992355813364307940675092082032612962768740
6280292855788366971343002763342733715632072866782831845035586647407263
4368532709339849001733907503455199689963702967704326271704371627052147
4
1607807969940810539467234022314659368484977195183623187094511747086804
0728428059392110782368774939425954995723299440856900792512788103549334
1737294091077805304224491046519108557427001533855180835575948611214931
260808548159154369939012467
Let’s factor the composite number and get all the numbers we set inside container (1, 122, 456):
Listing 1.15: Wolfram Mathematica
In[]:= FactorInteger[tmp]
Out[]= {{3, 1}, {7, 122}, {11, 456}}
So the resulting number has 3 · 7122 · 11456 form at the end.
This is somewhat wasteful way to store the numbers, but out of curiosity: since there are infinite number of prime
numbers and so any number of infinitely big numbers can be stored within one (although huge) composite number.
1.2 Coprime numbers
Coprime numbers are the 2 or more numbers which don’t have any common divisors. In mathematical lingo, the GCD1
of all coprime numbers is 1.
3 and 5 are coprimes. So are 7 and 10. So are 4, 5 and 9.
Coprime numbers are the numerator and denominator in fraction which cannot be reduced further (irreducible frac-
tion). For example, 130
14 is 65
7 after reduction (or simplification), 65 and 7 are coprime to each other, but 130 and 14 are
not (they has 2 as common divisor).
One application of coprime numbers in engineering is to make number of cogs on cogwheel and number of chain
elements on chain to be coprimes. Let’s imagine bike cogwheels and chain:
Figure 1.1: The picture was taken from www.mechanical-toys.com
Ifyouchoose 5 as number of cogson one of cogwhell andyou have10 or 15 or 20chain elements, eachcogon cogwheel
will meet some set of the same chain elements. For example, if there are 5 cogs on cogwheel and 20 chain elements,
each cog will meet only 4 chain elements and vice versa: each chain element has its own cog on cogwheel. This is bad
because both cogwheel and chain will run out slightly faster than if each cog would interlock every chain elements at
some point. To reach this, number of cogs and chain elements could be coprime numbers, like 5 and 11, or 5 and 23.
That will make each chain element interlock each cog evenly, which is better.
1
Greatest Common Divisor
5
1.3 Semiprime numbers
Semiprime is a product of two prime numbers.
An interesting property of semiprime:
In 1974 the Arecibo message was sent with a radio signal aimed at a star cluster
. It consisted of 1679 binary digits intended to be interpreted as a 23×73
bitmap image. The number 1679 = 23×73 was chosen because it is a semiprime
and therefore can only be broken down into 23 rows and 73 columns , or 73 rows
and 23 columns.
( https://en.wikipedia.org/wiki/Semiprime )
1.4 How RSA works
RSA public-key cryptosystem (named after its inventors: Ron Rivest, Adi Shamir and Leonard Adleman) is the most
used asymmetric public-private key cryptosystem.
1.4.1 Fermat little theorem
Fermatlittletheoremstatesthatifpisprime,thiscongruenceisvalidforanyaintheenvironmentofmoduloartihemtic
of base p:
ap−1 ≡ 1 (mod p).
There are proofs, which are, perhaps, too tricky for this article which is intended for beginners. So far, you can just
take it as granted.
This theorem may be used to sieve prime numbers. So you take, for example, 10 and test it. Let’s take some random a
value (123) (Wolfram Mathematica):
Listing 1.16: Wolfram Mathematica
In[]:= Mod[123^(10 - 1), 10]
Out[]= 3
We’ve got 3, which is not 1, indicating the 10 is not prime. On the other hand, 11 is prime:
Listing 1.17: Wolfram Mathematica
In[]:= Mod[123^(11 - 1), 11]
Out[]= 1
This method is not perfect, some composite p numbers can lead to 1, for example p=1105, but can be used as a method
to sieve vast amount of prime numbers candidates.
1.4.2 Euler’s totient function
It is a number of coprime numbers under some n. Denoted as ϕ(n), pronounced as phi. For the sake of simplification,
you may just keep in mind that if n = pq (i.e., product of two prime numbers), φ(pq) = (p − 1)(q − 1). This is true for
RSA environment.
6
1.4.3 Euler’s theorem
Euler’s theorem is a generalization of Fermat little theorem. It states:
aφ(n) ≡ 1 (mod n)
But again, for the sake of simplification, we may keep in mind that Euler’s theorem in the RSA environment is this:
a(p−1)(q−1) ≡ 1 (mod n)
... where n = pq and both p and q are prime numbers.
This theorem is central to RSA algorithm.
1.4.4 RSA example
There are The Sender and The Receiver. The Receiver generates two big prime numbers (p and q) and publishes its
product (n = pq). Both p and q are kept secret.
For the illustration, let’s randomly pick p and q among the first 50 prime numbers in Wolfram Mathematica:
Listing 1.18: Wolfram Mathematica
In[]:= p = Prime[RandomInteger[50]]
Out[]= 89
In[]:= q = Prime[RandomInteger[50]]
Out[]= 43
In[]:= n = p*q
Out[]= 3827
3827 is published as public key, named public key modulus or modulo. It is semiprime. There is also public key expo-
nent e, which is not secret, is often 65537, but we will use 17 to keep all results tiny.
Now The Sender wants to send a message (123 number) to The Receiver and he/she uses one-way function:
Listing 1.19: Wolfram Mathematica
In[]:= e = 17
Out[]= 17
In[]:= encrypted = Mod[123^e, n]
Out[]= 3060
3060 is encrypted message, which can be decrypted only using p and q values separately. This is one-way function, be-
cause only part of exponentiation result is left. One and important consequence is that even The Sender can’t decrypt
it. This is why you can encrypt a piece of text in PGP/GnuPG to someone using his/her public key, but can’t decrypt it.
Perhaps, that’s how CryptoLockers works, making impossible to decrypt the files.
To recover message (123), p and q values must be known.
First, we get the result of Euler’s totient function (p − 1)(q − 1) (this is the point where p and q values are needed):
Listing 1.20: Wolfram Mathematica
In[]:= totient = (p - 1)*(q - 1)
Out[]= 3696
7
Now we calculating decrypting exponent using multiplicative modulo inverse (multiplicative inverse was also de-
scribed in here (2) (e−1 (mod totient = (p − q)(q − 1))):
Listing 1.21: Wolfram Mathematica
In[]:= d = PowerMod[e, -1, totient]
Out[]= 2609
Now decrypt the message:
Listing 1.22: Wolfram Mathematica
In[18]:= Mod[encrypted^d, n]
Out[18]= 123
So the d exponent forms another one-way function, restoring the work of what was done during encryption.
1.4.5 So how it works?
It works, because e and d exponents are reciprocal to each other by modulo totient = (p − 1)(q − 1):
Listing 1.23: Wolfram Mathematica
In[]:= Mod[e*d, totient] (* check *)
Out[]= 1
This allows...
med
= m (mod n) (1.1)
Or in Mathematica:
Listing 1.24: Wolfram Mathematica
In[]:= Mod[123^(e*d), n]
Out[]= 123
So the encryption process is me (mod n), decryption is (me)d = m (mod n).
To prove congruence 1.1, we first should prove the following congruence:
ed ≡ 1 (mod ((p − 1)(q − 1)))
... using modular arithmetic rules, it can be rewritten as:
ed = 1 + h(p − 1)(q − 1)
h is some unknown number which is present here because it’s not known how many times the final result was rounded
while exponentiation (this is modulo arithmetic after all).
So med = m (mod n) can be rewritten as:
m1+h((p−1)(q−1)) ≡ m (mod n)
...and then to:
m m(p−1)(q−1) h
≡ m (mod n)
8
The last expression can be simplified using Euler’s theorem (stating that a(p−1)(q−1) ≡ 1 (mod n)). The result is:
m(1)h ≡ m (mod n)
... or just:
m ≡ m (mod n)
1.4.6 Breaking RSA
We can try to factor n semiprime (or RSA modulus) in Mathematica:
Listing 1.25: Wolfram Mathematica
In[]:= FactorInteger[n]
Out[]= {{43, 1}, {89, 1}}
And we getting correct p and q, but this is possible only for small values. When you use some big ones, factorizing is
extremely slow, making RSA unbreakable, if implemented correctly.
The bigger p, q and n numbers, the harder to factorize n, so the bigger keys in bits are, the harder it to break.
1.4.7 The difference between my simplified example and a real RSA algorithm
In my example, public key is n = pq (product) and secret key are p and q values stored separately. This is not very
efficient, to calculate totient and decrypting exponent each time. So in practice, a public key is n and e, and a secret
key is at least n and d, and d is stored in secret key precomputed.
For example, here is my PGP public key2:
dennis@...:∼$ gpg --export-options export-reset-subkey-passwd --export-secret -
subkeys 0x3B262349! | pgpdump
Old: Secret Key Packet(tag 5)(533 bytes)
Ver 4 - new
Public key creation time - Tue Jun 30 02:08:38 EEST 2015
Pub alg - RSA Encrypt or Sign(pub 1)
RSA n(4096 bits) - ...
RSA e(17 bits) - ...
...
... so there are available openly big (4096 bits) n and e (17 bits).
And here is my PGP secret key:
dennis@...:∼$ gpg --export-options export-reset-subkey-passwd --export-secret -
subkeys 0x55B5C64F! | pgpdump
gpg: about to export an unprotected subkey
You need a passphrase to unlock the secret key for
user: "Dennis Yurichev <dennis@yurichev.com>"
4096-bit RSA key, ID 55B5C64F, created 2015-06-29
gpg: gpg-agent is not available in this session
Old: Secret Key Packet(tag 5)(533 bytes)
2
https://yurichev.com/pgp.html
9
Ver 4 - new
Public key creation time - Tue Jun 30 02:08:38 EEST 2015
Pub alg - RSA Encrypt or Sign(pub 1)
RSA n(4096 bits) - ...
RSA e(17 bits) - ...
...
Old: Secret Subkey Packet(tag 7)(1816 bytes)
Ver 4 - new
Public key creation time - Tue Jun 30 02:08:38 EEST 2015
Pub alg - RSA Encrypt or Sign(pub 1)
RSA n(4096 bits) - ...
RSA e(17 bits) - ...
RSA d(4093 bits) - ...
RSA p(2048 bits) - ...
RSA q(2048 bits) - ...
RSA u(2048 bits) - ...
Checksum - 94 53
...
... it has all variables stored in the file, including d, p and q.
1.4.8 The RSA signature
It’s possible to sign a message to publish it, so everyone can check the signature. For example, The Publisher wants to
sign the message (let’s say, 456). Then he/she uses d exponent to compute signature:
Listing 1.26: Wolfram Mathematica
In[]:= signed = Mod[456^d, n]
Out[]= 2282
Now he publishes n = pq (3827), e (17 in our example), the message (456) and the signature (2282). Some other
Consumers can verify its signature using e exponent and n:
Listing 1.27: Wolfram Mathematica
In[]:= Mod[2282^e, n]
Out[]= 456
... this is another illustration that e and d exponents are complement each other, by modulo totient = (p − 1)(q − 1).
The signature can only be generated with the access to p and q values, but it can be verified using product (n = pq)
value.
1.4.9 Hybrid cryptosystem
RSA is slow, because exponentiation is slow and exponentiation by modulo is also slow. Perhaps, this is the reason
why it was considered impractical by GCHQ when Clifford Cocks first came with this idea in 1970s. So in practice, if
The Sender wants to encrypt some big piece of data to The Receiver, a random number is generated, which is used as
a key for symmetrical cryptosystem like DES or AES. The piece of data is encrypted by the random key. The key is then
encrypted by RSA to The Receiver and destroyed. The Receiver recovers the random key using RSA and decrypts all the
data using DES or AES.
10
Chapter 2
Modulo arithmetics
2.1 Quick introduction into modular arithmetic
Modular arithmetic is an environment where all values are limited by some number (modulo). Many textbooks has
clock as example. Let’s imagine old mechanical analog clock. There hour hand points to one of number in bounds of
0..11 (zero is usually shown as 12). What hour will be if to sum up 10 hours (no matter, AM or PM) and 4 hours? 10+4 is
14 or 2 by modulo 12. Naively you can just sum up numbers and subtract modulo base (12) as long as it’s possible.
Modern digital watch shows time in 24 hours format, so hour there is a variable in modulo base 24. But minutes and
seconds are in modulo 60 (let’s forget about leap seconds for now).
Another example is US imperial system of measurement: human height is measured in feets and inches. There are 12
inches in feet, so when you sum up some lengths, you increase feet variable each time you’ve got more than 12 inches.
Another example I would recall is password cracking utilities. Often, characters set is defined in such utilities. And
when you set all Latin characters plus numbers, you’ve got 26+10=36 characters in total. If you brute-forcing a 6-
characters password, you’ve got 6 variables, each is limited by 36. And when you increase last variable, it happens in
modular arithmetic rules: if you got 36, set last variable to 0 and increase penultimate one. If it’s also 36, do the same.
If the very first variable is 36, then stop. Modular arithmetic may be very helpful when you write multi-threading (or
distributed) password cracking utility and you need to slice all passwords space by even parts.
This is yet another application of modulo arithmetic, which many of us encountered in childhood.
Given a counting rhyme, like:
Eeny, meeny, miny, moe,
Catch a tiger by the toe.
If he hollers , let him go,
Eeny, meeny, miny, moe.
( https://en.wikipedia.org/wiki/Eeny,_meeny,_miny,_moe )
... predict, who will be choosen.
If I’m correct, that rhyme has 16 items, and if a group of kids constist of, say, 5 kids, who will be choosen? 16 mod 5 = 1,
meaning, the next kid after the one at whom counting had begun.
Or 7 kids, 16 mod 7 = 2. Count two kids after the first one.
If you can calculate this quickly, you can probably get an advantage by choosing a better place within a circle...
11
Now let’s recall old mechanical counters which were widespread in pre-digital era:
Figure 2.1: The picture was stolen from http://www.featurepics.com/ — sorry for that!
This counter has 6 wheels, so it can count from 0 to 106 − 1 or 999999. When you have 999999 and you increase the
counter, it will resetting to 000000— this situation is usually understood by engineers and computer programmers as
overflow. And if you have 000000 and you decrease it, the counter will show you 999999. This situation is often called
“wrap around”. See also: http://en.wikipedia.org/wiki/Integer_overflow.
2.1.1 Modular arithmetic on CPUs
The reason I talk about mechanical counter is that CPU registers acting in the very same way, because this is, perhaps,
simplest possible and efficient way to compute using integer numbers.
This implies that almost all operations on integer values on your CPU is happens by modulo 232 or 264 depending on
your CPU. For example, you can sum up 0x87654321 and 0xDEADBABA, which resulting in 0x16612FDDB. This value is
too big for 32-bit register, so only 0x6612FDDB is stored, and leading 1 is dropped. If you will multiply these two num-
bers, the actual result it 0x75C5B266EDA5BFFA, which is also too big, so only low 32-bit part is stored into destination
register: 0xEDA5BFFA. This is what happens when you multiply numbers in plain C/C++ language, but some readers
may argue: when sum is too big for register, CF (carry flag) is set, and it can be used after. And there is x86 MUL instruc-
tion which in fact produces 64-bit result in 32-bit environment (in EDX:EAX registers pair). That’s true, but observing
just 32-bit registers, this is exactly environment of modulo with base 232.
Now that leads to a surprising consequence: almost every result of arithmetic operation stored in general purpose reg-
ister of 32-bit CPU is in fact remainder of division operation: result is always divided by 232 and remainder is left in reg-
ister. For example, 0x16612FDDB is too large for storage, and it’s divided by 232 (or 0x100000000). The result of division
(quotient) is 1 (which is dropped) and remainder is 0x6612FDDB (which is stored as a result). 0x75C5B266EDA5BFFA
divided by 232 (0x100000000) produces 0x75C5B266 as a result of division (quotient) and 0xEDA5BFFA as a remainder,
the latter is stored.
And if your code is 32-bit one in 64-bit environment, CPU registers are bigger, so the whole result can be stored there,
but high half is hidden behind the scenes – because no 32-bit code can access it.
12
By the way, this is the reason why remainder calculation is often called ”division by modulo”. C/C++ has a percent sign
(“%”) for this operation, but some other PLs like Pascal and Haskell has ”mod” operator.
Usually, almost all sane computer programmers works with variables as they never wrapping around and value here is
always in some limits which are defined preliminarily. However, this implicit division operation or ”wrapping around”
can be exploited usefully.
2.1.2 Remainder of division by modulo 2n
... canbeeasilycomputedwithANDoperation. Ifyouneedarandomnumberinrangeof0..16, hereyougo: rand()&0xF.
That helps sometimes.</p>
<p>For example, you need a some kind of wrapping counter variable which always should be in 0..16 range. What you
do? Programmers often write this:
int counter=0;
...
counter++;
if (counter==16)
counter=0;
But here is a version without conditional branching:
int counter=0;
...
counter++;
counter=counter&0xF;
As an example, this I found in the git source code:
char *sha1_to_hex(const unsigned char *sha1)
{
static int bufno;
static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
static const char hex[] = "0123456789abcdef";
char *buffer = hexbuffer[3 & ++bufno], *buf = buffer;
int i;
for (i = 0; i < GIT_SHA1_RAWSZ; i++) {
unsigned int val = *sha1++;
*buf++ = hex[val >> 4];
*buf++ = hex[val & 0xf];
}
*buf = '0';
return buffer;
}
( https://github.com/git/git/blob/aa1c6fdf478c023180e5ca5f1658b00a72592dc6/hex.c )
This function returns a pointer to the string containing hexadecimal representation of SHA1 digest
(like ”4e1243bd22c66e76c2ba9eddc1f91394e57f9f83”). But this is plain C and you can calculate SHA1 for some block,
get pointer to the string, then calculate SHA1 for another block, get pointer to the string, and both pointers are still
13
points to the same string buffer containing the result of the second calculation. As a solution, it’s possible to allo-
cate/deallocate string buffer each time, but more hackish way is to have several buffers (4 are here) and fill the next
each time. The bufno variable here is a buffer counter in 0..3 range. Its value increments each time, and its value is
also always kept in limits by AND operation (3 & ++bufno).
The author of this piece of code (seemingly Linus Torvalds himself) went even further and forgot (?) to initialize bufno
counter variable, which will have randomgarbageat the functionstart. Indeed: no matter, which buffer we are starting
each time! This can be mistake which isn’t affect correctness of the code, or maybe this is left so intentionally – I don’t
know.
2.1.3 Getting random numbers
When you write some kind of videogame, you need random numbers, and the standard C/C++ rand() function gives
you them in 0..0x7FFF range (MSVC) or in 0..0x7FFFFFFF range (GCC). And when you need a random number in 0..10
range, the common way to do it is:
X_coord_of_something_spawned_somewhere=rand() % 10;
Y_coord_of_something_spawned_somewhere=rand() % 10;
No matter what compiler do you use, you can think about it as 10 is subtraced from rand() result, as long as there is
still a number bigger than 10. Hence, result is remainder of division of rand() result by 10.
One nasty consequence is that neither 0x8000 nor 0x80000000 cannot be divided by 10 evenly, so you’ll get some
numbers slightly more often than others.
I tried to calculate in Mathematica. Here is what you get if you write <i>rand()
In[]:= Counts[Map[Mod[#, 3] &, Range[0, 16^^8000 - 1]]]
Out[]= <|0 -> 10923, 1 -> 10923, 2 -> 10922|>
So a number 2 appers slightly seldom than others.
Here is a result for rand() % 10:
In[]:= Counts[Map[Mod[#, 10] &, Range[0, 16^^8000 - 1]]]
Out[]= <|0 -> 3277, 1 -> 3277, 2 -> 3277, 3 -> 3277, 4 -> 3277,
5 -> 3277, 6 -> 3277, 7 -> 3277, 8 -> 3276, 9 -> 3276|>
Numbers 8 and 9 appears slightly seldom (3276 against 3277).
Here is a result for rand() % 100:
In[]:= Counts[Map[Mod[#, 100] &, Range[0, 16^^8000 - 1]]]
Out[]= <|0 -> 328, 1 -> 328, 2 -> 328, 3 -> 328, 4 -> 328, 5 -> 328,
6 -> 328, 7 -> 328, 8 -> 328, 9 -> 328, 10 -> 328, 11 -> 328,
12 -> 328, 13 -> 328, 14 -> 328, 15 -> 328, 16 -> 328, 17 -> 328,
18 -> 328, 19 -> 328, 20 -> 328, 21 -> 328, 22 -> 328, 23 -> 328,
24 -> 328, 25 -> 328, 26 -> 328, 27 -> 328, 28 -> 328, 29 -> 328,
30 -> 328, 31 -> 328, 32 -> 328, 33 -> 328, 34 -> 328, 35 -> 328,
36 -> 328, 37 -> 328, 38 -> 328, 39 -> 328, 40 -> 328, 41 -> 328,
42 -> 328, 43 -> 328, 44 -> 328, 45 -> 328, 46 -> 328, 47 -> 328,
48 -> 328, 49 -> 328, 50 -> 328, 51 -> 328, 52 -> 328, 53 -> 328,
54 -> 328, 55 -> 328, 56 -> 328, 57 -> 328, 58 -> 328, 59 -> 328,
60 -> 328, 61 -> 328, 62 -> 328, 63 -> 328, 64 -> 328, 65 -> 328,
14
66 -> 328, 67 -> 328, 68 -> 327, 69 -> 327, 70 -> 327, 71 -> 327,
72 -> 327, 73 -> 327, 74 -> 327, 75 -> 327, 76 -> 327, 77 -> 327,
78 -> 327, 79 -> 327, 80 -> 327, 81 -> 327, 82 -> 327, 83 -> 327,
84 -> 327, 85 -> 327, 86 -> 327, 87 -> 327, 88 -> 327, 89 -> 327,
90 -> 327, 91 -> 327, 92 -> 327, 93 -> 327, 94 -> 327, 95 -> 327,
96 -> 327, 97 -> 327, 98 -> 327, 99 -> 327|>
…now larger part of numbers happens slightly seldom, these are 68...99.
This is sometimes called modulo bias. It’s perhaps acceptable for videogames, but may be critical for scientific simu-
lations, including Monte Carlo method.
Constructing a PRNG1 with uniform distribution may be tricky, there are couple of methods:
http://www.reddit.com/r/algorithms/comments/39tire/using_a_01_generator_generate_a_random_number/,
http://www.prismmodelchecker.org/casestudies/dice.php.
2.2 Modulo inverse, part I
Example: this piece of code divides by 17:
#include <stdio.h>
#include <stdint.h>
uint32_t div17 (uint32_t a)
{
return a*0xf0f0f0f1;
};
int main()
{
printf ("%dn", div17(1700)); // result=100
printf ("%dn", div17(34)); // result=2
printf ("%dn", div17(2091)); // result=123
};
How it works?
Let’s imagine, we work on 4-bit CPU, it has 4-bit registers, each can hold a value in 0..15 range.
Now we want to divide by 3 using multiplication. Let’s find modulo inverse of 3 using Wolfram Mathematica:
In[]:= PowerMod[3, -1, 16]
Out[]= 11
This is in fact solution of a 3m = 16k + 1 equation (16 = 24):
In[]:= FindInstance[3 m == 16 k + 1, {m, k}, Integers]
Out[]= {{m -> 11, k -> 2}}
1
Pseudorandom number generator
15
The ”magic number” for division by 3 is 11. Multiply by 11 instead of dividing by 3 and you’ll get a result (quotient).
This works, let’s divide 6 by 3. We can now do this by multiplying 6 by 11, this is 66=0x42, but on 4-bit register, only 0x2
will be left in register (0x42 ≡ 2 mod 24). Yes, 2 is correct answer, 6/3=2.
Let’s divide 3, 6 and 9 by 3, by multiplying by 11 (m).
|123456789 abcdef0 | 1 2 . . . f0 |123456789 abcdef0 | 1 2 . . . f0 |123456789 abcdef0 | 1 2 . . . f0 |123456789 abcdef0 |
m=11 |*********** | . . . | | . . . | | . . . | |
3/3 3m=33 |****************|** . . . **|* | . . . | | . . . | |
6/3 6m=66 |****************|** . . . **|****************|** . . . **|** | . . . | |
9/3 9m=99 |****************|** . . . **|****************|** . . . **|****************|** . . . **|*** |
A “protruding” asterisk(s) (“*”) in the last non-empty chunk is what will be left in 4-bit register. This is 1 in case of 33, 2
if 66, 3 if 99.
In fact, this ”protrusion” is defined by 1 in the equation we’ve solved. Let’s replace 1 with 2:
In[]:= FindInstance[3 m == 16 k + 2, {m, k}, Integers]
Out[]= {{m -> 6, k -> 1}}
Now the new ”magic number” is 6. Let’s divide 3 by 3. 3*6=18=0x12, 2 will be left in 4-bit register. This is incorrect, we
have 2 instead of 1. 2 asterisks are ”protruding”. Let’s divide 6 by 3. 6*6=36=0x24, 4 will be left in the register. This is
also incorrect, we now have 4 ”protruding” asterisks instead of correct 2.
Replace 1 in the equation by 0, and nothing will ”protrude”.
2.2.1 No remainder?
Now the problem: this only works for dividends in 3x form, i.e., which can be divided by 3 with no remainder. Try to
divide 4 by 3, 4*11=44=0x2c, 12 will be left in register, this is incorrect. The correct quotient is 1.
We can also notice that the 4-bit register is ”overflown” during multiplication twice as much as in ”incorrect” result in
low 4 bits.
Here is what we can do: use only high 4 bits and drop low 4 bits. 4*11=0x2c and 2 is high 4 bits. Divide 2 by 2, this is 1.
Let’s ”divide” 8 by 3. 8*11=88=0x58. 5/2=2, this is correct answer again.
Now this is the formula we can use on our 4-bit CPU to divide numbers by 3: ”x*3 » 4 / 2” or ”x*3 » 5”. This is the same
as almost all modern compilers do instead of integer division, but they do this for 32-bit and 64-bit registers.
2.3 Modulo inverse, part II
A very simple function:
int f(int a)
{
return a/9;
};
If compiled by non-optimizing GCC 4.4.1...
Listing 2.1: Non-optimizing GCC 4.4.1
public f
f proc near
16
arg_0 = dword ptr 8
push ebp
mov ebp, esp
mov ecx, [ebp+arg_0]
mov edx, 954437177 ; 38E38E39h
mov eax, ecx
imul edx
sar edx, 1
mov eax, ecx
sar eax, 1Fh
mov ecx, edx
sub ecx, eax
mov eax, ecx
pop ebp
retn
f endp
And it can be rewritten back to pure C like that:
#include <stdint.h>
uint32_t divide_by_9 (uint32_t a)
{
return ((uint64_t)a * (uint64_t)954437177) >> 33; // 954437177 = 0x38e38e39
};
This function still works, without division operation. How?
From school-level mathematics, we can remember that division by 9 can be replaced by multiplication by 1
9. In fact,
sometimes compilers do so for floating-point arithmetics, for example, FDIV instruction in x86 code can be replaced
by FMUL. At least MSVC 6.0 will replace division by 9 by multiplication by 0.111111... and sometimes it’s hard to be
sure, what operation was in the original source code.
But when we operate over integer values and integer CPU registers, we can’t use fractions. However, we can rework
fraction like that:
result = x
9 = x · 1
9 = x · 1·MagicNumber
9·MagicNumber
Given the fact that division by 2n is very fast (using shifts), we now should find that MagicNumber, for which the
following equation will be true: 2n = 9 · MagicNumber.
Division by 232 is somewhat hidden: lower 32-bit of product in EAX is not used (dropped), only higher 32-bit of product
(in EDX) is used and then shifted by additional 1 bit.
In other words, the assembly code we have just seen multiplicates by
954437177
232+1 , or divides by
232+1
954437177. To find a
divisor we just have to divide numerator by denominator. Using Wolfram Alpha, we can get 8.99999999.... as result
(which is close to 9).
Many people miss “hidden” division by 232 or 264, when lower 32-bit part (or 64-bit part) of product is not used. This
is why division by multiplication is difficult to understand at the beginning.
17
2.4 Reversible linear congruential generator
LCG2 PRNG is very simple: just multiply seed by some value, add another one and here is a new random number. Here
is how it is implemented in MSVC (the source code is not original one and is reconstructed by me):</p>
uint32_t state;
uint32_t rand()
{
state=state*214013+2531011;
return (state >>16)&0x7FFF;
};
The last bit shift is attempt to compensate LCG weakness and we may ignore it so far. Will it be possible to make an
inverse function to rand(), which can reverse state back? First, let’s try to think, what would make this possible? Well,
if state internal variable would be some kind of BigInt or BigNum container which can store infinitely big numbers,
then, although state is increasing rapidly, it would be possible to reverse the process. But state isn’t BigInt/BigNum,
it’s 32-bit variable, and summing operation is easily reversible on it (just subtract 2531011 at each step). As we may
know now, multiplication is also reversible: just multiply the state by modular multiplicative inverse of 214013!
#include <stdio.h>
#include <stdint.h>
uint32_t state;
void next_state()
{
state=state*214013+2531011;
};
void prev_state()
{
state=state -2531011; // reverse summing operation
state=state*3115528533; // reverse multiply operation. 3115528533 is modular
inverse of 214013 in 232.
};
int main()[style=customc]
{
state=12345;
printf ("state=%dn", state);
next_state();
printf ("state=%dn", state);
next_state();
printf ("state=%dn", state);
next_state();
printf ("state=%dn", state);
prev_state();
2
Linear congruential generator
18
printf ("state=%dn", state);
prev_state();
printf ("state=%dn", state);
prev_state();
printf ("state=%dn", state);
};
Wow, that works!
state=12345
state=-1650445800
state=1255958651
state=-456978094
state=1255958651
state=-1650445800
state=12345
It’s hard to find a real-world application of reversible LCG, but this could be the one: a media player with forward/back-
ward buttons. Once shuffle is clicked, random number is generated (number of item to be played). User clicks forward:
get a new random number by calculating the next state. User clicks backward: get it by calculating the previous state.
Thus, ausercouldnavigatethroughsome”virtual”(butconsistent)playlist, whichisevennotpresentinmediaplayer’s
memory!
2.5 Getting magic number using extended Euclidean algorithm
Extended Euclidean algorithm can find x/y for given a/b in the diophantine equation: ax + by = gcd(a, b).
x/y are also known as “Bézout coefficients”.
However, since a/b are coprime to each other, gcd(a, b) = 1, and the algorithm will find x/y for the ax + by = 1
equation.
Let’s plug a = 3 and b = 216 (like if we finding magic constant for 16-bit CPU):
#include <assert.h>
#include <stdio.h>
#include <stdint.h>
// copypasted and reworked from https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm
void EGCD (int a, int b)
{
int s=0; int old_s=1;
int t=1; int old_t=0;
int r=b; int old_r=a;
int tmp;
while (r!=0)
{
int quotient=old_r/r;
tmp=r; r=old_r - quotient*r; old_r=tmp;
tmp=s; s=old_s - quotient*s; old_s=tmp;
tmp=t; t=old_t - quotient*t; old_t=tmp;
19
};
printf ("GCD: %dn", old_r);
if (old_r!=1)
{
printf("%d and %d are not coprimes!n", a, b);
return;
};
printf ("Bézout coefficients: %d %dn", old_s, old_t);
printf ("quotients by the GCD (s,t): %d %dn", s, t);
// see also: https://math.stackexchange.com/q/1310415
if (old_s >=0 && old_t <=0)
printf ("corrected coefficients: %d(0x%x) %d(0x%x)n", old_s,
old_s, -old_t, -old_t);
else if (old_s <=0 && old_t >=0)
printf ("corrected coefficients: %d(0x%x) %d(0x%x)n", old_s+b,
old_s+b, -old_t+a, -old_t+a);
else
assert(0);
};
void main()
{
EGCD(3, 0x10000);
};
GCD: 1
Bézout coefficients: -21845 1
quotients by the GCD (s,t): 65536 -3
corrected coefficients: 43691(0xaaab) 2(0x2)
That means, x=-21845, y=1. Is it correct? −21845 ∗ 3 + 65536 ∗ 1 = 1
65536-21845=0xaaab, and this is the magic number for division by 3 on 16-bit CPU.
GCD(any_odd_number, 2n) = 1 for any values. Hence, we can find a magic number for any even number. But, this
is not true for even numbers. We can’t find magic coefficient for even number like 10. But you can find it for 5, and
then add additional division by 2 using shift right.
If you try to compile x
10 code, GCC can do it like:
push %ebp
mov %esp,%ebp
mov 0x8(%ebp),%eax
mov $0xcccccccd ,%edx
mul %edx
mov %edx,%eax
shr $0x3,%eax
pop %ebp
ret
20
This is in fact the magic number for division by 5. And there is 3 instead of 2 in the SHR instruction, so the result is
divided by 2.
Extended Euclidean algorithm is probably an efficient way of finding magic number, but needless to say, this equation
can be solved using other ways. In “SAT/SMT by Example”3you can find a method of finding ”magic number” using
SMT-solver.
3
https://sat-smt.codes/
21
22
Chapter 3
Probability
3.1 Text strings right in the middle of compressed data
You can download Linux kernels and find English words right in the middle of compressed data:
% wget https://www.kernel.org/pub/linux/kernel/v4.x/linux -4.10.2.tar.gz
% xxd -g 1 -seek 0x515c550 -l 0x30 linux -4.10.2.tar.gz
0515c550: c5 59 43 cf 41 27 85 54 35 4a 57 90 73 89 b7 6a .YC.A'.T5JW.s..j
0515c560: 15 af 03 db 20 df 6a 51 f9 56 49 52 55 53 3d da .... .jQ.VIRUS=.
0515c570: 0e b9 29 24 cc 6a 38 e2 78 66 09 33 72 aa 88 df ..)$.j8.xf.3r...
% wget https://cdn.kernel.org/pub/linux/kernel/v2.3/linux -2.3.3.tar.bz2
% xxd -g 1 -seek 0xa93086 -l 0x30 linux -2.3.3.tar.bz2
00a93086: 4d 45 54 41 4c cd 44 45 2d 2c 41 41 54 94 8b a1 METAL.DE-,AAT...
00a93096: 5d 2b d8 d0 bd d8 06 91 74 ab 41 a0 0a 8a 94 68 ]+......t.A....h
00a930a6: 66 56 86 81 68 0d 0e 25 6b b6 80 a4 28 1a 00 a4 fV..h..%k...(...
One of Linux kernel patches in compressed form has the “Linux” word itself:
% wget https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/patch -4.6-rc4.gz
% xxd -g 1 -seek 0x4d03f -l 0x30 patch -4.6-rc4.gz
0004d03f: c7 40 24 bd ae ef ee 03 2c 95 dc 65 eb 31 d3 f1 .@$.....,..e.1..
0004d04f: 4c 69 6e 75 78 f2 f3 70 3c 3a bd 3e bd f8 59 7e Linux..p<:.>..Y∼
0004d05f: cd 76 55 74 2b cb d5 af 7a 35 56 d7 5e 07 5a 67 .vUt+...z5V.^.Zg
Other English words I’ve found in other compressed Linux kernel trees:
linux -4.6.2.tar.gz: [maybe] at 0x68e78ec
linux -4.10.14.tar.xz: [OCEAN] at 0x6bf0a8
linux -4.7.8.tar.gz: [FUNNY] at 0x29e6e20
linux -4.6.4.tar.gz: [DRINK] at 0x68dc314
23
linux -2.6.11.8.tar.bz2: [LUCKY] at 0x1ab5be7
linux -3.0.68.tar.gz: [BOOST] at 0x11238c7
linux -3.0.16.tar.bz2: [APPLE] at 0x34c091
linux -3.0.26.tar.xz: [magic] at 0x296f7d9
linux -3.11.8.tar.bz2: [TRUTH] at 0xf635ba
linux -3.10.11.tar.bz2: [logic] at 0x4a7f794
There is a nice illustration of apophenia and pareidolia (human’s mind ability to see faces in clouds, etc) in Lurkmore,
Russian counterpart of Encyclopedia Dramatica. As they wrote in the article about electronic voice phenomenon1,
you can open any long enough compressed file in hex editor and find well-known 3-letter Russian obscene word, and
you’ll find it a lot: but that means nothing, just a mere coincidence.
AndIwasinterestedincalculation, howbigcompressedfilemustbetocontainallpossible3-letter, 4-letter, etc, words?
In my naive calculations, I’ve got this: probability of the first specific byte in the middle of compressed data stream
with maximal entropy is 1
256, probability of the 2nd is also 1
256, and probability of specific byte pair is 1
256·256 = 1
2562 .
Probabilty of specific triple is 1
2563 . If the file has maximal entropy (which is almost unachievable, but …) and we live
in an ideal world, you’ve got to have a file of size just 2563 = 16777216, which is 16-17MB. You can check: get any
compressed file, and use rafind2 to search for any 3-letter word (not just that Russian obscene one).
It took ≈ 8-9 GB of my downloaded movies/TV series files to find the word “beer” in them (case sensitive). Perhaps,
these movies wasn’t compressed good enough? This is also true for a well-known 4-letter English obscene word.
My approach is naive, so I googled for mathematically grounded one, and have find this question: “Time until a con-
secutive sequence of ones in a random bit sequence” 2. The answer is: (p−n−1)/(1−p), where p is probability of each
event and n is number of consecutive events. Plug 1
256 and 3 and you’ll get almost the same as my naive calculations.
So any 3-letter word can be found in the compressed file (with ideal entropy) of length 2563 =≈ 17MB, any 4-letter
word — 2564 = 4.7GB (size of DVD). Any 5-letter word — 2565 =≈ 1TB.
For the piece of text you are reading now, I mirrored the whole kernel.org website (hopefully, sysadmins can forgive
me), and it has ≈ 430GB of compressed Linux Kernel source trees. It has enough compressed data to contain these
words, however, I cheated a bit: I searched for both lowercase and uppercase strings, thus compressed data set I need
is almost halved.
This is quite interesting thing to think about: 1TB of compressed data with maximal entropy has all possible 5-byte
chains, but the data is encoded not in chains itself, but in the order of chains (no matter of compression algorithm,
etc).
Now the information for gamblers: one should throw a dice ≈ 42 times to get a pair of six, but no one will tell you,
when exactly this will happen. I don’t remember, how many times coin was tossed in the “Rosencrantz & Guildenstern
Are Dead” movie, but one should toss it ≈ 2048 times and at some point, you’ll get 10 heads in a row, and at some
other point, 10 tails in a row. Again, no one will tell you, when exactly this will happen.
Compressed data can also be treated as a stream of random data, so we can use the same mathematics to determine
probabilities, etc.
If you can live with strings of mixed case, like “bEeR”, probabilities and compressed data sets are much lower: 1283 =
2MB for all 3-letter words of mixed case, 1284 = 268MB for all 4-letter words, 1285 = 34GB for all 5-letter words,
etc.
Moral of the story: whenever you search for some patterns, you can find it in the middle of compressed blob, but that
means nothing else then coincidence. In philosophical sense, this is a case of selection/confirmation bias: you find
1
http://archive.is/gYnFL
2
http://math.stackexchange.com/questions/27989/time-until-a-consecutive-sequence-of-ones-in-a-random-bit-sequence/
27991#27991
24
what you search for in “The Library of Babel” 3.
3.2 Autocomplete using Markov chains
TL;DR: collect statistics, for a given natural language, what words come most often after a word/pair of words/triplet
of words.
What are most chess moves played after 1.e4 e5 2.Nf3 Nf6? A big database of chess games can be queried, showing
statistics:
( from https://chess-db.com/ )
Statistics shown is just number of games, where a corresponding 3rd move was played.
The same database can be made for natural language words.
3.2.1 Dissociated press
Thisisawell-knownjoke: https://en.wikipedia.org/wiki/Dissociated_press,https://en.wikipedia.org/
wiki/SCIgen, https://en.wikipedia.org/wiki/Mark_V._Shaney.
I wrote a Python script, took Sherlock Holmes stories from Gutenberg library...
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
# python 3! due to random.choices()
import operator , random
from collections import defaultdict
3
A short story by Jorge Luis Borges
25
with open ("all.txt", "r") as myfile:
data=myfile.read()
sentences=data.lower().replace('r',' ').replace('n',' ').replace('?','.').
replace('!','.').replace('“','.').replace('”','.').replace(""",".").replace(
'‘',' ').replace('-',' ').replace('’',' ').replace(''',' ').split(".")
def remove_empty_words(l):
return list(filter(lambda a: a != '', l))
# key=list of words, as a string, delimited by space
# (I would use list of strings here as key, but list in not hashable)
# val=dict, k: next word; v: occurrences
first={}
second={}
def update_occ(d, seq, w):
if seq not in d:
d[seq]=defaultdict(int)
d[seq][w]=d[seq][w]+1
for s in sentences:
words=s.replace(',',' ').split(" ")
words=remove_empty_words(words)
if len(words)==0:
continue
for i in range(len(words)):
if i>=1:
update_occ(first, words[i-1], words[i])
if i>=2:
update_occ(second , words[i-2]+" "+words[i-1], words[i])
"""
print ("first table:")
for k in first:
print (k)
# https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary -by-
value
s=sorted(first[k].items(), key=operator.itemgetter(1), reverse=True)
print (s[:20])
print ("")
"""
"""
print ("second table:")
for k in second:
print (k)
# https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary -by-
value
s=sorted(second[k].items(), key=operator.itemgetter(1), reverse=True)
26
print (s[:20])
print ("")
"""
text=["it", "is"]
# https://docs.python.org/3/library/random.html#random.choice
def gen_random_from_tbl(t):
return random.choices(list(t.keys()), weights=list(t.values()))[0]
text_len=len(text)
# generate at most 100 words:
for i in range(200):
last_idx=text_len -1
tmp=text[last_idx -1]+" "+text[last_idx]
if tmp in second:
new_word=gen_random_from_tbl(second[tmp])
else:
# fall-back to 1st order
tmp2=text[last_idx]
if tmp2 not in first:
# dead-end
break
new_word=gen_random_from_tbl(first[tmp2])
text.append(new_word)
text_len=text_len+1
print (" ".join(text))
And here are some 1st-order Markov chains. First part is a first word. Second is a list of words + probabilities of appear-
ance of each one, in the Sherlock Holmes stories. However, probabilities are in form of words’ numbers.
In other word, how often each word appears after a word?
return
[('to', 28), ('or', 12), ('of', 8), ('the', 8), ('for', 5), ('with', 4), ('by',
4), ('journey ', 4), ('and', 3), ('when', 2), ('ticket', 2), ('from', 2), ('at
', 2), ('you', 2), ('i', 2), ('since', 1), ('on', 1), ('caused', 1), ('but',
1), ('it', 1)]
of
[('the', 3206), ('a', 680), ('his', 584), ('this', 338), ('it', 304), ('my',
295), ('course', 225), ('them', 167), ('that', 138), ('your', 137), ('our',
135), ('us', 122), ('her', 116), ('him', 111), ('an', 105), ('any', 102), ('
these', 92), ('which', 89), ('all', 82), ('those', 75)]
by
27
[('the', 532), ('a', 164), ('his', 67), ('my', 39), ('no', 34), ('this', 31), ('
an', 30), ('which', 29), ('your', 21), ('that', 19), ('one', 17), ('all', 17)
, ('jove', 16), ('some', 16), ('sir', 13), ('its', 13), ('him', 13), ('their
', 13), ('it', 11), ('her', 10)]
this
[('is', 177), ('agreement ', 96), ('morning ', 81), ('was', 61), ('work', 60), ('
man', 56), ('case', 43), ('time', 39), ('matter', 37), ('ebook', 36), ('way',
36), ('fellow', 28), ('room', 27), ('letter', 24), ('one', 24), ('i', 23),
('young', 21), ('very', 20), ('project ', 18), ('electronic ', 18)]
no
[('doubt', 179), ('one', 134), ('sir', 61), ('more', 56), ('i', 55), ('no', 42),
('other', 36), ('means', 36), ('sign', 34), ('harm', 23), ('reason', 22), ('
difficulty ', 22), ('use', 21), ('idea', 20), ('longer', 20), ('signs', 18),
('good', 17), ('great', 16), ('trace', 15), ('man', 15)]
Now some snippets from 2nd-order Markov chains.
the return
[('of', 8), ('track', 1), ('to', 1)]
return of
[('sherlock ', 6), ('lady', 1), ('the', 1)]
for the
[('use', 22), ('first', 12), ('purpose ', 9), ('sake', 8), ('rest', 7), ('best',
7), ('crime', 6), ('evening ', 6), ('ebooks', 6), ('limited ', 6), ('moment',
6), ('day', 5), ('future', 5), ('last', 5), ('world', 5), ('time', 5), ('loss
', 4), ('second', 4), ('night', 4), ('remainder ', 4)]
use of
[('the', 15), ('anyone', 12), ('it', 6), ('project ', 6), ('and', 6), ('having ',
2), ('this', 1), ('my', 1), ('a', 1), ('horses', 1), ('some', 1), ('arguing ',
1), ('troubling ', 1), ('artificial ', 1), ('invalids ', 1), ('cocaine ', 1), ('
disguises ', 1), ('an', 1), ('our', 1)]
you may
[('have', 17), ('copy', 13), ('be', 13), ('do', 9), ('choose', 7), ('use', 6),
('obtain', 6), ('convert ', 6), ('charge', 6), ('demand', 6), ('remember ', 5),
('find', 5), ('take', 4), ('think', 3), ('possibly ', 3), ('well', 3), ('know
', 3), ('not', 3), ('say', 3), ('imagine ', 3)]
the three
[('of', 8), ('men', 4), ('students ', 2), ('glasses ', 2), ('which', 1), ('strips
', 1), ('is', 1), ('he', 1), ('gentlemen ', 1), ('enterprising ', 1), ('massive
', 1), ('quarter ', 1), ('randalls ', 1), ('fugitives ', 1), ('animals ', 1), ('
shrieked ', 1), ('other', 1), ('murderers ', 1), ('fir', 1), ('at', 1)]
it was
28
[('a', 179), ('the', 78), ('not', 68), ('only', 40), ('in', 30), ('evident ', 28)
, ('that', 28), ('all', 25), ('to', 21), ('an', 18), ('my', 17), ('at', 17),
('impossible ', 17), ('indeed', 15), ('no', 15), ('quite', 15), ('he', 14), ('
of', 14), ('one', 12), ('on', 12)]
Now two words is a key in dictionary. And we see here an answer for the question ”how often each words appears after
a sequence of two words?”
Now let’s generate some rubbish:
it is just to the face of granite still cutting the lower part of the shame
which i would draw up so as i can t have any visitors if he is passionately
fond of a dull wrack was drifting slowly in our skin before long vanished in
front of him of this arch rebel lines the path which i had never set foot in
the same with the heads of these have been supplemented or more waiting to
tell me all that my assistant hardly knowing whether i had 4 pounds a month
however is foreign to the other hand were most admirable but because i know
why he broke into a curve by the crew had thought it might have been on a
different type they were about to set like a plucked chicken s making the
case which may help to play it at the door and called for the convenience of
a boat sir maybe i could not be made to draw some just inference that a woman
exactly corresponding to the question was how a miners camp had been dashed
savagely against the rock in front of the will was drawn across the golden
rays and it
it is the letter was composed of a tall stout official had come the other way
that she had been made in this i expect that we had news that the office of
the mud settling or the most minute exactness and astuteness represented as i
come to see the advantages which london then and would i could not speak
before the public domain ebooks in compliance with any particular paper
edition as to those letters come so horribly to his feet upon the wet clayey
soil but since your wife and of such damage or cannot be long before the
rounds come again whenever she might be useful to him so now my fine fellow
will trouble us again and again and making a little wizened man darted out of
the baskervilles *** produced by roger squires updated editions will be the
matter and let out forty three diamonds of the attack we carried him into the
back and white feather were but a terrible fellow he is not for your share
in an instant holmes clapped his hands and play with me anyhow i d ha known
you under that name in a considerable treasure was hid for no other
it is the unofficial force the shutter open but so far as to what i say to me
like that which is rather too far from at ease for i knew that we were
driving; but soon it came just as he opened it myself i was not a usual
signal between you and you thought it might be removed to a magistrate than
upon the luminous screen of the one word would he have wanted there are
business relations between him and we listened to his eyes to the spot as
soon as their master s affairs of life since she was but one day we hoped
would screen me from under his arm chair of shining red leather chair his
flask in his chair and gave me his name is sherlock holmes took each face of
a barmaid in bristol and marry her but learned from dr
29
it is selden the criminal or lunatic who had left a match might mar my career
had reached the tower of the crime was committed before twelve foolish
tradesmen in a bad lot though the authorities are excellent at amassing facts
though what its object might be a better nerve for the creature flapped and
struggled and writhed his hands in her bosom lady hilda was down on the table
and the curious will so suddenly upon the floor were now nearly five
thousand pounds will be impossible but i struck him down and roared into the
shadow of the thing seemed simplicity itself said i you seem most fortunate
in having every characteristic of the place is deserted up to keep some
clandestine appointment and found as i have no desire to settle this little
matter of fact of absolute ignorance and he with a similar pearl without any
light upon what he had found ourselves at the time and my heart sank for
barrymore at the close of november and holmes fears came to think over the
afghan campaign yet shown by this time at which he now and i have seen his
death this morning he walked straight into
By first look, these pieces of text are visually OK, but it is senseless. Some people (including me) find it amusing.
Spammers also use this technique to make email message visually similar to a meaningful text, albeit it is not mean-
ingful at all, rather absurdic and funny.
3.2.2 Autocomplete
It’s surprising how easy this can be turned into something rather practically useful.
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import operator
from collections import defaultdict
with open ("all.txt", "r") as myfile:
data=myfile.read()
sentences=data.lower().replace('r',' ').replace('n',' ').replace('?','.').
replace('!','.').replace('“','.').replace('”','.').replace(""",".").replace(
'‘',' ').replace('-',' ').replace('’',' ').replace(''',' ').split(".")
def remove_empty_words(l):
return list(filter(lambda a: a != '', l))
# key=list of words, as a string, delimited by space
# (I would use list of strings here as key, but list in not hashable)
# val=dict, k: next word; v: occurrences
first={}
second={}
third={}
def update_occ(d, seq, w):
if seq not in d:
d[seq]=defaultdict(int)
30
d[seq][w]=d[seq][w]+1
for s in sentences:
words=s.replace(',',' ').split(" ")
words=remove_empty_words(words)
if len(words)==0:
continue
for i in range(len(words)):
# only two words available:
if i>=1:
update_occ(first, words[i-1], words[i])
# three words available:
if i>=2:
update_occ(second , words[i-2]+" "+words[i-1], words[i])
# four words available:
if i>=3:
update_occ(third, words[i-3]+" "+words[i-2]+" "+words[i-1], words[i
])
"""
print ("third table:")
for k in third:
print (k)
# https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary -by-
value
s=sorted(third[k].items(), key=operator.itemgetter(1), reverse=True)
print (s[:20])
print ("")
"""
test="i can tell"
#test="who did this"
#test="she was a"
#test="he was a"
#test="i did not"
#test="all that she"
#test="have not been"
#test="who wanted to"
#test="he wanted to"
#test="wanted to do"
#test="it is just"
#test="you will find"
#test="you shall"
#test="proved to be"
test_words=test.split(" ")
test_len=len(test_words)
last_idx=test_len -1
31
def print_stat(t):
total=float(sum(t.values()))
# https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value
s=sorted(t.items(), key=operator.itemgetter(1), reverse=True)
# take 5 from each sorted table
for pair in s[:5]:
print ("%s %d%%" % (pair[0], (float(pair[1])/total)*100))
if test_len >=3:
tmp=test_words[last_idx -2]+" "+test_words[last_idx -1]+" "+test_words[
last_idx]
if tmp in third:
print ("* third order. for sequence:",tmp)
print_stat(third[tmp])
if test_len >=2:
tmp=test_words[last_idx -1]+" "+test_words[last_idx]
if tmp in second:
print ("* second order. for sequence:", tmp)
print_stat(second[tmp])
if test_len >=1:
tmp=test_words[last_idx]
if tmp in first:
print ("* first order. for word:", tmp)
print_stat(first[tmp])
print ("")
First, let’s also make 3rd-order Markov chains tables. There are some snippets from it:
the return of
[('sherlock ', 6), ('lady', 1), ('the', 1)]
the use of
[('the', 13), ('anyone', 12), ('project ', 6), ('having', 2), ('a', 1), ('horses
', 1), ('some', 1), ('troubling ', 1), ('artificial ', 1), ('invalids ', 1), ('
disguises ', 1), ('it', 1)]
of the second
[('stain', 5), ('floor', 3), ('page', 1), ('there', 1), ('person', 1), ('party',
1), ('day', 1)]
it was in
[('the', 9), ('vain', 5), ('a', 4), ('his', 2), ('83', 1), ('one', 1), ('motion
', 1), ('truth', 1), ('my', 1), ('this', 1), ('march', 1), ('january ', 1), ('
june', 1), ('me', 1)]
was in the
32
[('room', 3), ('habit', 3), ('house', 2), ('last', 2), ('loft', 2), ('spring ',
1), ('train', 1), ('bar', 1), ('power', 1), ('immediate ', 1), ('year', 1), ('
midst', 1), ('forenoon ', 1), ('centre', 1), ('papers', 1), ('best', 1), ('
darkest ', 1), ('prime', 1), ('hospital ', 1), ('nursery ', 1)]
murder of the
[('honourable ', 1), ('discoverer ', 1)]
particulars of the
[('crime', 1), ('inquest ', 1), ('habits ', 1), ('voyage', 1)]
the death of
[('the', 8), ('sir', 8), ('captain ', 3), ('their', 2), ('this', 2), ('his', 2),
('her', 2), ('dr', 2), ('sherlock ', 1), ('mrs', 1), ('a', 1), ('van', 1), ('
that', 1), ('drebber ', 1), ('mr', 1), ('stangerson ', 1), ('two', 1), ('selden
', 1), ('one', 1), ('wallenstein ', 1)]
one of the
[('most', 23), ('best', 3), ('main', 3), ('oldest', 2), ('greatest ', 2), ('
numerous ', 2), ('corners ', 2), ('largest ', 2), ('papers', 2), ('richest ', 2),
('servants ', 2), ('moor', 2), ('finest', 2), ('upper', 2), ('very', 2), ('
broad', 2), ('side', 2), ('highest ', 2), ('australian ', 1), ('great', 1)]
so far as
[('i', 17), ('to', 8), ('that', 2), ('the', 2), ('it', 2), ('was', 1), ('we', 1)
, ('they', 1), ('he', 1), ('you', 1), ('a', 1), ('your', 1), ('this', 1), ('
his', 1), ('she', 1)]
You see, they looks as more precise, but tables are just smaller. You can’t use them to generate rubbish. 1st-order
tables big, but ”less precise”.
And here I test some 3-words queries, like as if they inputted by user:
"i can tell"
* third order. for sequence: i can tell
you 66%
my 16%
them 16%
* second order. for sequence: can tell
you 23%
us 17%
me 11%
your 11%
this 5%
* first order. for word: tell
you 35%
me 25%
us 6%
him 5%
the 4%
33
"she was a"
* third order. for sequence: she was a
blonde 12%
child 6%
passenger 6%
free 6%
very 6%
* second order. for sequence: was a
very 4%
man 3%
small 3%
long 2%
little 2%
* first order. for word: a
very 2%
man 2%
little 2%
few 2%
small 1%
"he was a"
* third order. for sequence: he was a
man 11%
very 7%
young 3%
tall 3%
middle 2%
* second order. for sequence: was a
very 4%
man 3%
small 3%
long 2%
little 2%
* first order. for word: a
very 2%
man 2%
little 2%
few 2%
small 1%
"i did not"
* third order. for sequence: i did not
know 22%
say 9%
even 3%
tell 3%
34
mind 3%
* second order. for sequence: did not
know 11%
take 3%
wish 3%
go 2%
say 2%
* first order. for word: not
a 4%
be 4%
have 3%
been 3%
to 2%
"all that she"
* third order. for sequence: all that she
said 100%
* second order. for sequence: that she
had 22%
was 19%
would 7%
is 5%
has 5%
* first order. for word: she
was 12%
had 10%
is 5%
said 3%
would 3%
"have not been"
* third order. for sequence: have not been
able 25%
here 25%
employed 12%
shadowed 12%
personally 12%
* second order. for sequence: not been
for 9%
in 6%
there 6%
a 4%
slept 4%
* first order. for word: been
a 5%
in 4%
the 2%
so 2%
35
taken 1%
"who wanted to"
* third order. for sequence: who wanted to
see 100%
* second order. for sequence: wanted to
see 15%
know 15%
speak 10%
ask 10%
hide 5%
* first order. for word: to
the 11%
be 6%
me 4%
his 2%
my 2%
"he wanted to"
* third order. for sequence: he wanted to
know 50%
do 50%
* second order. for sequence: wanted to
see 15%
know 15%
speak 10%
ask 10%
hide 5%
* first order. for word: to
the 11%
be 6%
me 4%
his 2%
my 2%
"wanted to do"
* third order. for sequence: wanted to do
the 100%
* second order. for sequence: to do
with 23%
so 14%
it 10%
the 4%
and 3%
* first order. for word: do
you 24%
not 20%
36
with 6%
so 4%
it 4%
"it is just"
* third order. for sequence: it is just
possible 42%
as 33%
killing 4%
such 4%
in 4%
* second order. for sequence: is just
possible 25%
as 22%
the 8%
a 8%
to 5%
* first order. for word: just
as 13%
now 5%
a 4%
to 3%
the 2%
"you will find"
* third order. for sequence: you will find
that 20%
it 20%
me 8%
the 8%
a 6%
* second order. for sequence: will find
it 19%
that 17%
the 9%
me 7%
a 5%
* first order. for word: find
that 13%
the 10%
it 9%
out 8%
a 6%
"you shall"
* second order. for sequence: you shall
have 15%
37
see 12%
know 12%
hear 9%
not 9%
* first order. for word: shall
be 20%
have 7%
not 4%
see 3%
i 3%
"proved to be"
* third order. for sequence: proved to be
a 42%
the 14%
too 4%
something 4%
of 4%
* second order. for sequence: to be
a 11%
the 3%
in 3%
able 2%
so 2%
* first order. for word: be
a 7%
the 4%
in 2%
of 2%
no 2%
Perhaps, results from all 3 tables can be combined, with the data from 3rd order table used in highest priority (or
weight).
And this is it — this can be shown to user. Aside of Conan Doyle works, your software can collect user’s input to adapt
itself for user’s lexicon, slang, memes, etc. Of course, user’s ”tables” should be used with highest priority.
I have no idea, what Apple/Android devices using for hints generation, when user input text, but this is what I would
use as a first idea.
As a bonus, this can be used for language learners, to get the idea, how a word is used in a sentence.
3.2.3 Further work
Comma can be a separator as well, etc...
3.2.4 The files
... includingConanDoyle’sstories(2.5M).https://github.com/DennisYurichev/Math-for-programmers/tree/
master/prob/markov. Surely, any other texts can be used, in any language...
38
Another related post is about typos: https://yurichev.com/blog/fuzzy_string/.
3.2.5 Read more
Meaningful Random Headlines by Markov Chain
A discussion at hacker news
3.3 random.choices() in Python 3
This is a very useful function4: weights (or probabilities) can be added.
(I used it in my Markov chains example (3.2).)
For example:
#!/usr/bin/env python3
import random
for i in range(1000):
print (random.choices("123", [25, 60, 15]))
Let’s generate 1000 random numbers in 1..3 range:
$ python3 tst.py | sort | uniq -c
234 ['1']
613 ['2']
153 ['3']
”1” is generated in 25% of cases, ”2” in 60% and ”3” in 15%. Well, almost.
Here is another use of it.
You know, when you send an email, the final destination is a server somewhere. But it may be irresponsible. So
network engineers add additional servers, ”relays”, which can hold your email for some time.
For example, what is about gmail.com?
% dig gmail.com MX
...
;; ANSWER SECTION:
gmail.com. 3600 IN MX 5 gmail-smtp-in.l.google.com.
gmail.com. 3600 IN MX 10 alt1.gmail-smtp-in.l.google.
com.
gmail.com. 3600 IN MX 20 alt2.gmail-smtp-in.l.google.
com.
gmail.com. 3600 IN MX 30 alt3.gmail-smtp-in.l.google.
com.
gmail.com. 3600 IN MX 40 alt4.gmail-smtp-in.l.google.
com.
4
https://docs.python.org/3/library/random.html#random.choices
39
...
The first server is primary (marked with 5). Other 4 (alt...) are relays. They can hold emails for user@gmail.com if the
main server is down. Of course, relays also can be down. So an MTA (Message transfer agent) tries to send an email
via the first server in list, then via the second, etc. If all are down, MTA is waiting for some time (not infinitely).
See also: https://en.wikipedia.org/wiki/MX_record.
A number (5/10/20/30/40) is priority:
MX records contain a preference indication that MUST be used in
sorting if more than one such record appears (see below). Lower
numbers are more preferred than higher ones. If there are multiple
destinations with the same preference and there is no clear reason to
favor one (e.g., by recognition of an easily reached address), then
the sender-SMTP MUST randomize them to spread the load across
multiple mail exchangers for a specific organization.
( https://tools.ietf.org/html/rfc5321 )
NowifyouwantyourMTAbepolite, youcanmakeitpokerelayswithsomeprobability, unloadingthemainmailserver.
In any case, the internal network withing Google is way better than a link between you and any of these mail servers.
And it would be OK to drop an email to any of these mail servers listed in MX records.
This is how a destination server can be chosen:
random.choices(range(4), weights=[1/5, 1/10, 1/20, 1/40])
I’m using reciprocal weights (1/x) because the lower priority, the higher probability it is to be chosen.
What if I want to send 100 emails to someone@gmail.com?
>>> [random.choices(range(4), weights=[1/5, 1/10, 1/20, 1/40])[0] for x in range
(100)]
[1, 1, 2, 1, 0, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0, 0, 1, 1, 1, 0, 1, 0, 2, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 3, 0, 0,
2, 1, 0, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0,
1, 0, 2, 1, 0, 0, 2, 0, 0, 0, 3, 2, 0, 1, 2, 0, 1, 1, 3, 1, 1, 1, 1]
1000? (I’m using the collections.Counter5 here for gathering statistics).
>>> Counter([random.choices(range(4), weights=[1/5, 1/10, 1/20, 1/40])[0] for x
in range(1000)])
Counter({0: 535, 1: 268, 2: 129, 3: 68})
535 emails will be sent via the first (primary) mail server, 268/129/68 – via corresponding relays.
This is probably not how MTAs usually operates, but this is how it could be done.
5
https://docs.python.org/3/library/collections.html#collections.Counter
40
Chapter 4
Combinatorics
4.1 Soldering a headphones cable
This is a real story: I tried to repair my headphone’s cable, soldering 3 wires with minijack together. But which is left?
Right? I couldn’t even find ground wire. I asked myself, how many ways there are to solder 3 wires to 3-pin minijack?
I could try them all and pick the combination that sounds best.
With Python’s itertools module this is just:
import itertools
wires=["red", "green", "blue"]
for i in itertools.permutations(wires):
print i
('red', 'green', 'blue')
('red', 'blue', 'green ')
('green', 'red', 'blue')
('green', 'blue', 'red')
('blue', 'red', 'green ')
('blue', 'green', 'red')
(Just 6 ways.)
What if there are 4 wires?
import itertools
wires=["red", "green", "blue", "yellow"]
for i in itertools.permutations(wires):
print i
('red', 'green', 'blue', 'yellow ')
('red', 'green', 'yellow', 'blue')
('red', 'blue', 'green', 'yellow ')
('red', 'blue', 'yellow', 'green ')
41
('red', 'yellow', 'green', 'blue')
('red', 'yellow', 'blue', 'green ')
('green', 'red', 'blue', 'yellow ')
('green', 'red', 'yellow', 'blue')
('green', 'blue', 'red', 'yellow ')
('green', 'blue', 'yellow', 'red')
('green', 'yellow', 'red', 'blue')
('green', 'yellow', 'blue', 'red')
('blue', 'red', 'green', 'yellow ')
('blue', 'red', 'yellow', 'green ')
('blue', 'green', 'red', 'yellow ')
('blue', 'green', 'yellow', 'red')
('blue', 'yellow', 'red', 'green ')
('blue', 'yellow', 'green', 'red')
('yellow', 'red', 'green', 'blue')
('yellow', 'red', 'blue', 'green ')
('yellow', 'green', 'red', 'blue')
('yellow', 'green', 'blue', 'red')
('yellow', 'blue', 'red', 'green ')
('yellow', 'blue', 'green', 'red')
(24 ways.)
This is what is called permutation in combinatorics.
4.2 Vehicle license plate
There was a hit and run. And you’re police officer. And this is what your only witness says about 4-digit license plate
of the guilty vehicle:
There was 13: 1 and 3 together , I'm sure, but not sure where, 13 as first 2
digits, last 2 digits or in the middle.
And there was also 6, or maybe 8, or maybe even 9, not sure which, but one of
them.
Combinatorics textbooks are abound with exercises like this: can you enumerate all possible 4-digit numbers con-
strained in this way?
import itertools
part1_list=["13"]
part2_list=["6", "8", "9"]
part3_list=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
for i in itertools.product(part1_list , part2_list , part3_list):
for j in itertools.permutations(i):
print "".join(j)
1360
1306
42
6130
6013
...
9139
9913
9139
9913
( 180 numbers )
This is something you can query registered vehicle database with...
4.3 Forgotten password
You forgot a password, but this is what you remember: there was a name of your parent, or wife, or one of children.
Also, someone’s year of birth. And one punctuation character, which are so recommended in passwords. Can you
enumerate all possible passwords?
import itertools
part1_list=["jake", "melissa", "oliver", "emily"]
part2_list=["1987", "1954", "1963"]
part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","]
for part1 in part1_list:
for part2 in part2_list:
for part3 in part3_list:
l=[part1, part2, part3]
for i in list(itertools.permutations(l)):
print "".join(i)
jake1987!
jake!1987
1987jake!
1987!jake
!jake1987
...
1963emily,
1963,emily
,emily1963
,1963emily
( 936 of them in total )
But nested for’s are not aesthetically pleasing. They can be replaced with ”cartesian product” operation:
import itertools
43
part1_list=["jake", "melissa", "oliver", "emily"]
part2_list=["1987", "1954", "1963"]
part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","]
for l in itertools.product(part1_list , part2_list , part3_list):
for i in list(itertools.permutations(l)):
print "".join(i)
And this is a way to memorize it: the length of the final result equals to lengths of all input lists multiplied with each
other (like ”product”).
import itertools
part1_list=["jake", "melissa", "oliver", "emily"] # 4 elements
part2_list=["1987", "1954", "1963"] # 3 elements
part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","] # 13 elements
for l in itertools.product(part1_list , part2_list , part3_list):
print l
('jake', '1987', '!')
('jake', '1987', '@')
('jake', '1987', '#')
('jake', '1987', '$')
('jake', '1987', '%')
('jake', '1987', '&')
('jake', '1987', '*')
...
('emily', '1963', '*')
('emily', '1963', '-')
('emily', '1963', '=')
('emily', '1963', '_')
('emily', '1963', '+')
('emily', '1963', '.')
('emily', '1963', ',')
4*3*13=156, and this is a size of a list, to be permuted...
Now the new problem: some Latin characters may be uppercased, some are lowercased. I’ll add another ”cartesian
product” operation to alter a final string in all possible ways:
import itertools , string
part1_list=["jake", "melissa", "oliver", "emily"]
part2_list=["1987", "1954", "1963"]
part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","]
for l in itertools.product(part1_list , part2_list , part3_list):
44
for i in list(itertools.permutations(l)):
s="".join(i)
t=[]
for char in s:
if char.isalpha():
t.append([string.lower(char), string.upper(char)])
else:
t.append([char])
for q in itertools.product(*t):
print "".join(q)
JAke1987!
JAkE1987!
JAKe1987!
JAKE1987!
jake!1987
jakE!1987
jaKe!1987
jaKE!1987
...
,1963eMIly
,1963eMIlY
,1963eMILy
,1963eMILY
,1963Emily
,1963EmilY
,1963EmiLy
( 56160 passwords )
Now leetspeak1 This is somewhat popular only among youngsters, but still, this is what people of all age groups do:
replacing ”o” with ”0” in their passwords, ”e” with ”3”, etc. Let’s add this as well:
import itertools , string
part1_list=["jake", "melissa", "oliver", "emily"]
part2_list=["1987", "1954", "1963"]
part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","]
for l in itertools.product(part1_list , part2_list , part3_list):
for i in list(itertools.permutations(l)):
s="".join(i)
t=[]
for char in s:
if char.isalpha():
to_be_appended=[string.lower(char), string.upper(char)]
if char.lower()=='e':
1
urbandictionary.com
45
to_be_appended.append('3')
elif char.lower()=='i':
to_be_appended.append('1')
elif char.lower()=='o':
to_be_appended.append('0')
t.append(to_be_appended)
else:
t.append([char])
for q in itertools.product(*t):
print "".join(q)
jake1987!
jakE1987!
jak31987!
jaKe1987!
jaKE1987!
jaK31987!
...
,1963EM1lY
,1963EM1Ly
,1963EM1LY
,19633mily
,19633milY
,19633miLy
( 140400 passwords )
Obviously, you can’t try all 140400 passwords on Facebook, Twitter or any other well-protected internet service. But
this is a peace of cake to brute-force them all on password protected RAR-archive or feed them all to John the Ripper,
HashCat, etc.
All the files: https://github.com/DennisYurichev/Math-for-programmers/tree/master/comb/password.
Now let’s use combinations from itertools Python package.
Let’s say, you remember that your password has maybe your name, maybe name of your wife, your year of birth, or
her, and maybe couple of symbols like !, $, ^.
import itertools
parts=["den", "xenia", "1979", "1985", "secret", "!", "$", "^"]
for i in range(1, 6): # 1..5
for combination in itertools.combinations(parts, i):
print "".join(combination)
Here we enumerate all combinations of given strings, 1-string combinations, then 2-, up to 5-string combinations. No
string can appear twice.
46
...
denxenia1979
denxenia1985
denxeniasecret
denxenia!
denxenia$
denxenia^
den19791985
den1979secret
den1979!
...
xenia1985secret$^
xenia1985!$^
xeniasecret!$^
19791985secret!$
19791985secret!^
...
(218 passwords)
Now let’s permute all string in all possible ways:
import itertools
parts=["den", "xenia", "1979", "1985", "secret", "!", "$", "^"]
for i in range(1, 6): # 1..5
for combination in itertools.combinations(parts, i):
for permutation in itertools.permutations(list(combination)):
print "".join(permutation)
...
^den
xenia1979
1979xenia
xenia1985
1985xenia
xeniasecret
secretxenia
xenia!
!xenia
...
^!$1985secret
^!$secret1985
^$1985secret!
^$1985!secret
^$secret1985!
^$secret!1985
^$!1985secret
^$!secret1985
47
(8800 passwords)
And finally, let’s alter all Latin characters in lower/uppercase ways and add leetspeek, as I did before:
import itertools , string
parts=["den", "xenia", "1979", "1985", "!", "$", "^"]
for i in range(1, 6): # 1..5
for combination in itertools.combinations(parts, i):
for permutation in itertools.permutations(list(combination)):
s="".join(permutation)
t=[]
for char in s:
if char.isalpha():
to_be_appended=[string.lower(char), string.upper(char)]
if char.lower()=='e':
to_be_appended.append('3')
elif char.lower()=='i':
to_be_appended.append('1')
elif char.lower()=='o':
to_be_appended.append('0')
t.append(to_be_appended)
else:
t.append([char])
for q in itertools.product(*t):
print "".join(q)
...
dEnxenia
dEnxeniA
dEnxenIa
...
D3nx3N1a
D3nx3N1A
D3nXenia
D3nXeniA
D3nXenIa
...
^$1979!1985
^$19851979!
^$1985!1979
^$!19791985
^$!19851979
( 1,348,657 passwords )
Again, you can’t try to crack remote server with so many attempts, but this is really possible for password-protected
archive, known hash, etc...
48
4.4 Executable file watermarking/steganography using Lehmer code and factorial
number system
In short: how to hide information not in objects, but in order of objects.
Almost any binary executable file has text strings like (these are from CRT):
.rdata:0040D398 aR6002FloatingP:
.rdata:0040D398 text "UTF-16LE", 'R6002 ',0Dh,0Ah
.rdata:0040D398 text "UTF-16LE", '- floating point support not
loaded ',0Dh,0Ah,0
.rdata:0040D3F2 align 8
.rdata:0040D3F8 aR6008NotEnough:
.rdata:0040D3F8 text "UTF-16LE", 'R6008 ',0Dh,0Ah
.rdata:0040D3F8 text "UTF-16LE", '- not enough space for
arguments ',0Dh,0Ah,0
.rdata:0040D44C align 10h
.rdata:0040D450 aR6009NotEnough:
.rdata:0040D450 text "UTF-16LE", 'R6009 ',0Dh,0Ah
.rdata:0040D450 text "UTF-16LE", '- not enough space for
environment ',0Dh,0Ah,0
.rdata:0040D4A8 aR6010AbortHasB:
.rdata:0040D4A8 text "UTF-16LE", 'R6010 ',0Dh,0Ah
.rdata:0040D4A8 text "UTF-16LE", '- abort() has been called ',0Dh
,0Ah,0
.rdata:0040D4EE align 10h
.rdata:0040D4F0 aR6016NotEnough:
.rdata:0040D4F0 text "UTF-16LE", 'R6016 ',0Dh,0Ah
.rdata:0040D4F0 text "UTF-16LE", '- not enough space for thread
data',0Dh,0Ah,0
.rdata:0040D548 aR6017Unexpecte:
.rdata:0040D548 text "UTF-16LE", 'R6017 ',0Dh,0Ah
.rdata:0040D548 text "UTF-16LE", '- unexpected multithread lock
error ',0Dh,0Ah,0
.rdata:0040D5A2 align 8
.rdata:0040D5A8 aR6018Unexpecte:
.rdata:0040D5A8 text "UTF-16LE", 'R6018 ',0Dh,0Ah
.rdata:0040D5A8 text "UTF-16LE", '- unexpected heap error ',0Dh,0
Ah,0
.rdata:0040D5EA align 10h
Can we hide some information there? Not in string themselves, but in order of strings? Given the fact, that compiler
doesn’t guarantee at all, in which order the strings will be stored in object/executable file.
Let’s say, we’ve got 26 text strings, and we can swap them how we want, because their order isn’t important at all. All
possible permutations of 26 objects is 26! = 403291461126605635584000000.
How much information can be stored here? log2(26!)= 88, i.e., 88 bits or 11 bytes!
11 bytes of your data can be converted to a (big) number and back, OK.
What is next? Naive way is: enumerate all permutations of 26 objects, number each, find permutation of the number
we’vegotandpermute26textstrings,storetofileandthat’sit. Butwecan’titerateover403291461126605635584000000
49
permutations.
This is where factorial number system 2 and Lehmer code 3 can be used. In short, for all my non-mathematically
inclined readers, this is a way to find a permutation of specific number without use of any significant resources. And
back: gived specific permutation, we can find its number.
This piece of code I’ve copypasted from https://gist.github.com/lukmdo/7049748 and reworked slightly:
# python 3.x
import math
def iter_perm(base, *rargs):
"""
:type base: list
:param rargs: range args [start ,] stop[, step]
:rtype: generator
"""
if not rargs:
rargs = [math.factorial(len(base))]
for i in range(*rargs):
yield perm_from_int(base, i)
def int_from_code(code):
"""
:type code: list
:rtype: int
"""
num = 0
for i, v in enumerate(reversed(code), 1):
num *= i
num += v
return num
def code_from_int(size, num):
"""
:type size: int
:type num: int
:rtype: list
"""
code = []
for i in range(size):
num, j = divmod(num, size - i)
code.append(j)
return code
def perm_from_code(base, code):
2
https://en.wikipedia.org/wiki/Factorial_number_system
3
https://en.wikipedia.org/wiki/Lehmer_code
50
"""
:type base: list
:type code: list
:rtype: list
"""
perm = base.copy()
for i in range(len(base) - 1):
j = code[i]
if i != i+j:
print ("swapping %d, %d" % (i, i+j))
perm[i], perm[i+j] = perm[i+j], perm[i]
return perm
def perm_from_int(base, num):
"""
:type base: list
:type num: int
:rtype: list
"""
code = code_from_int(len(base), num)
print ("Lehmer code=", code)
return perm_from_code(base, code)
def code_from_perm(base, perm):
"""
:type base: list
:type perm: list
:rtype: list
"""
p = base.copy()
n = len(base)
pos_map = {v: i for i, v in enumerate(base)}
w = []
for i in range(n):
d = pos_map[perm[i]] - i
w.append(d)
if not d:
continue
t = pos_map[perm[i]]
pos_map[p[i]], pos_map[p[t]] = pos_map[p[t]], pos_map[p[i]]
p[i], p[t] = p[t], p[i]
return w
def int_from_perm(base, perm):
51
"""
:type base: list
:type perm: list
:rtype: int
"""
code = code_from_perm(base, perm)
return int_from_code(code)
def bin_string_to_number(s):
rt=0
for i, c in enumerate(s):
rt=rt<<8
rt=rt+ord(c)
return rt
def number_to_bin_string(n):
rt=""
while True:
r=n & 0xff
rt=rt+chr(r)
n=n>>8
if n==0:
break
return rt[::-1]
s="HelloWorld"
print ("s=", s)
num=bin_string_to_number (s)
print ("num=", num)
perm=perm_from_int(list(range(26)), num)
print ("permutation/order=", perm)
num2=int_from_perm(list(range(26)), [14, 17, 9, 19, 11, 16, 23, 0, 2, 13, 20,
18, 21, 24, 10, 1, 22, 4, 7, 6, 15, 12, 5, 3, 8, 25])
print ("recovered num=", num2)
s2=number_to_bin_string(num2)
print ("recovered s=", s2)
I’m encoding a ”HelloWorld” binary string (in fact, any 11 bytes can be used) into a number. Number is then converted
into Lehmer code. Then the perm_from_code() function permute initial order according to the input Lehmer code:
s= HelloWorld
num= 341881320659934023674980
Lehmer code= [14, 16, 7, 16, 7, 11, 17, 7, 1, 4, 10, 7, 9, 11, 6, 2, 6, 1, 2, 4,
0, 0, 0, 0, 0, 0]
swapping 0, 14
swapping 1, 17
swapping 2, 9
swapping 3, 19
swapping 4, 11
52
swapping 5, 16
swapping 6, 23
swapping 7, 14
swapping 8, 9
swapping 9, 13
swapping 10, 20
swapping 11, 18
swapping 12, 21
swapping 13, 24
swapping 14, 20
swapping 15, 17
swapping 16, 22
swapping 17, 18
swapping 18, 20
swapping 19, 23
permutation/order= [14, 17, 9, 19, 11, 16, 23, 0, 2, 13, 20, 18, 21, 24, 10, 1,
22, 4, 7, 6, 15, 12, 5, 3, 8, 25]
This is it: [14, 17, 9, 19, 11, 16, 23, 0, 2, 13, 20, 18, 21, 24, 10, 1, 22, 4, 7, 6, 15, 12, 5, 3, 8, 25]. First put 14th string, then 17s
string, then 9th one, etc.
Now you’ve got a binary file from someone and want to read watermark from it. Get an order of strings from it and
convert it back to binary string:
recovered num= 341881320659934023674980
recovered s= HelloWorld
If you have more text strings (not unusual for most executable files), you can encode more.
100 strings: log2(100!) = 524 bits = 65 bytes.
1000 strings: log2(1000!) = 8529 bits = 1066 bytes! You can store some text here!
How would you force a C/C++ compiler to make specific order of text strings? This is crude, but workable:
char blob[]="hello10hello20";
char *msg1=blob;
char *msg2=blob+8;
printf ("%sn", msg1);
printf ("%sn", msg2);
They can be even aligned on 16-byte border.
... or they can be placed into .s/.asm assembly file and compiled into .o/.obj and then linked to your program.
... or you can swap text strings in already compiled executable and correct their addresses in corresponding instruc-
tions. If an executable file is not packed/obfuscated, this is possible.
Aside of order of text strings, you can try to hack a linker and reorder object files in the final executable. Of course, no
one cares about its order. And go figure out, what is hidden there.
Surely, hidden data can be encrypted, checksum or MAC can be added, etc.
53
Other ideas to consider: reorder functions and fix all addresses, reorder basic blocks within a function, register allo-
cator hacking, etc.
Links I find helpful in understanding factorial number system and Lehmer code, aside of Wikipedia:
• https://gist.github.com/lukmdo/7049748
• https://github.com/scmorse/permutils/blob/master/src/permutils.js
• http://2ality.com/2013/03/permutations.html
• http://www.keithschwarz.com/interesting/code/factoradic-permutation/FactoradicPermutation
4.5 De Bruijn sequences; leading/trailing zero bits counting
4.5.1 Introduction
Let’s imagine there is a very simplified code lock accepting 2 digits, but it has no ”enter” key, it just checks 2 last
entered digits. Our task is to brute force each 2-digit combination. Naïve method is to try 00, 01, 02 ... 99. That require
2*100=200 key pressings. Will it be possible to reduce number of key pressings during brute-force? It is indeed so, with
the help of De Bruijn sequences. We can generate them for the code lock, using Wolfram Mathematica:
In[]:= DeBruijnSequence[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, 2]
Out[]= {6, 8, 6, 5, 4, 3, 2, 1, 7, 8, 7, 1, 1, 0, 9, 0, 8, 0, 6, 6, 
0, 5, 5, 0, 4, 4, 0, 3, 3, 0, 2, 7, 2, 2, 0, 7, 7, 9, 8, 8, 9, 9, 7, 
0, 0, 1, 9, 1, 8, 1, 6, 1, 5, 1, 4, 1, 3, 7, 3, 1, 2, 9, 2, 8, 2, 6, 
2, 5, 2, 4, 7, 4, 2, 3, 9, 3, 8, 3, 6, 3, 5, 7, 5, 3, 4, 9, 4, 8, 4, 
6, 7, 6, 4, 5, 9, 5, 8, 5, 6, 9}
The result has exactly 100 digits, which is 2 times less than our initial idea can offer. By scanning visually this 100-digits
array, you’ll find any number in 00..99 range. All numbers are overlapped with each other: second half of each number
is also first half of the next number, etc.
Here is another. We need a sequence of binary bits with all 3-bit numbers in it:
In[]:= DeBruijnSequence[{0, 1}, 3]
Out[]= {1, 0, 1, 0, 0, 0, 1, 1}
Sequence length is just 8 bits, but it has all binary numbers in 000..111 range. You may visually spot 000 in the middle
of sequence. 111 is also present: two first bits of it at the end of sequence and the last bit is in the beginning. This is so
because De Bruijn sequences are cyclic.
There is also visual demonstration: http://demonstrations.wolfram.com/DeBruijnSequences/.
4.5.2 Trailing zero bits counting
In the Wikipedia article about De Bruijn sequences we can find:
The symbols of a De Bruijn sequence written around a circular object (such as a wheel of a robot)
can be used to identify its angle by examining the n consecutive symbols facing a fixed point.
Indeed: if we know De Bruijn sequence and we observe only part of it (any part), we can deduce exact position of this
part within sequence.
54
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers
Math for programmers

More Related Content

What's hot

Gcrypt
GcryptGcrypt
Elementary algorithms
Elementary algorithmsElementary algorithms
Elementary algorithms
saurabh goel
 
Tinyos programming
Tinyos programmingTinyos programming
Tinyos programming
ssuserf04f61
 
C++ For Quantitative Finance
C++ For Quantitative FinanceC++ For Quantitative Finance
C++ For Quantitative Finance
ASAD ALI
 
Tutorial111
Tutorial111Tutorial111
Tutorial111
Achsan Tarmudi
 
Elements of Applied Mathematics for Engineers
Elements of Applied Mathematics for EngineersElements of Applied Mathematics for Engineers
Elements of Applied Mathematics for Engineers
Vincent Isoz
 
A practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinoldA practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinold
FaruqolayinkaSalako
 
I do like cfd vol 1 2ed_v2p2
I do like cfd vol 1 2ed_v2p2I do like cfd vol 1 2ed_v2p2
I do like cfd vol 1 2ed_v2p2
NovoConsult S.A.C
 
Algorithms for programmers ideas and source code
Algorithms for programmers ideas and source code Algorithms for programmers ideas and source code
Algorithms for programmers ideas and source code
Duy Phan
 
Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...
Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...
Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...
ssuserd6b1fd
 
c programming
c programmingc programming
c programming
Arun Umrao
 
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...
ssuserd6b1fd
 
Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...
Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...
Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...
ssuserd6b1fd
 
Circuitikzmanual
CircuitikzmanualCircuitikzmanual
Circuitikzmanual
Frank Rojas
 
Javanotes5 linked
Javanotes5 linkedJavanotes5 linked
Javanotes5 linked
Aravindharamanan S
 
Communication
CommunicationCommunication
Communication
Deepak Kumar
 
Javanotes6 linked
Javanotes6 linkedJavanotes6 linked
Javanotes6 linked
Sandile Mabika
 
Tutorial edit
Tutorial editTutorial edit
Tutorial edit
Boris Popov
 
0802 python-tutorial
0802 python-tutorial0802 python-tutorial
0802 python-tutorial
Zahid Hasan
 

What's hot (19)

Gcrypt
GcryptGcrypt
Gcrypt
 
Elementary algorithms
Elementary algorithmsElementary algorithms
Elementary algorithms
 
Tinyos programming
Tinyos programmingTinyos programming
Tinyos programming
 
C++ For Quantitative Finance
C++ For Quantitative FinanceC++ For Quantitative Finance
C++ For Quantitative Finance
 
Tutorial111
Tutorial111Tutorial111
Tutorial111
 
Elements of Applied Mathematics for Engineers
Elements of Applied Mathematics for EngineersElements of Applied Mathematics for Engineers
Elements of Applied Mathematics for Engineers
 
A practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinoldA practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinold
 
I do like cfd vol 1 2ed_v2p2
I do like cfd vol 1 2ed_v2p2I do like cfd vol 1 2ed_v2p2
I do like cfd vol 1 2ed_v2p2
 
Algorithms for programmers ideas and source code
Algorithms for programmers ideas and source code Algorithms for programmers ideas and source code
Algorithms for programmers ideas and source code
 
Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...
Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...
Notes for C++ Programming / Object Oriented C++ Programming for MCA, BCA and ...
 
c programming
c programmingc programming
c programming
 
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...
Notes for C Programming for MCA, BCA, B. Tech CSE, ECE and MSC (CS) 1 of 5 by...
 
Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...
Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...
Notes of 8051 Micro Controller for BCA, MCA, MSC (CS), MSC (IT) & AMIE IEI- b...
 
Circuitikzmanual
CircuitikzmanualCircuitikzmanual
Circuitikzmanual
 
Javanotes5 linked
Javanotes5 linkedJavanotes5 linked
Javanotes5 linked
 
Communication
CommunicationCommunication
Communication
 
Javanotes6 linked
Javanotes6 linkedJavanotes6 linked
Javanotes6 linked
 
Tutorial edit
Tutorial editTutorial edit
Tutorial edit
 
0802 python-tutorial
0802 python-tutorial0802 python-tutorial
0802 python-tutorial
 

Similar to Math for programmers

Stochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning PerspectiveStochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning Perspective
e2wi67sy4816pahn
 
Python_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdf
Python_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdfPython_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdf
Python_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdf
jankoabel2022
 
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
valentincivil
 
Ns doc
Ns docNs doc
Ns doc
Pratik Joshi
 
The maxima book
The maxima bookThe maxima book
The maxima book
ssuser0efdb21
 
Symbol from NEM Whitepaper 0.9.6.3
Symbol from NEM Whitepaper 0.9.6.3Symbol from NEM Whitepaper 0.9.6.3
Symbol from NEM Whitepaper 0.9.6.3
Alexander Schefer
 
python learn basic tutorial learn easy..
python learn basic tutorial learn easy..python learn basic tutorial learn easy..
python learn basic tutorial learn easy..
MURTHYVENKAT2
 
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxnumpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
sin3divcx
 
Best Python tutorial (release 3.7.0)
Best Python tutorial (release 3.7.0)Best Python tutorial (release 3.7.0)
Best Python tutorial (release 3.7.0)
youssef bouferdou
 
Python everthing
Python everthingPython everthing
Python everthing
AbdullahAbdullahabdu1
 
0802 python-tutorial
0802 python-tutorial0802 python-tutorial
0802 python-tutorial
urvishathummar1
 
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdfA_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
ssuser7fcce2
 
A practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinoldA practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinold
the_matrix
 
A Practical Introduction To Python Programming
A Practical Introduction To Python ProgrammingA Practical Introduction To Python Programming
A Practical Introduction To Python Programming
Nat Rice
 
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdfA_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
TariqSaeed80
 
book.pdf
book.pdfbook.pdf
book.pdf
dentistnikhil
 
452042223-Modern-Fortran-in-practice-pdf.pdf
452042223-Modern-Fortran-in-practice-pdf.pdf452042223-Modern-Fortran-in-practice-pdf.pdf
452042223-Modern-Fortran-in-practice-pdf.pdf
kalelboss
 
javanotes5.pdf
javanotes5.pdfjavanotes5.pdf
javanotes5.pdf
kmspega
 
main-moonmath.pdf
main-moonmath.pdfmain-moonmath.pdf
main-moonmath.pdf
SonalKumarSingh8
 
An Introduction to MATLAB for Geoscientists.pdf
An Introduction to MATLAB for Geoscientists.pdfAn Introduction to MATLAB for Geoscientists.pdf
An Introduction to MATLAB for Geoscientists.pdf
vishnuraj764102
 

Similar to Math for programmers (20)

Stochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning PerspectiveStochastic Processes and Simulations – A Machine Learning Perspective
Stochastic Processes and Simulations – A Machine Learning Perspective
 
Python_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdf
Python_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdfPython_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdf
Python_Programming_and_Numerical_Methods_A_Guide_for_Engineers_and.pdf
 
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...Ric walter (auth.) numerical methods and optimization  a consumer guide-sprin...
Ric walter (auth.) numerical methods and optimization a consumer guide-sprin...
 
Ns doc
Ns docNs doc
Ns doc
 
The maxima book
The maxima bookThe maxima book
The maxima book
 
Symbol from NEM Whitepaper 0.9.6.3
Symbol from NEM Whitepaper 0.9.6.3Symbol from NEM Whitepaper 0.9.6.3
Symbol from NEM Whitepaper 0.9.6.3
 
python learn basic tutorial learn easy..
python learn basic tutorial learn easy..python learn basic tutorial learn easy..
python learn basic tutorial learn easy..
 
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxnumpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
numpyxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 
Best Python tutorial (release 3.7.0)
Best Python tutorial (release 3.7.0)Best Python tutorial (release 3.7.0)
Best Python tutorial (release 3.7.0)
 
Python everthing
Python everthingPython everthing
Python everthing
 
0802 python-tutorial
0802 python-tutorial0802 python-tutorial
0802 python-tutorial
 
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdfA_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
 
A practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinoldA practical introduction_to_python_programming_heinold
A practical introduction_to_python_programming_heinold
 
A Practical Introduction To Python Programming
A Practical Introduction To Python ProgrammingA Practical Introduction To Python Programming
A Practical Introduction To Python Programming
 
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdfA_Practical_Introduction_to_Python_Programming_Heinold.pdf
A_Practical_Introduction_to_Python_Programming_Heinold.pdf
 
book.pdf
book.pdfbook.pdf
book.pdf
 
452042223-Modern-Fortran-in-practice-pdf.pdf
452042223-Modern-Fortran-in-practice-pdf.pdf452042223-Modern-Fortran-in-practice-pdf.pdf
452042223-Modern-Fortran-in-practice-pdf.pdf
 
javanotes5.pdf
javanotes5.pdfjavanotes5.pdf
javanotes5.pdf
 
main-moonmath.pdf
main-moonmath.pdfmain-moonmath.pdf
main-moonmath.pdf
 
An Introduction to MATLAB for Geoscientists.pdf
An Introduction to MATLAB for Geoscientists.pdfAn Introduction to MATLAB for Geoscientists.pdf
An Introduction to MATLAB for Geoscientists.pdf
 

More from mustafa sarac

Uluslararasilasma son
Uluslararasilasma sonUluslararasilasma son
Uluslararasilasma son
mustafa sarac
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
mustafa sarac
 
Latka december digital
Latka december digitalLatka december digital
Latka december digital
mustafa sarac
 
Axial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualAxial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manual
mustafa sarac
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpy
mustafa sarac
 
The book of Why
The book of WhyThe book of Why
The book of Why
mustafa sarac
 
BM sgk meslek kodu
BM sgk meslek koduBM sgk meslek kodu
BM sgk meslek kodu
mustafa sarac
 
TEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizTEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimiz
mustafa sarac
 
How to make and manage a bee hotel?
How to make and manage a bee hotel?How to make and manage a bee hotel?
How to make and manage a bee hotel?
mustafa sarac
 
Cahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir miCahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir mi
mustafa sarac
 
How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?
mustafa sarac
 
Staff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital MarketsStaff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital Markets
mustafa sarac
 
Yetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimiYetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimi
mustafa sarac
 
Consumer centric api design v0.4.0
Consumer centric api design v0.4.0Consumer centric api design v0.4.0
Consumer centric api design v0.4.0
mustafa sarac
 
State of microservices 2020 by tsh
State of microservices 2020 by tshState of microservices 2020 by tsh
State of microservices 2020 by tsh
mustafa sarac
 
Uber pitch deck 2008
Uber pitch deck 2008Uber pitch deck 2008
Uber pitch deck 2008
mustafa sarac
 
Wireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guideWireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guide
mustafa sarac
 
State of Serverless Report 2020
State of Serverless Report 2020State of Serverless Report 2020
State of Serverless Report 2020
mustafa sarac
 
Dont just roll the dice
Dont just roll the diceDont just roll the dice
Dont just roll the dice
mustafa sarac
 
Handbook of covid 19 prevention and treatment
Handbook of covid 19 prevention and treatmentHandbook of covid 19 prevention and treatment
Handbook of covid 19 prevention and treatment
mustafa sarac
 

More from mustafa sarac (20)

Uluslararasilasma son
Uluslararasilasma sonUluslararasilasma son
Uluslararasilasma son
 
Real time machine learning proposers day v3
Real time machine learning proposers day v3Real time machine learning proposers day v3
Real time machine learning proposers day v3
 
Latka december digital
Latka december digitalLatka december digital
Latka december digital
 
Axial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manualAxial RC SCX10 AE2 ESC user manual
Axial RC SCX10 AE2 ESC user manual
 
Array programming with Numpy
Array programming with NumpyArray programming with Numpy
Array programming with Numpy
 
The book of Why
The book of WhyThe book of Why
The book of Why
 
BM sgk meslek kodu
BM sgk meslek koduBM sgk meslek kodu
BM sgk meslek kodu
 
TEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimizTEGV 2020 Bireysel bagiscilarimiz
TEGV 2020 Bireysel bagiscilarimiz
 
How to make and manage a bee hotel?
How to make and manage a bee hotel?How to make and manage a bee hotel?
How to make and manage a bee hotel?
 
Cahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir miCahit arf makineler dusunebilir mi
Cahit arf makineler dusunebilir mi
 
How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?How did Software Got So Reliable Without Proof?
How did Software Got So Reliable Without Proof?
 
Staff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital MarketsStaff Report on Algorithmic Trading in US Capital Markets
Staff Report on Algorithmic Trading in US Capital Markets
 
Yetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimiYetiskinler icin okuma yazma egitimi
Yetiskinler icin okuma yazma egitimi
 
Consumer centric api design v0.4.0
Consumer centric api design v0.4.0Consumer centric api design v0.4.0
Consumer centric api design v0.4.0
 
State of microservices 2020 by tsh
State of microservices 2020 by tshState of microservices 2020 by tsh
State of microservices 2020 by tsh
 
Uber pitch deck 2008
Uber pitch deck 2008Uber pitch deck 2008
Uber pitch deck 2008
 
Wireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guideWireless solar keyboard k760 quickstart guide
Wireless solar keyboard k760 quickstart guide
 
State of Serverless Report 2020
State of Serverless Report 2020State of Serverless Report 2020
State of Serverless Report 2020
 
Dont just roll the dice
Dont just roll the diceDont just roll the dice
Dont just roll the dice
 
Handbook of covid 19 prevention and treatment
Handbook of covid 19 prevention and treatmentHandbook of covid 19 prevention and treatment
Handbook of covid 19 prevention and treatment
 

Recently uploaded

How we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hoursHow we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hours
Ortus Solutions, Corp
 
Disk to Cloud: Abstract your File Operations with CBFS
Disk to Cloud: Abstract your File Operations with CBFSDisk to Cloud: Abstract your File Operations with CBFS
Disk to Cloud: Abstract your File Operations with CBFS
Ortus Solutions, Corp
 
AI Chatbot Development – A Comprehensive Guide  .pdf
AI Chatbot Development – A Comprehensive Guide  .pdfAI Chatbot Development – A Comprehensive Guide  .pdf
AI Chatbot Development – A Comprehensive Guide  .pdf
ayushiqss
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud ConnectorsBreak data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
confluent
 
@ℂall @Girls Kolkata ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
@ℂall @Girls Kolkata  ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe@ℂall @Girls Kolkata  ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
@ℂall @Girls Kolkata ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Misti Soneji
 
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptxAddressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
Sparity1
 
Java SE 17 Study Guide for Certification - Chapter 01
Java SE 17 Study Guide for Certification - Chapter 01Java SE 17 Study Guide for Certification - Chapter 01
Java SE 17 Study Guide for Certification - Chapter 01
williamrobertherman
 
Kolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Kolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model SafeKolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Kolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Misti Soneji
 
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple StepsSeamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Estuary Flow
 
Splunk_Remote_Work_Insights_Overview.pptx
Splunk_Remote_Work_Insights_Overview.pptxSplunk_Remote_Work_Insights_Overview.pptx
Splunk_Remote_Work_Insights_Overview.pptx
sudsdeep
 
BoxLang Developer Tooling: VSCode Extension and Debugger
BoxLang Developer Tooling: VSCode Extension and DebuggerBoxLang Developer Tooling: VSCode Extension and Debugger
BoxLang Developer Tooling: VSCode Extension and Debugger
Ortus Solutions, Corp
 
Mumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
Mumbai @Call @Girls Whatsapp 9930687706 With High Profile ServiceMumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
Mumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
kolkata dolls
 
dachnug51 - HCL Domino Roadmap .pdf
dachnug51 - HCL Domino Roadmap      .pdfdachnug51 - HCL Domino Roadmap      .pdf
dachnug51 - HCL Domino Roadmap .pdf
DNUG e.V.
 
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Medical / Health Care (+971588192166) Mifepristone and Misoprostol tablets 200mg
 
Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1
Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1
Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1
arvindkumarji156
 
Top 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your WebsiteTop 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your Website
e-Definers Technology
 
ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...
ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...
ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...
Ortus Solutions, Corp
 
How to debug ColdFusion Applications using “ColdFusion Builder extension for ...
How to debug ColdFusion Applications using “ColdFusion Builder extension for ...How to debug ColdFusion Applications using “ColdFusion Builder extension for ...
How to debug ColdFusion Applications using “ColdFusion Builder extension for ...
Ortus Solutions, Corp
 
Intro to Amazon Web Services (AWS) and Gen AI
Intro to Amazon Web Services (AWS) and Gen AIIntro to Amazon Web Services (AWS) and Gen AI
Intro to Amazon Web Services (AWS) and Gen AI
Ortus Solutions, Corp
 
@Call @Girls in Tiruppur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ...
 @Call @Girls in Tiruppur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ... @Call @Girls in Tiruppur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ...
@Call @Girls in Tiruppur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ...
Mona Rathore
 

Recently uploaded (20)

How we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hoursHow we built TryBoxLang in under 48 hours
How we built TryBoxLang in under 48 hours
 
Disk to Cloud: Abstract your File Operations with CBFS
Disk to Cloud: Abstract your File Operations with CBFSDisk to Cloud: Abstract your File Operations with CBFS
Disk to Cloud: Abstract your File Operations with CBFS
 
AI Chatbot Development – A Comprehensive Guide  .pdf
AI Chatbot Development – A Comprehensive Guide  .pdfAI Chatbot Development – A Comprehensive Guide  .pdf
AI Chatbot Development – A Comprehensive Guide  .pdf
 
Break data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud ConnectorsBreak data silos with real-time connectivity using Confluent Cloud Connectors
Break data silos with real-time connectivity using Confluent Cloud Connectors
 
@ℂall @Girls Kolkata ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
@ℂall @Girls Kolkata  ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe@ℂall @Girls Kolkata  ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
@ℂall @Girls Kolkata ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
 
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptxAddressing the Top 9 User Pain Points with Visual Design Elements.pptx
Addressing the Top 9 User Pain Points with Visual Design Elements.pptx
 
Java SE 17 Study Guide for Certification - Chapter 01
Java SE 17 Study Guide for Certification - Chapter 01Java SE 17 Study Guide for Certification - Chapter 01
Java SE 17 Study Guide for Certification - Chapter 01
 
Kolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Kolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model SafeKolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
Kolkata @ℂall @Girls ꧁❤ 000000000 ❤꧂@ℂall @Girls Service Vip Top Model Safe
 
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple StepsSeamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
Seamless PostgreSQL to Snowflake Data Transfer in 8 Simple Steps
 
Splunk_Remote_Work_Insights_Overview.pptx
Splunk_Remote_Work_Insights_Overview.pptxSplunk_Remote_Work_Insights_Overview.pptx
Splunk_Remote_Work_Insights_Overview.pptx
 
BoxLang Developer Tooling: VSCode Extension and Debugger
BoxLang Developer Tooling: VSCode Extension and DebuggerBoxLang Developer Tooling: VSCode Extension and Debugger
BoxLang Developer Tooling: VSCode Extension and Debugger
 
Mumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
Mumbai @Call @Girls Whatsapp 9930687706 With High Profile ServiceMumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
Mumbai @Call @Girls Whatsapp 9930687706 With High Profile Service
 
dachnug51 - HCL Domino Roadmap .pdf
dachnug51 - HCL Domino Roadmap      .pdfdachnug51 - HCL Domino Roadmap      .pdf
dachnug51 - HCL Domino Roadmap .pdf
 
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
Abortion pills in Fujairah *((+971588192166*)☎️)¥) **Effective Abortion Pills...
 
Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1
Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1
Bhiwandi @Call @Girls Whatsapp 000000000 With Best And No 1
 
Top 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your WebsiteTop 10 Tips To Get Google AdSense For Your Website
Top 10 Tips To Get Google AdSense For Your Website
 
ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...
ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...
ColdBox Debugger v4.2.0: Unveiling Advanced Debugging Techniques for ColdBox ...
 
How to debug ColdFusion Applications using “ColdFusion Builder extension for ...
How to debug ColdFusion Applications using “ColdFusion Builder extension for ...How to debug ColdFusion Applications using “ColdFusion Builder extension for ...
How to debug ColdFusion Applications using “ColdFusion Builder extension for ...
 
Intro to Amazon Web Services (AWS) and Gen AI
Intro to Amazon Web Services (AWS) and Gen AIIntro to Amazon Web Services (AWS) and Gen AI
Intro to Amazon Web Services (AWS) and Gen AI
 
@Call @Girls in Tiruppur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ...
 @Call @Girls in Tiruppur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ... @Call @Girls in Tiruppur 🤷‍♂️  XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ...
@Call @Girls in Tiruppur 🤷‍♂️ XXXXXXXX 🤷‍♂️ Tanisha Sharma Best High Class ...
 

Math for programmers

  • 1. Mathematics for programmers (early draft) Dennis Yurichev <math@yurichev.com> September 22, 2020
  • 2. 2
  • 3. Contents 1 Prime numbers 1 1.1 Integer factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.1 Using composite number as a container . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1.2 Using composite number as a container (another example) . . . . . . . . . . . . . . . . . . . . 4 1.2 Coprime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5 1.3 Semiprime numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4 How RSA1 works . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4.1 Fermat little theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4.2 Euler’s totient function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 1.4.3 Euler’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.4 RSA example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 1.4.5 So how it works? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8 1.4.6 Breaking RSA . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9 1.4.7 The difference between my simplified example and a real RSA algorithm . . . . . . . . . . . . 9 1.4.8 The RSA signature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 1.4.9 Hybrid cryptosystem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10 2 Modulo arithmetics 11 2.1 Quick introduction into modular arithmetic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11 2.1.1 Modular arithmetic on CPUs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 2.1.2 Remainder of division by modulo 2n . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 2.1.3 Getting random numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14 2.2 Modulo inverse, part I . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15 2.2.1 No remainder? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.3 Modulo inverse, part II . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16 2.4 Reversible linear congruential generator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18 2.5 Getting magic number using extended Euclidean algorithm . . . . . . . . . . . . . . . . . . . . . . . . 19 3 Probability 23 3.1 Text strings right in the middle of compressed data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2 Autocomplete using Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2.1 Dissociated press . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25 3.2.2 Autocomplete . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30 3.2.3 Further work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2.4 The files . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38 3.2.5 Read more . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.3 random.choices() in Python 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39 1 Rivest–Shamir–Adleman cryptosystem 3
  • 4. 4 Combinatorics 41 4.1 Soldering a headphones cable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41 4.2 Vehicle license plate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42 4.3 Forgotten password . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43 4.4 Executable file watermarking/steganography using Lehmer code and factorial number system . . . . . 49 4.5 De Bruijn sequences; leading/trailing zero bits counting . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.5.2 Trailing zero bits counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54 4.5.3 Leading zero bits counting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58 4.5.4 Performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.5.5 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.5.6 Generation of De Bruijn sequences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 4.5.7 Other articles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59 5 Galois Fields, GF(2) and yet another explanation of CRC2 61 5.1 What is wrong with checksum? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.2 Division by prime . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61 5.3 (Binary) long divison . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62 5.4 (Binary) long division, version 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63 5.5 Shortest possible introduction into GF(2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65 5.6 CRC32 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66 5.7 Rationale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67 5.8 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68 6 Logarithms 69 6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.1.1 Children’s approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69 6.1.2 Scientists’ and engineers’ approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.2 Logarithmic scale . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.2.1 In human perception . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70 6.2.2 In electronics engineering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71 6.2.3 In IT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72 6.2.4 Web 2.0 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.3 Multiplication and division using addition and subtraction . . . . . . . . . . . . . . . . . . . . . . . . 73 6.3.1 Logarithmic slide rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73 6.3.2 Logarithmic tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75 6.3.3 Working with very small and very large numbers . . . . . . . . . . . . . . . . . . . . . . . . . 76 6.3.4 IEEE 754: adding and subtracting exponents . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 6.4 Exponentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78 6.5 Square root . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79 6.6 Base conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80 6.7 Binary logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.7.1 Denoting a number of bits for some value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81 6.7.2 Calculating binary logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 6.7.3 O(log n) time complexity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83 6.8 Common (base 10) logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 6.9 Natural logarithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 6.9.1 Savings account in your bank . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85 6.9.2 Exponential decay . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87 2 Cyclic redundancy check 4
  • 5. 7 Symbolic computation 93 7.1 Rational data type . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94 8 Graph theory 97 8.1 Clique in graph theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 8.1.1 Social graph: simple example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97 8.1.2 Social graph: IRC network . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98 8.1.3 Attempt to find communities in IRC social graph . . . . . . . . . . . . . . . . . . . . . . . . . . 99 8.1.4 Social graph: social networks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 8.1.5 Links graph: Wikipedia . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103 8.1.6 Social graph: LiveJournal spammers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 8.1.7 Links graph: link farms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 8.1.8 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104 9 GCD and LCM 105 9.1 GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105 9.2 Oculus VR Flicks and GCD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107 9.3 LCM . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 9.3.1 File copying routine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 10 Linear algebra 111 10.1 Gaussian elimination: Kirchhoff’s circuit laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111 10.1.1 Gaussian elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112 11 Acronyms used 117 Thanks Thanks to those who helped: Slava “Avid” Kazakov, Serhiy Matviychuk. 5
  • 6. 6
  • 7. Chapter 1 Prime numbers Prime numbers are the numbers which has no divisors except itself and 1. This can be represented graphically. Let’s take 9 balls or some other objects. 9 balls can be arranged into rectangle: ooo ooo ooo So are 12 balls: oooo oooo oooo Or: ooo ooo ooo ooo So 9 and 12 are not prime numbers. 7 is prime number: ooooooo Or: o o o o o o o It’s not possible to form a rectangle using 7 balls, or 11 balls or any other prime number. 1
  • 8. The fact that balls can be arranged into rectangle shows that the number can be divided by the number which is rep- resented by height and width of rectangle. Balls of prime number can be arranged vertically or horizontally, meaning, there are only two divisors: 1 and the prime number itself. 1.1 Integer factorization Natural number can be either prime or composite number. Composite number is a number which can be breaked up by product of prime numbers. Let’s take 100. It’s not a prime number. By the fundamental theorem of arithmetic, any number can be represented as product of prime numbers, in only one single way. So the composite number phrase means that the number is composed of prime numbers. Let’s factor 100 in Wolfram Mathematica: Listing 1.1: Wolfram Mathematica In[]:= FactorInteger[100] Out[]= {{2, 2}, {5, 2}} This mean that 100 can be constructed using 2 and 5 prime numbers (22 · 52): Listing 1.2: Wolfram Mathematica In[]:= 2^2*5^2 Out[]= 100 1.1.1 Using composite number as a container Even more than that, it’s possible to encode some information in prime numbers using factoring. Let’s say, we would encode the “Hello” text string. First, let’s find ASCII codes of each character in the string: Listing 1.3: Wolfram Mathematica In[]:= ToCharacterCode["Hello"] Out[]= {72, 101, 108, 108, 111} Let’s find a first 5 prime numbers, each number for each character: Listing 1.4: Wolfram Mathematica In[]:= Map[Prime[#] &, Range[5]] Out[]= {2, 3, 5, 7, 11} Build a huge number using prime numbers as bases and ASCII codes as exponents, then get a product of all them (272 · 3101 · 5108 · 7108 · 11111): Listing 1.5: Wolfram Mathematica In[]:= tmp = 2^72*3^101*5^108*7^108*11^111 Out[]= 1649465578065933994718255257642275679479006861206428830641826551739434 9344066214616222018844835866267141943107823334187149334898562231349428 5708281252457614466981636618940129599457183300076472809826225406689893 2
  • 9. 5837524252859632074600687844523389231265776082000229507684707641601562 5000000000000000000000000000000000000000000000000000000000000000000000 000 It’s a big number, but Wolfram Mathematica is able to factor it back: Listing 1.6: Wolfram Mathematica In[]:= FactorInteger[tmp] Out[]= {{2, 72}, {3, 101}, {5, 108}, {7, 108}, {11, 111}} A first number in each pair is prime number and the second is exponent. Get the text string back: Listing 1.7: Wolfram Mathematica In[]:= FromCharacterCode[Map[#[[2]] &, tmp]] Out[]= "Hello" That allows to have some fun. Let’s add exclamation point to the end of string by manipulating only the big number. ASCII code of exlamation point is 33. The next prime number after 11 is 13. So add it (by multiplying by 1333): Listing 1.8: Wolfram Mathematica In[]:= tmp = tmp*13^33 Out[]= 9494539005656577744061615691556750598033024729435332190254469113536733 9032823543118405499931761589928052797992206631285822671397023217541663 5920521812548793623881568510051214975599793760307837993570818136014139 9497958680836430182400405525832564875875193876694267121604212637095253 0725145452728611417114734649658203125000000000000000000000000000000000 000000000000000000000000000000000000000 So we got new number. Let’s factor it back and decode: Listing 1.9: Wolfram Mathematica In[]:= factored = FactorInteger[tmp] Out[]= {{2, 72}, {3, 101}, {5, 108}, {7, 108}, {11, 111}, {13, 33}} In[]:= FromCharacterCode[Map[#[[2]] &, factored]] Out[]= "Hello!" Wow, that works. Will it be possible to remove one ’l’ character from the string at the third position? The ’l’ has the ASCII code of 108 and it is exponent for two prime numbers in our expression: 5 (first ’l’) and 7 (second ’l’). To knock out the character, we divide the big number by the corresponding prime number with the exponent of 108 (divide by 5108): Listing 1.10: Wolfram Mathematica In[]:= tmp = tmp/5^108 Out[]= 3081154065769189664244341216329094565621009415122099836376732969546063 1079164051611808432546107410277501678916823138724630810880390384343750 1196528030610615786507542545262118293483878711112407171889948257893463 3
  • 10. 8494741216231004109210436295299274515484540190050751059821909485854359 9630924207126074604240892753608704 In[]:= factored = FactorInteger[tmp] Out[]= {{2, 72}, {3, 101}, {7, 108}, {11, 111}, {13, 33}} In[]:= FromCharacterCode[Map[#[[2]] &, factored]] Out[]= "Helo!" 1.1.2 Using composite number as a container (another example) Let’ssay, theinitialcontainer numberis1. Let’sincrementthenumberatthesecondpositionwithinitbymultiplicating by the first prime number (3): Listing 1.11: Wolfram Mathematica In[]:= tmp = 1*3 Out[]= 3 Then let’s set the number at fourth posistion to 123. The fourth prime number is 7 (the percent sign in Mathematica denotes the last result): Listing 1.12: Wolfram Mathematica In[]:= tmp = tmp*7^123 Out[]= 26557071110804040505330743411815438275701018334410643480070773 5780279761186999642944265644421128096489029 Then let’s set the number at fifth position to 456. The fifth prime number is 11: Listing 1.13: Wolfram Mathematica In[]:= tmp = tmp*11^456 Out[]= 19917948660639605307938425372554395433764512138284060223646519 1257621966825293455339751080188144910510813322192288287162176499976800 9147068591160798243308591883294649069355015558472564457422829073938118 4396204999051856879940101934339913600942451006747982291524910653185084 4057972896537894301213735252418639782974592077393028390193060138936503 0125465578958567377627063815620261557939484036628536230966158222960762 8509899641574477547658142457598168497006309608599830554758951672484533 9216105863754463712957143732563375990198901073698626584903164027850451 8825659837114080589573087269 Then let’s decrement the number at fourth position, the fourth prime number is 7: Listing 1.14: Wolfram Mathematica In[]:= tmp = tmp/7 Out[]= 28454212372342293297054893389363422048235017340405800319495027 3225174238321847793342501543125921300729733317417554695945966428538287 0210097987372568919012274118992355813364307940675092082032612962768740 6280292855788366971343002763342733715632072866782831845035586647407263 4368532709339849001733907503455199689963702967704326271704371627052147 4
  • 11. 1607807969940810539467234022314659368484977195183623187094511747086804 0728428059392110782368774939425954995723299440856900792512788103549334 1737294091077805304224491046519108557427001533855180835575948611214931 260808548159154369939012467 Let’s factor the composite number and get all the numbers we set inside container (1, 122, 456): Listing 1.15: Wolfram Mathematica In[]:= FactorInteger[tmp] Out[]= {{3, 1}, {7, 122}, {11, 456}} So the resulting number has 3 · 7122 · 11456 form at the end. This is somewhat wasteful way to store the numbers, but out of curiosity: since there are infinite number of prime numbers and so any number of infinitely big numbers can be stored within one (although huge) composite number. 1.2 Coprime numbers Coprime numbers are the 2 or more numbers which don’t have any common divisors. In mathematical lingo, the GCD1 of all coprime numbers is 1. 3 and 5 are coprimes. So are 7 and 10. So are 4, 5 and 9. Coprime numbers are the numerator and denominator in fraction which cannot be reduced further (irreducible frac- tion). For example, 130 14 is 65 7 after reduction (or simplification), 65 and 7 are coprime to each other, but 130 and 14 are not (they has 2 as common divisor). One application of coprime numbers in engineering is to make number of cogs on cogwheel and number of chain elements on chain to be coprimes. Let’s imagine bike cogwheels and chain: Figure 1.1: The picture was taken from www.mechanical-toys.com Ifyouchoose 5 as number of cogson one of cogwhell andyou have10 or 15 or 20chain elements, eachcogon cogwheel will meet some set of the same chain elements. For example, if there are 5 cogs on cogwheel and 20 chain elements, each cog will meet only 4 chain elements and vice versa: each chain element has its own cog on cogwheel. This is bad because both cogwheel and chain will run out slightly faster than if each cog would interlock every chain elements at some point. To reach this, number of cogs and chain elements could be coprime numbers, like 5 and 11, or 5 and 23. That will make each chain element interlock each cog evenly, which is better. 1 Greatest Common Divisor 5
  • 12. 1.3 Semiprime numbers Semiprime is a product of two prime numbers. An interesting property of semiprime: In 1974 the Arecibo message was sent with a radio signal aimed at a star cluster . It consisted of 1679 binary digits intended to be interpreted as a 23×73 bitmap image. The number 1679 = 23×73 was chosen because it is a semiprime and therefore can only be broken down into 23 rows and 73 columns , or 73 rows and 23 columns. ( https://en.wikipedia.org/wiki/Semiprime ) 1.4 How RSA works RSA public-key cryptosystem (named after its inventors: Ron Rivest, Adi Shamir and Leonard Adleman) is the most used asymmetric public-private key cryptosystem. 1.4.1 Fermat little theorem Fermatlittletheoremstatesthatifpisprime,thiscongruenceisvalidforanyaintheenvironmentofmoduloartihemtic of base p: ap−1 ≡ 1 (mod p). There are proofs, which are, perhaps, too tricky for this article which is intended for beginners. So far, you can just take it as granted. This theorem may be used to sieve prime numbers. So you take, for example, 10 and test it. Let’s take some random a value (123) (Wolfram Mathematica): Listing 1.16: Wolfram Mathematica In[]:= Mod[123^(10 - 1), 10] Out[]= 3 We’ve got 3, which is not 1, indicating the 10 is not prime. On the other hand, 11 is prime: Listing 1.17: Wolfram Mathematica In[]:= Mod[123^(11 - 1), 11] Out[]= 1 This method is not perfect, some composite p numbers can lead to 1, for example p=1105, but can be used as a method to sieve vast amount of prime numbers candidates. 1.4.2 Euler’s totient function It is a number of coprime numbers under some n. Denoted as ϕ(n), pronounced as phi. For the sake of simplification, you may just keep in mind that if n = pq (i.e., product of two prime numbers), φ(pq) = (p − 1)(q − 1). This is true for RSA environment. 6
  • 13. 1.4.3 Euler’s theorem Euler’s theorem is a generalization of Fermat little theorem. It states: aφ(n) ≡ 1 (mod n) But again, for the sake of simplification, we may keep in mind that Euler’s theorem in the RSA environment is this: a(p−1)(q−1) ≡ 1 (mod n) ... where n = pq and both p and q are prime numbers. This theorem is central to RSA algorithm. 1.4.4 RSA example There are The Sender and The Receiver. The Receiver generates two big prime numbers (p and q) and publishes its product (n = pq). Both p and q are kept secret. For the illustration, let’s randomly pick p and q among the first 50 prime numbers in Wolfram Mathematica: Listing 1.18: Wolfram Mathematica In[]:= p = Prime[RandomInteger[50]] Out[]= 89 In[]:= q = Prime[RandomInteger[50]] Out[]= 43 In[]:= n = p*q Out[]= 3827 3827 is published as public key, named public key modulus or modulo. It is semiprime. There is also public key expo- nent e, which is not secret, is often 65537, but we will use 17 to keep all results tiny. Now The Sender wants to send a message (123 number) to The Receiver and he/she uses one-way function: Listing 1.19: Wolfram Mathematica In[]:= e = 17 Out[]= 17 In[]:= encrypted = Mod[123^e, n] Out[]= 3060 3060 is encrypted message, which can be decrypted only using p and q values separately. This is one-way function, be- cause only part of exponentiation result is left. One and important consequence is that even The Sender can’t decrypt it. This is why you can encrypt a piece of text in PGP/GnuPG to someone using his/her public key, but can’t decrypt it. Perhaps, that’s how CryptoLockers works, making impossible to decrypt the files. To recover message (123), p and q values must be known. First, we get the result of Euler’s totient function (p − 1)(q − 1) (this is the point where p and q values are needed): Listing 1.20: Wolfram Mathematica In[]:= totient = (p - 1)*(q - 1) Out[]= 3696 7
  • 14. Now we calculating decrypting exponent using multiplicative modulo inverse (multiplicative inverse was also de- scribed in here (2) (e−1 (mod totient = (p − q)(q − 1))): Listing 1.21: Wolfram Mathematica In[]:= d = PowerMod[e, -1, totient] Out[]= 2609 Now decrypt the message: Listing 1.22: Wolfram Mathematica In[18]:= Mod[encrypted^d, n] Out[18]= 123 So the d exponent forms another one-way function, restoring the work of what was done during encryption. 1.4.5 So how it works? It works, because e and d exponents are reciprocal to each other by modulo totient = (p − 1)(q − 1): Listing 1.23: Wolfram Mathematica In[]:= Mod[e*d, totient] (* check *) Out[]= 1 This allows... med = m (mod n) (1.1) Or in Mathematica: Listing 1.24: Wolfram Mathematica In[]:= Mod[123^(e*d), n] Out[]= 123 So the encryption process is me (mod n), decryption is (me)d = m (mod n). To prove congruence 1.1, we first should prove the following congruence: ed ≡ 1 (mod ((p − 1)(q − 1))) ... using modular arithmetic rules, it can be rewritten as: ed = 1 + h(p − 1)(q − 1) h is some unknown number which is present here because it’s not known how many times the final result was rounded while exponentiation (this is modulo arithmetic after all). So med = m (mod n) can be rewritten as: m1+h((p−1)(q−1)) ≡ m (mod n) ...and then to: m m(p−1)(q−1) h ≡ m (mod n) 8
  • 15. The last expression can be simplified using Euler’s theorem (stating that a(p−1)(q−1) ≡ 1 (mod n)). The result is: m(1)h ≡ m (mod n) ... or just: m ≡ m (mod n) 1.4.6 Breaking RSA We can try to factor n semiprime (or RSA modulus) in Mathematica: Listing 1.25: Wolfram Mathematica In[]:= FactorInteger[n] Out[]= {{43, 1}, {89, 1}} And we getting correct p and q, but this is possible only for small values. When you use some big ones, factorizing is extremely slow, making RSA unbreakable, if implemented correctly. The bigger p, q and n numbers, the harder to factorize n, so the bigger keys in bits are, the harder it to break. 1.4.7 The difference between my simplified example and a real RSA algorithm In my example, public key is n = pq (product) and secret key are p and q values stored separately. This is not very efficient, to calculate totient and decrypting exponent each time. So in practice, a public key is n and e, and a secret key is at least n and d, and d is stored in secret key precomputed. For example, here is my PGP public key2: dennis@...:∼$ gpg --export-options export-reset-subkey-passwd --export-secret - subkeys 0x3B262349! | pgpdump Old: Secret Key Packet(tag 5)(533 bytes) Ver 4 - new Public key creation time - Tue Jun 30 02:08:38 EEST 2015 Pub alg - RSA Encrypt or Sign(pub 1) RSA n(4096 bits) - ... RSA e(17 bits) - ... ... ... so there are available openly big (4096 bits) n and e (17 bits). And here is my PGP secret key: dennis@...:∼$ gpg --export-options export-reset-subkey-passwd --export-secret - subkeys 0x55B5C64F! | pgpdump gpg: about to export an unprotected subkey You need a passphrase to unlock the secret key for user: "Dennis Yurichev <dennis@yurichev.com>" 4096-bit RSA key, ID 55B5C64F, created 2015-06-29 gpg: gpg-agent is not available in this session Old: Secret Key Packet(tag 5)(533 bytes) 2 https://yurichev.com/pgp.html 9
  • 16. Ver 4 - new Public key creation time - Tue Jun 30 02:08:38 EEST 2015 Pub alg - RSA Encrypt or Sign(pub 1) RSA n(4096 bits) - ... RSA e(17 bits) - ... ... Old: Secret Subkey Packet(tag 7)(1816 bytes) Ver 4 - new Public key creation time - Tue Jun 30 02:08:38 EEST 2015 Pub alg - RSA Encrypt or Sign(pub 1) RSA n(4096 bits) - ... RSA e(17 bits) - ... RSA d(4093 bits) - ... RSA p(2048 bits) - ... RSA q(2048 bits) - ... RSA u(2048 bits) - ... Checksum - 94 53 ... ... it has all variables stored in the file, including d, p and q. 1.4.8 The RSA signature It’s possible to sign a message to publish it, so everyone can check the signature. For example, The Publisher wants to sign the message (let’s say, 456). Then he/she uses d exponent to compute signature: Listing 1.26: Wolfram Mathematica In[]:= signed = Mod[456^d, n] Out[]= 2282 Now he publishes n = pq (3827), e (17 in our example), the message (456) and the signature (2282). Some other Consumers can verify its signature using e exponent and n: Listing 1.27: Wolfram Mathematica In[]:= Mod[2282^e, n] Out[]= 456 ... this is another illustration that e and d exponents are complement each other, by modulo totient = (p − 1)(q − 1). The signature can only be generated with the access to p and q values, but it can be verified using product (n = pq) value. 1.4.9 Hybrid cryptosystem RSA is slow, because exponentiation is slow and exponentiation by modulo is also slow. Perhaps, this is the reason why it was considered impractical by GCHQ when Clifford Cocks first came with this idea in 1970s. So in practice, if The Sender wants to encrypt some big piece of data to The Receiver, a random number is generated, which is used as a key for symmetrical cryptosystem like DES or AES. The piece of data is encrypted by the random key. The key is then encrypted by RSA to The Receiver and destroyed. The Receiver recovers the random key using RSA and decrypts all the data using DES or AES. 10
  • 17. Chapter 2 Modulo arithmetics 2.1 Quick introduction into modular arithmetic Modular arithmetic is an environment where all values are limited by some number (modulo). Many textbooks has clock as example. Let’s imagine old mechanical analog clock. There hour hand points to one of number in bounds of 0..11 (zero is usually shown as 12). What hour will be if to sum up 10 hours (no matter, AM or PM) and 4 hours? 10+4 is 14 or 2 by modulo 12. Naively you can just sum up numbers and subtract modulo base (12) as long as it’s possible. Modern digital watch shows time in 24 hours format, so hour there is a variable in modulo base 24. But minutes and seconds are in modulo 60 (let’s forget about leap seconds for now). Another example is US imperial system of measurement: human height is measured in feets and inches. There are 12 inches in feet, so when you sum up some lengths, you increase feet variable each time you’ve got more than 12 inches. Another example I would recall is password cracking utilities. Often, characters set is defined in such utilities. And when you set all Latin characters plus numbers, you’ve got 26+10=36 characters in total. If you brute-forcing a 6- characters password, you’ve got 6 variables, each is limited by 36. And when you increase last variable, it happens in modular arithmetic rules: if you got 36, set last variable to 0 and increase penultimate one. If it’s also 36, do the same. If the very first variable is 36, then stop. Modular arithmetic may be very helpful when you write multi-threading (or distributed) password cracking utility and you need to slice all passwords space by even parts. This is yet another application of modulo arithmetic, which many of us encountered in childhood. Given a counting rhyme, like: Eeny, meeny, miny, moe, Catch a tiger by the toe. If he hollers , let him go, Eeny, meeny, miny, moe. ( https://en.wikipedia.org/wiki/Eeny,_meeny,_miny,_moe ) ... predict, who will be choosen. If I’m correct, that rhyme has 16 items, and if a group of kids constist of, say, 5 kids, who will be choosen? 16 mod 5 = 1, meaning, the next kid after the one at whom counting had begun. Or 7 kids, 16 mod 7 = 2. Count two kids after the first one. If you can calculate this quickly, you can probably get an advantage by choosing a better place within a circle... 11
  • 18. Now let’s recall old mechanical counters which were widespread in pre-digital era: Figure 2.1: The picture was stolen from http://www.featurepics.com/ — sorry for that! This counter has 6 wheels, so it can count from 0 to 106 − 1 or 999999. When you have 999999 and you increase the counter, it will resetting to 000000— this situation is usually understood by engineers and computer programmers as overflow. And if you have 000000 and you decrease it, the counter will show you 999999. This situation is often called “wrap around”. See also: http://en.wikipedia.org/wiki/Integer_overflow. 2.1.1 Modular arithmetic on CPUs The reason I talk about mechanical counter is that CPU registers acting in the very same way, because this is, perhaps, simplest possible and efficient way to compute using integer numbers. This implies that almost all operations on integer values on your CPU is happens by modulo 232 or 264 depending on your CPU. For example, you can sum up 0x87654321 and 0xDEADBABA, which resulting in 0x16612FDDB. This value is too big for 32-bit register, so only 0x6612FDDB is stored, and leading 1 is dropped. If you will multiply these two num- bers, the actual result it 0x75C5B266EDA5BFFA, which is also too big, so only low 32-bit part is stored into destination register: 0xEDA5BFFA. This is what happens when you multiply numbers in plain C/C++ language, but some readers may argue: when sum is too big for register, CF (carry flag) is set, and it can be used after. And there is x86 MUL instruc- tion which in fact produces 64-bit result in 32-bit environment (in EDX:EAX registers pair). That’s true, but observing just 32-bit registers, this is exactly environment of modulo with base 232. Now that leads to a surprising consequence: almost every result of arithmetic operation stored in general purpose reg- ister of 32-bit CPU is in fact remainder of division operation: result is always divided by 232 and remainder is left in reg- ister. For example, 0x16612FDDB is too large for storage, and it’s divided by 232 (or 0x100000000). The result of division (quotient) is 1 (which is dropped) and remainder is 0x6612FDDB (which is stored as a result). 0x75C5B266EDA5BFFA divided by 232 (0x100000000) produces 0x75C5B266 as a result of division (quotient) and 0xEDA5BFFA as a remainder, the latter is stored. And if your code is 32-bit one in 64-bit environment, CPU registers are bigger, so the whole result can be stored there, but high half is hidden behind the scenes – because no 32-bit code can access it. 12
  • 19. By the way, this is the reason why remainder calculation is often called ”division by modulo”. C/C++ has a percent sign (“%”) for this operation, but some other PLs like Pascal and Haskell has ”mod” operator. Usually, almost all sane computer programmers works with variables as they never wrapping around and value here is always in some limits which are defined preliminarily. However, this implicit division operation or ”wrapping around” can be exploited usefully. 2.1.2 Remainder of division by modulo 2n ... canbeeasilycomputedwithANDoperation. Ifyouneedarandomnumberinrangeof0..16, hereyougo: rand()&0xF. That helps sometimes.</p> <p>For example, you need a some kind of wrapping counter variable which always should be in 0..16 range. What you do? Programmers often write this: int counter=0; ... counter++; if (counter==16) counter=0; But here is a version without conditional branching: int counter=0; ... counter++; counter=counter&0xF; As an example, this I found in the git source code: char *sha1_to_hex(const unsigned char *sha1) { static int bufno; static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; static const char hex[] = "0123456789abcdef"; char *buffer = hexbuffer[3 & ++bufno], *buf = buffer; int i; for (i = 0; i < GIT_SHA1_RAWSZ; i++) { unsigned int val = *sha1++; *buf++ = hex[val >> 4]; *buf++ = hex[val & 0xf]; } *buf = '0'; return buffer; } ( https://github.com/git/git/blob/aa1c6fdf478c023180e5ca5f1658b00a72592dc6/hex.c ) This function returns a pointer to the string containing hexadecimal representation of SHA1 digest (like ”4e1243bd22c66e76c2ba9eddc1f91394e57f9f83”). But this is plain C and you can calculate SHA1 for some block, get pointer to the string, then calculate SHA1 for another block, get pointer to the string, and both pointers are still 13
  • 20. points to the same string buffer containing the result of the second calculation. As a solution, it’s possible to allo- cate/deallocate string buffer each time, but more hackish way is to have several buffers (4 are here) and fill the next each time. The bufno variable here is a buffer counter in 0..3 range. Its value increments each time, and its value is also always kept in limits by AND operation (3 & ++bufno). The author of this piece of code (seemingly Linus Torvalds himself) went even further and forgot (?) to initialize bufno counter variable, which will have randomgarbageat the functionstart. Indeed: no matter, which buffer we are starting each time! This can be mistake which isn’t affect correctness of the code, or maybe this is left so intentionally – I don’t know. 2.1.3 Getting random numbers When you write some kind of videogame, you need random numbers, and the standard C/C++ rand() function gives you them in 0..0x7FFF range (MSVC) or in 0..0x7FFFFFFF range (GCC). And when you need a random number in 0..10 range, the common way to do it is: X_coord_of_something_spawned_somewhere=rand() % 10; Y_coord_of_something_spawned_somewhere=rand() % 10; No matter what compiler do you use, you can think about it as 10 is subtraced from rand() result, as long as there is still a number bigger than 10. Hence, result is remainder of division of rand() result by 10. One nasty consequence is that neither 0x8000 nor 0x80000000 cannot be divided by 10 evenly, so you’ll get some numbers slightly more often than others. I tried to calculate in Mathematica. Here is what you get if you write <i>rand() In[]:= Counts[Map[Mod[#, 3] &, Range[0, 16^^8000 - 1]]] Out[]= <|0 -> 10923, 1 -> 10923, 2 -> 10922|> So a number 2 appers slightly seldom than others. Here is a result for rand() % 10: In[]:= Counts[Map[Mod[#, 10] &, Range[0, 16^^8000 - 1]]] Out[]= <|0 -> 3277, 1 -> 3277, 2 -> 3277, 3 -> 3277, 4 -> 3277, 5 -> 3277, 6 -> 3277, 7 -> 3277, 8 -> 3276, 9 -> 3276|> Numbers 8 and 9 appears slightly seldom (3276 against 3277). Here is a result for rand() % 100: In[]:= Counts[Map[Mod[#, 100] &, Range[0, 16^^8000 - 1]]] Out[]= <|0 -> 328, 1 -> 328, 2 -> 328, 3 -> 328, 4 -> 328, 5 -> 328, 6 -> 328, 7 -> 328, 8 -> 328, 9 -> 328, 10 -> 328, 11 -> 328, 12 -> 328, 13 -> 328, 14 -> 328, 15 -> 328, 16 -> 328, 17 -> 328, 18 -> 328, 19 -> 328, 20 -> 328, 21 -> 328, 22 -> 328, 23 -> 328, 24 -> 328, 25 -> 328, 26 -> 328, 27 -> 328, 28 -> 328, 29 -> 328, 30 -> 328, 31 -> 328, 32 -> 328, 33 -> 328, 34 -> 328, 35 -> 328, 36 -> 328, 37 -> 328, 38 -> 328, 39 -> 328, 40 -> 328, 41 -> 328, 42 -> 328, 43 -> 328, 44 -> 328, 45 -> 328, 46 -> 328, 47 -> 328, 48 -> 328, 49 -> 328, 50 -> 328, 51 -> 328, 52 -> 328, 53 -> 328, 54 -> 328, 55 -> 328, 56 -> 328, 57 -> 328, 58 -> 328, 59 -> 328, 60 -> 328, 61 -> 328, 62 -> 328, 63 -> 328, 64 -> 328, 65 -> 328, 14
  • 21. 66 -> 328, 67 -> 328, 68 -> 327, 69 -> 327, 70 -> 327, 71 -> 327, 72 -> 327, 73 -> 327, 74 -> 327, 75 -> 327, 76 -> 327, 77 -> 327, 78 -> 327, 79 -> 327, 80 -> 327, 81 -> 327, 82 -> 327, 83 -> 327, 84 -> 327, 85 -> 327, 86 -> 327, 87 -> 327, 88 -> 327, 89 -> 327, 90 -> 327, 91 -> 327, 92 -> 327, 93 -> 327, 94 -> 327, 95 -> 327, 96 -> 327, 97 -> 327, 98 -> 327, 99 -> 327|> …now larger part of numbers happens slightly seldom, these are 68...99. This is sometimes called modulo bias. It’s perhaps acceptable for videogames, but may be critical for scientific simu- lations, including Monte Carlo method. Constructing a PRNG1 with uniform distribution may be tricky, there are couple of methods: http://www.reddit.com/r/algorithms/comments/39tire/using_a_01_generator_generate_a_random_number/, http://www.prismmodelchecker.org/casestudies/dice.php. 2.2 Modulo inverse, part I Example: this piece of code divides by 17: #include <stdio.h> #include <stdint.h> uint32_t div17 (uint32_t a) { return a*0xf0f0f0f1; }; int main() { printf ("%dn", div17(1700)); // result=100 printf ("%dn", div17(34)); // result=2 printf ("%dn", div17(2091)); // result=123 }; How it works? Let’s imagine, we work on 4-bit CPU, it has 4-bit registers, each can hold a value in 0..15 range. Now we want to divide by 3 using multiplication. Let’s find modulo inverse of 3 using Wolfram Mathematica: In[]:= PowerMod[3, -1, 16] Out[]= 11 This is in fact solution of a 3m = 16k + 1 equation (16 = 24): In[]:= FindInstance[3 m == 16 k + 1, {m, k}, Integers] Out[]= {{m -> 11, k -> 2}} 1 Pseudorandom number generator 15
  • 22. The ”magic number” for division by 3 is 11. Multiply by 11 instead of dividing by 3 and you’ll get a result (quotient). This works, let’s divide 6 by 3. We can now do this by multiplying 6 by 11, this is 66=0x42, but on 4-bit register, only 0x2 will be left in register (0x42 ≡ 2 mod 24). Yes, 2 is correct answer, 6/3=2. Let’s divide 3, 6 and 9 by 3, by multiplying by 11 (m). |123456789 abcdef0 | 1 2 . . . f0 |123456789 abcdef0 | 1 2 . . . f0 |123456789 abcdef0 | 1 2 . . . f0 |123456789 abcdef0 | m=11 |*********** | . . . | | . . . | | . . . | | 3/3 3m=33 |****************|** . . . **|* | . . . | | . . . | | 6/3 6m=66 |****************|** . . . **|****************|** . . . **|** | . . . | | 9/3 9m=99 |****************|** . . . **|****************|** . . . **|****************|** . . . **|*** | A “protruding” asterisk(s) (“*”) in the last non-empty chunk is what will be left in 4-bit register. This is 1 in case of 33, 2 if 66, 3 if 99. In fact, this ”protrusion” is defined by 1 in the equation we’ve solved. Let’s replace 1 with 2: In[]:= FindInstance[3 m == 16 k + 2, {m, k}, Integers] Out[]= {{m -> 6, k -> 1}} Now the new ”magic number” is 6. Let’s divide 3 by 3. 3*6=18=0x12, 2 will be left in 4-bit register. This is incorrect, we have 2 instead of 1. 2 asterisks are ”protruding”. Let’s divide 6 by 3. 6*6=36=0x24, 4 will be left in the register. This is also incorrect, we now have 4 ”protruding” asterisks instead of correct 2. Replace 1 in the equation by 0, and nothing will ”protrude”. 2.2.1 No remainder? Now the problem: this only works for dividends in 3x form, i.e., which can be divided by 3 with no remainder. Try to divide 4 by 3, 4*11=44=0x2c, 12 will be left in register, this is incorrect. The correct quotient is 1. We can also notice that the 4-bit register is ”overflown” during multiplication twice as much as in ”incorrect” result in low 4 bits. Here is what we can do: use only high 4 bits and drop low 4 bits. 4*11=0x2c and 2 is high 4 bits. Divide 2 by 2, this is 1. Let’s ”divide” 8 by 3. 8*11=88=0x58. 5/2=2, this is correct answer again. Now this is the formula we can use on our 4-bit CPU to divide numbers by 3: ”x*3 » 4 / 2” or ”x*3 » 5”. This is the same as almost all modern compilers do instead of integer division, but they do this for 32-bit and 64-bit registers. 2.3 Modulo inverse, part II A very simple function: int f(int a) { return a/9; }; If compiled by non-optimizing GCC 4.4.1... Listing 2.1: Non-optimizing GCC 4.4.1 public f f proc near 16
  • 23. arg_0 = dword ptr 8 push ebp mov ebp, esp mov ecx, [ebp+arg_0] mov edx, 954437177 ; 38E38E39h mov eax, ecx imul edx sar edx, 1 mov eax, ecx sar eax, 1Fh mov ecx, edx sub ecx, eax mov eax, ecx pop ebp retn f endp And it can be rewritten back to pure C like that: #include <stdint.h> uint32_t divide_by_9 (uint32_t a) { return ((uint64_t)a * (uint64_t)954437177) >> 33; // 954437177 = 0x38e38e39 }; This function still works, without division operation. How? From school-level mathematics, we can remember that division by 9 can be replaced by multiplication by 1 9. In fact, sometimes compilers do so for floating-point arithmetics, for example, FDIV instruction in x86 code can be replaced by FMUL. At least MSVC 6.0 will replace division by 9 by multiplication by 0.111111... and sometimes it’s hard to be sure, what operation was in the original source code. But when we operate over integer values and integer CPU registers, we can’t use fractions. However, we can rework fraction like that: result = x 9 = x · 1 9 = x · 1·MagicNumber 9·MagicNumber Given the fact that division by 2n is very fast (using shifts), we now should find that MagicNumber, for which the following equation will be true: 2n = 9 · MagicNumber. Division by 232 is somewhat hidden: lower 32-bit of product in EAX is not used (dropped), only higher 32-bit of product (in EDX) is used and then shifted by additional 1 bit. In other words, the assembly code we have just seen multiplicates by 954437177 232+1 , or divides by 232+1 954437177. To find a divisor we just have to divide numerator by denominator. Using Wolfram Alpha, we can get 8.99999999.... as result (which is close to 9). Many people miss “hidden” division by 232 or 264, when lower 32-bit part (or 64-bit part) of product is not used. This is why division by multiplication is difficult to understand at the beginning. 17
  • 24. 2.4 Reversible linear congruential generator LCG2 PRNG is very simple: just multiply seed by some value, add another one and here is a new random number. Here is how it is implemented in MSVC (the source code is not original one and is reconstructed by me):</p> uint32_t state; uint32_t rand() { state=state*214013+2531011; return (state >>16)&0x7FFF; }; The last bit shift is attempt to compensate LCG weakness and we may ignore it so far. Will it be possible to make an inverse function to rand(), which can reverse state back? First, let’s try to think, what would make this possible? Well, if state internal variable would be some kind of BigInt or BigNum container which can store infinitely big numbers, then, although state is increasing rapidly, it would be possible to reverse the process. But state isn’t BigInt/BigNum, it’s 32-bit variable, and summing operation is easily reversible on it (just subtract 2531011 at each step). As we may know now, multiplication is also reversible: just multiply the state by modular multiplicative inverse of 214013! #include <stdio.h> #include <stdint.h> uint32_t state; void next_state() { state=state*214013+2531011; }; void prev_state() { state=state -2531011; // reverse summing operation state=state*3115528533; // reverse multiply operation. 3115528533 is modular inverse of 214013 in 232. }; int main()[style=customc] { state=12345; printf ("state=%dn", state); next_state(); printf ("state=%dn", state); next_state(); printf ("state=%dn", state); next_state(); printf ("state=%dn", state); prev_state(); 2 Linear congruential generator 18
  • 25. printf ("state=%dn", state); prev_state(); printf ("state=%dn", state); prev_state(); printf ("state=%dn", state); }; Wow, that works! state=12345 state=-1650445800 state=1255958651 state=-456978094 state=1255958651 state=-1650445800 state=12345 It’s hard to find a real-world application of reversible LCG, but this could be the one: a media player with forward/back- ward buttons. Once shuffle is clicked, random number is generated (number of item to be played). User clicks forward: get a new random number by calculating the next state. User clicks backward: get it by calculating the previous state. Thus, ausercouldnavigatethroughsome”virtual”(butconsistent)playlist, whichisevennotpresentinmediaplayer’s memory! 2.5 Getting magic number using extended Euclidean algorithm Extended Euclidean algorithm can find x/y for given a/b in the diophantine equation: ax + by = gcd(a, b). x/y are also known as “Bézout coefficients”. However, since a/b are coprime to each other, gcd(a, b) = 1, and the algorithm will find x/y for the ax + by = 1 equation. Let’s plug a = 3 and b = 216 (like if we finding magic constant for 16-bit CPU): #include <assert.h> #include <stdio.h> #include <stdint.h> // copypasted and reworked from https://en.wikipedia.org/wiki/Extended_Euclidean_algorithm void EGCD (int a, int b) { int s=0; int old_s=1; int t=1; int old_t=0; int r=b; int old_r=a; int tmp; while (r!=0) { int quotient=old_r/r; tmp=r; r=old_r - quotient*r; old_r=tmp; tmp=s; s=old_s - quotient*s; old_s=tmp; tmp=t; t=old_t - quotient*t; old_t=tmp; 19
  • 26. }; printf ("GCD: %dn", old_r); if (old_r!=1) { printf("%d and %d are not coprimes!n", a, b); return; }; printf ("Bézout coefficients: %d %dn", old_s, old_t); printf ("quotients by the GCD (s,t): %d %dn", s, t); // see also: https://math.stackexchange.com/q/1310415 if (old_s >=0 && old_t <=0) printf ("corrected coefficients: %d(0x%x) %d(0x%x)n", old_s, old_s, -old_t, -old_t); else if (old_s <=0 && old_t >=0) printf ("corrected coefficients: %d(0x%x) %d(0x%x)n", old_s+b, old_s+b, -old_t+a, -old_t+a); else assert(0); }; void main() { EGCD(3, 0x10000); }; GCD: 1 Bézout coefficients: -21845 1 quotients by the GCD (s,t): 65536 -3 corrected coefficients: 43691(0xaaab) 2(0x2) That means, x=-21845, y=1. Is it correct? −21845 ∗ 3 + 65536 ∗ 1 = 1 65536-21845=0xaaab, and this is the magic number for division by 3 on 16-bit CPU. GCD(any_odd_number, 2n) = 1 for any values. Hence, we can find a magic number for any even number. But, this is not true for even numbers. We can’t find magic coefficient for even number like 10. But you can find it for 5, and then add additional division by 2 using shift right. If you try to compile x 10 code, GCC can do it like: push %ebp mov %esp,%ebp mov 0x8(%ebp),%eax mov $0xcccccccd ,%edx mul %edx mov %edx,%eax shr $0x3,%eax pop %ebp ret 20
  • 27. This is in fact the magic number for division by 5. And there is 3 instead of 2 in the SHR instruction, so the result is divided by 2. Extended Euclidean algorithm is probably an efficient way of finding magic number, but needless to say, this equation can be solved using other ways. In “SAT/SMT by Example”3you can find a method of finding ”magic number” using SMT-solver. 3 https://sat-smt.codes/ 21
  • 28. 22
  • 29. Chapter 3 Probability 3.1 Text strings right in the middle of compressed data You can download Linux kernels and find English words right in the middle of compressed data: % wget https://www.kernel.org/pub/linux/kernel/v4.x/linux -4.10.2.tar.gz % xxd -g 1 -seek 0x515c550 -l 0x30 linux -4.10.2.tar.gz 0515c550: c5 59 43 cf 41 27 85 54 35 4a 57 90 73 89 b7 6a .YC.A'.T5JW.s..j 0515c560: 15 af 03 db 20 df 6a 51 f9 56 49 52 55 53 3d da .... .jQ.VIRUS=. 0515c570: 0e b9 29 24 cc 6a 38 e2 78 66 09 33 72 aa 88 df ..)$.j8.xf.3r... % wget https://cdn.kernel.org/pub/linux/kernel/v2.3/linux -2.3.3.tar.bz2 % xxd -g 1 -seek 0xa93086 -l 0x30 linux -2.3.3.tar.bz2 00a93086: 4d 45 54 41 4c cd 44 45 2d 2c 41 41 54 94 8b a1 METAL.DE-,AAT... 00a93096: 5d 2b d8 d0 bd d8 06 91 74 ab 41 a0 0a 8a 94 68 ]+......t.A....h 00a930a6: 66 56 86 81 68 0d 0e 25 6b b6 80 a4 28 1a 00 a4 fV..h..%k...(... One of Linux kernel patches in compressed form has the “Linux” word itself: % wget https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/patch -4.6-rc4.gz % xxd -g 1 -seek 0x4d03f -l 0x30 patch -4.6-rc4.gz 0004d03f: c7 40 24 bd ae ef ee 03 2c 95 dc 65 eb 31 d3 f1 .@$.....,..e.1.. 0004d04f: 4c 69 6e 75 78 f2 f3 70 3c 3a bd 3e bd f8 59 7e Linux..p<:.>..Y∼ 0004d05f: cd 76 55 74 2b cb d5 af 7a 35 56 d7 5e 07 5a 67 .vUt+...z5V.^.Zg Other English words I’ve found in other compressed Linux kernel trees: linux -4.6.2.tar.gz: [maybe] at 0x68e78ec linux -4.10.14.tar.xz: [OCEAN] at 0x6bf0a8 linux -4.7.8.tar.gz: [FUNNY] at 0x29e6e20 linux -4.6.4.tar.gz: [DRINK] at 0x68dc314 23
  • 30. linux -2.6.11.8.tar.bz2: [LUCKY] at 0x1ab5be7 linux -3.0.68.tar.gz: [BOOST] at 0x11238c7 linux -3.0.16.tar.bz2: [APPLE] at 0x34c091 linux -3.0.26.tar.xz: [magic] at 0x296f7d9 linux -3.11.8.tar.bz2: [TRUTH] at 0xf635ba linux -3.10.11.tar.bz2: [logic] at 0x4a7f794 There is a nice illustration of apophenia and pareidolia (human’s mind ability to see faces in clouds, etc) in Lurkmore, Russian counterpart of Encyclopedia Dramatica. As they wrote in the article about electronic voice phenomenon1, you can open any long enough compressed file in hex editor and find well-known 3-letter Russian obscene word, and you’ll find it a lot: but that means nothing, just a mere coincidence. AndIwasinterestedincalculation, howbigcompressedfilemustbetocontainallpossible3-letter, 4-letter, etc, words? In my naive calculations, I’ve got this: probability of the first specific byte in the middle of compressed data stream with maximal entropy is 1 256, probability of the 2nd is also 1 256, and probability of specific byte pair is 1 256·256 = 1 2562 . Probabilty of specific triple is 1 2563 . If the file has maximal entropy (which is almost unachievable, but …) and we live in an ideal world, you’ve got to have a file of size just 2563 = 16777216, which is 16-17MB. You can check: get any compressed file, and use rafind2 to search for any 3-letter word (not just that Russian obscene one). It took ≈ 8-9 GB of my downloaded movies/TV series files to find the word “beer” in them (case sensitive). Perhaps, these movies wasn’t compressed good enough? This is also true for a well-known 4-letter English obscene word. My approach is naive, so I googled for mathematically grounded one, and have find this question: “Time until a con- secutive sequence of ones in a random bit sequence” 2. The answer is: (p−n−1)/(1−p), where p is probability of each event and n is number of consecutive events. Plug 1 256 and 3 and you’ll get almost the same as my naive calculations. So any 3-letter word can be found in the compressed file (with ideal entropy) of length 2563 =≈ 17MB, any 4-letter word — 2564 = 4.7GB (size of DVD). Any 5-letter word — 2565 =≈ 1TB. For the piece of text you are reading now, I mirrored the whole kernel.org website (hopefully, sysadmins can forgive me), and it has ≈ 430GB of compressed Linux Kernel source trees. It has enough compressed data to contain these words, however, I cheated a bit: I searched for both lowercase and uppercase strings, thus compressed data set I need is almost halved. This is quite interesting thing to think about: 1TB of compressed data with maximal entropy has all possible 5-byte chains, but the data is encoded not in chains itself, but in the order of chains (no matter of compression algorithm, etc). Now the information for gamblers: one should throw a dice ≈ 42 times to get a pair of six, but no one will tell you, when exactly this will happen. I don’t remember, how many times coin was tossed in the “Rosencrantz & Guildenstern Are Dead” movie, but one should toss it ≈ 2048 times and at some point, you’ll get 10 heads in a row, and at some other point, 10 tails in a row. Again, no one will tell you, when exactly this will happen. Compressed data can also be treated as a stream of random data, so we can use the same mathematics to determine probabilities, etc. If you can live with strings of mixed case, like “bEeR”, probabilities and compressed data sets are much lower: 1283 = 2MB for all 3-letter words of mixed case, 1284 = 268MB for all 4-letter words, 1285 = 34GB for all 5-letter words, etc. Moral of the story: whenever you search for some patterns, you can find it in the middle of compressed blob, but that means nothing else then coincidence. In philosophical sense, this is a case of selection/confirmation bias: you find 1 http://archive.is/gYnFL 2 http://math.stackexchange.com/questions/27989/time-until-a-consecutive-sequence-of-ones-in-a-random-bit-sequence/ 27991#27991 24
  • 31. what you search for in “The Library of Babel” 3. 3.2 Autocomplete using Markov chains TL;DR: collect statistics, for a given natural language, what words come most often after a word/pair of words/triplet of words. What are most chess moves played after 1.e4 e5 2.Nf3 Nf6? A big database of chess games can be queried, showing statistics: ( from https://chess-db.com/ ) Statistics shown is just number of games, where a corresponding 3rd move was played. The same database can be made for natural language words. 3.2.1 Dissociated press Thisisawell-knownjoke: https://en.wikipedia.org/wiki/Dissociated_press,https://en.wikipedia.org/ wiki/SCIgen, https://en.wikipedia.org/wiki/Mark_V._Shaney. I wrote a Python script, took Sherlock Holmes stories from Gutenberg library... #!/usr/bin/env python3 # -*- coding: utf-8 -*- # python 3! due to random.choices() import operator , random from collections import defaultdict 3 A short story by Jorge Luis Borges 25
  • 32. with open ("all.txt", "r") as myfile: data=myfile.read() sentences=data.lower().replace('r',' ').replace('n',' ').replace('?','.'). replace('!','.').replace('“','.').replace('”','.').replace(""",".").replace( '‘',' ').replace('-',' ').replace('’',' ').replace(''',' ').split(".") def remove_empty_words(l): return list(filter(lambda a: a != '', l)) # key=list of words, as a string, delimited by space # (I would use list of strings here as key, but list in not hashable) # val=dict, k: next word; v: occurrences first={} second={} def update_occ(d, seq, w): if seq not in d: d[seq]=defaultdict(int) d[seq][w]=d[seq][w]+1 for s in sentences: words=s.replace(',',' ').split(" ") words=remove_empty_words(words) if len(words)==0: continue for i in range(len(words)): if i>=1: update_occ(first, words[i-1], words[i]) if i>=2: update_occ(second , words[i-2]+" "+words[i-1], words[i]) """ print ("first table:") for k in first: print (k) # https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary -by- value s=sorted(first[k].items(), key=operator.itemgetter(1), reverse=True) print (s[:20]) print ("") """ """ print ("second table:") for k in second: print (k) # https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary -by- value s=sorted(second[k].items(), key=operator.itemgetter(1), reverse=True) 26
  • 33. print (s[:20]) print ("") """ text=["it", "is"] # https://docs.python.org/3/library/random.html#random.choice def gen_random_from_tbl(t): return random.choices(list(t.keys()), weights=list(t.values()))[0] text_len=len(text) # generate at most 100 words: for i in range(200): last_idx=text_len -1 tmp=text[last_idx -1]+" "+text[last_idx] if tmp in second: new_word=gen_random_from_tbl(second[tmp]) else: # fall-back to 1st order tmp2=text[last_idx] if tmp2 not in first: # dead-end break new_word=gen_random_from_tbl(first[tmp2]) text.append(new_word) text_len=text_len+1 print (" ".join(text)) And here are some 1st-order Markov chains. First part is a first word. Second is a list of words + probabilities of appear- ance of each one, in the Sherlock Holmes stories. However, probabilities are in form of words’ numbers. In other word, how often each word appears after a word? return [('to', 28), ('or', 12), ('of', 8), ('the', 8), ('for', 5), ('with', 4), ('by', 4), ('journey ', 4), ('and', 3), ('when', 2), ('ticket', 2), ('from', 2), ('at ', 2), ('you', 2), ('i', 2), ('since', 1), ('on', 1), ('caused', 1), ('but', 1), ('it', 1)] of [('the', 3206), ('a', 680), ('his', 584), ('this', 338), ('it', 304), ('my', 295), ('course', 225), ('them', 167), ('that', 138), ('your', 137), ('our', 135), ('us', 122), ('her', 116), ('him', 111), ('an', 105), ('any', 102), (' these', 92), ('which', 89), ('all', 82), ('those', 75)] by 27
  • 34. [('the', 532), ('a', 164), ('his', 67), ('my', 39), ('no', 34), ('this', 31), (' an', 30), ('which', 29), ('your', 21), ('that', 19), ('one', 17), ('all', 17) , ('jove', 16), ('some', 16), ('sir', 13), ('its', 13), ('him', 13), ('their ', 13), ('it', 11), ('her', 10)] this [('is', 177), ('agreement ', 96), ('morning ', 81), ('was', 61), ('work', 60), (' man', 56), ('case', 43), ('time', 39), ('matter', 37), ('ebook', 36), ('way', 36), ('fellow', 28), ('room', 27), ('letter', 24), ('one', 24), ('i', 23), ('young', 21), ('very', 20), ('project ', 18), ('electronic ', 18)] no [('doubt', 179), ('one', 134), ('sir', 61), ('more', 56), ('i', 55), ('no', 42), ('other', 36), ('means', 36), ('sign', 34), ('harm', 23), ('reason', 22), (' difficulty ', 22), ('use', 21), ('idea', 20), ('longer', 20), ('signs', 18), ('good', 17), ('great', 16), ('trace', 15), ('man', 15)] Now some snippets from 2nd-order Markov chains. the return [('of', 8), ('track', 1), ('to', 1)] return of [('sherlock ', 6), ('lady', 1), ('the', 1)] for the [('use', 22), ('first', 12), ('purpose ', 9), ('sake', 8), ('rest', 7), ('best', 7), ('crime', 6), ('evening ', 6), ('ebooks', 6), ('limited ', 6), ('moment', 6), ('day', 5), ('future', 5), ('last', 5), ('world', 5), ('time', 5), ('loss ', 4), ('second', 4), ('night', 4), ('remainder ', 4)] use of [('the', 15), ('anyone', 12), ('it', 6), ('project ', 6), ('and', 6), ('having ', 2), ('this', 1), ('my', 1), ('a', 1), ('horses', 1), ('some', 1), ('arguing ', 1), ('troubling ', 1), ('artificial ', 1), ('invalids ', 1), ('cocaine ', 1), (' disguises ', 1), ('an', 1), ('our', 1)] you may [('have', 17), ('copy', 13), ('be', 13), ('do', 9), ('choose', 7), ('use', 6), ('obtain', 6), ('convert ', 6), ('charge', 6), ('demand', 6), ('remember ', 5), ('find', 5), ('take', 4), ('think', 3), ('possibly ', 3), ('well', 3), ('know ', 3), ('not', 3), ('say', 3), ('imagine ', 3)] the three [('of', 8), ('men', 4), ('students ', 2), ('glasses ', 2), ('which', 1), ('strips ', 1), ('is', 1), ('he', 1), ('gentlemen ', 1), ('enterprising ', 1), ('massive ', 1), ('quarter ', 1), ('randalls ', 1), ('fugitives ', 1), ('animals ', 1), (' shrieked ', 1), ('other', 1), ('murderers ', 1), ('fir', 1), ('at', 1)] it was 28
  • 35. [('a', 179), ('the', 78), ('not', 68), ('only', 40), ('in', 30), ('evident ', 28) , ('that', 28), ('all', 25), ('to', 21), ('an', 18), ('my', 17), ('at', 17), ('impossible ', 17), ('indeed', 15), ('no', 15), ('quite', 15), ('he', 14), (' of', 14), ('one', 12), ('on', 12)] Now two words is a key in dictionary. And we see here an answer for the question ”how often each words appears after a sequence of two words?” Now let’s generate some rubbish: it is just to the face of granite still cutting the lower part of the shame which i would draw up so as i can t have any visitors if he is passionately fond of a dull wrack was drifting slowly in our skin before long vanished in front of him of this arch rebel lines the path which i had never set foot in the same with the heads of these have been supplemented or more waiting to tell me all that my assistant hardly knowing whether i had 4 pounds a month however is foreign to the other hand were most admirable but because i know why he broke into a curve by the crew had thought it might have been on a different type they were about to set like a plucked chicken s making the case which may help to play it at the door and called for the convenience of a boat sir maybe i could not be made to draw some just inference that a woman exactly corresponding to the question was how a miners camp had been dashed savagely against the rock in front of the will was drawn across the golden rays and it it is the letter was composed of a tall stout official had come the other way that she had been made in this i expect that we had news that the office of the mud settling or the most minute exactness and astuteness represented as i come to see the advantages which london then and would i could not speak before the public domain ebooks in compliance with any particular paper edition as to those letters come so horribly to his feet upon the wet clayey soil but since your wife and of such damage or cannot be long before the rounds come again whenever she might be useful to him so now my fine fellow will trouble us again and again and making a little wizened man darted out of the baskervilles *** produced by roger squires updated editions will be the matter and let out forty three diamonds of the attack we carried him into the back and white feather were but a terrible fellow he is not for your share in an instant holmes clapped his hands and play with me anyhow i d ha known you under that name in a considerable treasure was hid for no other it is the unofficial force the shutter open but so far as to what i say to me like that which is rather too far from at ease for i knew that we were driving; but soon it came just as he opened it myself i was not a usual signal between you and you thought it might be removed to a magistrate than upon the luminous screen of the one word would he have wanted there are business relations between him and we listened to his eyes to the spot as soon as their master s affairs of life since she was but one day we hoped would screen me from under his arm chair of shining red leather chair his flask in his chair and gave me his name is sherlock holmes took each face of a barmaid in bristol and marry her but learned from dr 29
  • 36. it is selden the criminal or lunatic who had left a match might mar my career had reached the tower of the crime was committed before twelve foolish tradesmen in a bad lot though the authorities are excellent at amassing facts though what its object might be a better nerve for the creature flapped and struggled and writhed his hands in her bosom lady hilda was down on the table and the curious will so suddenly upon the floor were now nearly five thousand pounds will be impossible but i struck him down and roared into the shadow of the thing seemed simplicity itself said i you seem most fortunate in having every characteristic of the place is deserted up to keep some clandestine appointment and found as i have no desire to settle this little matter of fact of absolute ignorance and he with a similar pearl without any light upon what he had found ourselves at the time and my heart sank for barrymore at the close of november and holmes fears came to think over the afghan campaign yet shown by this time at which he now and i have seen his death this morning he walked straight into By first look, these pieces of text are visually OK, but it is senseless. Some people (including me) find it amusing. Spammers also use this technique to make email message visually similar to a meaningful text, albeit it is not mean- ingful at all, rather absurdic and funny. 3.2.2 Autocomplete It’s surprising how easy this can be turned into something rather practically useful. #!/usr/bin/env python3 # -*- coding: utf-8 -*- import operator from collections import defaultdict with open ("all.txt", "r") as myfile: data=myfile.read() sentences=data.lower().replace('r',' ').replace('n',' ').replace('?','.'). replace('!','.').replace('“','.').replace('”','.').replace(""",".").replace( '‘',' ').replace('-',' ').replace('’',' ').replace(''',' ').split(".") def remove_empty_words(l): return list(filter(lambda a: a != '', l)) # key=list of words, as a string, delimited by space # (I would use list of strings here as key, but list in not hashable) # val=dict, k: next word; v: occurrences first={} second={} third={} def update_occ(d, seq, w): if seq not in d: d[seq]=defaultdict(int) 30
  • 37. d[seq][w]=d[seq][w]+1 for s in sentences: words=s.replace(',',' ').split(" ") words=remove_empty_words(words) if len(words)==0: continue for i in range(len(words)): # only two words available: if i>=1: update_occ(first, words[i-1], words[i]) # three words available: if i>=2: update_occ(second , words[i-2]+" "+words[i-1], words[i]) # four words available: if i>=3: update_occ(third, words[i-3]+" "+words[i-2]+" "+words[i-1], words[i ]) """ print ("third table:") for k in third: print (k) # https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary -by- value s=sorted(third[k].items(), key=operator.itemgetter(1), reverse=True) print (s[:20]) print ("") """ test="i can tell" #test="who did this" #test="she was a" #test="he was a" #test="i did not" #test="all that she" #test="have not been" #test="who wanted to" #test="he wanted to" #test="wanted to do" #test="it is just" #test="you will find" #test="you shall" #test="proved to be" test_words=test.split(" ") test_len=len(test_words) last_idx=test_len -1 31
  • 38. def print_stat(t): total=float(sum(t.values())) # https://stackoverflow.com/questions/613183/how-do-i-sort-a-dictionary-by-value s=sorted(t.items(), key=operator.itemgetter(1), reverse=True) # take 5 from each sorted table for pair in s[:5]: print ("%s %d%%" % (pair[0], (float(pair[1])/total)*100)) if test_len >=3: tmp=test_words[last_idx -2]+" "+test_words[last_idx -1]+" "+test_words[ last_idx] if tmp in third: print ("* third order. for sequence:",tmp) print_stat(third[tmp]) if test_len >=2: tmp=test_words[last_idx -1]+" "+test_words[last_idx] if tmp in second: print ("* second order. for sequence:", tmp) print_stat(second[tmp]) if test_len >=1: tmp=test_words[last_idx] if tmp in first: print ("* first order. for word:", tmp) print_stat(first[tmp]) print ("") First, let’s also make 3rd-order Markov chains tables. There are some snippets from it: the return of [('sherlock ', 6), ('lady', 1), ('the', 1)] the use of [('the', 13), ('anyone', 12), ('project ', 6), ('having', 2), ('a', 1), ('horses ', 1), ('some', 1), ('troubling ', 1), ('artificial ', 1), ('invalids ', 1), (' disguises ', 1), ('it', 1)] of the second [('stain', 5), ('floor', 3), ('page', 1), ('there', 1), ('person', 1), ('party', 1), ('day', 1)] it was in [('the', 9), ('vain', 5), ('a', 4), ('his', 2), ('83', 1), ('one', 1), ('motion ', 1), ('truth', 1), ('my', 1), ('this', 1), ('march', 1), ('january ', 1), (' june', 1), ('me', 1)] was in the 32
  • 39. [('room', 3), ('habit', 3), ('house', 2), ('last', 2), ('loft', 2), ('spring ', 1), ('train', 1), ('bar', 1), ('power', 1), ('immediate ', 1), ('year', 1), (' midst', 1), ('forenoon ', 1), ('centre', 1), ('papers', 1), ('best', 1), (' darkest ', 1), ('prime', 1), ('hospital ', 1), ('nursery ', 1)] murder of the [('honourable ', 1), ('discoverer ', 1)] particulars of the [('crime', 1), ('inquest ', 1), ('habits ', 1), ('voyage', 1)] the death of [('the', 8), ('sir', 8), ('captain ', 3), ('their', 2), ('this', 2), ('his', 2), ('her', 2), ('dr', 2), ('sherlock ', 1), ('mrs', 1), ('a', 1), ('van', 1), (' that', 1), ('drebber ', 1), ('mr', 1), ('stangerson ', 1), ('two', 1), ('selden ', 1), ('one', 1), ('wallenstein ', 1)] one of the [('most', 23), ('best', 3), ('main', 3), ('oldest', 2), ('greatest ', 2), (' numerous ', 2), ('corners ', 2), ('largest ', 2), ('papers', 2), ('richest ', 2), ('servants ', 2), ('moor', 2), ('finest', 2), ('upper', 2), ('very', 2), (' broad', 2), ('side', 2), ('highest ', 2), ('australian ', 1), ('great', 1)] so far as [('i', 17), ('to', 8), ('that', 2), ('the', 2), ('it', 2), ('was', 1), ('we', 1) , ('they', 1), ('he', 1), ('you', 1), ('a', 1), ('your', 1), ('this', 1), (' his', 1), ('she', 1)] You see, they looks as more precise, but tables are just smaller. You can’t use them to generate rubbish. 1st-order tables big, but ”less precise”. And here I test some 3-words queries, like as if they inputted by user: "i can tell" * third order. for sequence: i can tell you 66% my 16% them 16% * second order. for sequence: can tell you 23% us 17% me 11% your 11% this 5% * first order. for word: tell you 35% me 25% us 6% him 5% the 4% 33
  • 40. "she was a" * third order. for sequence: she was a blonde 12% child 6% passenger 6% free 6% very 6% * second order. for sequence: was a very 4% man 3% small 3% long 2% little 2% * first order. for word: a very 2% man 2% little 2% few 2% small 1% "he was a" * third order. for sequence: he was a man 11% very 7% young 3% tall 3% middle 2% * second order. for sequence: was a very 4% man 3% small 3% long 2% little 2% * first order. for word: a very 2% man 2% little 2% few 2% small 1% "i did not" * third order. for sequence: i did not know 22% say 9% even 3% tell 3% 34
  • 41. mind 3% * second order. for sequence: did not know 11% take 3% wish 3% go 2% say 2% * first order. for word: not a 4% be 4% have 3% been 3% to 2% "all that she" * third order. for sequence: all that she said 100% * second order. for sequence: that she had 22% was 19% would 7% is 5% has 5% * first order. for word: she was 12% had 10% is 5% said 3% would 3% "have not been" * third order. for sequence: have not been able 25% here 25% employed 12% shadowed 12% personally 12% * second order. for sequence: not been for 9% in 6% there 6% a 4% slept 4% * first order. for word: been a 5% in 4% the 2% so 2% 35
  • 42. taken 1% "who wanted to" * third order. for sequence: who wanted to see 100% * second order. for sequence: wanted to see 15% know 15% speak 10% ask 10% hide 5% * first order. for word: to the 11% be 6% me 4% his 2% my 2% "he wanted to" * third order. for sequence: he wanted to know 50% do 50% * second order. for sequence: wanted to see 15% know 15% speak 10% ask 10% hide 5% * first order. for word: to the 11% be 6% me 4% his 2% my 2% "wanted to do" * third order. for sequence: wanted to do the 100% * second order. for sequence: to do with 23% so 14% it 10% the 4% and 3% * first order. for word: do you 24% not 20% 36
  • 43. with 6% so 4% it 4% "it is just" * third order. for sequence: it is just possible 42% as 33% killing 4% such 4% in 4% * second order. for sequence: is just possible 25% as 22% the 8% a 8% to 5% * first order. for word: just as 13% now 5% a 4% to 3% the 2% "you will find" * third order. for sequence: you will find that 20% it 20% me 8% the 8% a 6% * second order. for sequence: will find it 19% that 17% the 9% me 7% a 5% * first order. for word: find that 13% the 10% it 9% out 8% a 6% "you shall" * second order. for sequence: you shall have 15% 37
  • 44. see 12% know 12% hear 9% not 9% * first order. for word: shall be 20% have 7% not 4% see 3% i 3% "proved to be" * third order. for sequence: proved to be a 42% the 14% too 4% something 4% of 4% * second order. for sequence: to be a 11% the 3% in 3% able 2% so 2% * first order. for word: be a 7% the 4% in 2% of 2% no 2% Perhaps, results from all 3 tables can be combined, with the data from 3rd order table used in highest priority (or weight). And this is it — this can be shown to user. Aside of Conan Doyle works, your software can collect user’s input to adapt itself for user’s lexicon, slang, memes, etc. Of course, user’s ”tables” should be used with highest priority. I have no idea, what Apple/Android devices using for hints generation, when user input text, but this is what I would use as a first idea. As a bonus, this can be used for language learners, to get the idea, how a word is used in a sentence. 3.2.3 Further work Comma can be a separator as well, etc... 3.2.4 The files ... includingConanDoyle’sstories(2.5M).https://github.com/DennisYurichev/Math-for-programmers/tree/ master/prob/markov. Surely, any other texts can be used, in any language... 38
  • 45. Another related post is about typos: https://yurichev.com/blog/fuzzy_string/. 3.2.5 Read more Meaningful Random Headlines by Markov Chain A discussion at hacker news 3.3 random.choices() in Python 3 This is a very useful function4: weights (or probabilities) can be added. (I used it in my Markov chains example (3.2).) For example: #!/usr/bin/env python3 import random for i in range(1000): print (random.choices("123", [25, 60, 15])) Let’s generate 1000 random numbers in 1..3 range: $ python3 tst.py | sort | uniq -c 234 ['1'] 613 ['2'] 153 ['3'] ”1” is generated in 25% of cases, ”2” in 60% and ”3” in 15%. Well, almost. Here is another use of it. You know, when you send an email, the final destination is a server somewhere. But it may be irresponsible. So network engineers add additional servers, ”relays”, which can hold your email for some time. For example, what is about gmail.com? % dig gmail.com MX ... ;; ANSWER SECTION: gmail.com. 3600 IN MX 5 gmail-smtp-in.l.google.com. gmail.com. 3600 IN MX 10 alt1.gmail-smtp-in.l.google. com. gmail.com. 3600 IN MX 20 alt2.gmail-smtp-in.l.google. com. gmail.com. 3600 IN MX 30 alt3.gmail-smtp-in.l.google. com. gmail.com. 3600 IN MX 40 alt4.gmail-smtp-in.l.google. com. 4 https://docs.python.org/3/library/random.html#random.choices 39
  • 46. ... The first server is primary (marked with 5). Other 4 (alt...) are relays. They can hold emails for user@gmail.com if the main server is down. Of course, relays also can be down. So an MTA (Message transfer agent) tries to send an email via the first server in list, then via the second, etc. If all are down, MTA is waiting for some time (not infinitely). See also: https://en.wikipedia.org/wiki/MX_record. A number (5/10/20/30/40) is priority: MX records contain a preference indication that MUST be used in sorting if more than one such record appears (see below). Lower numbers are more preferred than higher ones. If there are multiple destinations with the same preference and there is no clear reason to favor one (e.g., by recognition of an easily reached address), then the sender-SMTP MUST randomize them to spread the load across multiple mail exchangers for a specific organization. ( https://tools.ietf.org/html/rfc5321 ) NowifyouwantyourMTAbepolite, youcanmakeitpokerelayswithsomeprobability, unloadingthemainmailserver. In any case, the internal network withing Google is way better than a link between you and any of these mail servers. And it would be OK to drop an email to any of these mail servers listed in MX records. This is how a destination server can be chosen: random.choices(range(4), weights=[1/5, 1/10, 1/20, 1/40]) I’m using reciprocal weights (1/x) because the lower priority, the higher probability it is to be chosen. What if I want to send 100 emails to someone@gmail.com? >>> [random.choices(range(4), weights=[1/5, 1/10, 1/20, 1/40])[0] for x in range (100)] [1, 1, 2, 1, 0, 2, 1, 0, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 2, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 1, 3, 0, 0, 2, 1, 0, 0, 0, 0, 1, 2, 2, 1, 0, 0, 0, 1, 2, 0, 0, 0, 0, 0, 0, 3, 0, 0, 0, 1, 0, 2, 1, 0, 0, 2, 0, 0, 0, 3, 2, 0, 1, 2, 0, 1, 1, 3, 1, 1, 1, 1] 1000? (I’m using the collections.Counter5 here for gathering statistics). >>> Counter([random.choices(range(4), weights=[1/5, 1/10, 1/20, 1/40])[0] for x in range(1000)]) Counter({0: 535, 1: 268, 2: 129, 3: 68}) 535 emails will be sent via the first (primary) mail server, 268/129/68 – via corresponding relays. This is probably not how MTAs usually operates, but this is how it could be done. 5 https://docs.python.org/3/library/collections.html#collections.Counter 40
  • 47. Chapter 4 Combinatorics 4.1 Soldering a headphones cable This is a real story: I tried to repair my headphone’s cable, soldering 3 wires with minijack together. But which is left? Right? I couldn’t even find ground wire. I asked myself, how many ways there are to solder 3 wires to 3-pin minijack? I could try them all and pick the combination that sounds best. With Python’s itertools module this is just: import itertools wires=["red", "green", "blue"] for i in itertools.permutations(wires): print i ('red', 'green', 'blue') ('red', 'blue', 'green ') ('green', 'red', 'blue') ('green', 'blue', 'red') ('blue', 'red', 'green ') ('blue', 'green', 'red') (Just 6 ways.) What if there are 4 wires? import itertools wires=["red", "green", "blue", "yellow"] for i in itertools.permutations(wires): print i ('red', 'green', 'blue', 'yellow ') ('red', 'green', 'yellow', 'blue') ('red', 'blue', 'green', 'yellow ') ('red', 'blue', 'yellow', 'green ') 41
  • 48. ('red', 'yellow', 'green', 'blue') ('red', 'yellow', 'blue', 'green ') ('green', 'red', 'blue', 'yellow ') ('green', 'red', 'yellow', 'blue') ('green', 'blue', 'red', 'yellow ') ('green', 'blue', 'yellow', 'red') ('green', 'yellow', 'red', 'blue') ('green', 'yellow', 'blue', 'red') ('blue', 'red', 'green', 'yellow ') ('blue', 'red', 'yellow', 'green ') ('blue', 'green', 'red', 'yellow ') ('blue', 'green', 'yellow', 'red') ('blue', 'yellow', 'red', 'green ') ('blue', 'yellow', 'green', 'red') ('yellow', 'red', 'green', 'blue') ('yellow', 'red', 'blue', 'green ') ('yellow', 'green', 'red', 'blue') ('yellow', 'green', 'blue', 'red') ('yellow', 'blue', 'red', 'green ') ('yellow', 'blue', 'green', 'red') (24 ways.) This is what is called permutation in combinatorics. 4.2 Vehicle license plate There was a hit and run. And you’re police officer. And this is what your only witness says about 4-digit license plate of the guilty vehicle: There was 13: 1 and 3 together , I'm sure, but not sure where, 13 as first 2 digits, last 2 digits or in the middle. And there was also 6, or maybe 8, or maybe even 9, not sure which, but one of them. Combinatorics textbooks are abound with exercises like this: can you enumerate all possible 4-digit numbers con- strained in this way? import itertools part1_list=["13"] part2_list=["6", "8", "9"] part3_list=["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"] for i in itertools.product(part1_list , part2_list , part3_list): for j in itertools.permutations(i): print "".join(j) 1360 1306 42
  • 49. 6130 6013 ... 9139 9913 9139 9913 ( 180 numbers ) This is something you can query registered vehicle database with... 4.3 Forgotten password You forgot a password, but this is what you remember: there was a name of your parent, or wife, or one of children. Also, someone’s year of birth. And one punctuation character, which are so recommended in passwords. Can you enumerate all possible passwords? import itertools part1_list=["jake", "melissa", "oliver", "emily"] part2_list=["1987", "1954", "1963"] part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","] for part1 in part1_list: for part2 in part2_list: for part3 in part3_list: l=[part1, part2, part3] for i in list(itertools.permutations(l)): print "".join(i) jake1987! jake!1987 1987jake! 1987!jake !jake1987 ... 1963emily, 1963,emily ,emily1963 ,1963emily ( 936 of them in total ) But nested for’s are not aesthetically pleasing. They can be replaced with ”cartesian product” operation: import itertools 43
  • 50. part1_list=["jake", "melissa", "oliver", "emily"] part2_list=["1987", "1954", "1963"] part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","] for l in itertools.product(part1_list , part2_list , part3_list): for i in list(itertools.permutations(l)): print "".join(i) And this is a way to memorize it: the length of the final result equals to lengths of all input lists multiplied with each other (like ”product”). import itertools part1_list=["jake", "melissa", "oliver", "emily"] # 4 elements part2_list=["1987", "1954", "1963"] # 3 elements part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","] # 13 elements for l in itertools.product(part1_list , part2_list , part3_list): print l ('jake', '1987', '!') ('jake', '1987', '@') ('jake', '1987', '#') ('jake', '1987', '$') ('jake', '1987', '%') ('jake', '1987', '&') ('jake', '1987', '*') ... ('emily', '1963', '*') ('emily', '1963', '-') ('emily', '1963', '=') ('emily', '1963', '_') ('emily', '1963', '+') ('emily', '1963', '.') ('emily', '1963', ',') 4*3*13=156, and this is a size of a list, to be permuted... Now the new problem: some Latin characters may be uppercased, some are lowercased. I’ll add another ”cartesian product” operation to alter a final string in all possible ways: import itertools , string part1_list=["jake", "melissa", "oliver", "emily"] part2_list=["1987", "1954", "1963"] part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","] for l in itertools.product(part1_list , part2_list , part3_list): 44
  • 51. for i in list(itertools.permutations(l)): s="".join(i) t=[] for char in s: if char.isalpha(): t.append([string.lower(char), string.upper(char)]) else: t.append([char]) for q in itertools.product(*t): print "".join(q) JAke1987! JAkE1987! JAKe1987! JAKE1987! jake!1987 jakE!1987 jaKe!1987 jaKE!1987 ... ,1963eMIly ,1963eMIlY ,1963eMILy ,1963eMILY ,1963Emily ,1963EmilY ,1963EmiLy ( 56160 passwords ) Now leetspeak1 This is somewhat popular only among youngsters, but still, this is what people of all age groups do: replacing ”o” with ”0” in their passwords, ”e” with ”3”, etc. Let’s add this as well: import itertools , string part1_list=["jake", "melissa", "oliver", "emily"] part2_list=["1987", "1954", "1963"] part3_list=["!","@","#","$","%","&","*","-","=","_","+",".",","] for l in itertools.product(part1_list , part2_list , part3_list): for i in list(itertools.permutations(l)): s="".join(i) t=[] for char in s: if char.isalpha(): to_be_appended=[string.lower(char), string.upper(char)] if char.lower()=='e': 1 urbandictionary.com 45
  • 52. to_be_appended.append('3') elif char.lower()=='i': to_be_appended.append('1') elif char.lower()=='o': to_be_appended.append('0') t.append(to_be_appended) else: t.append([char]) for q in itertools.product(*t): print "".join(q) jake1987! jakE1987! jak31987! jaKe1987! jaKE1987! jaK31987! ... ,1963EM1lY ,1963EM1Ly ,1963EM1LY ,19633mily ,19633milY ,19633miLy ( 140400 passwords ) Obviously, you can’t try all 140400 passwords on Facebook, Twitter or any other well-protected internet service. But this is a peace of cake to brute-force them all on password protected RAR-archive or feed them all to John the Ripper, HashCat, etc. All the files: https://github.com/DennisYurichev/Math-for-programmers/tree/master/comb/password. Now let’s use combinations from itertools Python package. Let’s say, you remember that your password has maybe your name, maybe name of your wife, your year of birth, or her, and maybe couple of symbols like !, $, ^. import itertools parts=["den", "xenia", "1979", "1985", "secret", "!", "$", "^"] for i in range(1, 6): # 1..5 for combination in itertools.combinations(parts, i): print "".join(combination) Here we enumerate all combinations of given strings, 1-string combinations, then 2-, up to 5-string combinations. No string can appear twice. 46
  • 53. ... denxenia1979 denxenia1985 denxeniasecret denxenia! denxenia$ denxenia^ den19791985 den1979secret den1979! ... xenia1985secret$^ xenia1985!$^ xeniasecret!$^ 19791985secret!$ 19791985secret!^ ... (218 passwords) Now let’s permute all string in all possible ways: import itertools parts=["den", "xenia", "1979", "1985", "secret", "!", "$", "^"] for i in range(1, 6): # 1..5 for combination in itertools.combinations(parts, i): for permutation in itertools.permutations(list(combination)): print "".join(permutation) ... ^den xenia1979 1979xenia xenia1985 1985xenia xeniasecret secretxenia xenia! !xenia ... ^!$1985secret ^!$secret1985 ^$1985secret! ^$1985!secret ^$secret1985! ^$secret!1985 ^$!1985secret ^$!secret1985 47
  • 54. (8800 passwords) And finally, let’s alter all Latin characters in lower/uppercase ways and add leetspeek, as I did before: import itertools , string parts=["den", "xenia", "1979", "1985", "!", "$", "^"] for i in range(1, 6): # 1..5 for combination in itertools.combinations(parts, i): for permutation in itertools.permutations(list(combination)): s="".join(permutation) t=[] for char in s: if char.isalpha(): to_be_appended=[string.lower(char), string.upper(char)] if char.lower()=='e': to_be_appended.append('3') elif char.lower()=='i': to_be_appended.append('1') elif char.lower()=='o': to_be_appended.append('0') t.append(to_be_appended) else: t.append([char]) for q in itertools.product(*t): print "".join(q) ... dEnxenia dEnxeniA dEnxenIa ... D3nx3N1a D3nx3N1A D3nXenia D3nXeniA D3nXenIa ... ^$1979!1985 ^$19851979! ^$1985!1979 ^$!19791985 ^$!19851979 ( 1,348,657 passwords ) Again, you can’t try to crack remote server with so many attempts, but this is really possible for password-protected archive, known hash, etc... 48
  • 55. 4.4 Executable file watermarking/steganography using Lehmer code and factorial number system In short: how to hide information not in objects, but in order of objects. Almost any binary executable file has text strings like (these are from CRT): .rdata:0040D398 aR6002FloatingP: .rdata:0040D398 text "UTF-16LE", 'R6002 ',0Dh,0Ah .rdata:0040D398 text "UTF-16LE", '- floating point support not loaded ',0Dh,0Ah,0 .rdata:0040D3F2 align 8 .rdata:0040D3F8 aR6008NotEnough: .rdata:0040D3F8 text "UTF-16LE", 'R6008 ',0Dh,0Ah .rdata:0040D3F8 text "UTF-16LE", '- not enough space for arguments ',0Dh,0Ah,0 .rdata:0040D44C align 10h .rdata:0040D450 aR6009NotEnough: .rdata:0040D450 text "UTF-16LE", 'R6009 ',0Dh,0Ah .rdata:0040D450 text "UTF-16LE", '- not enough space for environment ',0Dh,0Ah,0 .rdata:0040D4A8 aR6010AbortHasB: .rdata:0040D4A8 text "UTF-16LE", 'R6010 ',0Dh,0Ah .rdata:0040D4A8 text "UTF-16LE", '- abort() has been called ',0Dh ,0Ah,0 .rdata:0040D4EE align 10h .rdata:0040D4F0 aR6016NotEnough: .rdata:0040D4F0 text "UTF-16LE", 'R6016 ',0Dh,0Ah .rdata:0040D4F0 text "UTF-16LE", '- not enough space for thread data',0Dh,0Ah,0 .rdata:0040D548 aR6017Unexpecte: .rdata:0040D548 text "UTF-16LE", 'R6017 ',0Dh,0Ah .rdata:0040D548 text "UTF-16LE", '- unexpected multithread lock error ',0Dh,0Ah,0 .rdata:0040D5A2 align 8 .rdata:0040D5A8 aR6018Unexpecte: .rdata:0040D5A8 text "UTF-16LE", 'R6018 ',0Dh,0Ah .rdata:0040D5A8 text "UTF-16LE", '- unexpected heap error ',0Dh,0 Ah,0 .rdata:0040D5EA align 10h Can we hide some information there? Not in string themselves, but in order of strings? Given the fact, that compiler doesn’t guarantee at all, in which order the strings will be stored in object/executable file. Let’s say, we’ve got 26 text strings, and we can swap them how we want, because their order isn’t important at all. All possible permutations of 26 objects is 26! = 403291461126605635584000000. How much information can be stored here? log2(26!)= 88, i.e., 88 bits or 11 bytes! 11 bytes of your data can be converted to a (big) number and back, OK. What is next? Naive way is: enumerate all permutations of 26 objects, number each, find permutation of the number we’vegotandpermute26textstrings,storetofileandthat’sit. Butwecan’titerateover403291461126605635584000000 49
  • 56. permutations. This is where factorial number system 2 and Lehmer code 3 can be used. In short, for all my non-mathematically inclined readers, this is a way to find a permutation of specific number without use of any significant resources. And back: gived specific permutation, we can find its number. This piece of code I’ve copypasted from https://gist.github.com/lukmdo/7049748 and reworked slightly: # python 3.x import math def iter_perm(base, *rargs): """ :type base: list :param rargs: range args [start ,] stop[, step] :rtype: generator """ if not rargs: rargs = [math.factorial(len(base))] for i in range(*rargs): yield perm_from_int(base, i) def int_from_code(code): """ :type code: list :rtype: int """ num = 0 for i, v in enumerate(reversed(code), 1): num *= i num += v return num def code_from_int(size, num): """ :type size: int :type num: int :rtype: list """ code = [] for i in range(size): num, j = divmod(num, size - i) code.append(j) return code def perm_from_code(base, code): 2 https://en.wikipedia.org/wiki/Factorial_number_system 3 https://en.wikipedia.org/wiki/Lehmer_code 50
  • 57. """ :type base: list :type code: list :rtype: list """ perm = base.copy() for i in range(len(base) - 1): j = code[i] if i != i+j: print ("swapping %d, %d" % (i, i+j)) perm[i], perm[i+j] = perm[i+j], perm[i] return perm def perm_from_int(base, num): """ :type base: list :type num: int :rtype: list """ code = code_from_int(len(base), num) print ("Lehmer code=", code) return perm_from_code(base, code) def code_from_perm(base, perm): """ :type base: list :type perm: list :rtype: list """ p = base.copy() n = len(base) pos_map = {v: i for i, v in enumerate(base)} w = [] for i in range(n): d = pos_map[perm[i]] - i w.append(d) if not d: continue t = pos_map[perm[i]] pos_map[p[i]], pos_map[p[t]] = pos_map[p[t]], pos_map[p[i]] p[i], p[t] = p[t], p[i] return w def int_from_perm(base, perm): 51
  • 58. """ :type base: list :type perm: list :rtype: int """ code = code_from_perm(base, perm) return int_from_code(code) def bin_string_to_number(s): rt=0 for i, c in enumerate(s): rt=rt<<8 rt=rt+ord(c) return rt def number_to_bin_string(n): rt="" while True: r=n & 0xff rt=rt+chr(r) n=n>>8 if n==0: break return rt[::-1] s="HelloWorld" print ("s=", s) num=bin_string_to_number (s) print ("num=", num) perm=perm_from_int(list(range(26)), num) print ("permutation/order=", perm) num2=int_from_perm(list(range(26)), [14, 17, 9, 19, 11, 16, 23, 0, 2, 13, 20, 18, 21, 24, 10, 1, 22, 4, 7, 6, 15, 12, 5, 3, 8, 25]) print ("recovered num=", num2) s2=number_to_bin_string(num2) print ("recovered s=", s2) I’m encoding a ”HelloWorld” binary string (in fact, any 11 bytes can be used) into a number. Number is then converted into Lehmer code. Then the perm_from_code() function permute initial order according to the input Lehmer code: s= HelloWorld num= 341881320659934023674980 Lehmer code= [14, 16, 7, 16, 7, 11, 17, 7, 1, 4, 10, 7, 9, 11, 6, 2, 6, 1, 2, 4, 0, 0, 0, 0, 0, 0] swapping 0, 14 swapping 1, 17 swapping 2, 9 swapping 3, 19 swapping 4, 11 52
  • 59. swapping 5, 16 swapping 6, 23 swapping 7, 14 swapping 8, 9 swapping 9, 13 swapping 10, 20 swapping 11, 18 swapping 12, 21 swapping 13, 24 swapping 14, 20 swapping 15, 17 swapping 16, 22 swapping 17, 18 swapping 18, 20 swapping 19, 23 permutation/order= [14, 17, 9, 19, 11, 16, 23, 0, 2, 13, 20, 18, 21, 24, 10, 1, 22, 4, 7, 6, 15, 12, 5, 3, 8, 25] This is it: [14, 17, 9, 19, 11, 16, 23, 0, 2, 13, 20, 18, 21, 24, 10, 1, 22, 4, 7, 6, 15, 12, 5, 3, 8, 25]. First put 14th string, then 17s string, then 9th one, etc. Now you’ve got a binary file from someone and want to read watermark from it. Get an order of strings from it and convert it back to binary string: recovered num= 341881320659934023674980 recovered s= HelloWorld If you have more text strings (not unusual for most executable files), you can encode more. 100 strings: log2(100!) = 524 bits = 65 bytes. 1000 strings: log2(1000!) = 8529 bits = 1066 bytes! You can store some text here! How would you force a C/C++ compiler to make specific order of text strings? This is crude, but workable: char blob[]="hello10hello20"; char *msg1=blob; char *msg2=blob+8; printf ("%sn", msg1); printf ("%sn", msg2); They can be even aligned on 16-byte border. ... or they can be placed into .s/.asm assembly file and compiled into .o/.obj and then linked to your program. ... or you can swap text strings in already compiled executable and correct their addresses in corresponding instruc- tions. If an executable file is not packed/obfuscated, this is possible. Aside of order of text strings, you can try to hack a linker and reorder object files in the final executable. Of course, no one cares about its order. And go figure out, what is hidden there. Surely, hidden data can be encrypted, checksum or MAC can be added, etc. 53
  • 60. Other ideas to consider: reorder functions and fix all addresses, reorder basic blocks within a function, register allo- cator hacking, etc. Links I find helpful in understanding factorial number system and Lehmer code, aside of Wikipedia: • https://gist.github.com/lukmdo/7049748 • https://github.com/scmorse/permutils/blob/master/src/permutils.js • http://2ality.com/2013/03/permutations.html • http://www.keithschwarz.com/interesting/code/factoradic-permutation/FactoradicPermutation 4.5 De Bruijn sequences; leading/trailing zero bits counting 4.5.1 Introduction Let’s imagine there is a very simplified code lock accepting 2 digits, but it has no ”enter” key, it just checks 2 last entered digits. Our task is to brute force each 2-digit combination. Naïve method is to try 00, 01, 02 ... 99. That require 2*100=200 key pressings. Will it be possible to reduce number of key pressings during brute-force? It is indeed so, with the help of De Bruijn sequences. We can generate them for the code lock, using Wolfram Mathematica: In[]:= DeBruijnSequence[{0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, 2] Out[]= {6, 8, 6, 5, 4, 3, 2, 1, 7, 8, 7, 1, 1, 0, 9, 0, 8, 0, 6, 6, 0, 5, 5, 0, 4, 4, 0, 3, 3, 0, 2, 7, 2, 2, 0, 7, 7, 9, 8, 8, 9, 9, 7, 0, 0, 1, 9, 1, 8, 1, 6, 1, 5, 1, 4, 1, 3, 7, 3, 1, 2, 9, 2, 8, 2, 6, 2, 5, 2, 4, 7, 4, 2, 3, 9, 3, 8, 3, 6, 3, 5, 7, 5, 3, 4, 9, 4, 8, 4, 6, 7, 6, 4, 5, 9, 5, 8, 5, 6, 9} The result has exactly 100 digits, which is 2 times less than our initial idea can offer. By scanning visually this 100-digits array, you’ll find any number in 00..99 range. All numbers are overlapped with each other: second half of each number is also first half of the next number, etc. Here is another. We need a sequence of binary bits with all 3-bit numbers in it: In[]:= DeBruijnSequence[{0, 1}, 3] Out[]= {1, 0, 1, 0, 0, 0, 1, 1} Sequence length is just 8 bits, but it has all binary numbers in 000..111 range. You may visually spot 000 in the middle of sequence. 111 is also present: two first bits of it at the end of sequence and the last bit is in the beginning. This is so because De Bruijn sequences are cyclic. There is also visual demonstration: http://demonstrations.wolfram.com/DeBruijnSequences/. 4.5.2 Trailing zero bits counting In the Wikipedia article about De Bruijn sequences we can find: The symbols of a De Bruijn sequence written around a circular object (such as a wheel of a robot) can be used to identify its angle by examining the n consecutive symbols facing a fixed point. Indeed: if we know De Bruijn sequence and we observe only part of it (any part), we can deduce exact position of this part within sequence. 54