What are Prime Numbers and why are they so cool? |
Importance of Prime Numbers in Cryptography |
Trial Division |
Wilson's Theorem |
Fermat's Little Theorem |
Miller-Rabin Test |
Other tests |
Chapter 1 of 7
For layman's sake, I'm going to start off by clarifying what prime numbers are, trial division to detect one and their basic application in cryptography. Feel free to jump on to the sections on primality testing, if you're only looking for those.
By definition,
A prime number is a natural number that has no positive divisors other than 1 and itself.
Meaning -
A prime number is a number that has exactly two factors : 1 and itself.
For example, 14 is not a prime number because it has two additional factors (numbers that multiply together to form 14) - 2, 7, besides 1 and itself. Hence it's not prime but the opposite of it - composite.
5, however, is a prime. It has no other factors except one and itself. So do 7, 13, 19, 23 and so on.
From the definition, it's pretty easy to conclude that for testing whether or not a given number is prime, simply look for it's factors. If there are more than two inevitable factors (1 and the number itself), the given number is not a prime.
Let's try to do that with a random number. Say, 199.
From our discussion above, 199 has two obvious factors - 1 and 199 itself. If 199 has no other factors, it's a prime. Else, it's composite.
So all we gotta do is -
If we find even one single case, where the remainder is zero, it means 199 has a third factor apart from 1 and itself, making it a non-prime, composite number.
That's 197 instances of division starting at 2 and ending at 198 -
Case 1 : Division by 2
Case 2 : Division by 3
Case 3: Division by 4 -
We can skip this. Because if 4 was a factor of 199, 2 would have been too - which it isn't.
That said, we can skip all multiples of 2 - all even numbers.
Case 4: Division by 5 -
Case 5: Division by 7 -
Case 6: Division by 9 -
We know 9 is a multiple of 3 and if it were a factor of 199, 3 would be too - which it isn't.
This means we can skip 9 and all other multiples of 3.
Same is true with all multiples of 5, 7,11, 13..and all other prime numbers.
So for our basic primality test, it's enough if we divide 199 by all prime numbers between 1 to 199. So we jump on straight to 11, skipping 9 and 10...
Case 7: Division by 11 -
Case 8: Division by 13 -
Case 9: Division by 17 -
Did you observe something?
The quotient has been reducing as we go ahead in our division streak and divide 199 with bigger numbers. If you go further head, the quotients will become even more smaller.
For a moment if we assume 199 is not prime, there must be a divisor 'x' and a quotient 'q' with no remainder -
which means -
meaning, the number we are dividing 199 by (x) and the quotient that comes out (q) are both, factors of 199.
But in case 9 above, neither 17(the divisor) nor 11(the quotient) are factors of 199.
Given that the quotient will go on decreasing below 11, case 9 onwards, we know there aren't any factors of 199 below 11 (we checked them all). Since both the quotient and divisor need to be factors of 199 for it to be composite, we need not check for any number greater than 17 in this case (because the quotient will always be lesser than 11).
So, it's quite legit to say that we can stop our trial division once the quotient becomes smaller than the divisor.
Finally, we can save ourselves from several other attempts of division and safely say that 199 is a prime.
Imagine using the above method to test whether the number
is prime or not.
There's no slick trick to check fast enough, whether or not a large number is prime. So much so that finding large prime numbers has been an obsession for mathematicians.
Also, the very fact, that finding large prime factors of a number is hard, is the basis of cryptography which is fundamental to cyber security.
I'll try to walk you through how this works briefly, so you get a ballpark idea and are able to appreciate prime numbers even more.
Chapter 2 of 7
Everytime you shop online, you are required to send your bank login details through to your bank, for completing the transaction.
Encryption makes sure no one can get access to your bank details and steal from you. We're not going to dive deep into it, but here's how encryption basically works -
Unless someone knows what P1 and P2 are, they simply cannot decrypt and get access to your bank details.
Here's the fun part now - both P1 and P2 are prime numbers - really large ones! And anyone who gets access to those, can decrypt and read your details.
So can't someone do that? Practically, not quite. You see, the component 'C' in the public key, is basically -
While it's easy for a computer to calculate P1 times P2 and generate a C, it's practically very, very cumbersome and time-taking for even the fastest ones to work this simple equation backwards and find the prime factors of C, ie. P1 and P2.
With the current computing power, it could take a computer program years to find these monster prime factors, P1 and P2, of the composite number C used in the public key.
Anyway, this is what fundamentally makes encryption the most powerful tool in modern day cyber security.
Interestingly enough, in 1992, Peter Shor came up with an algorithm that takes in a number N and spits out it's prime factors. The catch is that Shor's algorithm can only run on a quantum computer (that's if we could build one with sufficient number of qubits and if it can work without succumbing to noise or other quantum decoherence phenomena).
Now that you have an idea how cool prime numbers can get, here's a short video for you to dive a little more deeper into encryption using prime numbers -
Next, we jump on to some popular methods for testing prime numbers and a possible algorithm we can write (in python) for each of those.
Chapter 3 of 7
We discussed all about testing whether or not a number is prime using trial division - which could get fairly frustrating.
No matter how frustrating, this method is deterministic in detecting a prime (resulting in100% in accuracy), contrary to most methods which are probabilistic in nature.
And so, if I had to write a program which could take in any number N and spit out the boolean - "True" only if N is prime, i'd write this :
def isPrime(N):
if N < 2 : return 'ValueError'
for i in range(2,N): # all numbers from 2 to N-1
if N % i == 0: # i is third factor of N
return 'Composite'
return 'Prime'
Let's try to make the range (2, N) shorter here, so our program has a comparitively shorter runtime for slightly bigger numbers too.
Let's assume for a moment that N is composite. In that case, it has factors p and q -
Now, if
then clearly,
Which is not true! Because it contradicts -
This implies -
Meaning, if we keep the range of our function between 2 and sqrt(N), we should definitely get one of it's factor (q or n) in case it's a composite number. And that's all we really need to prove it's not prime.
With that, we drastically reduce the amount of work our program would have to do.
The function could be written as -
#python 3
?
def isPrime(N):
if N < 2: return 'ValueError'
for i in range(2, int(N**(1/2))+1): #for every instance in range 2 to sqrt(N)
if (N % i) == 0:
return 'Composite'
return 'Prime'
This function is more efficient and lesser time exhaustive than the previous one, simply because it has lesser numbers to operate division on.
To demonstrate this, let's just start a timer function in the beginning of our primality test and stop it right before the test returns some value.
Here's the required code for the shorter range function (same can be applied to the long range function) -
#python 3
?
from datetime import datetime #import datetime module
?
def isPrime(n):
start=datetime.now() #start timer
if n < 2 : return 'ValueError'
for i in range(2, int(n**(1/2))+1):
if (n % i) == 0 :
print(datetime.now()-start) #end timer and return difference
return 'Composite'
print(datetime.now()-start)
return 'Prime'
?
I used both functions to check the primality of the 10,000th prime number -
The function will take N = 104729 as an argument and print the runtime before returning the boolean True.
Runtime for function with range(2, N) :
Runtime for function with range(2, sqrt(N)+1) :
That's about 300 times faster!
Let's discuss other testing methods and see what's the runtime like for a computer program.
Chapter 4 of 7
Wilson's theorem comes right out of the number theory and is a deterministic (resulting in100% in accuracy) method to find primes. It states that -
A natural number p, such that :
is prime, if and only if :
I'm not going to dive deeper and demonstrate a proof here, but we'll check for this theorem to confirm 7 as a prime, then jump to writing the function for it and calculating the runtime.
For p = 7, Wilson's Theorem gives us -
which implies -
Now, lets start with the left side of the equation.
If 7 is prime (which - spoiler - it is!), then the right side of the equation must be equal to 6. And is it?
Sure, is. And so, 7 is prime.
So let's write a function that could perform a primality test on a number using Wilson's theorem...
#python 3
?
def isPrime(n):
if n < 2: return False
fac = 1
for i in range(2,n):
fac = (fac * i) % n
if (fac == (n-1)):
return 'Composite' # true if (n-1)! mod n = n-1, that is (n-1)! = -1 mod n
else :
return False
?
The runtime for this is quite poor. Testing for the same number - 104,729, the runtime was -
That's half as fast, compared to trial division using range (2,n).
So I'd still prefer the trial division test with range (2, sqrt(n)+1) for a primality test, which is about 300 times faster than the other two we have tried.
Chapter 5 of 7
Fermat's theorem states -
Here's a sweet proof to this.
Primality testing using Fermat's theorem (pronounced : " Fermaa's ") is probabilistic in nature. This means you can never be 100% sure if 'p' is prime.
Let me demonstrate this test using a prime number 5. According to Fermat's little theorem, all integers (a) between 1 and 5 (including 1 and 5) will give a number -
which will be divisible by 5. Let's start with 1 -
Looks like it works perfectly.
Let's try another number - 341. Brace for some huge numbers -
Wait. What just happened there?
While applying Wilson's theorem to 341, when we have :
But when we have :
And we actually do have factors of 341 -
proving that it is a composite number.
And that's why we call 341 a pseudoprime. Because it passes Fermat's primality test for base 2.
Interestingly, in the above example, 2 is called Fermat's liar because it suggests 341 is a prime and 3 is called Fermat's witness because it proves 341 is composite.
Key takeaway here is that -
While applying Fermat's theorem for primality testing, a number has to fail the test just once to be ruled out as composite - with confidence.
We call these Carmichael numbers.
561 is the first carmichael number. And believe it or not, it passes Fermat's test for every base - meaning :
And yet -
Carmichael numbers are quite rare to find though. In the first 25 billion numbers, there are only 2183 numbers that are charmichael.
These caveats, where Fermat's test may tell us a number is prime while it may just be a Pseudoprime or Carmichael, are what make Fermat's test probabilistic in nature.
Nevertheless, let's see how we can increase the probability of our result to be maximum and write a program for this.
Let's start off by simplifying the testing equation so we can easily write a logic around it. We need to check -
Now, if 'p' is prime, it doesn't share a factor with 'a' and the greatest common divisor - GCD (a,p) is always 1. Applying the cancellation law, we can re-write the above equation as -
Now, we can build a logic for our function much more easily. Here's what our function will do -
Now, to increase the probability of our result to be accurate, we can repeat the test 'k' times for several random values of 'a'. This way, it'll be less unlikely that 'a' would turn out to be Fermat's liar each time.
So the program would take 2 arguments - the number to test (p) and the number of times we want the test to repeat (k) with random 'a' values. Here's the code -
#python 3
?
from math import gcd
import random
?
def isPrime(p,k):
for i in range(k):
a = random.randrange(2,p) #pick a random integer 1<a<p
if gcd(a,p) != 1: #check if GCD is 1
return 'Composite'
elif (pow(a, p-1, p) != 1) : # check if pow(x, y) % z or not --> computed more efficiently with this notation
return 'Composite'
return 'Prime'
?
Do you notice how the number of tests our function has to perform does not scale with the number, unlike trial division?
This is key to Fermat's primality test performance. It's runtime is drastically lower than trial division test for large numbers.
I calculated the runtime for testing the prime -
Here's what I got with trial division (short range) :
And here's the runtime for Fermat's test with 100 repetitions (k = 100)
About 30 times faster. And this will increase further for bigger and bigger numbers as per the relation -
where 'k' is number of times we test and 'p' is the number that's being tested.
Fermat's test is fast, no doubt. But the probabilistic nature and the chance of encountering carmichael numbers could make one question it's reliability.
Chapter 6 of 7
This test works as fast as Fermat's test with an added bonus of detecting carmichael numbers!
Here's how it works -
To test a number 'p' for primality, choose a number -
Now let there be a number 's' an odd number 'd', such that -
If the following holds true -
'p' is composite and 'a' is the witness of it. Else, 'p' is probably prime.
This clearly means that the Miller-Rabin test is contrapositive to testing a prime. It tells you whether a number is composite or not. If not, it 'probably' is a prime.
We've got -
Next, we need to find an 's' and 'd' such that -
We can write 30 as -
Having found 's' and 'd', we can now pick a random 'a', such that -
I'll work here with a = 2 for the sake of simplicity.
Now, if for 31 to be composite, each of the following must hold true -
which can be written as -
Let's get on to testing the first inequivalence. Computing the left side, we got -
And the right side would be -
So we get -
Hence, 31 failed the very 1st test for being composite. And so, it's probably a prime number.
Miller-Rabin is, again, a probabilistic primality test. And since it involves a base variable 'a', by running the test 'k' times, we can increase the probability of our outcome to be more accurate.
Here's what we'll need to do in our code -
Here's what we have -
import random
?
def isPrime(p,k):
if p < 2 : return 'Enter a number greater than 1'
if 1 < p < 4 : return 'Prime'
if p % 2 == 0 : return 'Composite' # check for p --> even
# get values of s and d that satisfy (p-1)=(2^s)*d
s = 0
while (p-1) % (2**s) == 0: # keep increasing s until d becomes a fraction and then take the value of d at (s-1)
s+=1
d = (p-1)/(2**s)
d = (p-1)/(2**(s-1)) # d is an odd whole number now, as required.
for i in range(k):
a = random.randrange(2,p) # pick a random 'a'
for r in range(0,s):
if ((a**d) % p) != (1 % p) and (a**(d*2**r) % p) != (-1 % p): return 'Composite' #test conditions
return 'Probably Prime'
?
As mentioned earlier, the runtime for this is almost equal to Fermat's test.
Let's discuss some other cool tests in the next section.
Chapter 7 of 7
This is quite similar in performance as the Rabin-Miller test but slightly less accurate. To test a number 'p', find an integer 'a' < 'p'.
If the expression -
then 'p' is composite and 'a' is the witness of it.
Lucas test is another probabilistic, contrapositive primality test. It tells you whether a number is 'p' composite. If not, it may be prime.
So this test is dependent on the Lucas sequence. It's similar to the Fibonacci sequence with the only difference that the second element is 3, instead of 1.
Now let me demonstrate how this test works -
For 5 to be prime, pick the 5th element in the sequence (11) and take 1 from it -
All we need to do now is to check if 10 is a multiple of 5. If it is, then 5 may be prime. If not, it's composite.
In this case -
This is a deterministic test to test out any Mersenne Prime!
Mersenne prime numbers are primes that can be expressed as one less than some power of two.
For example -
Now this test, is dependent on a sequence called the Lucas-Lehmer sequence. Where every number is 2 less than the square of it's predecessor -
Although the series gets monstrous very quickly, the test is fairly simple. For any mersenne number 'N' -
find the (p-1)th element in the Lucas-Lehmer sequence and see if 'N' divides that element. If it does, 'N' is definitely prime. If not, it's definitely composite!
Let's check if 7 - a mersenne number is prime or not. 7 can be written as -
So we need to check if the (p-1)th element ie. the 2nd element, which equals 14 is multiple of 7...
Here's how we do it -
If 'N' is prime, you will reach a point in the series where an element is 0. And you would have found 'N' to be prime without dealing with any monster number.
This is probably the most efficient deterministic test and one that a lot of mathematicians are excited about. It has a fairly okay runtime and it's 100% accurate.
So, to test 'P' we start off by building a polynomial -
If all coefficients of equation 1 are divisible by 'p', then 'p' is 100% prime.
Let's test p = 3.
The polynomial should be -
Clearly, the coefficient 3 is divisible by p = 3.