The impacts of known prime generation patterns

I always believed that prime numbers don't follow any clear pattern, but yesterday I learned from this page [1] that every prime number greater than 3 has the form (6*n + 1) or (6*n - 1) for some natural number n.


The authors indicate that they verified this claim by generating 1,000,000 prime numbers using Matlab. In fact, the property is easy to prove: every integer can be written as 6*n + r with r in {0, 1, 2, 3, 4, 5}, and the forms 6*n, 6*n + 2, 6*n + 3, and 6*n + 4 are divisible by 2 or 3, so any prime greater than 3 must have the form 6*n + 1 or 6*n - 1.
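As a quick, self-contained check of this property, here is a small plain-Python snippet. It uses a naive trial-division helper written only for this illustration, so it does not depend on the Sage session used later in this post:

def is_prime_trial(m):
    # naive trial division; good enough for a small sanity check
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True

# every prime greater than 3 should leave remainder 1 or 5 when divided by 6
assert all(p % 6 in (1, 5) for p in range(5, 100000) if is_prime_trial(p))
print("all primes below 100000, except 2 and 3, have the form 6n+1 or 6n-1")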


Taking this pattern as given, two questions can be raised:

  1. Question 1: Can this pattern remove the need for primality tests?
  2. Question 2: Can this pattern allow us to factorize natural numbers efficiently?

Before handling these questions, I first want to verify the prime generation pattern. I wrote a Python function that counts the prime numbers within a range using the pattern, and I compare it against another function that follows the brute-force method of testing every number.


Here are the functions. The first applies the prime generation pattern, while the second checks every possible number (i.e., the brute-force method).


sage: def myfunc():
....:     i = 0
....:     for n in range(1, 100000000, 1):
....:         q = 6*n + 1
....:         p = 6*n - 1
....:         if is_prime(q):
....:             i = i + 1
....:         if is_prime(p):
....:             i = i + 1
....:     print("number of primes=", i)


sage: def myfunc2():
....:     i = 0
....:     for n in range(5, 600000001, 1):
....:         if is_prime(n):
....:             i = i + 1
....:     print("number of primes=", i)


Both functions use the same primality test (is_prime) provided by Sage, but the brute-force function takes longer because it tests every number in the range rather than only the candidates of the form 6*n ± 1. The example below shows the count of primes found by myfunc, which follows the generation pattern, and the time taken to execute it: 31,324,701 primes are found below 600,000,001.


sage: %time myfunc()

number of primes= 31324701

CPU times: user 1min 34s, sys: 457 ms, total: 1min 35s

Wall time: 1min 35s


We can see below that the brute-force function finds exactly the same count of prime numbers, but it takes longer to execute.


sage: %time myfunc2()

number of primes= 31324701

CPU times: user 5min 3s, sys: 1.37 s, total: 5min 5s

Wall time: 5min 5s
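The speedup is roughly what the candidate counts suggest: myfunc calls is_prime on about 2*10^8 candidates of the form 6*n ± 1, while myfunc2 calls it on all 6*10^8 integers, about three times as many, which is consistent with the measured 1min 35s versus 5min 5s. A small back-of-the-envelope check (plain Python, using only the figures above):

pattern_calls = 2 * 10**8            # two candidates (6n+1 and 6n-1) per n
brute_calls = 6 * 10**8              # every integer up to 600000000
print(brute_calls / pattern_calls)   # 3.0
print((5 * 60 + 5) / (60 + 35))      # observed wall-time ratio, about 3.2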


Let's now handle both questions.


Question 1: Can this pattern remove the need for primality test functions?

Because the pattern tells us where prime numbers can appear, we might be tempted to say that we no longer need a primality test function. However, the pattern is not exact: it only says that a prime greater than 3 has the form 6*n + 1 or 6*n - 1, not that every number of those forms is prime (for example, 25 = 6*4 + 1 and 35 = 6*6 - 1 are composite). Thus, there is uncertainty about whether a generated number is prime or not.


To demonstrate this issue, I have written the following Python function, which generates numbers of the forms 6*n + 1 and 6*n - 1 and uses Sage's primality test to check whether each generated number is prime. The goal is to measure the success percentage, i.e., the fraction of generated candidates that are actually prime, to give an idea of the degree of uncertainty.


sage: def primegenerator(ra):
....:     primes = []
....:     noprimes = []
....:     for i in range(3, ra):
....:         pr = 6*i + 1
....:         pr2 = 6*i - 1
....:         if is_prime(pr):
....:             primes.append(pr)
....:         else:
....:             noprimes.append(pr)
....:         if is_prime(pr2):
....:             primes.append(pr2)
....:         else:
....:             noprimes.append(pr2)
....:     print("number of primes", len(primes))
....:     print("number of no primes", len(noprimes))
....:     print("success percentage", len(primes)/(len(noprimes)+len(primes)))


I have then run the generator over ranges of different sizes and recorded the success percentage, i.e., how often the generated numbers turn out to be prime. The examples below show that the success percentage decreases as we widen the range of tested numbers.


sage: primegenerator(1000)

number of primes 777

number of no primes 1217

success percentage 0.38966900702106316

sage: primegenerator(10000)

number of primes 6050

number of no primes 13944

success percentage 0.30259077723316996

sage: primegenerator(100000)

number of primes 49091

number of no primes 150903

success percentage 0.24546236387091613

sage: primegenerator(1000000)

number of primes 412843

number of no primes 1587151

success percentage 0.2064221192663578

sage: primegenerator(10000000)

number of primes 3562108

number of no primes 16437886

success percentage 0.17810545343163603

sage: primegenerator(100000000)

number of primes 31324697

number of no primes 168675297

success percentage 0.15662348969870468
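This downward trend is what the prime number theorem predicts: below x = 6*ra there are about x/ln(x) primes, spread over roughly 2*ra = x/3 candidates of the form 6*n ± 1, so the expected success percentage is approximately 3/ln(6*ra). A quick estimate (plain Python) reproduces the trend and the order of magnitude of the measured values:

import math

# expected success percentage 3/ln(6*ra) for each tested range
for ra in (10**3, 10**4, 10**5, 10**6, 10**7, 10**8):
    print(ra, round(3 / math.log(6 * ra), 4))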


Consequently, this prime generation pattern does not allow us to obtain prime numbers with certainty, so we still need primality test functions.


Question 2: Can we factorize natural numbers efficiently if we know the pattern of prime numbers?


Theoretically, we should be able to reduce the time needed to factor a natural number into its primes because the pattern of the prime numbers is known.


We know that the RSA algorithm depends on the fact that factoring the modulus N is hard. I want to check whether this becomes easier if we use the prime pattern (6*n + 1) / (6*n - 1).


The modulus N is calculated by selecting two random prime numbers, p and q, as follows:


N=p*q 


N is public, but p and q must be kept secret. When p and q are big enough, it is not easy to factor N and recover p and q. But can the prime generation pattern make it easier?


According to the pattern, p and q each have the form (6*k + 1) or (6*k - 1). This means that N can take one of the following forms:


N=(6*k+1)(6*j+1)

Or

N=(6*k+1)(6*j-1) 

Or

N=(6*k-1)(6*j+1)

Or

N=(6*k-1)(6*j-1) 


We can break RSA if we can find k or j. 
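One further observation narrows these four cases down to two. Expanding the products shows that (6*k+1)(6*j+1) and (6*k-1)(6*j-1) are congruent to 1 modulo 6, while the two mixed forms are congruent to 5 modulo 6. So N mod 6 immediately tells us whether the two factors carry the same sign or opposite signs in the pattern, as the small sketch below illustrates (the value of N is the example modulus generated later in this post):

# (6k+1)(6j+1) = 36kj + 6k + 6j + 1  ->  N mod 6 == 1
# (6k-1)(6j-1) = 36kj - 6k - 6j + 1  ->  N mod 6 == 1
# (6k+1)(6j-1) = 36kj - 6k + 6j - 1  ->  N mod 6 == 5
N = 2814408221
print(N % 6)   # prints 5: the factors have opposite signs (here 6k-1 and 6j+1)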


Because N is public, it is possible to estimate the number of digits of p and q, since RSA implementations typically select p and q of similar size. So if N has 8 digits, I can expect p and q to have roughly 4 digits each.


Knowing the approximate number of digits of p and q allows me to bound the search loop. I have written the following function, which tries to factorise N based on an expected size of p and/or q.


Here is my function. 


sage: def myfunc10(n, size):
....:     # search every multiplier i with `size` digits, testing the two
....:     # candidate factors 6*i - 1 and 6*i + 1 against n
....:     l = 10**size - 1
....:     d = 10**(size - 1)
....:     for i in range(d, l + 1):
....:         if n % (6*i - 1) == 0 or n % (6*i + 1) == 0:
....:             print("your prime factor is either:", i*6 + 1, "or/and:", i*6 - 1)
....:             break
....:     else:
....:         print("No prime can be found")


It should be noted that there is no need to start the loop from 1, because the minimum value of i depends on the expected size of the factor. For example, if p equals 3606277 (7 digits), then i ≈ p/6 ≈ 601046, so we can start the loop from 10^5 rather than from 1. However, skipping the first 10^(size-1) values removes only about 10% of the iterations, so it does not significantly reduce the complexity of the function.


Let's test it. First, I generate a random prime p = 32099 (5 digits). q also has 5 digits and equals 87679, so N has 10 digits. I then call my function with size equal to 4 and then 5 (the multiplier i for a 5-digit factor can have 4 or 5 digits). As we can see, both factors are found.



sage: p=random_prime(1000000)

sage: p

32099

sage: q=random_prime(1000000)

sage: q

87679

sage: N=p*q

sage: N

2814408221

sage: myfunc10(N,4)

your prime factor is either: 32101 or/and: 32099

sage: myfunc10(N,5)

your prime factor is either: 87679 or/and: 87677


However, my function cannot determine the factors of large natural numbers, because its running time grows exponentially in the number of digits of the factor, on the order of O(10^size). In practice, p and q have at least 150 digits each (so N has at least 300 digits), and finding p or q would require a loop of roughly 10^150 iterations, which is not feasible.
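To put that figure in perspective, here is a rough back-of-the-envelope estimate, assuming (very generously) a billion trial divisions per second:

# a loop of 10^150 iterations at 10^9 iterations per second
iterations = 10**150
per_second = 10**9
seconds_per_year = 3600 * 24 * 365
years = iterations // (per_second * seconds_per_year)
print("roughly %.0e years" % years)   # roughly 3e+133 years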


Consequently, knowing the prime generation pattern does not make the factorization problem easy to solve.






References

