Originally published January 7, 2018 @ 2:43 pm
I have no intention of competing with GIMPS. To me this is a fun scripting exercise that may produce useful results beyond this initial application. Our goal is to take a bunch of integers and pick out the prime numbers. There are a couple of simple ways to spot a prime number from shell. One is to use perl (or python, or java, etc.) with the awesome “/^1?$|^(11+?)\1+$/” regex (Abigail, 1998).
perl -wle 'print "Prime" if (1 x shift) !~ /^1?$|^(11+?)+$/' [number]
Another way is to use the “factor” utility (part of the coreutils package) that decomposes integers into prime factors. You can wrap that command in a small script called “primefind”:
#!/bin/bash if [ ! "" ] ; then exit 1 ; else n="" ; fi if [ `BC_LINE_LENGTH=0 bc <<<"${n}" | factor | awk '{print NF}'` -eq 2 ] ; then echo "${n}" ; fi
Here are examples of using both methods to find all prime number between 1 and 2^4:
# for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do echo $i | primefind $i; done 2 3 5 7 11 13 # for i in $(seq 1 `BC_LINE_LENGTH=0 bc <<<"2^4"`) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done 2 3 5 7 11 13
As far as performance is concerned, the second method is considerably (about three times) faster. The next step is to increase the range, but here we may run into our first unexpected result: a Perl bug with the regex iteration limit. The bug is known, but seems to re-appear in various Perl versions, including some of the latest:
for i in $(seq 67867333 67867967) ; do if [ `perl -wle 'if ((1 x shift) !~ /^1?$|^(11+?)+$/) { print 0 } else { print 1 }' $i` -eq 0 ]; then echo $i; fi; done Complex regular subexpression recursion limit (32766) exceeded at -e line 1.
The Perl bug aside, this elegant regex approach will eventually grind to a halt as it tries every possible factorization as a pattern match. The regex makes n attempts to match a string of length n against each given pattern. And, as unary expansion, this gives us 2^(2n).
Consider the following shell script (geirha, 2008):
#!/bin/bash n=$1 function sqrt() { local i=1 j=1 n=$1 while (( i*i <= n )) do ((j=i, i++)) done echo "$j" } function isprime() { local i=3 n=$1 h=0 (( n >= 2 )) || return (( n == 2 || n == 3 )) && return (( n % 2 == 0 )) && return 1 h=$(( $(sqrt $n) + 1 )) while (( i < h )) do (( n % i == 0 )) && return 1 (( i += 2 )) done return 0 } if isprime $n ; then echo $n ; fi
While a bit more complex than the Perl regex, this approach is a lot faster for large integers. We can couple it with xargs to take advantage of multi-core CPUs:
time seq 67857333 67867967 | xargs -n1 -P`grep -c processor /proc/cpuinfo` /var/adm/bin/prime3.sh ... real 0m20.389s user 10m14.600s sys 0m23.247s
Going back to the factor utility, here’s a much faster implementation:
time seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | xargs -n100 -P`grep -c processor /proc/cpuinfo` factor | awk '{gsub(":","")}; NF==2{print $1}{}' ... real 0m0.238s user 0m0.460s sys 0m1.035s
You may get even better time using parallel and passing multiple arguments per line (option -N). The advantage of parallel over xargs in this case is the ability to distribute load across multiple compute nodes on the network.
seq `BC_LINE_LENGTH=0 bc <<<"2^15"` `BC_LINE_LENGTH=0 bc <<<"2^16"` | parallel --gnu --will-cite -j10% -P10% -N256 factor | awk '{gsub(":","")}; NF==2{print $1}{}'
With very large numbers, seq becomes useless:
seq `BC_LINE_LENGTH=0 bc <<<"2^64"` `BC_LINE_LENGTH=0 bc <<<"2^110"`|more 18446744073709551616 18446744073709551616 18446744073709551618 18446744073709551620 18446744073709551620 18446744073709551620 ...
Brace expansion will not work either:
echo {1..10} 1 2 3 4 5 6 7 8 9 10 echo {18446744073709551644..18446744073709551656} {18446744073709551644..18446744073709551656}
Shell arithmetic is out of the question as well (and, contrary to popular belief, the answer to life, the universe and everything is not 42, but it’s close.):
i=18446744073709551656 echo $i 18446744073709551656 ((i++)) echo $i 41
These are all unfortunate consequences of the range limitation of a 64-bit signed long int. To a point, the answer is to stick with bc, until you hit this limit:
BC_LINE_LENGTH=0 bc <<<10^10^10 Runtime error (func=(main), adr=14): exponent too large in raise
The dc doesn’t work either:
dc <<< '10 10000000000^pq' Runtime error: exponent too large in raise 1
And we can’t use Bash exponentiation:
var=$((10**10**10)) echo $var 0
Don’t be angry with Unix: considering that the observable universe consists of about 1011 galaxies, 1021 stars, 1078 atoms, and 1088 photons, the 10^10^10 limit of Bash seems rather generous. Luckily, here is Hypercalc, either Perl or JavaScript version. The large-number limit of Hypercalc in Knuth up-arrow notation is 10↑↑(1010):
yum -y install perl-Time-HiRes ; mkdir -p /var/adm/bin ; wget -O /var/adm/bin/hypercalc.pl "http://mrob.com/pub/comp/hypercalc/hypercalc.txt" ; chmod 755 /var/adm/bin/hypercalc.pl ; ln -s /var/adm/bin/hypercalc.pl /usr/bin/hypercalc
But what to do with it is a subject for later.
According to Wikipedia, the largest known prime number is 274,207,281 − 1 found by GIMPS in January of 2016. This is still well within bc and factor limitations, so, theoretically at least, running the following command will eventually confirm this finding:
BC_LINE_LENGTH=0 bc -l <<< 2^74207281-1 | factor | awk '{gsub(":","")}; NF==2{print "prime"}{}'
However, how long would this take? Well, 274,207,281 − 1 contains 22,338,618 digits, which puts it between 10^10^7 and 10^10^8 in the following attempt to estimate approximate time required to run the command above:
for i in 1 2 3 4 5 6 7 8 9 ; do echo "------------------" ; echo "10^10^$i" ; time -p BC_LINE_LENGTH=0 bc -l <<< 10^10^$i | factor >/dev/null 2>&1 ; done
The time I got for each iteration on a 2.9GHz Xeon E5-2690 and 96GB of RAM is as follows:
x seconds 10^10^1 0.01 10^10^2 0.01 10^10^3 0.01 10^10^4 0.03 10^10^5 1.11 10^10^6 101.66 10^10^7 yawn... 10^10^8 yawn... 10^10^9 yawn...
As you can see, I had no patience to get past 10^10^6. Still, based on those few data points and assuming exponential growth (a good assumption in this case), I got y = 1.151806818·10-8 e3.815837901 x , where x is 10^10^x and y is estimated time to execute “bc <<< 10^10^x | factor”. Using this equation, we can see about how long it would’ve taken my computer to confirm that 274,207,281 − 1 is in fact a prime number:
x seconds 10^10^7 4589.45 10^10^8 208429.00 10^10^9 9465770.00
So, I would have needed somewhere between 1.3 hours to 2.4 days, but closer to hours. And that’s just to confirm a known prime number. As you may imagine, actually finding one in that range would require lots of CPUs or an equivalent amount of luck.
So what about the elusive 10^10^10 – how long would it take me to confirm a prime number in that range, when one is inevitably found? Well, the Wolfram Alpha I’ve been using to solve the above equation also fails at that point and probably for the very same reason. The aforementioned Hypercalc, however, has no such problem:
hypercalc "1.151806818*10^(-8)*e^(3.815837901*10)/60/60/24/365" 13.631588907193
So, just over thirteen-and-a-half years.
Experienced Unix/Linux System Administrator with 20-year background in Systems Analysis, Problem Resolution and Engineering Application Support in a large distributed Unix and Windows server environment. Strong problem determination skills. Good knowledge of networking, remote diagnostic techniques, firewalls and network security. Extensive experience with engineering application and database servers, high-availability systems, high-performance computing clusters, and process automation.