For the past eight months or so, I've been working part time as a contributor to the OpenDreamKit project , funded by the EU through their Horizon2020 initiative.
One of the deliverables we had for that project was a parallel version of the quadratic sieve for factoring large integers (typically in the 20-90 decimal digit range).
The executive summary is that this work has now been committed to the Flint repository  and is usable (see the qsieve_factor function in the qsieve module).
The starting point was an old implementation of the quadratic sieve in Flint which only worked for small integers, and which was very fragile, written mostly by me.
A Google Summer of Code student Nitin Kumar did some preliminary work on it. I've finished off that implementation and parallelised it, tuned it and made it more robust, as part of the OpenDreamKit project.
- trial division
- Euler's method
- Fermat's method
- One line factor
- Lehman's algorithm
- p-1 method
- p+1 method
- Elliptic Curve Method (ECM)
- Pollard's rho
- Quadratic sieve
- General number field sieve
- Special number field sieve
In the range 20-90 decimal digits, there are two methods that one wishes to use. If the number has small factors that were not found by trial division or Pollard's Rho, there is the ECM method. But if one is fairly sure that all that remains are two large factors prime factors, i.e. n = pq, then one usually makes use of the quadratic sieve method.
Difference of squares
As for many factoring methods, the quadratic sieve ultimately expresses the number n to be factored
as a difference of squares n = a^2 - b^2 = (a - b)(a + b).
As can be seen from the difference of squares formula, this factors n if a - b is not 1.
The quadratic sieve makes use of a variant of this idea, namely to find two integers a and b such that a^2 = b^2 mod n.
If a and b are not the same modulo n (and differ by more than 1 modulo n) then we have (a - b)(a + b) = wn for some integer w.
This doesn't guarantee we factor n, but about half the time it will do so.
Instead of looking for values a such that a^2 is a square of another integer b modulo n, we look for many values of a such that a^2 has lots of small prime factors. We then multiply these together to build up a square.
For example, if
a1^2 = 2*3*7^2 mod n
a2^2 = 2*7 mod n
a3^2 = 3*7 mod n
then (a1*a2*a3)^2 = (2*3*7^2)^2 mod n, and we can take a = a1*a2*a3 and b = 2*3*7^2. Then we will have a^2 = b^2 mod n as required.
Expressions such as the ones above are called relations, and the small primes that appear on the right side of those relations make up what is called a factor base.
The idea is to pick a range, say [2, 10000] which the factor base primes are allowed to be in, and then try random values ai^2 modulo n to see if they factorise fully over the factor base.
Note that if all the values a1, a2, etc are taken in the range [sqrt(n), sqrt(2n)) then their square will be in the range [n, 2n), which means that their squares can be reduced modulo n simply by subtracting n.
In other words, we consider the values a1^2 - n, a2^2 - n and so on.
The quadratic sieve
The quadratic sieve is a practical improvement on Dixon's method.
Firstly, we note that if a1^2 - n is divisible by the factor base prime p, then so is (a1 + p)^2 - n. Secondly, we note that there is only one other value i less than p for which (a1 + i)^2 - n is divisible by p.
In short, if I want to find all the values i in some range [-M, M] say for which i^2 - n is divisible by a factor base prime p, I just need to find the two values i in the range [0, p) which work, and all the others will be a multiple of p away from those two.
This leads to a useful algorithm that can be implemented on a computer. First find the two square roots of n modulo p, to get the initial two values of i, then set an array up in memory, with all the values in the range [-M, M]. From the two initial values of i, stride off by steps of p in each direction and mark off all the other values of i for which i^2 - n will be divisible by p.
If we do this for some big array V = [-M, M] for each of the factor base primes p in the factor base range, F = [2, N] say, then we can just look at all the entries in the array A and see which ones correspond to values that are highly divisible by factor base primes.
In other words, we can find lots of values a1, a2, a3, etc, in V = [-M, M] such that a1^2 - n, a2^2 - n, a3^2 - n are fully divisible by factor base primes.
This process of striding off by steps of p in a big array V = [-M, M] to cheaply mark off all entries corresponding to values divisible by p is known as sieving.
There are many, many improvements that can be made to the naive quadratic sieve above. We list some of them here:
The naive sieve essentially looks for solutions to x^2 - n = 0 mod p, where p is a factor base prime. But note that if x is just a little larger than sqrt(n) the value x^2 - n will be very small. But as soon as x gets too big, x^2 - n will also be large. The probability that it completely factors over the factor base F = [2, N] diminishes, and it becomes harder and harder to find relations.
To get around this, we use polynomials other than x^2.
There's a few different methods of doing this. The first method is to choose only those values of x for which x^2 - n is divisible by a known prime q, say. These lie in an arithmetic progression.
We choose q to be larger than the factor base primes. Although this makes the corresponding values of x^2 larger, we know in advance that q^2 will be a factor, which we can remove, bringing things down to about the same size as for the naive x^2 - n case.
By switching the value of q, we get to look for relations amongst fairly small values, for each different value of q.
A second technique is to use polynomials (Ax + B)^2 - n for different values of A and B.
If we choose the values A and B such that n = B^2 - AC for some integer c, then the polynomials become A^2x^2 + 2ABx + B^2 - B^2 + AC = A(Ax^2 + 2Bx + C). In other words, we are guaranteed a factor of A that we can remove.
The game then becomes finding relations amongst small values of Ax^2 + 2Bx + C.
There are heuristics for exactly how large the coefficient A should be to give the optimal speed in the algorithm.
Sometimes it turns out to be more efficient to find a factor of kn for some small multiple k than for the original number n we are trying to factor. The Knuth-Schroeppel algorithm helps find such a multiplier k.
To avoid just finding a bad factorisation, we always take the greatest common divisor with n instead of kn in the final step of the algorithm. For example, if we have kn = a^2 - b^2, we take the greatest common divisor of a - b with n to avoid getting an extraneous factor of k in our final factorisation.
Self initialisation (hypercube method)
In switching the polynomials Ax^2 + 2Bx + C mentioned above, we have to find solutions to Ax^2 + 2Bx + C modulo p for each factor base prime. This can be quite expensive every time we switch polynomial.
The self initialisation method saves additional time by giving us many cheap switches of B for each value of A that is used.
The way this is done is to make the value A a product of small factor base primes. If there are s such factor base primes, then it turns out that there are 2^s values of B that can be used and which are cheap to compute. Moreover, the solutions modulo p that we need to compute for each factor base prime p can be computed from one another very cheaply, using some precomputed information.
Large prime variant
We can allow partial relations which are a product of factor base primes and one larger prime. We compute many of these partial relations, and whenever two are found with the same large prime, we multiply the partial relations together, which gives us the equivalent of one full relation.
With some graph theoretic techniques, one can also allow two large primes per partial relation. However, this technique is only useful for very large factorisations. It is even possible to use three large primes, but this is typically only useful on very large clusters for extremely large factorisations.
Small prime variant
It turns out that a nontrivial proportion of the time spent sieving is used to sieve with very small factor base primes p. These hit the sieve much more often than larger primes and so take a disproportionate amount of time for the value they contribute to the sieving process.
The small prime variant skips sieving with these primes and only checks divisibility by these primes for factorisations that are very close to being relations.
Block Lanczos linear algebra
The problem of finding a combination of relations to multiply together to give a square can be achieved by representing the problem as a linear algebra problem over GF2. If a prime occurs to an odd power in a relation, the appropriate location in the GF2 matrix is set to 1, otherwise it is set to 0. The problem then becomes one of looking for kernel vectors of this matrix over GF2.
The Block Lanczos algorithm can be used to find such kernel vectors quite efficiently.
This method is similar to the self-initialisation method, except that the coefficient A of the polynomial is chosen to have one large factor q, which is increased every time the polynomial is switched. This helps avoid duplicate relations in an easy way and lets one use the same product of small primes over and over in the coefficient A.
This is another technique designed to help with the selection fo the coefficients A of the polynomials. The possible factor base primes that could make up the factors of the coefficient A are divided into two sets, those with odd indices and those with even indices in the array of factor base primes.
All but one of the factors is taken from the first set, then the final prime is chosen from the other set such that the product gives the most optimal value of A.
There are many other tricks that can be used to cut the time used to test whether something that looks like a relation really is a relation. There are also lots of strategies for combining large prime partial relations into full relations.
There are also numerous tricks associated with the processor cache. Rather a lot of precomputed information is stored for each factor base prime. The sieve interval itself may also be larger than the processor cache. Things can be dramatically sped up by breaking such large blocks of information into smaller blocks and processing small blocks at a time. Each small sieve block is processed against each small block of factor base primes, and so on.
These days, there are also many opportunities to use SIMD intrinsics, e.g. to search the sieve interval for highly composite values which might be relations.
We also make use of precomputed inverses to do divisions and reductions modulo small values, e.g. by factor base primes.
Parallelising the relation sieving
The relation sieving in the quadratic sieve can be relatively easily parallelised. As the sieving per polynomial is largely independent of the sieving for any other polynomial (with the exception of the file handling for the large prime partial relations), it is quite easy to parallelise the sieving portion of the quadratic sieve.
In Flint we use OpenMP to parallelise the relation sieving. We combine the Carrier-Wagstaff and Bradford-Monagan-Percival (and of course the self-initialisation) methods to create polynomials that can be switched easily without too many duplicate relations. We then sieve for different polynomials in different threads.
There are many points at which the quadratic sieve can either abort early, or fail, especially if tuning values aren't set quite right. In our Flint implementation, if we fail to find a factor of a number n, we restart the sieve with a larger factor base. This shouldn't ever happen in practice if the tuning values are set just right, but it is possible in theory.
We also increase the size of the factor base if something goes wrong, for example, we ran out of polynomials before generating enough relations, and so on. Again, these problems should not happen in practice, but they do happen if wildly incorrect tuning values are supplied.
The code to implement all of the above is now merged into Flint and has been really thoroughly tested and tuned for number up to about 90 digits.
In summary, the implementation is relatively competitive from about 130 bits onward.
Some tables giving timings for one and multiple threads are given in our final report for the EU  which is included at the start of the GitHub issue for that project.
It takes about 10 years to really implement a world class quadratic sieve implementation. We would have a long way to go if our aim was to have world beating performance. But for a general purpose library, what we have right now is actually pretty reasonable. There are lots of things we could improve. These include:
- Further cache efficient handling of large arrays in the sieve
- Implementing double and triple large prime variants
- Parallelising the file handling for large relations
- Implementing a quadratic sieve which handles small factorisations (50-130 bits) efficiently