Hello again everybody,
Have not posted around in a while. Things are getting a little busy as the first week of semester went by. In the meantime, I blogged about the curious properties of Domino Tilings and their Markov Chain connection which is another subject I am enrolled in this semester. Hope, that account is also accessible to most of you; I would definitely suggest you to have a look at that post - the combinatorial ramifications for Markov Chains are great. And, if you ever come to Georgia Tech, you should definitely consider sitting through that class. Prof Randall is teaching that course and she is a great teacher.
So, anyways today I am going to walk you through some more problems in Lattice Based Cryptography. The blog is certainly lagging a little behind what I had thought; I hope I would get around that within this week, and if not, the week next to this one.
So, lets tackle another problem from Homework 2 of Oded Regev's.
Q-5) Find a basis $b_1, b_2,...,b_n$ such that after we apply one reduction step of the LLL algorithm to it, the maximum length of a vector in it increases [even by as much as $\Omega(\sqrt n)$].
(This is Roy Kasher's solution. In retrospect, this solution seems very easy; I think it indeed is - only I could not find it [:(] )
Ans) So, what do you think? Any matrix (rather, set of basis vectors) that springs to your mind? Notice that you are gonna apply LLL algorithm's first step.
That would impact the length of all the basis vectors. So, we would try to keep some vectors in our original basis which have got loads of zero components (not all). We hope that the steps of LLL that involve "reducing" vector $b_i$ in the sense of subtracting appropriate multiples of vector $b_j$ for $j < i$ will jitter these non-zero entries and these jitters combined would make some vector really large.
This rough intuitive idea is captured readily by some experiments in the following basis.
$\begin{pmatrix}1&3/4&0&...&0\\0&1&3/4&...&0\\...&...&...&...&...\\...&...&...&...&...\\0&0&...&1&3/4\end{pmatrix}$
Now the nice thing about this matrix is it is (nearly) in Gram Schmidt Orthogonal form already - after all the GSO matrix in this case is the identity matrix. Length of all the columns are $\Theta(1)$.
After LLL reduction step, the vector $b_n$ undergoes some intersting gyrations which we gleefully watch (because it solves the problem). The $i^{th}$ column now becomes $\begin{pmatrix} \pm1/4\\ \pm1/4\\...\\ \pm1/4 \\1\\0\\0\\...\\0\end{pmatrix}$ where $1$ is present in the $i^{th}$ location.
Q-6) Show an algorithm that solves SVP exactly in time $2O(n^2).poly(D)$ where $n$ is the rank of the lattice and $D$ is the input size.
The problem comes with a hint which I think is fair to be reproduced here.
Hint: show that if we represent the shortest vector in an LLL-reduced basis, none of the coefficients can be larger than $2^{cn}$ for some $c$.
Again, I had to look up Kasher's solution, though this time I had a few ideas of my own. I am just giving myself some solace that I was probably close to the solution myself.
Ans) Well, I am a bit tired. Please allow this post to get a little more delayed.
Monday, January 18, 2010
Wednesday, January 6, 2010
Drills on LLL algorithm
Hello again,
Today I intend to discuss some problems from Regev's 2nd homework on the subject. I am indebted to Oded Regev, Nir Bitansky and Roy Kasher for these solutions. A special thanks to you all for letting me use your solutions. I will give due credit to the solvers when I use their solutions (and not mine).
So, first lets prove that a $\delta$-LLL reduced basis with $\delta = 3/4$ satisfies the following properties. (This is the $4th$ problem in the homework. The questions are in bold and are italicised).
4-a) Prove the following
$||b_1|| \leq 2^{(n-1)/4}*(det \Lambda)^{1/n}$
Do you see how to do it? Easy. Just make use of fact that $||b_1|| = ||b_1^*||$.
And that $(\delta-1/4)*||b_i^*||^2 \leq ||b_{i+1}^*||^2$
For $\delta = 3/4$, this means that $||b_i^*||^2 < 2* ||b_{i+1}^*||^2$.
Extending this, we quick get $||b_1|| = ||b_1^*|| \leq 2^{(i-1)/2}||b_i^*|| \forall i$
In particular, $||b_1^*|| \leq ||b_1^*||$
$||b_1^*|| \leq \sqrt2||b_2^*||$
and, $||b_1^*|| \leq 2||b_3^*||$
and so on upto $||b_1^*|| \leq 2^{(n-1)/2}||b_n^*||$
Multiplying all inequalities, we get $||b_1^*||^n = ||b_1||^n \leq det(\Lambda)*2^{(n*(n-1))/4}$
$\Rightarrow ||b_1|| \leq det(\Lambda)^{1/n}*2^{((n-1))/4}$
Hence proved.
4-b) Next consider this question
For any $1 \leq i \leq n, ||b_i|| \leq 2^{(i-1)/2}*||b_i^*||$
Why is that true? You remember the representation of basis vectors in Gram-Schmidt representation, right?
Let me write it down for your convenience.
$\begin{pmatrix}||b_1*||&\mu_{2,1}||b_1*||&...&\mu_{n,1}||b_1*||\\0&||b_2*||&...&\mu_{n,2}||b_2*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n*||\end{pmatrix}$
I hope you can see why the solution write itself. Pick $b_i$'s representation from this matrix.
You know that $||b_i||^2 = ||\mu_{i,1}b_1^*||^2 + ||\mu_{i,2}b_2^*||^2 + ... + ||\mu_{i,i-1}b_{i-1}^*||^2 + ||b_i^*||^2$
But, the Basis is $3/4$-LLL reduced. From the first condition on a base to be LLL reduced, we know that all $|\mu_{i,j}| \leq 1/2$
Thus, $||b_i||^2 \leq 1/4||b_1^*||^2 + 1/4||b_2^*||^2 + ... + 1/4||b_{i-1}^*||^2 + ||b_i^*||^2$
And, we also notice that $||b_l^*||^2 \leq 2^{(m-l)/2} ||b_m^*||^2$ for $l \leq m$. Thus, we have
$||b_i||^2 \leq ||b_i^*||^2 + \sum_{j=1}^{j=i-1}2^{i-j-2}||b_j^*||^2$
and you can work out that this implies the inequality in question.
4-c) Prove the following
$\Pi_{i=1}^{i=n}||b_i|| \leq 2^{(n*(n-1)/4)}*det(\Lambda)$
DO you remember the statement of 4-b). This one follows directly. (why?)
The homework also says the following -
Remark: the quantity $\Pi_{i=1}^{i=n}||b_i||/det(\Lambda)$ is known as the orthogonality defect of the basis; to see why, notice that it is 1 iff the basis is orthogonal; it can never be less than one by Hadamard’s inequality.
4-d) Prove the following
For any $1 \leq i \leq j \leq n$, $||b_i|| \leq 2^{(j-1)/2}||b_j^*||$
Well what do you think? What can you tell me using 4-b). ||b_i|| \leq 2^{(i-1)/2}*||b_i^*||$, right? And what do you know about the relation between $||b_i^*|| and ||b_j^*|| for $i \leq j$.
$||b_i^*|| \leq 2^{(j-i)/2}||b_j^*||$
Combine these two to get
$||b_i|| \leq 2^{(j-1)/2}||b_j^*||$ which settles the question.
4-e) Here is another. Prove
For any $1 \leq i \leq n$, $\lambda_i(\Lambda) \leq 2^{(i-1)/2}||b_i^*||$
Hmmm...what should we do now. Perhaps get some sleep. But this is easy. Lets finish $4^{th}$ problem altogether and then sleep (or take a break or whatever).
So, whats your idea? You have got to say something about the $i_{th}$ successive minima. It is less than some multiple of $||b_i^*||$. How to create the desired terms? Maybe you could show a relation between $\lambda_i$ and $||b_i||$ and then use our lovely 4-b)
That looks a bad prospect. We do not as of yet know of a direct connection between the two. Lets make some observations and keep the definition of $ith$ successive minima in your head.
You know that $\forall j \leq i$, $||b_j|| \leq 2^{(i-1)/2}||b_i^*||$ from 4-d). Thus, all these $i$ vectors (which are clearly independent) are less than the claimed RHS. In particular, the RHS is an upper bound on the length of $ith$ successive minima - which is precisely the statement we set out to prove.
4-f) Prove that
For any $1 \leq i \leq n$, $\lambda_i(\Lambda) \geq 2^{-(n-1)/2}||b_i||$
Hmm..I was not able to solve this problem on my own. Below, I invoke Roy Kasher's solution. He notes that
"By LLL property, for $i \eq j$, $||b_i^*|| \leq 2^{(j-i)/2}||b_j^*||$. That is, $||b_j^*|| \geq 2^{(i-j)/2}||b_i^*||$. Using 4-b), we obtain $||b_j^*|| \geq 2^{-(j-1)/2}||b_i||$."
So far so good. Now he fits in the missing piece (or the piece that I missed).
From previous homework (or in our case previous post), we know that $\lambda_i \geq min_{j=i...n}||b_j*||$.
Now everything falls into place. As Kasher goes onto note,
"$\lambda_i \geq min_{j=i...n}||b_j*|| \geq min_{j=1}^n 2^{-(j-1)/2}||b_i|| = 2^{-(n-1)/2}||b_i||$"
There are some more problems to come up. Wait till I get around to do that.
Thanks and Have a Good day
Today I intend to discuss some problems from Regev's 2nd homework on the subject. I am indebted to Oded Regev, Nir Bitansky and Roy Kasher for these solutions. A special thanks to you all for letting me use your solutions. I will give due credit to the solvers when I use their solutions (and not mine).
So, first lets prove that a $\delta$-LLL reduced basis with $\delta = 3/4$ satisfies the following properties. (This is the $4th$ problem in the homework. The questions are in bold and are italicised).
4-a) Prove the following
$||b_1|| \leq 2^{(n-1)/4}*(det \Lambda)^{1/n}$
Do you see how to do it? Easy. Just make use of fact that $||b_1|| = ||b_1^*||$.
And that $(\delta-1/4)*||b_i^*||^2 \leq ||b_{i+1}^*||^2$
For $\delta = 3/4$, this means that $||b_i^*||^2 < 2* ||b_{i+1}^*||^2$.
Extending this, we quick get $||b_1|| = ||b_1^*|| \leq 2^{(i-1)/2}||b_i^*|| \forall i$
In particular, $||b_1^*|| \leq ||b_1^*||$
$||b_1^*|| \leq \sqrt2||b_2^*||$
and, $||b_1^*|| \leq 2||b_3^*||$
and so on upto $||b_1^*|| \leq 2^{(n-1)/2}||b_n^*||$
Multiplying all inequalities, we get $||b_1^*||^n = ||b_1||^n \leq det(\Lambda)*2^{(n*(n-1))/4}$
$\Rightarrow ||b_1|| \leq det(\Lambda)^{1/n}*2^{((n-1))/4}$
Hence proved.
4-b) Next consider this question
For any $1 \leq i \leq n, ||b_i|| \leq 2^{(i-1)/2}*||b_i^*||$
Why is that true? You remember the representation of basis vectors in Gram-Schmidt representation, right?
Let me write it down for your convenience.
$\begin{pmatrix}||b_1*||&\mu_{2,1}||b_1*||&...&\mu_{n,1}||b_1*||\\0&||b_2*||&...&\mu_{n,2}||b_2*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n*||\end{pmatrix}$
I hope you can see why the solution write itself. Pick $b_i$'s representation from this matrix.
You know that $||b_i||^2 = ||\mu_{i,1}b_1^*||^2 + ||\mu_{i,2}b_2^*||^2 + ... + ||\mu_{i,i-1}b_{i-1}^*||^2 + ||b_i^*||^2$
But, the Basis is $3/4$-LLL reduced. From the first condition on a base to be LLL reduced, we know that all $|\mu_{i,j}| \leq 1/2$
Thus, $||b_i||^2 \leq 1/4||b_1^*||^2 + 1/4||b_2^*||^2 + ... + 1/4||b_{i-1}^*||^2 + ||b_i^*||^2$
And, we also notice that $||b_l^*||^2 \leq 2^{(m-l)/2} ||b_m^*||^2$ for $l \leq m$. Thus, we have
$||b_i||^2 \leq ||b_i^*||^2 + \sum_{j=1}^{j=i-1}2^{i-j-2}||b_j^*||^2$
and you can work out that this implies the inequality in question.
4-c) Prove the following
$\Pi_{i=1}^{i=n}||b_i|| \leq 2^{(n*(n-1)/4)}*det(\Lambda)$
DO you remember the statement of 4-b). This one follows directly. (why?)
The homework also says the following -
Remark: the quantity $\Pi_{i=1}^{i=n}||b_i||/det(\Lambda)$ is known as the orthogonality defect of the basis; to see why, notice that it is 1 iff the basis is orthogonal; it can never be less than one by Hadamard’s inequality.
4-d) Prove the following
For any $1 \leq i \leq j \leq n$, $||b_i|| \leq 2^{(j-1)/2}||b_j^*||$
Well what do you think? What can you tell me using 4-b). ||b_i|| \leq 2^{(i-1)/2}*||b_i^*||$, right? And what do you know about the relation between $||b_i^*|| and ||b_j^*|| for $i \leq j$.
$||b_i^*|| \leq 2^{(j-i)/2}||b_j^*||$
Combine these two to get
$||b_i|| \leq 2^{(j-1)/2}||b_j^*||$ which settles the question.
4-e) Here is another. Prove
For any $1 \leq i \leq n$, $\lambda_i(\Lambda) \leq 2^{(i-1)/2}||b_i^*||$
Hmmm...what should we do now. Perhaps get some sleep. But this is easy. Lets finish $4^{th}$ problem altogether and then sleep (or take a break or whatever).
So, whats your idea? You have got to say something about the $i_{th}$ successive minima. It is less than some multiple of $||b_i^*||$. How to create the desired terms? Maybe you could show a relation between $\lambda_i$ and $||b_i||$ and then use our lovely 4-b)
That looks a bad prospect. We do not as of yet know of a direct connection between the two. Lets make some observations and keep the definition of $ith$ successive minima in your head.
You know that $\forall j \leq i$, $||b_j|| \leq 2^{(i-1)/2}||b_i^*||$ from 4-d). Thus, all these $i$ vectors (which are clearly independent) are less than the claimed RHS. In particular, the RHS is an upper bound on the length of $ith$ successive minima - which is precisely the statement we set out to prove.
4-f) Prove that
For any $1 \leq i \leq n$, $\lambda_i(\Lambda) \geq 2^{-(n-1)/2}||b_i||$
Hmm..I was not able to solve this problem on my own. Below, I invoke Roy Kasher's solution. He notes that
"By LLL property, for $i \eq j$, $||b_i^*|| \leq 2^{(j-i)/2}||b_j^*||$. That is, $||b_j^*|| \geq 2^{(i-j)/2}||b_i^*||$. Using 4-b), we obtain $||b_j^*|| \geq 2^{-(j-1)/2}||b_i||$."
So far so good. Now he fits in the missing piece (or the piece that I missed).
From previous homework (or in our case previous post), we know that $\lambda_i \geq min_{j=i...n}||b_j*||$.
Now everything falls into place. As Kasher goes onto note,
"$\lambda_i \geq min_{j=i...n}||b_j*|| \geq min_{j=1}^n 2^{-(j-1)/2}||b_i|| = 2^{-(n-1)/2}||b_i||$"
There are some more problems to come up. Wait till I get around to do that.
Thanks and Have a Good day
Sunday, January 3, 2010
LLL algorithm and its impact on Lattice Problems
(Please note that in this blog as well, when I say that consider a vector $\lambda_i$, I actually want you to consider a vector of length $\lambda_i$. And by the way I normally mean lengths in $l_2$ norm. Can you please keep this information in your mind as you go along; otherwise things might seem a little, or way too much, strange)
So, finally we are inching towards our target. Next, we seek to attack the big guns - the LLL algorithm. It has a bearing upon nearly all aspects of lattice based cryptography. We will in particular investigate the connection with SVP and another problem whose name has not been mentioned so far in this blog - CVP or the Closest Vector Problem.
This post will majorly delve upon 3 different topics. First, we would talk about LLL algorithm, the ideas leading to its development, the notion of reduced basis and what is it good for. Next, we would talk a bit about SVP and CVP sketching a little more background about the problems. Finally, we will see what LLL has to say about these problems. These are the 3 major topics discussed here but they are not in any particular order. In fact, there will be a zig-zagging through these 3 major areas throughout this post.
So, lets get started. First I will tell develop a little of the historical context behind the development of the LLL algorithm. In early 1980's, H.W. Lenstra (not HomeWork Lenstra!) was working on a problem which asked him to determine whether or not a given triangle (in $\RR^2$) with rational end points contained a lattice point. He began with a vague idea of saying that "fatter triangles will contain" a lattice point; "skinnier ones might not" and developed quite a wonderful solution.
In his own words "The solution essentially consists of denying that "special" triangles exist. If the triangle K looks a bit weird, why not apply a nonsingular linear transformation τ such that the triangle τ[Κ] looks better?" To be specific, he chooses τ so that the transformed triangle is equilateral. (Later on, he changed it to a right angled isosceles triangle) (I am just copy/pasting notes from his original paper - its extremely readable). He goes on to say "Clearly, $K$ contains an element of $Z^2$ if and only if the new triangle τ[Κ] contains an element of the lattice $L$ = τ($Z^2$) . If $e_1$, $e_2$ are the standard basis vectors of $\RR^2$, then τ$(e_1)$, τ$(e_2)$ form a basis for $L$ in the sense that $L = Z*τ(e_1) + Z*τ(e_2)$."
Lenstra, thus notes "The problem has now been shifted from the triangle to the lattice." You just need to find the basis for this lattice which can be done in quite a few ways in poly time. (Maybe this statement needs explanation. See previous posts for this). Then he invokes the idea of covering radius which basically tells him the minimum radius a circle sitting in the plane with a lattice point at its center must have so that the series of all such circles spans the $2-d$ plane. The crucial thing is - this covering radius can be calculated and we are okay with using any polynomial upper bound on the covering radius.
Thus, after having found one such value, Lenstra just checks if the triangle is big enough to inscribe a circle with radius greater than equal to the above value. In case, the triangle is big enough - good, end of story, the triangle did contain a lattice point. In case, the triangle is not big enough, there are a series of lines parallel to the one basis vector hitting the triangle. There are a polynomial number of such lines. Just check for these lines, one at a time, if someone among them manages to get one of their lattice points inside the triangle.
This idea can be immediately generalised to higher dimensions which is where the rudimentary steps towards LLL first developed from. The problem goes like this - given a convex body (defined by a set of linear inequalities) in higher ($n \geq 3$) dimensions you are asked to determine if it contains a $n$-dimensional lattice point.
Lenstra's idea goes like - transform the convex body into a "cuter convex body". Look for lattice points with the given basis inside the transformed body the following way. Find the covering radius (or some polynomial upper bound on covering radius). Determine whether the transformed body can inscribe a $n-dimensional$ sphere. If it can, some lattice point is there inside the body. If it cannot, the body is hit by span of vectors apart from $b_1$ (which form a hyperplane). These hyperplanes are equi-spaced at gaps of $b_1$. Just see whether some of these hyperplanes contain the desired lattice point.
What led to the discovery of the algorithm was, of course, the transformation step. Lenstra found that in effect, his approach kept on improving the angle between the incident vectors by pushing them closer to being orthogonal. He wrote to Lovasz about his concerns who later came up with the classic reduction scheme and its equally amazing analysis.
In fact, in the two dimensional case, this is precisely what the algorithm did. It turns out this two dimensional case was known long before. It had been solved by the prince of Mathematics, Carl Friedrich Gauss. It is from Gauss's algorithm that we begin our tour of this topic. Our target is to "reduce" a lattice basis to another equivalent basis which makes some questions easier to answer.
What reduction scheme to follow depends on the situation you are in i.e., on the problem you are trying to solve. No one reduction scheme works "exactly" for all cases. So, we compromise and stay happy with LLL reduction which is "reasonable" to some extent in that it ensures that you do not loose too much. Gauss's algorithm, which we will outline shortly, is just a special case of LLL reduction scheme.
But first let me point out a few reduction schemes in advance which do not work. As you now know, the successive minima vectors do not determine a Basis for the lattice in general. In two-dimensional case, however, they do. So, basing the definition on the basis of successive minima vectors is out. Similarly basing it on orthogonalized set of vectors is out. Also, we have a bad luck in that a mutually orthogonal system of vectors need not be short enough as some problems (SVP or CVP) require. LLL algorithm however finds a nice way out which we will detail a little more shortly; but first its time we turned to Gauss's algorithm. (In what follows, we assume that we are talking $l_2$ norm; though Gauss's algorithm is easily extendable to other norms too.)
Gauss starts off by defining a condition for "reducedness" in 2 dimensions for the given basis vectors $b_1$ and $b_2$ which basically says that the basis is reduced if the diagonals of the parallelepiped determined by it are both larger than its sides. That is, $||b_1||, ||b_2|| \leq ||b_1 + b_2||, ||b_1 - b_2||$.
The definition is rooted in the fact that we know that for $2$ dimensional case any lattice Basis can be reduced to the basis with successive minima as its basis vectors. Now I will make clear the relation between reduced base and successive minima of the lattice.
In the given setting ($2-dimensional$ lattices), a basis is reduced if and only if the basis vectors are of length $\lambda_1$ and $\lambda_2$ in some order. How do we prove this statement? Well, you just need to keep the definition of reduced basis in mind and proceed straightaway for the forward direction. Do you see the forward direction proof yet? It just says that if your basis is $b_1$ and $b_2$ with $||b_1|| = \lambda_1$ and $||b_2|| = \lambda_2$ then this basis is reduced. Its too clear - the above statement looks like the proof itself.
Its backward direction which is some sport. It says that if you have a reduced basis, then the vectors $b_1$ and $b_2$ in it better have lengths $\lambda_1$ and $\lambda_2$. Why is that? First write out any lattice vector $Bx$ in this basis as $x_1*b_1 + x_2*b_2$ where $x_1$ and $x_2$ are integers. Assume without loss of generality that $||b_1|| \leq ||b_2||$. You want to show that any vector $Bx$ is larger than either of $b_1$ and $b_2$.
For this, you will also need the following easy to prove lemma that if $||x|| \leq ||x + y||$, then $||x + y|| \leq ||x + \alpha*y||$ where $\alpha > 1$. The lemma basically says that as you move further out along a line, so that a little motion increases your distance from some point $P$, further motion in that direction will increase your distance more from $P$.
Now, out of my laziness, and partly because I think I have more important things to discuss here in this blog, I would not complete this proof (unless you want me to).
Now we finally come to Gauss's algorithm. We will follow it up with its complexity analysis. Gauss's idea is very reminiscent of Euclidean algorithm for GCD calculation. You are invited to try spotting the similarities. The algorithm can be seen as a greedy algorithm which iteratively "shrinks" the longest vector in the given basis to find an equivalent one. It continues as long as the reduced vector becomes shorter than the other one and then it stops.
Note that in the algorithm below we subtract from $b_1$ a quantity close to $\mu_{2,1}$ times the vector $b_2$. Subtracting $b_2$ would have orthogonalized it, subtracting this much would sort of orthogonalize it which is another property a "dream basis" should have.
(The following has been taken from the book Algorithmic Cryptanalysis which contains a very good discussion of this algorithm)
Gauss's Algorithm
Require: Initial lattice basis ($b_1$, $b_2$)
if $b_1 < b_2$ then
Exchange $b_1$ and $b_2$
end if
repeat
Find integer $c$ that minimizes $b_1$ $-$ $cb_2$ by:
$c \leftarrow \lfloor \mu_{2,1} \rceil$
Let $b_1 \leftarrow b_1- cb_2$
Swap $b_1$ and $b_2$
until $||b_1|| \leq ||b_2||$
Output ($b_1$, $b_2$) as reduced basis
You can find $c$ quickly by binary search. The norm of the longest vector reduces at every step and so the algorithm terminates; the norm cannot go below $\lambda_1$. Further, each iteration of the algorithm clearly takes polynomial time. So, in order to show that the algorithm is a poly time algorithm, all we need to do is just to show that the total number of iterations is polynomial in size of the input. And it can be shown that this must be the case as every iteration decreases the size of the longest vector by $1/2$. I would not go into the details but just mention that it can be proved by first noticing that every iteration shrinks the longest vector by a multiplicative factor.
But in order to develop better intuition for what we want to discuss, it is nice to do the complexity survey the following way. Consider a variation of Gauss's algorithm which we call $t$-Gauss algorithm. (Here $t \geq 1$ is a new parameter to complicate Gauss's algorithm and to simplify our discussion when we talk about LLL).
t-Gauss's Algorithm
Require: Initial lattice basis ($b_1$, $b_2$)
Paramter $t \geq 1$
if $b_1 < b_2$ then
Exchange $b_1$ and $b_2$
end if
repeat
Find integer $c$ that minimizes $b_1$ $-$ $cb_2$ by:
$c \leftarrow \lfloor \mu_{2,1} \rceil$
Let $b_1 \leftarrow b_1- cb_2$
Swap $b_1$ and $b_2$
until $||b_1|| \leq t*||b_2||$ ==> The only step differing from Gauss's algo
Output ($b_1$, $b_2$) as reduced basis
The algorithm above is cute in the sense that it ensures that the shortest vector shrinks by a factor of t in all iterations except the last one.
Why? Think about it. After the swap step, $b_2$ is your new (shrunk) vector and the previous value of $b_2$ is stored in $b_1$. As long as you remain in the loop now, you must have $t||b_2|| \leq ||b_1||$. That is, $t$ times the new (shrunk) vector is less than the ||b_2|| that you began the round with (or equivalently shortest vector at the start of round, $b_2$, shrunk in size to $b_2$ at the start of next round.
At the risk of overexplaining which might obscure things, I will keep quiet and just state that by a similar reasoning you can see that the longest vector also must decrease in every round by a factor of $t$ (except maybe the first round). Also, you can notice that the longest vector in a round becomes the shortest vector in the next one. That is to say, the longest one shrinks by atleat a factor of $t$.
As a consequence, this algorithm has polynomial number of iterations upper bounded by O$((log(max (b_1, b_2))/t )$. You can derive an expression for number of iterations and by throwing in a bit of analysis, you can show that this algorithm is polynomial for $t=1$ which is the classical Gauss algorithm.
This wraps up a chapter (and if it does not, let me sweep it under the rug)
Next, we have come one more step closer to our target - starting LLL. But I did not tell you the expanded form, did I? It is Lenstra-Lenstra and Lovasz algorithm after its inventors.
So let me grab a coke and come back.
Great so I am back. Coke's gone too.
As I told you earlier, there is no single reduction scheme that works "exactly" for all cases in poly time. We are instead happy with a "reasonable" algorithm which works most of the time. The presentation here follows closely Oded Regev's lecture notes on the subject. We will also discuss a bit from this lecture. So, let me mention the three major characteristics that LLL algorithm shares with Gauss's algorithm. I will sketch some high level points first. It is instructive to keep them in mind as you study the algorithm, but I will repeat them when needed so you do not really need to worry.
1) Define a LLL reduced basis
2) Present an algorithm to find that basis
3) Analyse the running time.
First allow me to mention that now we are no longer talking full rank lattices. Lattices will, in general, have fewer (or equal) vectors as compared to the dimension.
Next, let me define the notion of reducedness we use. (Just reiterating. Keep in your mind that its the Basis which is being reduced; saying you are reducing a lattice does not make sense in our context)
We say that a basis $(b_1, b_2, b_3 ..... b_n) \in \RR^{m*n}$ is $\delta$-LLL reduced (for a parameter $\delta$ about which we will talk later) if the following holds true
1) $|\mu_{i,j}| \leq 1/2$ for all $i > j$ where $\mu_{i,j}$ are the Gram-Schmidt coefficients
2) for any consecutive vectors $b_i$, $b_{i+1}$ the following holds
$\delta$ times Projection of $b_i$ on span of $(b_i*, b_{i+1}*....b_n*)$ is less than or equal to Projection of $b_{i+1}$ on the same span.
You might be having some sense of deja vu, or more precisely you might be feeling that the 2nd requirement for a Basis to be $\delta$-LLL reduced can be written more clearly. (After all, isn't projection of $b_i$ on span of $(b_i*, b_{i+1}*....b_n*)$ something nice? Yes, it is. It is your Gram-Schmidt orthogonalized vector $b_i*$. Think about it. You took away from $b_i$ the components it would leave along $b_1, b_2....b_{i-1}$ by projecting it on the above span. This gives $b_i*$). More formally, we put it this way. We define a family of projection functions, $\pi_i(x)$, for each index $i$ which measures the projection of vector $x$ onto span $(b_i*, b_{i+1}*....b_n*)$.
It is given as $\pi_i(x)$ = $\sum_{j=1}^{j=n}($<$x, b_j^*$>$/$<$b_j^*, b_j^*$>$)$$b_j^*$
We then note $\pi_i(b_i) = b_i*$. Thus, the 2nd requirement for LLL reduction says that $\delta*b_i^* \leq b_{i+1}^*$ for all consecutive vectors $b_i$ and $b_{i+1}$.
Also, I would like you to notice these characteristics in Gauss's algorithm wherein you have
1) $\mu_{2,1} \leq 1/2$
2) $||b_1||| \leq ||b_2||$
You are invited to verify this.
Okay. I guess you would want a cleaner way to understand the first requirement for a basis to be LLL reduced. Do not fear, thats what we do next. Any guesses on how will I try making the content of this requirement clear? How do you think I will do that?
(Relax, its not actually a totally unthinkable way. Its just an alternative convenient representation which helps bind the ideas in your mind; I did too much about ado about nothing [:)] )
By representing the vectors in the "clean basis". We represent the basis vectors in the Gram-Schmidt Orthogonal system to achieve this. If you would recall, in this basis the original Basis vectors are written this way
$\begin{pmatrix}||b_1^*||&\mu_{2,1}||b_1^*||&...&\mu_{n,1}||b_1^*||\\0&||b_2^*||&...&\mu_{n,2}||b_2^*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n^*||\end{pmatrix}$
Quoting from Regev's lecture "Here, column $i$ shows co-ordinates of $b_i$ in orthonormal basis. The first condition in the definition of LLL-reduced basis guarantees that the absolute value of any off diagonal element is at most half the value written in the diagonal element of the same row. This can be written as"
$\begin{pmatrix}||b_1^*||&\leq1/2||b_1^*||&...&\leq1/2||b_1^*||\\0&||b_2^*||&...&\leq1/2||b_2^*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n^*||\end{pmatrix}$
For illustrating the $2nd$ condition consider the $2*2$ submatrix formed with the upper left entry at index $(i,i)$.
$\begin{pmatrix}||b_i^*||&\mu_{i+1,i}||b_i^*||\\0&||b_{i+1}^*||\end{pmatrix}$
This property just ensures that the $2nd$ column in the above submatrix is almost as long as the first one.
Now I request you to revisit the conditions for a Basis to be LLL-reduced. Keep those in mind for the following discussion which shows how closely LLL is related with SVP (Shortest Vector Problem). It can be used to give an approximate answer to this problem in higher dimensions. The approximation factor is exponential but nonetheless, it is a tremendous achievement. To make the discussion concerete, we would set $\delta = 3/4$.
Assume we have a LLL reduced basis which, by definition, satisfies the conditions I asked you to keep track of.
We know that :
$\delta*||\pi_i(b_i)||^2 \leq ||\pi_i(b_{i+1})||^2$
$\Rightarrow \delta||b_i||^2 \leq ||b_{i+1}^* + \mu_{i+1,i}b_i^*||^2$
$= ||b_{i+1}^*||^2 + ||\mu_{i+1,i}b_i^*||^2$
$= ||b_{i+1}^*||^2 + \mu_{i+1,i}^2||b_i^*||^2$
$ \leq ||b_{i+1}^*||^2 + 1/4*||b_i^*||^2$
Rearranging terms, we get
$(\delta - 1/4)*||b_i^*||^2 \leq ||b_{i+1}^*||^2$
which can be rewritten as
$||b_i^*||^2 \leq \alpha ||b_{i+1}^*||^2$ where $\alpha = \delta - 1/4$
For $\delta = 3/4$, we get $\alpha = 2$. And thus, we have $b_i$ is atmost twice as long as $b_{i+1}$.
And finally we obtain $\lambda_1 \geq min(b_i) \geq 2^n||b_1||$.
Neat! isn't it?
Now, I give you the algorithm (from Regev's notes)
INPUT: Lattice basis $b_1,b_2...,b_n \in Z^n$
OUTPUT: $\delta$-LLL-reduced basis for L(B)
Start: compute $b_1^*, b_2^*,..,b_n^*$
Reduction Step:
for $i = 2$ to $n$ do
for $j = i-1$ to $1$ do
$b_i \leftarrow b_i - c_{i,j}bj$ where $c_{i,j} =$ <$b_i, b_j^*$>$/$<$b_j^*, b_j^*$>
Swap Step:
if $\exists i$ s.t. $\delta||b_i||^2 > ||\mu_{i+1,i}b_i^* + b_{i+1}^*||^2$ then
$b_i \leftrightarrow b_{i+1}$
goto start
Output $b_1, b_2,...,b_n$
Thus, you can see that LLL algorithm is trying out an "extension" (or more precisely relaxation) of Gauss's algorithm in which we want condition 1 satisfied for all $\mu_{i,j}$'s and for condition 2, apart from the factor $\delta$, we ensure that the final reduced base is "cute" in the sense that its Gram-Schmidt Orthogonalization ensures that $b_i$ is not much bigger than its predecessor (its atmost $1/\delta$ times longer) and when this chain winds up all the way to $b_1^*$, it says that $b_1 = b_1^*$ is not too long.
At a high level, you can visualize LLL algorithm as an effort at combining Gram-Schmidt Orthogonalization with Gauss reduction for full rank lattice of dimension $2$. Quoting from Algorithmic Cryptanalysis, "it [LLL] considers the lattice generated by the orthogonal projection of two consecutive vectors $b_i$ and $b_{i+1}$ on the vector subspace generated by the previous vectors $(b_1, b_2,...,b_{i-1})$ and applies one iteration of t-Gauss reduction to this projected lattice. In order to also modify the high dimension lattice, all swap and translation operations are pulled back from the projected space and applied to the vectors $b_i$ and $b_{i+1}$ themselves. In addition, these vectors may also be changed by adding to them some vector from the sublattice with basis $(b_1, b_2,...,b_{i-1})$. The goal of this operation is to make sure that $b_i$ is not too far from the corresponding projected vector $b_i^*$".
You have been with me this far. Please have a little more patience, once you get the ideas right; you will still be the same; but you will like the subject more. You may need a break; but before that reflect upon the fact that we have completed the first two items in the list
1) Define a LLL reduced basis
2) Present an algorithm to find that basis
3) Analyse the running time.
The last one remains. And the end of the story is absolutely rewarding.
Had your break? That was fast. So, we are now down to the last point wherein we analyze the run time of the algorithm we developed. But first, its not even clear that the algorithm terminates. And it it does, does it indeed compute a $\delta$-LLL reduced basis correctly?
So lets dive right in. Say the algorithm terminates. How do we determine whether the output be correct? Thats easy, just check if the output basis satisfies the two conditions for a basis to be $\delta$-LLL reduced.
Condition $2$ is trivially satisfied. The algorithm was so scared to death by this condition, that it wont stop unless this condition is satisfied. (See the Swap step). Condition $1$ is also satisfied. This is because of the way algorithm operates at every $b_i$. For each $b_i$, the algorithm makes knowledge of the Gram Schmidt representation of the Basis vectors to ensure that all $\mu_{i,j}$ for $j < i$ are less than $1/2$ in the reduction step. This is achieved by subtracting suitable amounts of $b_j$'s for $j < i$ from $b_i$. Note that the order (from i-1 to 1)is crucial.
So, if we do not have anything to swap, we are home. If we have - ah those bad days. Now we would have to recompute Gram-Schmidt Orthogonalized system for the new basis. It would involve re-doing the Reduction Step and the Swap Step. Bad days.
But wait you say. Do I not need to recompute the Gram-Schmidt Orthogonalized system after every iteration of the reduction step? The answer is no. After all, such operations, $b_i \leftrightarrow b_i + ab_j$ never disturb the orthogonalized system. It stays the same.
Does the algorithm stop? You know it does. (Otherwise you would not be studying it.) We prove a stronger statement here by demonstrating not just that it terminates, but by showing the time that it takes to terminate.
Now, the last piece. Run time.
We have to show that the run time is polynomial in input size. To this end, we will first bound the number of iterations. Next, we will bound the running time of a single iteration.
For every iteration performed by the algorithm, one swap step is used. That is to say, that the number of iterations is same as the same as the number of swap operations. How do we bound this number?
In fact, we are going to see the idea that Lovasz used in his analysis, but it is instructive to see the more general model (or that dreadful word, paradigm) his approach for analysing this algorithm falls into. Below, we are going to use a sort of "amortized analysis" for bounding the number of iterations this algorithm goes through. We are reverse-engineering the process. We know the algorithm terminates. We want to find out a bound for this. In algorithmic studies, we often find it convenient to associate with every intermediate state of the algorithm a number which indicates the distance from the target. Without loss of generality, we assume that to begin with the target is at zero and it "stays" at that value (this might depend on the application - corresponding to a stupid move in some cases, the target might move further away. In our case however, we will choose a target and will craft intermediate steps carefully enough so that target stays at the same value.) Demonstrating that this target always decreases gives you an answer for the fact that the algorithm does terminate. Demonstrating that every "careful algorithmic step" decreases target by a particular amount gives a bound on the run time of the algorithm.
So, what do you think we choose as target in this application? What about your intermediate checkpoints? Indeed, your intermediate checkpoint is everytime you orthogonalize the system or every new iteration, but what do you associate an integer value with? And again with what do you associate your target value?
This is where Lovasz steals the ball from most of the mortals. See if you like his analysis. Since, Lovasz knows that the algorithm swaps adjacent vectors when they are out of order, he knows this step must be improving upon the "distance" to the target. So, we should associate a number with some quantity which "captures this information". Hmm, Enough fuss...what should target be anyway? Ah! dont be impatient. Think. Once you well move past a vector $b_{i+1}$, the effect of subsequent vectors on $b_{i}$ vanishes. Something tells us that we should associate an integer with the current Basis. (It is the obvious choice!). Further, its nice to associate it with the determinant of the basis as in its orhtogonalized form, determinant will be a piece of cake. But it has to capture what happens at a swap step. So, it will be like a product of determinants.
Without any more ado, we notice that the integer can be associated with the following product $\Pi_{i=1}^ndet(\Lambda_k)^2$ where $\Lambda_k$ is the lattice spanned by the first $k$ basis vectors. Since swapping ensures that "out of order" vectors obey $2nd$ condition of $\delta$-LLL reduced basis, we also get the reduction of determinant value as a free bonus!
We are decreasing the integer associated with the basis at every iteration! That too by a multiplicative factor. Thats an achievement. We can see that the algorithm has got something like $ln(d_0)*C$ iterations for some constant $C$ and some $d_0$ which we define shortly.
To see that what I said above is indeed happening notice the following ration after the $ith$ step. Assume that at the beginning of this iteration, $d_i$ is the value associated with the basis and after the swap, it becomes $d_i$'.
$d_i$' is the determinant of the lattice $\Lambda_i$' where $\Lambda_i$' $= L[b_1, b_2,...., b_{i-1},b_{i+1}]$. Consider the ratio $d_i/d_i$'. The determinants for $j < i$ are not modified and cancel out. The determinants for $j > i$ are same in absolute value. (Swap 2 columns). Thus, the only thing that stand uncancelled is, when $j = i$.
Thus, $d_i$'$/d_i = det(\Lambda_i$'$)/det(\Lambda_i)$
$ = det([b_1,....,b_{i-1}, b_{i+1}])^2/det([b_1,...,b_i])^2$
$ = ((\Pi_{j=1}^{j=i-1})^2*||\mu_{i+1,i}b_i^* + b_{i+1}||^2)/\Pi_{j=1}^{j=i}||b_j||^2$
$ = ||\pi_i(b_{i+1})||^2/||\pi_i{b_i}||^2$
$ \leq \delta$ by definition of $\delta$-LLL reduced basis
Thus, we notice, that after every swap, we inch closer to the target. The factor of inching closer is $1/\delta$. And thus the claim the number of iterations is $ln(d_0)/ln(1/\delta)$ is vindicated.
Now, we also have to bound the run time of a single iteration. How do you think we should go about it? Well, since we intend to find an upperbound on the number of steps, lets take a pessimistic approach. The number of arithmetic operations performed in each iteration is clearly polynomial. Thus, in order to prove a polynomial upper bound, we just need to show that the size of the numbers involved in the entire computation is also bounded by a polynomial in the size of the input. How worse can that get? (The pessimistic question)
By the way, I hope you have heard of the new (or quantum) difference a pessimist and optimist. The classical version goes this way. The pessimist "Glass is half empty". The Optimist "Glass is half full".The Quantum version. The Pessimist "It can't get any worse this". The Optimist "It can!"
So back to the problem. How worse can it, in fact, get? The LLL algorithm uses rational numbers. So we are under a two-sided attack. We have to bound both precision and the magnitude of the involved quantities.
(I will write the details in this space soon enough. For now, you can just assume it and rest).
Now we have seen the SVP connection. Whats the connection with CVP? As you might have guessed, this LLL algorithm can be used for an approximate solution to CVP as well. Here is how. Given a Basis $B$ and a target vector $t$ you need to find a lattice vector, $Bx$, that is closest to $t$. To this end, you just add $t$ to the basis and find the and run the reduction step of the LLL algorithm on this basis.
Quoting from Goldwasser-Micciano's book "Now you find a vector $x \in L(B)$ such that $t - x$ can be written as $t^* + \sum_{i=1}^nc_ib_i^*$ where $t^*$ is component of $t$ orthogonal to span$(b_1, b_2,...,b_n)$ and $|c_i| \leq 1/2$ $\forall$ $i = 1,2,..,n$."
This incidentally is the high level sketch of the approx-CVP algorithm as it can be shown that the distance of $t$ from $x$ is within an exponential approximation of the optimal. More on this in m future posts.
There is a lot more to come on this post. So, please wait. I will do that whenever I get time. (Yeah, you got it. I publish the post then edit and re-edit till it gets to the final version!)
So, finally we are inching towards our target. Next, we seek to attack the big guns - the LLL algorithm. It has a bearing upon nearly all aspects of lattice based cryptography. We will in particular investigate the connection with SVP and another problem whose name has not been mentioned so far in this blog - CVP or the Closest Vector Problem.
This post will majorly delve upon 3 different topics. First, we would talk about LLL algorithm, the ideas leading to its development, the notion of reduced basis and what is it good for. Next, we would talk a bit about SVP and CVP sketching a little more background about the problems. Finally, we will see what LLL has to say about these problems. These are the 3 major topics discussed here but they are not in any particular order. In fact, there will be a zig-zagging through these 3 major areas throughout this post.
So, lets get started. First I will tell develop a little of the historical context behind the development of the LLL algorithm. In early 1980's, H.W. Lenstra (not HomeWork Lenstra!) was working on a problem which asked him to determine whether or not a given triangle (in $\RR^2$) with rational end points contained a lattice point. He began with a vague idea of saying that "fatter triangles will contain" a lattice point; "skinnier ones might not" and developed quite a wonderful solution.
In his own words "The solution essentially consists of denying that "special" triangles exist. If the triangle K looks a bit weird, why not apply a nonsingular linear transformation τ such that the triangle τ[Κ] looks better?" To be specific, he chooses τ so that the transformed triangle is equilateral. (Later on, he changed it to a right angled isosceles triangle) (I am just copy/pasting notes from his original paper - its extremely readable). He goes on to say "Clearly, $K$ contains an element of $Z^2$ if and only if the new triangle τ[Κ] contains an element of the lattice $L$ = τ($Z^2$) . If $e_1$, $e_2$ are the standard basis vectors of $\RR^2$, then τ$(e_1)$, τ$(e_2)$ form a basis for $L$ in the sense that $L = Z*τ(e_1) + Z*τ(e_2)$."
Lenstra, thus notes "The problem has now been shifted from the triangle to the lattice." You just need to find the basis for this lattice which can be done in quite a few ways in poly time. (Maybe this statement needs explanation. See previous posts for this). Then he invokes the idea of covering radius which basically tells him the minimum radius a circle sitting in the plane with a lattice point at its center must have so that the series of all such circles spans the $2-d$ plane. The crucial thing is - this covering radius can be calculated and we are okay with using any polynomial upper bound on the covering radius.
Thus, after having found one such value, Lenstra just checks if the triangle is big enough to inscribe a circle with radius greater than equal to the above value. In case, the triangle is big enough - good, end of story, the triangle did contain a lattice point. In case, the triangle is not big enough, there are a series of lines parallel to the one basis vector hitting the triangle. There are a polynomial number of such lines. Just check for these lines, one at a time, if someone among them manages to get one of their lattice points inside the triangle.
This idea can be immediately generalised to higher dimensions which is where the rudimentary steps towards LLL first developed from. The problem goes like this - given a convex body (defined by a set of linear inequalities) in higher ($n \geq 3$) dimensions you are asked to determine if it contains a $n$-dimensional lattice point.
Lenstra's idea goes like - transform the convex body into a "cuter convex body". Look for lattice points with the given basis inside the transformed body the following way. Find the covering radius (or some polynomial upper bound on covering radius). Determine whether the transformed body can inscribe a $n-dimensional$ sphere. If it can, some lattice point is there inside the body. If it cannot, the body is hit by span of vectors apart from $b_1$ (which form a hyperplane). These hyperplanes are equi-spaced at gaps of $b_1$. Just see whether some of these hyperplanes contain the desired lattice point.
What led to the discovery of the algorithm was, of course, the transformation step. Lenstra found that in effect, his approach kept on improving the angle between the incident vectors by pushing them closer to being orthogonal. He wrote to Lovasz about his concerns who later came up with the classic reduction scheme and its equally amazing analysis.
In fact, in the two dimensional case, this is precisely what the algorithm did. It turns out this two dimensional case was known long before. It had been solved by the prince of Mathematics, Carl Friedrich Gauss. It is from Gauss's algorithm that we begin our tour of this topic. Our target is to "reduce" a lattice basis to another equivalent basis which makes some questions easier to answer.
What reduction scheme to follow depends on the situation you are in i.e., on the problem you are trying to solve. No one reduction scheme works "exactly" for all cases. So, we compromise and stay happy with LLL reduction which is "reasonable" to some extent in that it ensures that you do not loose too much. Gauss's algorithm, which we will outline shortly, is just a special case of LLL reduction scheme.
But first let me point out a few reduction schemes in advance which do not work. As you now know, the successive minima vectors do not determine a Basis for the lattice in general. In two-dimensional case, however, they do. So, basing the definition on the basis of successive minima vectors is out. Similarly basing it on orthogonalized set of vectors is out. Also, we have a bad luck in that a mutually orthogonal system of vectors need not be short enough as some problems (SVP or CVP) require. LLL algorithm however finds a nice way out which we will detail a little more shortly; but first its time we turned to Gauss's algorithm. (In what follows, we assume that we are talking $l_2$ norm; though Gauss's algorithm is easily extendable to other norms too.)
Gauss starts off by defining a condition for "reducedness" in 2 dimensions for the given basis vectors $b_1$ and $b_2$ which basically says that the basis is reduced if the diagonals of the parallelepiped determined by it are both larger than its sides. That is, $||b_1||, ||b_2|| \leq ||b_1 + b_2||, ||b_1 - b_2||$.
The definition is rooted in the fact that we know that for $2$ dimensional case any lattice Basis can be reduced to the basis with successive minima as its basis vectors. Now I will make clear the relation between reduced base and successive minima of the lattice.
In the given setting ($2-dimensional$ lattices), a basis is reduced if and only if the basis vectors are of length $\lambda_1$ and $\lambda_2$ in some order. How do we prove this statement? Well, you just need to keep the definition of reduced basis in mind and proceed straightaway for the forward direction. Do you see the forward direction proof yet? It just says that if your basis is $b_1$ and $b_2$ with $||b_1|| = \lambda_1$ and $||b_2|| = \lambda_2$ then this basis is reduced. Its too clear - the above statement looks like the proof itself.
Its backward direction which is some sport. It says that if you have a reduced basis, then the vectors $b_1$ and $b_2$ in it better have lengths $\lambda_1$ and $\lambda_2$. Why is that? First write out any lattice vector $Bx$ in this basis as $x_1*b_1 + x_2*b_2$ where $x_1$ and $x_2$ are integers. Assume without loss of generality that $||b_1|| \leq ||b_2||$. You want to show that any vector $Bx$ is larger than either of $b_1$ and $b_2$.
For this, you will also need the following easy to prove lemma that if $||x|| \leq ||x + y||$, then $||x + y|| \leq ||x + \alpha*y||$ where $\alpha > 1$. The lemma basically says that as you move further out along a line, so that a little motion increases your distance from some point $P$, further motion in that direction will increase your distance more from $P$.
Now, out of my laziness, and partly because I think I have more important things to discuss here in this blog, I would not complete this proof (unless you want me to).
Now we finally come to Gauss's algorithm. We will follow it up with its complexity analysis. Gauss's idea is very reminiscent of Euclidean algorithm for GCD calculation. You are invited to try spotting the similarities. The algorithm can be seen as a greedy algorithm which iteratively "shrinks" the longest vector in the given basis to find an equivalent one. It continues as long as the reduced vector becomes shorter than the other one and then it stops.
Note that in the algorithm below we subtract from $b_1$ a quantity close to $\mu_{2,1}$ times the vector $b_2$. Subtracting $b_2$ would have orthogonalized it, subtracting this much would sort of orthogonalize it which is another property a "dream basis" should have.
(The following has been taken from the book Algorithmic Cryptanalysis which contains a very good discussion of this algorithm)
Gauss's Algorithm
Require: Initial lattice basis ($b_1$, $b_2$)
if $b_1 < b_2$ then
Exchange $b_1$ and $b_2$
end if
repeat
Find integer $c$ that minimizes $b_1$ $-$ $cb_2$ by:
$c \leftarrow \lfloor \mu_{2,1} \rceil$
Let $b_1 \leftarrow b_1- cb_2$
Swap $b_1$ and $b_2$
until $||b_1|| \leq ||b_2||$
Output ($b_1$, $b_2$) as reduced basis
You can find $c$ quickly by binary search. The norm of the longest vector reduces at every step and so the algorithm terminates; the norm cannot go below $\lambda_1$. Further, each iteration of the algorithm clearly takes polynomial time. So, in order to show that the algorithm is a poly time algorithm, all we need to do is just to show that the total number of iterations is polynomial in size of the input. And it can be shown that this must be the case as every iteration decreases the size of the longest vector by $1/2$. I would not go into the details but just mention that it can be proved by first noticing that every iteration shrinks the longest vector by a multiplicative factor.
But in order to develop better intuition for what we want to discuss, it is nice to do the complexity survey the following way. Consider a variation of Gauss's algorithm which we call $t$-Gauss algorithm. (Here $t \geq 1$ is a new parameter to complicate Gauss's algorithm and to simplify our discussion when we talk about LLL).
t-Gauss's Algorithm
Require: Initial lattice basis ($b_1$, $b_2$)
Paramter $t \geq 1$
if $b_1 < b_2$ then
Exchange $b_1$ and $b_2$
end if
repeat
Find integer $c$ that minimizes $b_1$ $-$ $cb_2$ by:
$c \leftarrow \lfloor \mu_{2,1} \rceil$
Let $b_1 \leftarrow b_1- cb_2$
Swap $b_1$ and $b_2$
until $||b_1|| \leq t*||b_2||$ ==> The only step differing from Gauss's algo
Output ($b_1$, $b_2$) as reduced basis
The algorithm above is cute in the sense that it ensures that the shortest vector shrinks by a factor of t in all iterations except the last one.
Why? Think about it. After the swap step, $b_2$ is your new (shrunk) vector and the previous value of $b_2$ is stored in $b_1$. As long as you remain in the loop now, you must have $t||b_2|| \leq ||b_1||$. That is, $t$ times the new (shrunk) vector is less than the ||b_2|| that you began the round with (or equivalently shortest vector at the start of round, $b_2$, shrunk in size to $b_2$ at the start of next round.
At the risk of overexplaining which might obscure things, I will keep quiet and just state that by a similar reasoning you can see that the longest vector also must decrease in every round by a factor of $t$ (except maybe the first round). Also, you can notice that the longest vector in a round becomes the shortest vector in the next one. That is to say, the longest one shrinks by atleat a factor of $t$.
As a consequence, this algorithm has polynomial number of iterations upper bounded by O$((log(max (b_1, b_2))/t )$. You can derive an expression for number of iterations and by throwing in a bit of analysis, you can show that this algorithm is polynomial for $t=1$ which is the classical Gauss algorithm.
This wraps up a chapter (and if it does not, let me sweep it under the rug)
Next, we have come one more step closer to our target - starting LLL. But I did not tell you the expanded form, did I? It is Lenstra-Lenstra and Lovasz algorithm after its inventors.
So let me grab a coke and come back.
Great so I am back. Coke's gone too.
As I told you earlier, there is no single reduction scheme that works "exactly" for all cases in poly time. We are instead happy with a "reasonable" algorithm which works most of the time. The presentation here follows closely Oded Regev's lecture notes on the subject. We will also discuss a bit from this lecture. So, let me mention the three major characteristics that LLL algorithm shares with Gauss's algorithm. I will sketch some high level points first. It is instructive to keep them in mind as you study the algorithm, but I will repeat them when needed so you do not really need to worry.
1) Define a LLL reduced basis
2) Present an algorithm to find that basis
3) Analyse the running time.
First allow me to mention that now we are no longer talking full rank lattices. Lattices will, in general, have fewer (or equal) vectors as compared to the dimension.
Next, let me define the notion of reducedness we use. (Just reiterating. Keep in your mind that its the Basis which is being reduced; saying you are reducing a lattice does not make sense in our context)
We say that a basis $(b_1, b_2, b_3 ..... b_n) \in \RR^{m*n}$ is $\delta$-LLL reduced (for a parameter $\delta$ about which we will talk later) if the following holds true
1) $|\mu_{i,j}| \leq 1/2$ for all $i > j$ where $\mu_{i,j}$ are the Gram-Schmidt coefficients
2) for any consecutive vectors $b_i$, $b_{i+1}$ the following holds
$\delta$ times Projection of $b_i$ on span of $(b_i*, b_{i+1}*....b_n*)$ is less than or equal to Projection of $b_{i+1}$ on the same span.
You might be having some sense of deja vu, or more precisely you might be feeling that the 2nd requirement for a Basis to be $\delta$-LLL reduced can be written more clearly. (After all, isn't projection of $b_i$ on span of $(b_i*, b_{i+1}*....b_n*)$ something nice? Yes, it is. It is your Gram-Schmidt orthogonalized vector $b_i*$. Think about it. You took away from $b_i$ the components it would leave along $b_1, b_2....b_{i-1}$ by projecting it on the above span. This gives $b_i*$). More formally, we put it this way. We define a family of projection functions, $\pi_i(x)$, for each index $i$ which measures the projection of vector $x$ onto span $(b_i*, b_{i+1}*....b_n*)$.
It is given as $\pi_i(x)$ = $\sum_{j=1}^{j=n}($<$x, b_j^*$>$/$<$b_j^*, b_j^*$>$)$$b_j^*$
We then note $\pi_i(b_i) = b_i*$. Thus, the 2nd requirement for LLL reduction says that $\delta*b_i^* \leq b_{i+1}^*$ for all consecutive vectors $b_i$ and $b_{i+1}$.
Also, I would like you to notice these characteristics in Gauss's algorithm wherein you have
1) $\mu_{2,1} \leq 1/2$
2) $||b_1||| \leq ||b_2||$
You are invited to verify this.
Okay. I guess you would want a cleaner way to understand the first requirement for a basis to be LLL reduced. Do not fear, thats what we do next. Any guesses on how will I try making the content of this requirement clear? How do you think I will do that?
(Relax, its not actually a totally unthinkable way. Its just an alternative convenient representation which helps bind the ideas in your mind; I did too much about ado about nothing [:)] )
By representing the vectors in the "clean basis". We represent the basis vectors in the Gram-Schmidt Orthogonal system to achieve this. If you would recall, in this basis the original Basis vectors are written this way
$\begin{pmatrix}||b_1^*||&\mu_{2,1}||b_1^*||&...&\mu_{n,1}||b_1^*||\\0&||b_2^*||&...&\mu_{n,2}||b_2^*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n^*||\end{pmatrix}$
Quoting from Regev's lecture "Here, column $i$ shows co-ordinates of $b_i$ in orthonormal basis. The first condition in the definition of LLL-reduced basis guarantees that the absolute value of any off diagonal element is at most half the value written in the diagonal element of the same row. This can be written as"
$\begin{pmatrix}||b_1^*||&\leq1/2||b_1^*||&...&\leq1/2||b_1^*||\\0&||b_2^*||&...&\leq1/2||b_2^*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n^*||\end{pmatrix}$
For illustrating the $2nd$ condition consider the $2*2$ submatrix formed with the upper left entry at index $(i,i)$.
$\begin{pmatrix}||b_i^*||&\mu_{i+1,i}||b_i^*||\\0&||b_{i+1}^*||\end{pmatrix}$
This property just ensures that the $2nd$ column in the above submatrix is almost as long as the first one.
Now I request you to revisit the conditions for a Basis to be LLL-reduced. Keep those in mind for the following discussion which shows how closely LLL is related with SVP (Shortest Vector Problem). It can be used to give an approximate answer to this problem in higher dimensions. The approximation factor is exponential but nonetheless, it is a tremendous achievement. To make the discussion concerete, we would set $\delta = 3/4$.
Assume we have a LLL reduced basis which, by definition, satisfies the conditions I asked you to keep track of.
We know that :
$\delta*||\pi_i(b_i)||^2 \leq ||\pi_i(b_{i+1})||^2$
$\Rightarrow \delta||b_i||^2 \leq ||b_{i+1}^* + \mu_{i+1,i}b_i^*||^2$
$= ||b_{i+1}^*||^2 + ||\mu_{i+1,i}b_i^*||^2$
$= ||b_{i+1}^*||^2 + \mu_{i+1,i}^2||b_i^*||^2$
$ \leq ||b_{i+1}^*||^2 + 1/4*||b_i^*||^2$
Rearranging terms, we get
$(\delta - 1/4)*||b_i^*||^2 \leq ||b_{i+1}^*||^2$
which can be rewritten as
$||b_i^*||^2 \leq \alpha ||b_{i+1}^*||^2$ where $\alpha = \delta - 1/4$
For $\delta = 3/4$, we get $\alpha = 2$. And thus, we have $b_i$ is atmost twice as long as $b_{i+1}$.
And finally we obtain $\lambda_1 \geq min(b_i) \geq 2^n||b_1||$.
Neat! isn't it?
Now, I give you the algorithm (from Regev's notes)
INPUT: Lattice basis $b_1,b_2...,b_n \in Z^n$
OUTPUT: $\delta$-LLL-reduced basis for L(B)
Start: compute $b_1^*, b_2^*,..,b_n^*$
Reduction Step:
for $i = 2$ to $n$ do
for $j = i-1$ to $1$ do
$b_i \leftarrow b_i - c_{i,j}bj$ where $c_{i,j} =$ <$b_i, b_j^*$>$/$<$b_j^*, b_j^*$>
Swap Step:
if $\exists i$ s.t. $\delta||b_i||^2 > ||\mu_{i+1,i}b_i^* + b_{i+1}^*||^2$ then
$b_i \leftrightarrow b_{i+1}$
goto start
Output $b_1, b_2,...,b_n$
Thus, you can see that LLL algorithm is trying out an "extension" (or more precisely relaxation) of Gauss's algorithm in which we want condition 1 satisfied for all $\mu_{i,j}$'s and for condition 2, apart from the factor $\delta$, we ensure that the final reduced base is "cute" in the sense that its Gram-Schmidt Orthogonalization ensures that $b_i$ is not much bigger than its predecessor (its atmost $1/\delta$ times longer) and when this chain winds up all the way to $b_1^*$, it says that $b_1 = b_1^*$ is not too long.
At a high level, you can visualize LLL algorithm as an effort at combining Gram-Schmidt Orthogonalization with Gauss reduction for full rank lattice of dimension $2$. Quoting from Algorithmic Cryptanalysis, "it [LLL] considers the lattice generated by the orthogonal projection of two consecutive vectors $b_i$ and $b_{i+1}$ on the vector subspace generated by the previous vectors $(b_1, b_2,...,b_{i-1})$ and applies one iteration of t-Gauss reduction to this projected lattice. In order to also modify the high dimension lattice, all swap and translation operations are pulled back from the projected space and applied to the vectors $b_i$ and $b_{i+1}$ themselves. In addition, these vectors may also be changed by adding to them some vector from the sublattice with basis $(b_1, b_2,...,b_{i-1})$. The goal of this operation is to make sure that $b_i$ is not too far from the corresponding projected vector $b_i^*$".
You have been with me this far. Please have a little more patience, once you get the ideas right; you will still be the same; but you will like the subject more. You may need a break; but before that reflect upon the fact that we have completed the first two items in the list
1) Define a LLL reduced basis
2) Present an algorithm to find that basis
3) Analyse the running time.
The last one remains. And the end of the story is absolutely rewarding.
Had your break? That was fast. So, we are now down to the last point wherein we analyze the run time of the algorithm we developed. But first, its not even clear that the algorithm terminates. And it it does, does it indeed compute a $\delta$-LLL reduced basis correctly?
So lets dive right in. Say the algorithm terminates. How do we determine whether the output be correct? Thats easy, just check if the output basis satisfies the two conditions for a basis to be $\delta$-LLL reduced.
Condition $2$ is trivially satisfied. The algorithm was so scared to death by this condition, that it wont stop unless this condition is satisfied. (See the Swap step). Condition $1$ is also satisfied. This is because of the way algorithm operates at every $b_i$. For each $b_i$, the algorithm makes knowledge of the Gram Schmidt representation of the Basis vectors to ensure that all $\mu_{i,j}$ for $j < i$ are less than $1/2$ in the reduction step. This is achieved by subtracting suitable amounts of $b_j$'s for $j < i$ from $b_i$. Note that the order (from i-1 to 1)is crucial.
So, if we do not have anything to swap, we are home. If we have - ah those bad days. Now we would have to recompute Gram-Schmidt Orthogonalized system for the new basis. It would involve re-doing the Reduction Step and the Swap Step. Bad days.
But wait you say. Do I not need to recompute the Gram-Schmidt Orthogonalized system after every iteration of the reduction step? The answer is no. After all, such operations, $b_i \leftrightarrow b_i + ab_j$ never disturb the orthogonalized system. It stays the same.
Does the algorithm stop? You know it does. (Otherwise you would not be studying it.) We prove a stronger statement here by demonstrating not just that it terminates, but by showing the time that it takes to terminate.
Now, the last piece. Run time.
We have to show that the run time is polynomial in input size. To this end, we will first bound the number of iterations. Next, we will bound the running time of a single iteration.
For every iteration performed by the algorithm, one swap step is used. That is to say, that the number of iterations is same as the same as the number of swap operations. How do we bound this number?
In fact, we are going to see the idea that Lovasz used in his analysis, but it is instructive to see the more general model (or that dreadful word, paradigm) his approach for analysing this algorithm falls into. Below, we are going to use a sort of "amortized analysis" for bounding the number of iterations this algorithm goes through. We are reverse-engineering the process. We know the algorithm terminates. We want to find out a bound for this. In algorithmic studies, we often find it convenient to associate with every intermediate state of the algorithm a number which indicates the distance from the target. Without loss of generality, we assume that to begin with the target is at zero and it "stays" at that value (this might depend on the application - corresponding to a stupid move in some cases, the target might move further away. In our case however, we will choose a target and will craft intermediate steps carefully enough so that target stays at the same value.) Demonstrating that this target always decreases gives you an answer for the fact that the algorithm does terminate. Demonstrating that every "careful algorithmic step" decreases target by a particular amount gives a bound on the run time of the algorithm.
So, what do you think we choose as target in this application? What about your intermediate checkpoints? Indeed, your intermediate checkpoint is everytime you orthogonalize the system or every new iteration, but what do you associate an integer value with? And again with what do you associate your target value?
This is where Lovasz steals the ball from most of the mortals. See if you like his analysis. Since, Lovasz knows that the algorithm swaps adjacent vectors when they are out of order, he knows this step must be improving upon the "distance" to the target. So, we should associate a number with some quantity which "captures this information". Hmm, Enough fuss...what should target be anyway? Ah! dont be impatient. Think. Once you well move past a vector $b_{i+1}$, the effect of subsequent vectors on $b_{i}$ vanishes. Something tells us that we should associate an integer with the current Basis. (It is the obvious choice!). Further, its nice to associate it with the determinant of the basis as in its orhtogonalized form, determinant will be a piece of cake. But it has to capture what happens at a swap step. So, it will be like a product of determinants.
Without any more ado, we notice that the integer can be associated with the following product $\Pi_{i=1}^ndet(\Lambda_k)^2$ where $\Lambda_k$ is the lattice spanned by the first $k$ basis vectors. Since swapping ensures that "out of order" vectors obey $2nd$ condition of $\delta$-LLL reduced basis, we also get the reduction of determinant value as a free bonus!
We are decreasing the integer associated with the basis at every iteration! That too by a multiplicative factor. Thats an achievement. We can see that the algorithm has got something like $ln(d_0)*C$ iterations for some constant $C$ and some $d_0$ which we define shortly.
To see that what I said above is indeed happening notice the following ration after the $ith$ step. Assume that at the beginning of this iteration, $d_i$ is the value associated with the basis and after the swap, it becomes $d_i$'.
$d_i$' is the determinant of the lattice $\Lambda_i$' where $\Lambda_i$' $= L[b_1, b_2,...., b_{i-1},b_{i+1}]$. Consider the ratio $d_i/d_i$'. The determinants for $j < i$ are not modified and cancel out. The determinants for $j > i$ are same in absolute value. (Swap 2 columns). Thus, the only thing that stand uncancelled is, when $j = i$.
Thus, $d_i$'$/d_i = det(\Lambda_i$'$)/det(\Lambda_i)$
$ = det([b_1,....,b_{i-1}, b_{i+1}])^2/det([b_1,...,b_i])^2$
$ = ((\Pi_{j=1}^{j=i-1})^2*||\mu_{i+1,i}b_i^* + b_{i+1}||^2)/\Pi_{j=1}^{j=i}||b_j||^2$
$ = ||\pi_i(b_{i+1})||^2/||\pi_i{b_i}||^2$
$ \leq \delta$ by definition of $\delta$-LLL reduced basis
Thus, we notice, that after every swap, we inch closer to the target. The factor of inching closer is $1/\delta$. And thus the claim the number of iterations is $ln(d_0)/ln(1/\delta)$ is vindicated.
Now, we also have to bound the run time of a single iteration. How do you think we should go about it? Well, since we intend to find an upperbound on the number of steps, lets take a pessimistic approach. The number of arithmetic operations performed in each iteration is clearly polynomial. Thus, in order to prove a polynomial upper bound, we just need to show that the size of the numbers involved in the entire computation is also bounded by a polynomial in the size of the input. How worse can that get? (The pessimistic question)
By the way, I hope you have heard of the new (or quantum) difference a pessimist and optimist. The classical version goes this way. The pessimist "Glass is half empty". The Optimist "Glass is half full".The Quantum version. The Pessimist "It can't get any worse this". The Optimist "It can!"
So back to the problem. How worse can it, in fact, get? The LLL algorithm uses rational numbers. So we are under a two-sided attack. We have to bound both precision and the magnitude of the involved quantities.
(I will write the details in this space soon enough. For now, you can just assume it and rest).
Now we have seen the SVP connection. Whats the connection with CVP? As you might have guessed, this LLL algorithm can be used for an approximate solution to CVP as well. Here is how. Given a Basis $B$ and a target vector $t$ you need to find a lattice vector, $Bx$, that is closest to $t$. To this end, you just add $t$ to the basis and find the and run the reduction step of the LLL algorithm on this basis.
Quoting from Goldwasser-Micciano's book "Now you find a vector $x \in L(B)$ such that $t - x$ can be written as $t^* + \sum_{i=1}^nc_ib_i^*$ where $t^*$ is component of $t$ orthogonal to span$(b_1, b_2,...,b_n)$ and $|c_i| \leq 1/2$ $\forall$ $i = 1,2,..,n$."
This incidentally is the high level sketch of the approx-CVP algorithm as it can be shown that the distance of $t$ from $x$ is within an exponential approximation of the optimal. More on this in m future posts.
There is a lot more to come on this post. So, please wait. I will do that whenever I get time. (Yeah, you got it. I publish the post then edit and re-edit till it gets to the final version!)
Successive minima and Gram-Schmidt Orthogonalization
In the previous post, we discussed a non-trivial lower bound on the length of first successive minima or the shoterst non-zero lattice vector which we denote by $\lambda_1$.
(Please note that in this post and subsequent ones when I want you to consider a vector $\lambda_i$, I actually want you to consider a vector whose length is $\lambda_i$ (normally in $l_2$ norm)).
I told you that there is an alternative method to achieve the same ends. In fact, now I want to say a little bit more. I would like to prove that $\lambda_i \geq min_{j=i...n}||b_j*||$.
Previous post just discussed the corollary you can draw from this fact that $\lambda_1 \geq min_{i=1...n}||b_i*||$.
In this post, I will tackle a few other interesting problems. These problems are mostly taken from the first homework on Oded Regev's Course on the subject.
The purpose of these problems is to tell you about the nice interplay between the properties of Successive minima and Gram Schmidt Orthogonalization.
Some of these solutions are due to Roy Kasher and Nir Bitansky at Tel Aviv Computer Science
For convenience, we will assume that we are talking full rank lattices of rank $n$ unless otherwise specified.
Consider the representation of the basis vectors in the orthogonalized system as shown in the previous post.
Notice that the $ith$ co-ordinate in the $ith$ column is $||b_i*||$. All entries below this are $0$. All entries above are $\mu_{i,j}||b_j*||$ in row $j$ if $j < i$.
For your convenience, I reproduce the matrix below
$\begin{pmatrix}||b_1*||&\mu_{2,1}||b_1*||&...&\mu_{n,1}||b_1*||\\0&||b_2*||&...&\mu_{n,2}||b_2*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n*||\end{pmatrix}$
What will you say about the vectors $\lambda_1$, $\lambda_2$....$\lambda_j$ when the following question is asked - do any of these vectors have a non-zero component $x_i$ when $i \geq j$ ? So, what is it? Yes or No?
Yes! If it were not, then you wont have $j$ linearly independent vectors as the definition of $j^{th}$ successive minima requires.
Okay, what then? Awesomeness. You know that some of these vectors $\lambda_1$, $\lambda_2$...$\lambda_j$ has some $i \geq j$th component non-zero.
You also know that any lattice vector is formed by linear combination of the vectors in the above matrix. Thus any lattice vector (including the $j$ successive minima-attaining ones) formed by "integral helpings" from the vectors of the above basis. We know that for the successive minima-attaining ones, a particular "integral helping" is non-zero. Thus, your $\lambda_j$ is guranteed to be atleast this quantity wherefrom it follows that $\lambda_j \geq min_{j=i...n}||b_j||$.
There is more to come. We will tackle some more problems before we say goodbye today. So, grab a coke or pepsi (but no popcorn!), and dive right in.
(Do not dive in the coke; it gives a sticky feeling!)
This problem is for testing your understanding of Gram Schmidt Orthogonaliztion process. Can you, dear reader, disprove the following
"$\lambda_n \geq max_{i=1...n}||b_i*||$"
Hmmm...how do we proceed. You can think of it this way. Well, since we are asked to disprove it; any counter-example will do.
Next, we know that Gram Schmidt is supposed to be dependent on the sequence of the vectors in the input basis.
Lets talk $2*2$ matrices.
What if I take a particular basis, which sure enough does have smaller vectors than the given $b_1$. (In particular, I would like to have $b_2$ as one of the successive minimas; this will make my life cool!)
And sure enough a little tweaking by that pen of yours gives a counterexample as the matrix $\begin{pmatrix}1&0\\1&1\end{pmatrix}$
The following problem tests you on your understanding of Successive minima and independence. So, for a full rank lattice you have been provided with this list of all the successive minima attaining lattice vectors. I ask - can they capture the whole lattice?
Some are tempted to say yes for the simple reason that we are talking full rank lattices and have $n$ independent vectors. Nothing could be further from truth. The answer is no and the reason is simple - you expect to have the span same. Getting the lattice same is co-incidental. For example consider the following lattice basis vectors. This solution is due to Nir Bitansky.
$v_1$ = $\begin{pmatrix}2\\0\\0\\...\\0\\1\end{pmatrix}$ $v_2$ = $\begin{pmatrix}0\\2\\0\\...\\0\\1\end{pmatrix}$ $v_3$ = $\begin{pmatrix}0\\0\\2\\...\\0\\1\end{pmatrix}$ $v_{n-1}$ = $\begin{pmatrix}0\\0\\0\\...\\2\\1\end{pmatrix}$ AND $v_n$ = $\begin{pmatrix}1\\1\\1\\...\\1\\1\end{pmatrix}$
Look at $v_n$ again. It is the all 1's vector. In this system, it is clear that all the successive minimas are 2. And the $ith$ successive minima vector has $2$ at the $ith$ position with all other entries $0$. The basis generated by these successive minima vector does not even include the all $1$'s vector. And this proves the assertion in question.
For the $2$ dimensional full rank lattices however, we have a nice result. It says that the successive minima attaining vectors do form a basis. Do you see why? Let $\lambda_1$ and $\lambda_2$ be your successive minima. Being independent, they clearly hit every point in $\RR^2$. Thus, a lattice point can be written as $x*\lambda_1 + y*\lambda_2$ for some reals $x$ and $y$.
$\lfloor x\rceil*\lambda_1 + \lfloor y\rceil*\lambda_2$ clearly belongs to the lattice. And so does their difference, the vector $v$ which clearly satisfies the following
$||v|| < (||\lambda_1|| + ||\lambda_2||)/2$.
And the right side of the above inequality is clearly less than $||\lambda_2||$. Thus, you get two independent lattice vectors and one of them is less than $||\lambda_2||$ which is clearly a contradiction.
Next, we try the following problem from Oded Regev's homework. This is not really related to the properties of Successive minima or Gram Schmidt Orthogonalization; but it is an excercise you should solve after an exposure to Unimodular matrices.
The problem is -
Show that any unimodular matrix $U \in Z^{n*n}$ can be transformed to the identity matrix by the following three basic column operations: $a_i \leftrightarrow a_j$ , $a_i \leftarrow -a_i$, and $a_i \leftarrow a_i+k*a_j$ for some integer $k$.
There is a hint that goes with the problem - Use Eulidean algorithm. I tried the problem but was unable to solve it. (I had not seen the hint though).
Anyway, Roy Kasher's solution was mind blowing. Roy, if you are reading this, get a webpage fast. I would like to link your page with this blog and let people have a look at how humans would look when they evolve (going brain-wise).
I was confused about how to apply Gaussian Elimination as that may not guarantee integrality which is what I needed.
Anyway, without much ado, here is the solution that Roy gave.
First, assume that the determinant has been expanded along some arbitrary row. Then, clearly the value of the determinant is a multiple of the $gcd$ of the numbers in this row.
And, since determinant is known to be 1 (the matrix is unimodular), the gcd of elements in this row is 1. Thus, we know that there exist two elements in this row that are co-prime. And since this row was an arbitrary row, any row has got two elements which are co-prime.
First select two elements in the first row which are co-prime. Run Euclid GCD on these elements till you get a 1 in one of these elements. Also, "copy these operations" over to other elements in the column headed by these elements. The 3 operations allowed give us enough room to invoke these operations.
Thus, sooner or later, you get a 1 in one of these positions. After this, you just zero out the first element from the rest of the first row.
Next, you move on to second row; knowing like that perfect hunter that your prey, two co-prime elements must exist in this row (determinant is still 1!).
You repeat the sequence of operations till you arrive at the Identity Matrix!
What else can you state about unimodular matrices? What, for instance, can you say about their inverses?
It turns out their inverses are also unimodular. This is easy to see as the foregoing proof tells us the key thing. Only elementary column operations can carry a unimodular matrix $U$ to $I$. All the elementary operations $E_1$, $E_2$...$E_{final}$ can be chained up like $E_{final}*....E_2*E_1$ to give the inverse which is clearly a unimodular matrix as all the $E_i$'s are integral and it obviously has determinant $1$.
This also tells you the proof for the quite "obvious" fact that two Bases, $B_1$ and $B_2$ are equivalent iff one is derivable from the other using column operations.
Had enough! We are just getting warmed up. Here is another
"Show that for any full-rank integer lattice $\Lambda$, $det(\Lambda).Z^n \subseteq \Lambda$"
That reminds me - I did not tell you that |$det(\Lambda)$| is also something you can talk about. It is not basis specific (the absolute value of determinant of lattice).
Its clear if you keep in mind equivalent lattices can be derived from one other by multiplication of unimodular matrices.
Okay. So, this question asks you to show that all vectors of $Z^n$ when multiplied by $det(\Lambda)$ give a vector which is in the lattice.
To do this, the most natural way seems to be the following. Let $det(\Lambda) = d$. Now we will try to show that the following vectors can be derived using the given lattice
$v_1$ = $\begin{pmatrix}d&\\0&\\0&\\...&\\0\end{pmatrix}$ $v_2$ = $\begin{pmatrix}0&\\d&\\0&\\...&\\0\end{pmatrix}$ $v_3$ = $\begin{pmatrix}0&\\0&\\d&\\...&\\0\end{pmatrix}$... $v_n$ = $\begin{pmatrix}0&\\0&\\0&\\...&\\d\end{pmatrix}$
Can I do that?
Why not? Let us expand the determinant along first row. Assume that co-factors are denoted by $C_{ij}$ and the $ij^{th}$ entries by $a_{ij}$.
Then we know $a_{11}*(C_{11}) + a_{12}*(-C_{12}) + ..... + a_{1n}*((-1^{n+1})*C_{1n}) = d$.
Thus, we can get a $d$ in the first place of a vector, $v$, by taking $C_{11}$ of first vector, $-C_{12}$ of second vector and so on. Hmm...Looks like this will have some bearing upon $v_1$. What does this combination do to other components of $v$?
We would really like it if makes $v$ the same as $v_1$; lets keep our fingers crossed. The other components are like
$a_{i1}*(C_{i1}) + a_{i2}*(-C_{i2}) + ..... + a_{in}*((-1^{n+1})*C_{in})$.
This sum is 0 as we would have hoped! Can you see why? Its just because the cofactors are what you obtain with first row removed. And the first row is the same as $ith$ row. This is universally known to be zero. To reiterate,
$a_{i1}*(C_{i1}) + a_{i2}*(-C_{i2}) + ..... + a_{in}*((-1^{n+1})*C_{in}) = 0$
for all $i \neq 1$.
Similarly, you can obtain $v_2$, $v_3$ and all of $v_i$'s. Once you get them, you can get any integral combination you want in this world. This proves the statement in question.
There are some more problems which I will work out once I get some more time.
Till then, good bye and have a wonderful 2010 ahead.
(Please note that in this post and subsequent ones when I want you to consider a vector $\lambda_i$, I actually want you to consider a vector whose length is $\lambda_i$ (normally in $l_2$ norm)).
I told you that there is an alternative method to achieve the same ends. In fact, now I want to say a little bit more. I would like to prove that $\lambda_i \geq min_{j=i...n}||b_j*||$.
Previous post just discussed the corollary you can draw from this fact that $\lambda_1 \geq min_{i=1...n}||b_i*||$.
In this post, I will tackle a few other interesting problems. These problems are mostly taken from the first homework on Oded Regev's Course on the subject.
The purpose of these problems is to tell you about the nice interplay between the properties of Successive minima and Gram Schmidt Orthogonalization.
Some of these solutions are due to Roy Kasher and Nir Bitansky at Tel Aviv Computer Science
For convenience, we will assume that we are talking full rank lattices of rank $n$ unless otherwise specified.
Consider the representation of the basis vectors in the orthogonalized system as shown in the previous post.
Notice that the $ith$ co-ordinate in the $ith$ column is $||b_i*||$. All entries below this are $0$. All entries above are $\mu_{i,j}||b_j*||$ in row $j$ if $j < i$.
For your convenience, I reproduce the matrix below
$\begin{pmatrix}||b_1*||&\mu_{2,1}||b_1*||&...&\mu_{n,1}||b_1*||\\0&||b_2*||&...&\mu_{n,2}||b_2*||\\...&...&...&...\\...&...&...&...\\...&...&...&...\\0&0&...&||b_n*||\end{pmatrix}$
What will you say about the vectors $\lambda_1$, $\lambda_2$....$\lambda_j$ when the following question is asked - do any of these vectors have a non-zero component $x_i$ when $i \geq j$ ? So, what is it? Yes or No?
Yes! If it were not, then you wont have $j$ linearly independent vectors as the definition of $j^{th}$ successive minima requires.
Okay, what then? Awesomeness. You know that some of these vectors $\lambda_1$, $\lambda_2$...$\lambda_j$ has some $i \geq j$th component non-zero.
You also know that any lattice vector is formed by linear combination of the vectors in the above matrix. Thus any lattice vector (including the $j$ successive minima-attaining ones) formed by "integral helpings" from the vectors of the above basis. We know that for the successive minima-attaining ones, a particular "integral helping" is non-zero. Thus, your $\lambda_j$ is guranteed to be atleast this quantity wherefrom it follows that $\lambda_j \geq min_{j=i...n}||b_j||$.
There is more to come. We will tackle some more problems before we say goodbye today. So, grab a coke or pepsi (but no popcorn!), and dive right in.
(Do not dive in the coke; it gives a sticky feeling!)
This problem is for testing your understanding of Gram Schmidt Orthogonaliztion process. Can you, dear reader, disprove the following
"$\lambda_n \geq max_{i=1...n}||b_i*||$"
Hmmm...how do we proceed. You can think of it this way. Well, since we are asked to disprove it; any counter-example will do.
Next, we know that Gram Schmidt is supposed to be dependent on the sequence of the vectors in the input basis.
Lets talk $2*2$ matrices.
What if I take a particular basis, which sure enough does have smaller vectors than the given $b_1$. (In particular, I would like to have $b_2$ as one of the successive minimas; this will make my life cool!)
And sure enough a little tweaking by that pen of yours gives a counterexample as the matrix $\begin{pmatrix}1&0\\1&1\end{pmatrix}$
The following problem tests you on your understanding of Successive minima and independence. So, for a full rank lattice you have been provided with this list of all the successive minima attaining lattice vectors. I ask - can they capture the whole lattice?
Some are tempted to say yes for the simple reason that we are talking full rank lattices and have $n$ independent vectors. Nothing could be further from truth. The answer is no and the reason is simple - you expect to have the span same. Getting the lattice same is co-incidental. For example consider the following lattice basis vectors. This solution is due to Nir Bitansky.
$v_1$ = $\begin{pmatrix}2\\0\\0\\...\\0\\1\end{pmatrix}$ $v_2$ = $\begin{pmatrix}0\\2\\0\\...\\0\\1\end{pmatrix}$ $v_3$ = $\begin{pmatrix}0\\0\\2\\...\\0\\1\end{pmatrix}$ $v_{n-1}$ = $\begin{pmatrix}0\\0\\0\\...\\2\\1\end{pmatrix}$ AND $v_n$ = $\begin{pmatrix}1\\1\\1\\...\\1\\1\end{pmatrix}$
Look at $v_n$ again. It is the all 1's vector. In this system, it is clear that all the successive minimas are 2. And the $ith$ successive minima vector has $2$ at the $ith$ position with all other entries $0$. The basis generated by these successive minima vector does not even include the all $1$'s vector. And this proves the assertion in question.
For the $2$ dimensional full rank lattices however, we have a nice result. It says that the successive minima attaining vectors do form a basis. Do you see why? Let $\lambda_1$ and $\lambda_2$ be your successive minima. Being independent, they clearly hit every point in $\RR^2$. Thus, a lattice point can be written as $x*\lambda_1 + y*\lambda_2$ for some reals $x$ and $y$.
$\lfloor x\rceil*\lambda_1 + \lfloor y\rceil*\lambda_2$ clearly belongs to the lattice. And so does their difference, the vector $v$ which clearly satisfies the following
$||v|| < (||\lambda_1|| + ||\lambda_2||)/2$.
And the right side of the above inequality is clearly less than $||\lambda_2||$. Thus, you get two independent lattice vectors and one of them is less than $||\lambda_2||$ which is clearly a contradiction.
Next, we try the following problem from Oded Regev's homework. This is not really related to the properties of Successive minima or Gram Schmidt Orthogonalization; but it is an excercise you should solve after an exposure to Unimodular matrices.
The problem is -
Show that any unimodular matrix $U \in Z^{n*n}$ can be transformed to the identity matrix by the following three basic column operations: $a_i \leftrightarrow a_j$ , $a_i \leftarrow -a_i$, and $a_i \leftarrow a_i+k*a_j$ for some integer $k$.
There is a hint that goes with the problem - Use Eulidean algorithm. I tried the problem but was unable to solve it. (I had not seen the hint though).
Anyway, Roy Kasher's solution was mind blowing. Roy, if you are reading this, get a webpage fast. I would like to link your page with this blog and let people have a look at how humans would look when they evolve (going brain-wise).
I was confused about how to apply Gaussian Elimination as that may not guarantee integrality which is what I needed.
Anyway, without much ado, here is the solution that Roy gave.
First, assume that the determinant has been expanded along some arbitrary row. Then, clearly the value of the determinant is a multiple of the $gcd$ of the numbers in this row.
And, since determinant is known to be 1 (the matrix is unimodular), the gcd of elements in this row is 1. Thus, we know that there exist two elements in this row that are co-prime. And since this row was an arbitrary row, any row has got two elements which are co-prime.
First select two elements in the first row which are co-prime. Run Euclid GCD on these elements till you get a 1 in one of these elements. Also, "copy these operations" over to other elements in the column headed by these elements. The 3 operations allowed give us enough room to invoke these operations.
Thus, sooner or later, you get a 1 in one of these positions. After this, you just zero out the first element from the rest of the first row.
Next, you move on to second row; knowing like that perfect hunter that your prey, two co-prime elements must exist in this row (determinant is still 1!).
You repeat the sequence of operations till you arrive at the Identity Matrix!
What else can you state about unimodular matrices? What, for instance, can you say about their inverses?
It turns out their inverses are also unimodular. This is easy to see as the foregoing proof tells us the key thing. Only elementary column operations can carry a unimodular matrix $U$ to $I$. All the elementary operations $E_1$, $E_2$...$E_{final}$ can be chained up like $E_{final}*....E_2*E_1$ to give the inverse which is clearly a unimodular matrix as all the $E_i$'s are integral and it obviously has determinant $1$.
This also tells you the proof for the quite "obvious" fact that two Bases, $B_1$ and $B_2$ are equivalent iff one is derivable from the other using column operations.
Had enough! We are just getting warmed up. Here is another
"Show that for any full-rank integer lattice $\Lambda$, $det(\Lambda).Z^n \subseteq \Lambda$"
That reminds me - I did not tell you that |$det(\Lambda)$| is also something you can talk about. It is not basis specific (the absolute value of determinant of lattice).
Its clear if you keep in mind equivalent lattices can be derived from one other by multiplication of unimodular matrices.
Okay. So, this question asks you to show that all vectors of $Z^n$ when multiplied by $det(\Lambda)$ give a vector which is in the lattice.
To do this, the most natural way seems to be the following. Let $det(\Lambda) = d$. Now we will try to show that the following vectors can be derived using the given lattice
$v_1$ = $\begin{pmatrix}d&\\0&\\0&\\...&\\0\end{pmatrix}$ $v_2$ = $\begin{pmatrix}0&\\d&\\0&\\...&\\0\end{pmatrix}$ $v_3$ = $\begin{pmatrix}0&\\0&\\d&\\...&\\0\end{pmatrix}$... $v_n$ = $\begin{pmatrix}0&\\0&\\0&\\...&\\d\end{pmatrix}$
Can I do that?
Why not? Let us expand the determinant along first row. Assume that co-factors are denoted by $C_{ij}$ and the $ij^{th}$ entries by $a_{ij}$.
Then we know $a_{11}*(C_{11}) + a_{12}*(-C_{12}) + ..... + a_{1n}*((-1^{n+1})*C_{1n}) = d$.
Thus, we can get a $d$ in the first place of a vector, $v$, by taking $C_{11}$ of first vector, $-C_{12}$ of second vector and so on. Hmm...Looks like this will have some bearing upon $v_1$. What does this combination do to other components of $v$?
We would really like it if makes $v$ the same as $v_1$; lets keep our fingers crossed. The other components are like
$a_{i1}*(C_{i1}) + a_{i2}*(-C_{i2}) + ..... + a_{in}*((-1^{n+1})*C_{in})$.
This sum is 0 as we would have hoped! Can you see why? Its just because the cofactors are what you obtain with first row removed. And the first row is the same as $ith$ row. This is universally known to be zero. To reiterate,
$a_{i1}*(C_{i1}) + a_{i2}*(-C_{i2}) + ..... + a_{in}*((-1^{n+1})*C_{in}) = 0$
for all $i \neq 1$.
Similarly, you can obtain $v_2$, $v_3$ and all of $v_i$'s. Once you get them, you can get any integral combination you want in this world. This proves the statement in question.
There are some more problems which I will work out once I get some more time.
Till then, good bye and have a wonderful 2010 ahead.
Subscribe to:
Comments (Atom)