A Geometric Interpretation of Linear Algebra
The material in this post comes from 3blue1brown's linear algebra video series, which explains linear algebra from a geometric point of view and is quite interesting.
The essence of vectors
A scalar is called a scalar because it scales vectors.
Linear combinations, span, and basis
Matrices and linear transformations
- Geometrically, a matrix is a linear transformation of space.
- A 2x2 matrix can be thought of as a linear transformation of 2D space; each column gives the coordinates of a transformed basis vector.
- For example, the matrix [0,-1;1,0] is a 90-degree counterclockwise rotation, since it sends the x basis vector (1,0) to (0,1) and the y basis vector (0,1) to (-1,0) (see the numpy sketch after this list).
- If the two column vectors of a 2x2 matrix are linearly dependent, the transformation squishes all of 2D space onto the line on which those two vectors sit.
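A minimal numpy sketch of the two bullets above; the rotation matrix is the one from the text, while the rank-1 matrix is made up for illustration:

```python
import numpy as np

# 90-degree counterclockwise rotation: each column is where a basis vector lands.
R = np.array([[0, -1],
              [1,  0]])
print(R @ np.array([1, 0]))          # (1,0) -> [0 1]
print(R @ np.array([0, 1]))          # (0,1) -> [-1 0]

# Linearly dependent columns: everything is squished onto one line.
S = np.array([[1, 2],
              [2, 4]])
print(np.linalg.matrix_rank(S))      # 1
```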
Matrix multiplication and composition of linear transformations
Think of matrix multiplication as applying linear transformations sequentially, with the rightmost matrix acting first.
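For instance (a sketch assuming numpy; the shear matrix is made up for illustration), composing "rotate, then shear" is the same as multiplying by the single matrix shear @ rotate:

```python
import numpy as np

rotate = np.array([[0, -1],
                   [1,  0]])   # 90-degree counterclockwise rotation
shear  = np.array([[1,  1],
                   [0,  1]])   # shear along the x-axis

v = np.array([1.0, 2.0])

# Applying rotation first and shear second equals multiplying by shear @ rotate.
step_by_step = shear @ (rotate @ v)
composed     = (shear @ rotate) @ v
assert np.allclose(step_by_step, composed)
```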
Determinants
- Some linear transformations stretch space, others squish it. The determinant of a matrix is the scaling factor by which the transformation changes any area (volume in 3D).
- If the determinant of a transformation is zero, it squishes all of space onto a line, or even onto a single point. So checking whether the determinant of a matrix is zero tells you whether the associated transformation squishes everything onto a lower dimension (verified numerically after this list).
- If the determinant is negative, the orientation of space is flipped; the area scaling factor is its absolute value.
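A quick numeric check of these determinant facts (numpy, with made-up matrices):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
print(np.linalg.det(A))            # 6.0: every area is scaled by 6

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])         # dependent columns: space collapses to a line
print(np.linalg.det(B))            # 0.0 (up to floating-point error)

print(np.linalg.det(A[:, ::-1]))   # -6.0: swapping columns flips orientation
```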
Inverse matrices, column space, and null space
- The inverse of a matrix A is the transformation with the property that applying A and then $A^{-1}$ brings you back to where you started.
- Consider the linear system Ax = b. If A is invertible, x = $A^{-1}b$ solves it. However, if |A| = 0 (in 2D, say), A squishes every input vector onto a line. If b happens to fall on that line, a solution x still exists; if not, there is no solution.
- For a 3x3 matrix, a zero determinant can mean the transformation squishes everything onto a plane, or even onto a line. In the latter case it is harder for a solution to exist.
- The rank of a matrix is the dimension of the output of its associated linear transformation (the dimension of the column space).
- If a matrix is not of full rank, say a 2x2 matrix with rank 1, the transformation collapses 2D space onto a line; the vectors that get squished onto the origin form the null space (a quick numpy check follows below).
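A short numpy illustration of solving Ax = b in the invertible case versus the rank-deficient case (the matrices and b are made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)          # unique solution since det(A) != 0
assert np.allclose(A @ x, b)

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])         # rank 1: all outputs live on one line
print(np.linalg.matrix_rank(S))    # 1
try:
    np.linalg.solve(S, b)          # S is singular; here b is not on the line, so no solution exists
except np.linalg.LinAlgError as err:
    print("singular matrix:", err)
```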
Dot products and duality
Geometric interpretation of the dot product: project one vector onto the other and multiply the lengths; the order doesn't matter, which follows from symmetry.
Intuition behind the dot product: why does multiplying coordinates pairwise and adding them up have anything to do with projection?
Suppose there is a vector u in the plane.
Suppose u is a unit vector with coordinates (ux, uy). The dot product of the x-axis unit vector with u is (1,0)·(ux,uy) = ux. By symmetry (draw the angle bisector between the x-axis and u), projecting the x-axis unit vector onto the number line through u also gives a length of ux, so the two quantities agree. This shows that the dot-product rule is: the length of the projection of the basis vector onto u, times the length of u (here u has length 1, so that factor is omitted).
Left-multiplying by the 1x2 matrix [ux, uy] can be seen as a linear transformation (a projection) from 2D to 1D (a number line). This transformation maps the original basis vectors to two numbers on the number line that lies along u, and those two numbers are exactly ux and uy.
This is why taking the dot product with a unit vector amounts to projecting a vector onto the span of that unit vector and reading off the length.
If u is not a unit vector, the same reasoning applies, with an extra factor of the length of u.
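A numerical check of the projection reading of the dot product, using a made-up unit vector u and vector v (assumes numpy):

```python
import numpy as np

u = np.array([3.0, 4.0]) / 5.0          # a unit vector
v = np.array([2.0, 1.0])

# Coordinate rule: multiply pairs and add.
dot = v[0] * u[0] + v[1] * u[1]

# Geometric rule: |v| * cos(angle), i.e. the signed length of the projection
# of v onto the line spanned by u.
angle = np.arctan2(v[1], v[0]) - np.arctan2(u[1], u[0])
proj_length = np.linalg.norm(v) * np.cos(angle)
assert np.isclose(dot, proj_length)

# A non-unit vector just contributes its length as an extra factor.
w = 2.5 * u
assert np.isclose(v @ w, 2.5 * dot)
```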
Cross products through the lens of linear transformations
2D case: [a,b] × [c,d] is the determinant of [a,b;c,d], i.e., the signed area of the parallelogram spanned by the two vectors.
3D case: v × w with v = (a,b,c) and w = (d,e,f). Consider the function that sends (x,y,z) to the determinant of the 3x3 matrix whose first row is (x,y,z) and whose other rows are v and w.
Since this function is linear and goes from three dimensions to one dimension, there is a 1x3 matrix that encodes it; equivalently, it is the dot product with some vector p.
Computationally: expanding that determinant along the (x,y,z) row gives p = (bf − ce, cd − af, ae − bd), which is exactly the usual cross-product formula.
Geometrical understanding of the formula:
- Think of the dot product as the length of the projection of (x,y,z) onto p, times the length of p.
- The right-hand side is the volume of a parallelepiped.
- The volume of the parallelepiped is the base area times the height perpendicular to the base, i.e., the area of the parallelogram spanned by v and w times the length of the projection of (x,y,z) onto the line perpendicular to that plane.
- So the vector p should have the property that:
- it is perpendicular to the plane formed by v and w.
- Its length is equal to the area of the parallelogram base spanned by v and w.
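These properties of p = v × w can be checked numerically (numpy, made-up vectors); the last assertion is the duality statement that p · (x,y,z) equals the determinant with (x,y,z) in the first row:

```python
import numpy as np

v = np.array([1.0, 2.0, 0.5])
w = np.array([-1.0, 0.0, 3.0])
p = np.cross(v, w)

# p is perpendicular to the plane spanned by v and w.
assert np.isclose(p @ v, 0.0) and np.isclose(p @ w, 0.0)

# |p| equals the area |v||w|sin(theta) of the parallelogram spanned by v and w.
cos_t = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
area = np.linalg.norm(v) * np.linalg.norm(w) * np.sqrt(1.0 - cos_t**2)
assert np.isclose(np.linalg.norm(p), area)

# Duality: p . (x,y,z) = det of the matrix with rows (x,y,z), v, w.
xyz = np.array([0.3, -2.0, 1.7])
assert np.isclose(p @ xyz, np.linalg.det(np.array([xyz, v, w])))
```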
Change of basis
Consider two coordinate systems, each with its own basis: call them x's world and y's world.
- The change-of-basis matrix A records what the basis vectors of y's world look like in x's world: its columns are the coordinates, in x's world, of y's basis vectors, so multiplying by A translates a vector's coordinates from y's world into x's world.
- The inverse matrix $A^{-1}$ performs the opposite translation, from x's world back into y's world.
- A matrix M represents a linear transformation; for example, [1,1;2,-1] sends the basis vector (1,0) to (1,2) and (0,1) to (1,-1). All of this is described in x's coordinate system, so how do we describe the same transformation in y's coordinate system? Take a vector whose coordinates in y's system are [a,b]; left-multiply by A to convert it to x's coordinates; then left-multiply by M to apply the transformation in x's coordinates; finally left-multiply by $A^{-1}$ to get the coordinates of the transformed vector back in y's system. To summarize, the same transformation expressed in y's coordinate system is $A^{-1}MA$ (a numpy sketch follows below).
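A sketch of the $A^{-1}MA$ recipe in numpy; M is the matrix from the example above, while A (y's basis vectors written in x's coordinates) is made up for illustration:

```python
import numpy as np

M = np.array([[1.0,  1.0],
              [2.0, -1.0]])   # the transformation, written in x's coordinates

A = np.array([[2.0, -1.0],
              [1.0,  1.0]])   # columns: y's basis vectors in x's coordinates
A_inv = np.linalg.inv(A)

M_in_y = A_inv @ M @ A        # the same transformation in y's coordinates

ab = np.array([3.0, -2.0])    # a vector given in y's coordinates
direct    = M_in_y @ ab                # transform directly in y's world
roundtrip = A_inv @ (M @ (A @ ab))     # y -> x, transform, back to y
assert np.allclose(direct, roundtrip)
```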
Eigenvalues and eigenvectors
- After a linear transformation, some vectors are only stretched or shrunk, not knocked off their own span; these are the eigenvectors of the transformation, and the stretch factor is the eigenvalue. A negative eigenvalue means the direction is flipped.
- For a rotation in 3D, the eigenvectors lie along the axis of rotation, with eigenvalue 1.
- Av = $\lambda$v means (A − $\lambda$I)v = 0 for some nonzero v, so det(A − $\lambda$I) = 0, meaning the transformation A − $\lambda$I squishes space into a lower dimension.
- A transformation may have no (real) eigenvectors at all, e.g., a rotation in 2D.
- A shear has only one eigenvector direction; for example, for the shear [1,1;0,1] all eigenvectors lie along the x-axis, with eigenvalue 1.
- Eigenbasis:
- Use eigenvectors as the new basis
- Whenever we see a diagonal matrix, we can interpret it as the matrix of a linear transformation written in its eigenbasis.
- Suppose we have a matrix A and compute its eigenvectors. If the eigenvectors span the whole space, they can serve as an eigenbasis. Using the change-of-basis idea from the previous section, we can re-express the transformation A in the eigenbasis and see what it looks like. The nice part is that, in the eigenbasis, the transformation corresponding to A is guaranteed to be diagonal, because the eigenbasis vectors are only scaled by the transformation, never rotated!
- To compute $A^{100}$ for a non-diagonal matrix A, first change basis to an eigenbasis, raise the resulting diagonal matrix to the 100th power (only the diagonal entries need to be powered), then change basis back (see the sketch below).
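A numpy sketch of computing $A^{100}$ through the eigenbasis (the matrix A is made up and happens to have a full set of eigenvectors):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

eigvals, Q = np.linalg.eig(A)        # columns of Q are eigenvectors
L = np.diag(eigvals)

# In the eigenbasis, A is represented by the diagonal matrix L.
assert np.allclose(np.linalg.inv(Q) @ A @ Q, L)

# A^100 = Q L^100 Q^{-1}; only the diagonal entries are powered.
A100 = Q @ np.diag(eigvals**100) @ np.linalg.inv(Q)
assert np.allclose(A100, np.linalg.matrix_power(A, 100))
```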
Linear algebra for deep learning
Positive semidefinite
Formal definition: a symmetric matrix M is positive semidefinite (PSD) if $x^{T}Mx \geq 0$ for every vector x.
Equivalent conditions:
- The matrix is PSD if and only if all of its principal minors (not just the leading ones) are $\geq 0$.
- The matrix is PSD if and only if all of its eigenvalues are $\geq 0$.
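A small helper that applies the eigenvalue criterion (a sketch assuming numpy; is_psd and the test matrices are made up):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """A symmetric matrix is PSD iff all its eigenvalues are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])    # eigenvalues 1 and 3 -> PSD
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # eigenvalues -1 and 1 -> not PSD

print(is_psd(A), is_psd(B))     # True False

# Matches the definition: x^T A x >= 0 for any x when A is PSD.
x = np.random.randn(2)
print(x @ A @ x >= 0)           # True
```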
Eigendecomposition of a matrix
- Let A be a square nxn matrix with n linearly independent eigenvectors $q_i$. Then A can be factorized as $A = Q\Lambda Q^{-1}$, where Q is the nxn matrix whose i-th column is $q_i$ and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues.
- For an nxn real symmetric matrix:
- the eigenvalues are real.
- the eigenvectors can be chosen to be real and orthonormal; a proof is at https://math.stackexchange.com/questions/82467/eigenvectors-of-real-symmetric-matrices-are-orthogonal
- Thus a real symmetric matrix A can be decomposed as $A = Q\Lambda Q^{T}$, where Q is an orthogonal matrix whose columns are the eigenvectors of A, and $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of A.
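A numpy check of the symmetric case, using np.linalg.eigh on a random symmetric matrix (made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
A = (S + S.T) / 2                      # a random real symmetric matrix

eigvals, Q = np.linalg.eigh(A)         # eigh is specialized for symmetric matrices

assert np.allclose(Q @ Q.T, np.eye(4))                 # Q is orthogonal
assert np.allclose(Q @ np.diag(eigvals) @ Q.T, A)      # A = Q Lambda Q^T
```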
Symmetric matrix
- Let A be a real symmetric dxd matrix with eigenvalues $\lambda_1 \leq \dots \leq \lambda_d$:
- $\lambda_1 \leq \frac{x^{T}Ax}{x^{T}x} \leq \lambda_d$ for every nonzero x; here $\lambda_1$ is the smallest eigenvalue and $\lambda_d$ is the largest (the Rayleigh quotient bound).
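The bound can be sanity-checked numerically (numpy; the random symmetric matrix and test vectors are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((5, 5))
A = (S + S.T) / 2                       # real symmetric

lams = np.linalg.eigvalsh(A)            # ascending: lam_1 <= ... <= lam_d
lam_min, lam_max = lams[0], lams[-1]

# The Rayleigh quotient of any nonzero x lies between the extreme eigenvalues.
for _ in range(1000):
    x = rng.standard_normal(5)
    rayleigh = (x @ A @ x) / (x @ x)
    assert lam_min - 1e-12 <= rayleigh <= lam_max + 1e-12
```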
Principal component analysis
Basic idea:
Find a representation $z = W^{T}x$ such that the covariance matrix of z is diagonal and the first component has the largest variance.
Use SVD:
$X = U \Sigma W^T$
Let $Z = XW$ (with X centered, rows being samples). Then $Z^{T}Z = W^{T}X^{T}XW = W^{T}(W\Sigma^{T}\Sigma W^{T})W = \Sigma^{T}\Sigma$, which is diagonal, so the components of z are uncorrelated and their variances (up to the factor 1/(n−1)) are the squared singular values, in decreasing order.
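A minimal PCA-via-SVD sketch in numpy; the data matrix and mixing matrix are synthetic, and the centering step reflects the zero-mean assumption above:

```python
import numpy as np

rng = np.random.default_rng(2)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.standard_normal((200, 3)) @ mixing
X = X - X.mean(axis=0)                  # center the data

U, s, Wt = np.linalg.svd(X, full_matrices=False)
W = Wt.T                                # columns are the principal directions

Z = X @ W                               # the new representation
cov_Z = Z.T @ Z / (len(X) - 1)

# Cov(Z) is diagonal with decreasing variances s_i^2 / (n - 1).
assert np.allclose(cov_Z, np.diag(s**2 / (len(X) - 1)), atol=1e-8)
```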
Singular value decomposition
Formal definition:
M = $U\Sigma V^T$
where M is an mxn matrix, U is an mxm real or complex unitary matrix (orthogonal if M is real), $\Sigma$ is an mxn rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an nxn real or complex unitary matrix (orthogonal if M is real).
The SVD is not unique. It’s always possible to choose the decomposition so that singular values $\Sigma_{ii}$ are in descending order. In this case, $\Sigma$ is uniquely determined by M.
- Intuitive interpretations:
- Geometrically, the SVD says every linear transformation can be decomposed into a rotation or reflection, followed by a coordinate-wise scaling, followed by another rotation or reflection.
- Relation to the eigendecomposition: since $M^{T}M = V\Sigma^{T}\Sigma V^{T}$ and $MM^{T} = U\Sigma\Sigma^{T}U^{T}$, the columns of V are eigenvectors of $M^{T}M$, the columns of U are eigenvectors of $MM^{T}$, and the nonzero singular values of M are the square roots of the nonzero eigenvalues of $M^{T}M$ (equivalently of $MM^{T}$).
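A numpy check of the decomposition and of its relation to the eigendecomposition of $M^{T}M$ (random matrix, made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Reconstruction: M = U Sigma V^T.
assert np.allclose(U @ np.diag(s) @ Vt, M)

# The squared singular values are the eigenvalues of M^T M,
# and the columns of V are its eigenvectors.
eigvals = np.linalg.eigvalsh(M.T @ M)[::-1]    # descending order
assert np.allclose(eigvals, s**2)
```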