A Geometric Interpretation of Linear Algebra
The material in this post comes from 3blue1brown's linear algebra video series, which explains linear algebra from a geometric point of view and is quite interesting.
The essence of vectors
A scalar is called a scalar because it scales vectors.
Linear combinations, span, and basis
Matrices and linear transformations
- Geometrically, a matrix is a linear transformation of space.
- A 2x2 matrix can be thought of as a linear transformation of 2D space; each column gives the coordinates of a transformed basis vector.
- For example, the matrix [0,-1;1,0] is a 90-degree counterclockwise rotation, since it sends the x basis vector (1,0) to (0,1) and the y basis vector (0,1) to (-1,0) (see the numpy sketch after this list).
- If the two column vectors of a 2x2 matrix are linearly dependent, the transformation squishes all of 2D space onto the line on which those two vectors sit.
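A minimal numpy sketch of the two bullets above; the rotation matrix is the one from the text, while the rank-1 matrix is made up for illustration:

```python
import numpy as np

# 90-degree counterclockwise rotation: each column is where a basis vector lands.
R = np.array([[0, -1],
              [1,  0]])
print(R @ np.array([1, 0]))          # (1,0) -> [0 1]
print(R @ np.array([0, 1]))          # (0,1) -> [-1 0]

# Linearly dependent columns: everything is squished onto one line.
S = np.array([[1, 2],
              [2, 4]])
print(np.linalg.matrix_rank(S))      # 1
```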
Matrix multiplication and composition of linear transformations
Think of matrix multiplication as applying linear transformations sequentially, with the rightmost matrix acting first.
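For instance (a sketch assuming numpy; the shear matrix is made up for illustration), composing "rotate, then shear" is the same as multiplying by the single matrix shear @ rotate:

```python
import numpy as np

rotate = np.array([[0, -1],
                   [1,  0]])   # 90-degree counterclockwise rotation
shear  = np.array([[1,  1],
                   [0,  1]])   # shear along the x-axis

v = np.array([1.0, 2.0])

# Applying rotation first and shear second equals multiplying by shear @ rotate.
step_by_step = shear @ (rotate @ v)
composed     = (shear @ rotate) @ v
assert np.allclose(step_by_step, composed)
```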
Determinants
- Some linear transformations stretch space, others squish it. The determinant of a matrix is the scaling factor by which the transformation changes any area (volume in 3D).
- If the determinant of a transformation is zero, it squishes all of space onto a line, or even onto a single point. So checking whether the determinant of a matrix is zero tells you whether the associated transformation squishes everything onto a lower dimension (verified numerically after this list).
- If the determinant is negative, the orientation of space is flipped; the area scaling factor is its absolute value.
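A quick numeric check of these determinant facts (numpy, with made-up matrices):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])
print(np.linalg.det(A))            # 6.0: every area is scaled by 6

B = np.array([[1.0, 2.0],
              [2.0, 4.0]])         # dependent columns: space collapses to a line
print(np.linalg.det(B))            # 0.0 (up to floating-point error)

print(np.linalg.det(A[:, ::-1]))   # -6.0: swapping columns flips orientation
```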
Inverse matrices, column space, and null space
- The inverse of a matrix A is the transformation with the property that applying A and then $A^{-1}$ brings you back to where you started.
- Consider the linear system Ax = b. If A is invertible, x = $A^{-1}b$ solves it. However, if |A| = 0 (in 2D, say), A squishes every input vector onto a line. If b happens to fall on that line, a solution x still exists; if not, there is no solution.
- For a 3x3 matrix, a zero determinant can mean the transformation squishes everything onto a plane, or even onto a line. In the latter case it is harder for a solution to exist.
- The rank of a matrix is the dimension of the output of its associated linear transformation (the dimension of the column space).
- If a matrix is not of full rank, say a 2x2 matrix with rank 1, the transformation collapses 2D space onto a line; the vectors that get squished onto the origin form the null space (a quick numpy check follows below).
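A short numpy illustration of solving Ax = b in the invertible case versus the rank-deficient case (the matrices and b are made up):

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)          # unique solution since det(A) != 0
assert np.allclose(A @ x, b)

S = np.array([[1.0, 2.0],
              [2.0, 4.0]])         # rank 1: all outputs live on one line
print(np.linalg.matrix_rank(S))    # 1
try:
    np.linalg.solve(S, b)          # S is singular; here b is not on the line, so no solution exists
except np.linalg.LinAlgError as err:
    print("singular matrix:", err)
```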
Dot products and duality
Geometric interpretation of the dot product: project one vector onto the other and multiply the lengths; the order doesn't matter, which follows from symmetry.
Intuition behind the dot product: why does multiplying coordinates pairwise and adding them up have anything to do with projection?
Suppose there is a vector u in the plane.
Suppose u is a unit vector with coordinates (ux, uy). The dot product of the x-axis unit vector with u is (1,0)·(ux,uy) = ux. By symmetry (draw the angle bisector between the x-axis and u), projecting the x-axis unit vector onto the number line through u also gives a length of ux, so the two quantities agree. This shows that the dot-product rule is: the length of the projection of the basis vector onto u, times the length of u (here u has length 1, so that factor is omitted).
Left-multiplying by the 1x2 matrix [ux, uy] can be seen as a linear transformation (a projection) from 2D to 1D (a number line). This transformation maps the original basis vectors to two numbers on the number line that lies along u, and those two numbers are exactly ux and uy.
This is why taking the dot product with a unit vector amounts to projecting a vector onto the span of that unit vector and reading off the length.
If u is not a unit vector, the same reasoning applies, with an extra factor of the length of u.
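A numerical check of the projection reading of the dot product, using a made-up unit vector u and vector v (assumes numpy):

```python
import numpy as np

u = np.array([3.0, 4.0]) / 5.0          # a unit vector
v = np.array([2.0, 1.0])

# Coordinate rule: multiply pairs and add.
dot = v[0] * u[0] + v[1] * u[1]

# Geometric rule: |v| * cos(angle), i.e. the signed length of the projection
# of v onto the line spanned by u.
angle = np.arctan2(v[1], v[0]) - np.arctan2(u[1], u[0])
proj_length = np.linalg.norm(v) * np.cos(angle)
assert np.isclose(dot, proj_length)

# A non-unit vector just contributes its length as an extra factor.
w = 2.5 * u
assert np.isclose(v @ w, 2.5 * dot)
```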
Cross products through the lens of linear transformations
2D case: [a,b] × [c,d] is the determinant of [a,b;c,d], i.e., the signed area of the parallelogram spanned by the two vectors.
3D case: v × w with v = (a,b,c) and w = (d,e,f). Consider the function that sends (x,y,z) to the determinant of the 3x3 matrix whose first row is (x,y,z) and whose other rows are v and w.
Since this function is linear and goes from three dimensions to one dimension, there is a 1x3 matrix that encodes it; equivalently, it is the dot product with some vector p.
Computationally: expanding that determinant along the (x,y,z) row gives p = (bf − ce, cd − af, ae − bd), which is exactly the usual cross-product formula.
Geometrical understanding of the formula:
- Think of the dot product as the length of the projection of (x,y,z) onto p, times the length of p.
- The right-hand side is the volume of a parallelepiped.
- The volume of the parallelepiped is the base area times the height perpendicular to the base, i.e., the area of the parallelogram spanned by v and w times the length of the projection of (x,y,z) onto the line perpendicular to that plane.
- So the vector p should have the property that:
- it is perpendicular to the plane formed by v and w.
- Its length is equal to the area of the parallelogram base spanned by v and w.
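These properties of p = v × w can be checked numerically (numpy, made-up vectors); the last assertion is the duality statement that p · (x,y,z) equals the determinant with (x,y,z) in the first row:

```python
import numpy as np

v = np.array([1.0, 2.0, 0.5])
w = np.array([-1.0, 0.0, 3.0])
p = np.cross(v, w)

# p is perpendicular to the plane spanned by v and w.
assert np.isclose(p @ v, 0.0) and np.isclose(p @ w, 0.0)

# |p| equals the area |v||w|sin(theta) of the parallelogram spanned by v and w.
cos_t = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
area = np.linalg.norm(v) * np.linalg.norm(w) * np.sqrt(1.0 - cos_t**2)
assert np.isclose(np.linalg.norm(p), area)

# Duality: p . (x,y,z) = det of the matrix with rows (x,y,z), v, w.
xyz = np.array([0.3, -2.0, 1.7])
assert np.isclose(p @ xyz, np.linalg.det(np.array([xyz, v, w])))
```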
Change of basis
Consider two coordinate systems, each with its own basis: call them x's world and y's world.
- The change-of-basis matrix A records what the basis vectors of y's world look like in x's world: its columns are the coordinates, in x's world, of y's basis vectors, so multiplying by A translates a vector's coordinates from y's world into x's world.
- The inverse matrix $A^{-1}$ performs the opposite translation, from x's world back into y's world.
- A matrix M represents a linear transformation; for example, [1,1;2,-1] sends the basis vector (1,0) to (1,2) and (0,1) to (1,-1). All of this is described in x's coordinate system, so how do we describe the same transformation in y's coordinate system? Take a vector whose coordinates in y's system are [a,b]; left-multiply by A to convert it to x's coordinates; then left-multiply by M to apply the transformation in x's coordinates; finally left-multiply by $A^{-1}$ to get the coordinates of the transformed vector back in y's system. To summarize, the same transformation expressed in y's coordinate system is $A^{-1}MA$ (a numpy sketch follows below).
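A sketch of the $A^{-1}MA$ recipe in numpy; M is the matrix from the example above, while A (y's basis vectors written in x's coordinates) is made up for illustration:

```python
import numpy as np

M = np.array([[1.0,  1.0],
              [2.0, -1.0]])   # the transformation, written in x's coordinates

A = np.array([[2.0, -1.0],
              [1.0,  1.0]])   # columns: y's basis vectors in x's coordinates
A_inv = np.linalg.inv(A)

M_in_y = A_inv @ M @ A        # the same transformation in y's coordinates

ab = np.array([3.0, -2.0])    # a vector given in y's coordinates
direct    = M_in_y @ ab                # transform directly in y's world
roundtrip = A_inv @ (M @ (A @ ab))     # y -> x, transform, back to y
assert np.allclose(direct, roundtrip)
```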
Eigenvalues and eigenvectors
- After a linear transformation, some vectors are only stretched or shrunk, not knocked off their own span; these are the eigenvectors of the transformation, and the stretch factor is the eigenvalue. A negative eigenvalue means the direction is flipped.
- For a rotation in 3D, the eigenvectors lie along the axis of rotation, with eigenvalue 1.
- Av = $\lambda$v means (A − $\lambda$I)v = 0 for some nonzero v, so det(A − $\lambda$I) = 0, meaning the transformation A − $\lambda$I squishes space into a lower dimension.
- A transformation may have no (real) eigenvectors at all, e.g., a rotation in 2D.
- A shear has only one eigenvector direction; for example, for the shear [1,1;0,1] all eigenvectors lie along the x-axis, with eigenvalue 1.
- Eigenbasis:
- Use eigenvectors as the new basis
- Whenever we see a diagonal matrix, we can interpret it as the matrix of a linear transformation written in its eigenbasis.
- Suppose we have a matrix A and compute its eigenvectors. If the eigenvectors span the whole space, they can serve as an eigenbasis. Using the change-of-basis idea from the previous section, we can re-express the transformation A in the eigenbasis and see what it looks like. The nice part is that, in the eigenbasis, the transformation corresponding to A is guaranteed to be diagonal, because the eigenbasis vectors are only scaled by the transformation, never rotated!
- To compute $A^{100}$ for a non-diagonal matrix A, first change basis to an eigenbasis, raise the resulting diagonal matrix to the 100th power (only the diagonal entries need to be powered), then change basis back (see the sketch below).
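A numpy sketch of computing $A^{100}$ through the eigenbasis (the matrix A is made up and happens to have a full set of eigenvectors):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])

eigvals, Q = np.linalg.eig(A)        # columns of Q are eigenvectors
L = np.diag(eigvals)

# In the eigenbasis, A is represented by the diagonal matrix L.
assert np.allclose(np.linalg.inv(Q) @ A @ Q, L)

# A^100 = Q L^100 Q^{-1}; only the diagonal entries are powered.
A100 = Q @ np.diag(eigvals**100) @ np.linalg.inv(Q)
assert np.allclose(A100, np.linalg.matrix_power(A, 100))
```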
Linear algebra for deep learning
Positive semidefinite
Formal definition: a symmetric matrix M is positive semidefinite (PSD) if $x^{T}Mx \geq 0$ for every vector x.
Equivalent conditions:
- The matrix is PSD if and only if all of its principal minors (not just the leading ones) are $\geq 0$.
- The matrix is PSD if and only if all of its eigenvalues are $\geq 0$.
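A small helper that applies the eigenvalue criterion (a sketch assuming numpy; is_psd and the test matrices are made up):

```python
import numpy as np

def is_psd(M, tol=1e-10):
    """A symmetric matrix is PSD iff all its eigenvalues are >= 0."""
    return bool(np.all(np.linalg.eigvalsh(M) >= -tol))

A = np.array([[ 2.0, -1.0],
              [-1.0,  2.0]])    # eigenvalues 1 and 3 -> PSD
B = np.array([[0.0, 1.0],
              [1.0, 0.0]])      # eigenvalues -1 and 1 -> not PSD

print(is_psd(A), is_psd(B))     # True False

# Matches the definition: x^T A x >= 0 for any x when A is PSD.
x = np.random.randn(2)
print(x @ A @ x >= 0)           # True
```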
Eigendecomposition of a matrix
- Let A be a square nxn matrix with n linearly independent eigenvectors $q_i$. Then A can be factorized as $A = Q\Lambda Q^{-1}$, where Q is the nxn matrix whose i-th column is $q_i$ and $\Lambda$ is the diagonal matrix of the corresponding eigenvalues.
- For an nxn real symmetric matrix:
- the eigenvalues are real.
- the eigenvectors can be chosen to be real and orthonormal; a proof is at https://math.stackexchange.com/questions/82467/eigenvectors-of-real-symmetric-matrices-are-orthogonal
- Thus a real symmetric matrix A can be decomposed as $A = Q\Lambda Q^{T}$, where Q is an orthogonal matrix whose columns are the eigenvectors of A, and $\Lambda$ is a diagonal matrix whose entries are the eigenvalues of A.
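A numpy check of the symmetric case, using np.linalg.eigh on a random symmetric matrix (made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
A = (S + S.T) / 2                      # a random real symmetric matrix

eigvals, Q = np.linalg.eigh(A)         # eigh is specialized for symmetric matrices

assert np.allclose(Q @ Q.T, np.eye(4))                 # Q is orthogonal
assert np.allclose(Q @ np.diag(eigvals) @ Q.T, A)      # A = Q Lambda Q^T
```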
Symmetric matrix
- Let A be a real symmetric dxd matrix with eigenvalues $\lambda_1 \leq \dots \leq \lambda_d$:
- $\lambda_1 \leq \frac{x^{T}Ax}{x^{T}x} \leq \lambda_d$ for every nonzero x; here $\lambda_1$ is the smallest eigenvalue and $\lambda_d$ is the largest (the Rayleigh quotient bound).
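The bound can be sanity-checked numerically (numpy; the random symmetric matrix and test vectors are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
S = rng.standard_normal((5, 5))
A = (S + S.T) / 2                       # real symmetric

lams = np.linalg.eigvalsh(A)            # ascending: lam_1 <= ... <= lam_d
lam_min, lam_max = lams[0], lams[-1]

# The Rayleigh quotient of any nonzero x lies between the extreme eigenvalues.
for _ in range(1000):
    x = rng.standard_normal(5)
    rayleigh = (x @ A @ x) / (x @ x)
    assert lam_min - 1e-12 <= rayleigh <= lam_max + 1e-12
```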
Principal component analysis
Basic idea:
Find a representation $z = W^{T}x$ such that the covariance matrix of z is diagonal and the first component has the largest variance.
Use SVD:
$X = U \Sigma W^T$
Let $Z = XW$ (with X centered, rows being samples). Then $Z^{T}Z = W^{T}X^{T}XW = W^{T}(W\Sigma^{T}\Sigma W^{T})W = \Sigma^{T}\Sigma$, which is diagonal, so the components of z are uncorrelated and their variances (up to the factor 1/(n−1)) are the squared singular values, in decreasing order.
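A minimal PCA-via-SVD sketch in numpy; the data matrix and mixing matrix are synthetic, and the centering step reflects the zero-mean assumption above:

```python
import numpy as np

rng = np.random.default_rng(2)
mixing = np.array([[3.0, 0.0, 0.0],
                   [1.0, 1.0, 0.0],
                   [0.0, 0.5, 0.2]])
X = rng.standard_normal((200, 3)) @ mixing
X = X - X.mean(axis=0)                  # center the data

U, s, Wt = np.linalg.svd(X, full_matrices=False)
W = Wt.T                                # columns are the principal directions

Z = X @ W                               # the new representation
cov_Z = Z.T @ Z / (len(X) - 1)

# Cov(Z) is diagonal with decreasing variances s_i^2 / (n - 1).
assert np.allclose(cov_Z, np.diag(s**2 / (len(X) - 1)), atol=1e-8)
```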
Singular value decomposition
Formal definition:
M = $U\Sigma V^T$
where M is an mxn matrix, U is an mxm real or complex unitary matrix (orthogonal if M is real), $\Sigma$ is an mxn rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an nxn real or complex unitary matrix (orthogonal if M is real).
The SVD is not unique. It’s always possible to choose the decomposition so that singular values $\Sigma_{ii}$ are in descending order. In this case, $\Sigma$ is uniquely determined by M.
- Intuitive interpretations:
- Geometrically, the SVD says every linear transformation can be decomposed into a rotation or reflection, followed by a coordinate-wise scaling, followed by another rotation or reflection.
- Relation to the eigendecomposition: since $M^{T}M = V\Sigma^{T}\Sigma V^{T}$ and $MM^{T} = U\Sigma\Sigma^{T}U^{T}$, the columns of V are eigenvectors of $M^{T}M$, the columns of U are eigenvectors of $MM^{T}$, and the nonzero singular values of M are the square roots of the nonzero eigenvalues of $M^{T}M$ (equivalently of $MM^{T}$).
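A numpy check of the decomposition and of its relation to the eigendecomposition of $M^{T}M$ (random matrix, made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 3))

U, s, Vt = np.linalg.svd(M, full_matrices=False)

# Reconstruction: M = U Sigma V^T.
assert np.allclose(U @ np.diag(s) @ Vt, M)

# The squared singular values are the eigenvalues of M^T M,
# and the columns of V are its eigenvectors.
eigvals = np.linalg.eigvalsh(M.T @ M)[::-1]    # descending order
assert np.allclose(eigvals, s**2)
```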