Linear Algebra

A geometric view of linear algebra

The material in these notes comes from 3blue1brown's linear algebra video series, which explains linear algebra from a geometric point of view; it is quite interesting.

The essence of vectors

A scalar is called a scalar because it scales (stretches or squishes) a vector.

Linear combinations, span, and basis vectors

Matrices and linear transformations

  • Geometrically, a matrix is a linear transformation of space.
  • A 2x2 matrix can be thought of as a linear transformation of 2D space; each column gives the coordinates of the transformed basis vector.
  • For example, the matrix [0,-1;1,0] is a 90-degree counterclockwise rotation, since it sends the original x basis vector (1,0) to (0,1) and the original y basis vector (0,1) to (-1,0).
  • If the two column vectors of a 2x2 matrix are linearly dependent, the linear transformation squishes all of 2D space onto the line where those two vectors sit (see the sketch after this list).
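A minimal numpy sketch of the points above (numpy and the example matrices are my own additions, not from the video):

```python
import numpy as np

# The 90-degree counterclockwise rotation from the example:
# its columns are where the basis vectors land.
R = np.array([[0, -1],
              [1,  0]])

x_hat = np.array([1, 0])   # original x basis vector
y_hat = np.array([0, 1])   # original y basis vector
print(R @ x_hat)           # [0 1]  -> first column of R
print(R @ y_hat)           # [-1 0] -> second column of R

# A matrix with linearly dependent columns squishes the plane onto a line:
S = np.array([[1, 2],
              [2, 4]])
print(np.linalg.matrix_rank(S))  # 1
```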

Matrix multiplication and composition of linear transformations

Consider matrix multiplication as applying linear transformations sequentially: the product BA means "first apply A, then B". A small sketch follows.
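A quick numpy illustration (the rotation and shear matrices are my own example):

```python
import numpy as np

# Composing a 90-degree rotation with a shear.
# Read right to left: (shear @ rot) means "rotate first, then shear".
rot = np.array([[0, -1],
                [1,  0]])
shear = np.array([[1, 1],
                  [0, 1]])

v = np.array([1, 0])
print(shear @ (rot @ v))      # apply rot, then shear: [1 1]
print((shear @ rot) @ v)      # same result, composition precomputed as one matrix
print(np.array_equal(shear @ rot, rot @ shear))  # False: order matters
```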

The determinant

  • Some linear transformations stretch space, some squish it. The determinant of a matrix is the scaling factor by which the associated linear transformation changes any area.
  • If the determinant of a transformation is zero, it squishes all of space onto a line, or even onto a single point. So checking whether the determinant of a matrix is zero tells you whether the associated transformation squishes everything into a lower dimension.
  • If the determinant is negative, the orientation of space is inverted; the absolute value still gives the area scaling factor (see the sketch below).
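A short numpy check of these three cases (the example matrices are mine):

```python
import numpy as np

A = np.array([[3, 0],
              [0, 2]])          # scales areas by 6
B = np.array([[1, 2],
              [2, 4]])          # dependent columns: squishes the plane onto a line
C = np.array([[0, 1],
              [1, 0]])          # swaps the basis vectors: flips orientation

print(np.linalg.det(A))         # 6.0
print(np.linalg.det(B))         # 0.0
print(np.linalg.det(C))         # -1.0  (negative: orientation inverted, |det| = 1)
```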

Inverse matrices, column space and null space

  • The inverse of a matrix is the transformation with the property that if you first apply the transformation A and then apply $A^{-1}$, you end up back where you started.
  • Consider the linear system Ax = b. If A is invertible, we can compute the solution as $x = A^{-1}b$. However, if |A| = 0 then, in 2D for example, A squishes every input vector onto a line. If b happens to fall on that line, solutions x still exist; if not, there is no solution.
  • For a 3x3 matrix, a determinant of 0 can mean either that it squishes everything onto a plane or that it squishes everything onto a line. In the latter case it is even harder for a solution to exist.
  • The rank of a matrix is the dimension of the output of the linear transformation associated with the matrix, i.e. the dimension of its column space.
  • If a matrix is not of full rank, say a 2x2 matrix with rank 1, the transformation collapses the plane onto a line: that line is the column space, and the set of vectors that land on the origin (here also a full line) is the null space. A small sketch follows this list.
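A numpy sketch of solving Ax = b and of a rank-deficient case (the matrices are my own examples):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])

x = np.linalg.inv(A) @ b          # or, numerically preferable: np.linalg.solve(A, b)
print(x, A @ x)                   # A @ x recovers b

S = np.array([[1., 2.],
              [2., 4.]])          # det = 0: squishes the plane onto a line
print(np.linalg.matrix_rank(S))   # 1 -> not full rank, no inverse exists
```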

Dot products and duality

  • Geometric interpretation of the dot product: project one vector onto the other and multiply the lengths; the order does not matter, which can be shown by symmetry.

  • Intuition behind the dot product: why does multiplying coordinates pairwise and adding them up have anything to do with projection?

    • Suppose there is a vector u in the plane.

      Suppose u is a unit vector with coordinates (ux, uy). The dot product of the unit vector on the x-axis with u is (1,0)·(ux,uy) = ux. By symmetry (draw the bisector of the angle between the x-axis and u), projecting the x-axis unit vector onto the number line along u also gives a length of ux, so the two views agree. This shows that the dot product amounts to projecting the basis vectors onto u and multiplying by the length of u (here the length of u is 1, so that factor drops out).

    • Left-multiplying by the 1x2 matrix [ux,uy] can be viewed as a linear transformation from 2D to 1D (a number line), namely a projection: it maps the original basis vectors to two numbers on the number line along u, and those two numbers are exactly ux and uy.

    • This is why taking the dot product with a unit vector is the same as projecting a vector onto the span of that unit vector and reading off the length.

    • If u is not a unit vector, the same reasoning applies; the result is just scaled by the length of u. A short numerical sketch follows.
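A small numpy check of the projection picture (the vectors are my own examples):

```python
import numpy as np

u = np.array([0.6, 0.8])           # a unit vector (ux, uy)
v = np.array([3.0, 1.0])

dot = v @ u                        # 3*0.6 + 1*0.8 = 2.6
proj = (v @ u) * u                 # projection of v onto the line spanned by u
print(dot, np.linalg.norm(proj))   # both ~2.6: dot product = projection length

# For a non-unit vector w = 2u, the dot product just scales by the length of w:
w = 2 * u
print(v @ w, 2 * dot)              # both ~5.2
```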

The cross product in the light of linear transformations

  • 2D case: [a,b] × [c,d] is the determinant of [a,b;c,d], i.e. the signed area of the parallelogram spanned by the two vectors.

  • 3D case: to understand v × w, consider the function that sends (x,y,z) to the determinant of the 3x3 matrix whose columns are (x,y,z), v, and w.

    Since this function goes from three dimensions to one dimension and is linear, there is a 1x3 matrix that encodes this transformation; equivalently, there is a vector p such that the function equals the dot product of p with (x,y,z). That vector p is defined to be v × w.

  • Computationally, the same idea gives the usual formula: expand the symbolic determinant whose first column is the basis vectors $\hat{i}, \hat{j}, \hat{k}$ and whose other columns are v and w, which yields

    $v \times w = (v_2 w_3 - v_3 w_2,\ v_3 w_1 - v_1 w_3,\ v_1 w_2 - v_2 w_1)$

  • Geometric understanding of the formula:

    • Consider the dot product as the length of the projection of (x,y,z) onto p, times the length of p.
    • The right-hand side is the volume of a parallelepiped.
    • The volume of the parallelepiped is the area of the base times the height perpendicular to the base, i.e. the length of the projection of (x,y,z) onto the line perpendicular to the plane spanned by v and w.
    • So the vector p must have the following properties:
      • it is perpendicular to the plane spanned by v and w;
      • its length equals the area of the parallelogram base spanned by v and w (see the numerical sketch after this list).
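A numpy sketch of the duality argument (the vectors are my own examples):

```python
import numpy as np

# p = v x w is the dual vector of f(x,y,z) = det of the matrix with columns (x,y,z), v, w.
v = np.array([1.0, 0.0, 2.0])
w = np.array([0.0, 3.0, 1.0])
p = np.cross(v, w)

r = np.array([2.0, -1.0, 4.0])                    # an arbitrary (x, y, z)
f = np.linalg.det(np.column_stack([r, v, w]))
print(p @ r, f)                                    # the same number (up to floating point)

# p is perpendicular to both v and w:
print(p @ v, p @ w)                                # 0.0, 0.0
```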

Change of basis

  • Consider y = Ax.

  • Suppose y lives in a coordinate system with its own basis, and x lives in a coordinate system with its own basis.

  • The matrix A tells us what the basis vectors of y's coordinate world are, written as coordinates in x's coordinate world.
  • The inverse transformation $A^{-1}$ converts coordinates in the other direction, from x's world back to y's.
  • A matrix M represents a linear transformation; for example, the matrix with columns (1,2) and (1,-1) sends the original basis vector (1,0) to (1,2) and (0,1) to (1,-1). All of this is described in x's coordinate system, so how do we describe the same transformation in y's coordinate system? Take a vector whose value in y's coordinate system is [a,b]; left-multiply by A to convert it into x's coordinates; then left-multiply by M to get the transformed vector, still in x's coordinates; finally left-multiply by the inverse matrix $A^{-1}$ to get the coordinates, in y's system, of [a,b] after the transformation. To summarize, the transformation expressed in y's coordinates is $A^{-1}MA$ (see the sketch below).
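A numpy sketch of the $A^{-1}MA$ recipe (the basis A and the transformation M are my own example choices):

```python
import numpy as np

A = np.array([[2., -1.],           # columns: y's basis vectors written in x's coordinates
              [1.,  1.]])
M = np.array([[0., -1.],           # a transformation written in x's coordinates (90-degree rotation)
              [1.,  0.]])

M_in_y_basis = np.linalg.inv(A) @ M @ A

ab = np.array([1., 2.])            # a vector given as [a, b] in y's coordinates
print(M_in_y_basis @ ab)
# Same result via the long route: to x-coords, transform, back to y-coords.
print(np.linalg.inv(A) @ (M @ (A @ ab)))
```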

Eigenvalues and eigenvectors

  • After a given linear transformation, some vectors in the space are merely stretched or shrunk and are not knocked off their own span; these vectors are the eigenvectors of the transformation, the stretching factor is the eigenvalue, and a negative eigenvalue means the direction is flipped.
  • For a 3D rotation, the eigenvectors lie along the axis of rotation, with eigenvalue 1.
  • Av = $\lambda$v, so (A-$\lambda$I)v = 0; for a nonzero v to exist we need det(A-$\lambda I$) = 0, meaning the transformation A-$\lambda$I squishes space into a lower dimension.
  • Some transformations have no (real) eigenvectors, e.g. a 2D rotation.
  • A shear has only one eigenvector direction: all of its eigenvectors lie on a single line.
  • Eigenbasis:
    • Use the eigenvectors as the new basis.
    • Whenever we meet a diagonal matrix, we can read it as the matrix of a linear transformation written in an eigenbasis: the basis vectors are eigenvectors, and the diagonal entries are their eigenvalues.
    • Suppose we have a matrix A and compute its eigenvectors. If those eigenvectors span a space as large as the original one, they can serve as an eigenbasis. Using the change-of-basis idea from the previous section, we can express the transformation A in the coordinate system of the eigenbasis and see what it looks like there. The nice part is that, in the eigenbasis, the transformation corresponding to A is guaranteed to be a diagonal matrix, because the eigenbasis vectors are only scaled by the transformation, never rotated!
    • To compute $A^{100}$ for a non-diagonal matrix A, change basis to an eigenbasis, take the 100th power of the resulting diagonal matrix (just the 100th powers of the diagonal entries), then change basis back (see the sketch below).
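A numpy sketch of the $A^{100}$ trick (the matrix A is my own example, chosen to be diagonalizable):

```python
import numpy as np

A = np.array([[3., 1.],
              [0., 2.]])

eigvals, Q = np.linalg.eig(A)      # columns of Q are eigenvectors
Lam = np.diag(eigvals)

# In the eigenbasis the transformation is diagonal: Q^{-1} A Q = Lam
print(np.allclose(np.linalg.inv(Q) @ A @ Q, Lam))            # True

# A^100 = Q Lam^100 Q^{-1}; Lam^100 is just element-wise powers of the diagonal.
A100 = Q @ np.diag(eigvals ** 100) @ np.linalg.inv(Q)
print(np.allclose(A100, np.linalg.matrix_power(A, 100)))     # True
```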

Linear algebra for deep learning

Positive semidefinite

  • Formal definition:

    A symmetric real matrix A is positive semidefinite (PSD) if $x^T A x \geq 0$ for every vector x.

  • Criteria:

    • The matrix is PSD if and only if all of its principal minors (not only the leading ones) are $\geq 0$.
    • The matrix is PSD if and only if all of its eigenvalues are $\geq 0$ (checked numerically in the sketch below).
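A small eigenvalue-based PSD check in numpy (the helper name is_psd and the test matrices are my own):

```python
import numpy as np

def is_psd(A, tol=1e-10):
    """A symmetric matrix is PSD iff all of its eigenvalues are >= 0."""
    eigvals = np.linalg.eigvalsh(A)          # eigvalsh: for symmetric/Hermitian matrices
    return bool(np.all(eigvals >= -tol))

print(is_psd(np.array([[2., 1.], [1., 2.]])))    # True  (eigenvalues 1 and 3)
print(is_psd(np.array([[1., 2.], [2., 1.]])))    # False (eigenvalues -1 and 3)
print(is_psd(np.zeros((2, 2))))                  # True  (boundary case)
```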

Eigendecomposition of a matrix

  • Let A be a square nxn matrix with n linearly independent eigenvectors $q_i$. Then A can be factorized as $A = Q \Lambda Q^{-1}$, where Q is the nxn matrix whose i-th column is $q_i$ and $\Lambda$ is the diagonal matrix whose entries are the corresponding eigenvalues.
  • For an nxn real symmetric matrix, Q can be chosen to be orthogonal, so $A = Q \Lambda Q^{T}$ with all eigenvalues real (the spectral theorem).

Symmetric matrix

  • Let A be a real symmetric dxd matrix:
    • A can be orthogonally diagonalized with real eigenvalues $\lambda_1 \leq \dots \leq \lambda_d$, and for every vector x, $\lambda_1 \|x\|^2 \leq x^T A x \leq \lambda_d \|x\|^2$; here $\lambda_1$ is the smallest eigenvalue and $\lambda_d$ is the largest (verified numerically below).
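A quick numpy check of the quadratic-form bounds (a random symmetric matrix, my own example):

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 4))
A = (B + B.T) / 2                       # make it symmetric

eigvals = np.linalg.eigvalsh(A)         # sorted in ascending order
lam_min, lam_max = eigvals[0], eigvals[-1]

x = rng.standard_normal(4)
q = x @ A @ x
print(lam_min * (x @ x) <= q <= lam_max * (x @ x))   # True
```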

Principal component analysis

  • Basic idea:

    Find a representation $z = W^{T}x$ whose covariance matrix is diagonal, with the first feature having the largest variance.

  • Use SVD:

    • $X = U \Sigma W^T$

    • Let $z = XW$. Then $z = U \Sigma W^T W = U \Sigma$, so $z^T z = \Sigma^T U^T U \Sigma = \Sigma^T \Sigma$, which is diagonal; hence (for centered X) the covariance of the projected data is diagonal, as required (see the sketch below).
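A numpy sketch of PCA via the SVD (the synthetic data and the mixing matrix are my own):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3)) @ np.array([[3., 0., 0. ],
                                              [1., 1., 0. ],
                                              [0., 0., 0.1]])
X = X - X.mean(axis=0)                     # center the data

U, S, Wt = np.linalg.svd(X, full_matrices=False)   # X = U @ diag(S) @ Wt
z = X @ Wt.T                               # project onto the principal directions

cov_z = z.T @ z / (len(X) - 1)
print(np.round(cov_z, 6))                  # (approximately) diagonal
print(np.round(S**2 / (len(X) - 1), 6))    # the matching variances, in descending order
```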


Singular value decomposition

  • Formal definition:

    • M = $U\Sigma V^T$

    • where M is a mxn matrix, U is an mxm real or complex unitary(orthonormal if M is real) matrix, $\Sigma$ is an mxn rectangular diagonal matrix with non-negative real numbers on the diagonal, and V is an nxn real or complex unitary matrix

    • The SVD is not unique. It’s always possible to choose the decomposition so that singular values $\Sigma_{ii}$ are in descending order. In this case, $\Sigma$ is uniquely determined by M.

  • Intuitive interpretations:
    • Geometric interpretation: the SVD decomposes a transformation into a rotation or reflection, followed by a coordinate-wise scaling, followed by another rotation or reflection.
  • Relation to eigenvalue decomposition: the columns of V are eigenvectors of $M^T M$, the columns of U are eigenvectors of $M M^T$, and the non-zero singular values of M are the square roots of the non-zero eigenvalues of $M^T M$ (equivalently, of $M M^T$). A quick numerical check follows.
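A short numpy check of this relation (a random matrix, my own example):

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((5, 3))

U, s, Vt = np.linalg.svd(M, full_matrices=False)
eigvals = np.linalg.eigvalsh(M.T @ M)               # ascending eigenvalues of M^T M

print(np.allclose(np.sort(s**2), eigvals))          # True: squared singular values = eigenvalues

v1 = Vt[0]                                          # first right singular vector
print(np.allclose(M.T @ M @ v1, (s[0] ** 2) * v1))  # True: v1 is an eigenvector of M^T M
```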

Cramer's rule

For a square system Ax = b with det(A) ≠ 0, each coordinate of the solution is a ratio of determinants: $x_i = \det(A_i)/\det(A)$, where $A_i$ is A with its i-th column replaced by b. Geometrically, $\det(A_i)$ is the signed area (or volume) spanned by b together with the other columns of A, and by linearity of the determinant in each column this equals $x_i \det(A)$, which is why the ratio recovers the coordinate. A small sketch follows.
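A small numpy sketch of Cramer's rule (the 2x2 system is my own example):

```python
import numpy as np

A = np.array([[2., 1.],
              [1., 3.]])
b = np.array([3., 5.])

det_A = np.linalg.det(A)
x = np.empty(2)
for i in range(2):
    A_i = A.copy()
    A_i[:, i] = b                  # replace the i-th column of A with b
    x[i] = np.linalg.det(A_i) / det_A

print(x)                           # [0.8 1.4]
print(np.linalg.solve(A, b))       # same answer
```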

Condition for a linear system to have nonzero solutions

A homogeneous system Ax = 0 with a square matrix A has nonzero solutions if and only if det(A) = 0: the transformation then squishes space into a lower dimension, so some nonzero vectors land on the origin and the null space is nontrivial (a quick check follows).
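A numpy sketch of this condition (the singular matrix is my own example; the null-space vector is read off the SVD):

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])
print(np.isclose(np.linalg.det(A), 0))       # True: the matrix is singular

# A nonzero solution of Ax = 0: the right singular vector for the zero singular value.
_, s, Vt = np.linalg.svd(A)
x = Vt[-1]
print(s[-1], A @ x)                          # smallest singular value ~0, and A @ x ~ 0
```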