matrix methods | Chelsea Zou

Vectors

Canonical Unit Vectors (standard basis vectors)

$ \mathbb{e}_i$ = where the i-th position is 1 and all other positions are 0. For example, in $ \mathbb{R}^3 $: \[ \mathbf{e}_1 = (1, 0, 0), \quad \mathbf{e}_2 = (0, 1, 0), \quad \mathbf{e}_3 = (0, 0, 1) \] Any vector $ \mathbf{v} \in \mathbb{R}^n $ can be expressed as: \[ \mathbf{v} = v_1 \mathbf{e}_1 + v_2 \mathbf{e}_2 + \cdots + v_n \mathbf{e}_n \]

Inner Products (dot products)

\[ a \cdot b = a^T b \] Properties:
1. $ b^T a = a^T b $

2. $ (\gamma a)^T b = \gamma (a^T b)$
scaling a then taking dot product with b is same as taking the dot product of a and b then scaling it

3. $ (a + b)^T c = a^T c + b^T c $
the dot product of c with the sum of a and b is the same as the dot product of (a and c) plus (b and c)

Operations:
1. Picking out the i'th element: $ \quad e_{i}^T a = a_{i} $

2. Sum of elements: $ \quad 1_n a = a_1 + a_2 + a_3 ... a_n $

3. Sum of squares: $ \quad a^T a = a_{1}^2 + a_{2}^2 + a_{3}^2 ... a_{n}^2 $ $ \quad \quad a^T a = 0 \quad \text{iff} \quad a = 0 $

Tensor Product (Outer Product)

\[ \mathbf{a} \otimes \mathbf{b} = \mathbf{a} \mathbf{b}^T \] \[ a = \begin{bmatrix} a^0 \\ a^1 \\ a^2 \\ a^3 \end{bmatrix}, \quad b = \begin{bmatrix} b^0 \\ b^1 \\ b^2 \\ b^3 \end{bmatrix} \quad \quad a \otimes b = \begin{bmatrix} a^0 b^0 & a^0 b^1 & a^0 b^2 & a^0 b^3 \\ a^1 b^0 & a^1 b^1 & a^1 b^2 & a^1 b^3 \\ a^2 b^0 & a^2 b^1 & a^2 b^2 & a^2 b^3 \\ a^3 b^0 & a^3 b^1 & a^3 b^2 & a^3 b^3 \end{bmatrix} \]

Linear Functions

Satisfies superposition property (both homogeneity + additivity):

1. Homogeneity: $ f(ax) = a f(x) $
2. Additivity: $ f(x + y) = f(x) + f(y) $

Any linear function can be represented as a dot product $f(x) = a^T x $ and any dot product function is linear (for some fixed vector a)

\[ f(c_1 x + c_2 y) = \] \[ a^T (c_1 x + c_2 y) = a^T (c_1 x) + a^T (c_2 y) = c_1(a^T x) + c_2(a^T y) \] \[ = c_1 f(x) + c_2 f(y) \]

Affine Functions

Affine functions are linear functions $ f(x) = a^T x $ but offset by some b (linear always go through (0,0), affine functions need not): \[ f(x) = a^T x + b \]

Taylor Approximation

A first order taylor series is a linear approximation of the function f.
Using a nearby point z, we can approximate f(x) using: \[ \hat{f}(x) = f(z) + \Delta f(z)(x - z) \]

Norm and Distance

\[ ||x|| = \sqrt{x_{1}^2 + x_{2}^2 ...} = \sqrt{x^T x} \] \[ ||x||^2 = \textbf{sum of squares} = x_{1}^2 + x_{2}^2 ... = x^T x \] Properties:

1. $ ||cx|| = |c|||x|| $
2. $ ||x + y|| \leq ||x|| + ||y|| $
3. $ ||x|| \geq 0 $
4. $ ||x|| = 0 \quad $ iff $ \quad x = 0 $

Mean square value: $ \frac{||x||^2}{n} $
Root Mean square value (RMS): $ \frac{||x||}{\sqrt{n}} \quad $ (i.e., typical value of $ |x_i| $)

Block (stacked) Vectors:
\[ ||(a,b,c)||^2 = a^T a + b^T b + c^T c = ||a||^2 + ||b||^2 + ||c||^2 \]

Chebyshev's Inequality:

- Judges the size of a vector
- Most numbers in a vector can't be much bigger than its RMS

Suppose k elements in vector x is larger than some a. Then:
- Number of elements $ |x_i| > a $ is no more than $ ||x||^2 / a^2 $
- The percentage of elements $ |x_i| \geq a RMS(x) $ is $ \leq \frac{1}{a^2} $
\[ \frac{||x||^2}{a^2} \geq k \quad \text{or} \quad \frac{k}{n} \leq (\frac{RMS(x)}{a})^2 \]
Distance:

Distance between a and b is: $ || a - b || $

Triangle Inequality: Sum of twox sides must be greater than or equal to third side

Cauchy Schwartz Inequality: $ |a \cdot b| \leq ||a||||b|| $

Average of $ x = \frac{ x_1 + x_2 + ...}{n} = \frac{1^T x }{n} $

Demeaned vector $ \tilde{x} = x - avg(x)1 $

Standard deviation of $ x = rms(\tilde{x}) = \frac{||x - avg(x)1|| }{\sqrt{n}} $

$ rms(x)^2 = avg(x)^2 + std(x)^2 $

Standardized vector $ z = \frac{\tilde{x}}{std(x)}$

Example in Finance:
x = time series of returns
avd(x) = average return
std(x) = volatility

Risk-Reward Plot:

Angles

\[\theta = \cos^{-1}\left(\frac{\mathbf{a} \cdot \mathbf{b}}{\|\mathbf{a}\| \|\mathbf{b}\|}\right)\]

Linear Independence

Vectors $ \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\} $ are linearly independent if $ c_1, c_2, \dots, c_k $ = 0 is the ONLY solution to:

\[ c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \dots + c_k \mathbf{v}_k = \mathbf{0}. \]

Basis

A set of vectors $ \{\mathbf{b}_1, \mathbf{b}_2, \dots, \mathbf{b}_n\} $ forms a basis for a vector space $ V $ if:

The vectors are linearly independent.
The vectors span the whole space $ V $ (i.e., any vector in $ V $ can be expressed as a linear combination of the basis vectors)

Ex: The vectors $(0,1)$ and $(1,0)$ span $\mathbb{R}^2 $

Orthonormal Expansion

- Orthonormal vectors are orthogonal and its norm is 1

If $ \{\mathbf{a}_1, \mathbf{a}_2, \dots, \mathbf{a}_n\} $ is an orthonormal basis for a vector space $ V $, any vector $ \mathbf{v} \in V $ can be expressed as:

\[ \mathbf{v} = \sum_{i=1}^n (\mathbf{v} \cdot \mathbf{a}_i) \mathbf{a}_i, \]

Gram-Schmidt Process

Orthogonal vectors means they are linearly independent but linearly independent vectors DOES NOT indicate they are orthogonal

Gram-Schmidt allows us to construct an orthonormal basis from a set of linearly independent vectors

Given a set of linearly independent vectors $ \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\} $, the Gram-Schmidt process generates an orthonormal set $ \{\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_n\} $ as follows:

\[ \mathbf{u}_1 = \frac{\mathbf{v}_1}{\|\mathbf{v}_1\|} \] \[ \mathbf{u}_2 = \frac{\mathbf{v}_2 - (\mathbf{v}_2 \cdot \mathbf{u}_1) \mathbf{u}_1}{\| \mathbf{v}_2 - (\mathbf{v}_2 \cdot \mathbf{u}_1) \mathbf{u}_1 \|} \] \[ \mathbf{u}_3 = \frac{\mathbf{v}_3 - (\mathbf{v}_3 \cdot \mathbf{u}_1) \mathbf{u}_1 - (\mathbf{v}_3 \cdot \mathbf{u}_2) \mathbf{u}_2}{\| \mathbf{v}_3 - (\mathbf{v}_3 \cdot \mathbf{u}_1) \mathbf{u}_1 - (\mathbf{v}_3 \cdot \mathbf{u}_2) \mathbf{u}_2 \|} \]

This process continues for all $ \mathbf{v}_i $ to produce orthonormal vectors.

Matrices

Lower Triangular Matrices

\[ \begin{bmatrix} a_{11} & 0 & 0 & 0 \\ a_{21} & a_{22} & 0 & 0 \\ a_{31} & a_{32} & a_{33} & 0 \\ a_{41} & a_{42} & a_{43} & a_{44} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} a_{11}x_1 \\ a_{21}x_1 + a_{22}x_2 \\ a_{31}x_1 + a_{32}x_2 + a_{33}x_3 \\ a_{41}x_1 + a_{42}x_2 + a_{43}x_3 + a_{44}x_4 \end{bmatrix} \]

Difference Matrices

\[ \begin{bmatrix} -1 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{bmatrix} = \begin{bmatrix} x_2 - x_1 \\ x_3 - x_2 \\ x_4 - x_3 \end{bmatrix} \]

Rotation Matrices

A 2D rotation matrix for an angle $\theta$ is given by: \[ R(\theta) = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} \]

Incidence Matrix

For a simple undirected graph with m edges and n nodes, the incidence matrix $ A \in \mathbb{R}^{nodes \times edges} $ is defined as: \[ A_{i,j} = \begin{cases} 1, & \text{if edge } j \text{ incoming to } i \\ -1, & \text{if edge } j \text{ leaving } i \\ 0, & \text{no connection} \end{cases} \]

Incidence Matrix

A simple example of a 1D convolution smoothing matrix with a 3-point moving average kernel is: \[ C = \begin{bmatrix} \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & 0 & \dots & 0 \\ 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} & \dots & 0 \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \end{bmatrix} \]

Matrix Powers and Adjacency Matrix

Matrix Powers: The k'th power of an adjacency matrix $ A^k $ tells us number of paths of length k between 2 vertices
- Node 0 to 2 has length 2
- The entry at (0,2) of AA = 2

Convolution Matrices

\[ c_k = \sum_{i+j=k+1} a_i b_j \] $ c_1 $ means k = 2, so compute sum of products where (i = 1, j = 1)

$ c_2 $ means k = 3, so compute sum of products where (i = 1, j = 2) and (i = 2, j = 1)

$ c_2 $ means k = 4, so compute sum of products where (i = j = 2), (j = 1, i = 3), and (j = 3, i = 1)

Polynomial Multiplication

$c_k$ gives us the coefficients in $p(x)q(x)$ \[ p(x) = a_1 + a_2 x + a_3 x^2 + \ldots a_n x^{n-1} \quad \quad q(x) = b_1 + b_2 x + b_3 x^2 \ldots b_m x^{m-1} \] \[ p(x)q(x) = c_1 + c_2 x + c_3 x^2 + \ldots c_{n+m-1} x^{n+m-2} \]

Linear Mapping

A linear mapping $ f: \mathbb{R}^n \to \mathbb{R}^m $ can be represented by a matrix $ A_{m \times n} $ such that for any vector $ x \in \mathbb{R}^n $: \[ f(x) = Ax \] 1. f will always send 0 to 0
If f(0) $ \neq $ 0, f is NOT linear
If f(0) $=$ 0, f may or may not be linear

2. $ f(\alpha x + \beta y) = A(\alpha x + \beta y) = \alpha Ax + \beta Ay $

3. If f does not have a matrix representation, it is NONLINEAR

Finding A Given x

For an $n$-vector $\mathbf{x} = (x_1, \dots, x_n)$, we use the standard unit vectors to write: \[ \mathbf{x} = x_1 \mathbf{e}_1 + x_2 \mathbf{e}_2 + \cdots + x_n \mathbf{e}_n \] Since $f$ is linear, we have: \[ f(\mathbf{x}) = x_1 f(\mathbf{e}_1) + x_2 f(\mathbf{e}_2) + \cdots + x_n f(\mathbf{e}_n) \] Now, we put together the $m \times n$ matrix $A$ column by column: \[ A = \left[ f(\mathbf{e}_1) \,\, f(\mathbf{e}_2) \,\, \cdots \,\, f(\mathbf{e}_n) \right] \] Thus, we can write: \[ f(\mathbf{x}) = x_1 f(\mathbf{e}_1) + x_2 f(\mathbf{e}_2) + \cdots + x_n f(\mathbf{e}_n) = A \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

Price-Demand Elasticity Model

- Prices are given by an $ n $-vector $ p $.
- Demand (e.g., units sold) is given as an $ n $-vector $ d $.
- The fractional (percent) change in price of good $ i $ is: \[ \delta_{price_i} = \frac{p_i^{\text{new}} - p_i}{p_i} \] The fractional (percent) change in demand of good $ i $ is: \[ \delta_{dem_i} = \frac{d_i^{\text{new}} - d_i}{d_i} \] The price-demand elasticity model assumes: \[ \delta_{dem} = E \delta_{price} \] \[ E = \begin{bmatrix} -0.3 & 0.1 & \cdots \\ \vdots & \vdots & \vdots \\ \vdots & \vdots & \vdots \end{bmatrix} \] \[ \delta_{dem_1} = -0.3 \delta_{price_1} + 0.1 \delta_{price_2} + \cdots \] Interpretation: If price of item 1 decreases by 30%, price of item 2 increases by 10%

Balancing Chemical Equations

The given redox reaction $Ra = Pb$, solve for a and b:

\[ a_1 \text{Cr}_2\text{O}_7^{2-} + a_2 \text{Fe}^{2+} + a_3 \text{H}^+ \rightarrow b_1 \text{Cr}^{3+} + b_2 \text{Fe}^{3+} + b_3 \text{H}_2\text{O} \]

$ R = \begin{bmatrix} 2 & 0 & 0 \\ 7 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \\ -2 & 2 & 1 \end{bmatrix} \quad \quad $ $ P = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \\ 0 & 0 & 2 \\ 3 & 3 & 0 \end{bmatrix} $

Linear Dynamical Systems

A linear dynamical system is defined by relating the state at time $ t $ to the state at time $ t + 1 $. It is a linear system if this relationship is linear. \[ \mathbf{x}_{t+1} = \mathbf{A}_t \mathbf{x}_t \] - $\mathbf{x}_t \in \mathbb{R}^n$ is the state vector at time $ t $
- $\mathbf{A}_t \in \mathbb{R}^{n \times n}$ is the **dynamics matrix** at time $ t $.

QR Factorization via Gram-Schmidt

Given a matrix $ A $ with linearly independent columns, run the Gram-Schmidt process to obtain orthonormal vectors $ q_1, \dots, q_k $, forming matrix $ Q $: \[ A = \begin{bmatrix} a_1 & a_2 & \dots & a_k \end{bmatrix} \quad \quad Q = \begin{bmatrix} q_1 & q_2 & \dots & q_k \end{bmatrix} \]

which satisfies:

\[ Q^T Q = I_k \]

Expressing $ A $ in terms of $ Q $ and $ R $:

\[ a_1 = R_{11} q_1 \]

\[ a_2 = R_{12} q_1 + R_{22} q_2 \]

\[ a_3 = R_{13} q_1 + R_{23} q_2 + R_{33} q_3 \]

\[ a_i = R_{1i} q_1 + R_{2i} q_2 + \dots + R_{ii} q_i \]

where:

\[ R_{ij} = q_i^T a_j \quad \text{for } i < j, \quad R_{ii} = \| \tilde{q}_i \| \]

The matrix $ R $ is an upper triangular $ k \times k $ matrix:

\[ R = \begin{bmatrix} R_{11} & R_{12} & R_{13} & \dots & R_{1k} \\ 0 & R_{22} & R_{23} & \dots & R_{2k} \\ 0 & 0 & R_{33} & \dots & R_{3k} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \dots & R_{kk} \end{bmatrix} \]

QR Factorization:

\[ A = QR \]

where $ Q $ has orthonormal columns and $ R $ is upper triangular with positive diagonal entries.

Key properties:

$ A = QR $ is the QR factorization of $ A $.
$ Q^T Q = I_k $, meaning $ Q $ has orthonormal columns.
$ R $ is upper triangular with positive diagonal entries.
QR factorization can be computed using the Gram-Schmidt algorithm.

Invertible Matrices

Left Inverse: Matrix $ A $ has a left inverse if there exists a matrix $ L $ such that $L A = I $
- A's columns are linearly independent

Right Inverse: Matrix $ A $ has a right inverse if there exists a matrix $ R $ such that $A R = I $
- A's rows are linearly independent

Invertible if A has both a left and a right inverse, meaning there exists a matrix $ A^{-1} $ such that:

\[ A A^{-1} = A^{-1} A = I \]

If $ A $ is invertible:

$ A $ must be square
$ \det(A) \neq 0 $.
$ Ax = b $ has unique solution $x = A^{-1} b $
$ (A^{-1})^{-1} = A $

Pseudoinverse: If $ A $ is not square or not invertible, the pseudoinverse $ A^+ $ is used, which satisfies:

\[ A^+ = (A^T A)^{-1} A^T \quad \text{(if } A \text{ has full column rank)} \]

\[ A^+ = A^T (A A^T)^{-1} \quad \text{(if } A \text{ has full row rank)} \]

Least Squares Approximation

Given an overdetermined (A is tall) system of equations $ Ax = b $, an exact solution x may not exist. Instead, find $ \hat{x} $ that minimizes the least squares error: \[ J = \| Ax - b \|^2 \] Solution \[ \hat{x} = (A^T A)^{-1} A^T b = A^+ b \]

Multi- Objecive Least Squares Approximation

The multi-objective least squares problem can be formulated as:

\[ J = \sum_{i=1}^{m} w_i \| A_i x - b_i \|^2 \] \[ J = \left\| \begin{bmatrix} \sqrt{w_1} A_1 \\ \sqrt{w_2} A_2 \\ \vdots \\ \sqrt{w_m} A_m \end{bmatrix} x - \begin{bmatrix} \sqrt{w_1} b_1 \\ \sqrt{w_2} b_2 \\ \vdots \\ \sqrt{w_m} b_m \end{bmatrix} \right\|^2 \]