Skip to content

Latest commit

 

History

History
117 lines (65 loc) · 3.01 KB

MatrixCalculus.md

File metadata and controls

117 lines (65 loc) · 3.01 KB

Matrix Calculus

We derive a function of K,cost function : $$C(K,X,\Pi) = f(K,X) + r \cdot p(K,\Pi)$$

where:

  • $f$ objective function
  • $p$ penality function
  • $r$ real number

proposition

A,B,C are matrix

  • $A \odot B = B \odot A$

  • $A \odot (B + C) = A \odot B + A \odot C = (B + C) \odot A$

  • $A \cdot B + A \cdot C = A \cdot (B + C)$

  • $\mathbb{I} \odot A = \mathbb{I} \odot A^\top \Rightarrow \mathbb{I} \odot (A + A^\top) = 2\cdot \mathbb{I}\odot A = 2\cdot \mathbb{I}\odot A^\top$

objective function

function

$$ f(K,X) = \mathrm{tr}(X^\top \cdot \mathrm{inv}(K\odot \mathbb{I})\cdot (K\odot (\mathbb{I}-K))\cdot \mathrm{inv}(K\odot \mathbb{I})\cdot X) $$

gradient

factorize form:

$$ \nabla f = \mathbb{I} \odot (T_1 \odot(\mathbb{I} - 2 \cdot T_2 \cdot T_0)) - 2 \cdot T_1 \odot K $$

where :

  • $T_0 = \mathrm{inv}(K\odot \mathbb{I})$
  • $T_1 = T_0 \cdot X \cdot X^\top \cdot T_0 $
  • $T_3 = K \odot (\mathbb{I} - K)$

hessian

for the calculation of the hessian, we split in two parts.

derive1

$$ \nabla \text{derive1} = \mathrm{tr}(\mathbb{I} \odot (T_5 \cdot (\mathbb{I} - 2 \cdot (K \odot T_2) \cdot T_0))) $$

factorize form:

$$ \nabla \text{derive1} = -2\cdot(T_6 \odot (\mathbb{I}- 2\cdot K) + \mathbb{I}(T_0 \odot T_4 \odot T_5 - T_3 \cdot T_5 \odot T_0)) $$

where:

  • $T_0 = \mathrm{inv}(K\odot \mathbb{I})$
  • $T_2 = (\mathbb{I} - K)$
  • $T_3 = T_0 \cdot (K \odot T_2)$
  • $T_4 = \mathbb{I} - 2\cdot T_3$
  • $T_5 = T_0 \cdot X \cdot X^\top \cdot T_0$
  • $T_6 = T_5 \cdot T_0$

$\nabla \text{derive1}$ is not symetric.

derive2

$$ \nabla \text{derive2} = -2 \cdot \mathrm{tr}(T_1 \odot K ) $$

factorize form:

$$ \nabla \text{derive2} = -2 \cdot \mathbb{I} \odot (T_1 - 2 \odot T_0 \odot T_2 \odot T_1) $$

where:

  • $T_0 = \mathrm{inv}(K\odot \mathbb{I})$
  • $T_1 = T_1 = T_0 \cdot X \cdot X^\top \cdot T_0$
  • $T_2 = K \odot \mathbb{I}$

$\nabla \text{derive2}$ is symmetric.

penality function

function

$$ p(K,\Pi) = \mathrm{tr}((K \odot \mathbb{I}- \mathrm{diag}(\Pi))^2) $$

gradient

$$ \nabla p = 2 \cdot \mathbb{I} \odot (K \odot \mathbb{I} - \mathrm{diag}(\Pi)) $$

hessian

$$ \nabla^2 p = 2 \cdot \mathbb{I} $$

checking gradient of cost function

the cost gradient is good.

checking hessian of cost function

the cost is not symmetric because the derive1 therefore false. I don't know the reason that the objective function hessian is incorrect. In the programs, we use the trustregions method, the method only takes into account the cost function and the gradient