Entropy

$$H(X) = - \sum_x p(x) \log p(x)$$
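
A minimal numeric sketch (base-2 logs, so entropy is in bits; the example pmfs are arbitrary, not from the notes):

```python
import numpy as np

def entropy(p):
    """H(X) = -sum_x p(x) log2 p(x), with the convention 0 log 0 = 0."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                       # drop zero-probability outcomes
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))             # 1.0 bit: a fair coin
print(entropy([0.9, 0.1]))             # ~0.469 bits: a biased coin is less uncertain
```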

 

Properties of H

  1. $H(X) \geq 0$
  2. $H_b(X) = (\log_b a) H_a(X)$
  3. Conditioning reduces entropy:
    $H(X|Y) \leq H(X)$   (w/ eq. iff $X$ and $Y$ are independent; see the numeric check below)
  4. $H(X_1, X_2, ..., X_n) \leq \sum H(X_i)$    (w/ eq. iff the $X_i$ are independent)
  5. $H(X) \leq \log |\mathcal{X}|$    (w/ eq. iff $X$ is uniformly distributed over $\mathcal{X}$)
  6. $H(p)$ is concave in $p$
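
A quick numeric check of properties 3-5 (a sketch only; the joint pmf `p_xy` is arbitrary):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

p_xy = np.array([[0.30, 0.10],         # arbitrary joint pmf p(x, y)
                 [0.05, 0.55]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

H_X, H_Y, H_XY = H(p_x), H(p_y), H(p_xy)
H_X_given_Y = H_XY - H_Y               # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(H_X_given_Y <= H_X)              # property 3: True
print(H_XY <= H_X + H_Y)               # property 4: True
print(H_X <= np.log2(len(p_x)))        # property 5: True
```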

 

Relative Entropy

$$D(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$
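
A minimal sketch of the computation (assumes $q(x) > 0$ wherever $p(x) > 0$, otherwise $D = \infty$; the pmfs are arbitrary). Note that $D$ is not symmetric, so it is not a metric:

```python
import numpy as np

def kl(p, q):
    """D(p||q) in bits, with the convention 0 log(0/q) = 0."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = [0.50, 0.50]
q = [0.75, 0.25]
print(kl(p, q), kl(q, p))              # ~0.208 vs ~0.189: not symmetric
```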

 

Mutual Information

$$I(X;Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$$
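
A numeric sketch, computing $I(X;Y)$ directly from an arbitrary joint pmf:

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits, computed as D(p(x,y) || p(x)p(y))."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)
    p_y = p_xy.sum(axis=0, keepdims=True)
    mask = p_xy > 0
    return np.sum(p_xy[mask] * np.log2(p_xy[mask] / (p_x * p_y)[mask]))

p_xy = np.array([[0.30, 0.10],         # arbitrary joint pmf
                 [0.05, 0.55]])
print(mutual_information(p_xy))        # ~0.361 bits > 0: X and Y are dependent
```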

 

Alternative expressions

  • $H(X) = E_p \log \frac{1}{p(X)}$
  • $H(X,Y) = E_p \log \frac{1}{p(X,Y)}$
  • $H(X|Y) = E_p \log \frac{1}{p(X|Y)}$
  • $I(X;Y) = E_p \log \frac{p(X,Y)}{p(X)p(Y)}$
  • $D(p||q) = E_p \log \frac{p(X)}{q(X)}$

 

Properties of D and I

  1. $I(X;Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$   (checked numerically below)
  2. $D(p||q) \geq 0$   (w/ eq. iff $p(x) = q(x)$ for all $x$)
  3. $I(X;Y) = D(p(x,y)||p(x)p(y)) \geq 0$    (w/ eq. iff $p(x,y) = p(x)p(y)$)
  4. If $|\mathcal{X}| = m$ and $u$ is the uniform distribution, then $D(p||u) = \log m - H(p)$
  5. $D(p||q)$ is convex in the pair $(p, q)$
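
A numeric check of properties 1 and 4 (a sketch; the distributions are arbitrary):

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def kl(p, q):
    p, q = np.asarray(p, dtype=float).ravel(), np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p_xy = np.array([[0.30, 0.10],
                 [0.05, 0.55]])
p_x, p_y = p_xy.sum(axis=1), p_xy.sum(axis=0)

I = H(p_x) + H(p_y) - H(p_xy)                        # property 1
print(np.isclose(I, kl(p_xy, np.outer(p_x, p_y))))   # True: matches property 3's form

p = np.array([0.7, 0.2, 0.1])                        # property 4 with m = 3
u = np.full(3, 1 / 3)
print(np.isclose(kl(p, u), np.log2(3) - H(p)))       # True
```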

 

Chain rules

Entropy: $H(X_1, X_2, ..., X_n) = \sum_{i=1}^{n} H(X_i|X_{i-1}, ..., X_1)$

Mutual information: $I(X_1, X_2, ..., X_n;Y) = \sum_{i=1}^{n} I(X_i;Y|X_1, ..., X_{i-1})$

Relative entropy: $D(p(x,y)||q(x,y)) = D(p(x)||q(x)) + D(p(y|x)||q(y|x))$
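
A numeric check of the relative-entropy chain rule on two arbitrary 2x2 joint pmfs:

```python
import numpy as np

def kl(p, q):
    p, q = np.asarray(p, dtype=float).ravel(), np.asarray(q, dtype=float).ravel()
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

p = np.array([[0.30, 0.10], [0.05, 0.55]])   # arbitrary p(x, y)
q = np.array([[0.25, 0.25], [0.25, 0.25]])   # arbitrary q(x, y)

p_x, q_x = p.sum(axis=1), q.sum(axis=1)
# D(p(y|x)||q(y|x)) is the expectation over p(x) of the row-wise divergences
cond = sum(p_x[i] * kl(p[i] / p_x[i], q[i] / q_x[i]) for i in range(2))

print(np.isclose(kl(p, q), kl(p_x, q_x) + cond))   # True
```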

 

Jensen's inequality

If $f$ is a convex function, then $E[f(X)] \geq f(E[X])$.

 

Log sum inequality

For nonnegative numbers $a_1, ..., a_n$ and $b_1, ..., b_n$,

$$\sum a_i \log \frac{a_i}{b_i} \geq \left( \sum a_i \right) \log \frac{\sum a_i}{\sum b_i}$$

w/ eq. iff $\frac{a_i}{b_i}$ is constant
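
A one-off numeric check with arbitrary positive $a_i$, $b_i$:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])          # arbitrary positive numbers
b = np.array([0.5, 4.0, 1.5])

lhs = np.sum(a * np.log2(a / b))
rhs = a.sum() * np.log2(a.sum() / b.sum())
print(lhs, rhs, lhs >= rhs)            # 2.0, 0.0, True
```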

 

Data-processing inequality

If $X \rightarrow Y \rightarrow Z$ forms a Markov chain, then $I(X;Y) \geq I(X;Z)$.
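
A numeric sketch: build a chain $X \rightarrow Y \rightarrow Z$ from an arbitrary input pmf and two arbitrary channel matrices, then compare the two mutual informations:

```python
import numpy as np

def mi(p_joint):
    p_a = p_joint.sum(axis=1, keepdims=True)
    p_b = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_a * p_b)[mask]))

p_x  = np.array([0.4, 0.6])                  # arbitrary input pmf
W_yx = np.array([[0.9, 0.1], [0.2, 0.8]])    # p(y|x), rows sum to 1
W_zy = np.array([[0.7, 0.3], [0.3, 0.7]])    # p(z|y), rows sum to 1

p_xy = p_x[:, None] * W_yx                   # joint p(x, y)
p_xz = p_xy @ W_zy                           # joint p(x, z): Z sees X only through Y

print(mi(p_xy) >= mi(p_xz))                  # True: processing cannot add information
```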

 

Sufficient statistic

$T(X)$ is sufficient relative to the family $\{f_\theta(x)\}$ iff $I(\theta;X) = I(\theta;T(X))$ for all distributions on $\theta$.

 

Fano's inequality

Let $P_e = \Pr\{\hat{X}(Y) \neq X\}$. Then

$$H(P_e) + P_e \log |\mathcal{X}| \geq H(X|Y)$$

 

Inequality

If $X$ and $X'$ are independent and identically distributed, then

$$\Pr(X = X') \geq 2^{-H(X)}$$
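
A quick check: $\Pr(X = X') = \sum_x p(x)^2$, which the inequality bounds below by $2^{-H(X)}$ (the pmf is arbitrary):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])   # arbitrary pmf
H = -np.sum(p * np.log2(p))               # 1.75 bits

print(np.sum(p ** 2), 2.0 ** -H)          # 0.34375 >= ~0.297
print(np.sum(p ** 2) >= 2.0 ** -H)        # True
```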

 

AEP

"Almost all events are almost equally surprising"

In particular, if $X_1, X_2, ...$ are i.i.d. $\sim p(x)$, then

$- \frac{1}{n} \log p(X_1, X_2, ..., X_n) \rightarrow H(X)$ in probability
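
A simulation sketch of this convergence for an arbitrary three-symbol source:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.5, 0.3, 0.2])              # arbitrary source pmf
H = -np.sum(p * np.log2(p))

for n in (10, 100, 10_000):
    x = rng.choice(len(p), size=n, p=p)    # i.i.d. draws from p
    rate = -np.mean(np.log2(p[x]))         # = -(1/n) log2 p(x_1, ..., x_n)
    print(n, rate, H)                      # rate approaches H as n grows
```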

 

Typical set

$A^{(n)}_\epsilon$ is the set of sequences $(x_1, ..., x_n)$ that satisfy

$$2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, ..., x_n) \leq 2^{-n(H(X)-\epsilon)}$$

 

Properties of the typical set

  1. If $(x_1, x_2, ..., x_n) \in A^{(n)}_\epsilon$, then $p(x_1, x_2, ..., x_n) = 2^{-n(H \pm \epsilon)}$
  2. For $n$ sufficiently large, $\Pr\{A^{(n)}_\epsilon\} > 1 - \epsilon$
  3. $|A^{(n)}_\epsilon| \leq 2^{n(H(X)+\epsilon)}$
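
A brute-force sketch of properties 2 and 3 for a binary source: sequences with the same number of 1s share a probability, so we can sum over counts instead of enumerating all $2^n$ sequences (the pmf, $n$, and $\epsilon$ are arbitrary):

```python
from math import comb

import numpy as np

p = np.array([0.8, 0.2])                   # arbitrary pmf on {0, 1}
H = -np.sum(p * np.log2(p))
n, eps = 500, 0.1

prob_typical, size = 0.0, 0
for k in range(n + 1):                     # k = number of 1s in the sequence
    logp = (n - k) * np.log2(p[0]) + k * np.log2(p[1])
    if -n * (H + eps) <= logp <= -n * (H - eps):
        prob_typical += comb(n, k) * 2.0 ** logp
        size += comb(n, k)

print(prob_typical > 1 - eps)              # property 2: True for this n
print(size <= 2.0 ** (n * (H + eps)))      # property 3: True
```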

 

Definition

We write $a_n \doteq b_n$ to mean $\frac{1}{n} \log \frac{a_n}{b_n} \rightarrow 0$ as $n \rightarrow \infty$.

 

Entropy rate

There are two definitions:

$$H(X) = \lim_{n \to \infty}  \frac {1} {n} H(X_1, X_2, ..., X_n)$$

$$H'(X) = \lim_{n \to \infty} H(X_n |X_{n-1}, ..., X_1) $$

 

For a stationary process, $H(X) = H'(X)$.

 

Entropy rate of stationary Markov chain

$$H(X) = - \sum_{i,j} \mu_i P_{ij} \log P_{ij}$$

where $\mu$ is the stationary distribution and $P$ the transition matrix of the chain.
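
A numeric sketch with an arbitrary two-state transition matrix: find $\mu$ as the left eigenvector of $P$ for eigenvalue 1, then apply the formula:

```python
import numpy as np

P = np.array([[0.9, 0.1],              # arbitrary row-stochastic matrix
              [0.4, 0.6]])

# stationary distribution: left eigenvector of P with eigenvalue 1
w, v = np.linalg.eig(P.T)
mu = np.real(v[:, np.argmax(np.real(w))])
mu = mu / mu.sum()                     # normalize (also fixes the sign)

rate = -np.sum(mu[:, None] * P * np.log2(P))   # all P_ij > 0 here
print(mu, rate)                        # mu = [0.8, 0.2], rate ~0.569 bits/step
```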

 

Functions of Markov chain

If $X_1, X_2, ..., X_n$ is a stationary Markov chain and $Y_i = \phi(X_i)$, then

$$H(Y_n|Y_{n-1},...,Y_1, X_1) \leq H(Y) \leq H(Y_n|Y_{n-1}, ..., Y_1)$$

 

Moreover,

$$\lim_{n \to \infty} H(Y_n|Y_{n-1}, ..., Y_1, X_1) = H(Y) = \lim_{n \rightarrow \infty} H(Y_n | Y_{n-1}, ..., Y_1)$$