Entropy
$$H(X) = - \sum_x p(x) \log p(x)$$
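A minimal Python sketch of this definition (the pmfs below are arbitrary examples):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a finite pmf."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # by convention, 0 log 0 = 0
    return -np.sum(p * np.log2(p))

print(entropy([0.5, 0.5]))                # 1.0 bit (fair coin)
print(entropy([0.9, 0.1]))                # ~0.469 bits
print(entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits = log2(4)
```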
Properties of H
- $H(X) \geq 0$
- $H_b(X) = (\log_b a) H_a(X)$
- Conditioning reduces entropy: $H(X|Y) \leq H(X)$ (w/ eq. iff $X$ and $Y$ are independent; see the numeric check after this list)
- $H(X_1, X_2, ..., X_n) \leq \sum H(X_i)$ (w/ eq. iff the $X_i$ are independent)
- $H(X) \leq \log |\mathcal{X}|$ (w/ eq. iff $X$ is uniformly distributed over $\mathcal{X}$)
- $H(p)$ is concave in $p$
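A numeric check of the conditioning, maximum-entropy, and subadditivity properties above, on an arbitrary 2x2 joint pmf:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

pxy = np.array([[0.3, 0.2],    # joint pmf p(x, y); an arbitrary example
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

H_X_given_Y = H(pxy) - H(py)   # chain rule: H(X|Y) = H(X,Y) - H(Y)

print(H_X_given_Y <= H(px))        # True: conditioning reduces entropy
print(H(px) <= np.log2(len(px)))   # True: H(X) <= log|X|
print(H(pxy) <= H(px) + H(py))     # True: subadditivity
```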
Relative Entropy
$$D(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)}$$
Mutual Information
$$I(X:Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)p(y)}$$
Alternative expressions
- $H(X) = E_p \log \frac{1}{p(X)}$
- $H(X,Y) = E_p \log \frac{1}{p(X,Y)}$
- $H(X|Y) = E_p \log \frac{1}{p(X|Y)}$
- $I(X:Y) = E_p \log \frac{p(X,Y)}{p(X)p(Y)}$
- $D(p||q) = E_p \log \frac{p(X)}{q(X)}$
Properties of D and I
- $I(X:Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)$
- $D(p||q) \geq 0$ (w/ eq. iff $p(x)=q(x)$)
- $I(X:Y) = D(p(x,y)||p(x)p(y)) \geq 0$ (w/ eq. iff $p(x,y)=p(x)p(y)$)
- If $|\mathcal{X}| = m$ and $u$ is the uniform distribution over $\mathcal{X}$, then $D(p||u) = \log m - H(p)$ (checked numerically below)
- $D(p||q)$ is convex in the pair $(p, q)$
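A numeric check of $I(X:Y) = D(p(x,y)||p(x)p(y))$ and of $D(p||u) = \log m - H(p)$, again on arbitrary example pmfs:

```python
import numpy as np

def H(p):
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def D(p, q):
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

pxy = np.array([[0.3, 0.2],
                [0.1, 0.4]])
px, py = pxy.sum(axis=1), pxy.sum(axis=0)

I = H(px) + H(py) - H(pxy)                       # I(X:Y) via entropies
print(np.isclose(I, D(pxy, np.outer(px, py))))   # True

p, u = np.array([0.7, 0.2, 0.1]), np.full(3, 1/3)  # m = 3 symbols
print(np.isclose(D(p, u), np.log2(3) - H(p)))      # True
```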
Chain rules
Entropy: $H(X_1, X_2, ..., X_n) = \sum_i H(X_i|X_{i-1}, ..., X_1)$
Mutual information: $I(X_1, X_2, ..., X_n : Y) = \sum_i I(X_i : Y|X_1, ..., X_{i-1})$
Relative entropy: $D(p(x,y)||q(x,y)) = D(p(x)||q(x)) + D(p(y|x)||q(y|x))$
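The relative-entropy chain rule can be checked numerically; the two joint pmfs below are arbitrary examples:

```python
import numpy as np

def D(p, q):
    p, q = np.asarray(p, float).ravel(), np.asarray(q, float).ravel()
    m = p > 0
    return np.sum(p[m] * np.log2(p[m] / q[m]))

p = np.array([[0.3, 0.2], [0.1, 0.4]])       # p(x, y)
q = np.array([[0.25, 0.25], [0.25, 0.25]])   # q(x, y)
px, qx = p.sum(axis=1), q.sum(axis=1)        # marginals p(x), q(x)

# D(p(y|x) || q(y|x)) = sum_x p(x) D(p(.|x) || q(.|x))
p_cond, q_cond = p / px[:, None], q / qx[:, None]
cond = sum(px[i] * D(p_cond[i], q_cond[i]) for i in range(len(px)))

print(np.isclose(D(p, q), D(px, qx) + cond))  # True
```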
Jensen's inequality
If $f$ is a convex function, then $E[f(X)] \geq f(E[X])$.
Log sum inequality
$$\sum_i a_i \log \frac{a_i}{b_i} \geq \left( \sum_i a_i \right) \log \frac{\sum_i a_i}{\sum_i b_i}$$
w/ eq. iff $\frac{a_i}{b_i}$ is constant
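A quick numeric check on arbitrary positive sequences:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])   # arbitrary positive numbers
b = np.array([2.0, 1.0, 4.0])

lhs = np.sum(a * np.log2(a / b))
rhs = a.sum() * np.log2(a.sum() / b.sum())
print(lhs >= rhs)  # True; equality would need a_i / b_i constant
```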
Data-processing inequality
If $X \rightarrow Y \rightarrow Z$ forms a Markov chain, then $I(X:Y) \geq I(X:Z)$.
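A sketch of the inequality: build $X \rightarrow Y \rightarrow Z$ from arbitrary transition matrices and compare the two mutual informations:

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def I(pxy):  # mutual information from a joint pmf
    px, py = pxy.sum(axis=1), pxy.sum(axis=0)
    return H(px) + H(py) - H(pxy)

px = np.array([0.4, 0.6])                  # p(x)
P_yx = np.array([[0.9, 0.1], [0.2, 0.8]])  # p(y|x), rows sum to 1
P_zy = np.array([[0.7, 0.3], [0.3, 0.7]])  # p(z|y), a noisy channel

pxy = px[:, None] * P_yx    # p(x, y)
pxz = pxy @ P_zy            # p(x, z) = sum_y p(x, y) p(z|y)
print(I(pxy) >= I(pxz))     # True: processing Y cannot increase information
```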
Sufficient statistic
$T(X)$ is sufficient for the family $\{f_\theta(x)\}$ iff $I(\theta:X) = I(\theta:T(X))$ (for every prior distribution on $\theta$).
Fano's inequality
Let $P_e = Pr\{\hat{X}(Y) \neq X\}$. Then
$$H(P_e) + P_e \log |\mathcal{X}| \geq H(X|Y)$$
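A numeric check with the MAP estimator $\hat{X}(y) = \arg\max_x p(x|y)$ on an arbitrary joint pmf (the estimator is our choice; the bound holds for any estimator):

```python
import numpy as np

def H(p):
    p = np.asarray(p, float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

pxy = np.array([[0.35, 0.15],
                [0.10, 0.40]])   # p(x, y); an arbitrary example
py = pxy.sum(axis=0)

xhat = np.argmax(pxy, axis=0)    # MAP estimate of x for each y
Pe = 1.0 - sum(pxy[xhat[y], y] for y in range(pxy.shape[1]))

H_Pe = H([Pe, 1 - Pe])           # binary entropy H(P_e)
H_X_given_Y = H(pxy) - H(py)
print(H_Pe + Pe * np.log2(2) >= H_X_given_Y)  # True (here |X| = 2)
```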
Inequality
If $X$ and $X'$ are independent and identically distributed, then
$$ Pr(X=X') \geq 2^{-H(X)}$$
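Since $Pr(X = X') = \sum_x p(x)^2$ for i.i.d. $X$, $X'$, this is a one-line check (arbitrary pmf):

```python
import numpy as np

p = np.array([0.5, 0.25, 0.125, 0.125])  # arbitrary pmf
H = -np.sum(p * np.log2(p))
print(np.sum(p**2) >= 2**(-H))  # True
```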
AEP
"Almost all events are almost equally surprising"
In particular, if $X_1, X_2, ...$ are i.i.d. with pmf $p(x)$, then
$-\frac{1}{n} \log p(X_1, X_2, ..., X_n) \rightarrow H(X)$ in probability
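A Monte Carlo sketch: sample i.i.d. draws from an arbitrary pmf and watch the per-symbol log-probability approach $H(X)$:

```python
import numpy as np

rng = np.random.default_rng(0)
p = np.array([0.6, 0.3, 0.1])        # arbitrary pmf
H = -np.sum(p * np.log2(p))

for n in (10, 100, 10000):
    x = rng.choice(len(p), size=n, p=p)
    rate = -np.mean(np.log2(p[x]))   # -(1/n) log p(X_1, ..., X_n)
    print(n, rate, "vs H =", H)
```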
Typical set
$A^{(n)}_\epsilon$ is the set of sequences $(x_1, ..., x_n)$ satisfying
$$2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, ..., x_n) \leq 2^{-n(H(X)-\epsilon)}$$
Properties of the typical set
- If $(x_1, x_2, ..., x_n) \in A^{(n)}_\epsilon$, then $p(x_1, x_2, ..., x_n) = 2^{-n(H \pm \epsilon)}$
- $Pr\{A^{(n)}_\epsilon\} > 1-\epsilon$ for $n$ sufficiently large
- $|A^{(n)}_\epsilon| \leq 2^{n(H(X)+\epsilon)}$ (see the brute-force check below)
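The brute-force check: for an i.i.d. Bernoulli source, enumerate all $2^n$ sequences and apply the defining bounds ($n$, the bias, and $\epsilon$ are arbitrary choices):

```python
import itertools
import numpy as np

n, q, eps = 12, 0.3, 0.1
H = -(q * np.log2(q) + (1 - q) * np.log2(1 - q))

prob_typical, size = 0.0, 0
for seq in itertools.product([0, 1], repeat=n):
    k = sum(seq)                          # number of ones
    p_seq = q**k * (1 - q)**(n - k)
    if 2**(-n * (H + eps)) <= p_seq <= 2**(-n * (H - eps)):
        size += 1
        prob_typical += p_seq

print(size <= 2**(n * (H + eps)))  # True: |A| <= 2^{n(H+eps)}
print(prob_typical)                # ~0.47 at n = 12; tends to 1 as n grows
```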
Definition
We write $a_n \doteq b_n$ ("equal to first order in the exponent") if $\frac{1}{n} \log \frac{a_n}{b_n} \rightarrow 0$ as $n \rightarrow \infty$.
Entropy rate
There are two definitions:
$$H(X) = \lim_{n \to \infty} \frac {1} {n} H(X_1, X_2, ..., X_n)$$
$$H'(X) = \lim_{n \to \infty} H(X_n |X_{n-1}, ..., X_1) $$
For a stationary process, $H(X) = H'(X)$.
Entropy rate of stationary Markov chain
$$H(X) = - \sum_{i,j} \mu_i P_{ij} \log P_{ij}$$
where $\mu$ is the stationary distribution and $P_{ij}$ the transition probabilities.
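A sketch for an arbitrary two-state chain: solve $\mu P = \mu$ for the stationary distribution, then apply the formula:

```python
import numpy as np

P = np.array([[0.9, 0.1],
              [0.4, 0.6]])   # transition matrix P_ij; arbitrary example

# stationary distribution: left eigenvector of P with eigenvalue 1
w, v = np.linalg.eig(P.T)
mu = np.real(v[:, np.argmax(np.real(w))])
mu /= mu.sum()

rate = -np.sum(mu[:, None] * P * np.log2(P))
print(mu, rate)   # mu = [0.8, 0.2], rate ~ 0.57 bits per step
```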
Functions of Markov chain
If $X_1, X_2, ..., X_n$ is a stationary Markov chain and $Y_i = \phi(X_i)$, then
$$H(Y_n|Y_{n-1},...,Y_1, X_1) \leq H(Y) \leq H(Y_n|Y_{n-1}, ..., Y_1)$$
and
$$\lim_{n \to \infty} H(Y_n|Y_{n-1}, ..., Y_1, X_1) = H(Y) = \lim_{n \rightarrow \infty} H(Y_n | Y_{n-1}, ..., Y_1)$$