Entropy
H(X) = -\sum_x p(x) \log p(x)
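As a quick numerical companion to the definition, here is a minimal Python sketch (the pmf values are arbitrary examples):

```python
import math

def entropy(p):
    """Shannon entropy H(X) = -sum_x p(x) log2 p(x), in bits."""
    return -sum(px * math.log2(px) for px in p if px > 0)

print(entropy([0.5, 0.25, 0.25]))  # 1.5 bits
print(entropy([1.0]))              # 0.0: a deterministic X carries no surprise
```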
Properties of H
- H(X) \geq 0
- H_b(X) = (\log_b a) H_a(X)
- Conditioning reduces entropy: H(X|Y) \leq H(X) (w/ eq. iff X and Y are independent)
- H(X_1, X_2, ..., X_n) \leq \sum_i H(X_i) (w/ eq. iff the X_i are independent)
- H(X) \leq \log|\mathcal{X}| (w/ eq. iff X is uniformly distributed over \mathcal{X})
- H(p) is concave in p
Relative Entropy
D(p||q) = \sum_x p(x) \log \frac{p(x)}{q(x)}
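A direct translation of the formula into Python (sketch; the two pmfs are arbitrary examples). Note that D is not symmetric in its arguments:

```python
import math

def kl(p, q):
    """Relative entropy D(p||q) in bits; assumes q(x) > 0 wherever p(x) > 0."""
    return sum(px * math.log2(px / qx) for px, qx in zip(p, q) if px > 0)

print(kl([0.5, 0.5], [0.9, 0.1]))  # ~0.737 bits
print(kl([0.9, 0.1], [0.5, 0.5]))  # ~0.531 bits: D(p||q) != D(q||p)
```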
Mutual Information
I(X:Y) = \sum_x \sum_y p(x,y) \log \frac{p(x,y)}{p(x)p(y)}
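The same definition computed from a joint pmf table (minimal sketch; the joint distributions below are arbitrary illustrations):

```python
import math

def mutual_information(joint):
    """I(X:Y) in bits from a joint pmf given as a 2-D list joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(pxy * math.log2(pxy / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, pxy in enumerate(row) if pxy > 0)

print(mutual_information([[0.25, 0.25], [0.25, 0.25]]))  # 0.0: independent
print(mutual_information([[0.5, 0.0], [0.0, 0.5]]))      # 1.0: Y determines X
```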
Alternative expressions
- H(X) = E_p \log \frac{1}{p(X)}
- H(X,Y) = E_p \log \frac{1}{p(X,Y)}
- H(X|Y) = E_p \log \frac{1}{p(X|Y)}
- I(X:Y) = E_p \log \frac{p(X,Y)}{p(X)p(Y)}
- D(p||q) = E_p \log \frac{p(X)}{q(X)}
Properties of D and I
- I(X:Y) = H(X) - H(X|Y) = H(Y) - H(Y|X) = H(X) + H(Y) - H(X,Y)
- D(p||q) \geq 0 (w/ eq. iff p(x) = q(x) for all x)
- I(X:Y) = D(p(x,y)||p(x)p(y)) \geq 0 (w/ eq. iff p(x,y) = p(x)p(y), i.e. X and Y are independent)
- If |\mathcal{X}| = m and u is the uniform distribution over \mathcal{X}, then D(p||u) = \log m - H(p) (checked numerically in the sketch after this list)
- D(p||q) is convex in the pair (p, q)
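The uniform-reference identity above is easy to confirm (sketch; p is an arbitrary pmf on m = 3 symbols):

```python
import math

p = [0.7, 0.2, 0.1]
m = len(p)
u = [1 / m] * m  # uniform distribution on the same alphabet

H = -sum(px * math.log2(px) for px in p)
D = sum(px * math.log2(px / ux) for px, ux in zip(p, u))

print(abs(D - (math.log2(m) - H)) < 1e-12)  # True: D(p||u) = log m - H(p)
```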
Chain rules
Entropy: H(X_1, X_2, ..., X_n) = \sum_i H(X_i | X_{i-1}, ..., X_1)
Mutual Information: I(X_1, X_2, ..., X_n : Y) = \sum_i I(X_i : Y | X_1, ..., X_{i-1})
Relative entropy: D(p(x,y)||q(x,y)) = D(p(x)||q(x)) + D(p(y|x)||q(y|x))
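A numerical check of the entropy chain rule above for n = 2, i.e. H(X,Y) = H(X) + H(Y|X) (sketch; the joint pmf is an arbitrary example):

```python
import math

# Joint pmf p(x, y) over {0,1} x {0,1} (arbitrary example values)
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

H_xy = -sum(p * math.log2(p) for p in joint.values())

px = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in (0, 1)}
H_x = -sum(p * math.log2(p) for p in px.values())

# H(Y|X) = -sum_{x,y} p(x,y) log2 p(y|x), with p(y|x) = p(x,y)/p(x)
H_y_given_x = -sum(p * math.log2(p / px[x]) for (x, _), p in joint.items())

print(abs(H_xy - (H_x + H_y_given_x)) < 1e-12)  # True: chain rule holds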
Jensen's inequality
If f is a convex function, then E[f(X)] \geq f(E[X]).
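A quick Monte Carlo illustration with the convex function f(x) = x^2 and X uniform on [0, 1] (sketch; the choice of f and of the distribution is arbitrary):

```python
import random

random.seed(0)
xs = [random.random() for _ in range(100_000)]  # X ~ Uniform[0, 1]
f = lambda x: x * x                             # a convex function

lhs = sum(f(x) for x in xs) / len(xs)  # E[f(X)] ~ 1/3
rhs = f(sum(xs) / len(xs))             # f(E[X]) ~ 1/4
print(lhs >= rhs)  # True, as Jensen's inequality requires
```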
Log sum inequality
\sum_i a_i \log \frac{a_i}{b_i} \geq \left( \sum_i a_i \right) \log \frac{\sum_i a_i}{\sum_i b_i}
w/ eq. iff \frac{a_i}{b_i} = \text{const}
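Checking the inequality on arbitrary positive numbers (minimal sketch):

```python
import math

a = [1.0, 2.0, 3.0]  # arbitrary positive numbers
b = [2.0, 1.0, 4.0]

lhs = sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b))
rhs = sum(a) * math.log2(sum(a) / sum(b))
print(lhs >= rhs)  # True; equality would need a_i / b_i constant
```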
Data-processing inequality
If X \to Y \to Z forms a Markov chain, then I(X:Y) \geq I(X:Z).
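A sketch that builds the chain X -> Y -> Z from two cascaded binary symmetric channels and compares the two mutual informations (the flip probability 0.1 is an arbitrary choice):

```python
import math

def mi(joint):
    """I in bits from a 2-D joint pmf table."""
    px = [sum(r) for r in joint]
    py = [sum(c) for c in zip(*joint)]
    return sum(p * math.log2(p / (px[i] * py[j]))
               for i, r in enumerate(joint) for j, p in enumerate(r) if p > 0)

def bsc(joint_in, flip):
    """Pass the second coordinate through a binary symmetric channel."""
    out = [[0.0, 0.0] for _ in joint_in]
    for i, row in enumerate(joint_in):
        for j, p in enumerate(row):
            out[i][j] += p * (1 - flip)
            out[i][1 - j] += p * flip
    return out

pxy = bsc([[0.5, 0.0], [0.0, 0.5]], 0.1)  # X ~ Bernoulli(1/2), Y = BSC(X)
pxz = bsc(pxy, 0.1)                       # Z = BSC(Y), so X -> Y -> Z
print(mi(pxy), mi(pxz))  # ~0.531 >= ~0.320: I(X:Y) >= I(X:Z)
```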
Sufficient statistic
T(X) is a sufficient statistic for the family \{f_\theta(x)\} iff I(\theta : X) = I(\theta : T(X)).
Fano's inequality
Let P_e = \Pr\{\hat{X}(Y) \neq X\}. Then
H(P_e) + P_e \log|\mathcal{X}| \geq H(X|Y)
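A sanity check on a symmetric noisy channel (sketch; X uniform on 3 symbols, Y = X with probability 0.8, and the estimator \hat{X}(Y) = Y are all arbitrary choices):

```python
import math

def h(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

m = 3
# p(x, y): X uniform, Y = X w.p. 0.8, else one of the other two w.p. 0.1 each
joint = {(x, y): (0.8 if x == y else 0.1) / m for x in range(m) for y in range(m)}

# With the estimator Xhat(y) = y, the error probability is
Pe = sum(p for (x, y), p in joint.items() if x != y)  # 0.2

# H(X|Y) = -sum p(x,y) log2 p(x|y); here p(y) = 1/m by symmetry
H_x_given_y = -sum(p * math.log2(p * m) for p in joint.values())

fano_bound = h([Pe, 1 - Pe]) + Pe * math.log2(m)
print(fano_bound >= H_x_given_y)  # True: ~1.039 >= ~0.922
```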
Inequality
If X and X' are independent and identically distributed, then
\Pr(X = X') \geq 2^{-H(X)}
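Since \Pr(X = X') = \sum_x p(x)^2 for independent copies, the bound can be checked directly (sketch; the pmf is an arbitrary example):

```python
import math

p = [0.5, 0.3, 0.2]                   # arbitrary pmf for X (and X')
collision = sum(px ** 2 for px in p)  # Pr(X = X') = sum_x p(x)^2
H = -sum(px * math.log2(px) for px in p)
print(collision, 2 ** -H, collision >= 2 ** -H)  # 0.38 >= ~0.357: bound holds
```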
AEP
"Almost all events are almost equally surprising"
In particular, if X_1, X_2, ... are i.i.d. with pmf p(x), then
-\frac{1}{n} \log p(X_1, X_2, ..., X_n) \to H(X) in probability
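A Monte Carlo illustration of the convergence (sketch; X ~ Bernoulli(0.3) is an arbitrary choice):

```python
import math
import random

random.seed(0)
p = 0.3  # X ~ Bernoulli(0.3)
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))  # ~0.8813 bits

for n in (10, 100, 10_000):
    xs = [1 if random.random() < p else 0 for _ in range(n)]
    sample = -sum(math.log2(p if x else 1 - p) for x in xs) / n
    print(n, round(sample, 4), round(H, 4))  # sample value approaches H(X)
```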
Typical set
A_\epsilon^{(n)} is the set of sequences (x_1, ..., x_n) satisfying
2^{-n(H(X)+\epsilon)} \leq p(x_1, x_2, ..., x_n) \leq 2^{-n(H(X)-\epsilon)}
Properties of the typical set
- If (x_1, x_2, ..., x_n) \in A_\epsilon^{(n)}, then p(x_1, x_2, ..., x_n) = 2^{-n(H \pm \epsilon)}
- For n sufficiently large, \Pr\{A_\epsilon^{(n)}\} > 1 - \epsilon
- |A_\epsilon^{(n)}| \leq 2^{n(H(X)+\epsilon)}
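These properties can be checked by brute force for a small i.i.d. Bernoulli source (sketch; p = 0.3, n = 12, and \epsilon = 0.1 are arbitrary, and the total typical probability only approaches 1 for much larger n):

```python
import math
from itertools import product

p, n, eps = 0.3, 12, 0.1
H = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

count, mass = 0, 0.0
for seq in product([0, 1], repeat=n):
    k = sum(seq)
    prob = p ** k * (1 - p) ** (n - k)
    if 2 ** (-n * (H + eps)) <= prob <= 2 ** (-n * (H - eps)):
        count += 1
        mass += prob

print(count, 2 ** (n * (H + eps)))  # 715 <= ~3507: |A| <= 2^{n(H+eps)}
print(mass)                         # ~0.47 here; -> 1 as n grows
```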
Definition
We write a_n \doteq b_n ("equal to first order in the exponent") if \frac{1}{n} \log \frac{a_n}{b_n} \to 0 as n \to \infty.
Entropy rate
There are two definitions:
H(X) = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, ..., X_n)
H'(X) = \lim_{n \to \infty} H(X_n |X_{n-1}, ..., X_1)
For a stationary process, H(X) = H'(X).
Entropy rate of stationary Markov chain
H(X) = - \sum_{i,j} \mu_i P_{ij} \log P_{ij}, where \mu is the stationary distribution.
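A worked two-state example (sketch; the transition matrix is an arbitrary choice):

```python
import math

# Transition matrix P of a two-state stationary Markov chain (arbitrary)
P = [[0.9, 0.1],
     [0.4, 0.6]]

# Stationary distribution solves mu P = mu; for a 2-state chain,
# mu = (P[1][0], P[0][1]) / (P[0][1] + P[1][0])
denom = P[0][1] + P[1][0]
mu = [P[1][0] / denom, P[0][1] / denom]  # [0.8, 0.2]

rate = -sum(mu[i] * P[i][j] * math.log2(P[i][j])
            for i in range(2) for j in range(2))
print(rate)  # ~0.569 bits per step
```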
Functions of Markov chain
If X_1, X_2, ..., X_n is a stationary Markov chain and Y_i = \phi(X_i), then
H(Y_n|Y_{n-1},...,Y_1, X_1) \leq H(Y) \leq H(Y_n|Y_{n-1}, ..., Y_1)
and
\lim_{n \to \infty} H(Y_n|Y_{n-1}, ..., Y_1, X_1) = H(Y) = \lim_{n \rightarrow \infty} H(Y_n | Y_{n-1}, ..., Y_1)