Source Coding and Compression

Contents

Slide 2

Definition 1.1: The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. (Claude Shannon, 1948)

SmOkInG CaN KiLL YoU!

Information Theory

Why did Shannon use the word reproducing and not receiving? What is meant by exactly and approximately?

Slide 3

“THEN, OUR PROBLEM IS THE NOISE!”

Information Theory

How can we achieve perfect communication over an imperfect, noisy communication channel?

Example: an analogue telephone line, over which two modems communicate digital information. modem → line+noise → modem

Definition 1.1: The fundamental problem of communication is that of reproducing at one point either exactly or approximately a message selected at another point. (Claude Shannon, 1948)

Slide 4

The theory provides answers to two fundamental questions (among others):
What is the irreducible complexity below which a signal cannot be compressed? (Shannon Theorem 1)
How can we correct errors or know their locations? (Shannon Theorem 2)
What is the ultimate transmission rate for reliable communication over a noisy channel? (Shannon Theorem 3)
How can we protect messages against hacking? (Shannon Security Theorem)

Shannon's Information Theory

Slide 5

The information theory (IT) framework:
Communication System
Line Coding
Channel Coding (Shannon's 2nd theorem)
Cryptology (Shannon's security theory)
Source Coding (Shannon's 1st theorem)

Information Theory?

Slide 6

The information theory (IT) framework:
Communication System
Line Coding
Channel Coding (Shannon's 2nd theorem)
Cryptology
Source Coding (Shannon's 1st theorem)

Information Theory?

Slide 7

Communication System

Information Theory?

Slide 8

Communication System

Error Free Transmission
Erroneous Transmission
Information Theory?

Slide 9

The information theory (IT) framework:
Communication System
Line Coding
Channel Coding (Shannon's 2nd theorem)
Cryptology
Source Coding (Shannon's 1st theorem)

Information Theory?

Slide 10

Also known as digital baseband modulation (1, 0, 1, 1, 0, …): encoding digital information to make it resistant to certain forms of signal loss during transmission.

Line Coding

Example: Differential Manchester !!! Not returning to zero! Why?
A binary '1' is represented by a “phase inversion”; a binary '0' by a “constant phase”. A sketch of this convention follows.
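To make the slide's convention concrete, here is a minimal Python sketch of a Differential Manchester encoder. The +1/-1 signal levels, the starting phase, and the function name are illustrative assumptions, not part of the slide:

    # Differential Manchester, following the slide's convention:
    # '1' -> phase inversion at the bit boundary, '0' -> constant phase.
    # A mid-bit transition is always present, so the line never stays at one
    # level for a whole bit: that guaranteed transition carries the clock.

    def diff_manchester_encode(bits, level=1):
        """Return two half-bit levels (+1/-1) per input bit."""
        signal = []
        for bit in bits:
            if bit == 1:          # phase inversion at the start of the interval
                level = -level
            signal.append(level)  # first half of the bit period
            level = -level        # mandatory mid-bit (clocking) transition
            signal.append(level)  # second half of the bit period
        return signal

    print(diff_manchester_encode([1, 0, 1, 1, 0]))
    # -> [-1, 1, 1, -1, 1, -1, 1, -1, -1, 1]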

Slide 11

The information theory (IT) framework:
Communication System
Line Coding
Channel Coding (Shannon's 2nd theorem)
Cryptology
Source Coding (Shannon's 1st theorem)

Information Theory?

Slide 12

The Three Channel Properties

Channels can only transport physical signals, e.g., electrical signals. Therefore, digital signals must be converted to appropriate formats (remember line coding or RF).
Even if the signal is adapted to the channel, it does not pass through undisturbed!! The channel introduces errors.
There is always an upper bound on the number of correct bits that you can send over the channel (Shannon Channel Capacity: C).
Slide 13

A.K.A.: Forward error correction (FEC)
For controlling errors in data transmission over unreliable (generally noisy) communication channels.
Tx: encode the message with redundancy.
The first error-correcting code was introduced in 1950 by Richard Hamming: the Hamming (7,4) code.
The simplest code is the repetition code!! Transmit each bit several times, then decode by maximum likelihood (majority vote), as in the example below and the sketch after it.
Example:
0 0 0 → correct 0
0 0 1 → maybe 0
1 1 1 → correct 1
1 0 1 → maybe 1
1 0 0 → maybe 0

Channel Coding
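A minimal sketch of the repetition idea with majority-vote decoding (which is maximum likelihood on a binary symmetric channel); the repetition factor and helper names are illustrative:

    def encode(bits, n=3):
        """Repeat each information bit n times."""
        return [b for bit in bits for b in [bit] * n]

    def decode(received, n=3):
        """Majority vote over each block of n received bits."""
        return [int(sum(received[i:i + n]) > n // 2)
                for i in range(0, len(received), n)]

    codeword = encode([0, 1])   # -> [0, 0, 0, 1, 1, 1]
    noisy = [0, 0, 1, 1, 0, 1]  # one bit flipped in each block
    print(decode(noisy))        # -> [0, 1]: single errors per block are corrected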

Slide 14

Types of Channel Coding

Block codes:
A codeword of length N consists of K information symbols (Ij) and M parity symbols
Code rate R = K/N
Convolutional coding:
Information bits are transformed into n coded bits (not a simple injection; rather a tapped-delay-line model!!)
Can be decoded using the Viterbi algorithm

[Figure: convolutional encoder as a tapped delay line: information bits shift through registers D0, D1, D2, and two modulo-2 adders (+) form the coded bits.]

Iterative decoding:
LDPC (low-density parity-check) codes
Exploit the low count of '1's in the parity-check matrix via a graph (the Tanner graph), then decode iteratively (needs very long codes)
Turbo codes:
Combine two convolutional codes, an inner code and an outer code
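A minimal Python sketch of the tapped-delay-line encoder in the figure above: a rate-1/2 convolutional encoder with three registers and two modulo-2 adders. The specific generator taps are an assumption for illustration (the common (7, 5) octal pair), not taken from the slide:

    def conv_encode(bits, taps=((1, 1, 1), (1, 0, 1))):
        """Rate-1/2 convolutional encoder over a 3-stage delay line."""
        state = [0, 0, 0]               # registers D0, D1, D2
        out = []
        for bit in bits:
            state = [bit] + state[:-1]  # shift the new bit in
            for g in taps:              # one modulo-2 adder per output bit
                out.append(sum(b & t for b, t in zip(state, g)) % 2)
        return out

    print(conv_encode([1, 0, 1, 1]))  # two coded bits per information bit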

Slide 15

The information theory (IT) framework:
Communication System
Line Coding
Channel Coding (Shannon's 2nd theorem)
Cryptology
Source Coding (Shannon's 1st theorem)

Information Theory?

Slide 16

Cryptology or the “Hidden/Secret Study”

Cryptography mechanism

Slide 17

Cryptology or the “Hidden/Secret Study”

Alice encrypts her data using certain security parameters.
Then, the encrypted data is transmitted to Bob, who decrypts it using the synchronized security parameters.
An eavesdropper, Eve, can hack the communication channel and decrypt the encrypted data if she manages to identify Alice's security parameters.

Description of security in a communication scheme

Slide 18

Cryptology in World War II

The German Lorenz cipher SZ42 (SZ for Schlüsselzusatz), one of the first security “machines” in the world … see, no software ;-)
Slide 19

The information theory (IT) framework:
Communication System
Line Coding
Channel Coding (Shannon's 2nd theorem)
Cryptology
Source Coding (Shannon's 1st theorem)

Information Theory?

Slide 20

a.k.a.: The Noiseless Coding Theorem, Bit-rate (Data) Reduction, and Data Compression

The Source Coding Theorem

[Figure: x (original) → Encoder → y (compressed) → Decoder → x' (decompressed)]

Lossless compression: x = x’
(a.k.a. entropy coding, reversible coding)
Applications: text compression
“Do not send money” → “Do now send money” (a single-letter error flips the meaning!)
Lossy compression: x ≠ x’
(a.k.a. irreversible coding)
Applications: image compression
Keywords: distortion vs. perception

Slide 21

The Source Coding Theorem

[Figure: the same image at four compression levels]
Original: 108.5 KB
Low comp. (84% less info): 9.37 KB
Medium comp. (92% less info): 4.82 KB
High comp. (98% less info): 1.14 KB

Slide 22

Compression Measurement Criteria

Compression ratio: |x|/|y|
|x| and |y| are the numbers of bits in x and y, respectively
E.g.:
|x| = 65,536, |y| = 16,384, compression = 4:1
Alternatively, the data has been reduced by (|x|-|y|)/|x| = 75%
Other measures of source coding performance:
Bits per sample
E.g., ASCII: 8 bits/char; RGB: 24/48/72 bits/pixel
Distortion (only for lossy methods)
Depends on the human-perceived and/or mathematical difference between x and x'
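The arithmetic of the example above, as a quick sketch:

    # Compression ratio and percentage reduction for |x| = 65,536 bits
    # and |y| = 16,384 bits.
    x_bits, y_bits = 65_536, 16_384
    ratio = x_bits / y_bits                 # 4.0 -> "4:1"
    reduction = (x_bits - y_bits) / x_bits  # 0.75 -> data reduced by 75%
    print(f"{ratio:.0f}:1, {reduction:.0%} reduction")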
Slide 23

Now we will learn 3 concepts:
Self-information
Entropy
Kraft's inequality

The Source Coding Theorem

Slide 24

Real-World Coding: Morse (1844)

Slide 25

Real-World Coding: Morse (1844)

Generally, high/low frequency symbols get short/long codes!!
- Not 100% consistent!
Example: E vs. T and I vs. M
Slide 26

Assume:
A source with a finite number of symbols S = {s1, s2, ..., sN}
Symbol sn has a probability P(sn) = pn
Let's assume that every symbol sn is encoded with ln bits (bits/symbol)
Before we make any further progress, let's define the term self-information:

The Source Coding Theorem - I

 

Theorem 1.1: (Self-Information)
The self-information of a symbol sn is measured by:
i(sn) = log_b(1/pn) = -log_b(pn)
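A minimal sketch of the definition (the function name is ours):

    import math

    def self_information(p, base=2):
        """i(s) = -log_b p(s); base 2 gives bits."""
        return -math.log(p, base)

    print(self_information(0.5))   # 1.0 bit: a fair coin flip
    print(self_information(0.99))  # ~0.0145 bits: an almost-certain event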

Slide 27

Properties of self-information:
The barking of a dog during a house break-in carries or does not carry much information:
If it barks only in dangerous situations: information.
If it barks all the time: no information.
Independent events have additive (accumulated) self-information, i.e., i(sn, sm) = i(sn) + i(sm)

Self-Information - I
To calculate the information in bits, we take the logarithm with base b = 2.
For b = e (≈ 2.7183), information is measured in “nats”; for b = 10, in “dits” (decimal digits).

Slide 28

Self-Information Examples
Example 1: the outcome of flipping a coin:
If the coin is fair, i.e., P(H) = P(T) = 1/2, then i(H) = i(T) = -log2(1/2) = 1 bit.
If the coin tossing is not fair, because someone is cheating towards “T” ;-), so that P(T) is close to 1 and P(H) is close to 0, then i(T) ≈ 0 bits (no surprise).
If we have a set of independent events {sn} with probabilities pn, then the average self-information associated with the experiment is:

H = Σn pn · i(sn) = -Σn pn · log_b(pn)

This is the entropy (H)
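A sketch computing H for the coin examples above and for the always-'B' source mentioned on the next slide; the biased-coin probabilities are illustrative:

    import math

    def entropy(probs, base=2):
        """H = sum of p * log_b(1/p) over the distribution; bits for base 2."""
        return sum(p * math.log(1.0 / p, base) for p in probs if p > 0)

    print(entropy([0.5, 0.5]))    # 1.0 bit per flip (fair coin)
    print(entropy([0.99, 0.01]))  # ~0.081 bits (biased coin: little surprise)
    print(entropy([1.0]))         # 0.0 (the next symbol is always 'B')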

Slide 29

Entropy as Information Content

Entropy is a probabilistic model such that:
Independent fair coin flips have an entropy of 1 bit per flip. A source that always generates a long string of B's has an entropy of 0, since the next character will always be a 'B'.

Shannon showed that:
Definition 1.2: If the experiment is a source that puts out symbols sn from a set A, then the entropy is a measure of the average number of binary symbols (bits) needed for encoding the source.

Avg. message size in bits

Slide 30

The Source Coding Theorem - II

Theorem 1.2: (Entropy)
The minimum average length of a codeword is the entropy:
l_avg = Σn pn · ln ≥ H(S)

Entropy is the minimum expected average length.
If the average length dropped below H, the code could not be uniquely decoded (see the numeric check below).
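A quick numeric check of the bound. The source probabilities below are an illustrative assumption (chosen dyadic), paired with the prefix code {0, 10, 110, 111} from the next slide; for this pair the average length meets the entropy exactly:

    import math

    probs = [0.5, 0.25, 0.125, 0.125]  # assumed p(s1)..p(s4)
    lengths = [1, 2, 3, 3]             # lengths of {0, 10, 110, 111}

    H = sum(p * math.log2(1 / p) for p in probs)
    l_avg = sum(p * l for p, l in zip(probs, lengths))
    print(H, l_avg)                    # both 1.75 bits: the bound is tight here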

Slide 31

The Source Coding Theorem - III

Code Word and Code Length:
Definition 1.2: A (binary) source code C for a random variable S is a mapping from S to a (finite) binary string (string of bits). Let C(sn) be the codeword corresponding to sn and let l(sn) denote the length of C(sn).
To proceed, let us focus on codes that can be decoded “instantaneously”, e.g., prefix codes.
Definition 1.3: A code is called a prefix code or an instantaneous code if no codeword is a prefix of any other codeword.
Example (checked programmatically below):
Non-prefix: {s1=0, s2=01, s3=011, s4=0111}
Prefix: {s1=0, s2=10, s3=110, s4=111}
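A minimal sketch that tests the two example codes (the helper name is ours):

    def is_prefix_code(codewords):
        """True if no codeword is a prefix of any other codeword."""
        return not any(a != b and b.startswith(a)
                       for a in codewords for b in codewords)

    print(is_prefix_code(["0", "01", "011", "0111"]))  # False: "0" prefixes the rest
    print(is_prefix_code(["0", "10", "110", "111"]))   # True: instantaneous code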
Slide 32

The Source Coding Theorem – IV

Kraft's inequality: the codeword lengths l1, ..., lN of any binary prefix (instantaneous) code satisfy
Σn 2^(-ln) ≤ 1
Can you say why? (A numeric check follows.)
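A quick check of the inequality: the prefix code of the previous slide satisfies it with equality, while the second length set (an illustrative assumption) violates it, so no prefix code can have those lengths:

    def kraft_sum(lengths):
        """Sum of 2**(-l) over the codeword lengths."""
        return sum(2.0 ** -l for l in lengths)

    print(kraft_sum([1, 2, 3, 3]))  # 1.0 <= 1: lengths of {0, 10, 110, 111}
    print(kraft_sum([1, 1, 2, 2]))  # 1.5 >  1: impossible for a prefix code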

Slide 33

The Source Coding Theorem - V

Conversely, given a set of codeword lengths satisfying the above inequality, one can construct an instantaneous code with these word lengths!! (Note the can: the inequality guarantees that such a code exists, not that every code with these lengths is instantaneous.)
Slide 34

The Source Coding Theorem – VI

Do It Yourself!!

Slide 35

Notation for Sequences & Codes

Assume a sequence of symbols X = {X1, X2, ..., Xn} from a finite source alphabet AX.
We almost always use A = {0, 1} (e.g., computer files, digital communication), but the theory can be generalized to any finite set.
Encoder: outputs a new sequence Y = {Y1, Y2, ..., Yn} (using a possibly different code alphabet AY).
A symbol code, C, is a mapping of AX → AY; we use c(x) to denote the codeword to which C maps x.

[Figure: x (original) → Encoder → y (compressed) → Decoder → x' (decompressed)]

Slide 36

Lossless Data Compression

Let's focus on the lossless data compression problem for now, and not worry about noisy channel coding. In practice these two problems (noise and compression) are handled separately:
An efficient code for the source data removes source redundancy.
Then (if necessary) we design a channel code to help us transmit the source code over the channel (adding redundancy back) almost error free.
Assumptions (for now):
The channel is perfectly noiseless, i.e., the receiver sees exactly the encoder's output.
We always require the output of the decoder to exactly match the original sequence X.
X is generated according to a fixed probabilistic model p(X), which is known!!
We will measure the quality of our compression scheme by examining the average length of the encoded string Y, averaged over p(X).

[Figure: x (original) → Encoder → y (compressed) → Decoder → x' (decompressed)]