# communication theory

asher0411

ELEG 4003 Communications Theory

Fall 2014

Course Project #1

(Max. Score: 20)

Date Assigned: 09/16/2014

Assignment Due Date: October 02, 2014

Instruction:

Adherence to Prairie View A&M University's honor code is fully expected. Discussion with

other team members or getting external help is strictly prohibited (i.e., if you have any

doubts, consult the course instructor on the interpretation of problems). Each team of three

students needs to submit only one report. Clearly indicate the contributions of each group

members in the report. You should also write and sign the honor pledge on the project

report: “We have neither given nor received unauthorized assistance on this assignment.”

Late submissions (after 11:00 am on 10/02/2014) will not be accepted/graded.

Question 1 (10 points)

Consider a three-symbols alphabet with the specified probability of assignment shown below:

Listed with the input alphabet are six binary code assignments.

(a) Scan these codes and determine which codes are practical (can be used for data compression

application). Justify your answers.

(b) Design a Huffman code for the above three-symbols source alphabet shown above and find its

code efficiency (i.e., compression efficiency).

(c) Design a Shannon-Fano code for the above three-symbol source alphabet shown above and

find its code efficiency. Compare it with your answer in part (b).

(d) Can you suggest a technique to improve the code efficiency to achieve a greater compression

ratio? Determine the code efficiency for your improved source coding method.

Xi P(Xi)

a 0.70

b 0.25

c 0.05

Symbol Code 1 Code 2 Code 3 Code 4 Code 5 Code 6

a 00 00 0 1 1 1

b 11 01 1 10 01 00

c 11 10 11 100 11 01

Question 2 (10 points): Huffman Coding

In this problem, you will study the efficacy of Huffman source coding (data compression algorithm)

on two different data sources. Generate two test data files data1.txt and data2.txt. Create the first test

data file data1.txt such that it contains at least 20 characters (including spaces). Next create a second

test data file data2.txt consisting of about 20 binary digits (i.e., digits 0 and 1).

(a) Suppose you wish to compress the data1.txt and data2.txt using Huffman source coding method.

Find the compression efficiency and the average codeword length. Clearly show your final code

design (i.e., codeword for each source symbol).

(b) Repeat part (a) but now consider at least 5000 characters and 5000 digits for data1.txt and

data2.txt, respectively.

(c) Validate that your design and your program works fine (i.e., correct data encoding and decoding)

for both of your test data files in parts (a) and (b). Compare your results with the compression effi-

ciency attainable with “compress” routine in UNIX or with other data compression utilities (e.g.,

zip). Explain any interesting observations and/or trends from your results.

(d) Explain why Huffman coding is NOT used for data compression of practical data sources? In

other words, why practical source compression utilities (e.g., zip, gnuzip, etc.) do not use Huffman

coding method despite its optimality?

(e) Explain why it might be more advantageous to employ the Lempel-Ziv algorithm over Huffman

coding method for digital multimedia (image/audio/video) data compression applications?

Your report should not be longer than 12 pages (single-line spacing). Your write-up should include:

(a) any source code (e.g., MATLAB commands or m-files) that you have written/used;

(b) a brief description of the data compression algorithms that you have utilized or implemented

(highlighting their unique features and possible improvements);

(c) discussions on your results including a comparison with other standard source coding (data com-

pression) utilities used in UNIX or MS-DOS operating systems;

(d) sample listing of the encoded sequence and decoded sequences.