YOUR NAME:
SECTION NUMBER:

CSE 331, Spring 2000
Class Exercise 15
Algorithm Design Techniques - Greedy Algorithms
March 13, 2000

Following are five of the common types of algorithms. For many problems it is quite likely that at least one of these methods will work.

1. Greedy Algorithms
2. Divide and Conquer
3. Dynamic Programming
4. Randomized Algorithms
5. Backtracking Algorithms

Greedy Algorithms:

A greedy algorithm obtains a solution to a problem by making a sequence of choices. At each decision point in the algorithm, the choice that seems best at the moment is chosen. That is, it makes a locally optimal choice in the hope that this choice will lead to the globally optimal solution. This heuristic strategy does not always yield optimal solutions, but for many problems it does. Even where a greedy algorithm does not yield an optimal solution, it may still be useful because it may run much faster than an algorithm that gives optimal solutions.

An example of a greedy algorithm that leads to an optimal solution is A Simple Scheduling Problem, given on page 342 of the textbook. Here, given a set of jobs and their running times, find a schedule that gives the minimum average wait time. The greedy algorithm for this problem is to select, as the next job to run, the job with the shortest running time. When this job is completed, select the job with the shortest running time from the remaining jobs, and so on. Note that we are choosing the best job as we go (i.e., selecting the best job at the moment we schedule the next job) without worrying about whether it will lead to the globally optimal solution or not.

Another example of a greedy algorithm is Huffman coding for file compression. Normally, fixed-length binary codes (8 bits) are used to represent characters such as alphabetic and numeric characters. It is more efficient, however, to build codes for a particular file, for example for compressing the data in the file, based on the frequencies of use of the different characters in the file. A Huffman code is such a code: it is a variable-length code in which more frequently used characters have shorter codes than less frequently used characters in the file.

If character codes have a fixed length, then the end of each character in the string can be determined easily by counting 8 bits (for an 8-bit code) per character.

1. Give the fixed minimum-length codes (code words) for the alphabet {a, b, c, d}. Give the encoding of the string aabcd.

2. For a large file where certain characters appear more often than others, it is more efficient to use variable-length codes. One problem in using variable-length codes is to find a method for determining the boundaries of the code words in a string. If A is 11, B is 00, C is 010, D is 10 and R is 011, what is the character string for the following (note that going from left to right in the encoded string, substrings will uniquely identify the codes): 11000111101011

3. Assume that the codes for A, B, and C are as follows: A is 0, B is 1 and C is 10. Can you identify the boundaries of the code words uniquely in the strings 001 and 0010?

4. What relationships have to be satisfied between the codes so that the codes can be uniquely identified in the string?
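The left-to-right decoding described in question 2, and the kind of check question 4 asks about, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names `decode_prefix_code` and `is_prefix_free` are made up for this handout and do not come from the textbook, and the decoder assumes the code table it is given really is uniquely decodable from left to right.

```python
def decode_prefix_code(bits, code_table):
    """Decode a string of '0'/'1' characters from left to right.

    code_table maps each character to its code word, e.g. {"A": "11"}.
    Assumes no code word is a prefix of another code word.
    """
    # Invert the table so we can look up a character by its code word.
    by_code = {code: ch for ch, code in code_table.items()}
    decoded = []
    current = ""
    for bit in bits:
        current += bit
        # Under the assumption above, the first code word that matches
        # is the only one that can match, so we emit it and start over.
        if current in by_code:
            decoded.append(by_code[current])
            current = ""
    if current:
        raise ValueError(f"leftover bits do not form a code word: {current!r}")
    return "".join(decoded)


def is_prefix_free(code_table):
    """Return True if no code word is a prefix of (or equal to) another."""
    codes = list(code_table.values())
    return not any(
        i != j and codes[j].startswith(codes[i])
        for i in range(len(codes))
        for j in range(len(codes))
    )
```

Calling `decode_prefix_code` with the bit string and code table from question 2 is one way to check a hand-decoded answer, and `is_prefix_free` can be tried on the code tables from questions 2 and 3 to compare them.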
Prefix codes are codes where no code word is a prefix of some other code word.

5. If we have a binary tree in which the leaf nodes represent the characters and the branches are labeled 0s and 1s, then the paths to the leaf nodes represent variable-length codes. No such code is a prefix of the code of another character. Why? Give the variable-length codes for each of the characters for the two binary trees below.

[Figure: two binary trees with branches labeled 0 and 1 and leaves B, D, A, C, R]

6. Do the codes satisfy the prefix property? Are they optimal (i.e., can you reduce the sizes of some of the codes)?

7. Thus, the tree has to be a full tree (i.e., all nodes in the tree except the leaves have to have two children) for the codes to be optimal. Why is that necessary?

8. [Figure: binary tree with leaves A, R, B, D, C]

Is the above tree a full tree? Which leaf nodes in the tree should represent more frequent characters, and why?

Huffman Coding

9. Huffman codes are prefix codes created using full trees. Here the frequencies of the characters in the file are used to generate the codes. Characters with high frequencies of occurrence in the file have shorter codes, while those with low frequencies of occurrence have longer codes. Huffman codes guarantee optimal file compression among prefix codes (i.e., over all possible choices of code word lengths, Huffman codes give the largest file compression). The Huffman algorithm that generates the Huffman codes is based on the greedy approach. Initially, each character represents a one-node subtree, with the frequencies of the characters labeling the nodes. In each step, the two subtrees with the smallest frequencies are merged to create a larger tree. This is discussed on pages 348-350 of the textbook. An example of how the algorithm works is attached.

10. The Huffman algorithm is a good example of the use of a binary tree and a priority queue. Draw the heap (an implementation of a priority queue) for the following frequencies: A:50, B:40, C:5, D:3, E:1

11. Construct the Huffman tree for the above frequencies.

12. How is the heap used in constructing the Huffman tree?

13. The algorithm uses BuildHeap, Extract Min, and Insert. Indicate the number of times these operations are used to create the Huffman tree if there are C characters in the alphabet.

Optimality of Huffman code

14. Why does the tree have to be a full tree?
15. Is the optimality changed if we swap two characters at the same depth of the tree?

16. How can we improve the cost if there is a deeper node with a higher frequency and a higher-level node with a smaller frequency?

17. Can we say that the two least frequent symbols are siblings at the deepest level?
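The merging procedure described in question 9 can be sketched with Python's standard `heapq` module standing in for the priority queue. This is a minimal sketch under stated assumptions, not the textbook's implementation: the function name `huffman_codes` is made up for this handout, characters are assumed to be strings, ties between equal frequencies are broken by insertion order, and putting the first extracted subtree on the 0 branch is an arbitrary choice.

```python
import heapq
from itertools import count


def huffman_codes(frequencies):
    """Build Huffman code words from a dict mapping character -> frequency."""
    if not frequencies:
        return {}
    tie = count()  # tie-breaker so the heap never has to compare subtrees
    # A leaf is a character (a string); an internal node is a (left, right) tuple.
    heap = [(freq, next(tie), ch) for ch, freq in frequencies.items()]
    heapq.heapify(heap)  # BuildHeap
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # Extract Min
        f2, _, right = heapq.heappop(heap)  # Extract Min
        # Greedy step: merge the two least frequent subtrees.
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))  # Insert
    _, _, root = heap[0]

    codes = {}

    def assign(node, prefix):
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")  # left branch labeled 0
            assign(node[1], prefix + "1")  # right branch labeled 1
        else:
            # A one-character alphabet still needs a one-bit code word.
            codes[node] = prefix or "0"

    assign(root, "")
    return codes
```

Running `huffman_codes({"A": 50, "B": 40, "C": 5, "D": 3, "E": 1})` is one way to check a hand-built tree for question 11. Because the 0/1 labels depend on the arbitrary left/right choice above, compare code lengths rather than exact bits.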