1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
Texto de pré-visualização
Estruturas de Dados Programação 2 Codificação de Huffman Márcio Ribeiro marcioicufalbr twittercommarciomribeiro ASCII American Standard Code for Informa3on Interchange 8 bits per character Character Decimal Hexadecimal Binary a 97 61 01100001 b 98 62 01100010 c 99 63 01100011 d 100 64 01100100 z 122 7A 01111010 FixedLength Character Encodings Problem fixedlength encodings waste space Solu3on encodings of different lengths for different characters and assign shorter encodings to frequently occurring characters Character Binary e 01 o 100 s 111 t 00 test 00 01 111 00 000111100 test tested test VariableLength Character Encodings Character Binary e 01 o 100 s 111 t 00 000111100 test 01110100011001010111001101110100 71 smaller No characters encoding can be the prefix of another characters encoding Example we cannot have 001111100 Character Binary e 00 t 001 e or t Requirement Huffman Coding AAAAAABBBCCCCDDDEEF F 1 E 2 D 3 C 4 B 5 A 6 1 Read the text to determine the frequencies and create a list F 1 E 2 3 2 Remove and merge the nodes with the two lowest frequencies forming a new node that is their parent Frequency of parent sum of the frequencies of its children 3 D 3 C 4 B 5 A 6 F 1 E 2 C 4 B 5 6 A 6 3 D 3 F 1 E 2 A 6 F 1 E 2 3 D 3 6 C 4 B 5 9 C 4 B 5 9 A 6 F 1 E 2 3 D 3 6 12 C 4 B 5 9 A 6 F 1 E 2 3 D 3 6 12 21 0 0 0 0 1 1 1 1 C B D A Huffman Tree Character Binary A 11 B 01 C 00 D 101 E 1001 F 1000 C 4 B 5 9 A 6 F 1 E 2 3 D 3 6 12 21 0 1 E F Helpful functions Bit shi Operators x n x n Shifts the value of x left by n bits Shifts the value of x right by n bits Note here we consider unsigned Le shi x x 2 b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 b7 b6 b5 b4 b3 b2 b1 b0 0 0 0 1 1 1 0 0 Right shi x x 2 b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 b7 b6 b5 b4 b3 b2 b1 b0 0 0 1 1 0 0 0 1 Is bit i set int isbitisetunsigned char c int i unsigned char mask 1 i return mask c b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 i 4 1 4 4th bit is not set Sedng a bit unsigned char setbitunsigned char c int i unsigned char mask 1 i return mask c b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 0 1 0 1 1 1 i 4 1 4 4th bit is now set AB 2 0 0 0 0 1 1 1 1 C B D A Character Binary A 11 B 01 C 00 D 101 E 1001 F 1000 0 1 E F AAAAAABBBBBCCCCDDDEEF 11111111 11110101 01010100 00000010 11011011 00110011 000 CBFEDA 101 00000 00001011 CBFEDA 11111111 11110101 01010100 00000010 11011011 00110011 000 3 bits for the trash 13 bits for the tree size Tree Bytes Trash File data Header filenamehuff Follow the header Save the tree using and the preorder traversal Use as the scape character Encapsula3on Documented ADTs void in at least one Data Structure or Huffman to any file format Mandatory Rules Eclipse CUnit Eclipse Hex Editor Plugin github SOCIAL CODING References Chapter 16 Chapter 11
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
1
Linguagens de Programação
UFAL
Texto de pré-visualização
Estruturas de Dados Programação 2 Codificação de Huffman Márcio Ribeiro marcioicufalbr twittercommarciomribeiro ASCII American Standard Code for Informa3on Interchange 8 bits per character Character Decimal Hexadecimal Binary a 97 61 01100001 b 98 62 01100010 c 99 63 01100011 d 100 64 01100100 z 122 7A 01111010 FixedLength Character Encodings Problem fixedlength encodings waste space Solu3on encodings of different lengths for different characters and assign shorter encodings to frequently occurring characters Character Binary e 01 o 100 s 111 t 00 test 00 01 111 00 000111100 test tested test VariableLength Character Encodings Character Binary e 01 o 100 s 111 t 00 000111100 test 01110100011001010111001101110100 71 smaller No characters encoding can be the prefix of another characters encoding Example we cannot have 001111100 Character Binary e 00 t 001 e or t Requirement Huffman Coding AAAAAABBBCCCCDDDEEF F 1 E 2 D 3 C 4 B 5 A 6 1 Read the text to determine the frequencies and create a list F 1 E 2 3 2 Remove and merge the nodes with the two lowest frequencies forming a new node that is their parent Frequency of parent sum of the frequencies of its children 3 D 3 C 4 B 5 A 6 F 1 E 2 C 4 B 5 6 A 6 3 D 3 F 1 E 2 A 6 F 1 E 2 3 D 3 6 C 4 B 5 9 C 4 B 5 9 A 6 F 1 E 2 3 D 3 6 12 C 4 B 5 9 A 6 F 1 E 2 3 D 3 6 12 21 0 0 0 0 1 1 1 1 C B D A Huffman Tree Character Binary A 11 B 01 C 00 D 101 E 1001 F 1000 C 4 B 5 9 A 6 F 1 E 2 3 D 3 6 12 21 0 1 E F Helpful functions Bit shi Operators x n x n Shifts the value of x left by n bits Shifts the value of x right by n bits Note here we consider unsigned Le shi x x 2 b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 b7 b6 b5 b4 b3 b2 b1 b0 0 0 0 1 1 1 0 0 Right shi x x 2 b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 b7 b6 b5 b4 b3 b2 b1 b0 0 0 1 1 0 0 0 1 Is bit i set int isbitisetunsigned char c int i unsigned char mask 1 i return mask c b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 i 4 1 4 4th bit is not set Sedng a bit unsigned char setbitunsigned char c int i unsigned char mask 1 i return mask c b7 b6 b5 b4 b3 b2 b1 b0 1 1 0 0 0 1 1 1 0 0 0 1 0 0 0 0 1 1 0 1 0 1 1 1 i 4 1 4 4th bit is now set AB 2 0 0 0 0 1 1 1 1 C B D A Character Binary A 11 B 01 C 00 D 101 E 1001 F 1000 0 1 E F AAAAAABBBBBCCCCDDDEEF 11111111 11110101 01010100 00000010 11011011 00110011 000 CBFEDA 101 00000 00001011 CBFEDA 11111111 11110101 01010100 00000010 11011011 00110011 000 3 bits for the trash 13 bits for the tree size Tree Bytes Trash File data Header filenamehuff Follow the header Save the tree using and the preorder traversal Use as the scape character Encapsula3on Documented ADTs void in at least one Data Structure or Huffman to any file format Mandatory Rules Eclipse CUnit Eclipse Hex Editor Plugin github SOCIAL CODING References Chapter 16 Chapter 11