Huffman Code Lengths Not Exceeding 4: Diagram of the Huffman Coding Process

September 3, 2024, 7:43 PM • 激活谷笔记

Huffman coding

Prerequisite

Greedy algorithms

Goal

Understand how a Huffman code is generated

What

Huffman coding (Chinese: 霍夫曼编码, also rendered 哈夫曼编码 or 赫夫曼编码) is an entropy coding (weight-based coding) algorithm used for lossless data compression. It was invented in 1952 by the American computer scientist David Albert Huffman.

Huffman coding is a greedy algorithm that constructs an optimal code for compressing a given string. The algorithm builds a binary tree based on the frequencies of the characters in the string, and each character’s codeword can be read by following a path from the root to the corresponding node. A move to the left corresponds to bit 0, and a move to the right corresponds to bit 1.

Initially, each character of the string is represented by a node whose weight is the number of times the character occurs in the string. Then at each step two nodes with minimum weights are combined by creating a new node whose weight is the sum of the weights of the original nodes. The process continues until all nodes have been combined.

Next we will see how Huffman coding creates the optimal code for the string AABACDACA. Initially, there are four nodes, one for each distinct character of the string (A, B, C, and D).
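The initial weights can be checked with a few lines of Python. This is only a sketch; the function name `initial_nodes` is chosen here for illustration and does not come from the article.

```python
from collections import Counter

def initial_nodes(text):
    """Return one (weight, character) pair per distinct character,
    sorted by weight and then by character."""
    counts = Counter(text)
    return sorted((weight, ch) for ch, weight in counts.items())

# One node per distinct character of the example string.
for weight, ch in initial_nodes("AABACDACA"):
    print(ch, weight)
```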

The node that represents character A has weight 5 because character A appears 5 times in the string. The other weights have been calculated in the same way: B has weight 1, C has weight 2, and D has weight 1.

The first step is to combine the nodes that correspond to characters B and D, both with weight 1. The result is a new node with weight 2 whose children are B and D.

After this, the two nodes with weight 2, namely C and the combined B-D node, are combined into a node with weight 4.

Finally, the two remaining nodes, A with weight 5 and the combined node with weight 4, are combined into the root, whose weight is 9.
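The three merge steps above can be reproduced with a priority queue. The following is a minimal sketch, assuming Python's `heapq` as the queue; the function name `huffman_merges` and the use of concatenated labels for combined nodes are illustrative choices, not part of the original walkthrough.

```python
import heapq
from collections import Counter

def huffman_merges(text):
    """Return the merges as (first_label, second_label, combined_weight)."""
    counts = Counter(text)
    # Heap entries are (weight, label); the label breaks ties between equal weights.
    heap = [(weight, ch) for ch, weight in counts.items()]
    heapq.heapify(heap)
    merges = []
    while len(heap) > 1:
        w1, a = heapq.heappop(heap)  # smallest weight
        w2, b = heapq.heappop(heap)  # second smallest weight
        merges.append((a, b, w1 + w2))
        # The combined node is labelled by concatenating its children's labels.
        heapq.heappush(heap, (w1 + w2, a + b))
    return merges

for first, second, weight in huffman_merges("AABACDACA"):
    print(f"combine {first} and {second} -> weight {weight}")
```

For AABACDACA the combined weights come out as 2, 4, and 9, matching the steps described above.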

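Now all nodes are in the tree, so the code is ready. The codewords can be read from the tree by following the path from the root to each character. The exact bits depend on which child is placed on the left, but the lengths are fixed by the merges: A gets a 1-bit codeword, C a 2-bit codeword, and B and D each a 3-bit codeword.

What is Huffman coding used for

In computer data processing, Huffman coding encodes source symbols (such as a letter in a file) with a variable-length code table. The table is derived from an estimate of how likely each source symbol is to occur: symbols with high probability get shorter codewords and symbols with low probability get longer ones. This lowers the average (expected) length of the encoded string and so achieves lossless compression. For example, in English, e is the most frequent letter and z the least frequent. When an English text is compressed with Huffman coding, e can receive a very short codeword (as short as one bit if it is frequent enough), while z may need as many as 25 bits (not 26, because a tree with 26 leaves is at most 25 levels deep). In an ordinary representation each English letter occupies one byte, i.e. 8 bits. In that extreme case e takes 1/8 of the usual length, while z takes more than three times as much. If the probability of each English letter can be estimated fairly accurately, the lossless compression ratio can be improved substantially.

Huffman tree

A Huffman tree, also called an optimal binary tree, is a binary tree with the minimum weighted path length. The weighted path length of a tree is the sum, over all leaf nodes, of the leaf's weight multiplied by the length of its path to the root (if the root is at level 0, the path length from a leaf to the root equals the leaf's level). It is written WPL = W1*L1 + W2*L2 + W3*L3 + … + Wn*Ln, where the n weights Wi (i = 1, 2, …, n) form a binary tree with n leaf nodes and Li (i = 1, 2, …, n) is the path length of the corresponding leaf. It can be proved that the WPL of a Huffman tree is minimal.

[View] 享受生活: 哈夫曼树深入剖析 (an in-depth analysis of Huffman trees)

Example problem: CSP-J 2022 preliminary round, single-choice question 7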
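To connect the codewords with the WPL formula, here is a hedged Python sketch that builds a Huffman tree for a string, reads off the codewords (left edge 0, right edge 1), and computes the weighted path length. The names `huffman_codes` and `weighted_path_length` are illustrative. Ties are broken by insertion order, which is an assumption of this sketch, so the exact bit patterns may differ from those in the article's figure; only the codeword lengths are comparable.

```python
import heapq
from collections import Counter
from itertools import count

def huffman_codes(text):
    """Return {character: codeword}; left edge = '0', right edge = '1'."""
    if not text:
        return {}
    counts = Counter(text)
    tiebreak = count()  # unique counter so tree nodes are never compared directly
    # A tree is either a single character (leaf) or a (left, right) tuple.
    heap = [(w, next(tiebreak), ch) for ch, w in sorted(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next(tiebreak), (left, right)))

    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:
            # A string with one distinct character still gets a 1-bit codeword.
            codes[node] = prefix or "0"

    walk(heap[0][2], "")
    return codes

def weighted_path_length(text, codes):
    """WPL = sum of (weight of leaf) * (depth of leaf)."""
    counts = Counter(text)
    return sum(counts[ch] * len(codes[ch]) for ch in counts)

text = "AABACDACA"
codes = huffman_codes(text)
print(codes)
print(weighted_path_length(text, codes))
```

For AABACDACA the WPL is 5*1 + 2*2 + 1*3 + 1*3 = 15, which is also the number of bits in the encoded string. A fixed-length code for four characters would need 2 bits per character, or 18 bits in total.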

Summary

Huffman coding and Huffman trees: be able to construct both.

Reference

Wikipedia: 霍夫曼编码 (Huffman coding)

