Show HN: The Unix compress algorithm with approximate matching
1 point • by keepamovin • 13 days ago
LZW is the algorithm used by the Unix compress utility and in the GIF format. It is a beautifully elegant and simple algorithm (it learns a dictionary of words and encodes the source as their indices) whose compression rate converges in the limit to the Shannon entropy of the source.<p>In 2013, I was studying bioinformatics and had an idea: apply something like sequence alignment and edit scripts to compression, instead of only extending a dictionary entry by appending to its end, as LZW does. So the idea for LZW-X was born long ago, but it wasn't until recently, with the help of AI, that I could implement and test it properly.<p>This is that proper implementation, and it confirms what I intuited: there are gains to be had from a method like this. I consider this a first rung, a starting point for further exploration.<p>Check it out: <a href="https://github.com/BrowserBox/LZW-X" rel="nofollow">https://github.com/BrowserBox/LZW-X</a>
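For anyone who hasn't seen the baseline, here is a minimal sketch of plain LZW, the "learn a dictionary, emit indices" scheme described above. It is not the LZW-X code from the repository and contains no approximate matching. The function names `lzw_compress` and `lzw_decompress` are made up for illustration, and codes are returned as a plain list of integers rather than packed into variable-width bit fields as compress and GIF do.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Encode bytes as a list of dictionary indices (classic LZW)."""
    # The dictionary starts with one entry per possible byte value.
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            # Keep growing the match while the dictionary knows it.
            current = candidate
        else:
            # Emit the longest known match, then learn match + next byte.
            codes.append(table[current])
            table[candidate] = len(table)
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the bytes by re-learning the same dictionary while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    out = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The encoder used an entry it had only just created.
            entry = previous + previous[:1]
        else:
            raise ValueError(f"invalid LZW code: {code}")
        out.append(entry)
        # Mirror the encoder: learn previous match + first byte of this one.
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(out)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    encoded = lzw_compress(sample)
    print(len(sample), "bytes ->", len(encoded), "codes")
    assert lzw_decompress(encoded) == sample
```

Note that every new dictionary entry is an existing entry plus one byte appended at the end. That is the "addition at the end of the string" step the post proposes to generalize with alignment-style edits.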