PyLZMA

Platform independent python bindings for the LZMA compression library.

Impressed by the spectacular compression ratios of the Inno Setup compiler, I wanted to use the great LZMA compression algorithm in my own Python programs. Since Igor Pavlov's LZMA SDK is open source, writing Python wrappers for its C library was no problem. The wrappers currently run fine on both Windows and Linux, so I hope to provide a tool that lets users read and create 7-zip compatible archives on Linux, which the original 7-zip does not support.

Comparison

Here are the compression results of different data files with the zlib, bz2 and pylzma modules:

Description                     Original  zlib                bz2                 pylzma
SVN export of version 0.1.0      542,720  97,923 (18.04%)     79,660 (14.68%)     74,009 (13.64%)
20 JPEG wallpapers             7,178,240  6,989,049 (97.36%)  7,022,040 (97.82%)  6,980,443 (97.24%)
libxml2-2.6.22.tar            34,232,320  4,567,489 (13.34%)  3,408,457 (9.96%)   2,475,885 (7.23%)
(Sizes in bytes; percentages are relative to the original size.)

Depending on your input data, the differences between zlib/bz2 and pylzma may be even bigger!
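To run a similar comparison on your own data, a small script can report each codec's output size as a percentage of the input. This is a sketch under stated assumptions: it targets Python 3, uses level 9 for zlib and bz2 (the settings behind the table above are not stated), and always measures the standard library's `lzma` module (liblzma, not pylzma) so it runs without a C compiler. `pylzma.compress` is measured only if `import pylzma` succeeds. `compression_ratios` is a hypothetical helper name, and the input must be non-empty.

```python
import bz2
import lzma
import zlib


def compression_ratios(data: bytes) -> dict:
    """Return each codec's compressed size as a percentage of len(data)."""
    codecs = {
        "zlib": lambda d: zlib.compress(d, 9),
        "bz2": lambda d: bz2.compress(d, 9),
        # Stand-in: Python's own lzma module (liblzma), not pylzma.
        "lzma (stdlib)": lambda d: lzma.compress(d, format=lzma.FORMAT_ALONE),
    }
    try:
        import pylzma  # optional: only measured when the bindings are installed
        codecs["pylzma"] = pylzma.compress
    except ImportError:
        pass
    return {name: 100.0 * len(fn(data)) / len(data) for name, fn in codecs.items()}
```

For example, `compression_ratios(open("libxml2-2.6.22.tar", "rb").read())` returns a dict such as `{"zlib": ..., "bz2": ..., ...}`; exact figures depend on library versions and settings.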

Features

  • Compression / decompression of a single block of data
  • Compression from a file-like object (must provide a read method)
  • Streaming decompression through multiple calls to decompress
  • An initial library that supports reading of 7-zip archives (both solid and non-solid)
  • Compiles and runs on Windows, Linux and OSX
  • Multithreaded compression on Windows
  • Built with LZMA SDK 4.65
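A minimal sketch of the first and third features: one-shot compression and streaming decompression through repeated `decompress` calls. It assumes the pylzma functions `compress()` and `decompressobj()`, with decompressor objects offering `decompress()` and `flush()`. If pylzma cannot be imported, the sketch substitutes Python 3's standard `lzma` module in the legacy .lzma format so the example still runs; that fallback is a different library. The helper names `compress_block`, `new_decompressor` and `stream_decompress` are hypothetical.

```python
try:
    import pylzma  # the bindings described on this page

    def compress_block(data: bytes) -> bytes:
        """One-shot compression of a single block of data."""
        return pylzma.compress(data)

    def new_decompressor():
        """Fresh object for streaming decompression."""
        return pylzma.decompressobj()

except ImportError:
    # Assumption: pylzma is not installed, so fall back to Python 3's standard
    # lzma module (liblzma, legacy .lzma "alone" format) as a stand-in.
    import lzma

    def compress_block(data: bytes) -> bytes:
        return lzma.compress(data, format=lzma.FORMAT_ALONE)

    def new_decompressor():
        return lzma.LZMADecompressor(format=lzma.FORMAT_ALONE)


def stream_decompress(chunks) -> bytes:
    """Feed compressed chunks one by one through a single decompressor."""
    obj = new_decompressor()
    out = bytearray()
    for chunk in chunks:
        out += obj.decompress(chunk)
    # pylzma's decompressobj offers flush(); the stdlib LZMADecompressor does not.
    flush = getattr(obj, "flush", None)
    if flush is not None:
        out += flush()
    return bytes(out)
```

Usage: after `packed = compress_block(data)`, calling `stream_decompress(packed[i:i + 16] for i in range(0, len(packed), 16))` returns the original bytes, mimicking compressed data that arrives in small pieces.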

Download

You can download the binaries and the source code for the wrappers from the Python Package Index.

For building, simply run:

python setup.py build

Afterwards, you will find a file called pylzma.pyd in the directory build/lib.win32-<PythonVersion>; Python can import it from there. On Linux, the file is called pylzma.so and can be found in a directory called build/lib.linux-<arch>-<PythonVersion>.
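Because the exact directory name depends on the platform and Python version, a small helper can locate the built file so it can be imported without installing. This sketch relies only on the naming described above (pylzma.pyd or pylzma.so inside build/lib.*). The `pylzma*` wildcard also matches the longer suffixes that newer Python versions add to extension file names. `find_built_extension` is a hypothetical helper name.

```python
import glob
import os


def find_built_extension(project_root: str = ".") -> list:
    """Return paths of pylzma extension files under build/lib.*/, newest first."""
    patterns = ("pylzma*.pyd", "pylzma*.so")  # Windows and Linux/OSX names
    found = []
    for lib_dir in glob.glob(os.path.join(project_root, "build", "lib.*")):
        for pattern in patterns:
            found.extend(glob.glob(os.path.join(lib_dir, pattern)))
    # Prefer the most recently built file if several Python versions were built.
    return sorted(found, key=os.path.getmtime, reverse=True)
```

Usage: `sys.path.insert(0, os.path.dirname(find_built_extension()[0]))` followed by `import pylzma` loads the freshly built module.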

Compilation has been tested with Microsoft Visual Studio 2003, GCC 3 (Linux, Cygwin) and GCC 4 (Linux), but it should work with any ANSI C compiler. Please let me know if you encounter any problems.

Installation using Python eggs

If you installed the EasyInstall package, you can install the latest version of pylzma using the following command:

easy_install pylzma

Refer to the EasyInstall documentation for further details about installing Python eggs. EasyInstall queries the Python Package Index and automatically fetches the latest release.

Git repository

To access my development repository, point your browser to the following URL: http://github.com/fancycode/pylzma

Third-party ports

You can find a port to FreeBSD on freshports.org.

A MacOS X port is maintained at darwinports.com.

Bugs

Please bring all issues to my Bugzilla bugtracker.

If you like this software, please give me some feedback!

Wednesday, April 7th, 2010 63 Comments

63 Comments to PyLZMA

  • Jonathan Harper says:

    Will there be a port of PyLZMA to Python 3 sometime soon?

  • Bussiere says:

    I’ve got an error under Windows:
    C:\Temp\fancycode-pylzma-v0.4.2-0-gda25a6e\fancycode-pylzma-f6adfd5>python2 setup.py install >> result.txt
    Traceback (most recent call last):
      File "setup.py", line 147, in <module>
        zip_safe = False,
      File "C:\DEV\Python2\lib\distutils\core.py", line 152, in setup
        dist.run_commands()
      File "C:\DEV\Python2\lib\distutils\dist.py", line 953, in run_commands
        self.run_command(cmd)
      File "C:\DEV\Python2\lib\distutils\dist.py", line 972, in run_command
        cmd_obj.run()
      File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\install.py", line 76, in run
      File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\install.py", line 85, in do_egg_install
      File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\dist.py", line 395, in get_command_class
      File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\pkg_resources.py", line 1954, in load
      File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\command\easy_install.py", line 21, in <module>
      File "C:\DEV\Python2\lib\site-packages\setuptools-0.6c11-py2.7.egg\setuptools\package_index.py", line 2, in <module>
      File "C:\DEV\Python2\lib\urllib2.py", line 94, in <module>
        import httplib
      File "C:\DEV\Python2\lib\httplib.py", line 70, in <module>
        import socket
      File "C:\DEV\Python2\lib\socket.py", line 47, in <module>
        import _socket
    ImportError: DLL load failed: %1 is not a valid Win32 application.

  • Steve Kieu says:

    Hello,

    I tried to compile on CentOS 5.5. The 64-bit version builds fine, but on the i386 machine I get this error:

    In file included from src/7zip/C/CpuArch.c:4:
    src/7zip/C/CpuArch.h:136: warning: function declaration isn’t a prototype
    src/7zip/C/CpuArch.h:137: warning: function declaration isn’t a prototype
    src/7zip/C/CpuArch.c:127: warning: function declaration isn’t a prototype
    src/7zip/C/CpuArch.c:160: warning: function declaration isn’t a prototype
    src/7zip/C/CpuArch.c: In function ‘MyCPUID’:
    src/7zip/C/CpuArch.c:75: error: can’t find a register in class ‘BREG’ while reloading ‘asm’
    error: command ‘gcc’ failed with exit status 1

    Please help.

    Many thanks for your support.

  • This is great, just what I am looking for.
    But I cannot build it, as it appears I need a C compiler.
    Is there any way I can circumvent this?

  • chen says:

    Hi,

    I am looking for a complete and detailed specification of the LZMA compression algorithm. Where can I find one?
    The information on http://7-zip.org/7z.html is incomplete.

    Contact me:
    EMAIL: sdfuch@yahoo.com.cn

    Thanks

    • Please contact the original authors of 7-Zip for any questions about the LZMA format. My library is just a wrapper, and I don’t know anything about the algorithm itself.

  • Jasonye says:

    Does it provide a readline interface, like the one in bz2?

  • Xavier says:

    Hey (:
    I’ve been trying to compile pylzma for Windows 7 / Python 2.6.6 (AMD64), but I’m definitely out of luck…

    Here’s what I’ve tried:

    – I already had the MSVC compiler for AMD64 installed with Visual Studio 2008 (v9.0)
    – Cloned the Git repo
    – Tried setup.py install, but got a bunch of compiler errors
    – Renamed all .c files to .cpp and modified setup.py accordingly
    – Now they compile fine (after a few tweaks in CpuArch and another file that had a switch/case block), but the linker chokes with these errors (they might look really bad in this little text field):

    Creating library build\temp.win-amd64-2.6\Release\src/pylzma\pylzma.lib and object build\temp.win-amd64-2.6\Release\src/pylzma\pylzma.exp
    pylzma.obj : error LNK2001: unresolved external symbol “char const * const doc_decompress” (?doc_decompress@@3QBDB)

    pylzma.obj : error LNK2001: unresolved external symbol “struct _object * __cdecl pylzma_decompress(struct _object *,struct _object *)” (?pylzma_decompress@@YAPEAU_object@@PEAU1@0@Z)

    pylzma.obj : error LNK2001: unresolved external symbol “char const * const doc_compress” (?doc_compress@@3QBDB)
    Aes.obj : error LNK2019: unresolved external symbol “void __cdecl AesCtr_Code_Intel(unsigned int *,unsigned char *,unsigned __int64)” (?AesCtr_Code_Intel@@YAXPEAIPEAE_K@Z) referenced in function AesGenTables

    Aes.obj : error LNK2019: unresolved external symbol “void __cdecl AesCbc_Decode_Intel(unsigned int *,unsigned char *,unsigned __int64)” (?AesCbc_Decode_Intel@@YAXPEAIPEAE_K@Z) referenced in function AesGenTables
    Aes.obj : error LNK2019: unresolved external symbol “void __cdecl AesCbc_Encode_Intel(unsigned int *,unsigned char *,unsigned __int64)” (?AesCbc_Encode_Intel@@YAXPEAIPEAE_K@Z) referenced in function AesGenTables
    AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesenclast_si128 referenced in function “void __cdecl AesCbc_Encode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Encode_Intel@@YAXPEAT__m128i@@0_K@Z)
    AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesenc_si128 referenced in function “void __cdecl AesCbc_Encode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Encode_Intel@@YAXPEAT__m128i@@0_K@Z)
    AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesdeclast_si128 referenced in function “void __cdecl AesCbc_Decode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Decode_Intel@@YAXPEAT__m128i@@0_K@Z)
    AesOpt.obj : error LNK2019: unresolved external symbol _mm_aesdec_si128 referenced in function “void __cdecl AesCbc_Decode_Intel(union __m128i *,union __m128i *,unsigned __int64)” (?AesCbc_Decode_Intel@@YAXPEAT__m128i@@0_K@Z)
    build\lib.win-amd64-2.6\pylzma.pyd : fatal error LNK1120: 10 unresolved externals
    error: command '"C:\Program Files (x86)\Microsoft Visual Studio 9.0\VC\BIN\amd64\link.exe"' failed with exit status 1120

    Any idea how I could make MSVC compile the .c files, or how to solve those link issues? :/ I have trouble figuring out what they mean, even with MSDN (and my lack of C/C++ knowledge).

    Thanks a lot (: and can’t wait to use the python binding.

    Ps: Or even better, is there a way to install pylzma for windows 7 x64 and python 2.6.6?

  • […] Hello guys, I’ve already posted this question on the author’s website (and am still waiting for moderation), but I thought I might ask here as well. I’ve been trying […]

  • Stephen Evans says:

    I’ve just successfully built and installed pylzma-0.4.3 on Windows XP Pro with Python 2.7.1 using mingw. No problems. (Python installed with the Windows x86 MSI installer from python.org).

    pylzma is being used on (pickle) files that are transferred over slightly unreliable 9600 baud dialup connections. Half the size of zip compression. Reduced transfer time = fewer redials = lower costs and much less frustration. Thank you.

  • james says:

    Help!
    If I try to install with easy_install, it exits with:
    File "C:\Python26\lib\site-packages\setuptools-0.6c11-py2.6.egg\setuptools\package_index.py", line 475, in fetch_distribution
    AttributeError: 'NoneType' object has no attribute 'clone'
    You don’t happen to have a simple, pre-built distribution, do you? xxx

  • Edward says:

    What the heck?

    It says “You can download the binaries and the source code for the wrappers below,” but there is no link anywhere on this page that I can find.

    It looks like the best thing to do is go to http://www.joachim-bauch.de/
    From there it says “You can get the source tarball from the Python Package Index or github.”

    But the current page (projects/pylzma) is still the #1 hit on Google for “python lzma”, and although there is a link to GitHub, I think the Python Package Index is easier for most people to download from. It certainly feels less scary than trying to fetch what may or may not be an experimental, bleeding-edge version from the source repository.

    • Well, by using “easy_install” as described above, the Python Package Index is queried for the latest released version. The text above was copied from my old page and surely is a bit unclear. I’ll update it to refer to the Python Package Index.

  • Stu says:

    Hi Joachim, I’d like to install PyLZMA, but I get an error like this:

    setup.py:92: UnsupportedPlatformWarning: Multithreading is not supported on the platform "linux2",
    please contact mail@joachim-bauch.de for more informations.
    please contact mail@joachim-bauch.de for more informations.""" % (sys.platform), UnsupportedPlatformWarning)
    building ‘pylzma’ extension
    creating build/temp.linux-i686-2.6
    creating build/temp.linux-i686-2.6/src
    creating build/temp.linux-i686-2.6/src/pylzma
    creating build/temp.linux-i686-2.6/src/sdk
    creating build/temp.linux-i686-2.6/src/7zip
    creating build/temp.linux-i686-2.6/src/7zip/C
    creating build/temp.linux-i686-2.6/src/compat
    gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -DWITH_COMPAT=1 -Isrc/sdk -I/usr/include/python2.6 -c src/pylzma/pylzma.c -o build/temp.linux-i686-2.6/src/pylzma/pylzma.o
    src/pylzma/pylzma.c:26: fatal error: Python.h: No such file or directory
    compilation terminated.
    error: command ‘gcc’ failed with exit status 1

    Thank you for your nice wrapper :)

  • Aaron says:

    Is there API documentation available?

    I can’t seem to find any hosted online. I’ve tried running help(pylzma), but many of the doc strings are marked as “todo” or do not explain any of the parameters. Some of the doc strings invite you to instead run “help(type(x))” where I assume x is the class/module/function. This is not helpful either.

  • djamba says:

    How do I convert a file’s lastwritetime to a Python datetime?

  • Duane Fields says:

    Has anyone tried decompressing data generated by this JS LZMA port?

    http://nmrugg.github.com/LZMA-JS/LZMA-JS_demo_simple/LZMADemo_simple.html

    I’ve been unable to get it to work, though it seems like it should be compatible.

  • Duane Fields says:

    Not that it’s the answer, but

    from subprocess import PIPE, Popen

    # compressed: the LZMA-JS output as a byte string
    p = Popen(['/opt/local/bin/unlzma'], stdout=PIPE, stdin=PIPE)
    payload = p.communicate(input=compressed)[0]

    This works perfectly for decoding data from LZMA-JS, so I’m guessing it’s some difference between the command-line version and the JS version.

  • Stu says:

    Hi Joachim, is there any pylzma egg package for Linux? I am browsing through pylzma repository and I’ve only found egg package for Windows.

    Thank you :)

    • There is no binary egg for Linux, as the dependencies might vary between distributions. However, on Linux it’s a lot easier than on Windows to install a compiler and build the modules.

      Just get the source tarball, extract it and run python setup.py bdist_egg to build the egg.

  • pash says:

    Hi,
    is there an example or documentation on how to use this library,
    e.g. for compressing files?

    I found no clues in the source code or around the web.
    Thank you.

  • Victor Aguilera says:

    Does anyone have a clear example of how to decompress a 7zip file?

  • […] for quite a while and couldn’t find a python implementation of ppmz, but I did find another method ported to python with lzma, the compressor behind 7zip. Lzma uses a different implementation of Lempel-Ziv, […]

  • David says:

    Hello,
    Sorry… I am looking for documentation.
    The install went OK.
    I just tried to decompress a 7z file… without success.
    Thanks

  • 100 Workout says:

    Hey Joachim – you might never read this but thanks for writing up this post and showing off these libraries. I was trying to find a solution to use in Windows so I’ll download that package from PyPI and see what I can do.

    Cheers,

    -FB

  • Luca says:

    Hi,

    I’m quite new to handling binary files, I’m looking for a python implementation to decompress a 7zip and save the result as a new file.
    Any chance of pointing me to a small example illustrating this? (I understand this goes beyond the scope of pylzma… still, it would be very helpful for me to understand how to achieve it :)

    Thanks
    L

    • SpootDev says:

      Below is the code I use to convert 7z to zip. It should be a good start for you.

      import os
      import zipfile

      from py7zlib import Archive7z  # 7z reader that ships with pylzma

      def convert_7z_to_zip(archive_path, zip_path, compression=zipfile.ZIP_DEFLATED):
          # open the 7z archive for reading
          with open(archive_path, "rb") as fp:
              archive = Archive7z(fp)
              try:
                  with zipfile.ZipFile(zip_path, "w", compression) as z:
                      # loop through all files in the archive and write them to the zip
                      for name in archive.getnames():
                          z.writestr(name, archive.getmember(name).read())
              except Exception:
                  # don't leave a half-written zip file behind
                  if os.path.exists(zip_path):
                      os.remove(zip_path)
                  raise

  • SpootDev says:

    How in the world does one create a 7z with multiple files in it? My google-fu is failing me and I just can’t get it to work. Below is the method I’m using to 7z one file, and it works fine. Do I write header/data/header/data, header/header/data/data, or something else entirely? I’ve tried them both.

    import struct
    from io import BytesIO

    import pylzma

    # read the input file
    with open("testbin.col", "rb") as infile:
        file_data = infile.read()

    # compressfile() returns an object whose read() yields the compressed stream
    comp_data = pylzma.compressfile(BytesIO(file_data))

    # LZMA header
    result = comp_data.read(5)
    # size of uncompressed data (8 bytes, little-endian)
    result += struct.pack('<Q', len(file_data))
    # compressed data
    result += comp_data.read()

    with open("test_encode.7z", "wb") as outfile:
        outfile.write(result)

    • Joshua Bowman says:

      PyLZMA doesn’t work with regular 7z files, only single-file LZMA archives. (Also not for LZMA2/XZ.) So quick answer: Pipe through tar before you compress. If you need to work with regular 7z files, you need to use 7z.dll or 7z.so.

  • stupon says:

    How to Use pylzma?
    There is a file, l2.exe.zinn, that I need to decompress. How do I do it?

    http://cdn.inn.ru/ncsoft/lineage2/patch/live_nb/system/l2.exe.zinn
    Size of l2.exe: 3130696

  • SpootDev says:

    I see, I guess that’s why I couldn’t get it to work. :) Guess I’ll just end up adding full 7z as a dependency of my current project. Thanks.

  • JimmyZ says:

    Would you please add a win64 binary package? It’s a PITA to compile :(

  • Nils says:

    I get the message: “DeprecationWarning: matchfinder selection is deprecated and will be ignored”.
    Why is it deprecated?
    How can I use 2-byte hashing?

  • Terry Metcalf says:

    I’m using IronPython 2.7. Unfortunately, IronPython does not have setuptools.

    How would I go about manually compiling the library?

    T

  • Jordan says:

    Just wanted to say thanks for releasing this. It worked like a charm in my project. I took the python 3.2 fork off github, and it compiled and worked without a hitch for python 2.6, 2.7 and 3.2.

  • Alain says:

    Hi!

    Can anyone help me write two little Python scripts based on pylzma?

    1. 7zcompress input.ext

    Compress with 7z, WITHOUT the 7z header, to input.ext.7z.
    It doesn’t need to be 7z compatible! The 7z format contains a header and file information, and I don’t need them. I want a pure stream/string compressor based on 7z.
    I will use it from the Linux command line and need the SMALLEST possible file size.

    2. 7zdecompress input.ext.7z
    Output will be input.ext

    Thanks,
    Alain

  • jonty says:

    Hi,
    What are the pros/cons of using pylzma over pyliblzma?

  • Hamish says:

    Hi Joachim,

    thanks very much for your port of LZMA – I find it very useful for filtering archives of network data. However, I have a problem decompressing archives that are generated incrementally by adding two or more files simultaneously. I have found that the assumption in Archive7z.__init__ that “every file has its own folder” does not hold. I appreciate this may be a case of “Don’t Do That!”. I am sure I should raise this on your Bugzilla, but you do not seem to have a category for py7zlib bugs?

    In order to work around the problem I found that (1) SubstreamsInfo.__init__ contained a bug:
    Where id == PROPERTY_SIZE, the sum must be reset to zero before the inner loop; otherwise the total is carried across, and incorrect (often negative!) sizes result.

    (2) In order to more easily process the unpacking info, I changed the loop and its preamble in Archive7z.__init__ to:

    self.solid = packinfo.numstreams == 1
    if self.solid:
        # the files are stored in substreams
        if hasattr(subinfo, 'unpacksizes'):
            unpacksizes = subinfo.unpacksizes
        else:
            unpacksizes = [x.unpacksizes[0] for x in folders]
    else:
        # check every file has its own folder with compressed data
        if unpackinfo.numfolders == files.numfiles:
            unpacksizes = [x.unpacksizes[0] for x in folders]
        else:
            unpacksizes = subinfo.unpacksizes

    src_pos = self.afterheader
    maxsize = (self.solid and packinfo.packsizes[0]) or None

    idx2 = 0
    for fidx in range(unpackinfo.numfolders):
        folder = folders[fidx]

        pos = 0
        old_src_pos = src_pos
        numps = subinfo.numunpackstreams[fidx]
        for ssidx in range(numps):
            info = files.files[idx2]
            if info['emptystream']:
                continue

            info['compressed'] = (not self.solid and packsizes[fidx]) or None
            filesize = unpacksizes[idx2]
            info['uncompressed'] = filesize
            file = ArchiveFile(info, pos, src_pos,
                               # unpacksizes[fidx],
                               filesize,
                               folder, self, maxsize=maxsize)
            if subinfo.digestsdefined[idx2]:
                file.digest = subinfo.digests[idx2]
            self.files.append(file)
            pos += unpacksizes[idx2]

            idx2 += 1

        src_pos = old_src_pos
        if not self.solid:
            src_pos += packsizes[fidx]

    My apologies for the length of this post and for any poor code formatting. I should also add that I have not conducted exhaustive testing on my workaround – it just works for the kind of archives I am encountering.

    Regards,
    Hamish

  • kbec says:

    Is there a chance of getting py7zlib.Archive7z to work with subdirectories in archives on Windows? I get “IndexError: list index out of range” if the opened archive contains directories.

  • […] want to use PyLZMA to extract a file from an archive (e.g. test.7z) and extract it to the same […]

  • […] I check the documentation on the authors page http://www.joachim-bauch.de/projects/pylzma/ and go to the designated folder I see the following […]

  • […] I’m sure that there might be some more obscure formats with better compression, but lzma is the best, of those that are well supported. There are some python bindings here. […]
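Several comments above (pash, David, Alain) ask for a basic example of compressing and decompressing a file. The sketch below follows Alain's two-script request: `compress_path("input.ext")` writes input.ext.7z and `decompress_path("input.ext.7z")` restores input.ext. It assumes the pylzma one-shot functions `compress()` and `decompress()`. If pylzma is not installed, it falls back to Python 3's standard `lzma` module in the legacy .lzma format, which is a different library. In both cases the output is a bare LZMA stream with a small header, not a real 7z archive, even though it uses the requested .7z suffix. `pack`, `unpack`, `compress_path` and `decompress_path` are hypothetical names.

```python
try:
    import pylzma

    def pack(data: bytes) -> bytes:
        return pylzma.compress(data)

    def unpack(data: bytes) -> bytes:
        return pylzma.decompress(data)

except ImportError:
    # Assumption: without pylzma, use the stdlib .lzma ("alone") format,
    # which has only a 13-byte header and no 7z container.
    import lzma

    def pack(data: bytes) -> bytes:
        return lzma.compress(data, format=lzma.FORMAT_ALONE)

    def unpack(data: bytes) -> bytes:
        return lzma.decompress(data, format=lzma.FORMAT_ALONE)


SUFFIX = ".7z"  # as requested; the output is NOT a real 7z archive


def compress_path(path: str) -> str:
    """Write path + SUFFIX containing the compressed stream; return the new name."""
    with open(path, "rb") as src:
        packed = pack(src.read())
    out = path + SUFFIX
    with open(out, "wb") as dst:
        dst.write(packed)
    return out


def decompress_path(path: str) -> str:
    """Reverse compress_path: strip SUFFIX and write the original bytes."""
    if not path.endswith(SUFFIX):
        raise ValueError(f"expected a name ending in {SUFFIX}: {path}")
    with open(path, "rb") as src:
        data = unpack(src.read())
    out = path[: -len(SUFFIX)]
    with open(out, "wb") as dst:
        dst.write(data)
    return out
```

To use these from the Linux command line, call `compress_path(sys.argv[1])` or `decompress_path(sys.argv[1])` from two small wrapper scripts. Both functions read the whole file into memory, which keeps the sketch simple but is not suited to very large files.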