unzipping issues

Question

ch1zra 0 Newbie Poster

14 Years Ago

Can I somehow extract a single file from a zip archive without replicating the directory structure inside the zip file?
For example, the file I need is:
archive.zip/folder1/folder2/fileIneed.doc
When I use zipfile.extract I get the file, but it lands in the destination folder with the same directory structure as in the archive.

I'm at home now and my py file is at work, so I can't paste real code.

python

4 Contributors
11 Replies
144 Views
20 Hours Discussion Span
Latest Post 14 Years Ago Latest Post by Gribouillis

All 11 Replies

Gribouillis 1,391 Programming Explorer

14 Years Ago

I don't understand the line print re.sub('word/media/','',destinationPath). You probably meant to remove word/media/ from the path, in which case it should be destinationPath = re.sub(r'word[/\\]media[/\\]','',destinationPath).

Gribouillis 1,391 Programming Explorer

14 Years Ago

There seems to be a mistake: shouldn't you remove word/media/ from name instead of destinationPath ?
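A minimal sketch of that idea, written for Python 3 rather than the Python 2 used in this thread. Note that zipfile's extract() needs the member's real name and always recreates its folders under the destination, so instead of renaming anything this sketch copies each member's bytes to a file named after the last path component only. The function name extract_flat and its prefix parameter are made up for illustration.

```python
import os
import shutil
import zipfile


def extract_flat(zip_path, dest_dir, prefix="word/media/"):
    """Copy members under `prefix` into dest_dir, dropping their folders."""
    os.makedirs(dest_dir, exist_ok=True)
    written = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            # Skip directory entries and anything outside the prefix.
            if info.is_dir() or not info.filename.startswith(prefix):
                continue
            # Keep only the last path component, e.g. "image1.png".
            target = os.path.join(dest_dir, os.path.basename(info.filename))
            with archive.open(info) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Called as extract_flat(docx, destinationPath) inside the loop, this should put image1.png directly into c:\aa\0000110\. Two members with the same base name in different folders would overwrite each other, which is probably not a concern for word/media/.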

Gribouillis 1,391 Programming Explorer

14 Years Ago

Here is how I would try to write it

import time,  Image, zipfile
from kernilis.path import path
t0 = time.clock()
path = path("C:")/"kontrolneliste"/"docx"
word_media = path('word', 'media', '')
for (root, dirs, files) in os.walk(path):
    for file in files:
        fname = file[:7]
        docx = root/(fname + '.docx')
        print docx
        destinationPath = path('c:')/'aa'/fname
        if not destinationPath.isdir():
            destinationPath.mkdir()
        sourceZip = zipfile.ZipFile(docx)
        for name in sourceZip.namelist():
            if name. :
                name = path(*path(name).splitall())
                name = name.replace(word_media,'')
                sourceZip.extract(name,destinationPath)
        sourceZip.close()
exectime = time.clock() - t0
print '--------------------------------------'
print 'Executed in: ', round(exectime,2), "seconds"
os.system('pause')

I'm using a version of J Orendorff's very useful path module (I added and modified a few features), which I call kernilis.path. See the attached file.

path.py_.zip (5.95 KB)

TrustyTony 888 ex-Moderator Team Colleague Featured Poster · Answer 1 · 2010-10-20T02:15:25+00:00

You could read the file from the zip archive and write it out to a new file of the same name. You will not get the same file attributes, though:

from zipfile import ZipFile as zf
with zf('j:/Lataukset/Vb6501.zip') as f, open('MSBIND.DLL', 'wb') as out:
    out.write(f.read('Vb6501/MSBIND.DLL'))
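The same read-and-write idea can be wrapped in a small helper. This is only a sketch for Python 3, and the name extract_one is made up for illustration. It also copies the modification time stored in the archive onto the new file, which recovers one of the attributes mentioned above; other attributes (permissions, creation time) are still not restored.

```python
import os
import time
import zipfile


def extract_one(zip_path, member, dest_dir):
    """Write a single archive member into dest_dir under its bare file name."""
    with zipfile.ZipFile(zip_path) as archive:
        info = archive.getinfo(member)  # raises KeyError if member is absent
        target = os.path.join(dest_dir, os.path.basename(member))
        with open(target, "wb") as out:
            out.write(archive.read(info))
    # Optionally restore the modification time stored in the archive.
    # date_time is (year, month, day, hour, minute, second) in local time.
    mtime = time.mktime(info.date_time + (0, 0, -1))
    os.utime(target, (mtime, mtime))
    return target
```

For the file in the original question this would be called as extract_one('archive.zip', 'folder1/folder2/fileIneed.doc', '.'), leaving fileIneed.doc in the current folder.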

ch1zra 0 Newbie Poster · Answer 2 · 2010-10-20T03:50:56+00:00

I'll check it out in the morning, thanks for the idea.

ch1zra 0 Newbie Poster · Answer 3 · 2010-10-20T11:26:57+00:00

I haven't made my problem clear enough, so I'll do it now :)
I have a folder with ~2000 *.docx files that I want to loop through, extracting all the images.

The code I have so far is the following:

import os, time,  re,  Image, zipfile
t0 = time.clock()
path = 'C:\\kontrolneliste\\docx\\'
for (path, dirs, files) in os.walk(path):
    for file in files:
        fname = file[:7]
        docx = path + '\\' + fname + '.docx'
        print docx
        destinationPath = 'c:\\aa\\' + fname + '\\'
        if not os.path.isdir(destinationPath):
            os.mkdir(destinationPath)
        sourceZip = zipfile.ZipFile(docx)
        for name in sourceZip.namelist():
            print name
            if name.find('word/media/')!= -1 :
                print re.sub('word/media/','',destinationPath)
                sourceZip.extract(name,destinationPath)
        sourceZip.close()
##    print len(files)
exectime = time.clock() - t0
print '--------------------------------------'
print 'Executed in: ', round(exectime,2), "seconds"
os.system('pause')

This gives me the following error when I run it:

C:\kontrolneliste\docx\\0000110.docx
Traceback (most recent call last):
  File "C:\py\KListe_unzipper.py", line 12, in <module>
    sourceZip = zipfile.ZipFile(docx)
  File "C:\Python26\lib\zipfile.py", line 693, in __init__
    self._GetContents()
  File "C:\Python26\lib\zipfile.py", line 713, in _GetContents
    self._RealGetContents()
  File "C:\Python26\lib\zipfile.py", line 723, in _RealGetContents
    endrec = _EndRecData(fp)
  File "C:\Python26\lib\zipfile.py", line 189, in _EndRecData
    fpin.seek(-sizeEndCentDir, 2)
IOError: [Errno 22] Invalid argument

Now, the problem is that yesterday it worked, but it created directories inside the destination path, like this:
c:\aa\0000110\word\media\image1.png

This morning I messed something up without backing up the working version... :!

help plz :)
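The traceback above shows zipfile failing while it seeks to the end of the file, which is what can happen when a file is too short to be a zip archive at all, for example an empty .docx. That is only a guess at the cause; the thread never confirms it. Under that assumption, a Python 3 sketch like the following would skip such files instead of crashing. The name find_docx_archives is made up for illustration.

```python
import os
import zipfile


def find_docx_archives(folder):
    """Yield paths of .docx files under folder that look like real zip archives."""
    for root, _dirs, files in os.walk(folder):
        for file_name in files:
            if not file_name.lower().endswith(".docx"):
                continue
            full_path = os.path.join(root, file_name)
            # is_zipfile returns False for empty or non-zip files instead of raising.
            if zipfile.is_zipfile(full_path):
                yield full_path
```

Using the full file name from os.walk, rather than rebuilding it from file[:7], also avoids opening a different file than the one that was listed.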

Tech B 48 Posting Whiz in Training · Answer 4 · 2010-10-20T12:18:01+00:00

try this.

ch1zra 0 Newbie Poster · Answer 5 · 2010-10-20T13:29:57+00:00

For some reason, it's working again now:

import os, time,  re,  Image, zipfile
t0 = time.clock()
path = "C:\\kontrolneliste\\docx\\"
for (path, dirs, files) in os.walk(path):
    for file in files:
        fname = file[:7]
        docx = path + '\\' + fname + '.docx'
        print docx
        destinationPath = 'c:\\aa\\' + fname + '\\'
        if not os.path.isdir(destinationPath):
            os.mkdir(destinationPath)
        sourceZip = zipfile.ZipFile(docx)
        for name in sourceZip.namelist():
            if name.find('word/media/')!= -1 :
                print re.sub('word/media/','',destinationPath)
                sourceZip.extract(name,destinationPath)
        sourceZip.close()
exectime = time.clock() - t0
print '--------------------------------------'
print 'Executed in: ', round(exectime,2), "seconds"
os.system('pause')

But again I get this structure: http://img831.imageshack.us/img831/1314/imgdn.jpg

@ Tech B
Your script extracts everything, the way it should, but it stores everything in one folder, overwriting all previous files.
So, is there some workaround to fix my script so that it extracts into c:\aa\0000110\image1.png,
instead of
c:\aa\0000110\word\media\image1.png
Or should I extract them all in one run, and then in a second loop move them from word/media into the root folders?

ch1zra 0 Newbie Poster · Answer 6 · 2010-10-20T13:56:28+00:00

Tried that too (with my re.sub, and now with yours too), but I am still getting the same output structure.

ch1zra 0 Newbie Poster · Answer 7 · 2010-10-20T14:53:16+00:00

First I get a syntax error for the line:
if name. :
Then:

Traceback (most recent call last):
  File "C:\py\_____3.py", line 2, in <module>
    from kernilis.path import path
ImportError: No module named kernilis.path

Then I renamed path.py to kernilis.path.py and put it into lib/site-packages, which gave an error, so I renamed the file back to path.py and changed the code to this:

import time,  Image, zipfile
from path import path

That gave me:

File "C:\py\_____3.py", line 5, in <module>
    word_media = path('word', 'media', '')
TypeError: 'path' object is not callable

So now I'm confused :)
I'll try some things and will keep this thread up to date. I hope I'll solve this, and maybe someone else will benefit from it one day too :]

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 8 · 2010-10-20T14:56:58+00:00

It's because at line 4 I kept your variable named path. That assignment overwrites the imported path class, so path is no longer callable on the next line.
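The name clash can be reproduced without the attached module. The sketch below is Python 3 and uses a tiny stand-in class, also called path, that is hypothetical and only imitates the "/" joining of the real path module. It triggers the same TypeError and then shows that a different variable name for the folder avoids it.

```python
class path(str):
    """Tiny stand-in for the thread's path class (hypothetical, not the real module)."""

    def __truediv__(self, other):
        # Join with "/" and return an instance of the same class.
        return type(self)(self + "/" + other)


path_class = path  # keep a second reference so the class stays reachable

# Rebinding the name: `path` now refers to an instance, not the class.
path = path("C:") / "kontrolneliste" / "docx"

try:
    word_media = path("word")  # same kind of call as in the failing script
except TypeError as exc:
    message = str(exc)  # "'path' object is not callable"

# A different variable name for the folder avoids the clash.
docx_root = path_class("C:") / "kontrolneliste" / "docx"
word_media = path_class("word") / "media"
```

In the script from this thread, the equivalent fix would be to rename the folder variable on line 4 (for example to docx_root, a name chosen here for illustration) and pass that to os.walk.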
