Using Seek method to read on from a file header location in a bitstream

Question

Banjoplucker 0 Newbie Poster

14 Years Ago

Please look at my code at:

http://www.bpaste.net/show/11961/

I appear to have gone down a "blind alley" here. Having identified the locations of the WordPerfect headers, I need to verify that they are genuine by reading the byte 10 bytes from the beginning of each header. If its value is \x0A, I can confirm that the header is genuine.
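In other words, if a header starts at offset h, the byte to test sits at h + 10. A minimal sketch of that check on data already in memory, assuming the \xFF\x57\x50\x43 signature used later in this thread and the 0x0A rule stated above; is_genuine_header, WP_SIGNATURE and CHECK_OFFSET are hypothetical names.

```python
# Signature taken from the regex used later in this thread (assumption)
WP_SIGNATURE = b"\xFF\x57\x50\x43"
CHECK_OFFSET = 10  # byte that should be 0x0A, per the question


def is_genuine_header(data: bytes, header_pos: int) -> bool:
    """Return True if data has the WordPerfect signature at header_pos
    and the byte CHECK_OFFSET bytes further on is 0x0A."""
    if not data.startswith(WP_SIGNATURE, header_pos):
        return False
    check_pos = header_pos + CHECK_OFFSET
    # Indexing bytes returns an int in Python 3, so compare with 0x0A
    return check_pos < len(data) and data[check_pos] == 0x0A


# 4 signature bytes + 6 filler bytes put 0x0A at offset 10
sample = WP_SIGNATURE + bytes(6) + b"\x0a" + b"rest"
print(is_genuine_header(sample, 0))  # True
```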

I am trying to use the "read", "seek" and "tell" methods to report the presence of \x0a 10 bytes in from each discovered header location. Because of the size of the bitstream, I had to read it in buffer chunks, which worked fine for reporting the header locations, but I appear to be restricted to the "while data" loop.
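On being restricted to the "while data" loop: one way out, sketched here under assumptions, is to save the current position with tell(), seek() to header + 10, read a single byte, and seek() back, so the chunked loop resumes exactly where it left off. verify_at is a hypothetical helper, and io.BytesIO stands in for the real bitstream file.

```python
import io

CHECK_OFFSET = 10  # per the question: this byte should be 0x0A


def verify_at(f, header_pos: int) -> bool:
    """Check the byte CHECK_OFFSET bytes past header_pos in a binary
    file object, restoring the caller's file position afterwards."""
    saved = f.tell()  # remember where the chunked loop was
    try:
        f.seek(header_pos + CHECK_OFFSET)  # absolute seek (whence=0)
        return f.read(1) == b"\x0a"  # read(1) returns b"" at EOF
    finally:
        f.seek(saved)  # resume the chunked loop unchanged


# Demo with an in-memory "file": a header at offset 5
data = b"junk!" + b"\xffWPC" + bytes(6) + b"\x0a" + b"more"
f = io.BytesIO(data)
f.read(8)  # pretend the chunked loop has already read 8 bytes
print(verify_at(f, 5))  # True
print(f.tell())  # 8
```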

Grateful for any pointers.

Thanks

Banjoplucker

python


Gribouillis 1,391 Programming Explorer

14 Years Ago

I would suggest using a helper class like this one:

import os
import re
import sys


class Reader:
    header_re = re.compile(rb"\xFF\x57\x50\x43")  # WordPerfect signature
    overlap = 3  # len(signature) - 1: bytes kept between chunks
    check_index = 10  # header byte that must be 0x0A

    def __init__(self, ifile, bufsize=4096):
        self.ifile = ifile
        self.start = 0  # absolute file position at the beginning of the buffer
        self.bufsize = bufsize
        self.buffer = b""

    @property
    def end(self):
        """The absolute file position at the end of the buffer.
        This is also the current file position in the ifile
        """
        return self.start + len(self.buffer)

    def goto(self, start):
        """Position the beginning of the buffer at an absolute start offset in the file"""
        if self.start <= start <= self.end:
            if self.start < start:
                self.buffer = self.buffer[start - self.start:]
                self.start = start
            if len(self.buffer) < self.bufsize:
                self.buffer += self.ifile.read(self.bufsize - len(self.buffer))
        else:
            self.ifile.seek(start)
            self.start = start
            self.buffer = self.ifile.read(self.bufsize)

    def report_headers(self, log_file=None):
        pos = 0
        while True:
            if pos <= self.start:
                # No pending header: advance, keeping the last few bytes
                # so a signature split across two chunks is still found
                pos = max(self.end - self.overlap, self.start)
            self.goto(pos)
            if len(self.buffer) < self.check_index + 1:
                return
            for match in self.header_re.finditer(self.buffer):
                abspos = self.start + match.start()
                if self.end < abspos + self.check_index + 1:
                    pos = abspos
                    break  # restart finditer from abspos with a fresh buffer
                # indexing bytes gives an int in Python 3
                if self.buffer[match.start() + self.check_index] == 0x0A:
                    msg = "WordPerfect header found at offset: %d" % abspos
                    if log_file is not None:
                        log_file.write(msg + "\n")
                    print(msg)


if __name__ == "__main__":
    # image_file was defined elsewhere in the asker's script; reading it
    # from the command line here is an assumption
    image_file = sys.argv[1]
    data_file = os.path.join("images", image_file)
    print("Please wait while %s is read in" % data_file)
    with open(data_file, "rb") as ifile:
        reader = Reader(ifile)
        reader.report_headers()
        print("The total size of the file in bytes is:", reader.end)
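A possible simpler alternative to the class above (a different design, not Gribouillis's): a generator that reads fixed-size chunks and carries the last len(signature) - 1 bytes into the next chunk, so a header split across two reads is still found. find_offsets is a hypothetical name; the offsets it yields could then be checked with seek() and read(1).

```python
import io
from typing import BinaryIO, Iterator

WP_SIGNATURE = b"\xFF\x57\x50\x43"  # signature used in the Reader class


def find_offsets(f: BinaryIO, signature: bytes = WP_SIGNATURE,
                 chunk_size: int = 4096) -> Iterator[int]:
    """Yield the absolute offset of every occurrence of signature,
    reading f in chunks of chunk_size bytes."""
    keep = len(signature) - 1  # bytes carried over between chunks
    tail = b""  # unmatched end of the previous chunk
    base = 0  # absolute offset of tail[0]
    while True:
        chunk = f.read(chunk_size)
        if not chunk:
            return
        window = tail + chunk
        i = window.find(signature)
        while i != -1:
            yield base + i
            i = window.find(signature, i + 1)
        # Keep only the last `keep` bytes; a complete match cannot fit
        # in them, so nothing is reported twice
        cut = max(len(window) - keep, 0)
        base += cut
        tail = window[cut:]


# The first header (offset 10) straddles the 12-byte chunk boundary
data = b"\x00" * 10 + WP_SIGNATURE + b"\x00" * 3 + WP_SIGNATURE
print(list(find_offsets(io.BytesIO(data), chunk_size=12)))  # [10, 17]
```

Carrying exactly len(signature) - 1 bytes is the smallest overlap that catches a split signature, and it is too short to hold a complete match, so no offset is yielded twice.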


Gribouillis 1,391 Programming Explorer

14 Years Ago

There is a self argument missing in report_headers(). This code is currently untested!

"If it's not tested, it's broken" (famous XP prophecy)


richieking 44 Master Poster · Answer 1 · 2010-12-05T19:16:24+00:00

Try this:

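import sys

# wordsearch was defined elsewhere in the original; taking the path
# from the command line is an assumption
wordsearch = sys.argv[1]
WORDPERFECT = b"\x0a"  # files opened in "rb" mode give bytes, not str

with open(wordsearch, "rb") as word:
    buff = word.read()  # You can also just read a small buffer to check: word.read(10)
    # Note: this finds the first 0x0A byte anywhere in buff, not
    # specifically the byte 10 bytes after a header
    pos = buff.find(WORDPERFECT)
    if pos != -1:
        print("Wordperfect registered")
        print(pos)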
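When the whole file fits in memory, as in the example above, the signature and the byte-10 test can be combined in one bytes regex with a lookahead, rather than searching for 0x0A on its own (which finds the first newline byte anywhere in the file). A sketch, assuming the signature from Gribouillis's class; genuine_header_offsets is a hypothetical name.

```python
import re
from typing import List

# 4 signature bytes plus 6 arbitrary bytes puts the 0x0A check at
# offset 10 from the header start. re.DOTALL lets "." match any byte,
# including 0x0A itself.
GENUINE_HEADER_RE = re.compile(rb"\xFF\x57\x50\x43(?=.{6}\x0a)", re.DOTALL)


def genuine_header_offsets(buff: bytes) -> List[int]:
    """Offsets of WordPerfect headers whose byte 10 is 0x0A."""
    return [m.start() for m in GENUINE_HEADER_RE.finditer(buff)]


good = b"\xffWPC" + b"ABCDEF" + b"\x0a"  # byte 10 is 0x0A
bad = b"\xffWPC" + b"ABCDEF" + b"\x00"  # byte 10 is not
buff = b"xyz" + good + bad + b"\n\n"
print(genuine_header_offsets(buff))  # [3]
```

The lookahead does not consume the six skipped bytes or the 0x0A, so headers that sit closer together than 11 bytes are still all found.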

Banjoplucker 0 Newbie Poster · Answer 2 · 2010-12-05T21:54:28+00:00

Thanks, Gribouillis,

I'm most grateful for the reply and the generosity of your time. Steep learning curve!!

Regards

Banjoplucker

Banjoplucker 0 Newbie Poster · Answer 3 · 2010-12-05T21:57:06+00:00

Thanks, Richie,

I will look at this more closely, as it should also be useful for other attribute strings further past the point I want to check first.

richieking 44 Master Poster · Answer 4 · 2010-12-06T01:31:50+00:00

Well, I was once like you. ;)

If you are happy, then close this thread.
