Hi, I wasn't sure about the title of my question; I hope it's okay. Anyway, my program is supposed to read a binary file and search it for specific bytes, like this:

import datetime

bytes1 = bytearray(b'\x41\x64\x6F\x62\x65\x20')  # b"Adobe "
filename = "portrait1.dng"
today = datetime.date.today()
with open(filename, "rb") as binaryfile:
    with open("foundhex.txt", "w") as found:
        found.write("File that is analysed : " + filename + "\n")
        found.write("Date of analysis : " + str(today) + "\n")
    while True:
        read_file = bytearray(binaryfile.read(1024))
        find_bytes1 = read_file.find(bytes1, 0)
        if find_bytes1 != -1:
            with open("foundhex.txt", "a") as found1:
                found1.write("Found 41646F626520 at : " + str(find_bytes1) + "\n")
        if not read_file:
            break

Basically, it finds the bytes and writes their positions. I checked the file with a hex editor, and the bytes I am looking for (bytes1) occur 12 times, but my program only "finds" 9 of them. So now I'm confused: is my program not reading the entire file, and is that why only 9 are found? Or is there something wrong with my code? Right now I'm only using a 16.2 MB file, but later on I'll be using an 8 GB file. Does the file size matter when reading in chunks? I tried changing the chunk size to arbitrary numbers and found that read(901) finds 11 occurrences instead of 9, but I haven't hit 12 yet. Could someone please explain this to me? Thank you in advance.
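To make the symptom concrete, here is a minimal sketch that mimics the loop above on hypothetical in-memory data (12 copies of b"Adobe " separated by 300 filler bytes) instead of portrait1.dng; naive_scan is an illustrative name, not part of the original program. The loop records at most one hit per chunk and never looks across chunk boundaries, so the count it reports depends on the chunk size.

```python
import io


def naive_scan(data, pattern, chunk_size):
    """Mimic the loop above: one find() per chunk, offsets relative to the chunk."""
    hits = []
    stream = io.BytesIO(data)
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # an empty read means end of data
            break
        pos = chunk.find(pattern)  # only the FIRST match in this chunk
        if pos != -1:
            hits.append(pos)
    return hits


pattern = b"Adobe "
# Hypothetical test data: 12 copies of the pattern spread through filler bytes.
data = (b"x" * 300 + pattern) * 12

for size in (1024, 901, 64):
    print(size, len(naive_scan(data, pattern, size)))
```

With this data it prints 4, 5 and 11 for chunk sizes 1024, 901 and 64, although the data contains 12 matches: bigger chunks hide more "second" matches per chunk, and with 64-byte chunks one match happens to be split across two reads.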


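if not read_file: is only true when read() returns an empty bytearray, i.e. at the end of the file, so that check is not what loses matches. Still, it is clearer to test the length explicitly before searching, and to open the output file once instead of once per match. Use instead: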

      with open("foundhex.txt", "a") as found1:
          while True:
              read_file = bytearray(binaryfile.read(1024))
              if len(read_file):
                  find_bytes1 = read_file.find(bytes1, 0)
                  if find_bytes1 != -1:
                      found1.write("Found 41646F626520 at : " + str(find_bytes1) + "\n")
              else:
                  break

Also, the statement find_bytes1 = read_file.find(bytes1, 0) starts searching at the beginning of the chunk every time, so you only find the first occurrence in each chunk and never any later ones. Finally, with read_file = bytearray(binaryfile.read(1024)), what happens if half of bytes1 is in one read and the other half is in the next read?
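A minimal sketch of a fix for both problems, assuming you want absolute file offsets: keep calling find(pattern, start) on each block until it returns -1, and carry the last len(pattern) - 1 bytes of each block over to the next one so a match split across two reads is still found. The name find_all_offsets and the 65536-byte default are illustrative choices, not something from the thread. Memory use stays around one block, which matters for the 8 GB file.

```python
def find_all_offsets(path, pattern, block_size=65536):
    """Return the absolute file offset of every occurrence of `pattern` in `path`."""
    pattern = bytes(pattern)  # accepts bytes or bytearray
    if not pattern:
        raise ValueError("pattern must not be empty")
    keep = len(pattern) - 1  # a match can overhang a block by at most this many bytes
    offsets = []
    tail = b""  # last `keep` bytes of the previous buffer
    base = 0    # file offset of buf[0]
    with open(path, "rb") as f:
        while True:
            block = f.read(block_size)
            if not block:  # empty read: end of file
                break
            buf = tail + block
            i = buf.find(pattern)
            while i != -1:
                offsets.append(base + i)
                # i + 1 also reports overlapping matches; use i + len(pattern) to skip them
                i = buf.find(pattern, i + 1)
            # Carry the tail over: it may hold the start of a match that only
            # completes in the next block. It can never hold a complete match,
            # so nothing is reported twice.
            tail = buf[max(0, len(buf) - keep):]
            base += len(buf) - len(tail)
    return offsets
```

For example, find_all_offsets("portrait1.dng", b"\x41\x64\x6F\x62\x65\x20") returns a list of positions you can write to foundhex.txt; the result does not depend on block_size.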

Thank you for your reply, woooee. I've ended up with this code, using re.finditer:

import re

with open(filename, "rb") as binaryfile:
    while True:
        read_file = binaryfile.read()
        if len(read_file):
            for find_bytes in re.finditer(bytes1, read_file):
                with open("foundhex.txt", "a") as found1:
                    found1.write("Found bytes at : " + str(find_bytes.start()) + " " +str(find_bytes.end()) + "\n")
        else:
            break

It finds all occurrences of bytes1.

Update: using re.finditer gives me an error: unhashable type: 'bytearray'. So this is the code I'm currently using:
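The error most likely comes from the re module itself: CPython keeps compiled patterns in a cache keyed on the pattern, and a bytearray is mutable and therefore unhashable. Converting the pattern with bytes() avoids that, and re.escape() makes sure bytes that happen to be regex metacharacters (such as b"." or b"*") are matched literally. Note also that read() with no argument loads the whole file into memory; the sketch below instead memory-maps the file so re can scan it in place, assuming a 64-bit Python for the 8 GB case. find_with_regex is a hypothetical name.

```python
import mmap
import os
import re


def find_with_regex(path, pattern):
    """Return (start, end) file offsets for every match of the literal bytes `pattern`."""
    # bytes() makes a bytearray hashable; re.escape() neutralises metacharacters.
    regex = re.compile(re.escape(bytes(pattern)))
    with open(path, "rb") as f:
        if os.fstat(f.fileno()).st_size == 0:
            return []  # mmap cannot map an empty file
        # Map the file read-only; re can search the mapping like a bytes object.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            # Build the full list inside the with-block, so the scanner is
            # finished with the mapping before it is closed.
            return [m.span() for m in regex.finditer(mm)]
```

Unlike a fixed-size chunk loop, searching the whole mapping cannot miss a match at a block boundary, because there are no blocks.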

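import base64

BLOCKSIZE = 65536
# casefold=True also accepts lower-case hex digits in txt_mac
bytes1 = bytearray(base64.b16decode(self.txt_mac, casefold=True))
offset = 0  # file position of the first byte of the current block
with open(self.txt_filename, "rb") as binaryfile:
    while True:
        readfile = bytearray(binaryfile.read(BLOCKSIZE))
        if len(readfile):
            index = 0
            while index < len(readfile):
                index = readfile.find(bytes1, index)
                if index != -1:
                    with open("LogFile.txt", "a") as found:
                        # offset + index is the position in the file, not in the block
                        found.write("Found bytes1 at : " + str(offset + index) + "\n")
                    index += len(bytes1)  # skip past this match
                else:
                    break
            offset += len(readfile)
            # Note: a match split across two blocks is still missed here.
        else:
            break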

However, instead of opening the file in "a" mode for every match, accumulate all the text and write it to the file only once, in "w" mode.
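A minimal sketch of that suggestion, assuming offsets is any iterable of integer file offsets (for example, the values the search loop above currently appends one at a time) and reusing the header format from the first post; write_log is a hypothetical helper name.

```python
def write_log(log_path, source_name, pattern, offsets):
    """Collect every report line first, then open the log once in "w" mode."""
    hex_pattern = bytes(pattern).hex().upper()  # e.g. "41646F626520"
    lines = ["File that is analysed : " + source_name]
    for off in offsets:
        lines.append("Found " + hex_pattern + " at : " + str(off))
    with open(log_path, "w") as log:
        log.write("\n".join(lines) + "\n")
```

Using "w" also means a re-run replaces the old log instead of appending to it. If the 8 GB file produces a very large number of matches, opening the log once before the search loop and writing each line as it is found keeps the single open without holding every line in memory.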

Be a part of the DaniWeb community
