Find bytes from a read() does not find all occurrences

Question

nadiam 0 Posting Pro in Training

7 Years Ago

Hi, I wasn't sure about the title of my question; I hope it's okay. Anyway, the program I have is supposed to read a binary file and then search for specific bytes, like this:

bytes1 = bytearray(b'\x41\x64\x6F\x62\x65\x20')
filename = "portrait1.dng"
with open(filename, "rb") as binaryfile:
    with open("foundhex.txt", "w") as found:
        found.write("File that is analysed : " + filename + "\n")
        found.write("Date of analysis : " + str(today) + "\n")
    while True:
        read_file = bytearray(binaryfile.read(1024))
        find_bytes1 = read_file.find(bytes1, 0)
        if find_bytes1 != -1:
            with open("foundhex.txt", "a") as found1:
                found1.write("Found 41646F626520 at : " + str(find_bytes1) + "\n")
        if not read_file:
            break

Basically, it finds the bytes and then writes their positions. I checked the file with a hex editor, and the bytes I am looking for (bytes1) occur 12 times, but only 9 occurrences are "found". So now I'm confused. Is my program not reading the entire file, and is that why only 9 are found? Or is there something wrong with my code? Right now I'm only using a 16.2 MB file, but later on I'll be using an 8 GB file. Does file size make a difference when reading in chunks? I ask because I tried changing the read size to random numbers and found that read(901) found 11 occurrences instead of 9. I haven't hit 12 yet, though. Could someone please explain this to me? Thank you in advance.

python

All 3 Replies

woooee 814 Nearly a Posting Maven

7 Years Ago

if not read_file: this test is true only when read() returns no bytes, i.e. at end of file, yet it runs after the search, so the final empty read gets searched as well. Test the length before searching instead:

      with open("foundhex.txt", "a") as found1:
          while True:
              read_file = bytearray(binaryfile.read(1024))
              if len(read_file):
                  find_bytes1 = read_file.find(bytes1, 0)
                  if find_bytes1 != -1:
                      found1.write("Found 41646F626520 at : " + str(find_bytes1) + "\n")
              else:
                  break

Also, this statement find_bytes1 = read_file.find(bytes1, 0)
starts at the beginning of the chunk every time, so you are finding only the first string in each 1024-byte read and not any subsequent ones. Finally, for this statement read_file = bytearray(binaryfile.read(1024))
what happens if half of bytes1 is in one read and half is in the next read?
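
Both points can be handled together. The following is a minimal sketch, not the original poster's code: the function name find_all_in_file and its parameters are made up for illustration. It keeps the last len(pattern) - 1 bytes of each buffer so that a match split across two reads is still seen whole, keeps searching within a buffer after each hit, and reports offsets relative to the start of the file rather than the chunk. Overlapping matches are reported too.

```python
def find_all_in_file(path, pattern, chunk_size=1024):
    """Return the file offsets of every occurrence of pattern (hypothetical helper)."""
    pattern = bytes(pattern)          # accept bytes or bytearray
    if not pattern:
        raise ValueError("pattern must not be empty")
    keep = len(pattern) - 1           # bytes carried over between reads
    offsets = []
    tail = b""                        # end of the previous buffer
    tail_start = 0                    # file offset of the first byte of tail
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:             # empty read means end of file
                break
            buf = tail + chunk
            pos = buf.find(pattern)
            while pos != -1:          # keep going after the first hit
                offsets.append(tail_start + pos)
                pos = buf.find(pattern, pos + 1)
            # Carry over just enough bytes to complete a split match.
            tail = buf[-keep:] if keep else b""
            tail_start += len(buf) - len(tail)
    return offsets
```

With this approach the result should not depend on the chunk size, so the same loop can be used on a much larger file with a bigger chunk_size such as 65536.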

Edited 7 Years Ago by woooee

nadiam 0 Posting Pro in Training · Answer 1 · 2018-02-15T08:41:55+00:00

Thank you for your reply, woooee. I've ended up with this code, using re.finditer:

import re

with open(filename, "rb") as binaryfile:
    while True:
        read_file = binaryfile.read()
        if len(read_file):
            for find_bytes in re.finditer(bytes1, read_file):
                with open("foundhex.txt", "a") as found1:
                    found1.write("Found bytes at : " + str(find_bytes.start()) + " " +str(find_bytes.end()) + "\n")
        else:
            break

It finds all occurrences of bytes1.
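
Two caveats apply to the regex approach. The re module caches compiled patterns, so the pattern has to be hashable: a bytes pattern works, while a bytearray pattern raises a TypeError. In addition, bytes that happen to be regex metacharacters (such as 0x2E, ".") need escaping to be matched literally. A minimal sketch of both fixes follows; find_with_regex is a hypothetical helper name, and it assumes the data is already in memory, which may not be practical for a multi-gigabyte file.

```python
import re

def find_with_regex(data, pattern):
    """Return (start, end) pairs for every literal match of pattern in data."""
    # bytes() makes the pattern hashable; re.escape() makes it literal.
    literal = re.compile(re.escape(bytes(pattern)))
    return [(m.start(), m.end()) for m in literal.finditer(data)]
```

Note that finditer returns non-overlapping matches only, which is fine for a pattern like b"Adobe " that cannot overlap itself.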

nadiam 0 Posting Pro in Training · Answer 2 · 2018-03-01T15:32:57+00:00

Update: using re.finditer gives me an error: unhashable type bytearray. So this is the code I'm currently using:

import base64

BLOCKSIZE = 65536
bytes1 = bytearray(base64.b16decode(self.txt_mac))
with open(self.txt_filename, "rb") as binaryfile:
    while True:
        readfile = bytearray(binaryfile.read(BLOCKSIZE))
        if len(readfile):
            index = 0
            while index < len(readfile):
                # index is relative to the current block, not the whole file
                index = readfile.find(bytes1, index)
                if index != -1:
                    with open("LogFile.txt", "a") as found:
                        found.write("Found bytes1 at : " + str(index) + "\n")
                    index += 6  # +6 because len(bytes1) == 6
                else:
                    break
        else:
            break

However, instead of reopening the file in "a" mode for every match, accumulate all the text and write it to the file once, in "w" mode.
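
A minimal sketch of that idea, assuming the match offsets have already been collected in a list; write_report and its parameters are hypothetical names, and the line format simply mirrors the one used in the question:

```python
def write_report(log_path, filename, offsets, label="41646F626520"):
    """Build all log lines in memory, then write them with a single open()."""
    lines = ["File that is analysed : " + filename + "\n"]
    for offset in offsets:
        lines.append("Found " + label + " at : " + str(offset) + "\n")
    with open(log_path, "w") as log:   # "w": the file is opened and written once
        log.writelines(lines)
    return len(lines)
```

If the number of matches could be very large, an alternative is to open the log file once before the read loop and write to it as matches are found, which avoids both the repeated open() calls and holding every line in memory.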
