Python and the JPEG Image File, Part 1, The Header
The JPEG image file format (.jpg) is very popular on the internet because you can pack a lot of picture information into a relatively small file. There are competing file formats like GIF and PNG. GIF is limited to 256 colors (8 bit), compared to 16,777,216 colors for JPEG (24 bit). JPEG can typically achieve 10:1 to 20:1 compression without visible loss. It allows you to specify a quality setting (1 - 100), with higher quality giving less compression.
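To make these numbers a little more concrete, here is a small back-of-the-envelope sketch. The 80x80 pixel size is only an assumed example, and the compressed sizes are rough estimates of the pixel data alone (header bytes are not counted), not measurements of a real file.

```python
# Number of distinct colors for a given bit depth
gif_colors = 2 ** 8       # 8 bit
jpeg_colors = 2 ** 24     # 24 bit

print("GIF colors  = %d" % gif_colors)
print("JPEG colors = %d" % jpeg_colors)

# Uncompressed size of an assumed 80x80 image at 24 bit (3 bytes) per pixel
width, height, bytes_per_pixel = 80, 80, 3
raw_size = width * height * bytes_per_pixel
print("raw size    = %d bytes" % raw_size)

# Rough size of the compressed pixel data at the typical ratios
for ratio in (10, 20):
    print("%d:1 -> about %d bytes" % (ratio, raw_size // ratio))
```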
Sharp edges in images, like borders and embedded text, give JPEG a hard time. That's why you can insert a text comment directly into the JPEG file header. As the name implies, the header is the first part of the JPEG file, followed by the compressed picture. In this part of the tutorial we take a look at the header of a typical JPEG file and extract some of the information it contains. To make things simple, I have attached a sample JPEG image file that contains an 80x80 blue square at a resolution of 200dpi. This file also contains a text comment that can be extracted. Blue is the favorite color of vegaseat.
# print out the hex bytes of a jpeg file, find end of header, image size, and extract any text comment
# (JPEG = Joint Photographic Experts Group)
# tested with Python24   vegaseat   21sep2005

try:
    # the sample jpeg file is an "all blue 80x80 200dpi image, saved at a quality of 90"
    # with the quoted comment added
    imageFile = 'Blue80x80x200C.JPG'
    data = open(imageFile, "rb").read()
except IOError:
    print "Image file %s not found" % imageFile
    raise SystemExit

# initialize empty list
hexList = []
for ch in data:
    # make a hex byte
    byt = "%02X" % ord(ch)
    hexList.append(byt)

#print hexList  # test

print "hex dump of a 80x80 200dpi all blue jpeg file:"
print "(the first two bytes FF and D8 mark a jpeg file)"
print "(index 6,7,8,9 spells out the subtype JFIF)"
k = 0
for byt in hexList:
    # add spacer every 8 bytes
    if k % 8 == 0:
        print " ",
    # new line every 16 bytes
    if k % 16 == 0:
        print
    print byt,
    k += 1

print
print "-"*50

# the header goes from FF D8 to the first FF C4 marker
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and hexList[k+1] == 'C4':
        print "end of header at index %d (%s)" % (k, hex(k))
        break

# find pixel width and height of image
# located at offset 5,6 (highbyte,lowbyte) and 7,8 after FF C0 or FF C2 marker
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and (hexList[k+1] == 'C0' or hexList[k+1] == 'C2'):
        #print k, hex(k)  # test
        height = int(hexList[k+5],16)*256 + int(hexList[k+6],16)
        width = int(hexList[k+7],16)*256 + int(hexList[k+8],16)
        print "width = %d height = %d pixels" % (width, height)

# find any comment inserted into the jpeg file
# marker is FF FE followed by the highbyte/lowbyte of comment length, then comment text
comment = ""
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and hexList[k+1] == 'FE':
        #print k, hex(k)  # test
        length = int(hexList[k+2],16)*256 + int(hexList[k+3],16)
        #print length  # test
        for m in range(length-3):
            comment = comment + chr(int(hexList[k + m + 4],16))
            #print chr(int(hexList[k + m + 4],16)),  # test
            #print hexList[k + m + 4],  # test

if len(comment) > 0:
    print comment
else:
    print "No comment"
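The listing above looks for marker byte pairs anywhere in the file, which works for the simple sample image. A marker pair could in principle also show up inside the data of a segment, so an alternative is to hop from segment to segment using the length field that follows each marker. Below is a minimal sketch of that idea, written for Python 3 rather than the Python 2.4 used above. The function name scan_jpeg_header and the hand-built sample bytes are made up for illustration, only the FF C0 / FF C2 size markers from the listing are checked, and stripping a trailing zero byte from the comment is an assumption about how the comment was stored.

```python
import struct

def scan_jpeg_header(data):
    """Walk the marker segments of a JPEG byte string (hypothetical helper).

    Returns (width, height, comment); an item is None if it was not found.
    """
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing FF D8 marker)")
    width = height = comment = None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            break                       # lost sync, stop rather than guess
        marker = data[pos + 1]
        if marker == 0xFF:
            pos += 1                    # skip a fill byte
            continue
        if marker in (0xDA, 0xD9):
            break                       # start of scan or end of image
        # the segment length is big-endian and counts its own two bytes
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker in (0xC0, 0xC2) and len(payload) >= 5:
            # payload: precision (1 byte), height (2 bytes), width (2 bytes)
            height, width = struct.unpack(">HH", payload[1:5])
        elif marker == 0xFE:
            # assumption: the comment may end with a zero byte
            comment = payload.rstrip(b"\x00").decode("latin-1")
        pos += 2 + length
    return width, height, comment

# hand-built header bytes for testing, not a complete JPEG file
sample = (
    b"\xff\xd8"                                   # start of image
    + b"\xff\xe0\x00\x10JFIF\x00" + bytes(9)      # APP0 segment, length 16
    + b"\xff\xfe\x00\x07hello"                    # comment segment, length 7
    + b"\xff\xc0\x00\x0b\x08\x00\x50\x00\x50\x01\x01\x11\x00"  # size info
    + b"\xff\xda"                                 # start of scan
)
print(scan_jpeg_header(sample))
```

To try it on a real file, read the bytes with open(imageFile, "rb").read() and pass them to the function.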