vegaseat 1,735 DaniWeb's Hypocrite Team Colleague

Python and the JPEG Image File, Part 1, The Header

Intro

The JPEG image file format (.jpg) is very popular on the internet because you can pack a lot of picture information into a relatively small file. There are competing file formats like GIF and PNG. GIF is rather limited to the number of colors (8 bit = 256) compared to JPEG (24 bit = 16,777,216). JPEG can typically achieve 10:1 to 20:1 compression without visible loss. It allows you to specify a quality setting (1 - 100) with higher quality giving less compression.

JPEG header

Sharp edges in images like borders and embedded text give JPEG a hard time, that's why you can insert a text comment directly into the JPEG file header. As the name implies, the header is the first part of the JPEG file, followed by the compressed picture. In this part of the tutorial we take a look at the header of the a typical JPEG file and extract some information it contains. To make things simple, I have attached a sample JPEG image file that contains a 80x80 blue square at a resolution of 200dpi. This file also contains a text comment that can be extracted. Blue is the favorite color of vegaseat.

# print out the hex bytes of a jpeg file, find end of header, image size, and extract any text comment
# (JPEG = Joint Photographic Experts Group)
# tested with Python24    vegaseat    21sep2005

try:
    # the sample jpeg file is an "all blue 80x80 200dpi image, saved at a quality of 90"
    # with the quoted comment added
    imageFile = 'Blue80x80x200C.JPG'
    data = open(imageFile, "rb").read()
except IOError:
    print "Image file %s not found" % imageFile
    raise SystemExit

# initialize empty list
hexList = []
for ch in data:
    # make a hex byte
    byt = "%02X" % ord(ch)
    hexList.append(byt)

#print hexList  # test

print

print "hex dump of a 80x80 200dpi all blue jpeg file:"
print "(the first two bytes FF and D8 mark a jpeg file)"
print "(index 6,7,8,9 spells out the subtype JFIF)"
k = 0
for byt in hexList:
    # add spacer every 8 bytes
    if k % 8 == 0:
        print "  ",
    # new line every 16 bytes
    if k % 16 == 0:
        print
    print byt,
    k += 1
    
print
print "-"*50

# the header goes from FF D8 to the first FF C4 marker
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and hexList[k+1] == 'C4':
        print "end of header at index %d (%s)" % (k, hex(k))
        break

# find pixel width and height of image
# located at offset 5,6 (highbyte,lowbyte) and 7,8 after FF C0 or FF C2 marker
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and (hexList[k+1] == 'C0' or hexList[k+1] == 'C2'):
        #print k, hex(k)  # test
        height = int(hexList[k+5],16)*256 + int(hexList[k+6],16)
        width  = int(hexList[k+7],16)*256 + int(hexList[k+8],16)
        print "width = %d  height = %d pixels" % (width, height)

# find any comment inserted into the jpeg file
# marker is FF FE followed by the highbyte/lowbyte of comment length, then comment text
comment = ""
for k in range(len(hexList)-1):
    if hexList[k] == 'FF' and hexList[k+1] == 'FE':
        #print k, hex(k)  # test
        length = int(hexList[k+2],16)*256 + int(hexList[k+3],16)
        #print length  # test
        for m in range(length-3):
            comment = comment + chr(int(hexList[k + m + 4],16))
            #print chr(int(hexList[k + m + 4],16)),  # test
            #print hexList[k + m + 4],  # test
            
if len(comment) > 0:
    print comment
else:
    print "No comment"
Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.