Hi there. I'm new to Python. I started learning it 3 days ago. I've figured out everything by myself so far, but this is stumping me:

Whenever I load a file such as an HTML document or a text file, it loads fine. But when I load a GIF or PNG file (and probably many other formats, too), only the first few bytes of it show up. It happens with both urlopen() and the normal file open().

E.g. http://www.majhost.com/gallery/Lijik/Star-Wars-Figures-1/ewjclay.png

Only loads:

‰PNG
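
One way to tell whether the read itself or only the display is cutting the data short is to look at the raw bytes. The sketch below is an illustration under assumptions, not a diagnosis: `describe` and `sample` are made-up names, and `sample` is a hand-built stand-in for what `target.read()` might return for a PNG. Every PNG starts with the same 8-byte signature, which includes control characters, and NUL bytes follow right after it. A text widget may well stop at one of those, though that part is a guess.

```python
import binascii

# The fixed 8-byte signature that starts every PNG file.
PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'

def describe(data):
    """Return (length, hex of the first 16 bytes, offset of the first NUL byte)."""
    head = binascii.hexlify(data[:16]).decode('ascii')
    return len(data), head, data.find(b'\x00')

# Hypothetical stand-in for target.read(): the signature, the start of the
# IHDR chunk (length 13), and 13 placeholder bytes.
sample = PNG_SIGNATURE + b'\x00\x00\x00\rIHDR' + b'\x00' * 13

length, head, first_nul = describe(sample)
print(length)     # total number of bytes actually read
print(head)       # hex dump, safe to show in any text control
print(first_nul)  # where a NUL-terminated display would stop
```

If the length printed for a real file matches its size on disk, the data was read in full and only the way it is displayed needs changing, for example by showing the hex dump instead of the raw bytes.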

Here is my code. It's a programme I made (the first one I've ever made, actually - in the past I always lost interest half-way) to help people learn regular expressions (because when I was learning them yesterday I found it a hassle using the Python interpreter). It's helpful for anybody using regexps, though, I reckon, because it's a great testing ground. The other programme I'm making downloads all the images from a website by applying a regexp to the HTML code and then forming absolute links to the images and putting them in a set to remove duplicates. It's currently just a script but tomorrow I'll make it a programme where you can preview the images and select which ones to save to your computer.

Also, any comments on my coding style - what I'm doing wrong, what I'm doing inefficiently, etc, would be appreciated.

Thanks!

import urllib
import re
import wx

# Welcome to the Regular Express!
# Code by Fuse

# source = file name or URL
# contents = what's inside the file loaded
# patternArea = what is the regexp to use?
# loadArea = what is the url or file to load?

FRAME_WIDTH = 500
FRAME_HEIGHT = 600

class MainFrame(wx.Frame):

    def __init__(self):
        wx.Frame.__init__(self, None, title = 'The Regular Express',
                      pos = (200, 75), size=(FRAME_WIDTH, FRAME_HEIGHT))

        self.background = wx.Panel(self)

        self.infoBtn = wx.Button(self.background, label = 'Info')
        self.infoBtn.Bind(wx.EVT_BUTTON, self.infoFunction)

        self.clearBtn = wx.Button(self.background, label = 'Clear')
        self.clearBtn.Bind(wx.EVT_BUTTON, self.clearArea)

        self.loadBtn = wx.Button(self.background, label = 'Load')
        self.loadBtn.Bind(wx.EVT_BUTTON, self.loadFile)

        self.goBtn = wx.Button(self.background, label = 'Go!')
        self.goBtn.Bind(wx.EVT_BUTTON, self.patternMatching)

        self.inputArea = wx.TextCtrl(self.background, style = wx.TE_MULTILINE)
        self.loadArea = wx.TextCtrl(self.background)
        self.patternArea = wx.TextCtrl(self.background)
        self.outputArea = wx.TextCtrl(self.background, style = wx.TE_MULTILINE | wx.TE_READONLY)
        
        self.inputArea.SetValue('Enter the data you want to parse into this area. Alternatively, press load and select a file instead.')
        self.outputArea.SetValue('The data will appear here once it has been parsed.')
        self.patternArea.SetValue("Enter your regular expression here.")
        self.loadArea.SetValue("Enter file location here.")

        self.horBoxOne = wx.BoxSizer()
        self.horBoxOne.Add(self.patternArea, proportion = 1, flag = wx.EXPAND)
        self.horBoxOne.Add(self.goBtn, proportion = 0, flag = wx.LEFT, border = 0)
        self.horBoxOne.Add(self.infoBtn, proportion = 0, flag = wx.LEFT, border = 0)

        self.horBoxTwo = wx.BoxSizer()
        self.horBoxTwo.Add(self.loadArea, proportion = 1, flag = wx.EXPAND)
        self.horBoxTwo.Add(self.loadBtn, proportion = 0, flag = wx.LEFT, border = 0)
        self.horBoxTwo.Add(self.clearBtn, proportion = 0, flag = wx.LEFT, border = 0)
        
        self.verBox = wx.BoxSizer(wx.VERTICAL)
        self.verBox.Add(self.inputArea, proportion = 1, flag = wx.EXPAND, border = 5)
        self.verBox.Add(self.horBoxTwo, proportion = 0, flag = wx.EXPAND | wx.ALL, border = 0)
        self.verBox.Add(self.horBoxOne, proportion = 0, flag = wx.EXPAND | wx.ALL, border = 0)
        self.verBox.Add(self.outputArea, proportion = 2, flag = wx.EXPAND, border = 5)

        self.background.SetSizer(self.verBox)
        self.Show()


    def loadFile(self, event):
        source = self.loadArea.GetValue()
        # A plain prefix test is enough here; no regexp needed.
        if source.startswith('http://'):
            target = urllib.urlopen(source)
        else:
            target = open(source, 'rb')
        self.inputArea.SetValue(target.read())
        target.close()

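    def patternMatching(self, event):
        matchList = re.findall(self.patternArea.GetValue(), self.inputArea.GetValue())
        # findall returns tuples when the pattern has two or more groups;
        # wrapping v in a 1-tuple keeps '%s' from raising TypeError on them.
        self.outputArea.SetValue('\n'.join(['%s' % (v,) for v in matchList]))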

    def clearArea(self, event):
        self.inputArea.SetValue('')

    def infoFunction(self, event):
        self.outputArea.SetValue(r"""Regular Expression Syntax:

 . means all characters except the newline character '\n'
Example: "H.llo" would match "Hello" as well as "H llo"
Note: if you wish to type the actual '.' character, type '\.' instead. This applies to '^', '$', '*', etc. as well.
Note: alternatively, you can type [.] or [$] etc. ([^] does not work, because a leading '^' negates the set.) More on that below.

 ^ matches the start of the string.
Example: "^Hello" would match "Hello" in "Hello, soldier." but not "He said 'Hello' to me."

 $ matches the end of the string. Similar to '^'.
Example: "bye$" would match "bye" in "Goodbye" but not "Goodbye."

 * matches the preceding regexp 0 or more times.
Example: "Hello*" matches "Hello", "Helloooo" and "Hell".
Example: "H.*" matches "Hello", "Hello!", "Hell", "Hi" and even "H".

 ? matches the preceding regexp 0 or 1 times.
Example: "Hello?" matches "Helloooo" up to "Hello".
Note: '?' is also used to enable and disable greediness. More on that below.

 *?, +?, ?? and greediness
Taken from the Python documentation:
"The '*', '+', and '?' qualifiers are all greedy; they match as much text as possible. Sometimes this behaviour isn't desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding "?" after the qualifier makes it perform the match in non-greedy or minimal fashion; as few characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'."

Note to self: type up more tomorrow.""")

app = wx.App()
window = MainFrame()
app.MainLoop()
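
The examples in the help text can also be checked without the GUI. This is only a sketch: `CASES` and `first_match` are names made up for the illustration, and it uses `re.search` for single matches where the programme above uses `re.findall`. The last lines show the tuple results that `findall` gives for a pattern with two groups.

```python
import re

# Each entry: (pattern, text, expected first match or None), from the help text.
CASES = [
    (r'H.llo', 'H llo', 'H llo'),
    (r'^Hello', "He said 'Hello' to me.", None),
    (r'bye$', 'Goodbye.', None),
    (r'Hello*', 'Hell', 'Hell'),
    (r'Hello?', 'Helloooo', 'Hello'),
    (r'<.*>', '<H1>title</H1>', '<H1>title</H1>'),
    (r'<.*?>', '<H1>title</H1>', '<H1>'),
]

def first_match(pattern, text):
    """Return the first matched substring, or None if nothing matches."""
    found = re.search(pattern, text)
    return found.group(0) if found else None

for pattern, text, expected in CASES:
    result = first_match(pattern, text)
    print('%-8s %-26r -> %r (expected %r)' % (pattern, text, result, expected))

# With two or more groups, findall returns tuples rather than strings.
pairs = re.findall(r'(\w+)=(\w+)', 'a=1 b=2')
print(pairs)
```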

The other programme I'm making downloads all the images from a website by applying a regexp to the HTML code (the easy part) and then forming absolute links to the images (the hard part) and putting them in a set to remove duplicates. It's currently just a script but tomorrow I'll add a GUI where you can preview the images and select which ones to save to your computer. If anybody is interested, I can post the code for that tomorrow when I finish?
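
The steps described above (regexp over the HTML, absolute links, a set for duplicates) could look roughly like the sketch below. It is not the actual script, which has not been posted: `IMG_SRC`, `image_urls`, the sample page and the example.com URLs are all made up, the pattern is deliberately simple, and the import is the Python 3 spelling (in Python 2, `urljoin` lives in the `urlparse` module).

```python
import re
from urllib.parse import urljoin  # Python 2: from urlparse import urljoin

# Deliberately simple pattern: captures the src value of <img> tags.
IMG_SRC = re.compile(r'<img[^>]+src=["\']([^"\']+)["\']', re.IGNORECASE)

def image_urls(html, base_url):
    """Return the set of absolute image URLs found in html."""
    # urljoin resolves relative links against the page URL;
    # the set drops links that end up identical.
    return set(urljoin(base_url, src) for src in IMG_SRC.findall(html))

# Hypothetical page with relative, root-relative and absolute links.
page = '''
<img src="pics/a.png">
<img alt="x" src="/gallery/pics/a.png">
<img src="http://example.com/b.gif">
<IMG SRC='../c.jpg'>
'''

urls = image_urls(page, 'http://example.com/gallery/index.html')
for url in sorted(urls):
    print(url)
```

The first two tags point at the same file, so the set ends up with three entries rather than four. Letting `urljoin` do the resolving avoids hand-rolling the rules for `../` and leading slashes.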

Oh, and does anybody know how to get an explorer-like view for loading files in wxPython (as in any normal Windows app)? I could use one here, instead of having to type in the relative or absolute file location manually.


Cheers mate! That's ace.

Sorry if the OP is a jumble; I haven't had sleep. (I repeated myself about the image grabber script.)
