Hi,
This is my first time on daniweb, so if I've posted in the wrong section or there's already a similarly-themed thread or I've done something wrong I apologise. Anyway, I'm hoping someone can prod me in the right direction with a problem I'm having right now.
I'm currently working on a project which aims to form a sort of public desktop network (i.e. the screens of connected computers can be viewed remotely - think Remote Assistance but more public and less confined to LANs or small-medium businesses). Nothing too flash or high-quality - just for my own interest at this stage.
I'm writing the application in Python 2.6.1, and am implementing it using the standard Tkinter module interface. For image handling I am also using the Python Imaging Library (PIL).
Anyway, I'm not quite sure how to approach a problem I've already encountered. I want to regularly capture the contents of the screen, which in itself is thankfully made incredibly trivial by PIL (doing the same in C++ in the past was a pain).
However, the program will be displaying the contents of other screens and of its own screen, and therefore if it includes itself in the screenshot then when the screens are viewed within the program a sort of recursive image situation emerges (if that makes sense) - I hope you can picture that.
So, ideally, I need a way for the screen capture to somehow ignore the window of my application (I've said active window in the thread title to save space but obviously it may or may not be the active window) and store the remainder of the screen (including what is hidden behind my window).
Right now, I'm using ImageGrab.grab() to capture the screen, converting the returned result into a format Tkinter can natively handle using ImageTk.PhotoImage(), then storing it in a label (just to test the screen capture process, not part of the broader program). I am repeating this process at regular, quick intervals (right now every 200ms, but this is just an arbitrarily selected interval - although it will obviously need to be pretty small for my purposes).
Here's the rough code to give you an idea (in the capture_screen function, a method of my application class - some commented lines explained below):

# module-level imports needed by the method below
from PIL import Image, ImageGrab, ImageTk

def capture_screen(self):
    #temporarily hide window so recursive screenshot is avoided
    #self.master.withdraw()
    screen_contents = ImageGrab.grab()
    #self.master.deiconify()
    #win_width and win_height are defined elsewhere in the program
    size = win_width, win_height
    screen_contents = screen_contents.resize(size, Image.ANTIALIAS)
    screen_image = ImageTk.PhotoImage(screen_contents)
    self.screen_displayer.config(image = screen_image)
    #keep a reference so the image is not garbage collected
    self.screen_displayer.image = screen_image
    self.screen_displayer.grid(row = 0, column = 0)
    #update display: schedule the next capture in 200ms
    self.master.after(200, self.capture_screen)

You might notice the commented out self.master.withdraw() and self.master.deiconify() - my first idea was to hide the application for the briefest of instants, so that it would be invisible at the moment the screenshot was taken. I hoped that this would happen so quickly (in three code lines, though I guess this means little next to what the lines actually do) that the visibility toggling would go unseen by the user, without even a flicker. Unsurprisingly, it was false hope and the result was worse than a small flicker, rendering the program pretty unusable.
Another idea that I have vaguely is the option of attempting it in C++ - which from my experience would give me lower-level access to all the mechanisms needed and thus the control to implement the selective screen capture, and then interfacing this C++ with my Python application. However, to be honest, although C++ can in its own way be fun, the sheer effort that always seems to accompany learning how to do something new in the language puts me quite off this option. The less micromanaging nature of Python is what attracted me to it in the first place, and I'd rather do it solely in Python if at all possible (at the moment at least, until future speed requirements may need to be met).
So anyway, sorry for the long post and the long ramble. I don't usually approach other programmers for help - solving my problems is usually an independent process and usually eventually works out. But I just thought it makes more sense to try and find someone who may have tried something similar before and could save me lots of internet trawling (not that I haven't already tried a bit of hunting).
If anyone could help it'd be greatly appreciated.
Thanks in advance,
mrpoate

What you seem to want would require access to some kind of desktop compositing engine.

Windows didn't introduce a composition engine until Vista (Desktop Window Manager), so you can't actually see what's "behind" your window in earlier versions (at least in my understanding). And desktop composition isn't always guaranteed with it.

From what I understand, in earlier versions of Windows, applications just basically paint to the area of the screen assigned to them, so the desktop is just one giant flat bitmap. You can take captures of the whole thing, or just sections of it, but not actually see what's behind your window since there is no "behind".

My suggestion to you would be to consider running the server part of the application without a window, hence it can capture the desktop and send the images without having the image "recursion", or just manipulate the bitmap so that the area your window occupied is "whited out", if the recursion bothers you (a little extra processing).
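The "whiting out" idea could be sketched roughly as below. This is only a sketch under assumptions: `window_box`, `clip_box` and `white_out` are hypothetical helper names, not part of Tkinter or PIL, and `winfo_rootx()`/`winfo_rooty()` report the client area of a Tkinter window, so any title bar or border drawn by the window manager would not be covered.

```python
def window_box(widget):
    """Return (left, top, right, bottom) of a Tkinter widget in screen pixels."""
    left = widget.winfo_rootx()
    top = widget.winfo_rooty()
    return (left, top, left + widget.winfo_width(), top + widget.winfo_height())


def clip_box(box, screen_size):
    """Clip a box to the screen; return None if no part of it is on screen."""
    left, top, right, bottom = box
    width, height = screen_size
    left, top = max(left, 0), max(top, 0)
    right, bottom = min(right, width), min(bottom, height)
    if left >= right or top >= bottom:
        return None
    return (left, top, right, bottom)


def white_out(image, box, colour=(255, 255, 255)):
    """Fill `box` with `colour` on a PIL image (in place) and return the image."""
    clipped = clip_box(box, image.size)
    if clipped is not None:
        # Image.paste accepts a colour in place of a source image
        image.paste(colour, clipped)
    return image
```

Inside capture_screen this might be used as `screen_contents = white_out(ImageGrab.grab(), window_box(self.master))`, before the resize step.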

Other things to think about: Vista's Desktop Window Manager has an API that allows you to get thumbnails of the different windows ("thumbnails" is misleading, since you can get them up to the size of the respective window). If you can find a function in the Windows API that gives you the window positions and ordering, you can use the desktop image as a base and then blit the other windows on (except your own) in the correct positions and order. But obviously, your program would only work on Vista, and even then, it would stop working when Aero gets turned off.
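For the "window positions and ordering" part, one possible approach (an assumption on my side, not something tested in this thread) is to walk the top-level windows through ctypes with the Win32 calls GetTopWindow, GetWindow and GetWindowRect. The function name `list_windows_top_to_bottom` is hypothetical, and the sketch simply returns an empty list on anything other than Windows. It does not fetch any pixel data; it only reports where each visible window sits.

```python
import sys

GW_HWNDNEXT = 2  # GetWindow flag: the window just below in the Z order


def list_windows_top_to_bottom():
    """Return [(hwnd, (left, top, right, bottom)), ...] for visible
    top-level windows, topmost first. Returns [] when not on Windows."""
    if sys.platform != "win32":
        return []
    import ctypes
    from ctypes import wintypes

    user32 = ctypes.windll.user32
    # Declare signatures so window handles survive on 64-bit Windows
    user32.GetTopWindow.argtypes = [wintypes.HWND]
    user32.GetTopWindow.restype = wintypes.HWND
    user32.GetWindow.argtypes = [wintypes.HWND, wintypes.UINT]
    user32.GetWindow.restype = wintypes.HWND
    user32.IsWindowVisible.argtypes = [wintypes.HWND]
    user32.GetWindowRect.argtypes = [wintypes.HWND,
                                     ctypes.POINTER(wintypes.RECT)]

    windows = []
    hwnd = user32.GetTopWindow(None)  # None: top of the desktop's Z order
    while hwnd:
        if user32.IsWindowVisible(hwnd):
            rect = wintypes.RECT()
            if user32.GetWindowRect(hwnd, ctypes.byref(rect)):
                windows.append((hwnd, (rect.left, rect.top,
                                       rect.right, rect.bottom)))
        hwnd = user32.GetWindow(hwnd, GW_HWNDNEXT)
    return windows
```

Blitting would then go over this list in reverse (bottom window first), skipping the entry that belongs to your own application.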

Thanks a lot for your response.
The desktop composition engine sounds like the sort of thing I'd be looking for, but unfortunately I was really looking for something that worked on the other Windows releases as well, and originally even on other platforms (but I'm thinking this might have been too ambitious). Also, like you said, it's not fully reliable, and obviously something that is would be preferable (but I guess not absolutely necessary - the program isn't really intended to be commercial grade).

Yeah, that was the impression I got also - the desktop being traditionally one big flat bitmap. But, although I only have a fairly basic grasp of the inner workings of Windows (if that), I was trying to picture it all in my head - when a window is moved, and ordered to repaint on another area of the screen, whatever was "behind" it comes into view. So doesn't that suggest that somehow, somewhere in the depths of the computer, the pixel data corresponding to the revealed section existed and could be fetched when it was needed for display? Unless it is somehow unavailable or blocked from programmatic access, I would think C++ or C or a low-level language would be able to access it (isn't that basically the strength of low-level languages - that they provide great control over and access to the system?).

I did find an interesting C++ article written by Feng Yuan (see: http://www.fengyuan.com/article/wmprint.html) which details a method of capturing a window that may be partially obscured by other windows - so in other words capturing the hidden parts "behind" whatever is in the immediate foreground/on the highest level. It seems to revolve around the WM_PRINT and WM_PRINTCLIENT messages, but to get it successfully working Yuan suggests this trick that seems pretty complicated (haven't studied it properly, though). Anyway, I'm still not sure whether this solution can be extended to find the actual desktop minus my window.

I'm not sure your suggestion of running an independent, windowless server side would work, because the GUI has to be running for the output of the server operations to be of any use (i.e. to view the screenshot data sent by the server, the main GUI program has to be open, which in turn corrupts the server screenshot data with the "recursion"). I think your suggestion may help the remote user, who would be able to properly view the user with the windowless server side (say he's called Franklin), but as soon as Franklin opened his window it would be ruined, and Franklin could not view his own monitor in any channel (the connected users will be displayed in user-made channels).

Your "whiting out" suggestion is a good one, and I did actually already consider it - I think it would involve temporarily making invisible the widgets which will display connected users' desktops, just while the screenshot is being taken. My worry here is, like you said, the extra processing cost and, more importantly, any flickering effect that might emerge (I should probably at least try, though). I'll probably give this a go soon.

The Vista Desktop Window Manager API, again, sounds potentially helpful but like I said before I was hoping for compatibility with more than just Vista.

Anyway, thanks again for your comments, I might put my effort in the meantime towards some other areas of my program (I'm actually pretty new to both Python and Tkinter, so problems are something I'm going to be having a lot of :p)

mrpoate

This is an interesting subject, no need to thank me.

But, although I only have a fairly basic grasp of the inner workings of Windows (if that), I was trying to picture it all in my head - when a window is moved, and ordered to repaint on another area of the screen, whatever was "behind" it comes into view. So doesn't that suggest that somehow, somewhere in the depths of the computer, the pixel data corresponding to the revealed section existed and could be fetched when it was needed for display?

What actually happens is that Windows tells the window underneath to repaint itself. Again, only DWM stores pixel data of its windows (ever notice how, in earlier versions, "trails" are sometimes left when a window is moved from over another one?).

Windows gets windows to repaint themselves by sending them a WM_PAINT message.

As to the whiting out and speed, using blitting, the performance hit you might face wouldn't be severe. Just a tip if you go this route: only blit when and where you need to. For instance, if your display window is minimized, you don't need (or want) to blit. As to the flickering, I don't know how much of this you might experience.
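A minimal sketch of the "only blit when you need to" tip, assuming the Tkinter window state strings returned by `wm_state()`/`state()` ("normal", "iconic", "withdrawn", "icon", "zoomed"). The helper name `should_blit` is made up for illustration:

```python
def should_blit(window_state):
    """Blit only when the window is actually shown on screen.

    `window_state` is the string returned by a Tk root's state() method.
    "iconic" (minimized) and "withdrawn" windows occupy no screen area,
    so there is nothing to white out.
    """
    return window_state in ("normal", "zoomed")
```

In the capture method this could guard the extra work, e.g. `if should_blit(self.master.state()): ...`. It does not detect a window that is merely hidden behind other windows.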

Also, you can step up the intervals from 200ms to 500ms (yes, actually lower the frame rate) because unless you want to watch a video or movie, play a game, or something like that over the remote connection (which is not the point of remote assistance technologies), you don't actually need such a high frame rate. It would also be easier on your network (now just 2 images to send a second...still high, but...).
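The arithmetic behind that, as a small sketch. The function names and the 100000-byte frame size are made-up illustration values, not measurements from the actual program:

```python
def frames_per_second(interval_ms):
    """Frames sent per second for a given capture interval in milliseconds."""
    return 1000.0 / interval_ms


def bytes_per_second(frame_bytes, interval_ms):
    """Rough network load, assuming every frame has the same size in bytes."""
    return frame_bytes * frames_per_second(interval_ms)


print(frames_per_second(200))         # 5.0 frames per second
print(frames_per_second(500))         # 2.0 frames per second
print(bytes_per_second(100000, 500))  # 200000.0 bytes per second
```

So going from 200ms to 500ms cuts the number of images sent from 5 to 2 per second, and the network load drops by the same factor.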

I have a suggestion not related to what you've asked. But do you know how to use pygame? It might be interesting to use a pygame window for rendering, and the gui toolkit for everything else, because pygame can be hardware accelerated. It might also be quite a bit faster.

EDIT: My bad, hardware accelerated windows in pygame can only be full screen.

What actually happens is that Windows tells the window underneath to repaint itself. Again, only DWM stores pixel data of its windows (ever notice how, in earlier versions, "trails" are sometimes left when a window is moved from over another one?).

Windows gets windows to repaint themselves by sending them a WM_PAINT message.

Oh ok, yeah. So I guess it would be difficult or impossible to get the pixel data without access to DWM's stores.

As to the whiting out and speed, using blitting, the performance hit you might face wouldn't be severe. Just a tip if you go this route: only blit when and where you need to. For instance, if your display window is minimized, you don't need (or want) to blit. As to the flickering, I don't know how much of this you might experience.

Good advice, I'll keep that in mind - blitting while minimised or while hidden behind other windows would be unnecessary. If the flickering doesn't turn out to be a problem, I think this is the option I'll go with for now.

Also, you can step up the intervals from 200ms to 500ms (yes, actually lower the frame rate) because unless you want to watch a video or movie, play a game, or something like that over the remote connection (which is not the point of remote assistance technologies), you don't actually need such a high frame rate. It would also be easier on your network (now just 2 images to send a second...still high, but...).

Yeah I might give that a try. The only thing is, although this may not be the best or most viable idea, originally I sorta wanted the program to be more than just remote assistance - kinda like a show-off arena as well (for instance, you want to check out a new operating system - say Ubuntu - and you can just browse the channels for a connected Ubuntu (but obviously my program would have to be cross-platform for this), or you want to see a new game - you could make a channel with your friend who has the game, and he could give you a demo and maybe walk you through the basics). Assistance or instruction would still be the primary use, though, so I'll check out the costs in speed and resources and base the interval on that.

I have a suggestion not related to what you've asked. But do you know how to use pygame? It might be interesting to use a pygame window for rendering, and the gui toolkit for everything else, because pygame can be hardware accelerated. It might also be quite a bit faster.

EDIT: My bad, hardware accelerated windows in pygame can only be full screen.

I checked out pygame on its official website and wikipedia, and it looks interesting (the idea of abstracting the lower-level details of C++ so that only the game/GUI logic need be programmed is appealing for sure). But if it offers no hardware acceleration for non-fullscreen applications, then I'm not sure there's any real point in using it. The idea of hardware acceleration to quicken rendering is definitely one worth examining (I'll look more into this later probably after getting the basic program working, when I'll be looking more closely at ways to optimise the program).

In the meantime, I've actually hit a different, probably far more trivial problem, also to do with Tkinter - I might start a different thread on this if it's ok. It's to do with packing an image opened from file into a frame multiple times - between each packed instance, there seems to be automatically added padding of about 10 pixels. I'm sure it's a fairly basic problem, but after a few hours of searching I couldn't see what I was doing wrong. I don't mean to flit from problem to problem, or change the thread's subject, so I'll post this in a separate thread.

Anyway cheers for the comments

mrpoate

Just a thought, but couldn't you make it a pyw file? That way there would be no python.exe window and the program wouldn't have to constantly minimise itself over and over. I don't know if I understand the problem correctly; if not, then never mind me :P

Ah I don't think the presence of a python.exe window is really the issue (and the program isn't actually minimising itself over and over - sorry if my code confused you a little there with the self.master.withdraw() and deiconify() (albeit commented out)). Not that I don't appreciate you taking the time to comment something :p
