Hello group!

I need to scrape data from the screen of an open application. It will be used as text, so it will need to be converted into a string that I can parse. I've looked around and found scrapers that use the "picture" as a bitmap or jpeg. The few examples I've seen that do convert the scrape to text cover HTML info from websites. The few I've tried haven't worked in any application for me. So I need help. Can you tell me where to start? I've seen tools like "CopyFromScreen" and "Regex", but I'm not sure what will grab the text from the screen. Your direction will be appreciated.

If it helps to know, currently I do this screen scraping manually each day. Of course it's a simple scrape and copy, then paste the data into a text file (using Notepad). I feel sure I can automate some of this. I want to learn how to do this!

In advance, thanks for your comments, directions and assistance. It will be most appreciated.

Don

The answer may depend on where you are trying to get the data from. Web browser? Java applet? Word? etc...

cgeier, I'm not sure what to call the application that I'm using. The program in question is most likely an old "green screen" product and may be UNIX based. I'm connecting to it via the web through some kind of third-party product. I'll be happy to get back to you with more info, but I'll need to know what to look for in order to provide it.

You're being somewhat cryptic. The name of the program would be useful. Also a screen shot of the window may be useful. Additionally, how much time do you spend each day copying data from the screen?

cgeier, I don't mean to be cryptic. My apologies for appearing that way.

The desire to automate this screen scraping is due to the fact that we are having to do it manually and are doing it about 4 hours before it needs to be done. Lastly, as a group we are spending about 10 hours 5 days a week doing it. Plus, we miss doing it on the weekends (no one is at the office).

My goal is to have a computer do this automatically every night at midnight. This way I get the full picture of what transpired before the data is lost (the current system still prints the daily details I need but doesn't store them in a history field). You'd think the computer would store this data until the next business day. Unfortunately it only stores the "important" parts of the data, but not the day and time it was created (which is part of what I need). One of the primary reasons for capturing the info is to ensure its rate accuracy.

When I get to the office tomorrow, I'll confirm the name of the software we use to connect to the server.

It sounds like those "texts" that you scrape from the screen are log messages of some type.
Wouldn't it be easier to create a program that connects to the server and retrieves those log files, instead of scraping the screen (or, in this case, the content of another program)?

If you absolutely need to go the scrape way, there are only two options.
1) Capture a screenshot and try to OCR it.
2) Create a hook to that software you're using and "read" the content of the objects being returned. (http://www.codeproject.com/Articles/33459/Spying-Window-Messages-from-the-Inside)

cgeier, the program we use is called 'SecureCRT' (http://www.vandyke.com/products/securecrt/). I'm not sure if this is Java (although I'd bet $2 it is). You might be able to find something on their website that answers the question. There does seem to be some information there on using Visual Basic Script to retrieve screen data. I'm going to read further on that later today to see if I can learn something.

Oxiegen, I'd prefer to link directly to the server. Unfortunately it's remote and the company won't allow me to do it (although I think I will ask again!). I'll read up on the 'hook' you've referred to, and hopefully that will lead me in the right direction. I'll also try doing the screenshot and OCR.

Thanks again to both of you for responding. I hope to learn how to do this!

SecureCRT has built-in support for scripting.

Here are some resources:
Scripting Essentials: A Guide to Using VBScript in SecureCRT

Script Examples

Example Scripts for SecureCRT® for Windows

Table of Protocol-Specific Command-Line Options

Example: Read Data From Separate Hosts/Commands File And Log To Individual

How To Capture Command Output With ReadString() Using SecureCRT®

I had trouble getting "ReadString()" to work. I modified one of the scripts (.vbs) listed in the resources. I am attaching it. See the "ReadMe" in the .zip file for more info.
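For anyone following along, here is a rough sketch of the capture pattern that the ReadString() article describes. It is written in Python (which SecureCRT can also run as a script language) rather than VBScript, and it is not the attached script. The function name capture_command_output, the command, the prompt string, and the timeout are all assumptions for illustration; inside SecureCRT you would pass in crt.Screen, which the scripting host provides.

```python
def capture_command_output(screen, command, prompt, timeout_seconds=10):
    """Send `command` and return the text that appears before the next `prompt`.

    `screen` is expected to behave like SecureCRT's crt.Screen object.
    `prompt` is whatever your remote shell prints when it is ready again
    (an assumption you have to supply, e.g. "$ ").
    """
    # Synchronous mode helps avoid missing data that arrives between calls.
    screen.Synchronous = True

    # Send the command followed by a carriage return.
    screen.Send(command + "\r")

    # Wait for the remote side to echo the command so the echo itself
    # is not captured as part of the output.
    screen.WaitForString(command + "\r", timeout_seconds)

    # Read everything up to the next prompt (or until the timeout).
    output = screen.ReadString(prompt, timeout_seconds)
    return output.strip()
```

Inside a SecureCRT script this would be called as something like capture_command_output(crt.Screen, "your command", "$ "). If the prompt string does not match what the host really prints, ReadString() just waits until the timeout, which may be what makes it look like it isn't working.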

Here is an updated copy of the above script. The previous version logged the error and then exited if it couldn't connect to a computer in the list. This version logs the error and continues with the other computers.
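The "log the error and keep going" idea looks roughly like this in Python. This is only a sketch of the control flow, not the attached .vbs; run_on_all_hosts, connect, and log are hypothetical names, and connect stands in for whatever call opens the session and raises an error when it fails.

```python
def run_on_all_hosts(hosts, connect, log):
    """Try every host in `hosts`; return the list of hosts that failed.

    `connect` is any callable that raises an exception on failure.
    `log` is any callable that accepts one message string.
    """
    failed = []
    for host in hosts:
        try:
            connect(host)
        except Exception as exc:
            # Record the problem, then move on to the next host
            # instead of exiting the whole script.
            log("%s: %s" % (host, exc))
            failed.append(host)
            continue
        # Work for a successfully connected host would go here.
    return failed
```

Returning the list of failed hosts makes it easy to retry them later or to report them at the end of the nightly run.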

Another update. Added code to check that a session exists in Session Manager. It doesn't really check Session Manager itself, but rather checks for the .ini file that Session Manager creates. If a session exists in Session Manager, a ".ini" file should exist in "%AppData%\VanDyke\Config\Sessions" with the name of the session (ex: Computer123.ini). Without this code, the script could hang because a message box saying that the session couldn't be found is left open.
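The same check can be sketched in Python as below. It assumes the sessions folder described above and a session that sits at the top level of that folder; session_ini_path and session_exists are hypothetical helper names, not part of SecureCRT.

```python
import os


def session_ini_path(session_name, sessions_dir=None):
    """Build the path of the .ini file expected for `session_name`."""
    if sessions_dir is None:
        # Default location described above: %AppData%\VanDyke\Config\Sessions
        sessions_dir = os.path.join(
            os.environ.get("APPDATA", ""), "VanDyke", "Config", "Sessions"
        )
    return os.path.join(sessions_dir, session_name + ".ini")


def session_exists(session_name, sessions_dir=None):
    """Return True if the session's .ini file is present on disk."""
    return os.path.isfile(session_ini_path(session_name, sessions_dir))
```

Calling session_exists("Computer123") before trying to connect lets the script skip and log a missing session instead of stopping on a message box.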
