954,580 Members — Technology Publication meets Social Media
Username:
Password:
Lost login information?
Have something to say? Contribute New Article Reply to this Article

How to extract data from web databases?

Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.imdb.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?

Thanks a lot!

bei
Newbie Poster
1 post since Aug 2004
Reputation Points: 10
Solved Threads: 0
 

Their database has to be open to the public which no serious company would ever do due to security reasons. Instead some create a web service where you can access their data through XML. Amazon.com does this. I'm not sure about imdb.com. In any case, you can still extract html data from pages on the imdb server using your favorite server language. You'd then have to parse the data. It's a dirty way to do it, but it's doable.

samaru
a.k.a inscissor
Team Colleague
1,256 posts since Feb 2002
Reputation Points: 262
Solved Threads: 18
 

Hi
Iam a professional web data extractor.
We can do in Visual BAsic for web data extraction by automatically submitting some input and extract results in to MS Access
If required, I can quote for the site you specify
Contact at [email]raathisoft@hotmail.com[/email]
Venkat

raathisoft
Newbie Poster
1 post since Nov 2004
Reputation Points: 10
Solved Threads: 0
 

Remember that doing what you intend could well be viewed as data theft and/or a violation of the terms of service of the site.

You are effectively bypassing whatever mechanism the site has (usually sponsorship through advertising) in ripping the information.

Before you bring any such service live be sure to get permission from the site owners.

jwenting
duckman
Team Colleague
8,392 posts since Nov 2004
Reputation Points: 1,662
Solved Threads: 337
 

Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.imdb.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?I m in need of movie information since I m doing a project named as recommender system for movie.

kalpana
Newbie Poster
7 posts since Oct 2005
Reputation Points: 10
Solved Threads: 0
 

what is a semantic web?

kalpana
Newbie Poster
7 posts since Oct 2005
Reputation Points: 10
Solved Threads: 0
 

how to extract rdf documents into a relational database

kalpana
Newbie Poster
7 posts since Oct 2005
Reputation Points: 10
Solved Threads: 0
 

Hi all, anyone know how to extract data from a web database. For example, how can I extract RDF documents into relational database

kalpana
Newbie Poster
7 posts since Oct 2005
Reputation Points: 10
Solved Threads: 0
 

Hi all, anyone know how to extract data from a web database. For example, how can I extract RDF documents into relational database since I am doing a project in semantic web

kalpana
Newbie Poster
7 posts since Oct 2005
Reputation Points: 10
Solved Threads: 0
 

Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.amazon.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?I am in need of movie information since I am doing a project in recommender system for movie.Please reply me for this

kalpana
Newbie Poster
7 posts since Oct 2005
Reputation Points: 10
Solved Threads: 0
 

[QUOTE=kalpana]Hi all, anyone know how to extract data from a web database. For example, how can I extract RDF documents into relational database.I am in neede of RDF language since I am doing a project in semantic web

kalpana
Newbie Poster
7 posts since Oct 2005
Reputation Points: 10
Solved Threads: 0
 

Newbie, http://www.newbielabs.com , is a great web automation tool for extracting web data and saving it to either Excel or Access. If you are not adept in scripting, they can customize scripts for you for a small fee.

GW



Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.amazon.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?I am in need of movie information since I am doing a project in recommender system for movie.Please reply me for this

GeorgeL
Newbie Poster
2 posts since Jul 2006
Reputation Points: 10
Solved Threads: 0
 

IMHO the by far best tool for this kind of work is iMacros ( http://www.iopus.com/imacros ). Here is an example VBS script that does exactly what you need:


Dim MyArray
Dim objFileSystem, objOutputFile
Dim strOutputFile
Dim pos

' access database
set db = CreateObject("ADODB.Connection")
db.Open("DRIVER={Microsoft Access Driver (*.mdb)}; DBQ=" _
& mypath & "IIM-TEST-EXTRACT.MDB")

set iim1= CreateObject ("InternetMacros.iim")
iret = iim1.iimInit("")

For num = 1 To 3
str = cstr(num) 'Convert integer to string
iret = iim1.iimDisplay("Listing No: " + str)
pos = num '+ 4'start at 5: Offset for POS= statement
str = cstr(pos) 'Convert integer to string
iret = iim1.iimSet("-var1", str) 'Select a new link for each run
iplay = iim1.iimPlay("wsh-extract-jobs")
data = iim1.iimGetLastExtract()
If iplay = 1 and len (data) > 0 Then
MyArray = Split(data, "[EXTRACT]")
' use SQL to insert new data
sql = "insert into tableJobListings (JobTitle, Salary, PositionType, RefCode) values ('" _
& MyArray(0) & "', '" & MyArray(1) & "' , '" & MyArray(2) & "' , '" & MyArray(3) & "')"

' execute sql statement
set rs = db.Execute(sql)

End If

---

PS: If you have any question about this, please post in their forum at http://forum.iopus.com , their support is VERY responsive ;)

michael_at_work
Newbie Poster
2 posts since Jul 2006
Reputation Points: 10
Solved Threads: 0
 

Sure you can spend $499.00 for IMacro Scripting Edition or just $49.95 for Newbie Automation Suite, which also comes with a sophisticated Task Manager program. I think iOpus only uses the Windows Scheduled Tasks program and only supports IE. Newbie runs on both IE and Mozilla Firefox.

The following Newbie Script extracts Amazon search results and save it to any ODBC compliant DB (Oracle, MSQL, Access, etc):

program Extract_Amazon;
var
sData : string;
sSearchStr : string;

procedure OnDocumentComplete(URL : string);
begin
if IsPartOf('http://www.amazon.com/exec/obidos/ats-query-page', URL) then
begin
Fill('field-title', sSearchStr);
SendKeyPress('Enter');
end;
if IsPartOf('http://www.amazon.com/exec/obidos/search-handle-form/ref', URL) or
IsPartOf('http://www.amazon.com/exec/obidos/search-handle-url/ref', URL) then
begin
sData := GetTableCell(23,1,2);
AppendDBTable;
DBSetField('SearchFld', sData);
PostDBTable;
NewbieScriptEnd;
end;
end;

{ This is the main program body. }
begin
DeleteFile(ExcelFile);
if not(OpenDBTable('AmazonDB', 'ResultTable')) then
NewbieScriptEnd;
sSearchStr := Readln('Enter search string:');
Navigate('http://www.amazon.com/exec/obidos/ats-query-page/ref=b_tn_bh_bo/002-3070511-5218419');
end.





IMHO the by far best tool for this kind of work is iMacros ( http://www.iopus.com/imacros ). Here is an example VBS script that does exactly what you need:


Dim MyArray
Dim objFileSystem, objOutputFile
Dim strOutputFile
Dim pos

' access database
set db = CreateObject("ADODB.Connection")
db.Open("DRIVER={Microsoft Access Driver (*.mdb)}; DBQ=" _
& mypath & "IIM-TEST-EXTRACT.MDB")

set iim1= CreateObject ("InternetMacros.iim")
iret = iim1.iimInit("")

For num = 1 To 3
str = cstr(num) 'Convert integer to string
iret = iim1.iimDisplay("Listing No: " + str)
pos = num '+ 4'start at 5: Offset for POS= statement
str = cstr(pos) 'Convert integer to string
iret = iim1.iimSet("-var1", str) 'Select a new link for each run
iplay = iim1.iimPlay("wsh-extract-jobs")
data = iim1.iimGetLastExtract()
If iplay = 1 and len (data) > 0 Then
MyArray = Split(data, "[EXTRACT]")
' use SQL to insert new data
sql = "insert into tableJobListings (JobTitle, Salary, PositionType, RefCode) values ('" _
& MyArray(0) & "', '" & MyArray(1) & "' , '" & MyArray(2) & "' , '" & MyArray(3) & "')"

' execute sql statement
set rs = db.Execute(sql)

End If

---

PS: If you have any question about this, please post in their forum at http://forum.iopus.com , their support is VERY responsive ;)

GeorgeL
Newbie Poster
2 posts since Jul 2006
Reputation Points: 10
Solved Threads: 0
 
Sure you can spend $499.00 for IMacro Scripting Edition or just $49.95 for Newbie Automation Suite, which also comes with a sophisticated Task Manager program.



Newbie looks like a nice program but I never used it, so I can't say much about it. Actually, price was not really the issue when I purchased the software. iMacros was recommended to me by several colleagues and friends. It worksvery reliably with all the websites I need to automate. Also, I used their tech support twice and always received a solution (not just a reply!) in less a day. So I like it, but YMMV.

PS: As task scheduler/manager, I use the Windows task scheduler . Free with every Windows PC ;)

michael_at_work
Newbie Poster
2 posts since Jul 2006
Reputation Points: 10
Solved Threads: 0
 

I've heard very very good things about Iopus software [IMG]http://images.myspacetweaker.net/pics/thumbsup.gif[/IMG]
it's rather pricy, but you really gotta check it out it can let you automate a lot of your web processes..

findinforums
Newbie Poster
16 posts since Jun 2006
Reputation Points: 10
Solved Threads: 0
 

If you are using MS Office, here it is:
Open a database, or switch to the Database window for the open database.
To import HTML tables or lists, on the File menu, point to Get External Data, and then click Import. To link HTML tables or lists, on the File menu, point to Get External Data, and then click Link Tables.
In the Import or Link dialog box, in the Files of type box, click HTML Documents (*.html; *.htm).
Click the arrow to the right of the Look in box.
Select the drive and folder where the HTML file you want to import or link is located, and then double-click the file name.
Follow the instructions for the Import HTML Wizard or the Link HTML Wizard. Click the Advanced button if you want to edit an import/export specification (import/export specification: A specification that stores the information that Access needs to import or export a fixed-width or delimited text file.) or specify different file and field formats.
If your HTML file contains more than one table or list, repeat steps 1 through 6 for each table or list you want to import or link.
NotesBy default, Access converts a hyperlink address to a Hyperlink data type column, but only if all values in a table column or list contain a hyperlink address defined by an HTML tag. You can change the data type when using the Import HTML Wizard or the Link HTML Wizard.
A table embedded within a table cell in an HTML file is treated as a separate table when you import or link. A list embedded in a table cell is treated as the contents of a cell, and each item in the list is delimited with the carriage return/line feed characters.
For data that spans rows or columns in an HTML table, Microsoft Access 2000 (and later versions) duplicates the data in each cell, whereas Microsoft Excel 2000 (and later versions) stores the data in the first or upper-left cell, and then leaves other cells blank. :)

GIRob69
Newbie Poster
1 post since Oct 2006
Reputation Points: 10
Solved Threads: 0
 

You can also try SWExplorerAutomation SWEA ( http://webunittesting.com ).

Alex_f
Newbie Poster
1 post since Nov 2006
Reputation Points: 10
Solved Threads: 0
 

Is it possible to use this command "create type ADDRESS as object ( Attribute list )" in Mysql?

Asiima
Newbie Poster
5 posts since Jan 2006
Reputation Points: 10
Solved Threads: 0
 

There is a company http://www.knowlesys.com provide the custom extractor.
You can tell them your target website and requirements they will provide you the custom extractor.

You only need click the Start button to get the data in your local database.

Price is cheap and service is wonderful!:eek:

Tom01234
Newbie Poster
1 post since Feb 2007
Reputation Points: 10
Solved Threads: 0
 

This article has been dead for over three months

Post: Markdown Syntax: Formatting Help
You