Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.imdb.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?

Thanks a lot!

Recommended Answers

All 24 Replies

Their database has to be open to the public which no serious company would ever do due to security reasons. Instead some create a web service where you can access their data through XML. Amazon.com does this. I'm not sure about imdb.com. In any case, you can still extract html data from pages on the imdb server using your favorite server language. You'd then have to parse the data. It's a dirty way to do it, but it's doable.

Hi
Iam a professional web data extractor.
We can do in Visual BAsic for web data extraction by automatically submitting some input and extract results in to MS Access
If required, I can quote for the site you specify
Contact at raathisoft@hotmail.com
Venkat

Remember that doing what you intend could well be viewed as data theft and/or a violation of the terms of service of the site.

You are effectively bypassing whatever mechanism the site has (usually sponsorship through advertising) in ripping the information.

Before you bring any such service live be sure to get permission from the site owners.

Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.imdb.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?I m in need of movie information since I m doing a project named as recommender system for movie.

what is a semantic web?

how to extract rdf documents into a relational database

Hi all, anyone know how to extract data from a web database. For example, how can I extract RDF documents into relational database

Hi all, anyone know how to extract data from a web database. For example, how can I extract RDF documents into relational database since I am doing a project in semantic web

Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.amazon.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?I am in need of movie information since I am doing a project in recommender system for movie.Please reply me for this

Hi all, anyone know how to extract data from a web database. For example, how can I extract RDF documents into relational database.I am in neede of RDF language since I am doing a project in semantic web

Newbie, http://www.newbielabs.com, is a great web automation tool for extracting web data and saving it to either Excel or Access. If you are not adept in scripting, they can customize scripts for you for a small fee.

GW


Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.amazon.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?I am in need of movie information since I am doing a project in recommender system for movie.Please reply me for this

IMHO the by far best tool for this kind of work is iMacros ( http://www.iopus.com/imacros ). Here is an example VBS script that does exactly what you need:


Dim MyArray
Dim objFileSystem, objOutputFile
Dim strOutputFile
Dim pos

' access database
set db = CreateObject("ADODB.Connection")
db.Open("DRIVER={Microsoft Access Driver (*.mdb)}; DBQ=" _
& mypath & "IIM-TEST-EXTRACT.MDB")

set iim1= CreateObject ("InternetMacros.iim")
iret = iim1.iimInit("")

For num = 1 To 3
str = cstr(num) 'Convert integer to string
iret = iim1.iimDisplay("Listing No: " + str)
pos = num '+ 4'start at 5: Offset for POS= statement
str = cstr(pos) 'Convert integer to string
iret = iim1.iimSet("-var1", str) 'Select a new link for each run
iplay = iim1.iimPlay("wsh-extract-jobs")
data = iim1.iimGetLastExtract()
If iplay = 1 and len (data) > 0 Then
MyArray = Split(data, "[EXTRACT]")
' use SQL to insert new data
sql = "insert into tableJobListings (JobTitle, Salary, PositionType, RefCode) values ('" _
& MyArray(0) & "', '" & MyArray(1) & "' , '" & MyArray(2) & "' , '" & MyArray(3) & "')"

' execute sql statement
set rs = db.Execute(sql)

End If

---

PS: If you have any question about this, please post in their forum at http://forum.iopus.com , their support is VERY responsive ;)

Sure you can spend $499.00 for IMacro Scripting Edition or just $49.95 for Newbie Automation Suite, which also comes with a sophisticated Task Manager program. I think iOpus only uses the Windows Scheduled Tasks program and only supports IE. Newbie runs on both IE and Mozilla Firefox.

The following Newbie Script extracts Amazon search results and save it to any ODBC compliant DB (Oracle, MSQL, Access, etc):

program Extract_Amazon;
var
sData : string;
sSearchStr : string;

procedure OnDocumentComplete(URL : string);
begin
if IsPartOf('http://www.amazon.com/exec/obidos/ats-query-page', URL) then
begin
Fill('field-title', sSearchStr);
SendKeyPress('Enter');
end;
if IsPartOf('http://www.amazon.com/exec/obidos/search-handle-form/ref', URL) or
IsPartOf('http://www.amazon.com/exec/obidos/search-handle-url/ref', URL) then
begin
sData := GetTableCell(23,1,2);
AppendDBTable;
DBSetField('SearchFld', sData);
PostDBTable;
NewbieScriptEnd;
end;
end;

{ This is the main program body. }
begin
DeleteFile(ExcelFile);
if not(OpenDBTable('AmazonDB', 'ResultTable')) then
NewbieScriptEnd;
sSearchStr := Readln('Enter search string:');
Navigate('http://www.amazon.com/exec/obidos/ats-query-page/ref=b_tn_bh_bo/002-3070511-5218419');
end.


IMHO the by far best tool for this kind of work is iMacros ( http://www.iopus.com/imacros ). Here is an example VBS script that does exactly what you need:


Dim MyArray
Dim objFileSystem, objOutputFile
Dim strOutputFile
Dim pos

' access database
set db = CreateObject("ADODB.Connection")
db.Open("DRIVER={Microsoft Access Driver (*.mdb)}; DBQ=" _
& mypath & "IIM-TEST-EXTRACT.MDB")

set iim1= CreateObject ("InternetMacros.iim")
iret = iim1.iimInit("")

For num = 1 To 3
str = cstr(num) 'Convert integer to string
iret = iim1.iimDisplay("Listing No: " + str)
pos = num '+ 4'start at 5: Offset for POS= statement
str = cstr(pos) 'Convert integer to string
iret = iim1.iimSet("-var1", str) 'Select a new link for each run
iplay = iim1.iimPlay("wsh-extract-jobs")
data = iim1.iimGetLastExtract()
If iplay = 1 and len (data) > 0 Then
MyArray = Split(data, "[EXTRACT]")
' use SQL to insert new data
sql = "insert into tableJobListings (JobTitle, Salary, PositionType, RefCode) values ('" _
& MyArray(0) & "', '" & MyArray(1) & "' , '" & MyArray(2) & "' , '" & MyArray(3) & "')"

' execute sql statement
set rs = db.Execute(sql)

End If

---

PS: If you have any question about this, please post in their forum at http://forum.iopus.com , their support is VERY responsive ;)

Sure you can spend $499.00 for IMacro Scripting Edition or just $49.95 for Newbie Automation Suite, which also comes with a sophisticated Task Manager program.

Newbie looks like a nice program but I never used it, so I can't say much about it. Actually, price was not really the issue when I purchased the software. iMacros was recommended to me by several colleagues and friends. It works very reliably with all the websites I need to automate. Also, I used their tech support twice and always received a solution (not just a reply!) in less a day. So I like it, but YMMV.

PS: As task scheduler/manager, I use the Windows task scheduler. Free with every Windows PC ;)

If you are using MS Office, here it is:

  1. Open a database, or switch to the Database window for the open database.
  2. To import HTML tables or lists, on the File menu, point to Get External Data, and then click Import. To link HTML tables or lists, on the File menu, point to Get External Data, and then click Link Tables.
  3. In the Import or Link dialog box, in the Files of type box, click HTML Documents (*.html; *.htm).
  4. Click the arrow to the right of the Look in box.
  5. Select the drive and folder where the HTML file you want to import or link is located, and then double-click the file name.
  6. Follow the instructions for the Import HTML Wizard or the Link HTML Wizard. Click the Advanced button if you want to edit an import/export specification (import/export specification: A specification that stores the information that Access needs to import or export a fixed-width or delimited text file.) or specify different file and field formats.
  7. If your HTML file contains more than one table or list, repeat steps 1 through 6 for each table or list you want to import or link.

Notes

  • By default, Access converts a hyperlink address to a Hyperlink data type column, but only if all values in a table column or list contain a hyperlink address defined by an <A HREF> HTML tag. You can change the data type when using the Import HTML Wizard or the Link HTML Wizard.
  • A table embedded within a table cell in an HTML file is treated as a separate table when you import or link. A list embedded in a table cell is treated as the contents of a cell, and each item in the list is delimited with the carriage return/line feed characters.
  • For data that spans rows or columns in an HTML table, Microsoft Access 2000 (and later versions) duplicates the data in each cell, whereas Microsoft Excel 2000 (and later versions) stores the data in the first or upper-left cell, and then leaves other cells blank. :)

Is it possible to use this command "create type ADDRESS as object ( Attribute list )" in Mysql?

There is a company http://www.knowlesys.com provide the custom extractor.
You can tell them your target website and requirements they will provide you the custom extractor.

You only need click the Start button to get the data in your local database.

Price is cheap and service is wonderful!:eek:

Hi all, anyone know how to extract data from a web database. For example, how can I extract movie information from http://www.imdb.com/ into a relational database? Is there any existing tools to do this? Or need I write some codes?

Thanks a lot!

That's typically something I would do with OutWit Hub. It's a free firefox extension in which you can easily create scrapers and have the program extract the data to excel files. If you feel like doing some code, I think you can add your own functions in javascript. I can't do that myself but it's already pretty powerful as is for us non tech guys.

Sam

You could also use Automation Anywhere's web data extraction capability. It is very easy to use and I have been using it for quite some time now, without any problems.

Definitely check out OutWit Hub for web data extraction. Basically, any website is made up of a series of components (docs, pics; links; text; rss feeds) and the hub dissects the page into the media you want it in.

Hello. I have a new business and I am trying to launch a new program to real estate professionals, which will help their declined applicatants and others with credit coaching to improve their scores. I am tring to find a web data extractor that will help me in gathering realtor info like name and email addresses from the various realtor directory associations, in lieu of getting them one at a time. Any help would be appreciated since this will in turn help many.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.