I'm new to Java programming, so I'm sorry if my question is a lazy one. I'm working on a project that consists of collecting job offers from the web. As a first step, I want to extract data (job offer data) from a specific webpage, so I'd like to know if there is an API or existing code that can help me.
Thank you for your time and effort.

Last Post by peter_budo

You can use Java sockets to connect to a web site, send an HTTP request, and read the response. The hard part is usually parsing the web page to extract the right data.
The first half of this tutorial shows you how to get the web page's HTML. The parsing of that HTML is probably going to be a messy tangle of String searching and substringing, maybe with some XML parsing.
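
As a rough illustration of the socket approach and the "String searching and substringing" mentioned above, here is a minimal sketch. The class name SocketFetchSketch, its helper methods, and the sample HTML are hypothetical and not taken from the tutorial. It assumes a plain HTTP server on port 80 (an HTTPS site would need an SSL socket instead) and sends an HTTP/1.0 request so the response body is not chunked.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, for illustration only.
class SocketFetchSketch {

    // Builds a minimal GET request. "Connection: close" asks the server to
    // close the socket once the whole response has been sent.
    static String buildGetRequest(String host, String path) {
        return "GET " + path + " HTTP/1.0\r\n"
             + "Host: " + host + "\r\n"
             + "Connection: close\r\n"
             + "\r\n";
    }

    // Sends the request over a plain socket and returns the raw response
    // (status line, headers and body) as one string.
    static String fetch(String host, int port, String path) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            OutputStream out = socket.getOutputStream();
            out.write(buildGetRequest(host, path).getBytes(StandardCharsets.US_ASCII));
            out.flush();

            BufferedReader in = new BufferedReader(
                    new InputStreamReader(socket.getInputStream(), StandardCharsets.UTF_8));
            StringBuilder response = new StringBuilder();
            String line;
            while ((line = in.readLine()) != null) {
                response.append(line).append('\n');
            }
            return response.toString();
        }
    }

    // Returns the trimmed text between the first occurrence of start and the
    // next occurrence of end, or null if either marker is missing.
    static String extractBetween(String html, String start, String end) {
        int from = html.indexOf(start);
        if (from < 0) {
            return null;
        }
        from += start.length();
        int to = html.indexOf(end, from);
        if (to < 0) {
            return null;
        }
        return html.substring(from, to).trim();
    }

    public static void main(String[] args) throws IOException {
        // With a host argument the page is fetched; otherwise a canned sample is used.
        String html = args.length > 0
                ? fetch(args[0], 80, "/")
                : "<html><head><title> Java Developer </title></head></html>";
        System.out.println(extractBetween(html, "<title>", "</title>"));
    }
}
```

Marker-based extraction like this breaks as soon as the page layout changes, which is why the parsing step tends to be the messy part.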

Like stultuske said, this isn't really a suitable project for a complete Java beginner.


To add to the previous replies, you may want to check whether the site has resources for developers or a publicly accessible API.
For example, this forum has an API (https://www.daniweb.com/api/documentation#fetch-articles) where you can get JSON for all forums: https://www.daniweb.com/api/forums
If not, also check which resources the page loads in the browser console. If you are lucky, the site may be using a REST service that returns structured content, often in JSON format. For example, DICE Battlelog exposes player statistics here http://battlelog.battlefield.com/bf4/warsawoverviewpopulate/177958806/2/ along with many other data that are publicly accessible.
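
If the site does expose a JSON endpoint, fetching it could look roughly like the sketch below. It assumes Java 11 or newer (for java.net.http.HttpClient) and uses the forum list URL mentioned above. The class name JsonEndpointSketch and its methods are hypothetical, and whether the endpoint still responds without authentication is an assumption.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class, for illustration only.
class JsonEndpointSketch {

    // Builds a GET request that asks the server for a JSON response.
    static HttpRequest buildJsonRequest(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .header("Accept", "application/json")
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the response body as a string,
    // failing on any status other than 200.
    static String fetchBody(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildJsonRequest(url), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Unexpected HTTP status " + response.statusCode());
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // URL taken from the post above; the shape of the returned JSON is not assumed here.
        System.out.println(fetchBody("https://www.daniweb.com/api/forums"));
    }
}
```

The JDK has no built-in JSON parser, so turning the returned string into objects would need a library such as Jackson or Gson.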

HTML parsing with jsoup should be the last thing on your mind, as it is messy, time-consuming and prone to breaking when the webpage changes.
