Hi, I am currently doing a university project: 'creation of a web mining tool to gather information about individuals and classify them semantically'.

I am currently lost and don't know how to start. Does anyone have links to tutorials that explain how to use Java to connect to a website and extract information?

Thanks in advance.

All 9 Replies
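
As a starting point for the question above, here is a minimal sketch of fetching a page and pulling a couple of things out of it using only the standard library. The class name `PageFetcher` and its methods are made up for illustration, the regular expressions are a rough approximation rather than a real HTML parser (a dedicated parser such as jsoup would be more robust), and the download assumes the page is UTF-8. Point it only at pages you are allowed to fetch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PageFetcher {

    // Captures the href value of anchor tags (rough approximation, not a full HTML parser).
    private static final Pattern LINK = Pattern.compile(
            "<a\\s[^>]*href\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    // Captures the text between <title> and </title>.
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Downloads the body at the given URL as text (assumes UTF-8). */
    public static String fetch(String url) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                URI.create(url).toURL().openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return sb.toString();
    }

    /** Returns the trimmed page title, or an empty string if there is none. */
    public static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : "";
    }

    /** Returns every href value found in anchor tags, in document order. */
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = LINK.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws IOException {
        // With a URL argument the page is downloaded; otherwise a built-in sample is used.
        String html = args.length > 0
                ? fetch(args[0])
                : "<html><head><title>Demo</title></head>"
                  + "<body><a href=\"/about\">About</a></body></html>";
        System.out.println("Title: " + extractTitle(html));
        System.out.println("Links: " + extractLinks(html));
    }
}
```

Run it with no arguments to see the extraction on the built-in sample, or pass a URL as the first argument to download that page instead.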

Thanks for your quick response. It's certainly a start. However, when you want to use Java Swing to build an interface for code that uses classes, how do you integrate those classes and methods into the interface?

I learnt Java with Eclipse, and just wrote programs and ran them without interfaces as such.

Can anyone give some sample code that integrates classes and methods in Java Swing?

Just out of curiosity, did anyone mention the word "ethics" or "responsibility" when they gave you this assignment? I only ask because it seems an odd sort of assignment to give to someone who obviously doesn't actually know the language yet, and I'm wondering what other areas have been skipped over in your training. Data mining, pulling personal details off of web sites, goes right up against the thin line between technically interesting and morally indefensible, and climbs up on that line, and begins to do vigorous backflips.

When I asked my lecturer about the legal infringement that might occur while gathering user information, he said that as long as this is a research project, it's OK.

FYI, I know Java; I got a B in that module last year. Therefore I wonder why and where "who obviously doesn't actually know the language" comes from.

I suppose my question must be too simple for you to give an answer, please don't bother.

Data mining is not as negative as you portray. The web is changing, everything is getting automated, so why not web search? Semantically classified information can help to create an advanced search.

P.S. If you can't help, stop criticizing and do your "vigorous backflips" elsewhere.

Actually, Jon is very helpful around here. His assumption that you don't know the language well stems directly from the fact that you posted that you didn't know how to use classes and methods within interfaces.

Perhaps you can start here to learn interfaces: http://download.oracle.com/javase/tutorial/uiswing/
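
To make the idea concrete, here is a minimal sketch of one common way to wire an ordinary class into a Swing window. `SwingIntegrationDemo` and `WordCounter` are hypothetical names standing in for your own classes; nothing here comes from the assignment itself. The point is that the logic class knows nothing about Swing, and an action listener is the glue that reads from the widgets, calls the normal method, and displays the result.

```java
import java.awt.BorderLayout;
import java.awt.GraphicsEnvironment;
import javax.swing.JButton;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JTextField;
import javax.swing.SwingUtilities;

public class SwingIntegrationDemo {

    /** Plain logic class with no Swing dependency (hypothetical stand-in for your own classes). */
    public static class WordCounter {
        public int count(String text) {
            String trimmed = text == null ? "" : text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }
    }

    private static void createAndShowGui() {
        WordCounter counter = new WordCounter();

        JFrame frame = new JFrame("Word counter");
        JTextField input = new JTextField(20);
        JButton button = new JButton("Count");
        JLabel result = new JLabel("Words: 0");

        // The listener reads the text field, calls the ordinary method, and shows the result.
        button.addActionListener(e ->
                result.setText("Words: " + counter.count(input.getText())));

        frame.add(input, BorderLayout.NORTH);
        frame.add(button, BorderLayout.CENTER);
        frame.add(result, BorderLayout.SOUTH);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.pack();
        frame.setVisible(true);
    }

    public static void main(String[] args) {
        // Without a display, fall back to the console so the logic can still be exercised.
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("Words: " + new WordCounter().count("no display available"));
            return;
        }
        // Swing components should be created on the event dispatch thread.
        SwingUtilities.invokeLater(SwingIntegrationDemo::createAndShowGui);
    }
}
```

Keeping the logic class free of Swing imports means you can still run and test it from a plain `main` method in Eclipse, exactly as you did before adding an interface.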

Thanks for your answer. I know how to create classes and their methods. It must really be easy to integrate them in interfaces, judging from the answers I am getting. Thanks for your link.

Roodra - I'm sorry if my remarks on ethics seemed to be directed at you. I was actually more concerned about the ethical imbecile who's lecturing you. (Although your evident embrace of the naive cynicism of your instructor is disturbing in its own right.) Yes, data mining is in itself an innocent set of techniques and an interesting area of research. A good friend of mine is doing a PhD in this area just now and it's fascinating to learn some things about it, reading over his shoulder as it were. But it should be pretty obvious that the techniques involved are easily turned to the ever-burgeoning field of screwing people over, and that a little concern for the potential ethical (not merely the legal) consequences of invading people's privacy wholesale might be in order. Perhaps not to you, you are evidently young enough to believe what you're told. That's fine, you'll grow out of that.
But your instructor really ought to spend a few minutes thinking about what he's up to before he offers as an assignment the task of scraping web sites to assemble personal data on individuals. He's certainly out of his league when it comes to his advice on the legality - if the collection and collation of the data were offensive to the law, pleading that it was done for research would not get you anywhere. And I wonder what the human subjects committee at your institution would say about this. It might easily be considered as an edge case, shading towards human experimentation, and reputable institutions take a very dim view of that. For example, I've seen psychology studies on cheating nixed by human subjects committees because the questions would have inadvertently elicited combinations of data that would uniquely identify individual students. Well, that's exactly what your assignment aims to do, only it's by filtering already available data.
What do you think a human subjects committee would do if you told them you were planning on aggregating unspecified personal data on potentially identifiable individuals, and that you had given no thought to securing that data, obscuring identities, or any other means of protecting the innocent targets of your research? Subjects who, needless to say, have neither been informed of your intent nor been given an option to opt out of the study, and stand to make no personal gain from your discoveries, but only stand to risk revelation of information they rightly regard as private. They might end up approving such work, honestly, but it seems to me that they'd at least think about it and consider the consequences.
That this wasn't a consideration for your professor suggests to me that he's not thinking seriously about what he's doing. You might want to deviate from his example in this regard.

We might have some productive discussion on what sorts of data mining are ethically permissible, under what circumstances it's not a violation of someone's privacy to aggregate data freely made public in isolation, that sort of thing. But to pretend that these are not issues worth considering seems to me a willful blindness, unworthy of a serious person.

I hope that clarifies my previous post to some extent. I look forward to reading your thoughts on this - and of course, anyone else's thoughts who cares to join in.

I apologize if I was rude before. Data mining personal information will certainly require permission from the owner of the data, that is, the person himself. From what I have found on numerous social network sites, any site holding personal data about individuals has secured it to prevent mining of this information.

I am going to propose to my lecturer that I create a site myself, fill it with false data about fictional individuals, and then use my Java mining tool to extract that information.
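
A minimal sketch of what generating such a test page could look like, assuming a simple profile of name, age, and occupation. The class `FakeProfileSite`, the field choices, and the name lists are all invented for illustration, and none of the generated combinations is meant to refer to a real person. A fixed seed makes the output repeatable, so the mining tool can be checked against known data.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class FakeProfileSite {

    // Invented sample values; any resemblance to real people is coincidental.
    private static final String[] FIRST = {"Alba", "Bram", "Ciro", "Dita", "Eskil"};
    private static final String[] LAST = {"Quillfeather", "Rookwood", "Starling", "Thistle"};
    private static final String[] JOBS = {"baker", "cartographer", "luthier", "beekeeper"};

    /** One fictional individual. */
    public static final class Profile {
        public final String name;
        public final int age;
        public final String job;

        Profile(String name, int age, String job) {
            this.name = name;
            this.age = age;
            this.job = job;
        }
    }

    /** Generates n fictional profiles; the same seed always yields the same list. */
    public static List<Profile> generate(int n, long seed) {
        Random rnd = new Random(seed);
        List<Profile> profiles = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            String name = FIRST[rnd.nextInt(FIRST.length)] + " " + LAST[rnd.nextInt(LAST.length)];
            int age = 18 + rnd.nextInt(60); // ages 18 to 77
            String job = JOBS[rnd.nextInt(JOBS.length)];
            profiles.add(new Profile(name, age, job));
        }
        return profiles;
    }

    /** Renders the profiles as a small HTML page that a mining tool can be tested against. */
    public static String toHtml(List<Profile> profiles) {
        StringBuilder sb = new StringBuilder("<html><body>\n");
        for (Profile p : profiles) {
            sb.append("<div class=\"profile\">")
              .append("<span class=\"name\">").append(p.name).append("</span> ")
              .append("<span class=\"age\">").append(p.age).append("</span> ")
              .append("<span class=\"job\">").append(p.job).append("</span>")
              .append("</div>\n");
        }
        return sb.append("</body></html>\n").toString();
    }

    public static void main(String[] args) {
        System.out.print(toHtml(generate(5, 42L)));
    }
}
```

The generated page could be saved as a static file and served locally, so that no real site or real person is involved at any point.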

From what you have said, it seems to me that I have embarked on something I know little about. Please rest assured that, to the best of my ability, I will not do anything unethical (such as using real individuals' information without their consent).

Thanks for your patience and, grudgingly, for your interesting insight as well as the invaluable time you took to explain.

commented: A polite and gracious response, thanks for that! +1

A very fine and fair response, and I'm interested to hear what your instructor makes of your proposal. I expect that he'll have some issues with whether you would be able to generate usable data, and whether that would distract you from the project. He might well be right in that - generating lifelike data would be a major undertaking in its own right.
I don't want to steer you away from this assignment, by any means. If you're going to talk sensibly about this business, and either pursue it or decide not to pursue it, you should certainly learn all you can about it, but learn about it in a way that satisfies any concerns you have (including any that I've managed to raise in you). If there is a school of philosophy in your university, perhaps you can discuss the matter with someone in that school. They might help you develop a set of practical steps to fulfill the assignment in an ethically responsible way. Or, if there is a psychology department they will possibly have some people who concern themselves with experimentation on human subjects. That committee may well not have policies to cover this situation, but they're equipped to consider the problems, and to give you examples of how this sort of thing is addressed in other sorts of research.
If there's no luck there, perhaps you can develop at least the first step to a policy. For example, all data gathered and aggregated might be held as sensitive information and the aggregations destroyed after the research is completed. Any aggregations to be presented in findings should undergo some sort of anonymization (depending on what your study is, this might be hard or easy to do). It might be plausible to post a notice on the site(s) being studied alerting the participants of your intent, outlining your privacy policy, and informing them of how they can opt out - again, this assumes some things about your study which may not be the case. There may be other things I haven't thought of, but there's a start.
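
As one possible illustration of the anonymization step, here is a minimal sketch that replaces identifiers with keyed pseudonyms using HMAC-SHA256 from the standard library. The class name `Pseudonymizer`, the "subject-" prefix, and the 16-character truncation are assumptions made for the example. Note that this is pseudonymization rather than full anonymization: other fields left in the data may still identify someone, so it would be only one part of a policy.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.Locale;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class Pseudonymizer {

    private final SecretKeySpec key;

    /** The secret should be kept private and destroyed together with the data. */
    public Pseudonymizer(byte[] secret) {
        this.key = new SecretKeySpec(secret, "HmacSHA256");
    }

    /** Maps an identifier to a stable pseudonym; the same input and key give the same output. */
    public String pseudonym(String identifier) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(key);
            // Normalise case and whitespace so trivial variants map to the same pseudonym.
            String normalised = identifier.trim().toLowerCase(Locale.ROOT);
            byte[] digest = mac.doFinal(normalised.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("subject-");
            for (int i = 0; i < 8; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is not available", e);
        }
    }

    public static void main(String[] args) {
        Pseudonymizer p = new Pseudonymizer("demo-secret".getBytes(StandardCharsets.UTF_8));
        // The name below is fictional.
        System.out.println(p.pseudonym("Alba Thistle"));
    }
}
```

Using a keyed hash rather than a plain one means the pseudonyms cannot be recomputed from a list of candidate names without the secret, and discarding the secret at the end of the study removes the link back to the original identifiers.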

You should certainly, however, try to find a morally defensible way to learn more about data mining from the practical side. Either you will do more of this sort of work in the future, or you will not, and that decision should be conditioned on real knowledge of what's involved, assuming that you can get that knowledge without sacrificing your own principles.

And who knows, you might find from all of this that the study of ethics is one that appeals to you - it's a different sort of logic, but still a logic. And with any luck, you'll also give your professor something to think about in the future, and that's always good.

Best of luck, and thanks for your response.
