I'm building a small information retrieval (IR) application in VB6, and I'm stuck on the tf.idf calculation. I've googled for solutions and code, but didn't find anything in VB6, only in other languages such as Python (which I'm not familiar with). I'm also a beginner VB6 programmer. I've looked at the VB6 Stuff and Tricks Thread (http://www.daniweb.com/forums/thread214396.html), but so far couldn't find the answer there or elsewhere.

If someone could show me sample code, or knows of links that provide a solution, please let me know. Thanks in advance.


If you can tell us a bit more about tf/idf and what it does, we might be able to supply some code. I had a quick look at some explanations and they did not make much sense to me.

What exactly would you like the app to do?


TF stands for Term Frequency and IDF for Inverse Document Frequency. I'll be using these later to rank documents.

I have a collection of text documents. I have indexed all the terms in those documents (by applying tokenizing and a stemmer). The stemming output is a comma-delimited text file that lists these terms together with their document IDs.

For TF, I have to calculate, for each term in the list, how many times it appears in a document.

IDF is calculated by first dividing the total number of documents by the number of documents that contain the keyword in question, then taking the log of the result. Say that, of the 10 docs I have, only 3 contain the word "computer". The calculation would be log(10/3).

I'm really not good at converting mathematical formulas into source code.
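
For reference, the IDF formula described above maps fairly directly onto VB6. The following is only a minimal sketch: the function names InverseDocFreq and TfIdf are made up for illustration, and the choice of a base-10 logarithm is an assumption, since the post does not say which base is meant. VB6's built-in Log function returns the natural logarithm, so the sketch divides by Log(10) to change base.

```vb
Option Explicit

' Hypothetical helper: inverse document frequency for one term.
'   totalDocs    = number of documents in the collection
'   docsWithTerm = number of documents that contain the term
Public Function InverseDocFreq(ByVal totalDocs As Long, _
                               ByVal docsWithTerm As Long) As Double
    ' Assumption: a term that appears in no document gets weight 0,
    ' which also avoids a division by zero.
    If totalDocs <= 0 Or docsWithTerm <= 0 Then
        InverseDocFreq = 0#
        Exit Function
    End If

    ' VB6's Log is the natural logarithm; dividing by Log(10)
    ' converts it to base 10 (an assumed choice of base).
    InverseDocFreq = Log(totalDocs / docsWithTerm) / Log(10#)
End Function

' Hypothetical helper: tf.idf weight of a term in one document,
' where termCount is how many times the term occurs in that document.
Public Function TfIdf(ByVal termCount As Long, _
                      ByVal totalDocs As Long, _
                      ByVal docsWithTerm As Long) As Double
    TfIdf = termCount * InverseDocFreq(totalDocs, docsWithTerm)
End Function
```

With the numbers from the example, Debug.Print InverseDocFreq(10, 3) should give roughly 0.523. Using the natural log instead would scale every weight by the same constant, so the ranking of documents should not change.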


It makes a bit more sense now. You will probably have to start using file system objects/functions. Just search Daniweb, there are plenty of code samples.

The following link is a full-on tutorial with sample code. I'm sure this will give you what you need to read from your files, get the required data and save it to another file called, say, MyLogs.txt.
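
In the meantime, here is a rough sketch of the read, count and save steps using VB6's built-in file statements. It rests on a few assumptions that are not confirmed in this thread: that each line of the stemmer output looks like term,docId, and that the Scripting.Dictionary object is available on the machine (it is created late-bound here, so no project reference is needed). CountTermFrequencies and SaveCounts are made-up names for illustration.

```vb
Option Explicit

' Counts how often each term occurs in each document.
' Assumes every input line has the form: term,docId
' Returns a Scripting.Dictionary keyed by "term|docId" with the count as value.
Public Function CountTermFrequencies(ByVal inPath As String) As Object
    Dim counts As Object
    Dim f As Integer
    Dim textLine As String
    Dim parts() As String
    Dim key As String

    Set counts = CreateObject("Scripting.Dictionary")

    f = FreeFile
    Open inPath For Input As #f
    Do While Not EOF(f)
        Line Input #f, textLine
        parts = Split(textLine, ",")
        ' Skip blank or malformed lines that lack a second field.
        If UBound(parts) >= 1 Then
            key = LCase$(Trim$(parts(0))) & "|" & Trim$(parts(1))
            If counts.Exists(key) Then
                counts.Item(key) = counts.Item(key) + 1
            Else
                counts.Add key, 1
            End If
        End If
    Loop
    Close #f

    Set CountTermFrequencies = counts
End Function

' Writes one line per (term, document) pair: term,docId,count
Public Sub SaveCounts(ByVal counts As Object, ByVal outPath As String)
    Dim f As Integer
    Dim k As Variant

    f = FreeFile
    Open outPath For Output As #f
    For Each k In counts.Keys
        Print #f, Replace(k, "|", ",") & "," & counts.Item(k)
    Next k
    Close #f
End Sub
```

You could call it with something like SaveCounts CountTermFrequencies("C:\stems.txt"), "C:\MyLogs.txt", where both paths are placeholders. The number of documents containing a term (needed for IDF) would then be the number of keys that start with that term followed by "|".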

