Hello everyone,

I have a PDF file which I converted to a text file. Now I want to read the whole thing, search for the keyword "SITUATIONAL", and insert it all into the database, where "SITUATIONAL" will be the key entry for printing it out on the web.

I have no idea how to continue with this. I sort of got somewhere with HTMLAgilityPack and C#, but my boss wants me to use VB, unfortunately :(

Any pointers in the right direction would be greatly appreciated.

Robert

Note: If there is a way to import/read PDF files directly, that would be even better, so I won't have to convert them to a .txt file.
Note 2: Here is a link for a sample of the text file.

All 6 Replies

I want to read the whole thing, search for the keyword "SITUATIONAL" and insert it all into the database and where "SITUATIONAL" will be the key entry for printing it out on the web.

Let's take an example. Using your linked text file, what, specifically, do you want to insert into the database? Please define "all". Also, what do you mean by "will be the key entry for printing it out on the web"? Reading the file and extracting text should not be a problem once you provide more detail.

The System.IO namespace includes several different ways to accomplish this. Basically, you want to read the file one line at a time, look for "SITUATIONAL" in each line, and act when you find it. You could instead read the whole file into memory, but if the file gets very large you could run out of memory.

There is not much info here to offer advice on, but the basic pattern could be something like this:

      Dim strm As IO.Stream = IO.File.OpenRead("pdfText.txt")
      Dim sr As New IO.StreamReader(strm)
      Dim line As String
      Dim trimchars() As Char = {" "c}
      Do While sr.Peek <> -1
         line = sr.ReadLine()
         If line.TrimStart(trimchars).StartsWith("SITUATIONAL") Then
            ' found pattern 
         End If
      Loop
      sr.Close()

By the looks of the text file, you can use the System.IO.StreamReader.ReadLine method to read the lines of the file. Then, when you find a line that begins with 'SITUATIONAL', you split the line into fields.

Continuing TnTinMN's example, at line 8 you can place:

Dim records = line.Split(ControlChars.Tab) ' assuming a tab separates the fields.

records(0) would be 'SITUATIONAL'
records(1) would be 'SBR02'
records(3) would be '1069'

etc...

From there, you can insert the data into the database.
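
As a rough sketch of that last step, something like the following could insert one split line with a parameterized command. The table and column names (Segments, Usage, ElementId) are made up for illustration, since the real schema hasn't been posted, and the "@" parameter prefix is what SQL Server uses; other providers may want a different one. It takes an already-open IDbConnection so it isn't tied to a particular database.

```vb
Imports System
Imports System.Data

Module RecordWriter
    ' Inserts the first two fields of one parsed line.
    ' Assumption: Segments, Usage and ElementId are hypothetical names;
    ' replace them with the real table and columns.
    Sub InsertRecord(conn As IDbConnection, fields() As String)
        Using cmd As IDbCommand = conn.CreateCommand()
            cmd.CommandText =
                "INSERT INTO Segments (Usage, ElementId) VALUES (@usage, @elementId)"
            AddParam(cmd, "@usage", fields(0))
            AddParam(cmd, "@elementId", fields(1))
            cmd.ExecuteNonQuery()
        End Using
    End Sub

    ' Creates a provider-neutral parameter and attaches it to the command.
    Private Sub AddParam(cmd As IDbCommand, name As String, value As Object)
        Dim p As IDbDataParameter = cmd.CreateParameter()
        p.ParameterName = name
        p.Value = value
        cmd.Parameters.Add(p)
    End Sub
End Module
```

Using parameters rather than concatenating the field values into the SQL string keeps stray quotes in the text file from breaking the statement.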

We can't assume the file is tab-delimited. We also can't assume that the only line of interest is the line identified by SITUATIONAL. The spec (and I use the term very loosely) stated: search for the keyword "SITUATIONAL" and insert it all into the database. "All" could very well mean the entire block from one SITUATIONAL to the next. My first rule of programming is "if you don't know what the code is supposed to do, then you shouldn't start writing it".
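
If "all" does turn out to mean the whole block, a minimal sketch of that reading might look like this. It is only one interpretation of the spec: a block is assumed to start at any line beginning with SITUATIONAL (after leading whitespace) and to run up to the next such line. ReadBlocks is a name I made up, and pdfText.txt is the file name from the earlier example.

```vb
Imports System
Imports System.Collections.Generic

Module BlockReader
    ' Groups lines into blocks. Each block starts at a line that begins with
    ' the marker and runs up to, but not including, the next marker line.
    ' Lines before the first marker are ignored.
    Function ReadBlocks(lines As IEnumerable(Of String),
                        Optional marker As String = "SITUATIONAL") As List(Of List(Of String))
        Dim blocks As New List(Of List(Of String))
        Dim current As List(Of String) = Nothing
        For Each line As String In lines
            If line.TrimStart().StartsWith(marker, StringComparison.Ordinal) Then
                ' A marker line opens a new block.
                current = New List(Of String)
                blocks.Add(current)
            End If
            If current IsNot Nothing Then current.Add(line)
        Next
        Return blocks
    End Function

    Sub Main()
        ' IO.File.ReadLines reads the file lazily, one line at a time.
        Dim blocks = ReadBlocks(IO.File.ReadLines("pdfText.txt"))
        Console.WriteLine("Found {0} block(s)", blocks.Count)
    End Sub
End Module
```

Whether each block should then be stored as one row or split further depends on the answers to the questions above.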

Yes, clearer definition and sample code would help...
