Member Avatar for Geek-Master

What is the best approach to scanning a file when you are looking for more than one pattern? Let's say you have a text file "example.txt", and you want to search for an array of strings {"error", "warning", "failed"} within the file. What is an efficient way of doing this?

Hi,

Using aStreamReader As New System.IO.StreamReader("file.txt")
    For Each s As String In aStreamReader.ReadToEnd().Split(" "c)
        ' Process your space-delimited tokens here.
        ' Split() also accepts an array of delimiters if you need more than one.
    Next
End Using ' closes the file when done

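Or, if you don't want to process each token, you can call Array.IndexOf(aStreamReader.ReadToEnd().Split(" "c), .....) repeatedly. Note that IndexOf is a shared method on Array, so the token array is passed in as the first argument.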
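To make the Split/IndexOf idea concrete, here is a minimal sketch. Tokenize is a hypothetical helper name, and the delimiter list (space, tab, CR, LF) is an assumption; adjust it to whatever separates words in your file.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module TokenLookup
    ' Splits text on several delimiters at once and drops empty tokens.
    ' Tokenize is a hypothetical name used only for this sketch.
    Function Tokenize(content As String) As String()
        Dim delimiters As Char() = {" "c, ControlChars.Tab, ControlChars.Cr, ControlChars.Lf}
        Return content.Split(delimiters, StringSplitOptions.RemoveEmptyEntries)
    End Function

    Sub Main()
        Dim tokens As String() = Tokenize("job started" & vbCrLf & "copy failed")
        ' Array.IndexOf returns the position of the first exact match, or -1.
        Console.WriteLine(Array.IndexOf(tokens, "failed"))
    End Sub
End Module
```

Array.IndexOf compares whole tokens and is case-sensitive, so "Failed" or "failed." would not match "failed" here.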

Member Avatar for Geek-Master

That makes sense, but what if you need to compare each word to a list of "search words"?

Let's say we break it down into pseudocode:

1. Open File Stream
2. Read in a line from file
3. Split the line into words and place into an array
4. For each word in array loop through it
5. Place current word into variable
6. For each "search word" in array loop through
7. Compare current word to current search word
8. If they match, flag the document as found or count the errors
9. If no errors found, message user that everything is good

So will this take too much processing time (depending on the document you scan and the list of words you're looking for)? Or is there a faster and more efficient way of going about it?
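For reference, the nine steps above could be sketched roughly as follows. This is only one possible take, not a definitive implementation: CountMatches, filePath and the separator list are hypothetical names and choices for the sketch. It swaps the inner loop of steps 6 and 7 for a HashSet lookup, so the cost per word no longer grows with the number of search words.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module SearchWordScan
    ' Counts the words in the file that equal any search word (case-insensitive).
    Function CountMatches(filePath As String, searchWords As IEnumerable(Of String)) As Integer
        ' Steps 6-7: a HashSet lookup stands in for the loop over search words.
        Dim lookup As New HashSet(Of String)(searchWords, StringComparer.OrdinalIgnoreCase)
        ' Assumed word separators; extend as needed for your files.
        Dim separators As Char() = {" "c, ControlChars.Tab, ","c, "."c, ";"c, ":"c}
        Dim count As Integer = 0

        ' Steps 1-2: File.ReadLines reads the file one line at a time.
        For Each line As String In File.ReadLines(filePath)
            ' Steps 3-5: split the line into words and visit each one.
            For Each word As String In line.Split(separators, StringSplitOptions.RemoveEmptyEntries)
                ' Step 8: count the match.
                If lookup.Contains(word) Then count += 1
            Next
        Next
        Return count
    End Function

    Sub Main()
        ' Write a small sample file so the sketch runs on its own.
        Dim samplePath As String = Path.GetTempFileName()
        File.WriteAllText(samplePath, "Build failed: 1 ERROR, 2 warnings")
        Dim found As Integer = CountMatches(samplePath, {"error", "warning", "failed"})
        ' Step 9: report the result.
        If found = 0 Then
            Console.WriteLine("No errors detected")
        Else
            Console.WriteLine(found.ToString() & " matching words found")
        End If
        File.Delete(samplePath)
    End Sub
End Module
```

Because this matches whole words, "warnings" in the sample does not count as a hit for "warning"; a substring search would be needed for that.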

This is more of an ASP.NET solution, but it might be what you need here; I'm not sure...

        Dim strList As String
        Dim strLine As String = ""
        Dim arrList() As String
        Dim lenList As Integer
        Dim i As Integer
        Dim chkErr As Integer
        Dim boolErr As Boolean

        boolErr = False
        strList = "error,warning,failed,blah,blah2,etc"

        ' strList contains all the words you want to find, delimited by commas

        arrList = Split(strList, ",") ' splits the string on commas and places the pieces in an array

        lenList = UBound(arrList) ' gets the highest index of the array (its length minus one)

        ' import text line by line in whatever fashion you see fit
        ' say, for example, that each line is placed into a string called strLine

        For i = 0 To lenList
            chkErr = InStr(strLine, arrList(i)) ' 1-based position of the word, or 0 if absent
            If chkErr > 0 Then
                boolErr = True
                Exit For ' one match is enough
            End If
        Next i
        If boolErr = True Then
            MsgBox("Error has been detected")
        Else
            MsgBox("No errors detected")
        End If

        ' loop here

I have used this before to detect HTML in posts on guestbooks and forums.
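The "import text line by line" part is left open above. One way to fill it in, as a rough sketch: FileContainsAny is a hypothetical name, and the sample file exists only so the example runs on its own. It uses a case-insensitive substring check rather than InStr, and stops at the first hit.

```vbnet
Imports System
Imports System.IO

Module LineScan
    ' Returns True as soon as any line contains any of the words as a substring.
    ' FileContainsAny is a hypothetical name used only for this sketch.
    Function FileContainsAny(filePath As String, words As String()) As Boolean
        For Each strLine As String In File.ReadLines(filePath)
            For Each word As String In words
                ' IndexOf returns -1 when the word is absent from the line.
                If strLine.IndexOf(word, StringComparison.OrdinalIgnoreCase) >= 0 Then
                    Return True
                End If
            Next
        Next
        Return False
    End Function

    Sub Main()
        Dim arrList As String() = "error,warning,failed".Split(","c)
        Dim samplePath As String = Path.GetTempFileName()
        File.WriteAllLines(samplePath, {"step 1 ok", "step 2 FAILED"})
        If FileContainsAny(samplePath, arrList) Then
            Console.WriteLine("Error has been detected")
        Else
            Console.WriteLine("No errors detected")
        End If
        File.Delete(samplePath)
    End Sub
End Module
```

Since this is a substring test, a search word like "error" also matches inside longer words such as "errors" or "terror", which may or may not be what you want.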

Hi,

Just create a string array of your search words and call Array.IndexOf(searched_words, this_token) for each token you parse. Or skip the tokenizing and search the whole text for each word, something like this:

Dim s As String = System.IO.File.ReadAllText("file.txt") ' reads the whole file and closes it
For Each searched_token As String In searched_words ' OR: In New String() {"tkn1", "tkn2"}
    ' IndexOf returns -1 when nothing is found and 0 for a match at the very start,
    ' so the test has to be >= 0 rather than > 0.
    If s.IndexOf(searched_token) >= 0 Then
        ' This token is found.
        ' You can keep looking for the same token by calling IndexOf again
        ' with a start index as the second parameter.
    End If
Next
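The "more IndexOfs with the second parameter" idea from the comments could look roughly like this. It is a sketch under assumptions: CountOccurrences is a hypothetical helper, it counts non-overlapping matches, and it uses a case-sensitive ordinal comparison.

```vbnet
Imports System

Module OccurrenceCount
    ' Counts non-overlapping occurrences of token in content.
    ' CountOccurrences is a hypothetical name used only for this sketch.
    Function CountOccurrences(content As String, token As String) As Integer
        If String.IsNullOrEmpty(token) Then Return 0
        Dim count As Integer = 0
        Dim position As Integer = content.IndexOf(token, StringComparison.Ordinal)
        While position >= 0
            count += 1
            ' Resume the search just past the match that was found.
            position = content.IndexOf(token, position + token.Length, StringComparison.Ordinal)
        End While
        Return count
    End Function

    Sub Main()
        Dim s As String = "error: disk failed; error: retry failed; warning"
        For Each searched_token As String In {"error", "warning", "failed"}
            Console.WriteLine(searched_token & ": " & CountOccurrences(s, searched_token).ToString())
        Next
    End Sub
End Module
```

Each search word still costs one pass over the text, so for a long word list the total work grows with the number of words.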

Loren Soth
