I have been trying to process text files of about 10,000 to upwards of 26,000 lines. I wrote a program to search for text strings and copy the lines around them if they are found.

It searches line by line.
I need to improve this code's speed so it doesn't take so long.
What I think is happening is that when I ask to copy line 1 it copies line 1; then when I want to copy line 2 it reads line 1 and then copies line 2; for line 3 it reads lines 1 and 2, then copies 3. This is killing my buffer and significantly slowing down the process.

Now I could be wrong, but I think this is what is going on and I have no idea how to fix it. I know it's possible to read the entire file into memory, but I don't think I can check search strings line by line against the text once it is in memory.

I would like to use a Find feature if possible to search the entire document and store the required line numbers in a ListBox, then use the line numbers from that ListBox to copy the lines found plus a few around them (it would need to store the found lines plus the extras in the ListBox first).

Other things that might help: adding memory allocation, or keeping the buffer from re-reading line by line every single time it searches a single line of text.

Any help would be greatly appreciated. Code below.

Private Sub Strip_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Strip.Click
        Dim theStringBuilder As New System.Text.StringBuilder()
        RichTextBox1.Text = String.Empty
        MsgBox("THIS MAY TAKE SOME TIME - MAY EVEN SHOW NOT RESPONDING - JUST WAIT - YOU CAN CONTINUE TO WORK WHILE THIS IS RUNNING", vbOKCancel)
        RichTextBox1.Text &= "(THIS DOCUMENT HAS BEEN MODIFIED WITH STRIP-N-SAVE)" & vbNewLine
        'Correcting Blank Values on Form
        If TextBox5.Text = "" Then
            TextBox5.Text = "75"
        End If
        If TextBox4.Text = "" Then
            TextBox4.Text = "20"
        End If
        If TextBox3.Text = "" Then
            TextBox3.Text = "20"
        End If
        If TextBox1.Text = "" Then
            TextBox1.Text = "M6"
        End If
        If TextBox2.Text = "" Then
            TextBox2.Text = TextBox1.Text
        End If

        'Variables
        Dim endLine = CInt(TextBox5.Text)
        Dim LineCount As Integer = Document.Lines.Length
        Dim randarray(0 To LineCount) As Long
        ProgressBar1.Maximum = UBound(randarray)
        Dim StartLine = CInt(TextBox3.Text)
        Dim Ending = CInt(TextBox4.Text)
        Dim J, L, kk As Integer
        'Correcting Input if Document too Short
        If LineCount < endLine - 1 Then
            endLine = LineCount
        End If
        'Copying First Lines At all times
        For x As Integer = 0 To endLine - 1
            RichTextBox1.Text &= Document.Lines(x) & vbNewLine
            J = x
        Next
        'Search For Text and Copy if Exists
        For x As Integer = endLine To LineCount - 1
            If Document.Lines(x).Contains(TextBox1.Text) Or Document.Lines(x).Contains(TextBox2.Text) Then
                If LineCount <= x + Ending Then
                    Ending = 10
                End If
                If LineCount <= x + Ending Then
                    Ending = 5
                End If
                If LineCount <= x + Ending Then
                    Ending = 1
                End If
                If LineCount <= x + Ending Then
                    Ending = 0
                End If
                If LineCount <= x + Ending Then
                    Exit For
                End If
                L = x - StartLine
                kk = L
                If kk <= J + StartLine Then
                    L = J + 1
                End If
                For k As Integer = L To x + Ending
                    RichTextBox1.Text &= Document.Lines(k) & vbNewLine
                    If LineCount <= k Then
                        Exit For
                    End If
                Next
                x = x + Ending
                J = x
                ProgressBar1.Value = x
                'Stop if Document is Too Short
                If x >= LineCount - 1 Then
                    Exit For
                End If
            Else
                If x >= LineCount - 1 Then
                    Exit For
                End If
            End If
            'Stop if Document is Too Short
            If x >= LineCount - 1 Then
                Exit For
            End If
        Next
        ProgressBar1.Value = LineCount
        Try
            My.Computer.Audio.Play("C:\WINDOWS\Media\notify.wav")
        Catch ex As Exception
        End Try
        MsgBox("Document Complete - Press Save", vbOKCancel)
    End Sub

--- NOTE --- Document is a RichTextBox


I have an idea (was gonna write the code myself, but your project is a little full-fledged already with the textboxes and all). As you read in a new line from the file, basically make a variable that contains the "old" variable, and then replace the current variable with the newline of data. Then have a boolean flag variable... that determines if the "next" line should be saved or not....

I'm a little confused. Exactly how is this going to stop the 10,000 re-reads? Won't it still re-read?

No, No. Imagine something like...

dim curLine as string
dim oldLine as string
dim bFlag as boolean
Dim SavedLines As New Stack
dim fs = New FileStream("file.txt",FileMode.Open,FileAccess.Read)
Dim d as new StreamReader(fs)

d.BaseStream.Seek(0,SeekOrigin.Begin)
while d.peek()>-1
     if bflag = true then
           bflag = false
          SavedLines.push(d.readLine())
     else
          oldLine = curLine
          curLine = d.readline()
          if curLine = "whatever looking for" then
               bFlag = true
               SavedLines.push(oldLine)
               SavedLines.push(curLine)
          end if
     end if
End while
d.close()

er... something like that. You only actually read from the file once, but you keep the last line read in a buffer, and flag if the next line should be saved or not.
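
The one-pass idea above can be stretched from "one line before, one line after" to any number of context lines. What follows is only a minimal sketch: the helper name CollectWithContext and the parameters before and after are hypothetical, not something from the posts above. It takes any TextReader, so a StreamReader over a file or a StringReader over text already in memory should both work.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic

Module OnePassContext
    ' Hypothetical helper: reads the input once and returns every line that
    ' contains searchText, plus up to "before" lines ahead of it and up to
    ' "after" lines following it. Each input line is returned at most once.
    Function CollectWithContext(ByVal reader As TextReader, ByVal searchText As String, _
                                ByVal before As Integer, ByVal after As Integer) As List(Of String)
        Dim saved As New List(Of String)
        Dim recent As New Queue(Of String)   ' last "before" lines not yet saved
        Dim remainingAfter As Integer = 0    ' lines still owed after the last match

        Dim line As String = reader.ReadLine()
        While line IsNot Nothing
            If line.Contains(searchText) Then
                ' Flush the buffered lines that precede the match, then the match itself.
                saved.AddRange(recent)
                recent.Clear()
                saved.Add(line)
                remainingAfter = after
            ElseIf remainingAfter > 0 Then
                saved.Add(line)
                remainingAfter -= 1
            Else
                ' Not needed yet: remember it in case a match follows soon.
                recent.Enqueue(line)
                If recent.Count > before Then recent.Dequeue()
            End If
            line = reader.ReadLine()
        End While
        Return saved
    End Function

    Sub Main()
        Dim text As String = "a" & vbLf & "b" & vbLf & "M6 tool change" & vbLf & "c" & vbLf & "d"
        Using reader As New StringReader(text)
            For Each s As String In CollectWithContext(reader, "M6", 1, 1)
                Console.WriteLine(s)
            Next
        End Using
    End Sub
End Module
```

The queue never holds more than "before" lines, so memory use stays small no matter how long the input is.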

You may also use String.Join which is very fast compared to loop + append. For example, lines

For x As Integer = 0 To endLine - 1
   RichTextBox1.Text &= Document.Lines(x) & vbNewLine
   J = x
Next

can be replaced with

RichTextBox1.Text = String.Join(vbNewLine, Document.Lines)
' or RichTextBox1.Text = String.Join(vbNewLine, Document.Lines, 0, endLine)
' if endLine <> UBound(Document.Lines)
J = endLine - 1

Check if you have other similar places in your code.

Dim d as new StreamReader(fs)
causes "Overload resolution failed, New cannot be called - Object to String".

So I cannot test this, and I'm not sure what this error means.

String.Join works very fast, but after it reads its first set of lines it doesn't loop again. So I can get the first 75 lines; then, when the TextBox1.Text string appears, it copies from 20 lines before it to 20 lines after. It then doesn't do anything more, and I don't know why.

I got it to loop again, but it overwrites line 1, "(THIS DOCUMENT...", and String.Join doesn't let me add "TEXT".

The write isn't really the slowdown, though. It is the line reads. But this will help some, as long as I can figure out how to make the first line say "(THIS DOCUMENT...".
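
On keeping the header: String.Join just returns a String, so the header can be concatenated in front of it and the result assigned to the Text property once. A small sketch, assuming a hypothetical helper named HeaderPlusFirstLines (not from the thread); it uses the String.Join(separator, array, startIndex, count) overload to join only the first lines.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module JoinWithHeader
    ' Hypothetical helper: returns the header followed by the first "count"
    ' lines of "lines", each terminated by a line break.
    Function HeaderPlusFirstLines(ByVal header As String, ByVal lines() As String, _
                                  ByVal count As Integer) As String
        ' Never ask String.Join for more elements than the array holds.
        count = Math.Max(0, Math.Min(count, lines.Length))
        ' This overload joins only the slice starting at index 0 with "count" elements.
        Return header & vbNewLine & String.Join(vbNewLine, lines, 0, count) & vbNewLine
    End Function

    Sub Main()
        Dim lines() As String = {"line 0", "line 1", "line 2"}
        Console.Write(HeaderPlusFirstLines("(HEADER)", lines, 2))
    End Sub
End Module
```

In the form this would be a single assignment along the lines of RichTextBox1.Text = HeaderPlusFirstLines(headerText, Document.Lines, endLine), where headerText is whatever the first line should say.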

Sorry about that. At the very top you need to add Imports System.IO (just above "Public Class").

I already have this, but I still get an error at "Dim ds As New StreamReader(fs)".

What would cause this to be a problem?

try this: Dim d = New StreamReader(fs)

Now "New StreamReader(fs)" is what gets the overload resolution error.
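
A likely cause, though the thread never confirms it: VB 2005 has no local type inference (Option Infer arrived with VB 2008), so "Dim fs = New FileStream(...)" declares fs As Object. No StreamReader constructor accepts an Object without a narrowing conversion, which would match the "Object to String" wording of the error. Declaring the types explicitly sidesteps the problem. A minimal sketch; CountLines and the temporary demo file are assumptions made up for illustration.

```vbnet
Imports System
Imports System.IO
Imports Microsoft.VisualBasic

Module ExplicitTypes
    ' Counts the lines in a file, declaring every variable with an explicit
    ' type so the compiler can pick the StreamReader(Stream) constructor.
    Function CountLines(ByVal fileName As String) As Integer
        Dim count As Integer = 0
        Using fs As New FileStream(fileName, FileMode.Open, FileAccess.Read)
            Using reader As New StreamReader(fs)
                While reader.ReadLine() IsNot Nothing
                    count += 1
                End While
            End Using
        End Using
        Return count
    End Function

    Sub Main()
        ' Hypothetical demo file, created in the temp folder just for this example.
        Dim fileName As String = System.IO.Path.GetTempFileName()
        File.WriteAllText(fileName, "first" & vbNewLine & "second" & vbNewLine)
        Console.WriteLine(CountLines(fileName))
        File.Delete(fileName)
    End Sub
End Module
```

The Using blocks close the reader and the stream even if an exception is thrown, so no explicit Close call is needed.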

This is about the simplest code I could put together that still shows the problem.

How do we stop this from taking 10 minutes to find strings of text that I know a standard CTRL+F can find in less than one second?

Private Sub Strip_Click
        Dim endLine = CInt(TextBox5.Text)
        Dim LineCount As Integer = Document.Lines.Length
        Dim randarray(0 To LineCount) As Long
        ProgressBar1.Maximum = UBound(randarray)
        Dim StartLine = CInt(TextBox3.Text)
        Dim Ending = CInt(TextBox4.Text)
        Dim J, L, kk, xx, y As Integer
        'Correcting Input if Document too Short
        If LineCount < endLine - 1 Then
            endLine = LineCount
        End If
        For x As Integer = 0 To endLine - 1
            ListBox1.Items.Add(x)
        Next
        For x As Integer = endLine To LineCount - 1
            If Document.Lines(x).Contains(TextBox1.Text) Or Document.Lines(x).Contains(TextBox2.Text) Then
                xx = x - StartLine
                For k As Integer = xx To x + Ending
                    ListBox1.Items.Add(k)
                Next
            End If
        Next
        GoTo last
---Some Junk---Other Working Code---
Last:
End Sub
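
One probable reason this is slow: the Lines property of a TextBox or RichTextBox builds a new String array from the control's text every time it is read, so Document.Lines(x) inside a loop re-splits the whole document on each iteration. That is close to the "re-reading" suspected in the first post. A sketch of the usual workaround follows: read Lines once into a local array, work on the array, and build the output in a StringBuilder. ExtractWithContext and its parameter names are hypothetical, and it only roughly mirrors the Strip_Click logic.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module CachedLines
    ' Hypothetical helper working on a plain array: always keeps the first
    ' headerCount lines, then every line containing term1 or term2 together
    ' with "before"/"after" lines of context. No line is written twice.
    Function ExtractWithContext(ByVal lines() As String, ByVal term1 As String, ByVal term2 As String, _
                                ByVal headerCount As Integer, ByVal before As Integer, _
                                ByVal after As Integer) As String
        Dim output As New StringBuilder()
        Dim lastCopied As Integer = -1   ' index of the last line already written

        headerCount = Math.Max(0, Math.Min(headerCount, lines.Length))
        For i As Integer = 0 To headerCount - 1
            output.AppendLine(lines(i))
            lastCopied = i
        Next

        For i As Integer = headerCount To lines.Length - 1
            If lines(i).Contains(term1) OrElse lines(i).Contains(term2) Then
                ' Clamp the context window to the array and skip lines already written.
                Dim first As Integer = Math.Max(i - before, lastCopied + 1)
                Dim last As Integer = Math.Min(i + after, lines.Length - 1)
                For k As Integer = first To last
                    output.AppendLine(lines(k))
                Next
                lastCopied = Math.Max(lastCopied, last)
            End If
        Next
        Return output.ToString()
    End Function

    Sub Main()
        Dim lines() As String = {"header", "a", "b", "M6", "c", "d", "e"}
        Console.Write(ExtractWithContext(lines, "M6", "M6", 1, 1, 1))
    End Sub
End Module
```

In the form, the array would come from a single read such as Dim lines() As String = Document.Lines, and the returned string would be assigned to RichTextBox1.Text once at the end. One difference from the original loop: a match that falls inside an earlier match's "after" window extends the copied range here instead of being skipped.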

Maybe it's because of how I wrote it?

Dim curLine As String
        Dim oldLine As String
        Dim bFlag As Boolean
        Dim SavedLines As New Stack
        Dim fs = New FileStream(Document.Text, FileMode.Open, FileAccess.Read)
        Dim ds = New StreamReader(fs)
        ds.BaseStream.Seek(0, SeekOrigin.Begin)
        While ds.Peek() > -1
            For x As Integer = endLine To LineCount - 1
                If bFlag = True Then
                    bFlag = False
                    SavedLines.Push(ds.ReadLine())
                Else
                    oldLine = curLine
                    curLine = ds.ReadLine()
                    If curLine.Contains(TextBox1.Text) Or curLine.Contains(TextBox2.Text) Then
                        bFlag = True
                        SavedLines.Push(oldLine)
                        SavedLines.Push(curLine)

                    End If
                End If
            Next
        End While
        ds.Close()

But I don't know, as I cannot find a better way.

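' Requires "Imports System.IO" at the top of the file.
' File.ReadAllLines reads the whole file into memory in a single pass.
Dim FileLines() As String = File.ReadAllLines("c:\output.txt")
Dim SavedLines As New ArrayList

' Keep every line that contains the search text ("48" is just an example).
For I As Integer = 0 To FileLines.Length - 1
    If InStr(FileLines(I), "48") <> 0 Then
        SavedLines.Add(FileLines(I))
    End If
Next I

For I As Integer = 0 To SavedLines.Count - 1
    ' Put whatever you want to do with the "found" items here
    ' instead of the RichTextBox1 call.
    RichTextBox1.AppendText(SavedLines(I).ToString & vbNewLine)
Next I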

I changed to this

Dim FileLines() As String = File.ReadAllLines(Document.Text)
        Dim SavedLines As New ArrayList
        For I As Integer = 0 To LineCount - 1
            If InStr(FileLines(I), "48") <> 0 Then
                SavedLines.Add(FileLines(I))
            End If
        Next I

In debug I get "Illegal characters in path" from the first line, Dim FileLines().

Can I not load from an already existing RichTextBox?
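
File.ReadAllLines expects a file path, while Document.Text is the document's content, which would explain the "Illegal characters in path" message. When the text is already in a control there is no need to go through a file: either read Document.Lines once into a local array, or split the text yourself. A small sketch of the second option; SplitIntoLines is a hypothetical helper name, and "48" is just the sample search text used above.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module SplitText
    ' Hypothetical helper: splits text that is already in memory into lines,
    ' accepting both CR LF and LF line endings.
    Function SplitIntoLines(ByVal text As String) As String()
        ' vbCrLf is listed first so a CR LF pair counts as one separator.
        Dim separators() As String = {vbCrLf, vbLf}
        Return text.Split(separators, StringSplitOptions.None)
    End Function

    Sub Main()
        Dim lines() As String = SplitIntoLines("one" & vbCrLf & "48 two" & vbLf & "three")
        For i As Integer = 0 To lines.Length - 1
            If lines(i).Contains("48") Then Console.WriteLine(i.ToString() & ": " & lines(i))
        Next
    End Sub
End Module
```

In the form the call would be something like Dim lines() As String = SplitIntoLines(Document.Text), done once before the search loop.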

What about using the Find feature? I know that Document.Find(TextBox1.Text) works (it gives the character position), but I don't know how to use it to find which line that is, or how to get it to start from where it left off.

What is the syntax for finding the first line a string is present in, and then finding the second line? This needs to be done without using Contains and scraping text line by line.

Do you mean, how to use [array].Find ?

Here's an example

Private Function Match(ByVal str As String) As Boolean
  '
  Return str.IndexOf("line") >= 0

End Function

Private Sub Button1_Click(ByVal sender As Object, ByVal e As System.EventArgs) Handles Button1.Click
  '
  Dim ThisIX As Integer
  Dim Document(3) As String
  Document(0) = "line1"
  Document(1) = "another line"
  Document(2) = "not this one"
  Document(3) = "last line"
  ThisIX = Array.FindIndex(Document, 0, AddressOf Match)
  Do Until ThisIX < 0
    Debug.Print(Document(ThisIX))
    ThisIX = Array.FindIndex(Document, ThisIX + 1, AddressOf Match)
  Loop

End Sub

The example above uses a hard-coded string to search for.

"AddressOf Match" does not fit; Match is flagged as an error. What should this Match be?
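
In the example, Match is simply the name of the function defined just above Button1_Click; AddressOf Match can only compile if a function with that name, taking a String and returning a Boolean, exists in the same class. To search for a term that is not hard-coded, one option is a small object that carries the term. This is a sketch only: LineMatcher and FindAllLines are hypothetical names, and the syntax avoids lambdas so it should also suit VB 2005.

```vbnet
Imports System
Imports System.Collections.Generic

Module FindIndexDemo
    ' Hypothetical helper class: carries the search term so the predicate
    ' passed to Array.FindIndex does not need a hard-coded string.
    Class LineMatcher
        Private ReadOnly term As String

        Sub New(ByVal term As String)
            Me.term = term
        End Sub

        Function Matches(ByVal line As String) As Boolean
            Return line.IndexOf(term, StringComparison.Ordinal) >= 0
        End Function
    End Class

    ' Returns the indexes of all lines that contain term.
    Function FindAllLines(ByVal lines() As String, ByVal term As String) As List(Of Integer)
        Dim found As New List(Of Integer)
        Dim matcher As New LineMatcher(term)
        Dim ix As Integer = Array.FindIndex(Of String)(lines, 0, AddressOf matcher.Matches)
        Do Until ix < 0
            found.Add(ix)
            ' Resume the search on the line after the previous hit.
            ix = Array.FindIndex(Of String)(lines, ix + 1, AddressOf matcher.Matches)
        Loop
        Return found
    End Function

    Sub Main()
        Dim doc() As String = {"line1", "another line", "not this one", "last line"}
        For Each ix As Integer In FindAllLines(doc, "line")
            Console.WriteLine(doc(ix))
        Next
    End Sub
End Module
```

Array.FindIndex still visits the lines one by one internally; it only hides the loop, so it is not expected to be faster than Contains on a cached array.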

All I want is to speed up the process of finding text and saving it to a string. I have tried 4 different methods, all of which are slow. I have yet to actually USE any FIND methods, only line-by-line compare methods. I have tried both StringBuilder and StreamReader.

At this point I'm losing interest in getting any better at Visual Basic code, as I have no idea why a text FIND cannot be used. It exists in Notepad...

Yes and no. Basically, highlighting text is not the answer, but finding the text is. I don't need to highlight. I just need to know the line position of the text string, and for each one found put that information into a ListBox. Doing this in a user control is a possible option, but there shouldn't be any reason to do it this way.

FYI...
There are a significant number of problems with the code they posted, because VB 2005 doesn't like it. I followed the installation instructions exactly to test the code; most errors are that SelectedRTF is not part of RichTextBoxHS. So, what does that mean? I didn't make this, so I do not know.

Now... this runs extremely fast, but it picks either the incorrect lines or something else (not sure).

RichTextBox2.Clear()
        RichTextBox2.Text = String.Empty
        ListBox1.Items.Clear()
        MsgBox("THIS MAY TAKE SOME TIME - MAY EVEN SHOW NOT RESPONDING - JUST WAIT - YOU CAN CONTINUE TO WORK WHILE THIS IS RUNNING", vbOKCancel)
        RichTextBox2.Text &= "(THIS DOCUMENT HAS BEEN MODIFIED WITH STRIP-N-SAVE)" & vbNewLine
        Dim LineCount As Integer = RichTextBox1.Lines.Length
        Dim List As Integer
        For x As Integer = 0 To 75 - 1
            ListBox1.Items.Add(x)
        Next
        '---------------------------
        'Finding the characters M6 and their position
        Dim iCount As Integer
        Dim Pos As Integer = 1
        Dim iPos
        Dim sString As String = RichTextBox1.Text ' String Given by User
        Dim Text1 As String = "M6"
        iCount = 0
        Dim xy As Integer = Len(sString)
        Dim JJ As Integer
        For iPos = 1 To xy - 1
            If Mid(sString, iPos, Len(Text1)) = Text1 Then
                JJ = RichTextBox1.GetLineFromCharIndex(iPos) '--WHAT---something here does not work correctly.
                ListBox1.Items.Add(JJ)
                Refresh()
            End If
        Next
        ' Display the result
        Refresh()

        'Delete Duplicates in the ListBox
        For i As Int16 = 0 To ListBox1.Items.Count - 2 ' Why -2 here and            
            For m As Int16 = ListBox1.Items.Count - 1 To i + 1 Step -1
                If ListBox1.Items(i).ToString = ListBox1.Items(m).ToString Then
                    ListBox1.Items.RemoveAt(m)
                End If
            Next
        Next

        '----------------------------
        Dim listlines As Integer = ListBox1.Items.Count
        For y = 0 To listlines - 1
            RichTextBox2.Text &= RichTextBox1.Lines(ListBox1.Items(y)) & vbNewLine '--ERROR-- Says I'm out of bounds.
        Next

I hate to tell you, but that too loops through all the lines. I guess I'm confused on what you want to do.... (at least until now). I've been under the impression you wanted to open a file, find the line with a given word or so, and keep it (doesn't matter where, in an array or a listbox is not relevant). Now it seems like you want to have the text put into a textbox and search the textbox for the given word.....

Array in a TextBox? I have always had the text in a RichTextBox, and then I search it. Ultimately I'm doing the same thing; I'm just making it possible to see which lines I have been copying by saving the line numbers to a ListBox. For some reason this seems to be much faster. I was able to search an entire document of 15000 lines in about 2 seconds. Although I did something wrong, the entire process only took about 10-15 seconds until it ran past the end of the document and said I was out of bounds.

I would like to skip this approach later, but while building it seems to help keep out the bugs, and since I have a way of deleting duplicate line numbers so I don't copy them twice, it may be more useful.

I have WordWrap set to False and I get different results from GetLineFromCharIndex and Lines.Length.

What else can I do?

Optimize lines 21-27 of your code (the For iPos loop) like this:

iPos = sString.IndexOf(Text1)
Do Until iPos < 0
   JJ = RichTextBox1.GetLineFromCharIndex(iPos)
   ListBox1.Items.Add(JJ)
   Refresh()
   iPos = sString.IndexOf(Text1, iPos + 1)
Loop

Now the number of loop iterations equals the number of occurrences of the search string instead of the length of the whole text.
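
The line number can also be worked out from the string itself, without asking the control at all: record the character index where each line starts, then look up each match position with a binary search. A sketch under assumptions: FindMatchingLines is a hypothetical helper, and lines are taken to end with a line feed (a preceding carriage return simply stays at the end of its line).

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module LineOfMatch
    ' Hypothetical helper: returns the 0-based line numbers (ascending, no
    ' duplicates) of every line in text that contains term.
    Function FindMatchingLines(ByVal text As String, ByVal term As String) As List(Of Integer)
        Dim result As New List(Of Integer)
        If term.Length = 0 Then Return result

        ' Character index at which each line starts; line 0 starts at index 0.
        Dim lineStarts As New List(Of Integer)
        lineStarts.Add(0)
        Dim nl As Integer = text.IndexOf(ControlChars.Lf)
        Do Until nl < 0
            lineStarts.Add(nl + 1)
            nl = text.IndexOf(ControlChars.Lf, nl + 1)
        Loop

        Dim pos As Integer = text.IndexOf(term, StringComparison.Ordinal)
        Do Until pos < 0
            ' BinarySearch returns the index when pos is exactly a line start,
            ' otherwise the bitwise complement of the next larger element's index.
            Dim lineIx As Integer = lineStarts.BinarySearch(pos)
            If lineIx < 0 Then lineIx = (Not lineIx) - 1
            If result.Count = 0 OrElse result(result.Count - 1) <> lineIx Then result.Add(lineIx)
            pos = text.IndexOf(term, pos + 1, StringComparison.Ordinal)
        Loop
        Return result
    End Function

    Sub Main()
        Dim text As String = "G0 X1" & vbLf & "M6 T1" & vbLf & "G1 X2" & vbLf & "T2 M6 M6"
        For Each lineIx As Integer In FindMatchingLines(text, "M6")
            Console.WriteLine(lineIx)
        Next
    End Sub
End Module
```

Because matches are found in ascending order, comparing against the last stored line number is enough to avoid duplicates, which would make the separate "delete duplicates" pass over the ListBox unnecessary.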

I was able to solve the problem by using a standard TextBox control instead of a RichTextBox control, which for some reason causes errors with GetLineFromCharIndex and reports the incorrect line.

Here is the code I used for VB 2005:

For iPos = 1 To xy
    If Mid(sString, iPos, Len(Text1)) = Text1 Or Mid(sString, iPos, Len(Text2)) = Text2 Then
        ' Mid is 1-based, GetLineFromCharIndex is 0-based, hence iPos - 1
        JJ = TextBox1.GetLineFromCharIndex(iPos - 1)
        ' Math.Max keeps the first context line from going below line 0
        For k As Integer = Math.Max(0, JJ - LinesBefore) To JJ + LinesAfter
            If k >= TextBox1.Lines.Length Then
                Exit For
            End If
            ListBox1.Items.Add(k)
        Next
        Refresh()
    End If
Next