Hi Everyone,

Recently I've been working on a small project to sort through multiple text files that all contain the same kinds of strings. Basically, I want to go through them and create a delimited text file for future use. The script below is what I have come up with so far. I tried several "For" loops but found that I was a lot worse off than with the "Do" loops:

Dim dlgOpen As New OpenFileDialog()
        dlgOpen.CheckFileExists = True
        dlgOpen.DefaultExt = "txt"
        dlgOpen.InitialDirectory = "C:\DocBase"
        dlgOpen.Multiselect = False

        Dim strDocumentName As String = ""
        Dim strDocumentNumber As String = ""
        Dim strDocumentWorkOrder As String = ""

        If dlgOpen.ShowDialog <> Windows.Forms.DialogResult.OK Then Exit Sub

        Using readDocFile As New IO.StreamReader(dlgOpen.FileName)
            Dim strCurrentLineOne As String = ""
            Dim strCurrentLineTwo As String = ""
            Dim strCurrentlineThree As String = ""

            Do
                strCurrentLineOne = readDocFile.ReadLine

                If strCurrentLineOne.Contains("Document Name:") Then
                    strDocumentName = strCurrentLineOne.Substring(14)
                    Exit Do
                End If
            Loop Until strCurrentLineOne Is Nothing

            Do
                strCurrentLineTwo = readDocFile.ReadLine
                If strCurrentLineTwo.Contains("Document Number:") Then
                    strDocumentNumber = strCurrentLineTwo.Substring(16)
                    Exit Do
                End If
            Loop Until strCurrentLineTwo Is Nothing

            Do
                strCurrentlineThree = readDocFile.ReadLine
                If strCurrentlineThree.Contains("Document Work Order:") Then
                    strDocumentWorkOrder = strCurrentlineThree.Substring(20)
                    Exit Do
                End If
            Loop Until strCurrentlineThree Is Nothing
        End Using 'readDocFile closes and disposes

        TextBox1.Text = strDocumentName & ", " & strDocumentNumber & ", " & strDocumentWorkOrder

Basically, what this little script does, or should do, is loop through the text file (shown below) once it has been opened through the "OpenFileDialog", looking for "Document Name:", "Document Number:" and "Document Work Order:". (There are thousands of these text files, some large and some small, but all with the same string scheme.)

Document Name:Document 1
Document Number:1234567
Document Work Order:38VZ-001
Document Name:Document 2
Document Number:3456789-123
Document Work Order:38VZ-002
Document Name:Document 3
Document Number:456912-123-12V1
Document Work Order:38VZ-003
Document Name:Document 4
Document Number:891234
Document Work Order:38VZ-004

After the Substring calls strip the labels, I am expecting the script to output these lines to TextBox1 like so:

Document 1, 1234567, 38VZ-001
Document 2, 3456789-123, 38VZ-002
Document 3, 456912-123-12V1, 38VZ-003
Document 4, 891234, 38VZ-004

The script does work, but the problem is that it only returns one line to TextBox1 (a multiline TextBox), as shown here:

Document 1, 1234567, 38VZ-001

I am more than certain that something I've done (or not done) is, well, not right. More likely than not, the issue lies between the chair and the keyboard on this one...

Would anyone have any thoughts, scripts, and/or samples that can point me in the right direction?

Happy coding everyone!

Thanks!

VNexus

change

TextBox1.Text = strDocumentName & ", " & strDocumentNumber & ", " & strDocumentWorkOrder

to

TextBox1.AppendText(strDocumentName & ", " & strDocumentNumber & ", " & strDocumentWorkOrder & vbNewLine)

I would do it a bit differently if I were you.
Since you've got field identifiers (Document Name, Number or whatever), I would read the next line, check the type of the field and assign its value to a variable.
When all 3 variables have a value, append the text to the textbox and clear the variables.
If you don't find one of the expected fields, stop the process and report an error.
If you find yourself assigning a value to a variable that already contains one, then you've got a corrupt file, or a file with a different structure. Stop the process and report an error, or display the data in a second textbox.

This way you'll have control over what you are doing, with just 1 loop and with validation on your data.
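
For anyone reading along, the single-loop idea described above could be sketched roughly as follows. This is only a minimal sketch, not the poster's actual code: the module name DocRecordParser and the function ParseRecords are made up for illustration, and it assumes every non-blank line starts with one of the three labels from the sample file.

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module DocRecordParser

    ' Hypothetical helper: turns labelled lines into "name, number, work order" rows.
    Function ParseRecords(ByVal lines As IEnumerable(Of String)) As List(Of String)
        Dim labels() As String = {"Document Name:", "Document Number:", "Document Work Order:"}
        Dim values(2) As String          ' one slot per label; Nothing means "not seen yet"
        Dim rows As New List(Of String)

        For Each line As String In lines
            If line.Trim().Length = 0 Then Continue For   ' ignore blank lines

            ' Work out which field this line carries.
            Dim index As Integer = -1
            For i As Integer = 0 To labels.Length - 1
                If line.StartsWith(labels(i), StringComparison.Ordinal) Then
                    index = i
                    Exit For
                End If
            Next

            If index < 0 Then
                Throw New InvalidDataException("Unexpected line: " & line)
            End If
            If values(index) IsNot Nothing Then
                Throw New InvalidDataException("Field repeated before the record was complete: " & line)
            End If

            values(index) = line.Substring(labels(index).Length).Trim()

            ' All three fields present: emit the row and reset for the next record.
            If values(0) IsNot Nothing AndAlso values(1) IsNot Nothing AndAlso values(2) IsNot Nothing Then
                rows.Add(String.Join(", ", values))
                values = New String(2) {}
            End If
        Next

        ' Anything left over means the file stopped in the middle of a record.
        If values(0) IsNot Nothing OrElse values(1) IsNot Nothing OrElse values(2) IsNot Nothing Then
            Throw New InvalidDataException("File ended in the middle of a record.")
        End If

        Return rows
    End Function

End Module
```

On a form it could be used along the lines of TextBox1.Lines = ParseRecords(File.ReadAllLines(dlgOpen.FileName)).ToArray(), with a Try/Catch around it to report the InvalidDataException to the user.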

Hey guys, thanks for the tips! Adam, those are some good thoughts, and that is part of the plan, but unfortunately I'm still stuck on getting all of the strings appended to the text box as they should be, unless I've misunderstood your suggestion. My issue is that it only wants to read and fetch one line, and I think this may have to do with how the "exiting" happens after it finds the first line for each of the 3 identifiers.

GeekByChoice, that was a good thought and it works, but it only re-fetches the first line and appends that line again each time I open the file.

I may have to rethink this script, as I feel that it needs to count the lines that it sees. I've already tried commenting out the "Exit Do", but that predictably throws object reference errors.

Again, thanks guys for taking a look!

VNexus
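
A note on the object reference errors mentioned above: ReadLine returns Nothing at the end of the file, and in the original loops .Contains is called on the line before the "Loop Until ... Is Nothing" test runs, so without the "Exit Do" the loop eventually calls a method on Nothing. A minimal sketch of a read loop that tests for Nothing first might look like this (the module name SafeReadLoop and the function ReadValues are illustrative, not from the original script):

```vb
Imports System
Imports System.Collections.Generic
Imports System.IO

Module SafeReadLoop

    ' Hypothetical helper: collects the text after a label from every line that starts with it.
    Function ReadValues(ByVal reader As TextReader, ByVal label As String) As List(Of String)
        Dim found As New List(Of String)
        Dim line As String = reader.ReadLine()

        ' Check for Nothing before touching the string; ReadLine returns Nothing at end of file.
        While line IsNot Nothing
            If line.StartsWith(label) Then found.Add(line.Substring(label.Length))
            line = reader.ReadLine()
        End While

        Return found
    End Function

End Module
```

Because the loop keeps going until the end of the file instead of exiting at the first hit, it picks up every matching line rather than just the first one. It can be tried without a file by passing New StringReader("Document Name:Document 1") as the reader.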

Maybe this helps you:

Dim dlgOpen As New OpenFileDialog()
        dlgOpen.CheckFileExists = True
        dlgOpen.DefaultExt = "txt"
        dlgOpen.InitialDirectory = "C:\DocBase"
        dlgOpen.Multiselect = False

       If dlgOpen.ShowDialog <> Windows.Forms.DialogResult.OK Then Exit Sub
		Dim regEx As New Regex("\bDocument Name:(.*)" & Environment.NewLine & "Document Number:(.*)" & Environment.NewLine & "Document Work Order:([0-9A-Z]{4}-[0-9]{3})", RegexOptions.Multiline)
		For Each _match As System.Text.RegularExpressions.Match In regEx.Matches(IO.File.ReadAllText(dlgOpen.FileName))
			Debug.WriteLine(_match.Value)
			If _match.Groups.Count = 4 Then	'first group is the whole match
				TextBox1.AppendText(String.Format("{0}, {1}, {2}{3}", _match.Groups(1).Value, _match.Groups(2).Value, _match.Groups(3).Value, Environment.NewLine))
			End If
		Next

You can get rid of your whole Using statement.
Also, you need to import System.Text.RegularExpressions.

The above code prints out:
Document 1, 1234567, 38VZ-001
Document 2, 3456789-123, 38VZ-002
Document 3, 456912-123-12V1, 38VZ-003
Document 4, 891234, 38VZ-004
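
One caveat with the pattern above: it joins the three lines with Environment.NewLine, which is "\r\n" on Windows, so a file saved with bare "\n" line endings would not match. A variant that tolerates both endings might look like the sketch below. The module and function names are made up, and the work order group is deliberately loosened to "anything up to the end of the line", which is an assumption and not part of the original answer.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module DocRegexSketch

    ' \r?\n accepts Windows and Unix line endings; [^\r\n]* keeps a stray \r out of the captures.
    Private ReadOnly RecordPattern As New Regex( _
        "^Document Name:([^\r\n]*)\r?\n" & _
        "Document Number:([^\r\n]*)\r?\n" & _
        "Document Work Order:([^\r\n]*)", _
        RegexOptions.Multiline)

    ' Hypothetical helper: returns one "name, number, work order" row per complete record.
    Function ExtractRows(ByVal text As String) As List(Of String)
        Dim rows As New List(Of String)
        For Each m As Match In RecordPattern.Matches(text)
            rows.Add(String.Format("{0}, {1}, {2}", _
                                   m.Groups(1).Value.Trim(), _
                                   m.Groups(2).Value.Trim(), _
                                   m.Groups(3).Value.Trim()))
        Next
        Return rows
    End Function

End Module
```

VB string literals do not treat the backslash as an escape character, so "\r" and "\n" reach the regex engine unchanged and are interpreted there.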


GeekByChoice! Wowwwwwwwww!!! I was not expecting this! This is funny, I just picked up a book this morning on using regex. It looked more and more like regex would handle the multiline matching I needed - but Wowwwww! I now have a far better understanding - even better than what this book here gives me! You are an excellent teacher!

I'm going to play with this a little bit and repost the code in its entirety, then mark this thread as resolved.

Again, absolutely fantastic thinking behind this, GeekByChoice! - and many thanks!

VNexus

GeekByChoice, here is the code in its entirety, with a few other goodies added. Also, you will probably note that I've changed

If _match.Groups.Count = 4 Then

to

If _match.Groups.Count >= 1 Then

which helps to capture all lines. Some of the files are uneven, which may only count for 1, perhaps 2 or 3, but this allows me to capture them all.

Anyway, I just really want to extend my many thanks to you, and left the code here in the event that anyone else should run into this issue.

Again, thank you very much! (I've marked this thread as solved)

VNexus
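
A side note on the Groups.Count change described above: as far as the .NET regex classes go, Groups.Count is determined by the pattern (the whole match plus one entry per capture group), not by how much of the input was present. For the pattern in this thread it is therefore always 4 on any match, and both versions of the test pass every time. A record with a missing line simply produces no match at all. To check whether a particular group took part in a match, Group.Success is the property to look at. A small console sketch with a made-up two-branch pattern:

```vb
Imports System
Imports System.Text.RegularExpressions

Module GroupCountSketch

    Sub Main()
        ' Illustrative pattern only: either branch can match, so one group stays empty.
        Dim pattern As New Regex("Name:(.*)|Number:(.*)")
        Dim m As Match = pattern.Match("Number:42")

        Console.WriteLine(m.Groups.Count)        ' 3: the whole match plus two capture groups
        Console.WriteLine(m.Groups(1).Success)   ' False: the Name branch did not take part
        Console.WriteLine(m.Groups(2).Success)   ' True
        Console.WriteLine(m.Groups(2).Value)     ' 42
    End Sub

End Module
```

If the uneven files really need to be handled, one option would be to make the individual lines optional in the pattern and test Success on each group before using its value.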

Here's the code:

Imports System
Imports System.Collections.Generic
Imports System.Text
Imports System.Text.RegularExpressions
Imports System.IO


Public Class Form1

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

        Dim i As Integer
        Dim dlgOpen As New OpenFileDialog()
        dlgOpen.CheckFileExists = True
        dlgOpen.DefaultExt = "txt"
        dlgOpen.InitialDirectory = "C:\DocBase"
        dlgOpen.Multiselect = False

        If dlgOpen.ShowDialog <> Windows.Forms.DialogResult.OK Then Exit Sub
        Dim regEx As New Regex("\bDocument Name:(.*)" & Environment.NewLine & "Document Number:(.*)" & Environment.NewLine & "Document Work Order:([0-9A-Z]{4}-[0-9]{3})", RegexOptions.Multiline)

        For Each _match As System.Text.RegularExpressions.Match In regEx.Matches(IO.File.ReadAllText(dlgOpen.FileName))

            Debug.WriteLine(_match.Value)
            If _match.Groups.Count >= 1 Then
                TextBox1.AppendText(String.Format("{0}, {1}, {2}{3}", _match.Groups(1).Value, _match.Groups(2).Value, _match.Groups(3).Value, Environment.NewLine))
            Else
                TextBox1.Text = ""
            End If

        Next

        If TextBox1.Text = "" Then
            For i = 0 To 22
                ProgressBar1.Value = i
                Application.DoEvents()
                System.Threading.Thread.Sleep(1)
            Next
            MsgBox("This file format does not match. Conversion process cancelled.", MsgBoxStyle.Exclamation, "DocBase Conversion Process Cancelled")

        Else
            ProgressBar1.Minimum = 0
            ProgressBar1.Maximum = 30
            For i = 0 To 30
                ProgressBar1.Value = i
                Application.DoEvents()
                System.Threading.Thread.Sleep(1)
            Next
            MsgBox("DocBase file format conversion successful!", MsgBoxStyle.Information, "File Conversion Success")
        End If


    End Sub
    'Save New File
    Private Sub Button2_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Dim FileWriter As StreamWriter
        Dim results As DialogResult
        Dim dlgSave As New SaveFileDialog()

        dlgSave.Title = "Doc Base File Converter - Save New DocBase File"
        dlgSave.FileName = "New DocBase.txt"
        dlgSave.InitialDirectory = "\New DocBase Files"
        dlgSave.Filter = "Doc Base Files (*.txt)|*.txt"

        results = dlgSave.ShowDialog

        If results = DialogResult.OK Then

            Dim i As Integer
            ProgressBar1.Minimum = 0
            ProgressBar1.Maximum = 30

            For i = 0 To 30
                ProgressBar1.Value = i
                Application.DoEvents()
                System.Threading.Thread.Sleep(1)
            Next

            ' Close the writer even if the write fails, so the file is not left locked.
            FileWriter = New StreamWriter(dlgSave.FileName, False)
            Try
                FileWriter.Write(TextBox1.Text)
            Finally
                FileWriter.Close()
            End Try

            MsgBox("The new formatted DocBase file has been saved successfully!", MsgBoxStyle.Information, "File Saved Successfully")

        Else
            For i = 0 To 22
                ProgressBar1.Value = i
                Application.DoEvents()
                System.Threading.Thread.Sleep(1)
            Next
            MsgBox("Save to new DocBase format was either cancelled or aborted.", MsgBoxStyle.Exclamation, "File Save Cancelled")

        End If

    End Sub
End Class