How to read a text file that doesn't have consistent formatting?


weasel7711 0 Junior Poster in Training

13 Years Ago

I wrote a program to analyze a log file for a machine that my company repairs.
The program that runs the machine spits output into a text file (.log) and my program will analyze it and return the results of different calculations to the user.

The log file ideally looks like this most of the time:

This is the ideal formatting, and it is USUALLY what I get. Out of each of those 4-line "paragraphs", only the first and the last lines (e.g. >22 and 9000) are analyzed; the middle two lines are ignored. So my program reads the file in 5-line "chunks" (the four lines plus the blank line between paragraphs) until the end of the file. It reads the first line, removes the ">" and stores the value in an int array. It then reads the next 3 lines and stores the last one read in another int array. After reading is done, the arrays are analyzed and the results are displayed to the user.
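The fixed-chunk approach described above can be sketched roughly as follows. This is not the poster's code (which isn't shown); it is a hypothetical reconstruction, assuming each record is exactly four data lines followed by one blank separator line, and the `FixedChunkReader` name is made up for illustration. It also shows why the approach is fragile: one extra blank line shifts every later chunk, so the loop either throws on a line that is empty or silently parses the wrong numbers.

```csharp
using System;
using System.Collections.Generic;

static class FixedChunkReader
{
    // Hypothetical reconstruction of a fixed-layout reader. Assumption: every
    // record is exactly 5 lines (">NN", two ignored lines, the end value, and
    // a blank separator). Any extra blank line breaks the alignment.
    public static (List<int> Starts, List<int> Ends) Read(IReadOnlyList<string> lines)
    {
        var starts = new List<int>();
        var ends = new List<int>();

        // Step through the input 5 lines at a time; stop when a full 4-line
        // record no longer fits (the last record may lack its blank separator).
        for (int i = 0; i + 3 < lines.Count; i += 5)
        {
            // Line 1: drop the leading ">" and keep the number.
            starts.Add(int.Parse(lines[i].Substring(1)));

            // Lines 2 and 3 are ignored; line 4 holds the end value.
            ends.Add(int.Parse(lines[i + 3]));
        }
        return (starts, ends);
    }
}
```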

However, occasionally the program that interacts with the machine writes extra blank lines, and the formatting looks like this instead:

So I am wondering if anyone could suggest an algorithm that reads this data correctly regardless of the extra newlines. Is there a way to read a line and skip blank lines?
I originally wrote this program in C++, but I made a C# GUI version so that it would be easier for the users.
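To answer the last question directly: yes. One option, sketched here as an assumption about how the rest of the program could be fed rather than a drop-in fix, is to filter out blank and whitespace-only lines before any parsing happens. `File.ReadLines`, `Enumerable.Where` and `string.IsNullOrWhiteSpace` are standard .NET calls; the `LogLines` class and its method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

static class LogLines
{
    // Keeps only lines that contain something other than whitespace and trims
    // them, so stray blank lines in the log never reach the parsing code.
    public static IEnumerable<string> NonBlank(IEnumerable<string> lines)
    {
        return lines.Where(line => !string.IsNullOrWhiteSpace(line))
                    .Select(line => line.Trim());
    }

    // Convenience overload for a file on disk. File.ReadLines reads the file
    // lazily, line by line, instead of loading it all into memory.
    public static IEnumerable<string> NonBlankFromFile(string path)
    {
        return NonBlank(File.ReadLines(path));
    }
}
```

If the only irregularity is extra blank lines, the filtered sequence has the ideal layout again (4 lines per record, no separators), so something like `var lines = LogLines.NonBlankFromFile("machine.log").ToList();` followed by stepping through `lines` 4 at a time would work; `machine.log` is a placeholder path.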

Any ideas?
Thanks
-Weasel


Mitja Bonca 557 Nearly a Posting Maven

13 Years Ago

Tell me, which values do you want to get from the example above? In each part there are these 2 numbers:
1. the number on the right side of the ">" mark, and
2. the number just above the next ">" mark?

So in this example:
>25
233966

300156
89980

you would like to get 25 and 89980. Am I right?
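That rule (the number after ">", then the last non-blank number before the next ">") can be coded without counting lines at all: start a new record on every ">" line and keep overwriting the end value with each later non-blank line. A minimal sketch, assuming every value is a plain integer; `RecordParser` and `Parse` are hypothetical names, and a record with no data lines after its ">" line is simply dropped.

```csharp
using System;
using System.Collections.Generic;

static class RecordParser
{
    // Collects (Start, End) pairs: Start is the number after ">", End is the
    // last non-blank line seen before the next ">" (or the end of the input).
    // Blank lines, and how many lines sit in between, do not matter.
    public static List<(int Start, int End)> Parse(IEnumerable<string> lines)
    {
        var result = new List<(int Start, int End)>();
        int? start = null;   // start value of the record being read
        int? end = null;     // most recent non-blank value in that record

        foreach (string raw in lines)
        {
            string line = raw.Trim();
            if (line.Length == 0)
                continue;                      // ignore blank lines entirely

            if (line.StartsWith(">"))
            {
                // A new record begins: store the previous one if it is complete.
                if (start.HasValue && end.HasValue)
                    result.Add((start.Value, end.Value));
                start = int.Parse(line.Substring(1));
                end = null;
            }
            else if (start.HasValue)
            {
                end = int.Parse(line);         // keep overwriting; the last one wins
            }
        }

        // The final record has no following ">" line, so store it here.
        if (start.HasValue && end.HasValue)
            result.Add((start.Value, end.Value));
        return result;
    }
}
```

To read from disk, pass `File.ReadLines(path)` as the argument. Unlike a fixed line count, this does not care how many lines appear between two ">" markers.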


Momerath 1,327 Nearly a Senior Poster

13 Years Ago

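using System;
using System.IO;

namespace TestBed {
    class TestBed {
        static void Main() {
            // The using block closes the file automatically when we are done.
            using (StreamReader sr = new StreamReader("Test.txt")) {
                String currentLine = null;
                String startLine = null;     // the ">NN" line that opens a block
                String lastGoodLine = null;  // last non-blank line seen in the block
                Boolean lookingForStart = true;

                while (sr.EndOfStream == false) {
                    if (lookingForStart) {
                        // Skip everything until a line that begins with ">".
                        currentLine = sr.ReadLine();
                        if (currentLine.StartsWith(">")) {
                            startLine = currentLine;
                            lastGoodLine = null;  // don't carry over the previous block's value
                            lookingForStart = false;
                        }
                    } else {
                        // Peek at the next character without consuming it: if the
                        // next line starts a new block, report the finished block.
                        if (sr.Peek() == '>') {
                            Console.WriteLine("Start Line -> {0}{1}End Line -> {2}", startLine, Environment.NewLine, lastGoodLine);
                            lookingForStart = true;
                        } else {
                            currentLine = sr.ReadLine();
                            // Blank or whitespace-only lines are ignored.
                            if (String.IsNullOrWhiteSpace(currentLine) == false) {
                                lastGoodLine = currentLine;
                            }
                        }
                    }
                }

                // The last block has no following ">" line, so report it here.
                if (lookingForStart == false) {
                    Console.WriteLine("Start Line -> {0}{1}End Line -> {2}", startLine, Environment.NewLine, lastGoodLine);
                }
            }

            Console.ReadLine();
        }
    }
}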
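The loop above prints the raw lines. If the goal is to fill the two integer arrays the original program analyzes, the same look-ahead idea (`Peek()` on the reader to see whether the next line starts with ">") can be wrapped in a function that parses and collects the numbers instead. A sketch under the assumption that all values are plain integers; `PeekParser` is a hypothetical name. Taking a `TextReader` means it works with a `StreamReader` over the log file or a `StringReader` in a quick test.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class PeekParser
{
    // Look-ahead parser: after each line, Peek() checks whether the next line
    // starts a new block (">") or the input has ended (-1). Instead of
    // printing, the start and end numbers are collected into two lists.
    public static (List<int> Starts, List<int> Ends) Parse(TextReader reader)
    {
        var starts = new List<int>();
        var ends = new List<int>();
        int? start = null;        // number after ">" for the current block
        string lastGood = null;   // last non-blank line seen in the block

        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith(">"))
            {
                start = int.Parse(line.Substring(1));
                lastGood = null;  // never reuse a value from the previous block
            }
            else if (!string.IsNullOrWhiteSpace(line))
            {
                lastGood = line;
            }

            // First character of the next line, or -1 at the end of the input.
            int next = reader.Peek();
            if (start.HasValue && lastGood != null && (next == '>' || next == -1))
            {
                starts.Add(start.Value);
                ends.Add(int.Parse(lastGood));
                start = null;     // block finished; wait for the next ">"
            }
        }
        return (starts, ends);
    }
}
```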


weasel7711 0 Junior Poster in Training · Answer 1 · 2011-04-12T20:22:08+00:00

Tell me, which values do you want to get from the example above? In each part there are these 2 numbers:
1. the number on the right side of the ">" mark, and
2. the number just above the next ">" mark?
So in this example:
>25
233966
300156
89980
you would like to get 25 and 89980. Am I right?

Yes that is correct.

Mitja Bonca 557 Nearly a Posting Maven · Answer 2 · 2011-04-12T20:32:58+00:00

I guess this is something like what you want:

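using System;
using System.Collections.Generic;
using System.IO;

class Program
{
    static void Main()
    {
        //creating a generic list for storing the wanted values
        List<int> list = new List<int>();

        using (StreamReader sr = new StreamReader(@"C:\1\test25.txt"))
        {
            string line;
            int counter = 0;  // non-empty rows counted so far in the current part
            while ((line = sr.ReadLine()) != null)
            {
                // ReadLine returns "" (not " ") for an empty row, so skip
                // blank and whitespace-only rows without counting them.
                if (String.IsNullOrWhiteSpace(line))
                    continue;

                counter++;
                if (line.StartsWith(">"))
                {
                    // 1st row of a part: store the number after the ">" char.
                    list.Add(Convert.ToInt32(line.Remove(0, 1)));
                    counter = 1;  // re-synchronise on every ">" row
                }
                else if (counter == 4)
                {
                    // 4th non-empty row of the part: store it and start over.
                    list.Add(Convert.ToInt32(line));
                    counter = 0;
                }
            }
        }

        // The list alternates: start value, end value, start value, ...
        Console.WriteLine(String.Join(", ", list));
    }
}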

What the code does is check each line for the ">" char. If it finds it, it adds the number beside the char to the list (I used a generic list, which is much more appropriate here than an array). Then it goes forward row by row.
Every part consists of 4 NON-EMPTY rows, so there is a counter that counts all non-empty rows (empty rows are not counted). When the counter reaches 4 (the 4th non-empty row in the part), the code adds that number to the list as well and resets the counter, and the process starts again from the beginning.

I hope it's understandable enough.
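Because the counter-based loop adds values in pairs (first the number after ">", then the 4th non-empty row), the single list ends up alternating start, end, start, end. If the rest of the program still expects two separate int arrays, as in the original post, they can be split by index. A small sketch, assuming every part produced both of its values; `ListSplitter` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

static class ListSplitter
{
    // Splits an alternating list (start, end, start, end, ...) into two arrays.
    // Even indices are start values, odd indices are end values. Assumes
    // every part contributed exactly one start and one end value.
    public static (int[] Starts, int[] Ends) Split(List<int> values)
    {
        if (values.Count % 2 != 0)
            throw new ArgumentException("Expected an even number of values (start/end pairs).");

        int pairs = values.Count / 2;
        var starts = new int[pairs];
        var ends = new int[pairs];
        for (int i = 0; i < pairs; i++)
        {
            starts[i] = values[2 * i];
            ends[i] = values[2 * i + 1];
        }
        return (starts, ends);
    }
}
```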

weasel7711 0 Junior Poster in Training · Answer 3 · 2011-04-12T20:41:56+00:00

Thank you both, they both look like great algorithms. I will play around with my code and use your suggestions and let you know when I have gotten it to work. Thank you.