Hi,
I have a text file with 10000 lines. Each line holds 225 numerical values, and each value is preceded by an index number and a colon, e.g., 1:0.021354 2:0.125432 3:451321 ...... 225:0.001254.
Now I want to remove this indexing. I know how to add the indexing, but I could not work out how to remove it. Please help me solve this problem. Thanks
Here is the code for adding indexing.

static void Main(string[] args)
        {
            string[] files = Directory.GetFiles(@"F:\New folder", "*.txt", SearchOption.AllDirectories);

            StringBuilder strFile = new StringBuilder();

            foreach (string file in files)
            {
                using (StreamReader sr = new StreamReader(file))
                {
                    string s = Path.ChangeExtension(file, null);
                    strFile.AppendFormat(s.Substring(s.LastIndexOf(@"\") + 1) + " ");
              //      string s = Path.GetFileName("-1");
              //      strFile.AppendFormat(s + " ");
                    char[] charsToTrim = { ' ', '\r', '\n' };
                    string kkj = (sr.ReadToEnd().TrimEnd(charsToTrim));
                    string[] words = kkj.Split(' ');
                    string temp = "";

                    int iCounter = 1;

                    foreach (string word in words)
                    {
                        if (word.Contains("-0.000000"))
                        {
                            temp = word.Replace("-0.000000", "");

                            strFile.Append(temp);

                            iCounter++;
                            continue;
                        }

                        else if (word.Contains("0.000000"))
                        {
                            temp = word.Replace("0.000000", "");


                            strFile.Append(temp);

                            iCounter++;
                            continue;
                        }

                        strFile.Append(iCounter++ + ":" + word + " ");

                    }
                    strFile.AppendLine();
                }
            }
            using (StreamWriter outfile = new StreamWriter(@"F:\New folder (3)\-1.train"))
            {
                outfile.Write(strFile.ToString());
            }
        }

First, I wouldn't recommend using the ReadToEnd() method or storing everything in a StringBuilder before writing it to a file. In your example, it is not so bad, since you are essentially doing one line at a time, but when you go to read that file with 10000 lines, it will start consuming a significant amount of memory. Also, you shouldn't be opening and closing the same file over and over again. You should start with something like this:

using(StreamReader reader = new StreamReader(...))
{
    using(StreamWriter writer = new StreamWriter(...))
    {
        ...
    }
}

You could then use a loop to read each line like this:

for(string line = reader.ReadLine(); line != null; line = reader.ReadLine())
{
    ...
}

You could then use the Split() method to break up the line by spaces, use something like IndexOf(':') to find the colon in each piece, and use the Substring() method to retrieve the data and index. You could add a bit of error checking at this point (e.g. incrementing a counter to detect missing or out-of-order indices, checking that the data is valid, etc.) if you wish. You could use a StringBuilder to construct each line, and write each line out to the StreamWriter.
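
To make the error-checking idea a bit more concrete, here is a rough sketch of how one line could be handled. The class and method names (IndexStripper, StripIndices) are made up for illustration, and the sketch assumes that indices only need to increase from left to right, so gaps are allowed; if every index must be present, compare against previous + 1 instead.

```csharp
using System;
using System.Text;

static class IndexStripper
{
    // Hypothetical helper: turns "1:0.5 2:0.25" into "0.5 0.25".
    // Throws FormatException when a token has no colon, or when its index
    // is not a number greater than the previous index (assumption: gaps are fine).
    public static string StripIndices(string line)
    {
        string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        StringBuilder result = new StringBuilder();
        int previous = 0;

        foreach (string word in words)
        {
            int colon = word.IndexOf(':');
            if (colon < 0)
                throw new FormatException("Missing colon in '" + word + "'");

            int index;
            if (!int.TryParse(word.Substring(0, colon), out index) || index <= previous)
                throw new FormatException("Bad or out-of-order index in '" + word + "'");

            if (result.Length > 0)
                result.Append(' ');
            result.Append(word.Substring(colon + 1)); // keep only the value
            previous = index;
        }
        return result.ToString();
    }

    static void Main()
    {
        // prints "0.021354 0.125432 0.001254"
        Console.WriteLine(StripIndices("1:0.021354 2:0.125432 3:0.001254"));
    }
}
```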

Let me know if I need to clarify anything. Best of luck.

Thanks for replying and providing me a hint. I have tried according to your suggestion. Here is my try. Please help me to improve it.
I am new to C# and I know there are some mistakes, so please bear with me and help me fix them.

static void Main(string[] args)
        {
            StringBuilder strFile = new StringBuilder();
            using (StreamReader reader = new StreamReader(@"F:\xyz.txt"))
            {

                for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
                {

                    string[] words = line.Split(' ');
                    for (int i = 0; i < 225; i++)
                    {
                        int x = words[i].IndexOf(':');

                        string d = words[i].Substring(x);

                    }
                    strFile.AppendLine();
                }
                using (StreamWriter outfile = new StreamWriter(@"F:\xyz1.txt"))
                {
                    outfile.Write(strFile.ToString());
                }
            }
        }

Now I have successfully removed the indexing by using the IndexOf and Substring methods, but the colon is still there in front of every numeric value. Here is the code. How do I remove the colon now?

static void Main(string[] args)
        {
           StringBuilder strFile = new StringBuilder();

           using (StreamReader reader = new StreamReader(@"F:\New folder (4)\xxx.txt"))
           {
               for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
               {
                   char[] charsToTrim = { ' ', '\r', '\n' };
                   string kkj = (line.TrimEnd(charsToTrim));
                   string[] words = kkj.Split(' ');

                   foreach (string word in words)
                   {
                       int x = word.IndexOf(':');

                       string d = word.Substring(x);

                       strFile.Append(d + " ");
                   }
                   strFile.AppendLine();
               }
           }
            using (StreamWriter outfile = new StreamWriter(@"F:\New folder (4)\xyz1.txt"))
            {
                outfile.Write(strFile.ToString());
            }
        }

You want the substring to start at the character following the colon. So you would add one to x:

string d = word.Substring(x + 1);
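
A tiny demo of the difference, in case it helps. The class name is just for illustration. It also shows that a word without a colon passes through unchanged, because IndexOf returns -1 and x + 1 becomes 0.

```csharp
using System;

static class SubstringDemo
{
    static void Main()
    {
        string word = "1:0.021354";
        int x = word.IndexOf(':');                 // x is 1, the position of the colon

        Console.WriteLine(word.Substring(x));      // prints ":0.021354" (colon included)
        Console.WriteLine(word.Substring(x + 1));  // prints "0.021354"

        // No colon: IndexOf returns -1, so Substring(x + 1) is Substring(0)
        string plain = "0.5";
        Console.WriteLine(plain.Substring(plain.IndexOf(':') + 1)); // prints "0.5"
    }
}
```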

There was another point I was trying to make. Constantly adding to the StringBuilder is going to consume a lot of memory. You can nest your using statements, so you can write out each line of data as you iterate through the file:

static void Main(string[] args)
{
    StringBuilder strFile = new StringBuilder();
    using (StreamReader reader = new StreamReader(@"F:\New folder (4)\xxx.txt"))
    {
        using (StreamWriter outfile = new StreamWriter(@"F:\New folder (4)\xyz1.txt"))
        {
            for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
            {
                char[] charsToTrim = { ' ', '\r', '\n' };
                string kkj = line.TrimEnd(charsToTrim);
                string[] words = kkj.Split(' ');
                foreach (string word in words)
                {
                    // Start just past the colon so only the value is kept
                    int x = word.IndexOf(':');
                    string d = word.Substring(x + 1);
                    strFile.Append(d + " ");
                }
                // Write this line out, then reuse the builder for the next line
                outfile.WriteLine(strFile.ToString());
                strFile.Clear();
            }
        }
    }
}

I just changed it so that it would write out the StringBuilder data to the file, and clear the StringBuilder. You could also directly write to the file, and remove the StringBuilder completely, but at least this way, we wouldn't have to consume so much memory.
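
If you want to try the version without the StringBuilder, it could look roughly like this. This is only a sketch: the class name DirectWriter is made up, and I have written the method against TextReader/TextWriter so it can be tried with in-memory strings; a StreamReader and StreamWriter can be passed in exactly the same way.

```csharp
using System;
using System.IO;

static class DirectWriter
{
    // Hypothetical helper: copies every line from reader to writer
    // with the "index:" prefix removed from each value.
    public static void StripIndices(TextReader reader, TextWriter writer)
    {
        for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
        {
            string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
            for (int i = 0; i < words.Length; i++)
            {
                if (i > 0)
                    writer.Write(' ');
                // Write the value straight to the output, no intermediate buffer
                writer.Write(words[i].Substring(words[i].IndexOf(':') + 1));
            }
            writer.WriteLine();
        }
    }

    static void Main()
    {
        // In-memory stand-ins for the real files
        using (StringReader reader = new StringReader("1:0.5 2:0.25\n1:0.75 2:1.5"))
        using (StringWriter writer = new StringWriter())
        {
            StripIndices(reader, writer);
            Console.Write(writer.ToString());
        }
    }
}
```

The StreamWriter already buffers its output internally, so writing small pieces like this does not mean one disk write per value.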

Just as an aside, the StringBuilder is a wrapper for a character array (or possibly multiple character arrays). This has a limited capacity, and once you pass it, it either has to allocate another chunk of memory, or reallocate its current array (i.e. allocate a larger one and copy everything to it). Given the amount of data, it would do this several times in your program, which can be very expensive in terms of CPU cycles, in addition to memory consumption. When you use the Clear() method (as in my example above), it does not release the memory it has already allocated, so there would be far fewer allocation requests after the first couple of iterations.
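
If you want to see this for yourself, something like the following would do. It is only a sketch: the 4096 starting capacity is an arbitrary guess at the longest line rather than a measured value, and the exact capacities printed can differ between .NET versions, so I have not listed them.

```csharp
using System;
using System.Text;

static class BuilderReuse
{
    static void Main()
    {
        // Assumption: 4096 characters is enough for one line of output
        StringBuilder sb = new StringBuilder(4096);
        string[] lines = { "1:0.5 2:0.25", "1:0.75 2:1.5" };

        foreach (string line in lines)
        {
            foreach (string word in line.Split(' '))
                sb.Append(word.Substring(word.IndexOf(':') + 1) + " ");

            // Length is the text currently held, Capacity is the memory reserved for it
            Console.WriteLine("Length=" + sb.Length + " Capacity=" + sb.Capacity);

            sb.Clear(); // empties the text so the same builder can be reused
        }
    }
}
```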

Thank you so much for your kind help.

Please tell me one more thing: how do I process multiple text files in a directory and store the result in a single text file?
I tried to change the above code, but it has some problems.

 static void Main(string[] args)
        {
            string[] files = Directory.GetFiles(@"F:\New folder (4)\", "*.txt", SearchOption.AllDirectories);

            StringBuilder strFile = new StringBuilder();


            foreach (string fri in files)
            {

                //    using (StreamReader reader = new StreamReader(@"F:\Experiment 1\CMTRAINNEW\*.txt"))
                using (StreamReader reader = new StreamReader(fri))
                {
                    using (StreamWriter outfile = new StreamWriter(@"F:\Experiment 1\SVMTrainFormatFilesLF\+1.txt"))

                        for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
                        {

                            string[] words = kkj.Split(' ');
                            string temp = "";
                            int iCounter = 1;

                            foreach (string word in words)
                            {
                              strFile.Append(iCounter++ + ":" + word + " ");
                                }
                            strFile.AppendLine();                            
                            outfile.Write(strFile.ToString());
                         //   strFile.Clear();
                        }

                }


            }

        }

What problem are you having?

Couple things I see right away:

  • You'll want to keep clearing the StringBuilder, otherwise you're going to keep appending everything you previously appended.
  • The using block for StreamWriter should be nested outside the foreach block. You don't want to keep opening and closing the file for every iteration.

Maybe I should quickly explain the using block. It is used to dispose of unmanaged resources. In other words, resources not managed by the .NET framework, like things the Operating System manages (such as file handles, network resources, etc.). The class handles this by implementing the IDisposable interface, which has a method called Dispose(). So, something like this:

using(StreamWriter writer = new StreamWriter(fileName))
{
    ...
}

is essentially equivalent to:

StreamWriter writer = new StreamWriter(fileName);
try
{
    ...
}
finally
{
    writer.Dispose();
}

Note that it still attempts to call Dispose(), even when an Exception is thrown. The point I'm trying to make is, when Dispose() is called on a StreamWriter, the file is flushed and closed. There are quite a few things that happen here, that can make this an expensive call.

So, bringing it back around, you should almost always keep a file open for as long as you need it. There are cases where you can't (or shouldn't) keep files open for very long, but I don't think it applies in this situation.
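
Putting those two points together, the overall shape could be something like this. Treat it as a sketch rather than a drop-in replacement: the names Merger and AddIndices are made up, and the input directory and output file are taken from the command line here instead of being hard-coded.

```csharp
using System;
using System.IO;

static class Merger
{
    // Hypothetical helper: adds "index:" prefixes to every line of every
    // input file and writes the result to a single, already open writer.
    public static void AddIndices(string[] files, TextWriter outfile)
    {
        foreach (string file in files)
        {
            // Each input file is opened and closed once
            using (StreamReader reader = new StreamReader(file))
            {
                for (string line = reader.ReadLine(); line != null; line = reader.ReadLine())
                {
                    string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
                    for (int i = 0; i < words.Length; i++)
                        outfile.Write((i + 1) + ":" + words[i] + " ");
                    outfile.WriteLine();
                }
            }
        }
    }

    static void Main(string[] args)
    {
        // args[0]: input directory, args[1]: output file (assumed command-line arguments)
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: Merger <input directory> <output file>");
            return;
        }

        string[] files = Directory.GetFiles(args[0], "*.txt", SearchOption.AllDirectories);

        // The output file is opened once, outside the loop over the input files
        using (StreamWriter outfile = new StreamWriter(args[1]))
        {
            AddIndices(files, outfile);
        }
    }
}
```

It is safest to keep the output file outside the input directory, so that a later run does not pick up the previous result as an input.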

Please tell me how to skip (not trim or remove) the first word of each line in a text file.
Actually, I want to put indexing on every word except the first word of each line. (I still want to write that first word on each line, but without an index.)
Thanks

Yeah, sure... It would be easier to use a for-loop instead of a foreach-loop. You would handle the first item before the loop, like this:

if (words.Length > 0)
    strFile.Append(words[0] + " ");

Then, you would change the foreach-loop to a for-loop:

for (int i = 1; i < words.Length; i++)
    strFile.Append(i + ":" + words[i] + " ");

You could do the first part inside the for-loop (or the foreach-loop), but it would add a condition that would have to be checked on every iteration, which would only evaluate differently on the first iteration.
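
Here are the two pieces combined into one small method, as a sketch. The class and method names are made up, and the sample line assumes the first word is some kind of label such as +1.

```csharp
using System;
using System.Text;

static class LabelledIndexer
{
    // Hypothetical helper: leaves the first word as it is and
    // prefixes every following word with its position, starting at 1.
    public static string IndexAllButFirst(string line)
    {
        string[] words = line.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        StringBuilder sb = new StringBuilder();

        // First word is written without an index
        if (words.Length > 0)
            sb.Append(words[0]);

        // Remaining words get 1, 2, 3, ... because i starts at 1
        for (int i = 1; i < words.Length; i++)
            sb.Append(" " + i + ":" + words[i]);

        return sb.ToString();
    }

    static void Main()
    {
        // prints "+1 1:0.5 2:0.25 3:0.75"
        Console.WriteLine(IndexAllButFirst("+1 0.5 0.25 0.75"));
    }
}
```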
