I have a string that looks like this:

1.2.3.4 sometext
9.8.7.6 othertext
5.6.7.8 moretext

sometext has a space before the newline, and moretext has a tab at the end (and maybe a newline).

I need to remove whitespace at the end of each line only, without removing the newline characters.

here is what I have

output = "1.2.3.4 sometext \r\n9.8.7.6 othertext\r\n5.6.7.8 moretext\t";
output = Regex.Replace(output, @"^\s+|\s+$", "", RegexOptions.Multiline);
this.textBox1.Text = output;

This is almost there but it removes new line too, and leaves me with

1.2.3.4 sometext9.8.7.6 othertext5.6.7.8 moretext

I'm hoping there is a regex practitioner who might help me out.

Thank you for taking the time to read.

(edit) cannot post code or format text correctly for some reason.
My original string does not contain empty lines.


Perhaps:

output = Regex.Replace(output, @"(.*?)[ \t]+$", "$1", RegexOptions.Multiline);

It matches spaces and tabs at the end of a line and replaces the whole match with the text captured before them.

Thanks pritaeas, your code leaves me with the space still after "sometext " but removes tab.

(edit) I think I understand what your pattern is doing, so I don't understand why it's not removing the space.

But in reality I'd need to remove any whitespace, which I suppose could be something other than tabs and spaces.

It's a hosts file I'm parsing.
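A likely reason the space survives: in .NET, `$` under RegexOptions.Multiline matches just before `\n` (or at the very end of the string), not before `\r\n`. With Windows line endings the character after the space is `\r`, so `[ \t]+$` cannot match there, while the tab at the end of the string can. A minimal sketch that handles this, assuming the goal is to strip any whitespace except `\r` and `\n` (the class and method names are made up for illustration):

```csharp
using System.Text.RegularExpressions;

public static class TrailingWhitespace
{
    // [^\S\r\n]  matches any whitespace character except CR and LF.
    // (?=\r?$)   requires the run to sit directly before an optional CR
    //            followed by end of line; with Multiline, $ matches before
    //            '\n' or at the end of the string.
    private static readonly Regex TrailingRun =
        new Regex(@"[^\S\r\n]+(?=\r?$)", RegexOptions.Multiline);

    // Removes trailing whitespace on every line and leaves all
    // line terminators exactly as they were.
    public static string Strip(string text)
    {
        return TrailingRun.Replace(text, "");
    }
}
```

Calling TrailingWhitespace.Strip(output) on the sample string should leave both `\r\n` pairs in place, because the lookahead only inspects the line ending and never consumes it.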

I ran this:

string output = "1.2.3.4 sometext \r\n9.8.7.6 othertext\r\n5.6.7.8 moretext\t";
output = Regex.Replace(output, @"(.*?)[ \t]+$", "$1", RegexOptions.Multiline);

with the debugger the tab is clearly gone. VS2015 community console app, .NET 4.5.2

Yes, the tab is removed but the space is not. VS 2010 targeting .NET 3.5.

I'll try in VS 2015 community.

Same in VS2015 .net 4.5.2 and 3.5
I think this works, but not sure how safe it is, was just a guess as I'm very new and poor with regex.

output = Regex.Replace(output, @"^\s+|\s+$", "\r\n", RegexOptions.Multiline);

(edit) well actually it works in vs2010 but in vs2015 I get extra empty lines between my actual lines when written to a .txt file.

Now I'm just getting confused. My original pattern (scrounged from the internet) appears to work just fine in VS2015

string output = "1.2.3.4 sometext \r\n9.8.7.6 othertext\r\n5.6.7.8 moretext\t";
output = Regex.Replace(output, @"^\s+|\s+$", "", RegexOptions.Multiline);

Are you sure, because that last one also removes the \r. If that's what you want then it's fine.

No, I'm not so sure now.
My PC has been on for about 6 days, I think I'll restart it and see how I go from there.

Restarting did not help; I still had the original problem. This, however, appears to work fine.

// Needs: using System; using System.IO; using System.Text.RegularExpressions;
string path_to_original = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\regexoriginal.txt";
string path_to_parsed = Environment.GetFolderPath(Environment.SpecialFolder.Desktop) + @"\regexparsed.txt";

using (StreamReader streamReader = new StreamReader(path_to_original))
{
    string text = streamReader.ReadToEnd();
    // \s+$ swallows the trailing whitespace including the \r, so put the \r back.
    string output = Regex.Replace(text, @"\s+$", "\r", RegexOptions.Multiline);

    using (StreamWriter streamWriter = new StreamWriter(path_to_parsed))
    {
        streamWriter.Write(output);
    }
}
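For comparison, the same clean-up can be done without a regex by trimming each line separately. This is only a sketch with a made-up class name, and it makes two assumptions that differ from the regex version: every line ending is normalized to `\r\n`, and a newline at the very end of the input is dropped.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class LineTrimmer
{
    // Trims trailing whitespace from each line and rejoins the lines with CRLF.
    public static string TrimLineEnds(string text)
    {
        var lines = new List<string>();
        using (var reader = new StringReader(text))
        {
            string line;
            // ReadLine treats "\r\n", "\n" and "\r" as line terminators
            // and returns the line without them.
            while ((line = reader.ReadLine()) != null)
            {
                lines.Add(line.TrimEnd());
            }
        }
        // ToArray keeps this compatible with .NET 3.5's string.Join overloads.
        return string.Join("\r\n", lines.ToArray());
    }
}
```

The result of LineTrimmer.TrimLineEnds(text) can be written out with the same StreamWriter as above.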

(edit) for some reason will not allow me to post the contents of text file
http://textuploader.com/atdf8

I believe this is solved now, thank you kindly for your time and input pritaeas.

Ahh man, I missed the Regex fun (LOL, I am addicted to using Regex; no joke, a client gives me grief for it)

Don't let a solved thread stop you.
For me parsing hosts file was no picnic, you might have fun though.

Remember, it can have comments, and multiple aliases for each IP / Domain.

If you could do it with one pattern one pass you really would be the emperor of Regex. It took me 6 different patterns and passes :(

I take that as a challenge. I'll dig into it when I get home from work tonight, and see if I can write up a Regex pattern.

Give this one a try

output = Regex.Replace(output, @"^([\d]+\.[\d]+\.[\d]+\.[\d]+[\s]+[^$\s]+)([\s]*)($)", x => (x.Groups [1].Value + x.Groups [3].Value), RegexOptions.Multiline);

It may not be the prettiest, but some quick tests show it should work. It checks all 3 of your lines, removing any trailing whitespace after values like "moretext" while preserving your newline. I was able to add an additional line to my test and it worked (also, I don't know if this is C#'s doing, but it didn't want to keep the last line's \n unless I added more data afterwards).

So yeah this should work past the 3 lines

Nice pattern. It works for the example text above, but unfortunately not on a real hosts file.

Ohh, this is for a hosts file? Remind me what those are for? I wasn't sure whether this was for something specific or not.

That being said fill me in, I'm not done if there's room to improve

Okay I remember what a hosts file is. So let me ask you this. What is the purpose of the regex, why do you want to trim the white space but not the return lines?

Do you need an element from it? Or? (come on finding more excuses to write regex hahaha)

A hosts file should only have one redirect IP on each line, but I've seen some people put two like so.....

......Sorry, forum will not allow me to post some things so the rest of my post is here http://textuploader.com/aowvp

One way I could see this being easier is for a pattern to capture into an array, everything that begins with an IP address, up until another IP address or new line.

I'd be totally lost on that though.
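One way that idea could be sketched without a single big pattern: drop the comment, split the line on whitespace, and start a new group whenever a token looks like an IP address. Everything here is an assumption for illustration (the class and method names are made up, only dotted-quad IPv4 is recognized, and tokens before the first IP are ignored):

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class HostsLineSplitter
{
    // Strict dotted-quad check: exactly four parts, digits only, each 0-255.
    public static bool IsIPv4(string token)
    {
        string[] parts = token.Split('.');
        if (parts.Length != 4) return false;
        foreach (string part in parts)
        {
            byte value;
            // NumberStyles.None rejects signs and surrounding whitespace.
            if (!byte.TryParse(part, NumberStyles.None, CultureInfo.InvariantCulture, out value))
                return false;
        }
        return true;
    }

    // Returns one token list per entry on the line; each list starts with
    // the IP and is followed by the hostname and any aliases.
    public static List<List<string>> SplitEntries(string line)
    {
        var entries = new List<List<string>>();

        // Everything from '#' onwards is a comment.
        int hash = line.IndexOf('#');
        if (hash >= 0) line = line.Substring(0, hash);

        // A null separator array makes Split break on any whitespace.
        string[] tokens = line.Split((char[])null, StringSplitOptions.RemoveEmptyEntries);
        foreach (string token in tokens)
        {
            if (IsIPv4(token))
                entries.Add(new List<string> { token });
            else if (entries.Count > 0)
                entries[entries.Count - 1].Add(token);
        }
        return entries;
    }
}
```

Calling HostsLineSplitter.SplitEntries on each line of the file gives one list per redirect, so a line with two IPs can be written back out as two separate lines.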

Ohh that? Shoot that's what I originally did when I wrote this regex. Let me clean it up a little now that I know what I am working with and give you a working solution.

Alright here's a quick piece of code I whipped up. It's two classes to make it easier to return the data. I tested it with the hosts file from that link, downloaded and saved it as a .txt file and read it in.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace HostsFileParser1
{
    /// <summary>
    /// Holds a single host record
    /// </summary>
    public class Host
    {
        /// <summary>
        /// Initializes a new Host
        /// </summary>
        /// <param name="IPAddress">IP Address</param>
        /// <param name="Hostname">Hostname</param>
        public Host (string IPAddress, string Hostname)
        {
            this.IPAddress = IPAddress;
            this.Hostname = Hostname;
        }

        /// <summary>
        /// Get or Set the IP Address
        /// </summary>
        public string IPAddress
        {
            get;
            set;
        }

        /// <summary>
        /// Get or Set the Hostname
        /// </summary>
        public string Hostname
        {
            get;
            set;
        }
    }
}



using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace HostsFileParser1
{
    public class Program
    {
        public static void Main (string [] args)
        {
            List<Host> Results = ParseHostFile("hosts.txt");
        }

        /// <summary>
        /// Parses out a hosts file, returning the results in a collection of Hosts
        /// </summary>
        /// <param name="Filename">Filename of the hosts file</param>
        /// <returns>Returns a collection Host records</returns>
        public static List<Host> ParseHostFile (string Filename)
        {
            List<Host> Hosts = new List<Host>();

            if (File.Exists(Filename))
            {
                string HostsFile = File.ReadAllText(Filename);

                MatchCollection matches = Regex.Matches(HostsFile, @"^[\s]*(?<IP>[\d\.]+)[\s]+(?<Hostname>[A-Za-z0-9\-\.]+)", RegexOptions.Multiline);
                foreach (Match match in matches)
                {
                    Hosts.Add(new Host(match.Groups ["IP"].Value, match.Groups ["Hostname"].Value));
                }
            }

            return Hosts;
        }
    }
}

And there you go. Should parse out the data for you. I found an article that says hostnames can only contain alphanumeric characters, a dash ('-'), or a dot ('.'), so I used that, and IPs of course are only dots and numbers.

Hopefully that works for you

Just to acknowledge your posts, I just have not had time for coding for day or two..

Thanks.

Haven't coded for a day or two? Blasphemy! HAHAHAH. Anyway, let me know if that works for you

Hi, finally got some spare time :)

Your code works okay, but does not pick up aliases or the second entry on same line.

// Needs: using System.Diagnostics;
public static void Main(string[] args)
{
    List<Host> Results = ParseHostFile("hosts.txt");
    foreach (Host host in Results)
    {
        Debug.Write(host.IPAddress + " - " + host.Hostname + "\n");
    }
}

Here is the output...

127.0.0.1 - localhost
127.0.0.1 - www.affiliatewindow.com
127.0.0.1 - activate.adobe.com
127.0.0.1 - static.eplayer.performgroup.co.uk
127.0.0.1 - eplayer.performgroup.com
127.0.0.1 - www.performgroup.com

Here is what I'd want to have...

127.0.0.1 - localhost me
127.0.0.1 - www.affiliatewindow.com
127.0.0.1 - activate.adobe.com
127.0.0.1 - static.eplayer.performgroup.co.uk alias
127.0.0.1 - eplayer.performgroup.com
127.0.0.1 - bactivate.adobe.com
127.0.0.1 - www.performgroup.com

I thought hosts files could only have one item per line (I also wasn't sure whether you wanted the aliases pulled back, and couldn't find the exact syntax an alias has to follow).

Do aliases only have to be separated by a space? (I'll take a look at it tonight when I land; I should be able to work up a fix real quick.)

I believe an alias pretty much follows the domain rules, but it can also be another domain or just an alias name without the top level .dom. There can also be as many as you like on the line.

127.0.0.1 domain1.com domain2.net domain3.org domain26.edu some_alias

is perfectly valid.

You are right that only one IP redirect should be on each line, and that is something about it that if present I want to fix.

I appreciate your interest, and time.

Okay, making a note for myself here (I am actually on a business trip and left the code at home).

There can be multiple aliases, and all seem to be separated by a space. However, once a '#' appears, it counts as a comment (for the rest of the line).

Also, I love regex, and am good at it, so if I can help others with it, I do.

I've noticed you are good at it :)

Where you see a space, it can be any amount of whitespace, including tabs.

Alright, this should hopefully work correctly. I ran a few tests and it seemed good (felt like an idiot, totally over complicated it originally, then rewrote the pattern)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace HostsFileParser1
{
    /// <summary>
    /// Holds a single host record
    /// </summary>
    public class Host
    {
        /// <summary>
        /// Initializes a new Host
        /// </summary>
        /// <param name="IPAddress">IP Address</param>
        /// <param name="Hostname">Hostname</param>
        /// <param name="Aliases">Collection of Alias names (set to null to exclude)</param>
        public Host (string IPAddress, string Hostname, IEnumerable<string> Aliases)
        {
            this.IPAddress = IPAddress;
            this.Hostname = Hostname;
            this.Aliases = new List<string>();

            if (Aliases != null)
            {
                this.Aliases.AddRange(Aliases);
            }
        }

        /// <summary>
        /// Get or Set the IP Address
        /// </summary>
        public string IPAddress
        {
            get;
            set;
        }

        /// <summary>
        /// Get or Set the Hostname
        /// </summary>
        public string Hostname
        {
            get;
            set;
        }

        /// <summary>
        /// Get or Set the collection of Aliases 
        /// </summary>
        public List<string> Aliases
        {
            get;
            set;
        }
    }
}

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.IO;

namespace HostsFileParser1
{
    public class Program
    {
        public static void Main (string [] args)
        {
            List<Host> Results = ParseHostFile("hosts.txt");
        }

        /// <summary>
        /// Parses out a hosts file, returning the results in a collection of Hosts
        /// </summary>
        /// <param name="Filename">Filename of the hosts file</param>
        /// <returns>Returns a collection Host records</returns>
        public static List<Host> ParseHostFile (string Filename)
        {
            List<Host> Hosts = new List<Host>();

            if (File.Exists(Filename))
            {
                string HostsFile = File.ReadAllText(Filename);

                MatchCollection matches = Regex.Matches(HostsFile, @"^[\s]*(?<IP>[\d\.]+)[\s]+(?<Hostname>[A-Za-z0-9\-\.]+)(?<Aliases>([ \t]+[A-Za-z0-9\-\.]+)+)?", RegexOptions.Multiline);
                foreach (Match match in matches)
                {
                    Hosts.Add(new Host(match.Groups ["IP"].Value, match.Groups ["Hostname"].Value, match.Groups ["Aliases"].Value.Split(new char [] { ' ', '\t' }, StringSplitOptions.RemoveEmptyEntries)));
                }
            }

            return Hosts;
        }
    }
}