Splitting a string, delimiters included

ddanbe

This piece of code is far from perfect, but it works!
It produces a list of substrings (consisting of digits and letters) and their delimiters.
If it is practically a sin to manipulate the index of a for loop, then I'm a sinner.
If two delimiters follow each other, an empty string is produced between the two.
If anyone can improve this, please do.
Perhaps solutions with Regex or LINQ are possible.
Or perhaps the great Ketsuekiame can again come up with a one liner? ;)

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace StringSplit
{
    class Program
    {
        static void Main(string[] args)
        {
            string str = "abc,123.xyz;praise;end,file,clear";
            List<string> myText = SplitWithDelimiters(str);
            foreach (var item in myText)
            {
                Console.WriteLine(" item: {0}", item);
            }
            Console.ReadKey();
        }

        static List<string> SplitWithDelimiters(string str)
        {
            List<string> SplittedStringList = new List<string>();
            for (int i = 0; i < str.Length; i++)
            {
                string temp = string.Empty;
                int index = i;
                bool last = false;
                while (char.IsLetterOrDigit(str[index]))
                {
                    temp = temp + str[index];
                    if (index < str.Length - 1)
                    {
                        index++;
                    }
                    else
                    {
                        last = true;
                        i = str.Length;
                        break;
                    }
                }
                SplittedStringList.Add(temp);
                if (!last)
                {
                    i = index;
                    SplittedStringList.Add(str[index].ToString());
                }
            }
            return SplittedStringList;
        }
    }
}
Ketsuekiame 860 Master Poster Featured Poster

Was that a challenge? ;)

EDIT:

Done :)

using Common;
using System;
using System.Collections.Generic;
using System.Linq;

namespace StringSplitTest
{
    public class MainClass : IExecutionClass
    {
        const string str = "abc,123.xyz;praise;end,file,clear";
        public bool Execute(params object[] programData)
        {
            SplitStringByNonAlphanumerics(str).ForEach((s) => Console.WriteLine(s));

            return true;
        }

        public List<string> SplitStringByNonAlphanumerics(string stringToSplit)
        {
            // Collect every non-alphanumeric character as a separator, then split on them.
            return stringToSplit.Split(stringToSplit.Where(c => !char.IsLetterOrDigit(c)).ToArray(), StringSplitOptions.RemoveEmptyEntries).ToList();
        }
    }
}

Note: The Common library contains interfaces for my test framework, and it is used here only so that I could implement IExecutionClass.

Everything else is included in the standard framework.

commented: Nice! +14
deceptikon 1,790 Code Sniper Team Colleague Featured Poster

"Done :)"

The result set must include delimiters, not remove them. ;) Try this little one-liner:

var parts = Regex.Split("abc,123.xyz;praise;end,file,clear", @"([\.,;])");
ddanbe commented: Nice! +14
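
For readers who want to try the capture-group trick on arbitrary delimiters, here is a minimal sketch. It relies on the documented behaviour that Regex.Split includes the text of capturing groups in its result. The class and method names are made up for illustration, and the character class `[^\p{L}\p{Nd}]` is an assumption chosen to roughly mirror char.IsLetterOrDigit from the original snippet.

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexSplitSketch
{
    // Hypothetical helper (not from the thread): splits on any single
    // character that is neither a letter nor a decimal digit. Because the
    // pattern is wrapped in a capturing group, each delimiter is returned
    // as its own element between the substrings.
    public static string[] SplitKeepingDelimiters(string input)
    {
        return Regex.Split(input, @"([^\p{L}\p{Nd}])");
    }
}
```

Calling `RegexSplitSketch.SplitKeepingDelimiters("abc,123.xyz")` gives the five elements `abc`, `,`, `123`, `.`, `xyz`. Two adjacent delimiters produce an empty string between them, as in the original snippet.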
ddanbe 2,724 Professional Procrastinator Featured Poster

Great work guys!
But I guess we can now speak not only of the great Ketsuekiame (look here) but also of the great deceptikon, who clearly wins here.
Wish I knew more about LINQ and Regex. Buzy learning, so your posts are a great help. Thanks!

kplcjl commented: Unless you are on drugs, that's "busy" learning +3
Ketsuekiame 860 Master Poster Featured Poster

Damn, I missed that part about keeping the delimiters in. That's actually an easy change to make, but not worth it.
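
One way such a change might look, as a rough sketch only: this is not Ketsuekiame's code, and the class and method names are invented for illustration. It folds over the characters with LINQ's Aggregate, growing the current token for letters and digits and emitting each delimiter as its own entry.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LinqSplitSketch
{
    // Hypothetical helper (not from the thread): keeps delimiters in the result.
    public static List<string> SplitKeepingDelimiters(string input)
    {
        return input.Aggregate(
            new List<string> { string.Empty },   // start with one empty token
            (parts, c) =>
            {
                if (char.IsLetterOrDigit(c))
                {
                    parts[parts.Count - 1] += c; // extend the current token
                }
                else
                {
                    parts.Add(c.ToString());     // the delimiter as its own entry
                    parts.Add(string.Empty);     // begin the next token
                }
                return parts;
            });
    }
}
```

Note that, unlike the original snippet, this sketch leaves a trailing empty string when the input ends with a delimiter. It also concatenates strings in a loop, so it is aimed at brevity rather than speed.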

Regex is far superior at this task, but I know very little Regex :)

@Deceptikon: Out of interest, what's the performance difference between Regex and the LINQ statements I create?

deceptikon 1,790 Code Sniper Team Colleague Featured Poster

"@Deceptikon: Out of interest, what's the performance difference between Regex and the LINQ statements I create?"

That's something I'd profile if it worried me, which it doesn't in this case. My guess is that LINQ would win out most of the time, but with such a simple pattern and a longer string, the expression-building logic of LINQ could outweigh the setup cost of a regex parse.

The methods of solving the problem are different enough to make a comparison difficult though. ;)
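
For anyone who does want to profile it, a crude Stopwatch harness could look like the sketch below. The class name, the TimeMs helper and the iteration count are all assumptions made for illustration, and the two calls are not equivalent (one keeps delimiters, the other drops them), so the numbers are only a rough indication.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

static class SplitTimingSketch
{
    // Hypothetical helper: runs the action once as a warm-up, then times
    // `iterations` further runs and returns the elapsed milliseconds.
    public static long TimeMs(Action action, int iterations)
    {
        action(); // warm-up so one-time setup cost is not measured
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        sw.Stop();
        return sw.ElapsedMilliseconds;
    }

    public static void Run()
    {
        const string input = "abc,123.xyz;praise;end,file,clear";
        const int iterations = 100000; // arbitrary choice for this sketch

        long regexMs = TimeMs(() => Regex.Split(input, @"([\.,;])"), iterations);
        long linqMs = TimeMs(() => input.Split(
            input.Where(c => !char.IsLetterOrDigit(c)).ToArray(),
            StringSplitOptions.RemoveEmptyEntries), iterations);

        Console.WriteLine("Regex.Split:  {0} ms", regexMs);
        Console.WriteLine("LINQ + Split: {0} ms", linqMs);
    }
}
```

Results will vary with the runtime, the build configuration and the input length, so it is worth repeating the run with strings of the size you actually expect.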

kplcjl 17 Junior Poster

I didn't want to read the comments because they would probably come up with good answers and I didn't want to spoil my thoughts before trying something.
There are several problems with this code. First off, strings are immutable. Every time you alter a string, a new string is allocated in memory, and when you reassign the variable to the new value, the old value becomes eligible for garbage collection. Read up on interning, it might help you understand it better. Also read up on StringBuilder, which is designed to build up strings efficiently, character by character.
I messed up at first by not understanding what you were doing, so it was good that I kept your original code as a control. Added one ; after a ;. Whoops, had to fix that too. Anyway, here is my alternate solution.

        static List<string> SplitWithDelimitersN(string str)
        {
            List<string> SplittedStringList = new List<string>();
            int len = 0;
            string temp = string.Empty;
            for (int i = 0; i < str.Length; i++)
            {
                if (char.IsLetterOrDigit(str[i]))
                    len++;
                else
                {
                    SplittedStringList.Add(str.Substring(i-len,len));
                    SplittedStringList.Add(str[i].ToString());
                    len=0;
                }
            }
            return SplittedStringList;
        }
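
For comparison, the StringBuilder suggestion above could look roughly like the following. This is a sketch rather than code from the thread: the class name is invented, and StringBuilder.Clear assumes .NET 4 or later. It appends characters to a single buffer instead of allocating a new string for every character.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class StringBuilderSplitSketch
{
    // Hypothetical variant of SplitWithDelimiters that accumulates each
    // token in a StringBuilder instead of concatenating strings.
    public static List<string> SplitWithDelimiters(string str)
    {
        var parts = new List<string>();
        var token = new StringBuilder();

        foreach (char c in str)
        {
            if (char.IsLetterOrDigit(c))
            {
                token.Append(c);             // grow the current token in place
            }
            else
            {
                parts.Add(token.ToString()); // may be empty between two delimiters
                parts.Add(c.ToString());     // the delimiter itself
                token.Clear();
            }
        }

        // Emit whatever follows the last delimiter, if anything.
        if (token.Length > 0)
        {
            parts.Add(token.ToString());
        }
        return parts;
    }
}
```

For tokens this short the difference is unlikely to be measurable; the pattern matters more when the substrings are long.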
kplcjl 17 Junior Poster

Whoops declare temp and never use it!

kplcjl 17 Junior Poster

ddanbe: thanks for the endorsement. Caused me to relook at my code. Wait... That shouldn't work! Yep, didn't notice the last field isn't included in the List and it didn't work correctly. After the loop finishes, add:

            if (len > 0)
                SplittedStringList.Add(str.Substring(str.Length - len, len));
xrjf 213 Posting Whiz Premium Member

If you're looking for concision, you may code something like:

        static void Main(string[] args)
        {
            string str = "abc,123.xyz;praise;end,file,clear";
            Console.WriteLine(String.Join("\r\n",
                Regex.Split(str, @"(\,|\.|\;)")));
            Console.ReadLine();
        }
ddanbe commented: Hey, nice tip! +15