Hello all.

If there is one aspect of any language I cannot understand it is regex.

I have some strings I need to parse, of varying length and content, but there are patterns.

There are 2 base types.

The first always begins with %zn%. It can be short or long, and there can be multiple blocks on one line, separated by what seems to be a SPACE char.

For instance:

%zn%pram-1%shop=mayfair%15.99$7% %zn%buggy-1%shop=humbug%7.0$9% etc.. and sometimes just one block per string.

I get to the %zn% part by using the IndexOf member of the string class, but what I need is everything after %zn%, right up to the space or end of string.

I'm totally hopeless with regex and hoping for a start and some schooling on why/how it works.

I'd really appreciate any help anyone can offer.

Suze

What seems to be a SPACE char is not always a SPACE char. Did you perform a hexdump on your strings to find that out?
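In case it helps, here is a minimal sketch of a hexdump in C#. The HexDump class, the ToHex helper, and the sample bytes are made up for illustration; you would pass in your own byte array instead.

```csharp
using System;

static class HexDump
{
    // Returns every byte as two hex digits, separated by spaces.
    public static string ToHex(byte[] data)
    {
        // BitConverter.ToString yields "61-20-62"; swap the dashes for spaces.
        return BitConverter.ToString(data).Replace("-", " ");
    }

    static void Main()
    {
        // Hypothetical sample: 'a', a real space, 'b', a zero byte, 'c'.
        byte[] sample = { 0x61, 0x20, 0x62, 0x00, 0x63 };
        Console.WriteLine(ToHex(sample)); // prints "61 20 62 00 63"
    }
}
```

A real space shows up as 20. Anything else at the position where you expect a space is a different character.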

Nope, I wouldn't know how :(

I first do this with the data:

string text = asciiEncoding.GetString(Data);
// Look up the marker once and reuse the index.
int start = text.IndexOf("%zn%");
if (start > -1)
{
    string text2 = text.Substring(start);
    Console.WriteLine(text2);
}

So I'm assuming that the (space) is some form of whitespace character.

Hi, use \s to match whitespace:

using System;
using System.Text.RegularExpressions;

namespace RegularExpression
{
    class Program
    {
        public static bool ValidateText(string regex, string text)
        {
            // Create a new Regex based on the specified regular expression.
            Regex r = new Regex(regex);
            // Test if the specified text matches the regular expression.
            return r.IsMatch(text);
        }

        public static void Main(string[] args)
        {
            // Inside [...] a + is a literal plus sign, so the quantified \s+ goes outside the class.
            Console.WriteLine("Test trailing whitespace: {0}", ValidateText(@"^[\w-]+\s+$", "this "));
            Console.WriteLine("Test leading whitespace: {0}", ValidateText(@"^\s+[\w-]+$", " this"));
            Console.WriteLine("Test middle whitespace: {0}", ValidateText(@"^[\w-]+\s+[\w-]+$", "this is"));

            Console.ReadLine();
        }
    }
}

Hope it helps
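Applied to the %zn% blocks from the original question, the same idea might look like the sketch below. It assumes the separator really is whitespace; the ZnParser class and ExtractBlocks helper are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ZnParser
{
    // %zn%  : the literal marker that starts every block
    // (\S*) : capture group 1, zero or more non-whitespace characters,
    //         so it stops at the first whitespace or at the end of the string
    static readonly Regex BlockPattern = new Regex(@"%zn%(\S*)");

    // Returns the text that follows each %zn% marker.
    public static List<string> ExtractBlocks(string text)
    {
        var blocks = new List<string>();
        foreach (Match m in BlockPattern.Matches(text))
        {
            blocks.Add(m.Groups[1].Value);
        }
        return blocks;
    }

    static void Main()
    {
        string text = "%zn%pram-1%shop=mayfair%15.99$7% %zn%buggy-1%shop=humbug%7.0$9%";
        foreach (string block in ExtractBlocks(text))
        {
            Console.WriteLine(block);
        }
        // Prints:
        // pram-1%shop=mayfair%15.99$7%
        // buggy-1%shop=humbug%7.0$9%
    }
}
```

If the separator turns out to be some other invisible character, \S* will run straight through it and you will get one long match instead of two.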

Thank you for the reply, I'm not sure how that can help me yet, but I'm working on it, cheers.

I think I've found that the char (or whatever it is) is not a space.

char[] delimit = new char[] { ' ' };

foreach (string substr in text2.Split(delimit))
{
    System.Console.WriteLine(substr);
}

This splits the second part of my data, where I know there are spaces, but not at the point where I'm expecting a split.

I think I viewed the hex (or binary, I don't know), and at the point where I'm trying to split it, it shows up as '00', so maybe some sort of hex null terminator or something?

Thanks again.

The second base part of my problem is to try and extract everything between these two tags:

<img i='ind'> and </img> (looks like some sort of xml)

Between those tags there are other tags (not of the same name), chars, special chars, digits, and whitespace.

I hate the feeling, but I feel way out of my depth when it comes to string manipulation.
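For the tag part, a lazy match is one way to start. This is only a sketch: it assumes the opening tag is always written exactly as <img i='ind'> and that no other img tag appears inside it. ImgExtractor, ExtractInner, and the sample string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

static class ImgExtractor
{
    // <img i='ind'> : the literal opening tag
    // (.*?)         : capture group 1, as few characters as possible (lazy),
    //                 so the match ends at the first closing tag
    // </img>        : the literal closing tag
    // RegexOptions.Singleline lets '.' match newline characters as well.
    static readonly Regex ImgPattern =
        new Regex(@"<img i='ind'>(.*?)</img>", RegexOptions.Singleline);

    // Returns the text between the tags, or null when the tags are not found.
    public static string ExtractInner(string text)
    {
        Match m = ImgPattern.Match(text);
        return m.Success ? m.Groups[1].Value : null;
    }

    static void Main()
    {
        // Hypothetical sample containing another tag, digits, and a newline.
        string text = "junk<img i='ind'><b>12</b> a&b\n$9</img>more";
        Console.WriteLine(ExtractInner(text));
    }
}
```

If the data is well-formed XML, an XML parser would likely be more robust than a regex, but for a fixed pair of tags like this the pattern above is often enough.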

'00' is the NUL character, the char with byte code 0, used as a string terminator in C.
A space would show as '20'.
You could represent it as (char)0.
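Splitting on that character could then look like the following sketch. It assumes the blocks are separated by a single NUL, written here as '\0' (the same value as (char)0); NulSplitter and SplitOnNul are names invented for the example.

```csharp
using System;

static class NulSplitter
{
    // Splits the text at every NUL character and drops empty pieces,
    // for example the one left behind by a trailing NUL.
    public static string[] SplitOnNul(string text)
    {
        return text.Split(new[] { '\0' }, StringSplitOptions.RemoveEmptyEntries);
    }

    static void Main()
    {
        // Sample data from the question, with NUL in place of the assumed space.
        string text = "%zn%pram-1%shop=mayfair%15.99$7%\0%zn%buggy-1%shop=humbug%7.0$9%";
        foreach (string block in SplitOnNul(text))
        {
            Console.WriteLine(block);
        }
    }
}
```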

Thanks, I was thinking that but never knew how to deal with it.

cheers all.

solved.
