Hello,
I have written code that strips HTML tags using regex, but I have an issue.
The resulting file now contains runs of multiple tabs.

example:

nesa<tab><tab>pera<tab><tab><tab><tab>nn<tab><tab><tab><tab><tab><tab>kkn

How can I remove the extra tabs and keep only one tab?
example:

nesa<tab>pera<tab>nn<tab>kkn

Thanks in advance

All 9 Replies

Specifically for <tab>

Match: (<tab>){2,20}
Replace: <tab>

This will match between 2 and 20 occurrences of <tab> in a row; you can adjust the range as needed. It works on your provided example:

Input: nesa<tab><tab>pera<tab><tab><tab><tab>nn<tab><tab><tab><tab><tab><tab>kkn
Output: nesa<tab>pera<tab>nn<tab>kkn

Can you please post the full code for this?
Thanks in advance

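        // Requires: using System.Text.RegularExpressions;
        string inputString = "nesa<tab><tab>pera<tab><tab><tab><tab>nn<tab><tab><tab><tab><tab><tab>kkn";
        string regexMatch = "(<tab>){2,20}";   // 2 to 20 consecutive "<tab>" tokens
        string regexReplace = "<tab>";
        string outputString;

        // Replace each matched run with a single "<tab>"
        outputString = Regex.Replace(inputString, regexMatch, regexReplace);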

Why limit yourself to 20? You could use either of these:

  • (<tab>){2,}
  • (<tab>(?:<tab>)+)

See the regular expression Quick Reference on MSDN for more info. Both are simple ways to match two or more tabs. Also, are you looking for the literal string "<tab>" or for actual tab characters? For tab characters you would use the character escape \t, and note that there is also \s, which matches any white-space character, should you be so inclined.
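In case the input holds real tab characters rather than the text "<tab>", here is a minimal sketch of the same idea using the \t escape with the open-ended {2,} quantifier. The class and method names (TabCollapser, CollapseTabs) are made up for illustration and are not from the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TabCollapser
{
    // Hypothetical helper: collapses every run of two or more
    // real tab characters into a single tab.
    public static string CollapseTabs(string input)
    {
        // @"\t{2,}" is a verbatim string, so the regex engine sees \t
        // and interprets it as the tab character.
        return Regex.Replace(input, @"\t{2,}", "\t");
    }

    public static void Main()
    {
        string input = "nesa\t\tpera\t\t\t\tnn\t\t\t\t\t\tkkn";
        string output = CollapseTabs(input);

        // Print with tabs made visible so the result is easy to check.
        Console.WriteLine(output.Replace("\t", "<tab>"));
        // prints: nesa<tab>pera<tab>nn<tab>kkn
    }
}
```

Swapping \t for \s would collapse runs of any white-space, including spaces and newlines, which may or may not be what you want.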

Using the method by MikeyIsMe gives me

Output: nesa<tab>pera<tab>nn<tab>kkn

But when I try to replace <tab> (it's a real tab, so \t) with , as the delimiter, I get
nesa,,pera,,,, nn,kkn,,,,
since there are many \t in the input string.
Even if I use "(<tab>){2,20000}", it gives me ,,,, and not ,
I will try the + method today.

@nmaillet, I didn't know you could leave the second part of the count specifier blank to match any number of repetitions :)

@nesa24casa - here is the fix for my solution given your updated requirements, using nmaillet's open-ended {2,} quantifier. I also made it match both <tab> and \t to cover all options :) Simply remove |<tab> from the regex if you don't need it covered.

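        string inputString = "nesa\t\tpera<tab><tab><tab><tab>nn\t\t\t\t\t\tkkn";
        string regexMatch = "(\t|<tab>){2,}";   // two or more real tabs or "<tab>" tokens
        string regexReplace = ",";
        string outputString = Regex.Replace(inputString, regexMatch, regexReplace);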
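One thing to watch with {2,}: a single tab on its own is not matched, so it would stay a tab instead of becoming a comma. If the goal is a comma wherever there is at least one tab, a + quantifier may fit better. The following is only a sketch under that assumption; TabsToCommas and ToCommaDelimited are hypothetical names, and the trailing "\tx" in the sample input is added here just to show the single-tab case.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TabsToCommas
{
    // Hypothetical helper: replaces each run of one or more tab characters
    // (or literal "<tab>" tokens) with a single comma.
    public static string ToCommaDelimited(string input)
    {
        // (?:...) is a non-capturing group; + means "one or more".
        return Regex.Replace(input, @"(?:\t|<tab>)+", ",");
    }

    public static void Main()
    {
        string input = "nesa\t\tpera<tab><tab><tab><tab>nn\t\t\t\t\t\tkkn\tx";
        Console.WriteLine(ToCommaDelimited(input));
        // prints: nesa,pera,nn,kkn,x
    }
}
```

Leading or trailing tabs in the input would still produce a leading or trailing comma, so trim the string first if that matters.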

Thanks Mikey, I will try it later today and post the result.

Thanks
Solved

You're welcome, please mark the post as solved :)
