Hello,
I wrote code that strips HTML tags using regex, but I have an issue.
The file now contains multiple consecutive tabs.

Example:

nesa<tab><tab>pera<tab><tab><tab><tab>nn<tab><tab><tab><tab><tab><tab>kkn

How can I remove the extra tabs and keep only one tab?
Example:

nesa<tab>pera<tab>nn<tab>kkn

Thanks in advance

Specifically for <tab>

Match: (<tab>){2,20}
Replace: <tab>

This matches between 2 and 20 occurrences of <tab> in a row; adjust the upper bound as needed. It works on your provided example.

Input: nesa<tab><tab>pera<tab><tab><tab><tab>nn<tab><tab><tab><tab><tab><tab>kkn
Output: nesa<tab>pera<tab>nn<tab>kkn

Edited by Mike Askew

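    // Requires: using System.Text.RegularExpressions;
    string inputString = "nesa<tab><tab>pera<tab><tab><tab><tab>nn<tab><tab><tab><tab><tab><tab>kkn";
    string regexMatch = "(<tab>){2,20}";   // 2 to 20 consecutive literal "<tab>" strings
    string regexReplace = "<tab>";

    // outputString is "nesa<tab>pera<tab>nn<tab>kkn"
    string outputString = Regex.Replace(inputString, regexMatch, regexReplace);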

Why limit yourself to 20? You could use either of these:

  • (<tab>){2,}
  • (<tab>(?:<tab>)+)

See the Quick Reference on MSDN (http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.100).aspx) for more info. Both are simple ways to match two or more tabs. Also, are you looking for the string "<tab>" or for tab characters? You would use the character escape \t for tab characters, and note that there is also \s, which matches any white-space character, should you be so inclined.
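To make the difference between the literal text "<tab>" and a real tab character concrete, here is a minimal sketch of both variants with the open-ended {2,} quantifier. The class and method names (TabRuns, CollapseLiteral, CollapseTabs) are made up for illustration and are not from the thread.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; the names are illustrative only.
public static class TabRuns
{
    // Collapses runs of two or more literal "<tab>" strings into a single "<tab>".
    public static string CollapseLiteral(string input) =>
        Regex.Replace(input, "(?:<tab>){2,}", "<tab>");

    // Collapses runs of two or more real tab characters into a single tab.
    // In a verbatim string, \t reaches the regex engine as the tab escape.
    public static string CollapseTabs(string input) =>
        Regex.Replace(input, @"\t{2,}", "\t");
}
```

For example, TabRuns.CollapseTabs("nesa\t\tpera\t\t\t\tnn") should return "nesa\tpera\tnn". Swapping \t for \s would also swallow spaces and newlines, which is probably not wanted in a tab-delimited file.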

Using the method by MikeyIsMe gives me

Output: nesa<tab>pera<tab>nn<tab>kkn

But when I try to replace <tab> (it is a real tab, so \t) with , as the delimiter, I get
nesa,,pera,,,, nn,kkn,,,,
since there are many \t characters in the input string.
Even if I use "(<tab>){2,20000}", it gives me ,,,, and not ,
I will try the + method today.
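One plausible reading of the result above (an assumption, since the full code is not shown) is that the pattern looks for the literal text "<tab>" while the file contains real tab characters, so nothing is collapsed before the tabs are turned into commas. The sketch below reproduces that situation and shows a variant that targets \t directly. The class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class; names are illustrative only.
public static class TabDelimiterDemo
{
    // Assumed reproduction of the problem: the pattern matches the literal
    // text "<tab>", never a real tab, so every tab becomes its own comma.
    public static string CollapseLiteralThenReplace(string input)
    {
        string collapsed = Regex.Replace(input, "(<tab>){2,20000}", "<tab>");
        return collapsed.Replace("\t", ",");
    }

    // Treats each run of one or more real tab characters as one delimiter.
    public static string TabRunsToComma(string input) =>
        Regex.Replace(input, @"\t+", ",");
}
```

With the input "nesa\t\tpera\t\t\t\tnn", the first method returns "nesa,,pera,,,,nn" and the second returns "nesa,pera,nn".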

@nmaillet, I didn't know you could leave the upper bound of the count specifier blank to allow any number of repetitions :)

@nesa24casa - here is the fix for my solution with your updated requirements, using nmaillet's open-ended quantifier. I also made it match both <tab> and \t for the sake of covering all options :) Simply remove |<tab> from the regex if you don't need it covered.

    string inputString = "nesa\t\tpera<tab><tab><tab><tab>nn\t\t\t\t\t\tkkn";
    string regexMatch = "(\t|<tab>){2,}";   // runs of two or more tabs, real or literal "<tab>"
    string regexReplace = ",";

    // outputString is "nesa,pera,nn,kkn"
    string outputString = Regex.Replace(inputString, regexMatch, regexReplace);

Edited by Mike Askew
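One thing to watch when the replacement is a different delimiter such as a comma: {2,} only matches runs of two or more, so a lone tab would stay a tab. The sketch below contrasts that with the + quantifier, which also converts single tabs. It assumes every tab in the input is a field separator; the class and method names are hypothetical.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper class; names are illustrative only.
public static class DelimiterConverter
{
    // The pattern from the post above: a single tab is left untouched.
    public static string TwoOrMoreToComma(string input) =>
        Regex.Replace(input, "(?:\t|<tab>){2,}", ",");

    // One or more tabs (real or literal "<tab>") become exactly one comma.
    public static string OneOrMoreToComma(string input) =>
        Regex.Replace(input, "(?:\t|<tab>)+", ",");
}
```

For the input "a\tb\t\tc", TwoOrMoreToComma returns "a\tb,c", while OneOrMoreToComma returns "a,b,c".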
