I am new to C#.
I want to replace characters such as (" ' & < > ¡ ¢ £ ¤ ¥ ¦ § ¨ © ª « ¬ ² ³ ´ µ ¶ · ¸ and so on) with their numeric entity references, like (&#34; &#39; &#38; &#60; &#62; &#160; &#161; &#162; &#163; &#164; &#165; ...), in an HTML file using C#.

For example:

    <html>
    <head>
    <title>Sample</title>
    </head>
    <body>
    <div>
    <h2 class="chapternumber">1</h2>
    <p class="epigraph">Regarding the Great Depression. Milton Friedman, November 8, 2002</p>
    <p class="indent">'Cutler' plunged to his death Ànddy</p>
    <p class="indent">"Mr. Cutler" was reported  it.</p>
    </div>
    </body>
    </html> 

Please help me.

Edited 3 Years Ago by bullet_1

I managed to produce something like this:

    &#60;html&#62;
    &#60;head&#62;
    &#60;title&#62;Sample&#60;&#47;title&#62;
    &#60;&#47;head&#62;
    &#60;body&#62;
    &#60;div&#62;
    &#60;h2 class&#61;&#34;chapternumber&#34;&#62;1&#60;&#47;h2&#62;
    &#60;p class&#61;&#34;epigraph&#34;&#62;Regarding the Great Depression&#46; Milton Friedman&#44; November 8&#44; 2002&#60;&#47;p&#62;
    &#60;p class&#61;&#34;indent&#34;&#62;&#39;Cutler&#39; plunged to his death &#65533;nddy&#60;&#47;p&#62;
    &#60;p class&#61;&#34;indent&#34;&#62;&#34;Mr&#46; Cutler&#34; was reported it&#46;&#60;&#47;p&#62;
    &#60;&#47;div&#62;
    &#60;&#47;body&#62;
    &#60;&#47;html&#62;

using the following code:

    using System.IO;
    using System.Text.RegularExpressions;

    string initContent = File.ReadAllText("test.txt");

    // Replace every character that is not an ASCII letter, digit, space, tab,
    // or line break with its numeric entity. A single pass means the '&', '#'
    // and ';' of entities already written are never matched again.
    string encoded = Regex.Replace(initContent, @"[^a-zA-Z0-9 \t\r\n]",
        m => string.Format("&#{0};", (int)m.Value[0]));

    File.WriteAllText("output.txt", encoded);

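It's just an idea. It can probably be optimized, and the regex pattern might still need some adjustments. Note that the &#65533; in the output is the Unicode replacement character (U+FFFD), which suggests test.txt was not saved in the encoding that File.ReadAllText assumed (UTF-8 by default), so the À was lost when the file was read.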
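If only the characters listed in the question need converting, a loop over the string may be easier to adjust than a regex. The following is a minimal sketch under that assumption, not a definitive implementation: `EntityEncoder` and `ToNumericEntities` are hypothetical names, and the rule "encode the five markup characters plus everything outside ASCII" is my reading of the question. Like the output above, it also encodes the `<` and `>` of the tags themselves.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
static class EntityEncoder
{
    // Encodes " ' & < > and every non-ASCII character as a decimal
    // numeric entity; all other characters are copied unchanged.
    public static string ToNumericEntities(string input)
    {
        var sb = new StringBuilder(input.Length);
        for (int i = 0; i < input.Length; i++)
        {
            char c = input[i];
            if (char.IsHighSurrogate(c) && i + 1 < input.Length
                && char.IsLowSurrogate(input[i + 1]))
            {
                // A surrogate pair is a single code point, so emit one entity.
                sb.Append("&#").Append(char.ConvertToUtf32(c, input[i + 1])).Append(';');
                i++; // skip the low surrogate
            }
            else if (c > 127 || c == '"' || c == '\'' || c == '&' || c == '<' || c == '>')
            {
                sb.Append("&#").Append((int)c).Append(';');
            }
            else
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }
}
```

For example, passing `<p class="indent">'Cutler' Ànddy</p>` to `EntityEncoder.ToNumericEntities` gives `&#60;p class=&#34;indent&#34;&#62;&#39;Cutler&#39; &#192;nddy&#60;/p&#62;`, provided the À was read correctly from the file in the first place.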

Edited 3 Years Ago by TheApex
