Hi,

I am developing a WinForms app in 2010 Express. I want to be able to strip superscript and subscript characters out of the strings that I am cutting and pasting from web pages.

For example:

1 I, Nephi, having been aborn of bgoodly cparents, therefore I was dtaught somewhat in all the learning of my father; and having seen many eafflictions  in the course of my days, nevertheless, having been highly favored of the Lord in all my days; yea, having had a great knowledge of the goodness and the mysteries of God, therefore I make a frecord of my proceedings in my days.

The 'a' in aborn and the 'b' in bgoodly and the 'c' in cparents are superscript characters.

I am using regexp to get rid of any excess white space:

public void FormatText()
{
    Regex r = new Regex(@"\s+"); // matches any run of white space
    ClipboardOk = false;         // assume failure until text is actually read
    // Error trapping for Windows Clipboard errors.
    try
    {
        IDataObject iData = Clipboard.GetDataObject();
        if (iData != null && iData.GetDataPresent(DataFormats.Text))
        {
            // Clipboard contents as plain text.
            string rtbTemp = (string)iData.GetData(DataFormats.Text);
            // Collapse each run of white space into a single space.
            rtbVerse.Text = r.Replace(rtbTemp.Trim(), " ");
            ClipboardOk = true;
        }
    }
    catch (Exception ex)
    {
        MessageBox.Show(ex.Message);
    }
}

but I'm not familiar enough with regexp to know whether it can detect superscript/subscript. Has anyone had any experience with stripping superscript before?

How can you tell that the a in aborn is not part of the word, other than your knowledge of the English language?

Here is a link to the HTML page that I am cutting and pasting from:

http://scriptures.lds.org/en/1_ne/1

I have an app developed that eliminates the white spaces and puts the verse into a MySQL database. I have been trying to get an understanding of regex to be able to add to my FormatText() function, but so far I haven't been able to get a clear grasp on it.
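
One possible approach, offered as a rough sketch rather than a tested solution: once the text is on the clipboard as plain text, the superscript markup is gone, so a regex has nothing left to tell the 'a' in aborn apart from a real letter. The HTML clipboard format still carries the tags, though. The sketch below assumes the page wraps its footnote markers in <sup> or <sub> elements (I have not verified that against the linked page) and that those elements are not nested. The class name VerseCleaner and the method StripSupSub are made-up names for illustration only.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

public static class VerseCleaner
{
    // Markers the Windows HTML clipboard format places around the copied fragment.
    private const string StartMarker = "<!--StartFragment-->";
    private const string EndMarker = "<!--EndFragment-->";

    // A <sup> or <sub> element together with its content (assumes no nesting).
    private static readonly Regex SupSub = new Regex(
        @"<(sup|sub)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Any remaining tag or comment, and any run of white space.
    private static readonly Regex AnyTag = new Regex(@"<[^>]+>");
    private static readonly Regex WhiteSpace = new Regex(@"\s+");

    public static string StripSupSub(string html)
    {
        if (string.IsNullOrEmpty(html))
        {
            return string.Empty;
        }

        // If the clipboard header is present, keep only the copied fragment.
        int start = html.IndexOf(StartMarker, StringComparison.Ordinal);
        int end = html.IndexOf(EndMarker, StringComparison.Ordinal);
        if (start >= 0 && end > start)
        {
            int from = start + StartMarker.Length;
            html = html.Substring(from, end - from);
        }

        string text = SupSub.Replace(html, "");   // drop the markers and their letters
        text = AnyTag.Replace(text, "");          // drop every other tag, keep its text
        text = WebUtility.HtmlDecode(text);       // turn entities such as &amp; into characters
        return WhiteSpace.Replace(text, " ").Trim();
    }
}
```

Inside FormatText() this could be fed from the clipboard by checking Clipboard.ContainsText(TextDataFormat.Html) and passing Clipboard.GetText(TextDataFormat.Html) to StripSupSub, falling back to the existing plain-text path when no HTML is available. A regex is a blunt tool for HTML, so if the markup turns out to be more complicated than simple <sup> tags, a real HTML parser would be the safer choice.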
