I am cleaning up the code in an e-pub file. It was converted from a PDF, and a number of line breaks occur where they are not wanted. I am using the following syntax to find the errant breaks: </p>\n\n <p class="calibre2">[a-z]

For the replacement, I wish to insert a space followed by whatever character was matched by the [a-z].

Is this feasible with only a regular expression replace, no script?

Member Avatar for LastMitch

Is this feasible with only a regular expression replace, no script?

What does it have to do with HTML or CSS?

The question you posted is more related to code.

What language are you using: PHP, JavaScript, ASP.NET, or Java?

Is the code </p>\n\n <p class="calibre2"> not HTML? Last time I looked, it is. I think it is relevant, since others might want to learn about code clean-up using regular expressions.

As for language, editing e-pubs is a form of HTML and CSS in a very structured container file. So, I think this post is entirely relevant for this area of the site.

Would you care to contribute how to answer my question?

Member Avatar for LastMitch

Last time I looked, it is. I think it is relevant, since others might want to learn about code clean-up using regular expressions.

When you mention regular expression it means this:

http://webcheatsheet.com/php/regular_expressions.php

As for language, editing e-pubs is a form of HTML and CSS in a very structured container file.

Tell me, what is the issue? Are you getting backslashes in your database?

Would you care to contribute how to answer my question?

Explain to me what you are doing with this:

</p>\n\n <p class="calibre2"> 

You're over-complicating it. I am not talking about a regex in association with PHP or about putting anything in a database.

Is it possible in a good text editor that supports regular expressions to perform the replace substituting what is found (using the criteria in my original post) with a space and whatever letter is found?

</p>\n\n <p class="calibre2">[a-z] 

Regular expression search finds

</p>\n\n <p class="calibre2">e

I want to replace automatically with

 e (think of a space in front of the e; it does not show up with code syntax)

Can this be done in the editor?
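
For anyone who wants to try the idea outside an editor first, here is a minimal sketch in Python. It is not from the thread: the sample text is made up, and it uses a lookahead rather than the plain [a-z] from the question, assuming the regex engine supports lookaheads. The lookahead checks that a lowercase letter follows the break without consuming it, so the replacement only needs to be a single space.

```python
import re

# Made-up sample shaped like the converted e-pub markup in the question.
html = (
    '<p class="calibre2">The quick brown</p>\n\n'
    ' <p class="calibre2">fox jumps.</p>\n\n'
    ' <p class="calibre2">A new paragraph.</p>'
)

# (?=[a-z]) requires a lowercase letter after the opening tag but does not
# consume it, so the letter is left in place and only the tags, newlines,
# and leading space are replaced.
pattern = re.compile(r'</p>\n\n <p class="calibre2">(?=[a-z])')

joined = pattern.sub(' ', html)
print(joined)
```

The break before "fox" is joined with a space, while the break before "A new paragraph." is left alone because it starts with an uppercase letter.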

Member Avatar for LastMitch

Is it possible in a good text editor that supports regular expressions to perform the replace substituting what is found (using the criteria in my original post) with a space and whatever letter is found?

The answer is Yes.

Try this:

http://code.google.com/p/sigil/

It's an EPUB editor, made for editing EPUB files.

When you mentioned regular expressions, it suggested to me that you needed code to remove the space, which is why I asked what language you are using.

Usually a few lines of code will remove that space, depending on which language you are using.

But for EPUB you need a text editor to do that.

EPUB is a file format similar to MOBI or PDF.

You can also take a look at this (to see how CSS works with EPUB editor):

http://epubbliss.com/word-to-epub-3-clean-up-filtered-html/

You need to play around with the editor to find the right CSS code to match & replace (substituting) letter / space.

Then what is the syntax to have the replace take the letter it finds and strip out the HTML around it?

I know about Sigil, and Calibre for that matter. It was the conversion from PDF to e-pub using Calibre that created the less-than-perfect results I am trying to clean up using Sigil and regular expressions.

So, my question is still relevant for this forum! At the heart of it I am trying to clean up HTML syntax using regular expressions.

What is the syntax to have the replace take the letter it finds and strip out the HTML around it?

What is the syntax to have the replace take the letter it finds and strip out the HTML around it?

A backreference, something like this:

</p>\n\n (<p class="calibre2">)([a-z]) 

The replace would be:

$1 $2

The actual syntax depends on the regex engine used by your editor.
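
As a rough illustration of how the backreference behaves, here is a sketch in Python, which writes backreferences as \1 and \2 rather than $1 and $2. The sample text is made up. Note that with the pattern as posted, group 1 is the opening tag, so the replacement keeps it; capturing only the letter is a variant that drops both tags, which seems closer to what the question asks for.

```python
import re

# Made-up sample with one unwanted paragraph break.
html = '<p class="calibre2">line one</p>\n\n <p class="calibre2">e continues here.</p>'

# Pattern as posted: group 1 is the opening tag, group 2 is the letter.
posted = re.compile(r'</p>\n\n (<p class="calibre2">)([a-z])')
# Python equivalent of the "$1 $2" replacement; the opening tag is kept.
keeps_tag = posted.sub(r'\1 \2', html)

# Variant: capture only the letter, so the replacement is a space
# followed by that letter and both tags are stripped.
letter_only = re.compile(r'</p>\n\n <p class="calibre2">([a-z])')
drops_tag = letter_only.sub(r' \1', html)

print(keeps_tag)
print(drops_tag)
```

Some editors expect \1 in the replace field and others $1, so it is worth checking which form your editor uses.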

I am using Sigil as my editor, and it does not accept the search string you cited. I get a "not found" result.

Thanks for the try.
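
One possible cause of a "not found" result, offered only as a guess since the thread never confirms it, is that the whitespace between the tags is not exactly two newlines and one space (for example, Windows line endings or a tab). A sketch in Python with a made-up sample shows how \s* makes the pattern tolerant of that:

```python
import re

# Hypothetical sample: Windows line endings and a tab instead of "\n\n ".
html = '<p class="calibre2">line one</p>\r\n\r\n\t<p class="calibre2">e continues.</p>'

# The literal pattern from the question expects exactly "\n\n " here.
strict = re.compile(r'</p>\n\n <p class="calibre2">([a-z])')
# \s* accepts any run of whitespace (or none) between the two tags.
tolerant = re.compile(r'</p>\s*<p class="calibre2">([a-z])')

strict_hits = len(strict.findall(html))
tolerant_result = tolerant.sub(r' \1', html)

print(strict_hits)
print(tolerant_result)
```

If the tolerant form matches where the strict one does not, the whitespace in the file is the thing to inspect.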

Member Avatar for LastMitch

What is the syntax to have the replace take the letter it finds and strip out the HTML around it?

Did you read what I wrote in my last sentence?

You need to play around with the editor to find the right CSS code to match & replace (substituting) letter / space.

I am being very direct. My impression is that you don't know how to do this. You can't keep going in circles around this issue, which is what you are doing. You just want the answer; that's all you want. You don't want to take the time to debug this issue.

Even pritaeas posted a useful code snippet, and you said it doesn't work. To me it looks right.

Bottom line is that you don't want to do the work.
