Replace using POSIX regular expressions

TkTkorrovi

Replaces every occurrence of the pattern, or only the first occurrence if there is a subexpression (between \( and \)) anywhere in the regular expression, as a repeated replace is not what one would expect in that case. The string size is restricted in POSIX regular expressions to the range of regoff_t, which is an int here, so approximately 32 kbytes if int is 16 bits (see the correction in the comments), but otherwise such a replace should be enough for anything necessary in real life.

/* replace using posix regular expressions */
#include <stdio.h>
#include <string.h>
#include <regex.h>

/* Replace matches of re in buf (capacity size bytes) with rp.
   \1..\9 in rp are first expanded in place from the first match,
   so rp must be writable and have room for size bytes as well.
   Returns 0 on success or no match, 1 if something does not fit. */
int rreplace (char *buf, int size, regex_t *re, char *rp)
{
    char *pos;
    int sub, so, n, flags = 0;
    regmatch_t pmatch [10]; /* regoff_t is int so size is int */

    if (regexec (re, buf, 10, pmatch, 0)) return 0;
    for (pos = rp; *pos; )
        if (*pos == '\\' && *(pos + 1) > '0' && *(pos + 1) <= '9') {
            so = pmatch [*(pos + 1) - '0'].rm_so;
            n = pmatch [*(pos + 1) - '0'].rm_eo - so;
            if (so < 0 || strlen (rp) + n - 1 > (size_t) size) return 1;
            /* shift the tail (with its NUL), then copy the submatch in */
            memmove (pos + n, pos + 2, strlen (pos) - 1);
            memmove (pos, buf + so, n);
            pos += n; /* continue after the inserted text */
        } else pos++;
    sub = re->re_nsub > 0; /* no repeated replace with a subexpression */
    for (pos = buf; !regexec (re, pos, 1, pmatch, flags); ) {
        n = pmatch [0].rm_eo - pmatch [0].rm_so;
        pos += pmatch [0].rm_so;
        /* the new length plus the terminating NUL must fit in size */
        if (strlen (buf) - n + strlen (rp) + 1 > (size_t) size) return 1;
        memmove (pos + strlen (rp), pos + n, strlen (pos) - n + 1);
        memmove (pos, rp, strlen (rp));
        pos += strlen (rp);
        if (sub) break;
        flags = REG_NOTBOL; /* pos is no longer the start of the line */
    }
    return 0;
}

int main (int argc, char **argv)
{
    char buf [FILENAME_MAX], rp [FILENAME_MAX];
    regex_t re;

    /* need both the expression and a replace pattern that fits in rp */
    if (argc < 3 || strlen (argv [2]) >= sizeof rp) return 1;
    if (regcomp (&re, argv [1], REG_ICASE)) return 1; /* nothing to free */
    for (; fgets (buf, FILENAME_MAX, stdin); printf ("%s", buf))
        if (rreplace (buf, FILENAME_MAX, &re, strcpy (rp, argv [2])))
            goto err;
    regfree (&re);
    return 0;
err:    regfree (&re);
    return 1;
}
TkTkorrovi 69 Junior Poster

If you want to test it, just run it in some bash-like shell, giving the regular expression and the replace pattern as arguments, preferably in the form $'expression' if you want to use escape sequences such as \t and \n (inside $'...' write a backreference as \\1, because \1 alone is read as an octal escape). Then enter the text; if your expression is correct, the replaced line will appear. End the input with ctrl-d.

In a basic regular expression you can use . for any character, a range or set like [a-z0-9;-] or an inverted one like [^ ], and * or \{n,m\} to repeat a character, range or subexpression. A subexpression is written between \( and \), and you can refer to it with \n both in the expression and in the replace pattern, where n is the number of the subexpression, as in \1.

Remember about * that the regular expression is evaluated from left to right and that * means zero or more, so a construct like a* alone also matches the empty string and doesn't determine any particular place. It should therefore follow some other characters rather than stand at the beginning of the expression.

TkTkorrovi 69 Junior Poster

I tested what a* does at the beginning of the expression in a repeated replace. And guess what it does? Because a* also matches the empty string, it fills the string with the replace pattern until it is full; the code detects that and exits. But it's OK to use it at the beginning of the expression if the search is not repeated. I'm really sorry for writing so many comments.

TkTkorrovi 69 Junior Poster

Oh, I'm sorry: according to my limits.h the largest string size is 2147483647 (INT_MAX), some 2 GB, so I guess it's large enough for most things...

xaviv 0 Newbie Poster

There is an error in line 26 (the size check in the second loop of rreplace). It should be:

...
      if (strlen (buf) - n + strlen (rp) > size) return 1;
...