Hi,
I need to concatenate lines based on a regex. The lines to be concatenated are scattered. Every record begins with number $number$number$ followed by text. Nothing marks the end of a record, only its beginning. Here is an example. I want to rewrite this:

2 $5$233$ check big cat if it have not eaten all the meat
3 $5$233$ check big cat if it have not eaten all the meat 
        <a href=""http://example.com"">An Example duh! </a>
        <a href=""http://example.com"">An Example duh! </a> 
2 $5$233$ check big cat if it have not eaten all the meat 
check big cat if it have not eaten all the meat
<a href=""http://example.com"">An Example duh! </a>
2 $5$233$ check big cat if it have not eaten all the meat

into this one

2 $5$233$ check big cat if it have not eaten all the meat
3 $5$233$ check big cat if it have not eaten all the meat <a href=""http://example.com"">An Example duh! </a> <a href=""http://example.com"">An Example duh! </a> 
2 $5$233$ check big cat if it have not eaten all the meat  check big cat if it have not eaten all the meat <a href=""http://example.com"">An Example duh! </a>
2 $5$233$ check big cat if it have not eaten all the meat

I have been away from Python for a while, so I'm running out of ideas and I need your help.
Thanks

>>> import re
>>> data = """2 $5$233$ check big cat if it have not eaten all the meat
3 $5$233$ check big cat if it have not eaten all the meat 
        <a href=""http://example.com"">An Example duh! </a>
        <a href=""http://example.com"">An Example duh! </a> 
2 $5$233$ check big cat if it have not eaten all the meat 
check big cat if it have not eaten all the meat
<a href=""http://example.com"">An Example duh! </a>
2 $5$233$ check big cat if it have not eaten all the meat"""
>>> re.findall(r'(\d* \$\d*\d*.*) ', data)
['2 $5$233$ check big cat if it have not eaten all the', '3 $5$233$ check big cat if it have not eaten all the meat', '2 $5$233$ check big cat if it have not eaten all the meat', '2 $5$233$ check big cat if it have not eaten all the']

Let me check, but your solution seems ingeniously simple!

This regex does not catch the continuation lines. Maybe multiline mode would help (I do not remember exactly): catch the lines and then do .replace('\n', '').

Oh! And it returns just 1 $3$5$ when I do

res = re.findall(r'(\d{1,}\s*\$\s*\d{1,}\s*\$\s*\d{1,}\s*\$*.*)', data, re.MULTILINE)
for line in res:
    print line
    exit(0)

It is supposed to return the whole record as a single line.
The text file is big, and I didn't write it myself!

It looks like we would also need to make the dot match newlines, by passing

re.MULTILINE | re.DOTALL
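
For reference, here is a minimal sketch of what the two flags do on their own. The two-line string is made up for illustration. re.MULTILINE only changes where ^ and $ may match, while re.DOTALL is the flag that lets the dot cross a newline.

```python
import re

text = "a1\nb2"  # made-up two-line input, not from the sample file

# Without flags, ^ and $ anchor only at the ends of the whole string.
no_flags = re.findall(r'^\w\d$', text)
# re.MULTILINE lets ^ and $ match at every line boundary.
multiline = re.findall(r'^\w\d$', text, re.MULTILINE)
# Without re.DOTALL the dot stops at the newline.
dot_default = re.findall(r'a.*', text)
# re.DOTALL lets the dot match the newline as well.
dot_all = re.findall(r'a.*', text, re.DOTALL)

print(no_flags, multiline, dot_default, dot_all)
```

So with re.DOTALL a greedy .* runs to the end of the text unless something in the pattern stops it.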

That works for the first line, but then everything after it is matched too, because the dot now matches newlines!

Is there a way to tell it to match everything except a given pattern (the one that marks the start of the next record)?
For example, using the pattern above with DOTALL/MULTILINE on the text below, it is returned as a single match when it should be two.

1 $3$5$
<a href=""http://daniweb.com"">For </a>
<a href=""http://daniweb.com"">For </a><a href=""http://daniweb.com"">For </a>
<a href=""http://daniweb.com"">For </a><a href=""http://daniweb.com"">For </a>
2 $4$6$
<a href=""http://daniweb.com"">For </a>
<a href=""http://daniweb.com"">For </a><a href=""http://daniweb.com"">For </a>
<a href=""http://daniweb.com"">For </a><a href=""http://daniweb.com"">For </a>
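
One way to say "everything except the start of the next record" is a negative lookahead: replace each newline with a space unless the following line begins with a record header. This is only a sketch, assuming the header always looks like digits, optional spaces, then $digits$digits$. The names HEADER and join_records are made up here, and the sample is a shortened stand-in for the text above.

```python
import re

# Assumed header shape: digits, optional blanks, then $digits$digits$
HEADER = r'\d+[ \t]*\$\d+\$\d+\$'

# A newline (with blanks around it) that is NOT followed by a header line.
CONTINUATION = re.compile(r'[ \t]*\n(?![ \t]*' + HEADER + r')[ \t]*')

def join_records(text):
    """Fold continuation lines into the record line that precedes them."""
    return CONTINUATION.sub(' ', text)

sample = "1 $3$5$\n<a>For </a>\n<a>For </a>\n2 $4$6$\n<a>For </a>"
for record in join_records(sample).splitlines():
    print(record)
```

The newline in front of each header is left alone, so every record ends up on its own line.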

Actually, why must you use re? Why not use plain Python to group the lines (itertools.groupby or a generator)?

data = """2 $5$233$ check big cat if it have not eaten all the meat
3 $5$233$ check big cat if it have not eaten all the meat 
        <a href=""http://example.com"">An Example duh! </a>
        <a href=""http://example.com"">An Example duh! </a> 
2 $5$233$ check big cat if it have not eaten all the meat 
check big cat if it have not eaten all the meat
<a href=""http://example.com"">An Example duh! </a>
2 $5$233$ check big cat if it have not eaten all the meat"""

def get_blocks(source):
    block = []
    for line in source:
        # simplified, lazy check; unlikely to misidentify a line
        if line.count('$') == 3:
            if block:
                yield ''.join(block)
            block = [line]
        else:
            block.append(line)


print('\n'.join(get_blocks(data.splitlines())))
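
For comparison, the itertools.groupby route mentioned above could look roughly like this. It is a sketch, not a tested solution for the full file: HEADER and group_records are names invented here, and the key function is stateful (it bumps a counter at each header line), which works because groupby calls the key once per item, in order. Any lines before the first header end up together in group 0.

```python
import re
from itertools import groupby

# Assumed header shape: digits, optional blanks, then $digits$digits$
HEADER = re.compile(r'\s*\d+[ \t]*\$\d+\$\d+\$')

def group_records(lines):
    """Yield one joined string per record, using a running record counter as key."""
    record_id = 0

    def key(line):
        nonlocal record_id
        if HEADER.match(line):
            record_id += 1  # a header line opens a new group
        return record_id

    for _, group in groupby(lines, key):
        # Strip each piece and skip blank lines so the parts join with one space.
        yield ' '.join(part.strip() for part in group if part.strip())

lines = ["2 $5$233$ a", "3 $5$233$ b ", "        <a>x</a>", "2 $5$233$ c"]
print('\n'.join(group_records(lines)))
```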

PT,
I don't care which tool gets the job done. What I want to achieve is this: everything from number$number$number$text up to the next number$number$number$ should be on the same line. The text itself is arbitrary.
So if itertools can do the job, that is fine with me :)

A sample text is attached.

Attachments
1 $1$26$  And <a href=""strongs://430"">God </a><a href=""strongs://559"">said</a>, Let us <a href=""strongs://6213"">make </a><a href=""strongs://120"">man </a>in our <a href=""strongs://6754"">image</a>, after our <a href=""strongs://1823"">likeness</a>: and let them have <a href=""strongs://7287"">dominion </a>over the <a href=""strongs://1710"">fish </a>of the <a href=""strongs://3220"">sea</a>, and over the <a href=""strongs://5775"">fowl </a>of the <a href=""strongs://8064"">air</a>, and over the <a href=""strongs://929"">cattle</a>, and over <a href=""strongs://3605"">all </a>the <a href=""strongs://776"">earth</a>, and over <a href=""strongs://3605"">every </a>creeping <a href=""strongs://7431"">thing </a>that <a href=""strongs://7430"">creepeth </a><a href=""strongs://5921"">upon </a>the <a href=""strongs://776"">earth</a>. 
1 $1$27$  So <a href=""strongs://430"">God </a><a href=""strongs://1254"">created </a><a href=""strongs://853""></a><a href=""strongs://120"">man </a>in his <span style=""color:#808080;font-style:italic;""> own </span><a href=""strongs://6754"">image</a>, in the <a href=""strongs://6754"">image </a>of <a href=""strongs://430"">God </a><a href=""strongs://1254"">created </a>he him; <a href=""strongs://2145"">male </a>and <a href=""strongs://5347"">female </a><a href=""strongs://1254"">created </a>he them. 
1 $1$28$  And <a href=""strongs://430"">God </a><a href=""strongs://1288"">blessed </a>them, and <a href=""strongs://430"">God </a><a href=""strongs://559"">said </a>unto them, Be <a href=""strongs://6509"">fruitful</a>, and <a href=""strongs://7235"">multiply</a>, and <a href=""strongs://4390"">replenish </a><a href=""strongs://853""></a>the <a href=""strongs://776"">earth</a>, and <a href=""strongs://3533"">subdue </a>it: and have <a href=""strongs://7287"">dominion </a>over the <a href=""strongs://1710"">fish </a>of the <a href=""strongs://3220"">sea</a>, and over the <a href=""strongs://5775"">fowl </a>of the <a href=""strongs://8064"">air</a>, and over <a href=""strongs://3605"">every </a>living <a href=""strongs://2416"">thing </a>that <a href=""strongs://7430"">moveth </a><a href=""strongs://5921"">upon </a>the <a href=""strongs://776"">earth</a>. 
1 $1$29$  And <a href=""strongs://430"">God </a><a href=""strongs://559"">said</a>, <a href=""strongs://2009"">Behold</a>, I have <a href=""strongs://5414"">given </a>you <a href=""strongs://853""></a><a href=""strongs://3605"">every </a><a href=""strongs://6212"">herb </a><a href=""strongs://2232"">bearing </a><a href=""strongs://2233"">seed</a>, <a href=""strongs://834"">which </a><span style=""color:#808080;font-style:italic;""> is </span><a href=""strongs://5921"">upon </a>the <a href=""strongs://6440"">face </a>of <a href=""strongs://3605"">all </a>the <a href=""strongs://776"">earth</a>, and <a href=""strongs://3605"">every </a><a href=""strongs://6086"">tree</a>, in the <a href=""strongs://834"">which </a><span style=""color:#808080;font-style:italic;""> is </span> the <a href=""strongs://6529"">fruit </a>of a <a href=""strongs://6086"">tree </a><a href=""strongs://2232"">yielding </a><a href=""strongs://2233"">seed</a>; to you it shall <a href=""strongs://1961"">be </a>for <a href=""strongs://402"">meat</a>. 
1 $1$30$  And to <a href=""strongs://3605"">every </a><a href=""strongs://2416"">beast </a>of the <a href=""strongs://776"">earth</a>, and to <a href=""strongs://3605"">every </a><a href=""strongs://5775"">fowl </a>of the <a href=""strongs://8064"">air</a>, and to every <a href=""strongs://3605"">thing </a>that <a href=""strongs://7430"">creepeth </a><a href=""strongs://5921"">upon </a>the <a href=""strongs://776"">earth</a>, <a href=""strongs://834"">wherein </a><span style=""color:#808080;font-style:italic;""> there </span><span style=""color:#808080;font-style:italic;""> is </span><a href=""strongs://5315|2416"">life</a>, <span style=""color:#808080;font-style:italic;""> I </span><span style=""color:#808080;font-style:italic;""> have </span><span style=""color:#808080;font-style:italic;""> given </span><a href=""strongs://853""></a><a href=""strongs://3605"">every </a><a href=""strongs://3418"">green </a><a href=""strongs://6212"">herb </a>for <a href=""strongs://402"">meat</a>: and it <a href=""strongs://1961"">was </a><a href=""strongs://3651"">so</a>. 
1 $1$31$  And <a href=""strongs://430"">God </a><a href=""strongs://7200"">saw </a><a href=""strongs://853""></a>every <a href=""strongs://3605"">thing </a><a href=""strongs://834"">that </a>he had <a href=""strongs://6213"">made</a>, and, <a href=""strongs://2009"">behold</a>, <span style=""color:#808080;font-style:italic;""> it </span><span style=""color:#808080;font-style:italic;""> was </span><a href=""strongs://3966"">very </a><a href=""strongs://2896"">good</a>. And the <a href=""strongs://6153"">evening </a>and the <a href=""strongs://1242"">morning </a><a href=""strongs://1961"">were </a>the <a href=""strongs://8345"">sixth </a><a href=""strongs://3117"">day</a>. 
1 $2$1$  Thus the <a href=""strongs://8064"">heavens </a>and the <a href=""strongs://776"">earth </a>were <a href=""strongs://3615"">finished</a>, and <a href=""strongs://3605"">all </a>the <a href=""strongs://6635"">host </a>of them. 
1 $2$2$  And on the <a href=""strongs://7637"">seventh </a><a href=""strongs://3117"">day </a><a href=""strongs://430"">God </a><a href=""strongs://3615"">ended </a>his <a href=""strongs://4399"">work </a><a href=""strongs://834"">which </a>he had <a href=""strongs://6213"">made</a>; and he <a href=""strongs://7673"">rested </a>on the <a href=""strongs://7637"">seventh </a><a href=""strongs://3117"">day </a>from <a href=""strongs://4480|3605"">all </a>his <a href=""strongs://4399"">work </a><a href=""strongs://834"">which </a>he had <a href=""strongs://6213"">made</a>. 
1 $2$3$  And <a href=""strongs://430"">God </a><a href=""strongs://1288"">blessed </a><a href=""strongs://853""></a>the <a href=""strongs://7637"">seventh </a><a href=""strongs://3117"">day</a>, and <a href=""strongs://6942"">sanctified </a>it: <a href=""strongs://3588"">because </a>that in it he had <a href=""strongs://7673"">rested </a>from <a href=""strongs://4480|3605"">all </a>his <a href=""strongs://4399"">work </a><a href=""strongs://834"">which </a><a href=""strongs://430"">God </a><a href=""strongs://1254"">created </a>and <a href=""strongs://6213"">made</a>. 
1 $2$4$ 
        <a href=""strongs://428"">These </a>
        <span style=""color:#808080;font-style:italic;""> are </span> the <a href=""strongs://8435"">generations </a>of the <a href=""strongs://8064"">heavens </a>and of the <a href=""strongs://776"">earth </a>when they were <a href=""strongs://1254"">created</a>, in the <a href=""strongs://3117"">day </a>that the <a href=""strongs://3068"">LORD </a><a href=""strongs://430"">God </a><a href=""strongs://6213"">made </a>the <a href=""strongs://776"">earth </a>and the <a href=""strongs://8064"">heavens</a>, 
1 $2$5$  And <a href=""strongs://3605"">every </a><a href=""strongs://7880"">plant </a>of the <a href=""strongs://7704"">field </a><a href=""strongs://2962"">before </a>it <a href=""strongs://1961"">was </a>in the <a href=""strongs://776"">earth</a>, and <a href=""strongs://3605"">every </a><a href=""strongs://6212"">herb </a>of the <a href=""strongs://7704"">field </a><a href=""strongs://2962"">before </a>it <a href=""strongs://6779"">grew</a>: <a href=""strongs://3588"">for </a>the <a href=""strongs://3068"">LORD </a><a href=""strongs://430"">God </a>had <a href=""strongs://3808"">not </a>caused it to <a href=""strongs://4305"">rain </a><a href=""strongs://5921"">upon </a>the <a href=""strongs://776"">earth</a>, and <span style=""color:#808080;font-style:italic;""> there </span><span style=""color:#808080;font-style:italic;""> was </span><a href=""strongs://369"">not </a>a <a href=""strongs://120"">man </a>to <a href=""strongs://5647"">till </a><a href=""strongs://853""></a>the <a href=""strongs://127"">ground</a>. 
1 $2$6$  But there went <a href=""strongs://5927"">up </a>a <a href=""strongs://108"">mist </a><a href=""strongs://4480"">from </a>the <a href=""strongs://776"">earth</a>, and <a href=""strongs://8248"">watered </a><a href=""strongs://853""></a>the <a href=""strongs://3605"">whole </a><a href=""strongs://6440"">face </a>of the <a href=""strongs://127"">ground</a>. 
1 $2$7$  And the <a href=""strongs://3068"">LORD </a><a href=""strongs://430"">God </a><a href=""strongs://3335"">formed </a><a href=""strongs://853""></a><a href=""strongs://120"">man </a><span style=""color:#808080;font-style:italic;""> of </span> the <a href=""strongs://6083"">dust </a><a href=""strongs://4480"">of </a>the <a href=""strongs://127"">ground</a>, and <a href=""strongs://5301"">breathed </a>into his <a href=""strongs://639"">nostrils </a>the <a href=""strongs://5397"">breath </a>of <a href=""strongs://2416"">life</a>; and <a href=""strongs://120"">man </a><a href=""strongs://1961"">became </a>a <a href=""strongs://2416"">living </a><a href=""strongs://5315"">soul</a>. 
1 $2$8$  And the <a href=""strongs://3068"">LORD </a><a href=""strongs://430"">God </a><a href=""strongs://5193"">planted </a>a <a href=""strongs://1588"">garden </a><a href=""strongs://4480|6924"">eastward </a>in <a href=""strongs://5731"">Eden</a>; and <a href=""strongs://8033"">there </a>he <a href=""strongs://7760"">put </a><a href=""strongs://853""></a>the <a href=""strongs://120"">man </a><a href=""strongs://834"">whom </a>he had <a href=""strongs://3335"">formed</a>. 
1 $2$9$  And out <a href=""strongs://4480"">of </a>the <a href=""strongs://127"">ground </a>made the <a href=""strongs://3068"">LORD </a><a href=""strongs://430"">God </a>to <a href=""strongs://6779"">grow </a><a href=""strongs://3605"">every </a><a href=""strongs://6086"">tree </a>that is <a href=""strongs://2530"">pleasant </a>to the <a href=""strongs://4758"">sight</a>, and <a href=""strongs://2896"">good </a>for <a href=""strongs://3978"">food</a>; the <a href=""strongs://6086"">tree </a>of <a href=""strongs://2416"">life </a>also in the <a href=""strongs://8432"">midst </a>of the <a href=""strongs://1588"">garden</a>, and the <a href=""strongs://6086"">tree </a>of <a href=""strongs:

I cannot understand the code. Can you explain it a bit? I have been away from Python for so long ;)

It seems to work for your sample.txt as well:

def get_blocks(source):
    block = []
    for line in source:
        # simplified, lazy check; unlikely to misidentify a line
        if len(line) > 12 and line.lstrip()[0].isdigit() and line[:12].count('$') == 3:
            if block:
                yield ''.join(block)
            block = [line]
        else:
            block.append(line)


with open('sample.txt') as data:
    print('------------------\n'.join(get_blocks(data)))

EDIT: slightly stronger check for correct start of block

It seems to work OK (I still have to check the data integrity further). But could you explain the code?

I think my code fails to emit the last block (sorry, that was under 5 minutes of coding and only one run on your sample). You should add

yield ''.join(block)

to the end of the function.

Can you tell me what you understand and what you do not?
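
To make the missing-last-block problem concrete, here is a small sketch with a made-up three-line input. collect_blocks and its flush_last switch are hypothetical names used only to compare the two behaviours side by side; the start check is the simplified count of '$' used earlier.

```python
def collect_blocks(lines, flush_last):
    """Group lines into blocks; optionally emit the block still pending at the end."""
    block = []
    for line in lines:
        if line.count('$') == 3:      # simplified start-of-block check
            if block:
                yield ''.join(block)  # a new start line releases the previous block
            block = [line]
        else:
            block.append(line)
    # No start line follows the final block, so it needs its own yield.
    if flush_last and block:
        yield ''.join(block)

lines = ['1 $1$1$ first', ' more', '2 $2$2$ last']
print(list(collect_blocks(lines, flush_last=False)))  # ['1 $1$1$ first more']
print(list(collect_blocks(lines, flush_last=True)))   # ['1 $1$1$ first more', '2 $2$2$ last']
```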

I understand the general concept of what you are doing.
However, it does not work for these lines in sample.txt:

1 $2$3$  And <a href=""strongs://430"">God </a><a href=""strongs://1288"">blessed </a><a href=""strongs://853""></a>the <a href=""strongs://7637"">seventh </a><a href=""strongs://3117"">day</a>, and <a href=""strongs://6942"">sanctified </a>it: <a href=""strongs://3588"">because </a>that in it he had <a href=""strongs://7673"">rested </a>from <a href=""strongs://4480|3605"">all </a>his <a href=""strongs://4399"">work </a><a href=""strongs://834"">which </a><a href=""strongs://430"">God </a><a href=""strongs://1254"">created </a>and <a href=""strongs://6213"">made</a>. 
1 $2$4$ 
        <a href=""strongs://428"">These </a>
        <span style=""color:#808080;font-style:italic;""> are </span> the <a href=""strongs://8435"">generations </a>of the <a href=""strongs://8064"">heavens </a>and of the <a href=""strongs://776"">earth </a>when they were <a href=""strongs://1254"">created</a>, in the <a href=""strongs://3117"">day </a>that the <a href=""strongs://3068"">LORD </a><a href=""strongs://430"">God </a><a href=""strongs://6213"">made </a>the <a href=""strongs://776"">earth </a>and the <a href=""strongs://8064"">heavens</a>,

I am collecting a block and yielding it when a new block starts (and so the last block needs a yield at the end). To accept such a short start line, reduce the length limit from 12 to 8 or 6.

Anyway, here is my cleaned-up code:

def block_start(line, limit=8):
    # A block starts on a line longer than `limit` whose first non-blank
    # character is a digit and whose first 12 characters contain three '$'.
    return len(line) > limit and line.lstrip()[:1].isdigit() and line[:12].count('$') == 3


def get_blocks(source, block_start=block_start):
    block = []
    for line in source:
        if block_start(line):
            # A new block begins: emit the one collected so far.
            if block:
                yield ''.join(block)
            block = [line]
        else:
            block.append(line)
    # Emit the last block; no start line follows it to trigger the yield.
    if block:
        yield ''.join(block)

with open('sample.txt') as data:
    print('------------------\n'.join(get_blocks(data)))
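
Since lines read from a file keep their trailing newline, joining them with '' still prints each block over several lines. If the goal is one record per line, a variant along these lines might do. It is a sketch under assumptions: regex_block_start and get_single_line_blocks are names invented here, the header is assumed to be digits, optional blanks, then $digits$digits$, and io.StringIO stands in for sample.txt with a shortened, made-up excerpt.

```python
import io
import re

# Assumed header shape; a regex avoids tuning a length limit for short start lines.
HEADER = re.compile(r'\s*\d+[ \t]*\$\d+\$\d+\$')

def regex_block_start(line):
    return HEADER.match(line) is not None

def get_single_line_blocks(source, block_start=regex_block_start):
    """Yield each record as a single line, with its parts joined by one space."""
    block = []
    for line in source:
        text = line.strip()  # drops the newline and the indentation
        if not text:
            continue         # ignore blank lines
        if block_start(line):
            if block:
                yield ' '.join(block)
            block = [text]
        else:
            block.append(text)
    if block:
        yield ' '.join(block)

sample = io.StringIO(
    '1 $2$3$  And God blessed\n'
    '1 $2$4$ \n'
    '        <a>These </a>\n'
    '        <span> are </span> the generations\n'
)
for record in get_single_line_blocks(sample):
    print(record)
```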