I have a list of document pages in which the first page of each document is marked with "ND":

Page1,ND
Page2,
Page3,
Page4,ND
Page5,

I am trying to format it as a two-column CSV with the first and last page of each document:

Page1, Page3
Page4, Page5
etc

I cannot get beyond a simple loop that identifies the first page of each document:

for i in mylist:
    if 'ND' in i:
        print i

I would like to know how to print the line with the ND, then the line just before the next line with the ND, and then move on to the next document.

I could not figure out an elegant way, but here is something I made that solves my problem:

beg = []
end = []
for i in range(len(unsortedlist)-1):
    if 'ND' in unsortedlist[i]:
        beg.append(unsortedlist[i].strip(",ND"))
        end.append(unsortedlist[i - 1].strip(",ND"))
    else:
        pass
for i in range(len(beg)):
    try:
        print '%s,%s' % (beg[i],end[i + 1])
    except:
        pass

The only thing it leaves incomplete is the last document, since the last page of the last document is not followed by the first page of another document. If anyone has any suggestions, I would prefer to learn a better way. Thanks.

I would do it in a similar way. Provided all of the pages fit in memory, the following code collects pages in junk_list until the next "ND" is found, prints the first and last pages in junk_list, and then empties junk_list. You could also keep just the first page and the previous record in variables and print them when "ND" is found. Using a list is the easier way but takes more memory.

test_data = [
"Page1,ND\n",
"Page2,\n",
"Page3,\n",
"Page4,ND\n",
"Page5,\n" ]

junk_list = []
for rec in test_data:
    substrs = rec.strip().split(",")

    # An "ND" flag starts a new document, so print and clear the pages collected so far
    if len(substrs) > 1:
        if (substrs[1].strip() == "ND") and (len(junk_list)):
            print "pages =", junk_list[0], junk_list[-1]
            junk_list = []
    junk_list.append(substrs[0])

##--- final group of pages
print "pages =", junk_list[0], junk_list[-1], "final group"
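
For the two-column CSV asked for in the question, the same list-based grouping could be wrapped in a function and fed to the standard csv module. This is only a sketch: the name group_pages is made up for illustration, and it assumes every record looks like "PageN,ND" or "PageN,". It is written for Python 3 but sticks to syntax that should also run on Python 2.

```python
import csv
import sys

def group_pages(records):
    """Split records into one list of page names per document."""
    groups = []
    for rec in records:
        # partition() returns an empty flag when there is no comma at all
        page, _, flag = rec.strip().partition(",")
        if flag.strip() == "ND" or not groups:
            groups.append([])        # start a new document
        groups[-1].append(page)
    return groups

test_data = ["Page1,ND\n", "Page2,\n", "Page3,\n", "Page4,ND\n", "Page5,\n"]

# Write "first,last" for each document
writer = csv.writer(sys.stdout, lineterminator="\n")
for pages in group_pages(test_data):
    writer.writerow([pages[0], pages[-1]])
```

Because the last group is simply the last entry in the returned list, there is no separate "final group" step to remember.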

Without using lists:

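#!/usr/bin/python

# Expects nd.txt to hold one "PageN,ND" or "PageN," record per line
with open('nd.txt', 'r') as fh:
    # The first record is assumed to start a document
    (page, nd) = fh.readline().strip().split(',')
    (last, lastnd) = (page, page)  # previous page, first page of current document

    print page,
    for i in fh:
        (page, nd) = i.strip().split(',')
        if nd == 'ND':
            # Close the previous document; a one-page document prints no last page
            print last if lastnd != last else ""
            lastnd = page
            print page,
        last = page
    print last if lastnd != last else ""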

Sample input:

Page1,ND
Page2,
Page3,
Page4,ND
Page5,
Page6,ND
Page7,ND
Page8,ND
Page9,ND
Page10,
Page11,
Page12,ND

Output:

Page1 Page3
Page4 Page5
Page6
Page7
Page8
Page9 Page11
Page12

The only assumption is that the first page is always marked "ND", i.e. it starts a new document.
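
If that assumption is a concern, one possible variant is a generator that treats the first record as the start of a document whether or not it carries the marker, and skips blank lines. This is a sketch rather than a drop-in replacement: iter_ranges is a hypothetical name, and unlike the script above it yields the same page twice for a one-page document. It should run on both Python 2 and 3.

```python
def iter_ranges(lines):
    """Yield (first_page, last_page) for each document, one record at a time."""
    first = last = None
    for line in lines:
        line = line.strip()
        if not line:
            continue                 # ignore blank lines
        page, _, flag = line.partition(",")
        if first is None:
            first = page             # first record always opens a document
        elif flag.strip() == "ND":
            yield first, last        # previous document is complete
            first = page
        last = page
    if first is not None:
        yield first, last            # final document

sample = ["Page1,ND", "Page2,", "Page3,", "Page4,ND", "Page5,", "Page6,ND"]
for first, last in iter_ranges(sample):
    print("%s,%s" % (first, last))
```

Since it accepts any iterable of lines, an open file object can be passed in directly, so the whole file never needs to be held in memory.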

HTH
