Stripping EOL character from a string

Question

mattgwilson 0 Newbie Poster

12 Years Ago

Hi, I'm opening up text files created by another program which leaves an EOL character (which is a square box) to separate lines.

I'm reading through these files and extracting data from them. I know where my data starts because I know the name of the parameter, e.g. "FOOD TYPE"; I do not, however, know how long the string is going to be before the ⌂ appears. So what I've done below is extend the string beyond any possible answer, then search for the EOL character...

Some of the following code has been changed for ease of understanding.

if knownparameter in line:
    lineb = line    #line will be used elsewhere
    found = lineb.find(knownparameter)    #the index number of the occurrence of parameter
    extractb = lineb[found:(found+30):1]    #extractb is given a string to work with
    if ⌂ in extractb    #if the EOL character is in the string
        end = extractb.find(⌂)    #end is given the index of the EOL character
    extractb = lineb[found:end:1]    #final string to save to extractb
    print extractb

The problem is that this will not run due to the EOL character being in my code...
what can I do to get around this?

Thanks,

Matt

python


TrustyTony 888 pyMod

12 Years Ago

The character must be in quotes to make it a string (in Python, characters are just strings of length one):

if '⌂' in extractb:    #if the EOL character is in the string

Better still, use the hex value of the character with the \x prefix.
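
A minimal sketch of the \x suggestion, written for Python 3 (the thread itself uses Python 2). It assumes the box character really is byte 0x7F; the sample string and the names SEPARATOR and sample are made up for illustration, since no input file was posted.

```python
# Hypothetical sample line; the real file contents are not shown in the thread.
SEPARATOR = "\x7f"  # same character as chr(127), written with a hex escape

sample = "FOOD TYPE: APPLE" + SEPARATOR + "NEXT FIELD: 1"

has_separator = SEPARATOR in sample        # membership test using the escape
separator_index = sample.find(SEPARATOR)   # -1 if the character is absent
field = sample[:separator_index] if has_separator else sample

print(repr(field))
```

Writing the escape instead of pasting the character keeps the source file itself free of control characters.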

TrustyTony 888 pyMod

12 Years Ago

I do not know which character you are after (it is not in the ASCII chart: http://bluesock.org/~willg/dev/ascii.html), and we have no sample of your input file because you did not attach one. I am also not sure exactly what you want. Are you sure that a simple .split() does not do the job?

TrustyTony 888 pyMod

12 Years Ago

I do not know what your issue is; please post an exact copy-pasted sample of the data.

Python 2.7.2 (default, Jun 12 2011, 14:24:46) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import string
>>> data = '\x7F'.join(string.ascii_letters)
>>> data
'a\x7fb\x7fc\x7fd\x7fe\x7ff\x7fg\x7fh\x7fi\x7fj\x7fk\x7fl\x7fm\x7fn\x7fo\x7fp\x7fq\x7fr\x7fs\x7ft\x7fu\x7fv\x7fw\x7fx\x7fy\x7fz\x7fA\x7fB\x7fC\x7fD\x7fE\x7fF\x7fG\x7fH\x7fI\x7fJ\x7fK\x7fL\x7fM\x7fN\x7fO\x7fP\x7fQ\x7fR\x7fS\x7fT\x7fU\x7fV\x7fW\x7fX\x7fY\x7fZ'
>>> print data.split('\x7f')
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z']
>>>

woooee 814 Nearly a Posting Maven

12 Years Ago

To append the two recs together, you want to say something like,

data=open(fname, "r").readlines()
for ctr in range(len(data)-1):    ## stop one short so data[ctr+1] always exists
    if data[ctr].strip().startswith("DATE:"):
        print data[ctr].strip(), data[ctr+1].strip()

EOL though is actually decimal 10 or 13 or a combination of the two. Decimal 127 is Delete. To eliminate it, try something like this:

eliminate=chr(127)
rec=rec.replace(eliminate, " ")

If this does not work, iterate one of the lines character by character and print the character and ord(character), one per line, and substitute the offending character's ord for decimal 127 in the above code.
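
The character-by-character check described above could look roughly like this in Python 3. The helper name describe_unusual_chars and the sample line are hypothetical; with real data you would pass in one line read from the file.

```python
def describe_unusual_chars(text):
    """Return (index, ordinal) pairs for characters outside printable ASCII."""
    return [(i, ord(ch)) for i, ch in enumerate(text) if not 32 <= ord(ch) <= 126]

# Made-up line that assumes the separator is decimal 127.
sample = "DATE: 6/30/2010\x7fTEST START TIME: 10:00"

for index, ordinal in describe_unusual_chars(sample):
    print(index, ordinal, hex(ordinal))
```

Whatever ordinal this reports for the real file is the value to use in chr(...) in place of 127.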

You can also use:

data=open(fname, "r").read()    ## read as one long string
## "DATE" will no longer appear but you know that each individual rec started with it
data_split = data.split("DATE:")
for rec in data_split:
    print rec


mattgwilson 0 Newbie Poster · Answer 1 · 2011-12-06T20:49:44+00:00

The character must be in quotes to make it a string (in Python, characters are just strings of length one):
if '⌂' in extractb:    #if the EOL character is in the string
Better still, use the hex value of the character with the \x prefix.

Sorry Tony, I'm retyping this from a laptop onto this machine, as the laptop is isolated from the network. I missed out the '', but even with the quotes it doesn't work. I haven't tried the hex code yet, so I'll give it a shot.

Many thanks,

Matt.

EDIT....

Even using the hex doesn't work; it actually stops the code from bringing up my menu when I run it. It treats the character as an end of line within MY code itself...

I think I've opened a can of worms here...

Any help suggestions are greatly appreciated though, so thanks Tony.

mattgwilson 0 Newbie Poster · Answer 2 · 2011-12-06T21:19:27+00:00

I do not know which character you are after (it is not in the ASCII chart: http://bluesock.org/~willg/dev/ascii.html), and we have no sample of your input file because you did not attach one. I am also not sure exactly what you want. Are you sure that a simple .split() does not do the job?

Hi Tony, I've retried what I was trying to do with \x as hex, and it seems that's the problem, not the actual character itself. So how do I search for a character using the hex code?

The character I'm looking for is ⌂ (shown as a square box in the program), Alt-127 on the keyboard, hex value 7F.

Hope this helps.

I can't attach the whole code due to restrictions with data transfer between laptop and workstation. I can only extract what I type.

mattgwilson 0 Newbie Poster · Answer 3 · 2011-12-06T21:34:07+00:00

Here's the code exactly as it appears, I want to strip the ⌂ character and everything after it. So I just have the date.

x = "1"
while x == "1":
    path = r"C:\\TEST"
    entries = [join(path, entry) for entry in listdir(path)]
    files = filter(isfile, entries)
    a = open(join("C:\\TEST", "SearchResult.txt"), "w")

    b = "DATE" #date
    c = "EUT SER" #eut serial
    d = "ELAPSED PO" #elapsed powerup
    e = "ELAPSED KE" #elapsed keyup
    f = "EUT AC" #eut acceptable
    g = "EUT DID" #eut not acceptable
    ret = '\x7F' #new line

    for file in files:
        o = open(file, 'r')
        for line in o.readlines():

            if b in line:
                print
                lineb = line
                found = lineb.find(b)

                # print found
                extractb = lineb[found:(found+31):1]
                if ret in extractb:
                    end = extractb.find(ret)
                    extractb = lineb[found:end:1]
                print extractb

Here is the current output. (Note: the EOL character is not displayed here, because it is being treated as an actual line break and puts TEST START TIME: on a new line, whereas in Python I'm getting "DATE: 6/30/2010⌂TEST START TIME".)

DATE: 6/30/2010
TEST START TIME

DATE: 7/8/2010
TEST START TIME:

DATE: 2/28/2011
TEST START TIME

DATE: 4/2/2011
TEST START TIME:

Thanks

mattgwilson 0 Newbie Poster · Answer 4 · 2011-12-06T21:51:52+00:00

To append the two recs together, you want to say something like,

data=open(fname, "r").readlines()
for ctr in range(len(data)-1):    ## stop one short so data[ctr+1] always exists
    if data[ctr].strip().startswith("DATE:"):
        print data[ctr].strip(), data[ctr+1].strip()

Thanks, but that's not the issue.

The line is thousands of characters long, and it contains several of the ⌂ characters, which in some programs signify an EOL.

I know where to start reading from, but I do not know where the string I want will end, because its length varies (the date, for example). If I had written the program that creates the file, I would have standardised the dates as 00/00/0000, but that is not the case. The end of the string is currently marked by the EOL character "⌂", which is hex 7F.

I need the index for this character, so that I can do what my program "should" do.
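
One way to get that index, sketched in Python 3 under the assumption that the separator is '\x7f': str.find accepts a start position, so the search can begin at the parameter and still return an index into the full line. By contrast, an index returned by find on a slice such as extractb is relative to that slice, not to the full line. The function name extract_field and the sample line are hypothetical.

```python
def extract_field(line, parameter, separator="\x7f"):
    """Return the text from `parameter` up to the next `separator`, or None."""
    start = line.find(parameter)
    if start == -1:
        return None                     # parameter not present in this line
    # Search from `start`; the result is an index into `line` itself.
    end = line.find(separator, start)
    if end == -1:
        return line[start:]             # no separator: take the rest of the line
    return line[start:end]

# Made-up line with several separators, as described in the post.
sample = "EUT SER: 123\x7fDATE: 6/30/2010\x7fTEST START TIME: 10:00\x7f"
print(extract_field(sample, "DATE"))
```

Because the separator is searched for directly, there is no need to guess a maximum length such as found+30.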

mattgwilson 0 Newbie Poster · Answer 5 · 2011-12-06T21:58:18+00:00

EOL though is actually decimal 10 or 13 or a combination of the two. Decimal 127 is Delete. To eliminate it, try something like this:

eliminate=chr(127)
rec=rec.replace(eliminate, " ")

If this does not work, iterate one of the lines character by character and print the character and ord(character), one per line, and substitute the offending character's ord for decimal 127 in the above code.

I think I'll just read backwards from the following line and use that as "end" then end=end-2 to give me the correct index. This will not work on some of the lines, but it's a start. Sigh.

woooee 814 Nearly a Posting Maven · Answer 6 · 2011-12-06T21:59:25+00:00

Note the split on "DATE:" that I added to the above post, which may work better.

And Hex 7f = decimal 127 so you should be able to replace it.

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 7 · 2011-12-06T22:04:52+00:00

You could also try memoryview or bytes instead of string.
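
A rough Python 3 sketch of the bytes idea: open the file in binary mode so nothing is translated on the way in, then split on the raw byte. The separator value b"\x7f" is the one reported in the thread, and the temporary sample file exists only to make the sketch self-contained.

```python
import os
import tempfile

SEPARATOR = b"\x7f"  # assumed separator byte (hex 7F), as reported in the thread

# Write a made-up sample file so the sketch can run on its own.
with tempfile.NamedTemporaryFile(delete=False, suffix=".txt") as handle:
    handle.write(b"DATE: 6/30/2010\x7fTEST START TIME: 10:00\x7f")
    sample_path = handle.name

# Binary mode ("rb") returns the raw bytes with no newline translation.
with open(sample_path, "rb") as source:
    raw = source.read()
os.remove(sample_path)

# Drop the empty piece after the trailing separator and decode the rest.
fields = [part.decode("ascii") for part in raw.split(SEPARATOR) if part]
print(fields)
```

If the split yields a single piece, the separator in the real file is probably some other byte, which printing repr(raw[:80]) should reveal.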

mattgwilson 0 Newbie Poster · Answer 8 · 2011-12-06T22:10:15+00:00

I think I'll just read backwards from the following line and use that as "end" then end=end-2 to give me the correct index. This will not work on some of the lines, but it's a start. Sigh.

while x == "1":
    path = r"C:\\TEST"
    entries = [join(path, entry) for entry in listdir(path)]
    files = filter(isfile, entries)
    a = open(join("C:\\TEST", "SearchResult.txt"), "w")

    b = "DATE" #date
    c = "EUT SER" #eut serial
    d = "ELAPSED PO" #elapsed powerup
    e = "ELAPSED KE" #elapsed keyup
    f = "EUT AC" #eut acceptable
    g = "EUT DID" #eut not acceptable
    ret = 'T' #new line

    for file in files:
        o = open(file, 'r')
        for line in o.readlines():

            if b in line:
                print
                lineb = line
                found = lineb.find(b)

                # print found
                extractb = lineb[found:(found+18):1]
                if ret in extractb:
                    end = extractb.rfind(ret)
                    end=end-1
                    extractb = lineb[found:end:1]
                print extractb

This works, it's more like a workaround than a solution, but if anybody comes up with a correct method of reading this character, please let me know.

Thanks,

Matt

TrustyTony 888 pyMod Team Colleague Featured Poster · Answer 9 · 2011-12-06T23:47:33+00:00

I do not like the coding style, but to keep things short: why not simply use rsplit(ret) and print the first element of the result for each matching line? You are also leaving your files open and using obscure variable names (a, b, c...), and lineb is a needless synonym for line.
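
A sketch of those suggestions in Python 3, assuming the separator is '\x7f'. It uses split, which gives the same pieces as rsplit when no maxsplit is passed. The function names first_field and dates_in_file are hypothetical, as is the sample line.

```python
def first_field(line, parameter, separator="\x7f"):
    """Return the text from `parameter` up to the first `separator`, or None."""
    start = line.find(parameter)
    if start == -1:
        return None
    # If the separator is absent, split() returns the whole remainder as one element.
    return line[start:].split(separator)[0]

def dates_in_file(path, parameter="DATE"):
    """Collect the matching field from every line; the file is closed on exit."""
    results = []
    with open(path, "r") as source:
        for line in source:
            value = first_field(line.rstrip("\n"), parameter)
            if value is not None:
                results.append(value)
    return results

print(first_field("DATE: 7/8/2010\x7fTEST START TIME: 09:15", "DATE"))
```

The with block closes each file automatically, and the descriptive names remove the need for the single-letter variables and the lineb alias.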
