Sep 11th, 2008

How do I recognise strings and numbers

Let's say I have a text file separated by tabs containing stuff like:

    1.02 \t hello \t 01/02/2008

How do I get Python to recognise that 1.02 is a number, hello is a string and 01/02/2008 is a date?
The file has no fixed pattern, i.e. it won't always be number, string, date; each field could be anything.

I'm thinking of perhaps using regular expressions or something similar.

Awaiting your earliest reply. Sample code would be great.
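
For reference, the regular-expression idea mentioned above could look roughly like the following. This is only a minimal sketch in Python 3 syntax (the thread itself uses Python 2.5); the names `NUMBER_RE`, `DATE_RE` and `classify` are hypothetical, and the patterns only check the shape of a field, not whether a date actually exists.

```python
import re

# Hypothetical patterns: a plain decimal number and a dd/mm/yyyy-shaped date.
NUMBER_RE = re.compile(r'^[+-]?(\d+(\.\d*)?|\.\d+)$')
DATE_RE = re.compile(r'^\d{2}/\d{2}/\d{4}$')

def classify(field):
    """Return 'number', 'date' or 'string' based on the shape of the text."""
    field = field.strip()
    if NUMBER_RE.match(field):
        return 'number'
    if DATE_RE.match(field):
        # Shape check only: '13/13/1991' would also be labelled a date here.
        return 'date'
    return 'string'

line = '1.02 \t hello \t 01/02/2008'
print([classify(f) for f in line.split('\t')])
```

Because a regex cannot tell a real date from an impossible one, actually attempting the conversion and catching the failure is usually the more robust route.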
Last edited by iamthwee; Sep 11th, 2008 at 2:59 pm.
Featured Poster
Reputation Points: 1536
Solved Threads: 431
Posting Expert
iamthwee is offline Offline
5,865 posts
since Aug 2005
Sep 11th, 2008

Re: How do I recognise strings and numbers

Decide which types of objects need to be converted. If needed, write a function for each conversion type. Each function should raise an exception when its conversion fails. If all conversions fail, return the original string. Example:
    import time

    def fdim(s):
        # return a float given a fraction (e.g. '1/2')
        ss = s.split('/')
        return float(ss[0]) / float(ss[1])

    def evalList(s):
        # return a list given a string representation, e.g. '[1,2,3]'
        # note: eval() executes arbitrary code, so only use it on trusted input
        if s.strip().startswith('[') and s.strip().endswith(']'):
            try:
                return eval(s)
            except Exception:
                raise ValueError
        raise ValueError

    def evalDateStr(s):
        # return a struct_time object given a string representation
        return time.strptime(s, '%m/%d/%Y')

    def convertType(s):
        # try each converter in turn; the first one that succeeds wins
        for func in (int, float, evalDateStr, fdim, evalList):
            try:
                n = func(s)
                return n
            except Exception:
                pass
        # every conversion failed, so keep the original string
        return s

    s = "6 \t 1.02 \t hello \t 01/02/2008 \t [1,2,3] \t 15/16"

    for obj in [convertType(item) for item in [w.strip() for w in s.split('\t')]]:
        print '%s - object type: %s' % (obj, type(obj))
Output:
    6 - object type: <type 'int'>
    1.02 - object type: <type 'float'>
    hello - object type: <type 'str'>
    (2008, 1, 2, 0, 0, 0, 2, 2, -1) - object type: <type 'time.struct_time'>
    [1, 2, 3] - object type: <type 'list'>
    0.9375 - object type: <type 'float'>
Reputation Points: 86
Solved Threads: 40
Junior Poster
solsteel is offline Offline
141 posts
since Mar 2007
Sep 12th, 2008

Re: How do I recognise strings and numbers

Wow, that's awesome. One more question though: how do I detect booleans, such as TRUE and FALSE?
I changed your code a bit to test it and got rid of the bits I don't need.

omg.txt (tab delimited)
    0.0003 2.3 8 hello 000292 0.0.43
    11/01/2008 13/13/1991 0 "343,4343"
    0116 2346 6421-01-01 31/02/2008 "2.3"
    01/02/08 1.223232223


conv.py
    import time

    def evalDateStr(s):
        # return a struct_time object given a string representation
        # changed it to day month year
        return time.strptime(s, '%d/%m/%Y')

    def convertType(s):
        for func in (int, float, evalDateStr):
            try:
                n = func(s)
                return n
            except Exception:
                pass
        return s

    # changed to read in file line by line
    print "\nLooping through the file, line by line."
    f = open("omg.txt", "r")
    for line in f:
        s = line
        for obj in [convertType(item) for item in [w.strip() for w in s.split('\t')]]:
            print '%s - object type: %s' % (obj, type(obj))

    f.close()

output
    Looping through the file, line by line.
    0.0003 - object type: <type 'float'>
    2.3 - object type: <type 'float'>
    8 - object type: <type 'int'>
    hello - object type: <type 'str'>
    292 - object type: <type 'int'>
    0.0.43 - object type: <type 'str'>
    (2008, 1, 11, 0, 0, 0, 4, 11, -1) - object type: <type 'time.struct_time'>
    13/13/1991 - object type: <type 'str'>
    0 - object type: <type 'int'>
    "343,4343" - object type: <type 'str'>
    0116 2346 - object type: <type 'str'>
    6421-01-01 - object type: <type 'str'>
    31/02/2008 - object type: <type 'str'>
    "2.3" - object type: <type 'str'>
    01/02/08 - object type: <type 'str'>
    1.223232223 - object type: <type 'float'>

As you can see, the results are all correct. I checked whether it would be fooled by February having 31 days, and it passed. But does it automatically handle leap years?

I have one more question as well, which I will post later. Thanks very much for saving me time.
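
On the leap-year question: `time.strptime` validates the day against the calendar for the given year, so 29 February should be accepted only in leap years. A small sketch to check this, written in Python 3 syntax (the thread uses Python 2.5); `is_valid_date` is a hypothetical helper name.

```python
import time

def is_valid_date(s, fmt='%d/%m/%Y'):
    """Return True if s parses as a real calendar date in the given format."""
    try:
        time.strptime(s, fmt)
        return True
    except ValueError:
        return False

print(is_valid_date('29/02/2008'))  # 2008 is a leap year
print(is_valid_date('29/02/2007'))  # 2007 is not
print(is_valid_date('31/02/2008'))  # February never has 31 days
```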
Last edited by iamthwee; Sep 12th, 2008 at 5:38 am.
Posted by iamthwee

Sep 12th, 2008

Re: How do I recognise strings and numbers

Part two.

Let's assume I have a tab delimited file such as

test.txt
    hello 2.3 01/02/2008
    2.0 there TRUE

I want to have my python script read it in.



    import pyXLWriter as xl
    import datetime

    # Create a new workbook called simple.xls and add a worksheet
    workbook = xl.Writer("simple.xls")
    worksheet = workbook.add_worksheet()

    # worksheet.write([row, column], value)
    worksheet.write([0, 0], "hello")
    worksheet.write([0, 1], 2.3)
    worksheet.write([0, 2], datetime.date(2008, 2, 1))
    worksheet.write([1, 0], 2.0)
    worksheet.write([1, 1], "there")
    worksheet.write([1, 2], 1)  # TRUE from test.txt, written here as 1

    workbook.close()

As you can see in my example, [x, y] represents the row and column of the value in the tab-delimited text file.

I want my code to be generic so it can handle any tab-delimited file with an unknown number of entries. I think I need to use functions, but I have no idea how.

Sample code would be great! Awaiting your earliest reply.
Posted by iamthwee

Sep 12th, 2008

Re: How do I recognise strings and numbers

Create another function to test for Boolean value strings. Placing values in columns and rows is very easy in Python using enumerate(): the outer loop gives the row index and the inner loop gives the column index.
    import time

    def evalBoole(s):
        # only the exact strings 'True' and 'False' are accepted
        if s in ('True', 'False'):
            return eval(s)
        else:
            raise ValueError

    def evalDateStr(s):
        # return a struct_time object given a string representation
        # changed it to day month year
        return time.strptime(s, '%d/%m/%Y')

    def convertType(s):
        for func in (int, float, evalBoole, evalDateStr):
            try:
                n = func(s)
                return n
            except Exception:
                pass
        return s

    # changed to read in file line by line
    print "\nLooping through the file, line by line."
    f = open("omg.txt", "r")
    for i, line in enumerate(f):
        for j, obj in enumerate([convertType(item) for item in
                                 [w.strip() for w in line.strip().split('\t')]
                                 ]):
            print 'Row %d, Column %d, Value: %s' % (i, j, obj)

    f.close()
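
Since the sample data earlier in the thread spells its booleans in upper case (TRUE), a case-insensitive variant may be handy. The following is a minimal sketch in Python 3 syntax (the thread uses Python 2.5); `parse_bool` and `convert_type` are hypothetical names, and the sketch avoids eval() by comparing strings directly.

```python
def parse_bool(s):
    """Return True/False for 'true'/'false' in any letter case, else raise ValueError."""
    lowered = s.strip().lower()
    if lowered == 'true':
        return True
    if lowered == 'false':
        return False
    raise ValueError('not a boolean: %r' % s)

def convert_type(s):
    # int and float are tried first; both reject 'TRUE', so parse_bool gets its turn.
    for func in (int, float, parse_bool):
        try:
            return func(s)
        except ValueError:
            pass
    # Nothing matched, so keep the text as a string.
    return s

print([convert_type(w) for w in '2.0\tthere\tTRUE'.split('\t')])
```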
Posted by solsteel

Sep 12th, 2008

Re: How do I recognise strings and numbers

Thanks, that is what I was thinking for booleans!

However, the next bit that I am stuck on is in my post #4.

I need to take the tab-delimited file and convert it to an Excel file.

    import pyXLWriter as xl
    import datetime

    # Create a new workbook called simple.xls and add a worksheet
    workbook = xl.Writer("simple.xls")
    worksheet = workbook.add_worksheet()

    # worksheet.write([row, column], value)
    worksheet.write([0, 0], "hello")
    worksheet.write([0, 1], 2.3)
    worksheet.write([0, 2], datetime.date(2008, 2, 1))
    worksheet.write([1, 0], 2.0)
    worksheet.write([1, 1], "there")
    worksheet.write([1, 2], 1)  # TRUE from test.txt, written here as 1

    workbook.close()

Here I have the code to write to Excel using the pyXLWriter module, but it is hard-coded. What I want is for it to read the tab-delimited file, work out the type of each value and its row and column position, and write it to Excel. Can you help me do this? I don't know how. Thank you for your help so far.
Last edited by iamthwee; Sep 12th, 2008 at 1:44 pm.
Posted by iamthwee

Sep 12th, 2008

Re: How do I recognise strings and numbers

I thought my use of enumerate() would give you enough information. Here's the code again with code to write to the Excel file (untested):
    import pyXLWriter as xl

    # convertType() is the function defined in my previous post
    print "\nLooping through the file, line by line."
    workbook = xl.Writer("simple.xls")
    worksheet = workbook.add_worksheet()
    f = open("omg.txt", "r")
    for i, line in enumerate(f):
        for j, obj in enumerate([convertType(item) for item in
                                 [w.strip() for w in line.strip().split('\t')]
                                 ]):
            # i is the row index, j is the column index
            worksheet.write([i, j], obj)

    workbook.close()
    f.close()
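
The row and column bookkeeping can be tried out without pyXLWriter. Below is a minimal sketch in Python 3 syntax that uses the standard csv module on an in-memory sample instead of a real file; `iter_cells` is a hypothetical helper, and the sample text mirrors test.txt from earlier in the thread. Note that csv.reader strips surrounding double quotes by default, which differs from a plain split('\t').

```python
import csv
import io

def iter_cells(stream):
    """Yield (row, column, text) for every field in a tab-delimited stream."""
    for i, row in enumerate(csv.reader(stream, delimiter='\t')):
        for j, text in enumerate(row):
            yield i, j, text.strip()

# In-memory stand-in for a tab-delimited file such as test.txt.
sample = io.StringIO('hello\t2.3\t01/02/2008\n2.0\tthere\tTRUE\n')
cells = list(iter_cells(sample))
for cell in cells:
    print(cell)
```

Each tuple carries the position that a call like worksheet.write([i, j], value) would need.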
Posted by solsteel

Sep 13th, 2008

Re: How do I recognise strings and numbers

No, I can't seem to get it to work! I tried two tests.

import pyXLWriter as xl
import time
import datetime


workbook  = xl.Writer("simple555.xls")
worksheet = workbook.add_worksheet()

def evalBoole(s):
    if s in ('True', 'False'):
        return eval(s)
    else:
        raise ValueError

def evalDateStr(s):
    # return a struct_time object given a string representation
    # changed it to day month year
    return time.strptime(s, '%d/%m/%Y')

def convertType(s):
    for func in (int, float, evalBoole, evalDateStr):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

#changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in 
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        #print 'Row %d, Column %d, Value: %s' % (i,j,obj)
        worksheet.write([i, j], datetime.date(2008, 2, 1))

f.close()
workbook.close()

That works when you explicitly give it a value, whether a date, a string or an integer/float.

But it doesn't work when I do this:

import pyXLWriter as xl
import time
import datetime


workbook  = xl.Writer("simple555.xls")
worksheet = workbook.add_worksheet()

def evalDateStr(s):
    # return a struct_time object given a string representation
    # changed it to day month year
    return time.strptime(s, '%d/%m/%Y')

def convertType(s):
    for func in (int, float, evalDateStr):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

#changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in 
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        #print 'Row %d, Column %d, Value: %s' % (i,j,obj)
        worksheet.write([i, j], obj)

f.close()
workbook.close()
It fails; it doesn't even write to the Excel file!


Do I need to create a function such as:
    if obj = str then
        s = str(obj)
        worksheet.write([i,j], s)

    else if obj = integer then
        i = int(obj)
        worksheet.write([i,j], i)

    else if obj = date then
        a = split date by commas = day
        b = split date by commas = month
        c = split date by commas = year

        worksheet.write([i,j], datetime.date(c,b,a))
    end

If so, how do I do this? I don't know anything about Python! Thanks, I'm almost finished.
Last edited by iamthwee; Sep 13th, 2008 at 11:37 am.
Posted by iamthwee

Sep 13th, 2008

Re: How do I recognise strings and numbers

You should run some tests on different types of data to see what is failing. Did you get any error messages? I would guess that it is failing on the date. Try this:
    for i, line in enumerate(f):
        for j, obj in enumerate([convertType(item) for item in
                                 [w.strip() for w in line.strip().split('\t')]
                                 ]):
            if isinstance(obj, time.struct_time):
                worksheet.write([i, j], datetime.date(*obj[:3]))
            else:
                worksheet.write([i, j], obj)
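
An alternative that avoids the struct_time check altogether is to have the date converter return a datetime.date in the first place. This is only a sketch in Python 3 syntax (the thread uses Python 2.5), with hypothetical names `eval_date_str` and `convert_type`; it has not been tried against pyXLWriter.

```python
import datetime

def eval_date_str(s, fmt='%d/%m/%Y'):
    """Parse a day/month/year string straight into a datetime.date."""
    return datetime.datetime.strptime(s, fmt).date()

def convert_type(s):
    # Try each converter in turn; fall back to the original string.
    for func in (int, float, eval_date_str):
        try:
            return func(s)
        except ValueError:
            pass
    return s

values = [convert_type(w) for w in ['8', '2.3', '11/01/2008', '31/02/2008', 'hello']]
print(values)
```

With this arrangement the writing loop would not need a special case for dates, because every converted value is already an int, a float, a date or a string.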
Posted by solsteel

Sep 14th, 2008

Re: How do I recognise strings and numbers

    import pyXLWriter as xl
    import time
    import datetime


    workbook = xl.Writer("simple555.xls")
    worksheet = workbook.add_worksheet()

    def evalBoole(s):
        # eval('TRUE') would raise NameError (Python spells them True/False),
        # so compare the string instead
        if s in ('TRUE', 'FALSE'):
            return s == 'TRUE'
        else:
            raise ValueError

    def evalDateStr(s):
        # return a struct_time object given a string representation
        # changed it to day month year
        return time.strptime(s, '%d/%m/%Y')

    def convertType(s):
        for func in (int, float, evalBoole, evalDateStr):
            try:
                n = func(s)
                return n
            except Exception:
                pass
        return s

    # changed to read in file line by line
    print "\nLooping through the file, line by line."
    f = open("omg.txt", "r")
    for i, line in enumerate(f):
        for j, obj in enumerate([convertType(item) for item in
                                 [w.strip() for w in line.strip().split('\t')]
                                 ]):
            if isinstance(obj, time.struct_time):
                # pass year, month, day to datetime.date
                worksheet.write([i, j], datetime.date(*obj[:3]))
            else:
                worksheet.write([i, j], obj)
    f.close()
    workbook.close()

Cool! Thank you so much, it works now. It writes to the Excel file (2000 & 2003 format) as I expect. However, it still prints some warning messages, and I am not quite sure what they mean or what they might affect:

command prompt dump
    Looping through the file, line by line.
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:135: DeprecationWarning: struct integer overflow masking is deprecated
      unknown3 = pack("<H", -2)
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:135: DeprecationWarning: 'H' format requires 0 <= number <= 65535
      unknown3 = pack("<H", -2)
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:141: DeprecationWarning: struct integer overflow masking is deprecated
      sbd_startblock = pack("<L", -2)
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:142: DeprecationWarning: struct integer overflow masking is deprecated
      unknown7 = pack("<LLL", 0x00, -2 ,0x00)
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:143: DeprecationWarning: struct integer overflow masking is deprecated
      unused = pack("<L", -1)
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:205: DeprecationWarning: struct integer overflow masking is deprecated
      pps_prev = pack("<L", -1) #0x44
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:206: DeprecationWarning: struct integer overflow masking is deprecated
      pps_next = pack("<L", -1) #0x48
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:213: DeprecationWarning: struct integer overflow masking is deprecated
      pps_sb = pack("<L", sb) #0x74
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:207: DeprecationWarning: struct integer overflow masking is deprecated
      pps_dir = pack("<L", dir) #0x4c
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:168: DeprecationWarning: struct integer overflow masking is deprecated
      marker = pack("<L", -3)
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:169: DeprecationWarning: struct integer overflow masking is deprecated
      end_of_chain = pack("<L", -2)
    C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:170: DeprecationWarning: struct integer overflow masking is deprecated
      unused = pack("<L", -1)
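
Going by their wording, the messages in this dump are DeprecationWarning notices rather than errors: pyXLWriter appears to pack negative numbers such as -2 into unsigned fields ("<H", "<L"), and the struct module in Python 2.5 masks the value to fit while warning that this behaviour is deprecated. The sketch below, in Python 3 syntax, shows what that masking amounts to and one way to silence such warnings; it is an illustration under those assumptions, not a patch for pyXLWriter.

```python
import struct
import warnings

# Masking a negative value to an unsigned width by hand:
# -2 becomes 0xFFFE in a 16-bit field and 0xFFFFFFFE in a 32-bit field.
masked16 = -2 & 0xFFFF
masked32 = -2 & 0xFFFFFFFF
packed = struct.pack('<H', masked16)
print(hex(masked16), hex(masked32), packed)

# Optional: hide DeprecationWarning messages, e.g. those raised inside
# third-party modules. Put this before the code that triggers them.
warnings.filterwarnings('ignore', category=DeprecationWarning)
```

The warnings describe how the values are packed and do not by themselves mean that the output file is wrong.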

Also, I wanted to ask: what does *obj[:3] mean?
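
For reference, `obj[:3]` is a slice that takes the first three items of the struct_time (year, month, day), and the leading `*` unpacks those items into separate positional arguments of the call. A short sketch in Python 3 syntax, with illustrative variable names:

```python
import datetime
import time

parsed = time.strptime('01/02/2008', '%d/%m/%Y')

# Slice: the first three fields are year, month and day.
first_three = parsed[:3]

# Unpack: equivalent to datetime.date(first_three[0], first_three[1], first_three[2]).
d = datetime.date(*first_three)

print(first_three)
print(d)
```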
Last edited by iamthwee; Sep 14th, 2008 at 9:22 am.
Posted by iamthwee

© 2011 DaniWeb® LLC