How do I recognise strings and numbers

#1 iamthwee, Sep 11th, 2008
Let's say I have a tab-separated text file containing something like:

1.02 \t hello \t 01/02/2008

How do I get Python to recognise that 1.02 is a number, hello is a string and 01/02/2008 is a date?
The file has no fixed pattern, i.e. it won't always be number, string, date; any column could hold any type.

I'm thinking of perhaps using regular expressions or something similar.

Awaiting your earliest reply. Sample code would be great.
Last edited by iamthwee; Sep 11th, 2008 at 2:59 pm.

#2 solsteel, Sep 11th, 2008
Decide which types of objects need to be converted. If needed, write a function for each conversion type; each one should raise an exception when its conversion fails. If all conversions fail, return the original string. Example:

import time

def fdim(s):
    # return a float given a fraction (EX. '1/2')
    ss = s.split('/')
    return float(ss[0])/float(ss[1])

def evalList(s):
    # return a list given a string representation
    if s.strip().startswith('[') and s.strip().endswith(']'):
        try: return eval(s)
        except: raise ValueError
    raise ValueError

def evalDateStr(s):
    # return a struct_time object given a string representation
    return time.strptime(s, '%m/%d/%Y')

def convertType(s):
    for func in (int, float, evalDateStr, fdim, evalList):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

s = "6 \t 1.02 \t hello \t 01/02/2008 \t [1,2,3] \t 15/16"

for obj in [convertType(item) for item in [w.strip() for w in s.split('\t')]]:
    print '%s - object type: %s' % (obj, type(obj))

Output:
6 - object type: <type 'int'>
1.02 - object type: <type 'float'>
hello - object type: <type 'str'>
(2008, 1, 2, 0, 0, 0, 2, 2, -1) - object type: <type 'time.struct_time'>
[1, 2, 3] - object type: <type 'list'>
0.9375 - object type: <type 'float'>
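
For anyone running this on Python 3, the same try-each-converter idea could look like the sketch below. It is not solsteel's code: the names `parse_date`, `parse_fraction`, `parse_list`, `CONVERTERS` and `convert_type` are made up for illustration, `ast.literal_eval` stands in for `eval` so that only literals are evaluated, and dates come back as `datetime.date` objects rather than `struct_time`.

```python
import ast
from datetime import datetime
from fractions import Fraction


def parse_date(text):
    # Month/day/year, matching the '%m/%d/%Y' format used in the post above.
    return datetime.strptime(text, "%m/%d/%Y").date()


def parse_fraction(text):
    # Only accept the 'numerator/denominator' form, e.g. '15/16'.
    if "/" not in text:
        raise ValueError(text)
    return float(Fraction(text))


def parse_list(text):
    # literal_eval evaluates Python literals only, unlike eval().
    value = ast.literal_eval(text)
    if not isinstance(value, list):
        raise ValueError(text)
    return value


# Assumed ordering: int before float so that '6' stays an integer.
CONVERTERS = (int, float, parse_date, parse_fraction, parse_list)


def convert_type(text):
    """Return the first successful conversion, or the stripped text."""
    text = text.strip()
    for convert in CONVERTERS:
        try:
            return convert(text)
        except (ValueError, SyntaxError, ZeroDivisionError):
            continue
    return text


row = "6 \t 1.02 \t hello \t 01/02/2008 \t [1,2,3] \t 15/16"
values = [convert_type(field) for field in row.split("\t")]
for value in values:
    print("%s - object type: %s" % (value, type(value).__name__))
```

Catching a short list of exception types instead of a bare `except:` means that a typo inside one of the converters shows up as an error instead of silently turning every value into a string.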

#3 iamthwee, Sep 12th, 2008
Wow, that's awesome! One more question though: how do I detect booleans, such as TRUE and FALSE?
I changed your code a bit to test it and got rid of the bits I don't need.

omg.txt (tab delimited)
0.0003    2.3    8    hello    000292    0.0.43
11/01/2008    13/13/1991    0    "343,4343"
0116 2346    6421-01-01    31/02/2008    "2.3"
01/02/08    1.223232223


conv.py
import time

def evalDateStr(s):
    # return a struct_time object given a string representation
    # changed it to day month year
    return time.strptime(s, '%d/%m/%Y')

def convertType(s):
    for func in (int, float, evalDateStr):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

# changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for line in f:
    s = line
    for obj in [convertType(item) for item in [w.strip() for w in s.split('\t')]]:
        print '%s - object type: %s' % (obj, type(obj))

f.close()

output
Looping through the file, line by line.
0.0003 - object type: <type 'float'>
2.3 - object type: <type 'float'>
8 - object type: <type 'int'>
hello - object type: <type 'str'>
292 - object type: <type 'int'>
0.0.43 - object type: <type 'str'>
(2008, 1, 11, 0, 0, 0, 4, 11, -1) - object type: <type 'time.struct_time'>
13/13/1991 - object type: <type 'str'>
0 - object type: <type 'int'>
"343,4343" - object type: <type 'str'>
0116 2346 - object type: <type 'str'>
6421-01-01 - object type: <type 'str'>
31/02/2008 - object type: <type 'str'>
"2.3" - object type: <type 'str'>
01/02/08 - object type: <type 'str'>
1.223232223 - object type: <type 'float'>

As you can see, the results are all correct. I checked whether it would be fooled by February having 31 days, and it passed. But does it automatically detect leap years?

I have one more question as well, which I will post later. Thanks very much for saving me time.
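
On the leap-year question: as far as I know, `time.strptime` checks the day against the real calendar, so 29 February is only accepted in leap years. A quick way to confirm this yourself is a small helper like the one below (`is_valid_date` is a made-up name; the format string is the day/month/year one from conv.py). The snippet is written so that it runs on Python 3.

```python
import time


def is_valid_date(text, fmt="%d/%m/%Y"):
    """Return True if text parses as a real calendar date in fmt."""
    try:
        time.strptime(text, fmt)
    except ValueError:
        return False
    return True


for candidate in ("29/02/2008", "29/02/2007", "31/02/2008"):
    print(candidate, is_valid_date(candidate))
```

2008 was a leap year and 2007 was not, so only the first candidate should be reported as valid.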
Last edited by iamthwee; Sep 12th, 2008 at 5:38 am.

#4 iamthwee, Sep 12th, 2008
Part two.

Let's assume I have a tab-delimited file such as

test.txt
hello    2.3    01/02/2008
2.0    there    TRUE

I want my Python script to read it in and write it to an Excel file. At the moment I hard-code every cell:

import pyXLWriter as xl
import datetime

# Create a new workbook called simple.xls and add a worksheet
workbook = xl.Writer("simple.xls")
worksheet = workbook.add_worksheet()

# Each cell is addressed as [row, column]
worksheet.write([0, 0], "hello")
worksheet.write([0, 1], 2.3)
worksheet.write([0, 2], datetime.date(2008, 2, 1))
worksheet.write([1, 0], 2.0)
worksheet.write([1, 1], "there")
worksheet.write([1, 2], 1)

workbook.close()

As you can see in my example, [x, y] gives the row and column of the value in the tab-delimited text file.

I want my code to be generic so it can handle any tab-delimited file with an unknown number of entries. I think I need to use functions, but I have no idea how.

Sample code would be great! Awaiting your earliest reply.

#5 solsteel, Sep 12th, 2008
Create another function to test for boolean value strings. Placing values in rows and columns is very easy in Python using enumerate().

import time

def evalBoole(s):
    if s in ('True', 'False'):
        return eval(s)
    else:
        raise ValueError

def evalDateStr(s):
    # return a struct_time object given a string representation
    # changed it to day month year
    return time.strptime(s, '%d/%m/%Y')

def convertType(s):
    for func in (int, float, evalBoole, evalDateStr):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

# changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        print 'Row %d, Column %d, Value: %s' % (i,j,obj)

f.close()
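
The file in this thread spells booleans as TRUE and FALSE, which the `('True', 'False')` check above would not match. A possible variation, sketched for Python 3 under the made-up name `parse_bool`, looks the text up in a dictionary instead of calling `eval` and ignores letter case:

```python
def parse_bool(text):
    """Map 'true'/'false' in any letter case to a bool, else raise ValueError."""
    lookup = {"true": True, "false": False}
    try:
        return lookup[text.strip().lower()]
    except KeyError:
        # Raise ValueError so a converter loop can move on to the next type.
        raise ValueError("not a boolean: %r" % text)


print(parse_bool("TRUE"), parse_bool("False"))
```

Because it raises `ValueError` on anything else, it can be dropped into the tuple of converter functions in the same position as `evalBoole`.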

#6 iamthwee, Sep 12th, 2008
Thanks, that is what I was thinking for booleans!

However, the next bit that I am stuck on is in my post #4.

I need to take the tab file and convert it to an Excel file.

import pyXLWriter as xl
import datetime

# Create a new workbook called simple.xls and add a worksheet
workbook = xl.Writer("simple.xls")
worksheet = workbook.add_worksheet()

# Each cell is addressed as [row, column]
worksheet.write([0, 0], "hello")
worksheet.write([0, 1], 2.3)
worksheet.write([0, 2], datetime.date(2008, 2, 1))
worksheet.write([1, 0], 2.0)
worksheet.write([1, 1], "there")
worksheet.write([1, 2], 1)

workbook.close()

Here I have the code to write to Excel using the pyXLWriter module, but it is hard-coded. What I want is for it to read the tab file, work out what type each value is and what its x,y position is, and write it to Excel. Can you help me do this? I don't know how. Thank you for your help so far.
Last edited by iamthwee; Sep 12th, 2008 at 1:44 pm.

#7 solsteel, Sep 12th, 2008
I thought my use of enumerate() would give you enough information. Here's the code again with code to write to the Excel file (untested):

# assumes the imports and convertType() from the earlier posts
print "\nLooping through the file, line by line."
workbook = xl.Writer("simple.xls")
worksheet = workbook.add_worksheet()
f = open("omg.txt", "r")
for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        worksheet.write([i, j], obj)

workbook.close()
f.close()
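
One thing to watch in the loop above: `line.strip()` also removes leading and trailing tabs, so a row whose first cell is empty would have its values shifted one column to the left. A sketch of a generator that strips only the line ending, assuming Python 3 (`iter_cells` and its `convert` parameter are made-up names, not part of pyXLWriter):

```python
import io


def iter_cells(lines, convert=str.strip):
    """Yield (row, column, value) for every field of tab-delimited input."""
    for i, line in enumerate(lines):
        # Remove only the newline so that empty leading cells keep their column.
        fields = line.rstrip("\r\n").split("\t")
        for j, field in enumerate(fields):
            yield i, j, convert(field)


# In-memory stand-in for the test.txt file from the thread.
sample = io.StringIO("hello\t2.3\t01/02/2008\n\tthere\tTRUE\n")
cells = list(iter_cells(sample))
for row, column, value in cells:
    print("Row %d, Column %d, Value: %r" % (row, column, value))
```

With the thread's code this would be used as `for i, j, value in iter_cells(f, convertType): worksheet.write([i, j], value)`.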

#8 iamthwee, Sep 13th, 2008
No I can't seem to get it to work! I tried two tests.

import pyXLWriter as xl
import time
import datetime


workbook  = xl.Writer("simple555.xls")
worksheet = workbook.add_worksheet()

def evalBoole(s):
    if s in ('True', 'False'):
        return eval(s)
    else:
        raise ValueError

def evalDateStr(s):
    # return a struct_time object given a string representation
    # changed it to day month year
    return time.strptime(s, '%d/%m/%Y')

def convertType(s):
    for func in (int, float, evalBoole, evalDateStr):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

#changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in 
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        #print 'Row %d, Column %d, Value: %s' % (i,j,obj)
        worksheet.write([i, j], datetime.date(2008, 2, 1))

f.close()
workbook.close()

That works when you explicitly give it a value, whether a date, a string or an integer/float.

But it doesn't work when I do

import pyXLWriter as xl
import time
import datetime


workbook  = xl.Writer("simple555.xls")
worksheet = workbook.add_worksheet()

def evalDateStr(s):
    # return a struct_time object given a string representation
    # changed it to day month year
    return time.strptime(s, '%d/%m/%Y')

def convertType(s):
    for func in (int, float, evalDateStr):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

#changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in 
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        #print 'Row %d, Column %d, Value: %s' % (i,j,obj)
        worksheet.write([i, j], obj)

f.close()
workbook.close()

it fails; it doesn't even write to the Excel file!


Do I need to create a function such as:

def writeCell(worksheet, i, j, obj):
    # i, j are the row and column; obj is the value from convertType()
    if isinstance(obj, str):
        worksheet.write([i, j], obj)
    elif isinstance(obj, (int, float)):
        worksheet.write([i, j], obj)
    elif isinstance(obj, time.struct_time):
        # the date prints as (year, month, day, ...), so take the first three
        year = obj[0]
        month = obj[1]
        day = obj[2]
        worksheet.write([i, j], datetime.date(year, month, day))

If so, how do I do this? I don't know anything about Python! Thanks, I'm almost finished.
Last edited by iamthwee; Sep 13th, 2008 at 11:37 am.

#9 solsteel, Sep 13th, 2008
You should run some tests on different types of data to see what is failing. Did you get any error messages? I would guess that it is failing on the date. Try this:

for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        if isinstance(obj, time.struct_time):
            worksheet.write([i, j], datetime.date(*obj[:3]))
        else:
            worksheet.write([i, j], obj)

#10 iamthwee, Sep 14th, 2008
import pyXLWriter as xl
import time
import datetime

workbook = xl.Writer("simple555.xls")
worksheet = workbook.add_worksheet()

def evalBoole(s):
    # compare the text directly; eval('TRUE') would raise NameError
    if s in ('TRUE', 'FALSE'):
        return s == 'TRUE'
    else:
        raise ValueError

def evalDateStr(s):
    # return a struct_time object given a string representation
    # changed it to day month year
    return time.strptime(s, '%d/%m/%Y')

def convertType(s):
    for func in (int, float, evalBoole, evalDateStr):
        try:
            n = func(s)
            return n
        except:
            pass
    return s

# changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
    for j, obj in enumerate([convertType(item) for item in
                             [w.strip() for w in line.strip().split('\t')]
                             ]):
        if isinstance(obj, time.struct_time):
            # struct_time cannot be written directly; convert it to a date
            worksheet.write([i, j], datetime.date(*obj[:3]))
        else:
            worksheet.write([i, j], obj)
f.close()
workbook.close()

Cool! Thank you so much, it works now: it writes to the Excel file (2000 & 2003 format) as I expect. However, it still prints some warning messages, and I am not quite sure what they mean or what they might affect:

command prompt dump
Looping through the file, line by line.
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:135: DeprecationWarning: struct integer overflow masking is deprecated
  unknown3 = pack("<H", -2)
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:135: DeprecationWarning: 'H' format requires 0 <= number <= 65535
  unknown3 = pack("<H", -2)
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:141: DeprecationWarning: struct integer overflow masking is deprecated
  sbd_startblock = pack("<L", -2)
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:142: DeprecationWarning: struct integer overflow masking is deprecated
  unknown7 = pack("<LLL", 0x00, -2 ,0x00)
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:143: DeprecationWarning: struct integer overflow masking is deprecated
  unused = pack("<L", -1)
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:205: DeprecationWarning: struct integer overflow masking is deprecated
  pps_prev = pack("<L", -1) #0x44
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:206: DeprecationWarning: struct integer overflow masking is deprecated
  pps_next = pack("<L", -1) #0x48
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:213: DeprecationWarning: struct integer overflow masking is deprecated
  pps_sb = pack("<L", sb) #0x74
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:207: DeprecationWarning: struct integer overflow masking is deprecated
  pps_dir = pack("<L", dir) #0x4c
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:168: DeprecationWarning: struct integer overflow masking is deprecated
  marker = pack("<L", -3)
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:169: DeprecationWarning: struct integer overflow masking is deprecated
  end_of_chain = pack("<L", -2)
C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:170: DeprecationWarning: struct integer overflow masking is deprecated
  unused = pack("<L", -1)
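
These are warnings rather than errors, and they are raised inside pyXLWriter's own OLEWriter.py, not in the script above. The `"<H"` and `"<L"` formats are unsigned 16-bit and 32-bit fields, and the library passes negative numbers such as -2 to them; Python 2.5 wraps the value around and warns that this behaviour is going away. The sketch below is written for Python 3, where the unmasked call raises `struct.error` instead, and shows the explicit masking that produces the wrapped bytes. `pack_unsigned` is a made-up helper, not a pyXLWriter function.

```python
import struct


def pack_unsigned(fmt, value, bits):
    """Pack value into an unsigned field, wrapping negatives explicitly."""
    mask = (1 << bits) - 1
    return struct.pack(fmt, value & mask)


# -2 wraps to 0xFFFE in 16 bits and to 0xFFFFFFFE in 32 bits.
marker_16 = pack_unsigned("<H", -2, 16)
marker_32 = pack_unsigned("<L", -2, 32)
print(marker_16, marker_32)
```

If the spreadsheet opens correctly, the wrapped values are most likely what the library intended, so the warnings are probably harmless on Python 2.5; that is an inference from the messages, not something verified against pyXLWriter itself.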

Also, I wanted to ask: what does *obj[:3] mean?
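
In short, `obj[:3]` is a slice that takes the first three items of the `struct_time` (year, month, day), and the leading `*` unpacks those items into separate arguments for `datetime.date`. A small sketch that spells out both steps, written for Python 3 with illustrative variable names:

```python
import datetime
import time

parsed = time.strptime("01/02/2008", "%d/%m/%Y")

# Slicing: the first three fields of a struct_time are year, month, day.
year_month_day = parsed[:3]

# Unpacking: the * spreads that tuple into three positional arguments,
# so this call is the same as datetime.date(2008, 2, 1).
as_date = datetime.date(*year_month_day)

print(year_month_day, as_date)
```

The remaining fields of the `struct_time` (hour, minute, second, weekday, day of year, DST flag) are simply left out by the slice.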
Last edited by iamthwee; Sep 14th, 2008 at 9:22 am.

©2003 - 2009 DaniWeb® LLC