| | |
How do I recognise strings and numbers
Please support our Python advertiser: Programming Forums - DaniWeb Sister Site
![]() |
Let's say I have a text file separated by tabs containing stuff like:
How do I get python to recognise that 1.02 is a number, hello is a string and 01/02/2008 is a date.
The file has no pattern, i.e it won't always be number,string,date, it could be anything.
I'm thinking perhaps using regular expressions or something.
Waiting your earliest reply. Sample code would be great.
Python Syntax (Toggle Plain Text)
1.02 \t hello \t 01/02/2008
How do I get python to recognise that 1.02 is a number, hello is a string and 01/02/2008 is a date.
The file has no pattern, i.e it won't always be number,string,date, it could be anything.
I'm thinking perhaps using regular expressions or something.
Waiting your earliest reply. Sample code would be great.
Last edited by iamthwee; Sep 11th, 2008 at 2:59 pm.
*Voted best profile in the world*
•
•
Join Date: Mar 2007
Posts: 110
Reputation:
Solved Threads: 31
Decide what types of objects that need to be converted. If needed, create a function to make each conversion type. An exception should occur on any failed conversion. If all conversions fail, return a string. Example:
Output:
Python Syntax (Toggle Plain Text)
import time def fdim(s): # return a float given a fraction (EX. '1/2') ss = s.split('/') return float(ss[0])/float(ss[1]) def evalList(s): # return a list given a string representation if s.strip().startswith('[' ) and s.strip().endswith(']'): try: return eval(s) except: raise ValueError raise ValueError def evalDateStr(s): # return a struct_time object given a string representation return time.strptime(s, '%m/%d/%Y') def convertType(s): for func in (int, float, evalDateStr, fdim, evalList): try: n = func(s) return n except: pass return s s = "6 \t 1.02 \t hello \t 01/02/2008 \t [1,2,3] \t 15/16" for obj in [convertType(item) for item in [w.strip() for w in s.split('\t')]]: print '%s - object type: %s' % (obj, type(obj))
Python Syntax (Toggle Plain Text)
>>> 6 - object type: <type 'int'> 1.02 - object type: <type 'float'> hello - object type: <type 'str'> (2008, 1, 2, 0, 0, 0, 2, 2, -1) - object type: <type 'time.struct_time'> [1, 2, 3] - object type: <type 'list'> 0.9375 - object type: <type 'float'> >>>
Wow that's awesome, one more question though how do I detect type bools, such as
I changed your code a bit to test it and got rid of the bits I don't need.
omg.txt (tab delimited)
conv.py
output
As you can see the results are all correct, I checked it to see if it would be fooled by february having 31 days and it passed. But does it auto detect leap years?
I have one more question as well I will post later. Thanks very much for saving me time.
TRUE and FALSE I changed your code a bit to test it and got rid of the bits I don't need.
omg.txt (tab delimited)
Python Syntax (Toggle Plain Text)
0.0003 2.3 8 hello 000292 0.0.43 11/01/2008 13/13/1991 0 "343,4343" 0116 2346 6421-01-01 31/02/2008 "2.3" 01/02/08 1.223232223
conv.py
python Syntax (Toggle Plain Text)
import time def evalDateStr(s): # return a struct_time object given a string representation # changed it to day month year return time.strptime(s, '%d/%m/%Y') def convertType(s): for func in (int, float, evalDateStr ): try: n = func(s) return n except: pass return s #changed to read in file line by line print "\nLooping through the file, line by line." f = open("omg.txt", "r") for line in f: s = line for obj in [convertType(item) for item in [w.strip() for w in s.split('\t')]]: print '%s - object type: %s' % (obj, type(obj)) f.close()
output
Python Syntax (Toggle Plain Text)
Looping through the file, line by line. 0.0003 - object type: <type 'float'> 2.3 - object type: <type 'float'> 8 - object type: <type 'int'> hello - object type: <type 'str'> 292 - object type: <type 'int'> 0.0.43 - object type: <type 'str'> (2008, 1, 11, 0, 0, 0, 4, 11, -1) - object type: <type 'time.struct_time'> 13/13/1991 - object type: <type 'str'> 0 - object type: <type 'int'> "343,4343" - object type: <type 'str'> 0116 2346 - object type: <type 'str'> 6421-01-01 - object type: <type 'str'> 31/02/2008 - object type: <type 'str'> "2.3" - object type: <type 'str'> 01/02/08 - object type: <type 'str'> 1.223232223 - object type: <type 'float'>
As you can see the results are all correct, I checked it to see if it would be fooled by february having 31 days and it passed. But does it auto detect leap years?
I have one more question as well I will post later. Thanks very much for saving me time.
Last edited by iamthwee; Sep 12th, 2008 at 5:38 am.
*Voted best profile in the world*
Part two.
Let's assume I have a tab delimited file such as
test.txt
I want to have my python script read it in.
As you can see in my example the
I want my code to be generic so it can handle any tab delimited file of unknowN entries. I think I need to use functions but I have no idea how to.
Sample code would be great! Awaiting your earliest reply.
Let's assume I have a tab delimited file such as
test.txt
Python Syntax (Toggle Plain Text)
hello 2.3 01/02/2008 2.0 there TRUE
I want to have my python script read it in.
python Syntax (Toggle Plain Text)
import pyXLWriter as xl import datetime # Create a new workbook called simple.xls and add a worksheet workbook = xl.Writer("simple.xls") worksheet = workbook.add_worksheet() worksheet.write([0, 0], "hello") worksheet.write([0, 1], 2.3) worksheet.write([0, 2], datetime.date(2008,02,01)) worksheet.write([1, 0], 2.0) worksheet.write([1, 1], "there") worksheet.write([1, 2], 1 workbook.close()
As you can see in my example the
[x,y] represents the row, column in the tab delimited text file.I want my code to be generic so it can handle any tab delimited file of unknowN entries. I think I need to use functions but I have no idea how to.
Sample code would be great! Awaiting your earliest reply.
*Voted best profile in the world*
•
•
Join Date: Mar 2007
Posts: 110
Reputation:
Solved Threads: 31
Create another function to test for for boole value strings. Placing values in columns and rows is very easy in Python using enumerate().
Python Syntax (Toggle Plain Text)
import time def evalBoole(s): if s in ('True', 'False'): return eval(s) else: raise ValueError def evalDateStr(s): # return a struct_time object given a string representation # changed it to day month year return time.strptime(s, '%d/%m/%Y') def convertType(s): for func in (int, float, evalBoole, evalDateStr): try: n = func(s) return n except: pass return s #changed to read in file line by line print "\nLooping through the file, line by line." f = open("omg.txt", "r") for i, line in enumerate(f): for j, obj in enumerate([convertType(item) for item in [w.strip() for w in line.strip().split('\t')] ]): print 'Row %d, Column %d, Value: %s' % (i,j,obj) f.close()
Thanks that is what I was thinking about booleans!
However, the next bit that I am stuck on is in my post #4.
I need to take the tab file and convert it to an excel file.
Here I have the code to write to excel using the module pyXLWriter but it is hard coded. What I want is it to read the tab file then choose what the variable is, what x,y position is it and write it to excel. so can you help me do this? I don't know how to do this? Thank you for your help so far.
However, the next bit that I am stuck on is in my post #4.
I need to take the tab file and convert it to an excel file.
Python Syntax (Toggle Plain Text)
import pyXLWriter as xl import datetime # Create a new workbook called simple.xls and add a worksheet workbook = xl.Writer("simple.xls") worksheet = workbook.add_worksheet() worksheet.write([0, 0], "hello") worksheet.write([0, 1], 2.3) worksheet.write([0, 2], datetime.date(2008,02,01)) worksheet.write([1, 0], 2.0) worksheet.write([1, 1], "there") worksheet.write([1, 2], 1 workbook.close()
Here I have the code to write to excel using the module pyXLWriter but it is hard coded. What I want is it to read the tab file then choose what the variable is, what x,y position is it and write it to excel. so can you help me do this? I don't know how to do this? Thank you for your help so far.
Last edited by iamthwee; Sep 12th, 2008 at 1:44 pm.
*Voted best profile in the world*
•
•
Join Date: Mar 2007
Posts: 110
Reputation:
Solved Threads: 31
I thought my use of enumerate() would give you enough information. Here's the code again with code to write to the Excel file (untested):
Python Syntax (Toggle Plain Text)
print "\nLooping through the file, line by line." workbook = xl.Writer("simple.xls") worksheet = workbook.add_worksheet() f = open("omg.txt", "r") for i, line in enumerate(f): for j, obj in enumerate([convertType(item) for item in [w.strip() for w in line.strip().split('\t')] ]): worksheet.write([i, j], obj) workbook.close() f.close()
No I can't seem to get it to work! I tried two tests.
That works when you explicitly give it a value, either date, string or intger/float
But it doesn't work when I do
it fails, it doesn't even write to excel file!!!!
Do I need to create a function such as:
If so how do I do this. I don't know anything about python!!!! THanks I'm almost finished.
import pyXLWriter as xl
import time
import datetime
workbook = xl.Writer("simple555.xls")
worksheet = workbook.add_worksheet()
def evalBoole(s):
if s in ('True', 'False'):
return eval(s)
else:
raise ValueError
def evalDateStr(s):
# return a struct_time object given a string representation
# changed it to day month year
return time.strptime(s, '%d/%m/%Y')
def convertType(s):
for func in (int, float, evalBoole, evalDateStr):
try:
n = func(s)
return n
except:
pass
return s
#changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
for j, obj in enumerate([convertType(item) for item in
[w.strip() for w in line.strip().split('\t')]
]):
#print 'Row %d, Column %d, Value: %s' % (i,j,obj)
worksheet.write([i, j], datetime.date(2008,02,01))
f.close()
workbook.close()That works when you explicitly give it a value, either date, string or intger/float
But it doesn't work when I do
import pyXLWriter as xl
import time
import datetime
workbook = xl.Writer("simple555.xls")
worksheet = workbook.add_worksheet()
def evalDateStr(s):
# return a struct_time object given a string representation
# changed it to day month year
return time.strptime(s, '%d/%m/%Y')
def convertType(s):
for func in (int, float, evalDateStr):
try:
n = func(s)
return n
except:
pass
return s
#changed to read in file line by line
print "\nLooping through the file, line by line."
f = open("omg.txt", "r")
for i, line in enumerate(f):
for j, obj in enumerate([convertType(item) for item in
[w.strip() for w in line.strip().split('\t')]
]):
#print 'Row %d, Column %d, Value: %s' % (i,j,obj)
worksheet.write([i, j], obj)
f.close()
workbook.close()Do I need to create a function such as:
Python Syntax (Toggle Plain Text)
if obj = str then s = str(obj) worksheet.write ([i,j], s) else if obj = integer then i = int(obj) worksheet.write([i,j], i) else if obj = date then a = split date by commas =day b = split date by commas = month c = split date by commas = year worksheet.write([i,j], datetime.date(c,b,a)) end
If so how do I do this. I don't know anything about python!!!! THanks I'm almost finished.
Last edited by iamthwee; Sep 13th, 2008 at 11:37 am.
*Voted best profile in the world*
•
•
Join Date: Mar 2007
Posts: 110
Reputation:
Solved Threads: 31
You should run some tests on different types of data to see what is failing. Did you get any error messages? I would guess that it is failing on the date. Try this:
Python Syntax (Toggle Plain Text)
for i, line in enumerate(f): for j, obj in enumerate([convertType(item) for item in [w.strip() for w in line.strip().split('\t')] ]): if isinstance(obj, time.struct_time): worksheet.write([i, j], datetime.date(*obj[:3])) else: worksheet.write([i, j], obj)
python Syntax (Toggle Plain Text)
import pyXLWriter as xl import time import datetime workbook = xl.Writer("simple555.xls") worksheet = workbook.add_worksheet() def evalBoole(s): if s in ('TRUE', 'FALSE'): return eval(s) else: raise ValueError def evalDateStr(s): # return a struct_time object given a string representation # changed it to day month year return time.strptime(s, '%d/%m/%Y') def convertType(s): for func in (int, float, evalBoole, evalDateStr): try: n = func(s) return n except: pass return s #changed to read in file line by line print "\nLooping through the file, line by line." f = open("omg.txt", "r") for i, line in enumerate(f): for j, obj in enumerate([convertType(item) for item in [w.strip() for w in line.strip().split('\t')] ]): if isinstance(obj, time.struct_time): worksheet.write([i, j], datetime.date(*obj[:3])) else: worksheet.write([i, j], obj) f.close() workbook.close()
Cool! Thank you so much it works now, it writes to the excel file (2000 & 2003 format) as I expect. However, it still comes up with some error messages and I am not quite sure what they mean, or what it might affect:
command prompt dump
Python Syntax (Toggle Plain Text)
Looping through the file, line by line. C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:135: DeprecationWarning: s truct integer overflow masking is deprecated unknown3 = pack("<H", -2) C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:135: DeprecationWarning: ' H' format requires 0 <= number <= 65535 unknown3 = pack("<H", -2) C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:141: DeprecationWarning: s truct integer overflow masking is deprecated sbd_startblock = pack("<L", -2) C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:142: DeprecationWarning: s truct integer overflow masking is deprecated unknown7 = pack("<LLL", 0x00, -2 ,0x00) C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:143: DeprecationWarning: s truct integer overflow masking is deprecated unused = pack("<L", -1) C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:205: DeprecationWarning: s truct integer overflow masking is deprecated pps_prev = pack("<L", -1) #0x44 C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:206: DeprecationWarning: s truct integer overflow masking is deprecated pps_next = pack("<L", -1) #0x48 C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:213: DeprecationWarning: s truct integer overflow masking is deprecated pps_sb = pack("<L", sb) #0x74 C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:207: DeprecationWarning: s truct integer overflow masking is deprecated pps_dir = pack("<L", dir) #0x4c C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:168: DeprecationWarning: s truct integer overflow masking is deprecated marker = pack("<L", -3) C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:169: DeprecationWarning: s truct integer overflow masking is deprecated end_of_chain = pack("<L", -2) C:\Python25\lib\site-packages\pyXLWriter\OLEWriter.py:170: DeprecationWarning: s truct integer overflow masking is deprecated unused = pack("<L", -1)
Also I wanted to ask what does
*obj[:3]) mean? Last edited by iamthwee; Sep 14th, 2008 at 9:22 am.
*Voted best profile in the world*
![]() |
Other Threads in the Python Forum
- Previous Thread: VolumeRendering.py
- Next Thread: Getting size of all files in folder, and finding out some offsets..
| Thread Tools | Search this Thread |
alarm app beginner cipher cmd cx-freeze data decimals development dictionary directory dynamic error examples feet file float format ftp function generator getvalue gui halp homework http images import input ip itunes java keycontrol leftmouse line linux list lists logging loop maintain maze millimeter module mouse mysqldb number numbers output parsing path port prime programming projects push py2exe pygame pyglet pyqt python queue random recursion schedule screensaverloopinactive script scrolledtext slicenotation sqlite ssh string strings sudokusolver table terminal text thread threading time tkinter tlapse tuple tutorial ubuntu unicode url urllib urllib2 variable variables ventrilo verify vigenere web webservice wikipedia windows wxpython xlwt






