We receive files from a Unix system whose records are terminated by a linefeed (LF) only, which is not a problem. The problem is that some of the fields themselves contain carriage return/line feed pairs ("\r\n"). When the file is read, a row like this is cut off at the embedded CRLF, as if the record ended there.
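If every real record ends in a bare LF and every embedded break is a full CRLF pair (an assumption about the file layout), then a record boundary is any `\n` that is not preceded by `\r`. Here is a minimal Python 3 sketch of that rule, using a regular-expression negative lookbehind. The function name `split_records` and the sample data are hypothetical.

```python
import re

# A record boundary is a "\n" that is NOT immediately preceded by "\r".
# Assumption: embedded line breaks inside fields are always the full "\r\n" pair.
RECORD_END = re.compile(r"(?<!\r)\n")


def split_records(text):
    """Split text into records on bare LF, keeping embedded CRLFs inside fields."""
    records = RECORD_END.split(text)
    # A trailing LF at the end of the file leaves an empty final element; drop it.
    if records and records[-1] == "":
        records.pop()
    return records


sample = "id,note\n1,first line\r\nsecond line\n2,plain\n"
print(split_records(sample))
```

The second record keeps its embedded `\r\n`, because only the bare LFs are treated as record ends.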
I am trying to figure out how to (A) read to the end of each record regardless of any embedded CRLFs, (B) remove the CRLF from any field that contains one, and (C) write the row back out in a format that Windows (and SQL) can use, with CRLF instead of a bare LF as the line terminator.
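Steps A to C can be done without parsing fields at all, as a minimal Python 3 sketch shows. It assumes real records end in a bare LF and embedded breaks are always CRLF pairs, and it reads the whole file into memory. It works on bytes so that no newline translation or text encoding gets in the way. The function name `convert` is hypothetical.

```python
import re

# Bare LF (not preceded by CR) marks the end of a record; assumed file layout.
RECORD_END = re.compile(rb"(?<!\r)\n")


def convert(src_path, dst_path):
    """Rewrite an LF-terminated file with CRLF terminators, removing embedded CRLFs."""
    with open(src_path, "rb") as src:  # binary: no newline translation on read
        data = src.read()

    # A) split on record boundaries only, regardless of embedded CRLFs
    records = RECORD_END.split(data)
    if records and records[-1] == b"":
        records.pop()  # empty piece left by the LF at end of file

    # B) remove the CRLF pairs that sit inside fields
    cleaned = [rec.replace(b"\r\n", b"") for rec in records]

    # C) write every record terminated by CRLF
    with open(dst_path, "wb") as dst:  # binary: bytes are written exactly as given
        for rec in cleaned:
            dst.write(rec + b"\r\n")
```

Usage would be `convert("data.csv", "clean_data.csv")`. Replacing `b"\r\n"` with `b" "` instead of `b""` would keep the words on either side of the break from running together.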
This is the code I'm using:
    import csv

    class MyDialect(csv.excel):
        lineterminator = "\n"

    csv.register_dialect("myDialect", MyDialect)
    cr = csv.reader(open("data.csv", "rb"), dialect = "myDialect")
    cw = csv.writer(open("clean_data.csv", "wb"))
    crlf = '\r\n'
    for row in cr:
        for col in row:
            if crlf in col:
                #col.replace("\r\n", "")  <-- didn't work
                col = col.rstrip()
        cw.writerow(row)
    print "Finished"
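In the loop above, `col = col.rstrip()` only rebinds the loop variable, so `row` is never changed before `cw.writerow(row)`. Also, `rstrip()` only strips trailing whitespace, and `str.replace` returns a new string rather than modifying `col` in place, which is why the commented-out line had no effect. The following minimal Python 3 sketch builds a cleaned row instead. It assumes the fields containing CRLF are quoted, so that the csv reader keeps the CRLF inside one field when the input is opened with `newline=''`. The helper `clean_row` and the sample data are hypothetical. The writer uses the default excel dialect, whose `lineterminator` is already `"\r\n"`.

```python
import csv
import io


def clean_row(row):
    """Return a new row with every embedded CRLF removed from each field."""
    # str.replace returns a new string, so the result has to be stored.
    return [col.replace("\r\n", "") for col in row]


# Hypothetical input: the CRLF sits inside a quoted field; records end in LF.
data = 'id,note\n1,"first\r\nsecond"\n'

src = io.StringIO(data, newline="")  # newline="": no newline translation
out = io.StringIO(newline="")
reader = csv.reader(src)
writer = csv.writer(out)  # excel dialect: rows end with "\r\n"
for row in reader:
    writer.writerow(clean_row(row))

print(repr(out.getvalue()))
```

With real files, the same loop would open the input with `open("data.csv", newline="")` and the output with `open("clean_data.csv", "w", newline="")`.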
I also tried (delimiter = '\n'), without any luck. Is there a way to get Python to ignore CRLFs altogether?
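`delimiter` sets the field separator, not the record separator. The csv reader also ignores `lineterminator`: the Python documentation notes that it is hard-coded to recognise `\r` or `\n` as end-of-line. A dialect therefore cannot change where the reader ends a row. One way around this is to split the records yourself, strip the embedded CRLFs, and only then hand each record to `csv.reader` for field splitting. The minimal Python 3 sketch below does this, assuming real records end in a bare LF and embedded breaks are always CRLF pairs (a bare LF inside a quoted field would still split the record). The name `read_records` and the sample data are hypothetical.

```python
import csv
import re

# A record ends at a "\n" that is not part of a "\r\n" pair (assumed layout).
RECORD_END = re.compile(r"(?<!\r)\n")


def read_records(text):
    """Yield the fields of each record, with embedded CRLFs removed first."""
    for record in RECORD_END.split(text):
        if not record:
            continue  # skip empty pieces, e.g. the one after the final LF
        # Remove embedded CRLFs so the csv reader sees a single physical line.
        flat = record.replace("\r\n", "")
        # csv.reader accepts any iterable of strings; parse this one line.
        yield next(csv.reader([flat]))


sample = 'id,note\n1,first\r\nsecond\n2,"quoted, text"\n'
for fields in read_records(sample):
    print(fields)
```

Because quoting is still handled by `csv.reader`, a quoted field containing a comma, such as `"quoted, text"`, comes back as one field.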