Hi guys,

I have a program where I need to load csv files. The problem is that some of the csv files use a comma as the delimiter and others use a semicolon. For the functionality of the bigger program I want to use the load function, and when a csv file has a comma as delimiter I want the function to replace the commas with semicolons.

Basically I want to load a file, check whether it contains commas, replace them with semicolons, and then let the load function continue, so that no matter which delimiter was used when the csv file was created, my program always 'sees' the csv file with semicolon delimiters.

Any ideas on how I can do that?

Here's my code:

import csv
import string

class Grid(object):
	def __init__(self, width, height):
		self.grid = []
		self.array = []
		self.width = width
		self.height = height
		self.length = width * height
		for x in range(self.width):
			col = []
			for y in range(self.height):
				cell = Cell(x, y, self)
				col.append(cell)
				self.array.append(cell)
			self.grid.append(col)
	
	def __getitem__(self, key):
		if hasattr(key, '__len__'):
			return self.grid[key[0]][key[1]]
		else:
			return self.array[key]
	
	def __len__(self):
		return len(self.array)
	
	def load(cls, filename):
		loadGrid = []
		file = open(filename)
		for line in file:
			countChar = string.count(line, ',')
			print countChar
			if countChar > 0:
				newline = line.replace(',', ';')
				print newline
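		# note: newline above is only printed, never written back to the file,
		# so this reader still sees the original commas
		reader = csv.reader(open(filename), delimiter=';')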
		for line in reader:
			loadGrid.append(line)
		width = len(loadGrid[0])
		height = len(loadGrid)
		grid = Grid(width, height)
		for x in range(width):
			for y in range(height):
				grid[x, y].value = loadGrid[y][x]
		return grid
	load = classmethod(load)
	
	def printGrid(self):
		for y in range(len(self.grid[0])):
			for x in range(len(self.grid)):
				print self.grid[x][y].value,
			print
	
class Cell(object):
	def __init__(self, x, y, grid):
		self.x = x
		self.y = y
		self.grid = grid
		self.value = 0
		self.clusterId = -1
	
	def setClusterId(self, clusterId):
		self.clusterId = clusterId
	
	def getClusterId(self):
		return self.clusterId

target = Grid.load('csv_test_comma.csv')
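The approach described in the question (rewrite commas to semicolons, then parse with delimiter=';') can be done in memory, without touching the file on disk. Below is a minimal sketch for Python 3, assuming no field value contains a literal comma; read_as_semicolon_csv is a hypothetical helper name, not part of the csv module.

```python
import csv
import io


def read_as_semicolon_csv(file_obj):
    """Return the rows of file_obj, parsed as if ';' were the delimiter.

    Every ',' is rewritten to ';' before parsing, so this is only safe
    when no field value contains a literal comma.
    """
    text = file_obj.read().replace(",", ";")
    # csv.reader needs an iterable of lines, so wrap the rewritten text
    # in an in-memory file object instead of writing it back to disk
    return list(csv.reader(io.StringIO(text), delimiter=";"))


# Self-contained demo using in-memory data instead of a real file
sample = io.StringIO("1,2,3\n4,5,6\n")
print(read_as_semicolon_csv(sample))  # [['1', '2', '3'], ['4', '5', '6']]
```

With a real file you would pass an open handle instead, e.g. with open('csv_test_comma.csv', newline='') as f: rows = read_as_semicolon_csv(f), and then build the Grid from rows the same way load builds it from loadGrid.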

You can specify a different delimiter with the delimiter argument of csv.reader; the example below is from the official csv module docs. For an either/or situation, though, if you don't want to write your own routine, the easiest approach is probably to read each file, replace any semicolons with commas, and write the result to a new file. Then use csv to process the new file.

>>> import csv
>>> spamReader = csv.reader(open('eggs.csv'), delimiter=' ', quotechar='|')
>>> for row in spamReader:
...     print ', '.join(row)
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
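The rewrite-to-a-new-file suggestion could look roughly like this in Python 3. It is a sketch under the same assumption that semicolons only ever appear as delimiters; normalize_to_commas is a hypothetical name, and io.StringIO stands in for the source and destination files so the example runs on its own.

```python
import csv
import io


def normalize_to_commas(src, dst):
    """Copy text from src to dst, turning every ';' into ','.

    Assumes semicolons only appear as delimiters, never inside values.
    """
    for line in src:
        dst.write(line.replace(";", ","))


# In-memory stand-ins for the original file and the new file
src = io.StringIO("1;2;3\n4;5;6\n")
dst = io.StringIO()
normalize_to_commas(src, dst)
dst.seek(0)  # rewind so csv.reader starts at the top of the new "file"
rows = list(csv.reader(dst))  # csv.reader's default delimiter is ','
print(rows)  # [['1', '2', '3'], ['4', '5', '6']]
```

With real files, open the original for reading and the new file for writing (both with newline='', as the Python 3 csv docs recommend), pass the two handles to normalize_to_commas, then open the new file for csv.reader.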

This code assumes that:
a) your data doesn't contain many commas and semicolons. In other words, when you look at these delimited files, commas or semicolons generally appear only because they are the delimiters, not inside text strings.
b) 256 characters (the argument to .read()) is a large enough sample to capture a couple of rows' worth of data. If your records/rows are longer than that, I would increase that value.

import csv
def returnDelimiter(fileObj):
    text = fileObj.read(256) # you can change this to whatever is appropriate for your data
    fileObj.seek(0) # get back to beginning of file for csv reader
    if text.count(',') > text.count(';'):
        return ','
    else:
        return ';'

fileObj = open('text.csv')
rows = csv.reader(fileObj, delimiter=returnDelimiter(fileObj))
for columns in rows:
    print columns
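The csv module also ships csv.Sniffer, which does a more careful version of the counting in returnDelimiter (roughly, it looks for a character that occurs the same number of times on most lines). Here is a minimal Python 3 sketch that restricts it to ',' and ';'. The name detect_delimiter and the 1024-character sample size are my own choices, and the semicolon fallback mirrors what returnDelimiter returns on a tie.

```python
import csv
import io


def detect_delimiter(file_obj, sample_size=1024):
    """Guess whether file_obj uses ',' or ';' with csv.Sniffer.

    sample_size plays the same role as the 256 in returnDelimiter: it
    should be large enough to cover at least a couple of rows.
    """
    sample = file_obj.read(sample_size)
    file_obj.seek(0)  # rewind so csv.reader sees the whole file
    try:
        # Limiting the candidates to ',;' stops the sniffer from picking
        # some other character that happens to occur regularly
        return csv.Sniffer().sniff(sample, delimiters=",;").delimiter
    except csv.Error:
        # Sniffer raises csv.Error when it cannot decide (e.g. one column)
        return ";"


sample_file = io.StringIO("1;2;3\n4;5;6\n")
delimiter = detect_delimiter(sample_file)
rows = list(csv.reader(sample_file, delimiter=delimiter))
print(delimiter)  # prints ;
```

The file object has to be seekable for this to work, which is true for regular files opened with open() and for io.StringIO.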

BTW, avoid using tabs for your indentation. Using 4 spaces makes your code more readable and avoids problems with other folks' editors.

Thanks for the help all...

I added the returnDelimiter functionality to my load method, and now it works fine. For some reason the csv files with commas weren't working with another function in the bigger program; I could only use that function when the csv files were separated by semicolons.

With the new load function I can use either a comma- or semicolon-separated file, and both work now. Thanks again :D
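The updated load method wasn't posted, so here is a hypothetical Python 3 sketch of how returnDelimiter might be folded into the loading step. It returns the values column-first, matching how Grid.load copies loadGrid[y][x] into grid[x, y]; return_delimiter and load_columns are illustrative names, not code from the thread.

```python
import csv
import io


def return_delimiter(file_obj, sample_size=256):
    """Python 3 port of returnDelimiter: ',' only if it outnumbers ';'."""
    text = file_obj.read(sample_size)
    file_obj.seek(0)  # back to the start for csv.reader
    return "," if text.count(",") > text.count(";") else ";"


def load_columns(file_obj):
    """Read rows with the detected delimiter and return them as [x][y].

    The CSV is row-major (rows[y][x]); the grid is indexed column-first.
    """
    rows = list(csv.reader(file_obj, delimiter=return_delimiter(file_obj)))
    width, height = len(rows[0]), len(rows)
    return [[rows[y][x] for y in range(height)] for x in range(width)]


columns = load_columns(io.StringIO("1,2,3\n4,5,6\n"))
print(columns)  # [['1', '4'], ['2', '5'], ['3', '6']]
```

Inside Grid.load you could then open the file with newline='', call load_columns on it, create Grid(len(columns), len(columns[0])) and set grid[x, y].value = columns[x][y].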
