954,525 Members — Technology Publication meets Social Media
Username:
Password:
Lost login information?
Have something to say? Contribute New Article Reply to this Article

Editing CSV files

I am new to programming and have been reading a few books on Pyhton but can't seem to grasp what I believe is probably a simple process. I need someone to show me briefly a sample script that would allow me to open a CSV file and edit it line by line. This is an example of part of the CSV file I have.

Links:

0. blog ==> http://blog.example.com/
1. site map ==> http://www.example.com/site-map
2. customer log in ==> http://www.example.com/customer-login
3. ==> http://www.example.com/
4. ==> javascript:openChat('http://server.iad.liveperson.net/hc/53450382/?cmd=file&file=visitorWantsToChat&site=53450382&byhref=1')
5. Products ==> http://www.example.com/products
6. Services ==> http://www.example.com/services
7. Become a Customer ==> http://www.example.com/become-a-customer
8. Contact Us ==> http://www.example.com/contact
9. About Us ==> http://www.example.com/about
10. Learning Center ==> http://www.example.com/LearningCenter


I want to remove certain things from each line, for example the numbers at the beginning of each line, entire lines that have certain words, etc. This is a very short part of the entire file but from this all I would want left is a new file containing this:

http://www.example.com/products

I can't figure out how to create a script that would weed through a CSV file and remove portions of lines or entire lines depending on the content.

Any help in the right direction would be much appreciated. Thanks.

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

Your datas aren't in a csv format...
For csv, here is a simple example to read and write datas

import csv

# This class is to define another csv format if you need to
class excel_french(csv.Dialect):
    delimiter=';'
    quotechar='"'
    doublequote=True
    skipinitialspace=False
    lineterminator='\n'
    quoting=csv.QUOTE_MINIMAL

fic='yourfile.csv'
outcsvfic='out.csv'

csv.register_dialect('excel_french', excel_french)

cr=csv.reader(open(fic,"r"), 'excel_french') # 'excel_french' is optionnal.
            # Only if you want to use another csv format than the default one
cw=csv.writer(open(outcsvfic,'w'), 'excel_french')

for row in cr:
    print row
    cw.writerow(row[:2])


More informations in the doc http://docs.python.org/library/csv.html

jice
Posting Whiz in Training
225 posts since Oct 2007
Reputation Points: 64
Solved Threads: 57
 

To take only links which contain products from your (not CSV) data:

for x in open('linklist.txt').readlines():
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link
pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
 

Your datas aren't in a csv format... For csv, here is a simple example to read and write datas

import csv

# This class is to define another csv format if you need to
class excel_french(csv.Dialect):
    delimiter=';'
    quotechar='"'
    doublequote=True
    skipinitialspace=False
    lineterminator='\n'
    quoting=csv.QUOTE_MINIMAL

fic='yourfile.csv'
outcsvfic='out.csv'

csv.register_dialect('excel_french', excel_french)

cr=csv.reader(open(fic,"r"), 'excel_french') # 'excel_french' is optionnal.
            # Only if you want to use another csv format than the default one
cw=csv.writer(open(outcsvfic,'w'), 'excel_french')

for row in cr:
    print row
    cw.writerow(row[:2])

More informations in the doc http://docs.python.org/library/csv.html

Thanks for your reply. I know this is basic stuff but seeing an example is the best way for me to learn how this all ties together. I will continue reading up on Python but this information has been a great help. Thanks.

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

To take only links which contain products from your (not CSV) data:

for x in open('linklist.txt').readlines():
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link


Thanks for your reply. This is another piece of very helpful information. I appreciate you taking your time to help me with this.

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

To take only links which contain products from your (not CSV) data:

for x in open('linklist.txt').readlines():
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link


Why do you "readlines()" ?
As far as i know, this loads the whole file in memory which is of no use...

for x in open('linklist.txt'):
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link

would do exactly the same

jice
Posting Whiz in Training
225 posts since Oct 2007
Reputation Points: 64
Solved Threads: 57
 

Just catched it from some newbie posters I answered recently, I do normally use in open(..)

pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
 

I have found that it may be more helpful for my situation to be able to delete several items from my links. From a sample link this:

http://www.testsite.com/site-map
http://www.testsite.com/customer-login
http://www.testsite.com/
http://www.testsite.com/products
http://www.testsite.com/services
http://www.testsite.com/become-a-customer
http://www.testsite.com/contact
http://www.testsite.com/about
http://www.testsite.com/LearningCenter
http://www.testsite.com/knivesandtools
http://www.testsite.com/knivesandtools
http://www.testsite.com/knives
http://www.testsite.com/knivesaccessories

I would like to be able to delete all links that have: site-map, customer-login, services, become-a-customer, contact, about, and LearningCenter. Any ideas?

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

Like this:

from __future__ import print_function

not_these=['site-map',
           'customer-login',
           'services',
           'become-a-customer',
           'contact',
           'about',
           'LearningCenter']

for i in open('links.txt'):
    for nono in not_these:
        if nono in i:
##            print (i, 'has',nono)
            break
    else: print(i, end='')
pyTony
pyMod
Moderator
5,359 posts since Apr 2010
Reputation Points: 782
Solved Threads: 852
 

Like this:

from __future__ import print_function

not_these=['site-map',
           'customer-login',
           'services',
           'become-a-customer',
           'contact',
           'about',
           'LearningCenter']

for i in open('links.txt'):
    for nono in not_these:
        if nono in i:
##            print (i, 'has',nono)
            break
    else: print(i, end='')

Yes that's it. I just need this to be written to a csv file instead of printed but I will try to do that myself. Like I said, I am quite new. I have been through a few python books and am finding it difficult to grasp. I understand what I read and can create the examples in the book, but when it comes to creating a script on my own I have a hard time getting my ideas to actually work.

Do you have any suggestions you could offer as far as books or tutorials that may better explain Python or give more complete examples that I can learn from?

Thanks.

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

Perhaps you could read the sticky thread Starting Python http://www.daniweb.com/forums/thread20774.html !

Gribouillis
Posting Maven
Moderator
2,786 posts since Jul 2008
Reputation Points: 1,044
Solved Threads: 691
 

2 things...First, and this will make life easier for what you want to do, you are NOT dealing with CSV files. These are plain text files which are easier to work with.
Second, I have been learning Python as well. I have spent (and in some cases wasted) money on several books. The 2 books that I have used most have been "Beginning Python, Using Python 2.6 and Python 3.1" (by James Payne, Wrox) and "Python: Essential Reference" (by David M Beazley, Addison Wesley). I start with the beginning book, read the first 3-4 chapters, then started writing and used the indexes of both to find things I need...I then ask here or other places if Google does not help.
I do know BASH scripting and have been trying to convert my knowledge to Python...

ennoil
Newbie Poster
12 posts since May 2009
Reputation Points: 10
Solved Threads: 0
 

The files I am using are CSV files, I open them with Microsoft Excel. When I post information on them I just copy the text from Excel and paste it here to show what information I am dealing with.

Thanks for the advice on the books, I will look into them right away.

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

When I think of a CSV file, I look for something like:

"Column 1","Column 2","Column 3"

You gave us:

0. blog ==> http://blog.example.com/
1. site map ==> http://www.example.com/site-map

Unless this is one column from your CSV file then it can be treated like a normal text file. It would be easier to work with then. I am also not sure if the csv module would even recognize this as a CSV file (if it does, can you use "==>" as the delimiter???). Because of the text between the number and the ==>, you would have a really hard time using spaces as delimiters...

ennoil
Newbie Poster
12 posts since May 2009
Reputation Points: 10
Solved Threads: 0
 

When I think of a CSV file, I look for something like:

"Column 1","Column 2","Column 3"

You gave us:

0. blog ==> http://blog.example.com/ 1. site map ==> http://www.example.com/site-map

Unless this is one column from your CSV file then it can be treated like a normal text file. It would be easier to work with then. I am also not sure if the csv module would even recognize this as a CSV file (if it does, can you use "==>" as the delimiter???). Because of the text between the number and the ==>, you would have a really hard time using spaces as delimiters...


I think you might be right. I have a script that pulls links and I have told it to save as "example.csv". I get a list from top to bottom of links. These are not separated by commas but the file is ".CSV" and is opened with Excel.

You are right about using "==>" as the delimiter. Someone replied earlier with a script that uses this and the script works. The reason I am saving as .csv is because the final output will be a CSV file like the example I gave earlier and will be a database that I will upload to a website. I am going to have certain info in certain cells in an Excel sheet that will need to be modified before the upload. I think it will be more organized and easier to use if I can say for example, "Append column C to column E" or "Multiply column d by 1.25".

While it is not actually separated by commas, if I were to manually fill in an Excel sheet with the information and select "Save as CSV", the output will be the same.

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

I think you might be right. I have a script that pulls links and I have told it to save as "example.csv". I get a list from top to bottom of links. These are not separated by commas but the file is ".CSV" and is opened with Excel.

You are right about using "==>" as the delimiter. Someone replied earlier with a script that uses this and the script works. The reason I am saving as .csv is because the final output will be a CSV file like the example I gave earlier and will be a database that I will upload to a website. I am going to have certain info in certain cells in an Excel sheet that will need to be modified before the upload. I think it will be more organized and easier to use if I can say for example, "Append column C to column E" or "Multiply column d by 1.25".

While it is not actually separated by commas, if I were to manually fill in an Excel sheet with the information and select "Save as CSV", the output will be the same.

lstensland
Newbie Poster
9 posts since May 2010
Reputation Points: 10
Solved Threads: 0
 

This article has been dead for over three months

Post: Markdown Syntax: Formatting Help
You
View similar articles that have also been tagged: