I am new to programming and have been reading a few books on Pyhton but can't seem to grasp what I believe is probably a simple process. I need someone to show me briefly a sample script that would allow me to open a CSV file and edit it line by line. This is an example of part of the CSV file I have.

Links:

0. blog ==> http://blog.example.com/
1. site map ==> http://www.example.com/site-map
2. customer log in ==> http://www.example.com/customer-login
3. ==> http://www.example.com/
4. ==> javascript:openChat('http://server.iad.liveperson.net/hc/53450382/?cmd=file&file=visitorWantsToChat&site=53450382&byhref=1')
5. Products ==> http://www.example.com/products
6. Services ==> http://www.example.com/services
7. Become a Customer ==> http://www.example.com/become-a-customer
8. Contact Us ==> http://www.example.com/contact
9. About Us ==> http://www.example.com/about
10. Learning Center ==> http://www.example.com/LearningCenter


I want to remove certain things from each line, for example the numbers at the beginning of each line, entire lines that have certain words, etc. This is a very short part of the entire file but from this all I would want left is a new file containing this:

http://www.example.com/products

I can't figure out how to create a script that would weed through a CSV file and remove portions of lines or entire lines depending on the content.

Any help in the right direction would be much appreciated. Thanks.

Recommended Answers

All 15 Replies

Your datas aren't in a csv format...
For csv, here is a simple example to read and write datas

import csv

# This class is to define another csv format if you need to
class excel_french(csv.Dialect):
    delimiter=';'
    quotechar='"'
    doublequote=True
    skipinitialspace=False
    lineterminator='\n'
    quoting=csv.QUOTE_MINIMAL

fic='yourfile.csv'
outcsvfic='out.csv'

csv.register_dialect('excel_french', excel_french)

cr=csv.reader(open(fic,"r"), 'excel_french') # 'excel_french' is optionnal.
            # Only if you want to use another csv format than the default one
cw=csv.writer(open(outcsvfic,'w'), 'excel_french')

for row in cr:
    print row
    cw.writerow(row[:2])

More informations in the doc
http://docs.python.org/library/csv.html

To take only links which contain products from your (not CSV) data:

for x in open('linklist.txt').readlines():
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link

Your datas aren't in a csv format...
For csv, here is a simple example to read and write datas

import csv

# This class is to define another csv format if you need to
class excel_french(csv.Dialect):
    delimiter=';'
    quotechar='"'
    doublequote=True
    skipinitialspace=False
    lineterminator='\n'
    quoting=csv.QUOTE_MINIMAL

fic='yourfile.csv'
outcsvfic='out.csv'

csv.register_dialect('excel_french', excel_french)

cr=csv.reader(open(fic,"r"), 'excel_french') # 'excel_french' is optionnal.
            # Only if you want to use another csv format than the default one
cw=csv.writer(open(outcsvfic,'w'), 'excel_french')

for row in cr:
    print row
    cw.writerow(row[:2])

More informations in the doc
http://docs.python.org/library/csv.html

Thanks for your reply. I know this is basic stuff but seeing an example is the best way for me to learn how this all ties together. I will continue reading up on Python but this information has been a great help. Thanks.

To take only links which contain products from your (not CSV) data:

for x in open('linklist.txt').readlines():
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link

Thanks for your reply. This is another piece of very helpful information. I appreciate you taking your time to help me with this.

To take only links which contain products from your (not CSV) data:

for x in open('linklist.txt').readlines():
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link

Why do you "readlines()" ?
As far as i know, this loads the whole file in memory which is of no use...

for x in open('linklist.txt'):
    _,_,link=x.partition(' ==> ')
    if 'products' in link: print link

would do exactly the same

Just catched it from some newbie posters I answered recently, I do normally use in open(..)

Like this:

from __future__ import print_function

not_these=['site-map',
           'customer-login',
           'services',
           'become-a-customer',
           'contact',
           'about',
           'LearningCenter']

for i in open('links.txt'):
    for nono in not_these:
        if nono in i:
##            print (i, 'has',nono)
            break
    else: print(i, end='')

Like this:

from __future__ import print_function

not_these=['site-map',
           'customer-login',
           'services',
           'become-a-customer',
           'contact',
           'about',
           'LearningCenter']

for i in open('links.txt'):
    for nono in not_these:
        if nono in i:
##            print (i, 'has',nono)
            break
    else: print(i, end='')

Yes that's it. I just need this to be written to a csv file instead of printed but I will try to do that myself. Like I said, I am quite new. I have been through a few python books and am finding it difficult to grasp. I understand what I read and can create the examples in the book, but when it comes to creating a script on my own I have a hard time getting my ideas to actually work.

Do you have any suggestions you could offer as far as books or tutorials that may better explain Python or give more complete examples that I can learn from?

Thanks.

2 things...First, and this will make life easier for what you want to do, you are NOT dealing with CSV files. These are plain text files which are easier to work with.
Second, I have been learning Python as well. I have spent (and in some cases wasted) money on several books. The 2 books that I have used most have been "Beginning Python, Using Python 2.6 and Python 3.1" (by James Payne, Wrox) and "Python: Essential Reference" (by David M Beazley, Addison Wesley). I start with the beginning book, read the first 3-4 chapters, then started writing and used the indexes of both to find things I need...I then ask here or other places if Google does not help.
I do know BASH scripting and have been trying to convert my knowledge to Python...

The files I am using are CSV files, I open them with Microsoft Excel. When I post information on them I just copy the text from Excel and paste it here to show what information I am dealing with.

Thanks for the advice on the books, I will look into them right away.

When I think of a CSV file, I look for something like:

"Column 1","Column 2","Column 3"

You gave us:

0. blog ==> http://blog.example.com/
1. site map ==> http://www.example.com/site-map

Unless this is one column from your CSV file then it can be treated like a normal text file. It would be easier to work with then. I am also not sure if the csv module would even recognize this as a CSV file (if it does, can you use "==>" as the delimiter???). Because of the text between the number and the ==>, you would have a really hard time using spaces as delimiters...

When I think of a CSV file, I look for something like:

"Column 1","Column 2","Column 3"

You gave us:

0. blog ==> http://blog.example.com/
1. site map ==> http://www.example.com/site-map

Unless this is one column from your CSV file then it can be treated like a normal text file. It would be easier to work with then. I am also not sure if the csv module would even recognize this as a CSV file (if it does, can you use "==>" as the delimiter???). Because of the text between the number and the ==>, you would have a really hard time using spaces as delimiters...

I think you might be right. I have a script that pulls links and I have told it to save as "example.csv". I get a list from top to bottom of links. These are not separated by commas but the file is ".CSV" and is opened with Excel.

You are right about using "==>" as the delimiter. Someone replied earlier with a script that uses this and the script works. The reason I am saving as .csv is because the final output will be a CSV file like the example I gave earlier and will be a database that I will upload to a website. I am going to have certain info in certain cells in an Excel sheet that will need to be modified before the upload. I think it will be more organized and easier to use if I can say for example, "Append column C to column E" or "Multiply column d by 1.25".

While it is not actually separated by commas, if I were to manually fill in an Excel sheet with the information and select "Save as CSV", the output will be the same.

I think you might be right. I have a script that pulls links and I have told it to save as "example.csv". I get a list from top to bottom of links. These are not separated by commas but the file is ".CSV" and is opened with Excel.

You are right about using "==>" as the delimiter. Someone replied earlier with a script that uses this and the script works. The reason I am saving as .csv is because the final output will be a CSV file like the example I gave earlier and will be a database that I will upload to a website. I am going to have certain info in certain cells in an Excel sheet that will need to be modified before the upload. I think it will be more organized and easier to use if I can say for example, "Append column C to column E" or "Multiply column d by 1.25".

While it is not actually separated by commas, if I were to manually fill in an Excel sheet with the information and select "Save as CSV", the output will be the same.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.