Hi, People...

I have a question about reading xls files in python then parsing it to sqlDatabase.

The thing is, that the xls file(pricelist) we as a company use are ordered and it's pretty easy to sort that out. The problem comes with xls files from other companies.
(We are a laptop repair/sales company.)

Example:

Our Pricelist have names for each col_index and each row_index contain the specs(info) for the laptop

-Make--------Model------Price------Bla
-HP ---------DV6000-----R10 -------bla
-Acer--------TM8100-----R20 -------bla

The other companies... don't

Theirs would have 3 cols, or more and rows would differ aswell asin
it looks like a freaking news article.... So it comes down to this:

Taking all the pricelists and adding them to our main pricelist, but that would require someone todo that every so day or often... and I need to cancel that from happening.

So the question is, how would I start to parse the pricelists... retrieving all the correct information regarding each laptop, and sorting them into the main database(this part is simple).

Speed is not that important for us, regarding the parsing.
It's just getting the information from the unordered xls files and parsing them to be sorted into one db.

Anyone have any ideas?

Recommended Answers

All 9 Replies

Looks like csv maybe an good option, fine tuning the .split() may work just fine with some conditions...

How would should i store multi-lines from csv so that the "end" product still remain the same:

Like.

This is the currect print-out:

['', '']
['@R850-14K@$@Tecra@$$R 10']
['$@15.6\xe2\x80\x9d Toshiba HD TruBrite LED backlight Display@$$@3YR ON-SITE@']
['$@Intel Core i7 2620M 2.70 / 3.40Turbo 4MB 1333Mhz@$$']
['$@4GB DDR3']
['$@Multi Card Reader']
['$@AMD Radeon 1GB / 3G@$$']
['$@Windows 7Pro 64 bit@$$']
['$@Battery life: up to 6h00min@$$']
['$@Weight: 2.4kg@$$']
['', '']

this is the code to get to it, now I know it's messy and some stuff maybe wrong and so but it's changing almost every run-time.. so meh..

import csv

#book = raw_input("Enter file name (*add .csv or .xls): ")

reader= csv.reader(open('Toshiba.csv', 'rb'), delimiter=',', quotechar='"')#, quoting=csv.QUOTE_MINIMAL)
list = []

for row in reader:
    list.append(', '.join(row).split('"'))
    print row[0].split('$$$')


for i in list:
    #first = i[0].split("$$")
    #second = first[0].split(',')
    #print second

    if 'GB' in i[0]:
        x= i[0].split('$')
        #x.sort()
        #print x[0].split("'")
        #z = tuple(x)
        c = x[0].split('@')
        c1 = x[1].split('@')
        #c.sort()
        #print c[2].split(',')
        #print c[2]
        #if 'i7' in c[2]:
        #print x
        v = c
        if 'i7' in v:
            pass 
        elif 'i5' in v:
            pass 
        elif 'i3' in v:
            pass
        else:
            #print c
            cc = c1[1].split("'")
            cc1 = cc[0].split(', ')
            #print cc1[0]



        #for o in c:

            #print c[1].split(',')

anyway I want that bit to be able to be stored as one "thing" asin the laptop with its specs and not specs from another..

if i change the last print in the else: print cc1[0] (un # it):
i can have seperate outputs like:

4GB DDR3
2GB DDR3
2GB DDR3
4GB DDR3
4GB DDR3
AMD dedicated 1GB
4GB DDR3
AMD dedicated 1GB
4GB DDR3
AMD dedicated 1GB
6GB (4+2) DDR3
AMD 2GB
4GB DDR3

But thats unsorted... this ^ way works fine for the bits from the xls->csv file with all the specs in one line... its just the multiline specs thats shitty

Got it figured out and working perfectly ... So i guess this thread is "Solved" although it was a major waste of time to even post here anyways....

Nice to hear about your triumpf!
Maybe it was good lesson for you to formulate clearer question, and to make it possible for others to run your code, next time. If you want to be nice to others proper Daniquette is to post your solution, if it is free to share. You might get some improvement ideas. Without suitable data file it is, however, difficult to be helpful.

It is good to hear that you are planning to mark the thread solved. Much appreciated by the moderation team!

Thanks... Here is the Working script.

BTW pyTony: My question was pretty obvious... anyway

The script:

#!/usr/bin/python
# Coded by MJ-meo-dmt(Acidz)

import sys, os


from xlrd import open_workbook,cellname
book = open_workbook('Toshiba.xls')
sheet = book.sheet_by_index(0)
print sheet.name, "name"
print sheet.nrows, "rows"
print sheet.ncols, "cols"



products = []
# Get the product keys and save them
for row_index in range(sheet.nrows):

    firstcol = sheet.cell(row_index, 0).value

    if firstcol:
        if firstcol != ' ':

            if firstcol != '':

                if '-' in firstcol:
                    if 'Terms and Conditions Apply' in firstcol:
                        pass

                    else:
                        products.append(firstcol)



# Construct the products
Mainlist = []
sublist = []
var = ''
price = str(0.0)
for row in range(sheet.nrows):

    if sheet.cell(row, 0).value in products:
        #print var
        if var == '':
            pass
        else:
            var += ' ' + price
            sublist.append(var)
            Mainlist.append(sublist)
            var = ''

        for p in range(1, 3):

            price = str(sheet.cell(row, 3).value)
            var += sheet.cell(row, 0).value

            if sheet.cell(row, p).value == '':
                pass 

            else:

                var += ', ' + sheet.cell(row, p).value

    else:
        for r in range(1, 3):

            if sheet.cell(row, r).value == '':
                pass 
            else:

                var += ', ' + sheet.cell(row, r).value



for i in range(len(Mainlist[0])):
    print Mainlist[0][i]

Of course without the data file we are completely in dark, but here is how I would clean up your code, hope my trasformations are correct.

#!/usr/bin/python
# Coded by MJ-meo-dmt(Acidz)
from xlrd import open_workbook,cellname

book = open_workbook('Toshiba.xls')
sheet = book.sheet_by_index(0)
print sheet.name, "name"
print sheet.nrows, "rows"
print sheet.ncols, "cols"
products = []
# Get the product keys and save them
for row_index in range(sheet.nrows):
    firstcol = sheet.cell(row_index, 0).value
    if (firstcol and  '-' in firstcol and
        'Terms and Conditions Apply' not in firstcol):
                products.append(firstcol)

# Construct the products
main_list = []
sublist = []
var = ''
price = str(0.0)
for row in range(sheet.nrows):
    if sheet.cell(row, 0).value in products:
        #print var
        if var:
            var += ' ' + price
            sublist.append(var)
            main_list.append(sublist)
            var = '' 
        for p in range(1, 3):
            price = str(sheet.cell(row, 3).value)
            var += sheet.cell(row, 0).value
            if sheet.cell(row, p).value:
                var += ', ' + sheet.cell(row, p).value
    else:
        for r in range(1, 3):
            if sheet.cell(row, r).value:
                var += ', ' + sheet.cell(row, r).value


print '\n'.join(main_list[0])

Oh I see nice... yea I never really got into the or/and... I always made some little screw-up and it seemed not to work (for me :P) anyway yea here is the data file..

:http://dl.dropbox.com/u/45416451/Ecliptic/toshiba.tar.gz - Sorry there was a limit on DaniWeb so i used dropbox, and its a xls file in a tar

I changed last print to do

print >>open('out1.csv', 'w'),'\n'.join(main_list[0]).encode('utf8')

printing to file (different filenames, of course) and windiff said the files to be identical. So for me I did OK changing your code.

I see, it's Fine thanks for the ideas... Yea I'll have to make the script add the data to an Sql_database later on. But all is good now. Thanks for the help/ideas

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.