Hello, I could use a spot of help today.

I am very new to Python (n00b) and I have a rather specific issue I am trying to resolve.

This problem can be solved using ArcGIS (for those who know it), but I want to learn more about Python, so I want to solve it using Python.

Problem: I have a data set that has some QA/QC issues that I am trying to locate. The data has an X and Y location value, an ID value, and an elevation value. I have organized the data into a .csv file, and I have successfully got the file to read in using the csv module. I essentially need to compare each item in the list to every other item in the list. If there is a discrepancy, I want that item written to a new file. Ideally, I want this program to run on a data set of approx. 60,000 entries.

I cannot get the program to iterate through the list in the manner I desire, so I could use some pointers on how that can be done logically. I also cannot get the output file to have more than 1 line of output.

Here is my code so far.

#This program will assess elevation in the following way.  The user will determine the
#file path, then the user will also determine the buffer area and a limit to the
#difference in elevation.   Then the program will read, and assess
#the elevations for that buffer distance around each point.  The end result is
#a file that contains the suspect entries.

#Import Libraries

import string

import math

import fileinput

import os

import anydbm

import sys

import csv


print "Hello USER!  Please enter the filepath of the database you wish to check!"
filename = raw_input(">>>")
print "enter the size of the buffer you want (We are Using the Bottom of the well as the coordinate)"
pointbuff = raw_input(">>>")    

pointbuff= float(pointbuff)

print "enter the ammount of elevation tolerance (in feet)"

elevation = raw_input(">>>")

print '......Working'

elevation = float(elevation)
temp = list()
test = list()

x1 = float()
x2 = float()
y1 = float()
y2 = float()
elev1 = float()
elev2= float()    
count = int()
count = 0

#Test and export data function

def tester(x=float(),y = float(),elev = int(), g=list()):
    read1 = csv.reader(open(filename, 'rb'))
    for row in read1:
        test = row
        x2 = test[1]
        if x2 == "":
            x2 = 0
            x2 = float(x2)
        y2 = test[2]
        if y2 == "":
            y2 = 0
            y2= float(y2)

        elev2 = test[3]
        if elev2 == "":
            elev2 = 0
            elev2 = float(elev2)

        #test if other point is in buffer
        if x1 <= (x2 + pointbuff) or x1 >= (x2 - pointbuff) and y1 <= (y2 + pointbuff) or y1 >= (y2 - pointbuff):


        if ((abs(elev1)) - (abs(elev2))) >= elevation:
            f = open("G:\TESTOUTPUT.txt", 'w')

            A = g
            A = str(A)
            test = str(test)
            print A

def readit():
    read = csv.reader(open(filename, 'rb'))
    for row in read:
        temp = row
        x1 = temp[1]

        if x1 == "":
            x1 = 0
            x1 = float(x1)
        y1 = temp[2]
        if y1 == "":
            y1 = 0
            y1 = float(y1)

        elev1 = temp[3]
        if elev1 == "":
            elev1 = 0
            elev1 = float(elev1)



print 'DONE!'

Here is some sample data I am using:
30952 594635.24 200621.13 0
22500 598647.51 170513.26 -2191
31634 603679.17 171486.24 -2255
22734 606320.46 177526.04 -2459
22842 605912.11 177952.72 -2481
44528 688937.27 202291.1 -2780
44528 688937.27 202291.1 -2943
28509 613899.93 162957.46 -2041
44528 688937.27 202291.1 -2921
59268 594188.52 200371.53 -2971
59268 594188.52 200371.53 -2969
58687 593467.84 199357.6 -2937
58687 593467.84 199357.6 -2927
40672 592204.28 200947.08 -2961
40672 592204.28 200947.08 -2960
1111111 592204.28 202291.1 0

I apologize for the quality of the code, but as I said before, I'm very new to this.

Many thanks,


I essentially need to compare each Item in a list to every other item in the list.

Generally speaking, to compare you start with the first item in the list and compare to item #2, #3, etc. Then the same for the second item through n-1, as there is nothing to compare the last item to. A simple example to illustrate:

test_list = ["abc", "def", "abf", "abc", "abf", "xyz" ]

stop_y = len(test_list)
stop_x = stop_y -1
for x in range(0, stop_x):   ## stop at end-1
    ##  start at the next element and go through the end
    for y in range(x+1, stop_y):
        if test_list[x] == test_list[y]:
            print "2 are equal", test_list[x], x

Note that this brute force method can take some time with a large file (60,000 recs = 59,999 passes through the [smaller each time] list, which is about 1.8 billion comparisons in total). A faster way is to use a set or dictionary indexed on the key element(s).
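As a rough sketch of the dictionary idea for the simple equality test above: each value is used as a key that collects the positions where it occurs, so the list is walked only once. The helper name `find_duplicates` is made up for illustration, and this only covers exact matches; it does not handle the buffer-distance comparison from the original question. The code is written so that it should run on both Python 2 and Python 3.

```python
def find_duplicates(items):
    """Map each value that occurs more than once to the list of its positions."""
    positions = {}
    for index, item in enumerate(items):
        # setdefault creates the empty list the first time a value is seen
        positions.setdefault(item, []).append(index)
    # keep only the values that were seen at more than one position
    return dict((item, found) for item, found in positions.items()
                if len(found) > 1)


test_list = ["abc", "def", "abf", "abc", "abf", "xyz"]
duplicates = find_duplicates(test_list)
for item in sorted(duplicates):
    print("duplicate %s at positions %s" % (item, duplicates[item]))
```

The dictionary lookup replaces the inner loop, so the work grows with the length of the list rather than with its square.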

You should use a function instead of all of this code doing the same thing.

x2 = test[1]
        if x2 == "":
            x2 = 0
            x2 = float(x2)
        y2 = test[2]
        if y2 == "":
            y2 = 0
            y2= float(y2)
        elev2 = test[3]
        if elev2 == "":
            elev2 = 0
            elev2 = float(elev2)
##---------- use instead
def convert_arg(arg_in):
    """ return a float if arg_in will convert, otherwise return zero """
    try:
        ret_arg = float(arg_in)
        return ret_arg
    except:  ## will catch anything, like "" that will not convert
        return 0

x2 = convert_arg(test[1])
y2 = convert_arg(test[2])
elev2 = convert_arg(test[3])



Thank you for the help! But I need the comparison to be different. For instance,

the list is

A, B, C, D, E
I need to compare A with ABCDE
then B with ABCDE
then C with ABCDE

I don't mind if the current selection compares to itself.

Is there a way to do that simply?


I think your comparison of B with A has already been done when you compared A with B. The cost of doing this work is O(n^2), which is not small for your data set. If you can reduce it by a factor of 2, it is probably worth doing. Thus, if you need both the A and the B in the output file, you should write them both when you see the issue, and not do the work again later. Also, you might as well skip self-comparison: It can never fail (if your test is right), so it is wasted effort. In other words: I agree with woooee about the general layout of the loop.
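To make the factor of 2 concrete, here is a small sketch using the standard library's itertools module. `itertools.product` yields every ordered pair, including self-comparisons, while `itertools.combinations` yields each unordered pair exactly once, which is the same set of pairs the nested index loops visit. The five-letter list is only a stand-in for the real records.

```python
import itertools

records = ["A", "B", "C", "D", "E"]  # stand-in for the real data

# every ordered pair, including (A, A) and both (A, B) and (B, A)
all_ordered = list(itertools.product(records, repeat=2))

# each unordered pair exactly once, with no self-comparison
unique_pairs = list(itertools.combinations(records, 2))

print(len(all_ordered))    # n * n pairs
print(len(unique_pairs))   # n * (n - 1) / 2 pairs

n = 60000
print(n * n)               # ordered comparisons for the full data set
print(n * (n - 1) // 2)    # unordered comparisons for the full data set
```

For 60,000 records that is the difference between 3.6 billion and just under 1.8 billion comparisons.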

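Another hint: In your main function, open the output file once and make sure it gets closed when the work is done:

outputFile = None
try:
  outputFile = open(outputFileName,'w')
  # do the work
finally:
  if outputFile:
    outputFile.close()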

You can also do this using a with statement if you are using a recent Python (2.5 and above): http://effbot.org/zone/python-with-statement.htm
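A minimal sketch of the with-statement version follows. The file name and the sample rows are placeholders chosen so the example can run anywhere; they are not taken from the original program. Two things worth knowing: on Python 2.5 itself the statement has to be enabled with `from __future__ import with_statement` (2.6 and later need nothing extra), and mode 'w' truncates the file every time it is opened, so the file should be opened once, outside the loop.

```python
import os
import tempfile

# hypothetical output location, used only so the sketch is runnable
output_file_name = os.path.join(tempfile.gettempdir(), "suspect_entries.txt")

suspect_rows = ["44528 688937.27 202291.1 -2780",
                "44528 688937.27 202291.1 -2943"]

# open once, before the loop; reopening with 'w' for each row would
# truncate the file and leave only the last row in it
with open(output_file_name, "w") as output_file:
    for row in suspect_rows:
        output_file.write(row + "\n")
# the file is closed at this point, even if the loop raised an exception

# read the file back to confirm that every row was kept
with open(output_file_name) as check:
    lines_written = check.read().splitlines()
print(len(lines_written))
```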



Here's an outline of "do the work" from the prior posting:

stopInner = len(data)
stopOuter = stopInner -1
for outerIndex in range(0, stopOuter):   ## stop at end-1
    ##  start at the next element and go through the end
    for innerIndex in range(outerIndex+1, stopInner):
      a = data[outerIndex]
      b = data[innerIndex]
      if badPair(a,b): 
        markBad(outputFile, a, b)

where badPair is a function that returns True (or some non-zero value) if the pair should be marked as bad in the output file and markBad does that marking. You could quite reasonably instead have a single function that does both things, maybe def compareAndMarkIfBad(anOpenFile,leftItem,rightItem): ...
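One possible shape for the two helpers, purely as a sketch. It assumes that each record is a list of [id, x, y, elevation] with the numbers already converted to floats, that "inside the buffer" means within pointbuff in both x and y (a square window), and that a pair is bad when the absolute elevation difference exceeds the tolerance. The default values for pointbuff and elevation are arbitrary placeholders, and an in-memory file stands in for the real output file.

```python
import io


def badPair(a, b, pointbuff=500.0, elevation=10.0):
    """Return True if b lies inside a's buffer and the two elevations
    differ by more than the tolerance. Records are [id, x, y, elev]."""
    in_buffer = (abs(a[1] - b[1]) <= pointbuff and
                 abs(a[2] - b[2]) <= pointbuff)
    return in_buffer and abs(a[3] - b[3]) > elevation


def markBad(outputFile, a, b):
    """Write both records of a suspect pair, one record per line."""
    for record in (a, b):
        outputFile.write(u" ".join(str(field) for field in record) + u"\n")


data = [["44528", 688937.27, 202291.1, -2780.0],
        ["44528", 688937.27, 202291.1, -2943.0],
        ["40672", 592204.28, 200947.08, -2961.0],
        ["40672", 592204.28, 200947.08, -2960.0]]

outputFile = io.StringIO()  # in-memory stand-in for the real output file
for outerIndex in range(len(data) - 1):
    for innerIndex in range(outerIndex + 1, len(data)):
        a, b = data[outerIndex], data[innerIndex]
        if badPair(a, b):
            markBad(outputFile, a, b)

print(outputFile.getvalue())
```

With these sample rows only the two 44528 records are written: they share a location but differ by 163 in elevation, while the two 40672 records differ by only 1.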



OH! I'm sorry, my brain was stuck on stupid there for a minute. I get it, and thank you for the pointers on efficiency! I'll give this a whirl later today.

Thank you very much for your time,



It is simple to use EasyGUI to get the data instead of keying in the entire file path, elevation, and pointbuff. Also, making one pass through the input file list and converting to floats will save doing that every time you access each record. An example follows with some sample code to print elevation differences that exceed the elevation input.

def convert_arg(arg_in):
    """ return a float if arg_in will convert, otherwise return zero """
    try:
        ret_arg = float(arg_in)
        return ret_arg
    except:  ## will catch anything, like "" that will not convert
        return 0

def get_file_name(dir_in = ""):
    """ example of choosing a file and entering data via EasyGUI """
    import easygui
    file_name_and_path = easygui.fileopenbox( \
                     "Double Click", "Choose your file", "/home/", "*")

    title = "pointbuff and elevation"
    msg = "Enter Now"
    field_names = ["Point Buff","Elevation"]
    field_values = []   ## start with no default values
    field_values = easygui.multenterbox(msg, title, field_names, field_values )

    # make sure that none of the fields was left blank
    while 1:
         if field_values is None:   ## the user cancelled the dialog
            field_values = [ "-1", "-1" ]   ## strings, so .strip() below works
         errmsg = ""
         for x in range(len(field_names)):
            if field_values[x].strip() == "":
               errmsg = errmsg + ('"%s" is a required field.\n\n' % field_names[x])
         if errmsg == "": 
            break # no problems found
         field_values = easygui.multenterbox(errmsg, title, field_names, field_values)
    return file_name_and_path, field_values[0], field_values[1]

def read_file(filename):
    """ read the file, convert [1], [2], and [3] to floats and return
        the new list
    """
    converted_list = []
##    read1 = csv.reader(open(filename, 'rb'))
    read1 = simulate_csv_read()  ## use test data instead of csv file
    for row in read1:
        junk_list = [row[0]]   ## junk_list holds the data from one record
        for j in range(1, 4):  ## convert to float and append to list
            junk_list.append( convert_arg(row[j]) )
        for j in range(4, len(row)):  ## rest of record
            junk_list.append(row[j])
        converted_list.append(junk_list)

    return converted_list

def simulate_csv_read():
    """ function to use provided data to simulate a csv read """
    provided_list = ['30952 594635.24 200621.13 0',
'22500 598647.51 170513.26 -2191',
'31634 603679.17 171486.24 -2255',
'22734 606320.46 177526.04 -2459',
'22842 605912.11 177952.72 -2481',
'44528 688937.27 202291.1 -2780',
'44528 688937.27 202291.1 -2943',
'28509 613899.93 162957.46 -2041',
'44528 688937.27 202291.1 -2921',
'59268 594188.52 200371.53 -2971',
'59268 594188.52 200371.53 -2969',
'58687 593467.84 199357.6 -2937',
'58687 593467.84 199357.6 -2927',
'40672 592204.28 200947.08 -2961',
'40672 592204.28 200947.08 -2960',
'1111111 592204.28 202291.1 0' ]

    return_list = []
    for rec in provided_list:
        ## split each record on whitespace, like one row from csv.reader
        return_list.append(rec.split())

    return return_list

if __name__ == "__main__":
    fname, pointbuff, elevation = get_file_name("/home")
    print fname, pointbuff, elevation
    elevation_diffs = abs(convert_arg(elevation))

    converted_list = read_file(fname)
    stop_y = len(converted_list)
    stop_x = stop_y -1
    for x in range(0, stop_x):
        ## saves finding converted_list[x][3] for every value of "y"
        this_x_elev = converted_list[x][3]
        for y in range(x+1, stop_y):
            if elevation_diffs < abs(this_x_elev - converted_list[y][3]):
                print converted_list[x]
                print converted_list[y]



Thank you for your help! I got this running smoothly today. I may redo this program in the future using a class for the database entries.
