I've just joined up here hoping somebody might be able to help me with a project I've got on at work at the moment.
I've been learning Python using the "let's just do it and see what happens" method, but I keep running into problems and am now 100% stuck on where to go next.
Basically, I've got a CSV with 4 columns in it:
Domain = string
Page = string
Linking = string
Size = integer
I need to perform various operations on these that seemed basic to me at first but soon got really complicated.
I'm converting my CSV to a GraphML file (XML based) that will run in yEd.
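For anyone unfamiliar with the target format, here is a minimal sketch of the kind of GraphML file I mean, written with Python 3's standard `xml.etree.ElementTree`. The function name, node ids and edge below are made up for illustration, and as far as I understand yEd stores labels and styling in its own extension elements, which this sketch leaves out.

```python
import xml.etree.ElementTree as ET

# Namespace of the GraphML format.
GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def build_graphml(node_ids, edges):
    """Return a bare GraphML document as a string.

    node_ids: iterable of unique node id strings (hypothetical ids)
    edges: iterable of (source_id, target_id) pairs
    """
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "directed"})
    for node_id in node_ids:
        ET.SubElement(graph, "node", {"id": node_id})
    for index, (source, target) in enumerate(edges):
        ET.SubElement(
            graph,
            "edge",
            {"id": "e%d" % index, "source": source, "target": target},
        )
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Two made-up nodes joined by one edge.
    print(build_graphml(["n0", "n1"], [("n0", "n1")]))
```

The point is only that every node needs a unique id and every edge refers to two of those ids, which is why the list of unique nodes has to come first.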
I need to be able to get a list of all the 'nodes':
Every unique item within the 'Domain' column is a node
Every unique item within the 'Linking' column is a node
Every item within the 'Page' column is a node. However, this is where it gets complicated, and I'm struggling to put it in plain text: every unique combination of the 'Domain' and 'Page' columns needs to be listed, i.e. if "Page1" was listed twice but the 'Domain' column was different for these occurrences, "Page1" would need to be listed twice (I decided to do this with MD5 hashes).
That is the first stage of this project anyway. There is another bit after it (connecting all the nodes up), but I can't get onto that until I solve this :(
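To make the three rules above concrete, here is a rough sketch of the node list I'm after, assuming Python 3, no header row, and made-up sample rows. Instead of MD5 hashes it keys each page on the (Domain, Page) pair as a tuple, which should give the same uniqueness; the function name and sample data are hypothetical.

```python
import csv
import io

# Made-up rows in the Domain, Page, Linking, Size layout.
SAMPLE_CSV = """\
siteA.com,Page1,siteC.com,10
siteB.com,Page1,siteC.com,5
siteA.com,Page1,siteD.com,7
"""


def collect_nodes(rows):
    """Return sorted unique domains, linking values and (domain, page) pairs."""
    domains = set()
    linking = set()
    pages = set()
    for domain, page, link, _size in rows:
        domains.add(domain)
        linking.add(link)
        # A page is only "the same" if its domain matches too.
        pages.add((domain, page))
    return sorted(domains), sorted(linking), sorted(pages)


rows = list(csv.reader(io.StringIO(SAMPLE_CSV)))
domains, linking, pages = collect_nodes(rows)
print(domains)  # ['siteA.com', 'siteB.com']
print(linking)  # ['siteC.com', 'siteD.com']
print(pages)    # [('siteA.com', 'Page1'), ('siteB.com', 'Page1')]
```

Here "Page1" ends up listed twice because it appears under two different domains, while the repeated siteA.com/Page1 row collapses into one entry.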
This is the code that I currently use:
#Import needed packages
import csv, array, md5, decimal
from useful_funcs import collections

#Import CSV (or Database in future)
inputFile = open("C:\\Users\\RobH\\Desktop\\xml.csv", "r")
reader = csv.reader(inputFile)

#Declare memoryTable
memoryTable = []

#Store CSV (or DB) in memoryTable
for row in reader:
    memoryTable.append(row)

#Declare hashTable
hashTable = []

#Declare hash2Table
hash2Table = []

#Hash columns 0 and 1
n=0
for r in memoryTable:
    i = 0
    string2Hash = ''
    while i < len(r)-2:
        string2Hash += r[i]
        i+=1
    #Get MD5 of hash
    string2Hashmd5 = md5.new(string2Hash)
    string2Hashmd51 = string2Hashmd5.hexdigest()
    #Append hash to memoryTable
    memoryTable[n].append(string2Hashmd51)
    n+=1

#Hash2 columns 0, 1 and 2
n=0
for r in memoryTable:
    i = 0
    string4Hash = ''
    while i < len(r)-2:
        string4Hash += r[i]
        i+=1
    #Get MD5 of hash2
    string4Hashmd5 = md5.new(string4Hash)
    string4Hashmd51 = string4Hashmd5.hexdigest()
    #Append hash2 to memoryTable
    memoryTable[n].append(string4Hashmd51)
    n+=1

#Sort memoryTable
from operator import itemgetter
memoryTable.sort(key=itemgetter(4))

#Copy memoryTable to hashTable and hash2Table
for row in memoryTable:
    hashTable.append(row)
    hash2Table.append(row)

#Remove all hash duplicates from hashTable
hashTable2 = collections.removeduplicates(hashTable,4)
#collections.printy(hashTable2)

#Search memoryTable for hash duplicates and add up all values for first edges
#Append added up hash values to hashTable

#Remove all hash2 duplicates from hash2Table (nodes)
hash2Table2 = collections.removeduplicates(hash2Table,5)
#collections.printy(hash2Table2)

#Search memoryTable for hash2 duplicates and add up all values for second edges
HashSize = []
roww = 0
for r in hash2Table2:
    Col3 = [hash2Table2[roww]]
    HashSize.append(Col3)
    roww+= 1
#collections.printy(HashSize)
#collections.printy(memoryTable)

#Append added up hash2 values to hash2Table
hash2Table2_2 = list(hash2Table2)
i = 0
while i < len(HashSize)+1:
    x = 0
    templist = []
    for r in HashSize:
        if r == memoryTable[x]:
            templist.append(memoryTable[x])
        x+= 1
    y = 0
    templist1 = []
    while y < len(templist):
        numberr = decimal.Decimal(templist[y]) * 100
        templist1.append(numberr / 100)
        y+=1
    templist2 = sum(templist1)
    #print templist2
    hash2Table2_2.append(templist2)
    i+= 1
#collections.printy(hash2Table2_2)

#something isn't working right... not sure what
#value = int(templist)
#print value
#listy = sum(r for r in templist)
#print listy
#collections.printy(hash2Table2_2)
and the collections package is:
def printy(hashTable):
    ret = ''
    for r in hashTable:
        print r

def removeduplicates(hashTable,column):
    ret = ''
    listOfHashTable = list(hashTable)
    col = column
    prev = 0
    i = 0
    z = 1
    while i != z:
        z = len(listOfHashTable)
        for r in listOfHashTable:
            if r[col] == prev:
                rownumb = listOfHashTable.index(r)
                listOfHashTable.pop(rownumb)
            prev = r[col]
        i = len(listOfHashTable)
    return tuple(listOfHashTable)
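I suspect the "remove duplicates, then add up the values" steps in the code above could be collapsed into a single pass with a dictionary. A minimal sketch, assuming Python 3, that Size is the fourth column, and that rows sharing the chosen key columns should have their sizes summed. The function name and sample rows are hypothetical, and `collections` here is the standard library module, not my helper module of the same name.

```python
from collections import defaultdict


def sum_sizes(rows, key_columns):
    """Group rows by the given column indexes and add up the Size column."""
    totals = defaultdict(int)
    for row in rows:
        key = tuple(row[i] for i in key_columns)
        totals[key] += int(row[3])  # Size is assumed to be column index 3
    return dict(totals)


# Made-up rows in the Domain, Page, Linking, Size layout.
rows = [
    ["siteA.com", "Page1", "siteC.com", "10"],
    ["siteA.com", "Page1", "siteD.com", "7"],
    ["siteB.com", "Page1", "siteC.com", "5"],
]

# Key on Domain + Page (columns 0 and 1).
totals = sum_sizes(rows, key_columns=(0, 1))
print(totals[("siteA.com", "Page1")])  # 17
print(totals[("siteB.com", "Page1")])  # 5
```

Passing `key_columns=(0, 1, 2)` would group on Domain, Page and Linking instead, which seems to be what my second hash is for.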
If nobody wants to help me that's ok - I'm sure I'll solve it at some point but at the moment it's really REALLY annoying me :(
Thanks a lot,