hi all,

i have a file that contains the following data (just a sample) :

501 	 0 	 0.932 0.933 0.931 0.931 0.929 0.933 0.93 0.928 
501 	 1 	 0.974 0.98 0.978 0.976 0.974 0.974 
501 	 2 	 0.953 0.949 0.944 0.951 0.942 0.942 0.942 0.948 
501 	 3 	 0.933 0.934 0.934 0.935 0.931 0.933 0.932 0.934 
501 	 4 	 0.939 0.934 0.934 0.934 0.937 0.932 0.938 
501 	 5 	 0.944 0.942 0.942 0.943 0.939 0.95 0.942 0.948 
501 	 6 	 0.974 0.976 0.974 0.971 0.97 0.967 0.971 0.974 
501 	 7 	 0.986 0.984 0.984 0.986 0.986 0.984 
502 	 0 	 0.927 0.933 0.931 0.931 0.929 0.933 0.93 0.928 
502 	 1 	 0.974 0.98 0.978 0.976 0.973 0.971 0.974 
502 	 2 	 0.953 0.949 0.951 0.942 0.942 0.942 0.948 
502 	 3 	 0.933 0.934 0.934 0.935 0.931 0.933 0.932 0.931 
502 	 4 	 0.939 0.934 0.934 0.932 0.937 0.932 0.938 
502 	 5 	 0.944 0.942 0.942 0.943 0.939 0.95 0.95 0.948 
502 	 6 	 0.974 0.974 0.974 0.971 0.97 0.967 0.971 0.974 
502 	 7 	 0.986 0.984 0.984 0.986 0.984 0.986

and i parse it using the following commands :

file_hol = open("<file_name>.dat", "r")
data_hol = []
for line_hol in file_hol :
	line_hol = [float(x) for x in line_hol.split()]
	data_hol.append(tuple(line_hol))
file_hol.close()

so, i get a list of tuples with the name data_hol (everything is ok till now)

the 1st column is a timestamp
the 2nd column is a port number
what i would like to find is the port with the largest sum of the numbers of the rest of the columns
and, eventually, to have a list of ports where each port corresponds to a list of the port number and the numbers (the rest of the columns) with the largest sum,
so

pisa_hol = [0]
for i in range(0, len(data_hol)) :
	if sum(data_hol[i][2:len(data_hol[i])]) > sum(pisa_hol[int(data_hol[i][1])][1:len(pisa_hol[int(data_hol[i][1])])]) :		pisa_hol[data_hol[i][1]] = data_hol[i][2, len(data_hol[i])]

what i get is "TypeError: len() of unsized object"

any ideas ?

thanx

It's hard to help you without the full traceback of the error but here's my two cents:

You're parsing the file contents incorrectly. What you should be doing is splitting each line at the tab character to get time stamp, port number, then the remaining string of floats. At this point you can sum the floats together and get a single float for your data list.

Here's how I would parse the data you provided:

>>> data_stor = []
>>> for line in file_hol:
...     data = line.split('\t')
...     if len(data) == 3:
...         tstamp = data[0].strip()
...         port = data[1].strip()
...         valu = sum([float(ea) for ea in data[2].strip().split()])
...         data_stor.append([tstamp, port, valu])
...     
>>> max(data_stor, key=lambda x:x[2])
['501', '6', 7.7770000000000001]
>>>

This returns that port 6 at timestamp 501 has the highest sum of numbers.
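
For what it's worth, the TypeError in the original post can probably be explained even without the full traceback. Here is a minimal sketch, assuming pisa_hol starts out as [0] exactly as posted. pisa_hol[0] is then the int 0, and calling len() on an int fails (Python 3 words the message as "object of type 'int' has no len()"). The name row is only an illustrative stand-in for one parsed tuple.

```python
# pisa_hol as initialised in the question: a list holding the int 0
pisa_hol = [0]

# one parsed row, shaped like the output of the question's parsing loop
# (illustrative values, shortened)
row = (501.0, 0.0, 0.932, 0.933, 0.931)

port = int(row[1])
try:
    # pisa_hol[port] is the int 0, and ints have no length
    len(pisa_hol[port])
    error = None
except TypeError as exc:
    error = exc

print(type(error).__name__)

# Two further problems would surface once that one is fixed:
# row[2, len(row)] indexes with a tuple instead of slicing (row[2:] is meant),
# and row[1] is a float, which cannot be used as a list index.
cells = row[2:]
print(cells)
```

Starting from an empty dict keyed by port, rather than a list holding 0, avoids both the len() problem and the float index.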

Something like that might do. I avoided the use of float() until the end to keep time-stamp and port info from turning into floats:

raw_data = """\
501 	 0 	 0.932 0.933 0.931 0.931 0.929 0.933 0.93 0.928
501 	 1 	 0.974 0.98 0.978 0.976 0.974 0.974
501 	 2 	 0.953 0.949 0.944 0.951 0.942 0.942 0.942 0.948
501 	 3 	 0.933 0.934 0.934 0.935 0.931 0.933 0.932 0.934
501 	 4 	 0.939 0.934 0.934 0.934 0.937 0.932 0.938
501 	 5 	 0.944 0.942 0.942 0.943 0.939 0.95 0.942 0.948
501 	 6 	 0.974 0.976 0.974 0.971 0.97 0.967 0.971 0.974
501 	 7 	 0.986 0.984 0.984 0.986 0.986 0.984
502 	 0 	 0.927 0.933 0.931 0.931 0.929 0.933 0.93 0.928
502 	 1 	 0.974 0.98 0.978 0.976 0.973 0.971 0.974
502 	 2 	 0.953 0.949 0.951 0.942 0.942 0.942 0.948
502 	 3 	 0.933 0.934 0.934 0.935 0.931 0.933 0.932 0.931
502 	 4 	 0.939 0.934 0.934 0.932 0.937 0.932 0.938
502 	 5 	 0.944 0.942 0.942 0.943 0.939 0.95 0.95 0.948
502 	 6 	 0.974 0.974 0.974 0.971 0.97 0.967 0.971 0.974
502 	 7 	 0.986 0.984 0.984 0.986 0.984 0.986"""

fname = "my_data.dat"
# write the test file
fout = open(fname, "w")
fout.write(raw_data)
fout.close()

# read the test file
fin = open(fname, "r")
data_hol = []
for line_hol in fin:
    # keep the fields as strings for now
    data_hol.append(tuple(line_hol.split()))
fin.close()

my_datalist = []
for line in data_hol:
    #print(line[1], line[2:])  # test
    port = line[1]
    # now you can use float()
    data_sum = sum([float(x) for x in line[2:]])
    my_datalist.append((port, data_sum))

# test the list of (port, data_sum) tuples
for tup in my_datalist:
    print(tup)

"""my output -->
('0', 7.4470000000000001)
('1', 5.8559999999999999)
('2', 7.5709999999999997)
('3', 7.4660000000000002)
('4', 6.548)
('5', 7.5500000000000007)
('6', 7.7770000000000001)
('7', 5.9099999999999993)
('0', 7.4420000000000002)
('1', 6.8260000000000005)
('2', 6.6270000000000007)
('3', 7.4630000000000001)
('4', 6.5460000000000003)
('5', 7.5579999999999998)
('6', 7.7749999999999995)
('7', 5.9099999999999993)
"""

thank you all, what i really needed was the max. sum for every port,
your comments were very helpful

the final version is :

pisa_hol = {}
file_hol = open("/.../hol.dat", "r")
for line_hol in file_hol :
	data_hol = line_hol.split('\t')
	if len(data_hol) == 3 :
		port = int(data_hol[1].strip())
		cells = [float(x) for x in data_hol[2].strip().split()]
		# keep the cells with the largest sum seen so far for this port
		if port not in pisa_hol or sum(pisa_hol[port]) < sum(cells) :
			pisa_hol[port] = cells
	else :
		print("ERROR : Input data are NOT formatted properly !")
file_hol.close()
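
The same idea can be wrapped in a function, which makes it easier to test against a few lines of sample data. This is only a sketch in Python 3. The function name max_cells_per_port and the shortened sample lines are made up for illustration, and it assumes the tab-separated layout shown earlier in the thread.

```python
def max_cells_per_port(lines):
    """Map each port to the cell list with the largest sum seen for it."""
    best = {}  # port -> (sum of cells, cells)
    for line in lines:
        fields = line.split('\t')
        if len(fields) != 3:
            # skip blank or malformed lines instead of aborting
            continue
        port = int(fields[1].strip())
        cells = [float(x) for x in fields[2].split()]
        weight = sum(cells)
        # store the sum next to the cells so it is computed once per row
        if port not in best or weight > best[port][0]:
            best[port] = (weight, cells)
    return {port: cells for port, (weight, cells) in best.items()}


# shortened, illustrative rows in the same timestamp/port/cells layout
sample = [
    "501 \t 0 \t 0.932 0.933\n",
    "501 \t 1 \t 0.974 0.98\n",
    "502 \t 0 \t 0.927 0.933\n",
    "502 \t 1 \t 0.974 0.99\n",
]
result = max_cells_per_port(sample)
print(result)
```

Since a file object iterates over its lines, passing an open file in place of the sample list should work the same way.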

hi again,

given that

501 	 0 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
501 	 1 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
501 	 2 	 0.0 0.0 0.0 0.0 0.0 0.965 0.0 0.0 
501 	 3 	 0.0 0.952 0.0 0.951 0.949 0.0 0.0 0.947 
501 	 4 	 0.965 0.0 0.963 0.0 0.0 0.0 0.962 0.0 
501 	 5 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
501 	 6 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
501 	 7 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
502 	 0 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
502 	 1 	 0.0 0.0 0.98 0.0 0.0 0.979 0.0 0.0 
502 	 2 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
502 	 3 	 0.947 0.0 0.0 0.945 0.0 0.0 0.0 0.0 
502 	 4 	 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 
502 	 5 	 0.0 0.965 0.0 0.0 0.0 0.0 0.964 0.0

and that

cells = [float(x) for x in data_hol[2].strip().split()]

where data_hol[2] is the 3rd column of the file, which contains a batch of floats,
HOW can i insert into the list "cells" only the NON-zero items ?

If you want to ignore the items that are 0.0:

cells = [float(x) for x in data_hol[2].strip().split() if x != '0.0']

Keep in mind, however, that ports whose values are all 0.0 will end up with an empty cells list, which could break other code. Just make sure your list handling accounts for the possibility of empty lists.
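
Here is a sketch of one way to cover both points, assuming Python 3. Comparing the parsed number rather than the string also catches spellings such as 0 or 0.00, and the empty-list case is handled explicitly. The helper name nonzero_cells is hypothetical, and the two rows are taken from the sample data above.

```python
def nonzero_cells(column):
    """Parse a whitespace-separated string of floats, dropping zeros."""
    values = (float(x) for x in column.split())
    return [v for v in values if v != 0.0]


# third column for ports 2 and 5 at timestamp 501 in the sample data
rows = {
    2: " 0.0 0.0 0.0 0.0 0.0 0.965 0.0 0.0 ",
    5: " 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ",
}

for port, column in rows.items():
    cells = nonzero_cells(column)
    if cells:
        print(port, cells, max(cells))
    else:
        # max() raises ValueError on an empty list, so guard explicitly
        print(port, "no non-zero cells")
```

sum() is safe on an empty list (it returns 0), but max(), min() and indexing such as cells[0] are not, so those are the places worth guarding.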
