Here is an example of how the data can be summed into a dictionary. Alternatively, you can use numpy.histogram and pass the values to be summed as the weights of the categorized data.

data = '''5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of second value to categorize the first value and add it to that bin
freq = dict()
for d in data.splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.get(int(pos), 0) + energy

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
# skip blank lines; splitlines() has already removed the newline characters
data = [d.split() for d in data.splitlines() if d.strip()]
weights = [float(a) for a, b in data]
pos = [int(float(b)) for a, b in data]

# numpy chooses the bin limits itself when bins is an integer
print('5 bins by numpy histogram')
print(numpy.histogram(pos,bins=5, weights=weights))


You can also define your own bin limits for the numpy histogram by passing a sequence, as in bins = range(1, 7). Matplotlib's histograms also use numpy's histogram() function. Here is the same example with the histogram plotted:

import matplotlib.pyplot as plt

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
weights, pos = zip(*[map(float,d.split()) for d in data.strip().splitlines()])

fig, ax = plt.subplots()  # create the figure and the axes that hist() is called on
n, bins, patches = ax.hist(pos, bins=range(1, 7), weights=weights, facecolor="green")
print(n, bins)
plt.savefig("histo.png") # save figure (optional)
plt.show() # display figure on the screen
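
The explicit bin limits can also be checked without plotting. The following is only a sketch using the same sample data; the names pairs, sums and edges are chosen here for illustration and do not come from the posts above.

```python
import numpy

# Same sample data as in the thread: (energy, position) pairs.
pairs = [(5.639792, 1.36), (4.844813, 1.89), (4.809105, 2.33),
         (3.954150, 2.69), (2.924234, 3.42), (1.532669, 4.50),
         (0.000000, 5.63)]
weights, pos = zip(*pairs)

# Edges 1, 2, ..., 6 give five unit-wide bins; numpy treats the last bin as closed on both ends.
sums, edges = numpy.histogram(pos, bins=list(range(1, 7)), weights=weights)

# Print the summed energy for each bin next to its limits.
for left, right, total in zip(edges[:-1], edges[1:], sums):
    print(left, right, round(float(total), 6))
```

With integer edges like these, each bin corresponds to one integer part of the position, so the sums should match the dictionary approach.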


Useful idiom: .strip().splitlines()
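
A small sketch of why the idiom helps, assuming a triple-quoted string that starts with a newline as in the example above (the variable names raw_lines and clean_lines are made up for this illustration):

```python
data = '''
5.639792 1.36
4.844813 1.89
'''

# Without strip(), the newline right after the opening quotes yields an empty first line.
raw_lines = data.splitlines()
print(raw_lines)    # ['', '5.639792 1.36', '4.844813 1.89']

# strip() removes the leading and trailing whitespace first, so only data lines remain.
clean_lines = data.strip().splitlines()
print(clean_lines)  # ['5.639792 1.36', '4.844813 1.89']
```

The empty first line matters because unpacking two values from ''.split() raises a ValueError.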

Code cleaned up from Grib's example:

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of second value to categorize the first value and add it to that bin
freq = dict()
for d in data.strip().splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.get(int(pos), 0) + energy

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
weights, pos = zip(*(map(float, d.split()) for d in data.strip().splitlines()))

print('numpy histogram')
# edges 0..6 so that the last position (5.63) also falls inside a bin
print(numpy.histogram(pos, bins=list(range(7)), weights=weights))

It's also possible to bin items to a collection like a set or list.

Because each bin holds a list, values that land on the same key are all kept instead of being merged into one number.

import collections

data = ((5.639792, 1.36),
(4.844813, 1.89),
(4.809105, 2.33),
(3.954150, 2.69),
(2.924234, 3.42),
(1.532669, 4.50),
(0.000000, 5.63))

bucket = collections.defaultdict(list)
for each in data:
    # each is an (energy, position) pair; the integer part of the position is the key
    bucket[int(each[1])].append(each[0])
print(bucket)
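
If the per-bin sums are still wanted afterwards, the lists can be summed in a second step. This is just a sketch building on the bucket idea above; the name totals is introduced here for illustration.

```python
import collections

data = ((5.639792, 1.36),
        (4.844813, 1.89),
        (4.809105, 2.33),
        (3.954150, 2.69),
        (2.924234, 3.42),
        (1.532669, 4.50),
        (0.000000, 5.63))

# Collect every energy value under the integer part of its position.
bucket = collections.defaultdict(list)
for energy, pos in data:
    bucket[int(pos)].append(energy)

# Sum each list to recover the same totals as the dictionary approach.
totals = {key: sum(values) for key, values in sorted(bucket.items())}
print(totals)
```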

@ljh: The C++ post's requirement was 'no external libraries used' so I translated it to 'no external modules' for Python.

Is there a way I can use a data file and set columns as coordinates? I'm really interested in doing my script this way.
