## pyTony 888

Here is an example of how the data can be summed into a dictionary. Alternatively, you can use numpy.histogram and pass the values to be summed as the weights of the categorized data.

``````data = '''5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of second value to categorize the first value and add it to that bin
freq = dict()
for d in data.splitlines():
    energy, pos = map(float, d.split())
    # energy is already a float, so it can be added to the running sum directly
    freq[int(pos)] = freq.get(int(pos), 0) + energy

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
# splitlines() already drops the newlines, so skip only blank lines
data = [d.split() for d in data.splitlines() if d.strip()]
weights = [float(a) for a, b in data]
pos = [int(float(b)) for a, b in data]

# with an integer bins argument, numpy chooses the bin edges itself
print('5 bins by numpy histogram')
print(numpy.histogram(pos,bins=5, weights=weights))``````

## Gribouillis 1,391

You can also define your own bin limits for the numpy histogram by passing a sequence, as in `bins = range(1, 7)`. Matplotlib's histograms also use numpy's histogram() function. Here is the same example with the histogram plotted:

``````import matplotlib.pyplot as plt

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
weights, pos = zip(*[map(float,d.split()) for d in data.strip().splitlines()])

fig = plt.figure()
ax = fig.add_subplot(111) # the axes that hist() draws on
n, bins, patches = ax.hist(pos, bins = range(1,7), weights = weights, facecolor = "green")
print(n, bins)
plt.savefig("histo.png") # save figure (optional)
plt.show() # display figure on the screen``````
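
To make explicit what passing a sequence of bin limits means, here is a minimal pure-Python sketch of the same weighted binning. It is only an illustration, not numpy's implementation: `weighted_histogram` is a hypothetical helper, and the rule that every bin is half-open except the last one, which also includes its right edge, is meant to mirror numpy's documented behaviour.

```python
from bisect import bisect_right

def weighted_histogram(values, weights, edges):
    """Sum weights into the bins delimited by sorted edges (hypothetical helper)."""
    sums = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        if value < edges[0] or value > edges[-1]:
            continue  # values outside the outer edges are ignored
        # bisect_right gives half-open bins [left, right);
        # min() folds value == edges[-1] into the last bin
        index = min(bisect_right(edges, value) - 1, len(sums) - 1)
        sums[index] += weight
    return sums

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
weights, pos = zip(*(map(float, d.split()) for d in data.strip().splitlines()))

edges = list(range(1, 7))  # bin limits 1, 2, ..., 6 give five bins
sums = weighted_histogram(pos, weights, edges)
for left, right, total in zip(edges[:-1], edges[1:], sums):
    print(left, right, round(total, 6))
```

With these limits the first bin collects the two energies whose positions lie between 1 and 2, and the last bin only receives the zero-energy row at 5.63.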

## pyTony 888

Code cleaned up from Grib's example:

``````data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of second value to categorize the first value and add it to that bin
freq = dict()
for d in data.strip().splitlines():
    energy, pos = map(float, d.split())
    # energy is already a float, so it can be added to the running sum directly
    freq[int(pos)] = freq.get(int(pos), 0) + energy

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
weights, pos = zip(*(map(float, d.split()) for d in data.strip().splitlines()))

print('numpy histogram')
# bin edges 1..6 so that the last position (5.63) also lands in a bin
print(numpy.histogram(pos, bins=list(range(1, 7)), weights=weights))``````

## lrh9 95

It's also possible to bin the items into a collection such as a list or set instead of summing them.

This technique keeps every item, so values that fall into the same bin do not collide on the key.

``````import collections

data = ((5.639792, 1.36),
(4.844813, 1.89),
(4.809105, 2.33),
(3.954150, 2.69),
(2.924234, 3.42),
(1.532669, 4.50),
(0.000000, 5.63))

bucket = collections.defaultdict(list)
for each in data:
    # each is an (energy, position) pair; the integer part of the position is the bin key
    bucket[int(each[1])].append(each[0])
print(bucket)``````
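
If the per-bin sums from the earlier posts are still wanted, they can be derived from the buckets afterwards. The following is a small sketch under that assumption; the names `totals` and `counts` are introduced here for illustration only.

```python
import collections

data = ((5.639792, 1.36),
        (4.844813, 1.89),
        (4.809105, 2.33),
        (3.954150, 2.69),
        (2.924234, 3.42),
        (1.532669, 4.50),
        (0.000000, 5.63))

# collect the energies per bin, keyed by the integer part of the position
bucket = collections.defaultdict(list)
for energy, pos in data:
    bucket[int(pos)].append(energy)

# derive the summary values from the collected items
totals = {key: sum(values) for key, values in sorted(bucket.items())}
counts = {key: len(values) for key, values in sorted(bucket.items())}

print(totals)
print(counts)
```

Keeping the raw items costs more memory than a running sum, but it leaves other statistics such as the count, mean or maximum per bin available without reading the data again.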

## pyTony 888

@lrh9: The C++ post's requirement was 'no external libraries used', so I translated it to 'no external modules' for Python.

## Stackheuw

Is there a way I can use a data file and set columns as coordinates? I'm really interested in doing my script this way.