Putting data to bins

TrustyTony 2 Tallied Votes 2K Views Share

Here is example how data can be summed to dictionary or you can use numpy.histogram to sum the data as weights of the categorized data.

Stackheuw commented: Very helpful +0
data = '''5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of second value to categorize the first value and add it to that bin
freq = dict()
for d in data.splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.setdefault(int(pos),0) + float(energy)

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
data = [d.split() for d in data.splitlines() if d != '\n']
weights = [float(a) for a,b in data]
pos  = [int(float(b)) for a,b in data]

# numpy organizes by itself the limits for bins
print('5 bins by numpy histogram')
print(numpy.histogram(pos,bins=5, weights=weights))
Gribouillis 1,391 Programming Explorer Team Colleague

You can also define your own bins limits for the numpy histogram by passing a sequence, like in bins = range(1, 7) . Matplotlib's histograms also use numpy's histogram() method. Here is the same example with the plotted histogram

import matplotlib.pyplot as plt

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
weights, pos = zip(*[map(float,d.split()) for d in data.strip().splitlines()])

fig = plt.figure()
ax = fig.add_subplot(111)
n, bins, patches = ax.hist(pos, bins = range(1,7), weights = weights, facecolor = "green")
print n, bins
plt.savefig("histo.png") # save figure (optional)
plt.show() # display figure on the screen
commented: usufull idiom .strip().splitlines() +13
TrustyTony 888 pyMod Team Colleague Featured Poster

Code cleaned up from Grib's example:

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of second value to categorize the first value and add it to that bin
freq = dict()
for d in data.strip().splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.setdefault(int(pos),0) + float(energy)

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
weights, pos = zip(*(map(float, d.split()) for d in data.strip().splitlines()))

print('numpy histogram')
print(numpy.histogram(pos, bins=list(range(6)), weights=weights))
lrh9 95 Posting Whiz in Training

It's also possible to bin items to a collection like a set or list.

This technique can bin items without key collisions.

import collections

data = ((5.639792, 1.36),
        (4.844813, 1.89),
        (4.809105, 2.33),
        (3.954150, 2.69),
        (2.924234, 3.42),
        (1.532669, 4.50),
        (0.000000, 5.63))

bucket = collections.defaultdict(list)
for each in data:
    bucket[int((each[1]))].append(each[0])
print(bucket)
TrustyTony 888 pyMod Team Colleague Featured Poster

@ljh: The C++ post's requirement was 'no external libraries used' so I translated it to 'no external modules' for Python.

Stackheuw 0 Newbie Poster

Is there a way I can use a data file and set columns as coordinates? I'm really interested in doing my script this way.

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.