Putting data into bins

13 Years Ago TrustyTony 2 2K Views

Here is an example of how data can be summed into a dictionary of bins. Alternatively, you can use numpy.histogram and pass the values to be summed as the weights of the categorized data.

python

Stackheuw commented: Very helpful +0

data = '''5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of the second value as the bin key and add the first value to that bin
freq = dict()
for d in data.splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.get(int(pos), 0) + energy

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
# splitlines() already drops the newlines, so skip blank lines with strip()
rows = [d.split() for d in data.splitlines() if d.strip()]
weights = [float(a) for a, b in rows]
pos = [int(float(b)) for a, b in rows]

# given only a bin count, numpy chooses the bin edges itself
print('5 bins by numpy histogram')
print(numpy.histogram(pos, bins=5, weights=weights))

Gribouillis 1,391 Programming Explorer

13 Years Ago

You can also define your own bin limits for the numpy histogram by passing a sequence, as in bins = range(1, 7). Matplotlib's histograms also use numpy's histogram() function. Here is the same example with the plotted histogram:

import matplotlib.pyplot as plt

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
weights, pos = zip(*[map(float,d.split()) for d in data.strip().splitlines()])

fig = plt.figure()
ax = fig.add_subplot(111)
n, bins, patches = ax.hist(pos, bins = range(1,7), weights = weights, facecolor = "green")
print(n, bins)
plt.savefig("histo.png") # save figure (optional)
plt.show() # display figure on the screen

TrustyTony commented: useful idiom .strip().splitlines() +13

TrustyTony 888 ex-Moderator

13 Years Ago

Code cleaned up from Grib's example:

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of second value to categorize the first value and add it to that bin
freq = dict()
for d in data.strip().splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.setdefault(int(pos),0) + float(energy)

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
weights, pos = zip(*(map(float, d.split()) for d in data.strip().splitlines()))

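print('numpy histogram')
# bin edges 1..6 match the integer-part categories above; with range(6) the 5.63 row falls outside the last edge
print(numpy.histogram(pos, bins=list(range(1, 7)), weights=weights))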

lrh9 95 Posting Whiz in Training

13 Years Ago

It's also possible to bin items into a collection such as a set or list.

Each bin then keeps every item that falls into it, so items that share a key are collected side by side instead of being merged or overwritten.

import collections

data = ((5.639792, 1.36),
        (4.844813, 1.89),
        (4.809105, 2.33),
        (3.954150, 2.69),
        (2.924234, 3.42),
        (1.532669, 4.50),
        (0.000000, 5.63))

bucket = collections.defaultdict(list)
for energy, pos in data:
    bucket[int(pos)].append(energy)
print(bucket)
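
If the per-bin totals from the earlier posts are still wanted, the lists can be summed afterwards. This is only a minimal sketch using the same seven data pairs; the name totals is my own choice for illustration and not part of the posts above.

```python
import collections

data = ((5.639792, 1.36),
        (4.844813, 1.89),
        (4.809105, 2.33),
        (3.954150, 2.69),
        (2.924234, 3.42),
        (1.532669, 4.50),
        (0.000000, 5.63))

# collect every energy value under the integer part of its position
bucket = collections.defaultdict(list)
for energy, pos in data:
    bucket[int(pos)].append(energy)

# reduce each list to one total per bin (totals is an illustrative name)
totals = {key: sum(values) for key, values in sorted(bucket.items())}
print(totals)
```

Keeping the lists around costs a little memory, but it leaves the individual values available for other statistics such as counts or means per bin.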

TrustyTony 888 ex-Moderator

13 Years Ago

@lrh9: The C++ post's requirement was 'no external libraries used', so I translated that to 'no external modules' for Python.

Stackheuw 0 Newbie Poster

13 Years Ago

Is there a way I can use a data file and set columns as coordinates? I'm really interested in doing my script this way.
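
One possible approach, sketched under assumptions: the file is plain text with two whitespace-separated columns, the first being the value to sum and the second the position, as in the snippets above. The file name data.txt is hypothetical, and the sketch writes a small sample file itself only so that it runs on its own.

```python
import collections
import os
import tempfile

# create a small sample file so the sketch is self-contained;
# in practice the file would already exist (data.txt is a hypothetical name)
path = os.path.join(tempfile.mkdtemp(), "data.txt")
with open(path, "w") as f:
    f.write("5.639792 1.36\n4.844813 1.89\n4.809105 2.33\n")

# sum the first column into bins keyed by the integer part of the second column
freq = collections.defaultdict(float)
with open(path) as f:
    for line in f:
        if not line.strip():
            continue  # skip blank lines
        energy, pos = map(float, line.split())
        freq[int(pos)] += energy

print(sorted(freq.items()))
```

If the columns are meant as x and y coordinates for plotting instead, the same loop could append the two values to two lists rather than summing them.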
