Putting data to bins

By Tony Veijalainen on Jul 20th, 2011 11:37 am

Here is an example of how data can be summed into bins held in a dictionary. Alternatively, numpy.histogram can do the summing if the values are passed as weights of the categorized data.

data = '''5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of the second value as the bin and add the first value to it
freq = dict()
for d in data.splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.get(int(pos), 0) + energy

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
data = [d.split() for d in data.splitlines() if d.strip()]  # skip blank lines
weights = [float(a) for a, b in data]
pos = [int(float(b)) for a, b in data]

# with an integer bins argument, numpy chooses the bin edges itself
print('5 bins by numpy histogram')
print(numpy.histogram(pos,bins=5, weights=weights))
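As a quick cross-check (a sketch, assuming NumPy is installed), the dictionary totals can be compared with numpy.histogram when the bin edges are placed at the integers, so that each bin collects exactly one integer part. collections.defaultdict(float) is used here as an equivalent idiom to the get/setdefault pattern; it is not the code from the post.

```python
import collections

import numpy

rows = [
    (5.639792, 1.36),
    (4.844813, 1.89),
    (4.809105, 2.33),
    (3.954150, 2.69),
    (2.924234, 3.42),
    (1.532669, 4.50),
    (0.000000, 5.63),
]

# Sum the first column into bins keyed by the integer part of the second column
totals = collections.defaultdict(float)
for energy, pos in rows:
    totals[int(pos)] += energy

# Same grouping with numpy: edges 0, 1, ..., 6 give bins [0,1), [1,2), ..., [5,6]
energies = [e for e, _ in rows]
positions = [p for _, p in rows]
hist, edges = numpy.histogram(positions, bins=list(range(7)), weights=energies)

# Each dictionary key k should match hist[k] (up to floating-point rounding)
for key, total in sorted(totals.items()):
    print(key, total, hist[key])
```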

You can also define your own bin limits for the numpy histogram by passing a sequence of edges, as in bins=range(1, 7). Matplotlib's hist() also uses numpy's histogram() function. Here is the same example with the histogram plotted:

import matplotlib.pyplot as plt

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
weights, pos = zip(*[map(float,d.split()) for d in data.strip().splitlines()])

fig = plt.figure()
ax = fig.add_subplot(111)
n, bins, patches = ax.hist(pos, bins=range(1, 7), weights=weights, facecolor="green")
print(n, bins)
plt.savefig("histo.png") # save figure (optional)
plt.show() # display figure on the screen
Attachment: histo.png (8.46 KB)
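A small sketch of how explicit edges behave, assuming NumPy: when bins is a sequence, every bin is half-open [a, b) except the last, which also includes its right edge, and values outside the outer edges are not counted at all. With edges 1..6 as in range(1, 7), a position below 1 or above 6 would simply be dropped. The positions below are made-up test values chosen to hit those boundaries.

```python
import numpy

# Made-up positions chosen to land on and outside the bin boundaries
positions = [0.5, 1.0, 1.99, 2.0, 6.0, 6.5]

# Edges 1, 2, ..., 6 define five bins: [1,2), [2,3), [3,4), [4,5), [5,6]
counts, edges = numpy.histogram(positions, bins=list(range(1, 7)))

print(edges)   # the edges that were passed in
print(counts)  # 0.5 and 6.5 are outside the edges and are not counted; 6.0 goes in the last bin
```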
Gribouillis

Code cleaned up from Grib's example:

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''
# use the integer part of the second value as the bin and add the first value to it
freq = dict()
for d in data.strip().splitlines():
    energy, pos = map(float, d.split())
    freq[int(pos)] = freq.get(int(pos), 0) + energy

print('Categorized by integer part')
print(sorted(freq.items()))

# using numpy.histogram
import numpy
weights, pos = zip(*(map(float, d.split()) for d in data.strip().splitlines()))

print('numpy histogram')
# edges 0..6, so the last position (5.63) also falls inside a bin
print(numpy.histogram(pos, bins=list(range(7)), weights=weights))
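When the bins are exactly the integer parts, numpy.bincount is an alternative worth knowing (it is swapped in here and not used in the posts): it sums the weights directly by integer index, so there are no edges to choose. A minimal sketch, assuming all positions are non-negative, since bincount only accepts non-negative integers.

```python
import numpy

energies = numpy.array([5.639792, 4.844813, 4.809105, 3.954150,
                        2.924234, 1.532669, 0.000000])
positions = numpy.array([1.36, 1.89, 2.33, 2.69, 3.42, 4.50, 5.63])

# Integer part of each position becomes the bin index (requires positions >= 0)
index = numpy.floor(positions).astype(int)

# sums[k] is the total energy of all rows whose position has integer part k
sums = numpy.bincount(index, weights=energies)
print(sums)
```

The result always starts at index 0, so bin 0 appears with a total of zero even though no position falls there.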
pyTony

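It's also possible to bin items into a collection such as a list or a set.

This technique bins items without key collisions: values that land in the same bin are all kept, instead of overwriting one another or being reduced to a sum.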

import collections

data = ((5.639792, 1.36),
        (4.844813, 1.89),
        (4.809105, 2.33),
        (3.954150, 2.69),
        (2.924234, 3.42),
        (1.532669, 4.50),
        (0.000000, 5.63))

bucket = collections.defaultdict(list)
for energy, pos in data:
    # the integer part of the second value selects the bucket
    bucket[int(pos)].append(energy)
print(dict(bucket))
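Because each bucket keeps the original values, counts and totals can be derived afterwards, and using set instead of list stores identical values in the same bin only once. A short sketch on the same data; the summary names (counts, totals, unique) and the extra duplicate row are illustrative only.

```python
import collections

data = ((5.639792, 1.36),
        (4.844813, 1.89),
        (4.809105, 2.33),
        (3.954150, 2.69),
        (2.924234, 3.42),
        (1.532669, 4.50),
        (0.000000, 5.63))

bucket = collections.defaultdict(list)
for energy, pos in data:
    bucket[int(pos)].append(energy)

# Per-bin summaries computed from the items kept in each bucket
counts = {key: len(items) for key, items in bucket.items()}
totals = {key: sum(items) for key, items in bucket.items()}

# Set-valued variant: the extra row repeats 5.639792 in bin 1, but it is stored once
unique = collections.defaultdict(set)
for energy, pos in data + ((5.639792, 1.5),):
    unique[int(pos)].add(energy)

print(sorted(counts.items()))
print(sorted(unique[1]))
```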
lrh9

@ljh: The C++ post's requirement was 'no external libraries used', so I translated it to 'no external modules' for Python.
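Under that 'no external modules' constraint, arbitrary (not only integer) bin edges can still be handled with the standard library's bisect module. This is a sketch that assumes sorted edges and mimics numpy's convention of half-open bins with a closed last bin; the function name weighted_histogram is made up for illustration.

```python
import bisect


def weighted_histogram(values, weights, edges):
    """Hypothetical helper: sum weights into bins [edges[i], edges[i+1]).

    The last bin also includes edges[-1]; values outside [edges[0], edges[-1]]
    are ignored, mirroring numpy.histogram's behaviour.
    """
    sums = [0.0] * (len(edges) - 1)
    for value, weight in zip(values, weights):
        if value < edges[0] or value > edges[-1]:
            continue  # outside the outer edges
        # bisect_right counts the edges <= value; subtracting 1 gives the bin index
        i = bisect.bisect_right(edges, value) - 1
        if i == len(sums):  # value equals the last edge: put it in the last bin
            i -= 1
        sums[i] += weight
    return sums


energies = [5.639792, 4.844813, 4.809105, 3.954150, 2.924234, 1.532669, 0.0]
positions = [1.36, 1.89, 2.33, 2.69, 3.42, 4.50, 5.63]
print(weighted_histogram(positions, energies, [1, 2.5, 4, 6]))
```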

pyTony

Is there a way I can use a data file and set columns as coordinates? I'm really interested in doing my script this way.
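A minimal sketch of one way to do this, assuming a whitespace-separated text file whose first column holds the values to sum and whose second column holds the positions, laid out like the data strings above. The sample is written to a temporary file only so the sketch runs on its own; in practice you would open your own data file.

```python
import collections
import os
import tempfile

# Sample data in a temporary file (stand-in for your own data file)
sample = "5.639792 1.36\n4.844813 1.89\n\n4.809105 2.33\n3.954150 2.69\n"
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(sample)
    path = tmp.name

freq = collections.defaultdict(float)
with open(path) as f:
    for line in f:
        columns = line.split()
        if len(columns) < 2:
            continue  # skip blank or incomplete lines
        # column 0 is summed, column 1 chooses the bin by its integer part
        energy, pos = float(columns[0]), float(columns[1])
        freq[int(pos)] += energy

os.remove(path)  # clean up the temporary sample file
print(sorted(freq.items()))
```

With NumPy available, numpy.loadtxt(path, unpack=True) returns the columns as separate arrays, which can then be passed to numpy.histogram as in the earlier examples.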

Stackheuw
