Trying to bin one column and sum the other simultaneously. Please help.

Question

arick1234 0 Newbie Poster

12 Years Ago

I have an array [x,y] where the x values are ascending and the y values are random. I would like to sum all of the y values whose x values fall within a certain range. I can't show you any code as I don't even know where to start with this, but this is the desired effect:-

Example data:-

(1, 0.5)
(2, 5.0)
(4, 2.0)
(7, 0.5)
(8, 0.5)
(10, 2.5)
(11, 1.5)
(15, 3.0)
(18, 4.0)
(20, 0.5)
...... etc.

Suppose the range of x is 10. Then from x=1 to x=10 and from x=11 to x=20 the y values will be summed within these two bands, and I will get a list:-

11.0, 9.0, ...... etc.

So far I have been advised to sum all the y values, this gives:-

import numpy, scipy, matplotlib.pylab as plt

mjd1, flux1, error1 = numpy.loadtxt("log_207_band1.dat", usecols=(1,2,3), unpack=True)

x=mjd1
y=flux1
z=zip(x,y)

print(sum(i[1] for i in z))

I'm really at a loss of what to do next.

Thanks.

bin math python statistics


All 7 Replies

Gribouillis 1,391 Programming Explorer

12 Years Ago

Take a look at pyTony's snippet http://www.daniweb.com/software-development/python/code/373120


Gribouillis 1,391 Programming Explorer

12 Years Ago

Here is a simple example

import numpy as np
x = np.array([1, 2, 4, 7, 8, 10, 11, 15, 18, 20])
y = np.array([0.5, 5.0, 2.0, 0.5, 0.5, 2.5, 1.5, 3.0, 4.0, 0.5])
n, bins = np.histogram(x, bins=np.array([1,11,21]), weights=y)
print(n)
"""my output -->
[ 11.   9.]
"""
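For longer data sets the bin edges do not have to be typed by hand. The following is only a sketch of the same idea, assuming a fixed bin width of 10 (the name `width` is introduced here for illustration) and building the edges with `np.arange`. Note that `np.histogram` treats every bin as half-open, `[lo, hi)`, except the last one, which also includes its right edge.

```python
import numpy as np

x = np.array([1, 2, 4, 7, 8, 10, 11, 15, 18, 20])
y = np.array([0.5, 5.0, 2.0, 0.5, 0.5, 2.5, 1.5, 3.0, 4.0, 0.5])

width = 10  # assumed bin width, not part of the original example

# Edges start at the smallest x and step by `width` until the largest x is covered.
# For this data that gives [1, 11, 21].
edges = np.arange(x.min(), x.max() + width, width)

# With weights=y, each bin holds the sum of the y values whose x falls in it.
sums, edges = np.histogram(x, bins=edges, weights=y)
print(sums)
```

With the sample data above this reproduces the two sums 11.0 and 9.0.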



arick1234 0 Newbie Poster · Answer 1 · 2011-11-30T20:29:56+00:00

Take a look at pyTony's snippet http://www.daniweb.com/software-development/python/code/373120

I've run the code in this thread and it works beautifully. However, I'm not proficient enough in Python to understand how it works. I don't really understand how to get my data into the same format as in the example, e.g.:-

data = '''
5.639792 1.36
4.844813 1.89
4.809105 2.33
3.954150 2.69
2.924234 3.42
1.532669 4.50
0.000000 5.63
'''

Thanks.

arick1234 0 Newbie Poster · Answer 2 · 2011-12-01T03:43:26+00:00

Here is a simple example

import numpy as np
x = np.array([1, 2, 4, 7, 8, 10, 11, 15, 18, 20])
y = np.array([0.5, 5.0, 2.0, 0.5, 0.5, 2.5, 1.5, 3.0, 4.0, 0.5])
n, bins = np.histogram(x, bins=np.array([1,11,21]), weights=y)
print(n)
"""my output -->
[ 11.   9.]
"""

Thank you Gribs, that helps a lot! I am now getting the results I want, but there is still a small issue. Here is the code I have written so far:-

import numpy as np, scipy, matplotlib.pylab as plt, itertools

mjd1, flux1, error1 = np.loadtxt("log_207_band1.dat", usecols=(1,2,3), unpack=True)

x= min (mjd1)
y= max (mjd1)
m=[mjd1]
f=[flux1]
a=[]

while (x <= y):
    a.append(x)
    x += 10

v = np.array(m)
w = np.array(f)
n, bins = np.histogram(v, bins=np.array(a), weights=w,)

z=list(zip(a,n))

print(z)

At the moment 'z' is printing all values including the ones where the 'n' values are zero. What I would like to do is only return the 'a' and 'n' values in the zipped array for when 'n' is not zero. I don't need the zero values for when I'm plotting graphs.

Many Thanks.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 3 · 2011-12-01T14:15:31+00:00

If you want to remove the null values, you could use

z = [(u, v) for (u, v) in zip(a,n) if v != 0.0]
# or, to allow for floating-point round-off
z = [(u, v) for (u, v) in zip(a,n) if abs(v) > 1.0e-9]

however, I don't think you can plot your graph with matplotlib.pyplot.hist() then. You could use examples from matplotlib's gallery instead, like this one http://matplotlib.sourceforge.net/examples/pylab_examples/custom_ticker1.html
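The same filtering can also be done with a NumPy boolean mask instead of a list comprehension. This is only a sketch with made-up numbers standing in for `a` and `n`; the names `edges`, `starts` and `keep` are introduced here for illustration. It relies on the fact that `np.histogram` returns one sum per bin, so there is one fewer sum than there are edges.

```python
import numpy as np

# Made-up stand-ins: five bin edges and the four binned sums between them.
edges = np.array([0.0, 10.0, 20.0, 30.0, 40.0])
n = np.array([11.0, 0.0, 9.0, 0.0])

starts = edges[:-1]        # left edge of each bin, same length as n
keep = n != 0.0            # True where the bin sum is non-zero

# Indexing with the mask drops the empty bins from both arrays at once.
z = list(zip(starts[keep], n[keep]))
print(z)
```

The masked arrays `starts[keep]` and `n[keep]` can be passed directly to a plotting call, which avoids building the list of tuples at all.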

arick1234 0 Newbie Poster · Answer 4 · 2011-12-01T16:59:17+00:00

If you want to remove the null values, you could use
z = [(u, v) for (u, v) in zip(a,n) if v != 0.0]
# or
z = [(u, v) for (u, v) in zip(a,n) if abs(v) > 1.0e-9]
however, I don't think you can plot your graph with matplotlib.pyplot.hist() then. You could use examples from matplotlib's gallery instead, like this one http://matplotlib.sourceforge.net/examples/pylab_examples/custom_ticker1.html

I have tried both of the bits of code you suggested but I don't seem to be having any luck with them. I've done a bit more on my code and now I have:-

import numpy as np, scipy, matplotlib.pylab as plt, itertools

mjd1, flux1, error1 = np.loadtxt("log_207_band1.dat", usecols=(1,2,3), unpack=True)

a = mjd1
b = flux1
c = error1
d = (1/c)**2
e = b/(c**2)
mim = min (a)
mam = max (a)
arm = [a]
are = [d]
bim = []

# Binned Mjd.

while mim <= mam:
    bim.append(mim)
    mim += 10

#Binned Error.

w = np.array(arm)
x = np.array(are)
m, bins = np.histogram(w, bins=np.array(bim), weights=x,)

bie = m**(-0.5)

#Binned Flux.
    #Top.

arft = [e]
y = np.array(arft)
nt, bins = np.histogram(w, bins=np.array(bim), weights=y,)

    #Bottom.

arfb = [d]
z = np.array(arfb)
nb, bins = np.histogram(w, bins=np.array(bim), weights=z,)

bif = (nt/nb)

f = list(zip(bim,bif,bie))

print(f)

As you can see, I have now included the error in my data. A snippet of the data printed for f is:-

[(52629.634891900001, 5.8216447829999991, 2.345571756), 
(52639.634891900001, nan, inf), 
(52649.634891900001, nan, inf), 
(52659.634891900001, nan, inf), 
(52669.634891900001, nan, inf), 
(52679.634891900001, nan, inf), 
(52689.634891900001, nan, inf), 
(52699.634891900001, nan, inf), 
(52709.634891900001, nan, inf), 
(52719.634891900001, 5.1222770987666246, 0.55903046280511504), 
(52729.634891900001, nan, inf), 
(52739.634891900001, 4.2638768171735446, 0.42006474243340763), 
(52749.634891900001, nan, inf), 
(52759.634891900001, 3.4933218917231756, 0.098269515245267794), 
(52769.634891900001, 1.6646994479708315, 0.52235125349315226), 
(52779.634891900001, 2.0437369327933292, 0.47024580342005712), 
(52789.634891900001, 2.1838623317766106, 0.92503409390429436), 
(52799.634891900001, 3.5205974719355666, 0.19194470381687062), 
(52809.634891900001, nan, inf), 
(52819.634891900001, 7.553690679610451, 1.464913512979636), 
(52829.634891900001, 2.1452428411935212, 0.48029633284501766), 
(52839.634891900001, nan, inf), 
(52849.634891900001, nan, inf), 
(52859.634891900001, nan, inf)

As you can see the values with nan and inf are the ones I need to remove. Is there a way of removing them within the histogram function? This would also rectify the issue I'm having with the

bif = (nt/nb)

part of the code. The error message is:-

bif = (nt/nb)
RuntimeWarning: invalid value encountered in divide

Which I guess is due to the situation where it divides zero by zero.

Many thanks again Gribs you have been a massive help.

Gribouillis 1,391 Programming Explorer Team Colleague · Answer 5 · 2011-12-01T20:49:00+00:00

I don't think you can remove the undefined values within the histogram function. You will have to remove them before calling histogram(), either by shortening the arrays or by replacing inf and nan by some other value, for example 0.0.

I don't really understand why you want to remove the values. It seems to me that your infinite values come from the 1/c**2, where some entries of your error1 are 0.0. If you want to plot the squared inverse of your error, you must decide one way or the other what you want to see when the error is 0.0.
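If the undefined entries instead come from bins that contain no measurements at all (both sums are 0 there, so the division gives 0/0), one possible approach is to mask those bins after calling histogram() rather than before. The sketch below uses made-up numbers, not the data file from this thread, and assumes every error is strictly positive; the names `filled`, `mean_flux` and `mean_err` are introduced here for illustration.

```python
import numpy as np

# Made-up measurements (time, flux, error); every error is assumed to be > 0.
t = np.array([0.0, 1.0, 2.0, 25.0, 26.0])
flux = np.array([2.0, 4.0, 6.0, 3.0, 5.0])
err = np.array([1.0, 1.0, 1.0, 1.0, 1.0])

edges = np.array([0.0, 10.0, 20.0, 30.0])   # the middle bin receives no data
w = 1.0 / err**2                             # inverse-variance weights

top, _ = np.histogram(t, bins=edges, weights=flux * w)   # sum of flux/err^2 per bin
bottom, _ = np.histogram(t, bins=edges, weights=w)       # sum of 1/err^2 per bin

filled = bottom > 0                      # bins holding at least one measurement
starts = edges[:-1][filled]              # left edges of the non-empty bins
mean_flux = top[filled] / bottom[filled] # weighted mean flux, never 0/0
mean_err = bottom[filled] ** -0.5        # error on the weighted mean, never inf

for row in zip(starts, mean_flux, mean_err):
    print(row)
```

Because the division only happens on the masked arrays, no RuntimeWarning is raised and no nan or inf entries are produced for the empty bins.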
