Perhaps I should post this elsewhere but we speak the same language here. I may have asked a similar question some time ago but now I have a new perspective and want to investigate.

I have a wastewater process that's being sampled periodically (uniform sampling, for what it's worth). The sample rate is way too low to avoid aliasing, but the samples are real enough and the data is continuously available and very likely not amenable to being sampled more often (economics).

It's a bit like sampling a random series except that I "know" there is an underlying pattern that repeats each day, with variable amplitude no doubt. That, plus transients, would be the highest frequency content, and seasonal things are the lowest frequency content, which I'm not too worried about. And, while I'd like to know when transients happen and how big they are, I'm afraid that's out of the question.

In fact, what's of value here is to estimate how much plant capacity is being "used up". By my reckoning, 6 months of data during our peak months is a good averaging period - as it's the peak months that determine our capacity "use" for regulatory purposes. In the shorter term, the numbers are used for determining charges for overly high concentrations, shared use, etc.

To make things a bit more complicated, the regulatory agency has us report the weekly data on a monthly basis (actually here there are 2 samples per week) and average it for the month. If there are 3 contiguous months with these averages exceeding our "capacity" or some large fraction of it, then we are put on notice that planning for future capacity must begin. So, this is one "measure" that's in concrete. But, I digress a bit .....

Here is my question:

Instead of worrying about aliasing, which is where I go to first of course, is there a statistical measure that might help me better understand the "quality" of our numbers or how much variation is "expected" given those numbers? For example, given 4 to 8 weeks of data (4 to 8 samples), what can be said about the data set in a statistical sense? How might one best put the answer to use in a case like this?

Where should I be looking?

Fred

# Analyzing an "undersampled" sequence

Started by ●July 13, 2010

Reply by ●July 13, 2010

I didn't see where you reveal what is being sampled. Is it how full a tank is? How much fluid is flowing in a pipe?

Fred Marshall wrote:
> ...
> To make things a bit more complicated, the regulatory agency has us
> report the weekly data on a monthly basis (actually here there are 2
> samples per week) and average it for the month.

Depending on what is being sampled that could be a complete accounting or an incomplete accounting of usage. If each sample records how much was used since the last sample was taken, then when you add them together you have complete accounting of the usage for the month. If all that the sample is measuring is the instantaneous usage at the instant the sample is taken then you have a very incomplete accounting of usage and could make it mean just about anything you want it to.

-jim
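To illustrate the point about instantaneous samples, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the `concentration` function is a hypothetical daily cycle (mean 200, amplitude 80, peak at 14:00), not plant data. It compares the average of once-a-week grab samples taken at a fixed time of day with the average over the whole period.

```python
import math

def concentration(t_hours):
    # Hypothetical diurnal pattern (assumed, not measured):
    # mean 200, amplitude 80, peaking at 14:00 every day.
    return 200.0 + 80.0 * math.cos(2.0 * math.pi * (t_hours - 14.0) / 24.0)

def true_mean(days):
    # Average of the pattern evaluated once per minute over whole days.
    n = days * 24 * 60
    return sum(concentration(i / 60.0) for i in range(n)) / n

def sampled_mean(days, hour_of_day, every_n_days=7):
    # Average of instantaneous samples taken at the same clock time
    # every `every_n_days` days.
    times = [d * 24.0 + hour_of_day for d in range(0, days, every_n_days)]
    return sum(concentration(t) for t in times) / len(times)

print("whole-period mean:   ", round(true_mean(28), 3))
print("sampled at 14:00:    ", round(sampled_mean(28, 14.0), 3))
print("sampled at 02:00:    ", round(sampled_mean(28, 2.0), 3))
```

Because the sampling interval is a whole number of days, the daily cycle shows up as a constant offset in the samples: the result depends entirely on the clock time chosen. A real process would add noise and transients on top of this, so the sketch only shows the bias mechanism, not its actual size.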

Reply by ●July 13, 2010

Fred Marshall <fmarshallx@remove_the_xacm.org> wrote:
> Instead of worrying about aliasing which is where I go to first of
> course, is there a statistical measure that might help me better
> understand the "quality" of our numbers or how much variation is
> "expected" given those numbers?
> For example, given 4 to 8 weeks of data (4 to 8 samples), what can be
> said the data set in a statistical sense? How might one best put the
> answer to use in a case like this?
>
> Where should I be looking?

Something like a Student's T test can tell you if a sample or group of samples is out-of-line. (I think I may have said the same thing, the last time you asked a similar question.)

Steve
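As a rough sketch of how Student's t could be applied to 4 to 8 samples, the Python below computes a two-sided 95% confidence interval for the mean. It assumes the samples are independent and roughly normally distributed, which may not hold for this process. The sample values and the function name `mean_confidence_interval` are hypothetical; the critical values are the standard tabulated ones for 3 to 7 degrees of freedom.

```python
import math
import statistics

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom.
# Only the range needed for 4 to 8 samples is tabulated here.
T_CRIT_95 = {3: 3.182, 4: 2.776, 5: 2.571, 6: 2.447, 7: 2.365}

def mean_confidence_interval(samples):
    """Return (mean, low, high) for a 95% confidence interval on the mean."""
    n = len(samples)
    df = n - 1
    if df not in T_CRIT_95:
        raise ValueError("this sketch only handles 4 to 8 samples")
    mean = statistics.mean(samples)
    # Standard error of the mean, using the sample standard deviation.
    sem = statistics.stdev(samples) / math.sqrt(n)
    half_width = T_CRIT_95[df] * sem
    return mean, mean - half_width, mean + half_width

# Hypothetical weekly concentrations (mg/L), made up for illustration.
weekly = [210.0, 250.0, 190.0, 230.0]
mean, low, high = mean_confidence_interval(weekly)
print(f"mean = {mean:.1f}, 95% interval = [{low:.1f}, {high:.1f}]")
```

With only four samples the interval is wide, which is one way to put a number on the "quality" of a monthly average. If all samples are taken at the same time of day, the interval says nothing about the daily cycle they never observe.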

Reply by ●July 13, 2010

On 7/13/2010 12:31 PM, Fred Marshall wrote:
> ...
> Instead of worrying about aliasing which is where I go to first of
> course, is there a statistical measure that might help me better
> understand the "quality" of our numbers or how much variation is
> "expected" given those numbers?
> For example, given 4 to 8 weeks of data (4 to 8 samples), what can be
> said the data set in a statistical sense? How might one best put the
> answer to use in a case like this?
>
> Where should I be looking?

Other things being equal, clustering should follow a Poisson distribution. If you measure flow -- a quantity that can be heavily influenced by rainfall -- only twice a week, how do you bill equitably?

Jerry
--
Engineering is the art of making what you want from things you can get.

Reply by ●July 13, 2010

Jerry Avins wrote:
> Other things being equal, clustering should follow a Poisson
> distribution. If you measure flow -- a quantity that can be heavily
> influenced by rainfall -- only twice a week, how do you bill equitably?
>
> Jerry

Jerry,

I don't imagine that we bill entirely "equitably" - more like "agreeably".

We measure flow continuously to get the volume, and concentration once or twice a week.

The concentration is assumed to apply for the entire measured volume between concentration samples. So, one may say that we sample loading in that fashion.

I think I answered my own question to the point where I can deal with it:

We have the weekly or twice-weekly samples and have computed monthly averages - as the latter have some regulatory importance. You might consider these monthly averages to be lowpassed versions of the samples. Then, one can compute the distribution of outcomes and infer(?) the amount of loading.

My "backwards" sort of reasoning goes like this:

We take a set of samples. We determine the distribution of those sample values over a suitably long time such that daily and even annual variations are included in the distribution. The caution here is that trends get wiped out - so a suitable time frame or set of them needs to be selected that has some meaning where gross trends are concerned. If we assume that the distribution represents a reasonable estimate of ground truth, then we can infer in quantitative terms what's happening - such as over-loading (i.e. loading that's above some determined threshold).

It's surely not "perfect" but it's better than nothing ... I think.

Fred
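The bookkeeping described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the record layout, the function names, the units (megalitres times mg/L gives kg) and all numbers are hypothetical, and the exact way the regulator defines the monthly average is not given in this thread. Each concentration sample is applied to the volume measured since the previous sample, loads are averaged by calendar month, and the "3 contiguous months" rule is checked against a limit.

```python
from collections import defaultdict
from datetime import date

def loads(records):
    """records: (sample_date, volume_since_last_sample_ML, concentration_mg_per_L).
    Returns (sample_date, load_kg); 1 ML at 1 mg/L carries 1 kg."""
    return [(d, volume * conc) for d, volume, conc in records]

def monthly_averages(dated_values):
    """Average the values that fall in each calendar month."""
    buckets = defaultdict(list)
    for d, value in dated_values:
        buckets[(d.year, d.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}

def on_notice(monthly, limit, run=3):
    """True if `run` contiguous calendar months all have averages above `limit`."""
    streak = 0
    previous_index = None
    for (year, month), average in sorted(monthly.items()):
        index = year * 12 + month
        contiguous = previous_index is not None and index == previous_index + 1
        if average > limit:
            streak = streak + 1 if (streak and contiguous) else 1
        else:
            streak = 0
        previous_index = index
        if streak >= run:
            return True
    return False

# Hypothetical records, made up for illustration.
records = [
    (date(2010, 5, 4), 10.0, 200.0),
    (date(2010, 5, 18), 12.0, 250.0),
    (date(2010, 6, 8), 11.0, 300.0),
    (date(2010, 7, 6), 10.0, 320.0),
]
monthly = monthly_averages(loads(records))
print(monthly)
print("on notice at limit 2400:", on_notice(monthly, 2400.0))
print("on notice at limit 2600:", on_notice(monthly, 2600.0))
```

The same per-sample loads could also be collected over a longer window to build the empirical distribution mentioned above, for example by counting the fraction of samples above a threshold.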

Reply by ●July 13, 2010

On 7/13/2010 8:28 PM, Fred Marshall wrote:
> ...
> We measure flow continuously to get the volume and concentration once or
> twice a week.
>
> The concentration is assumed to apply for the entire measured volume
> between concentration samples. So, one may say that we sample loading in
> that fashion.
> ...

If your samples are taken at times of unusually high I&I, the dilution can make the measured concentrations uncharacteristically low.

Jerry
--
Engineering is the art of making what you want from things you can get.

Reply by ●July 14, 2010

Jerry Avins wrote:
> If your samples are taken at times of unusually high I&I, the dilution
> can make the measured concentrations uncharacteristically low.
>
> Jerry

Yes, I know but the sample times are set for a number of reasons. Actually, our concern right now is why the concentrations are so darned high! So, in these parts where there's nearly 100 inches of rain each year, we're used to seeing and fixing I&I. Right now it's not a big concern.

Fred

Reply by ●July 14, 2010

On Jul 13, 9:29 pm, Jerry Avins <j...@ieee.org> wrote:
> ...
> If your samples are taken at times of unusually high I&I, the dilution
> can make the measured concentrations uncharacteristically low.

Duh, what's, I & I?

Greg

Reply by ●July 14, 2010

On 7/14/2010 5:55 AM, Greg Heath wrote:
> ... what's, I& I?

Infiltration and inflow, which force sewage plants to process rainwater. Infiltration occurs when leaky mains are lower than the water table. Inflow is often illegal pump connections to the sanitary sewer. When streets become submerged, rainwater can pour in through manhole covers.

Jerry
--
Engineering is the art of making what you want from things you can get.
