Hello ladies and gents,

Chapter 3 exercise 3-1:

Suppose we wish to find the median of a collection of values. Assume that the we have read some values so far, and that we have no idea how many values remain to be read. Prove that we cannot afford to discard any of the values that we have read. Hint: One proof strategy is to assume that we can discard a value, and then find values for the unread--and therefore unknown--part of our collection that would cause the median to be the value that we discarded.

I don't understand it completly, do I have to enter values and then enter one value wich isn't incorporated into the vector, this then proves that by discarding a value will leave you with a wrong median?

Recommended Answers

All 52 Replies

Member Avatar for iamthwee

Hello ladies and gents,

Chapter 3 exercise 3-1:

I don't understand it completly, do I have to enter values and then enter one value wich isn't incorporated into the vector, this then proves that by discarding a value will leave you with a wrong median?

It seem like an odd exercise to teach u c++?

I guess the only way 2 prove is to do an example. see what happens, does the median change?

... see what happens, does the median change?

Well, if I enter a value but don't include it in the vector but add one to the amount of numbers that are entered, then the median won't be correct, thing is, I don't understand what the actual thing is I have to do in this exercise.

Its actually a theoritical question. You are not required to write any code to solve it.

However, it can also be proven with actual code & appropriate data.

You need to be able to explain how discarding any value read in (before input is complete) could actually be discarding the value that would have end up being the median.

Its actually a theoritical question. You are not required to write any code to solve it.

However, it can also be proven with actual code & appropriate data.

You need to be able to explain how discarding any value read in (before input is complete) could actually be discarding the value that would have end up being the median.

Ok, thanks for the additional info :!: I think I could do this by putting an if statement in it, for example when the while loop is at N°3 it should enter 0 into the vector, this way the amount of vectors is correct, but the total amount isn't.

Ok, thanks for the additional info :!: I think I could do this by putting an if statement in it, for example when the while loop is at N°3 it should enter 0 into the vector, this way the amount of vectors is correct, but the total amount isn't.

The point of the exercise is to be sure that the reader understands why it is necessary for an algorithm that computes the median of a collection of values to be able to have access to all of the values at the same time. This access is not necessary for an algorithm that computes the mean, as such an algorithm can store just a running total of the values it has read.

Oke,

What I mean with median + 1 and median -1 is the value before the median and the value after the median.

So in this number sequence : 1,1,1,5,5,5,8,8,8 all of them are 5.

Roelof

To compute the median of discrete values continually read in, all values must be stored. (contrary to simple average where only running sum, current value and number are necessary). Also the list from where the median is to be taken must be sorted. So these steps are necessary to get the median of a sample:

1. Current read-in value must be inserted in the list (collection) in such a way that the list is ordered (usually ascending order)

2. Depending on whether the number of values is odd or even, the median is:
2.1 if odd, the median is the value taken from the center of the ordered list, e.g. list: 4,2,1,3,5; ordered: 1,2,(3),4,5; median: 3
2.2 if even, the median is the average of the two values taken from the center position of the ordered list. e.g. list: 4,2,6,1,3,5; ordered: 1,2,(3,4),5,6; median: (3+4)/2 = 3.5

It's clear that this is a current median which will change every time a new value is read.

//Pseudocode (assuming first value is stored in list[1], div means integral division):
  read v
  increment(n)
  insert (n, v, list)
  if (odd(n)) then median = list[1 + n div 2] 
     else median = (list[n div 2]+list[1 + n div 2]) / 2

-- tesu

You can prove this by giving just 1 simple example. No need for general proof, because
if it can't work for even 1 case, then its incorrect.

Hello,

That the median changed if you discard a number is not true.
Let's take a look at this numbers : 1,1,1,5,5,5,8,8,8

If you discard one number the median will still be 5.

Roelof

Here's a hint for constructing the proof.

Suppose you have read some numbers, and there are other numbers you haven't read yet so you don't know what they are.

Pick one of the numbers you've read so far. Your job is to prove that whatever that number might be, it is possible that the numbers you are about to read will make it necessary for you to keep that number. To do this, you do not have to know what the unread numbers are -- you just have to imagine what they might be.

Another part of the point of this exercise, by the way, is to get you familiar with how people think about proofs, because often the best way of convincing yourself that a program works correctly is to sketch a simple proof. It doesn't even have to have every detail filled in.

Hello.

Oke,

lets take 0,1,2,3 and image that I will read 4,5,6,7,8,9.

The median of this will be 5.

Now I discard 3.

So we have read in 0 , 1 and 2 and will read 4,5,6,7,8,9.
Now the median will be 4.5 and that's another outcome then when we don't discard 3.

If we had 1,1,1,5,5,5,8,8,8.
And we discard a number it will be the same.

So my conclusion is that if the median and the numbers before and after the median are different you can't discard because then the outcome wil be different.
When they are the same and that you don't know it's makes no difference.

Roelof

Hello,

I think I have figured it out.

When your read some numbers and have a unknown numbers of values then you can't calculate the median. And if you don't know the median, you can't check if the median, the number before and after it are the same. So if you discarded a value you can't know if the outcome will be the same as the answer when you don't discard a value.

Roelof

When you say "If you discarded a value you can't know if the outcome will be the same as the answer when you don't discard a value." This statement is correct, but it does not solve the problem.

The reason is that perhaps there might be some other procedure you could use to figure out what the median would have been anyway. So you have to prove that no such procedure can exist.

I think that this article, although interesting, has nothing to do with the original exercise in the book. In order to answer the exercise, you have to understand something important about algorithms, namely what information an algorithm needs at what point in its execution in order to be able to deliver a result.

In effect, the exercise asks you to prove a particular property of a particular collection of algorithms (i.e. all algorithms that are capable of computing the median of a sequence of values). I think that the only way to be sure you understand this property is to construct a proof yourself.

Oke,

Allgorithms which calculate a median needs a fixed series of values.
As far as I know you can only calculate a median if you know how much values you have to my maths knowlegde. If this is not so, I can't solve this problem.

Roelof

I am sorry, but I do not understand the connection between what you just said and the problem that you are trying to solve.

Again, the problem is to prove that it is impossible to compute the median of a collection of numbers without storing the values of all of the numbers at the same time. This property is in contrast with the mean, which it is possible to compute by storing only the sum of the numbers seen so far and the count of how many numbers have been read.

Oke,

But I don't have a clue how to prove it.
As far as I know you can't calculate a median of a values you don't know.
I can prove that most of the cases that if you discard a value the median will change. The only case that the median is not changing is that if the number before and after the median and the median self are equal.

Roelof

You are assuming that if you discard a value, you are also discarding the knowledge of how many values there were. Nowhere does the problem say that.

Moreover, the problem is to show that there is *no* number you can discard. If I have 3, 1, 4, 1, 5, I can discard 1, 4, or 5, and so long as I remember that there were five numbers originally, I can still compute the median as being 3.

Oke,

If I understand you right.
If I read in 5 numbers in a vector and I discard one There are now 4 values. But the vector.size still says 5 so the outcome will be false.

If this is the problem I can write a programm which will print out the orginal median, size of the vector. After that I can discard one of the numbers with a if - then construction and print out the vector-size , real numbers of computed value and the median.

Roelof

I'm sorry, but you still do not understand the problem.

Oops, then we have a problem.

I thought that I have to prove that if we discard a value or that the numbers read in don't match the real numbers the median will change so it's proven that we can't discard anything ?

Roelof

I'm going to try once more.

You want to find the median of N numbers, which I will call A[0] through A[N-1]. You do not know the value of N.

You have already read M numbers, which are A[0] through A[M-1]. Because you have already read them, you know the value of M. You also know that 0 <= M <= N. If you have understood everything up to this point, you should be able to explain how you know that 0 <= M <= N.

What you have to prove is that you need to retain the values of all the numbers A{0] through A[M-1]. Note that if you knew the value of N, you might not have to do this. For example, if M=N, then that means you have read all of the numbers already, so all you have to do is compute the median and you can throw them all away.

For that matter, if M+1=N, then the median of A[0] through A[N-1] has a value that depends on only a few elements of A (Problem: Figure out which ones), so you can discard all but those few elements.

But you do not know how much larger N is than M, and it is this fact that means you have to remember all of the elements of A that you have read so far.

The problem, then, is to prove it.

Please note that statements such as "the numbers read in don't match the real numbers" are so vague that I don't think you can use them to prove anything.

I hope I have been able to explain the problem clearly this time, because if I haven't, I'm out of ideas.

Oke

I take a try.

If N is the numbers I have to read and M is the numbers I have read M can't be O because then I haven't read in any numbers which we have. M can't be equal to N because then we would read in al the numbers which we don't have because then we know the value of N.

So 0 < M < N .

This this point im correct ??

Roelof

No, you are completely incorrect. First, I said 0 <= M <= N. You said 0 < M < N. So when I asked you to check your understanding by trying to prove something, you tried to prove something else.

I don't like to sound unkind, but if you are going to be successful as a programmer, you really have to learn to read more carefully.

Oke,

I apologize.
But in my understanding M can't be 0. So must be greater dan 0.
The exercise says that we have read in some values. If M=0 then we have not read some values. That's why I said 0 < M.

Roelof


Roelof

In that case, your understanding is incorrect :-)

It is true that if M=0, there are no values to discard, so you don't have to worry about what happens if you discard one. Nevertheless, when you are trying to think formally about a problem, it is important to be very careful with your use of terms.

Oke,

Then a new try.

If we read in a unknown numbers N in a Array [A]
The numbers that we have read is M.

If we start to read the input M=0. When we have read all the numbers in then M=N.
So therefore : 0 <= M <=N. M can't be negative because when we haven't read in a numbers we can discard a number.
If M = N +1 then there are no values more to read in so the value of M is not important anymore because N is now known


Roelof

Hello Andrew,

Can you tell me if this part is correct.
If so, I can try to write the rest.

Roelof

Be a part of the DaniWeb community

We're a friendly, industry-focused community of developers, IT pros, digital marketers, and technology enthusiasts meeting, networking, learning, and sharing knowledge.