arkoenig 340 Practically a Master Poster

The reason that you are permitted to use the "this" pointer with a static data member, even though the pointer is ignored, is so that you can write a statement such as

p->v = 42;

without having to know whether v is static.

Otherwise, you would have to figure out the type of the object to which P points, and then write

P_type::v = 42;
arkoenig 340 Practically a Master Poster

Please use code tags.

arkoenig 340 Practically a Master Poster

Why not use the sort algorithm from the C++ standard library?

arkoenig 340 Practically a Master Poster

Are you sure that your user isn't picking k instead of K?

arkoenig 340 Practically a Master Poster

Where are you using && with integers?

Code tags are described here.

arkoenig 340 Practically a Master Poster

Also... Please use code tags.

arkoenig 340 Practically a Master Poster

I have simply redefined the offending constructor in my cpp file, this is working fine but is it safe?

No.

What should I do to make it safe? Bearing in mind that I cannot edit the offending code directly as it has already been distributed.

Don't try to pretend that you can change existing code without changing it.

In other words, if you want to change what it does, you have to change it, compile it, and distribute the new version--preferably under a different name so that it is not confused with the original.

The only exception to this rule is if the original author took into consideration that someone might want to change part of the code's behavior, and made provisions for doing so. One common method of making such provisions is to use inheritance; but there are others.

arkoenig 340 Practically a Master Poster

Maybe you should use something other than an empty string in your #define statements.

#define NUMERIC_CONSTANTS_H 1
arkoenig 340 Practically a Master Poster

If you want to continue calculating other values, then you should continue calculating other values. What is preventing you from doing so?

arkoenig 340 Practically a Master Poster

When you call check2(drives), the formal parameter is an Array. That means that drives is copied and what check2 sees is a copy of it. Is that what you intend? If not, the parameter should probably be Array&. It's hard to say more without seeing all the code.

arkoenig 340 Practically a Master Poster

@Akill10: line 8 should have semicolons, not commas. Tut, tut.

arkoenig 340 Practically a Master Poster

Very good to know, thank you.

One reason it's not portable is that IBM mainframes use a character set in which 'a' through 'i' are hex 81 through 89, but 'j' is hex 91. Much more information than you ever wanted to know about this character set can be found here.

arkoenig 340 Practically a Master Poster

I'm assuming you meant const char * here?

Yes. Fixed; thank you.

arkoenig 340 Practically a Master Poster

To get the value of a character, subtract 'a'.

This technique works on most computers in use today, but it is not actually guaranteed to work. That's why I coded my example the way I did.

arkoenig 340 Practically a Master Poster

Suppose n is the number you're looking for. Then:

const int chunksize = 370000, numletters = 26;
assert(n < numletters * numletters * chunksize);
int chunkno = n / chunksize;
int digit1 = chunkno / numletters;
int digit2 = chunkno % numletters;
const char* alphabet = "abcdefghijklmnopqrstuvwxyz";
string filename = string("x") + alphabet[digit1] + alphabet[digit2];
arkoenig 340 Practically a Master Poster

Just a question. Is tail-recursion optimization even relevant anymore?

Interesting question. Consider the following, which I've seen in many variations:

void traverse(Node* n) {
    if (n) {
        process(n);
        traverse(n->left);
        traverse(n->right);
    }
}

It is easy to remove the tail recursion by hand:

void traverse(Node* n) {
    while (n) {
        process(n);
        traverse(n->left);
        n = n->right;
    }
}

and you're suggesting that the compiler you used is clever enough to do this automatically.

Now, let's make the program a little more complicated:

void traverse(Node* n) {
    std::string s = "Test";
    if (n) {
        process(n, s);
        traverse(n->left);
        traverse(n->right);
    }
}

Now the second call to traverse is no longer a tail recursion, because it is followed by the implicit call to the destructor of s. So it is much harder for the compiler to optimize. Nevertheless, the hand optimization is legitimate, because we can realize that the value of s doesn't change.

So the question may come down to whether programs that are susceptible to optimization are more like my first example or more like my last.

arkoenig 340 Practically a Master Poster

Please read this.

arkoenig 340 Practically a Master Poster

Just for the record, I agree that the solution is ugly and unnecessary obfuscated, but at least it's fun.

I'm being serious when I say that it's not obfuscated--it illustrates a programming technique that is important in languages that use recursion as a matter of course.

Definition: A function call that is the very last thing a function does is called a tail call. So, for example, a return statement that returns the result of a function call is a tail call.

In C++, many calls that look like tail calls really aren't, because the compiler has to insert destructor calls for local variables before the function can actually return. However, when a function call is a true tail call, it is often possible for a compiler to replace the call snd subsequent return by a jump instruction.

A tail call that is recursive is called a tail recursion. It is often possible for a compiler to replace a tail recursion by a loop.

There are some programming languages, such as Scheme (which is used as the introductory programming language at MIT), that either discourage or completely prohibit iteration statements. Instead, programmers are expected to use recursion. In such language, it is an important optimization on the programmer's part to use tail recursion where possible, because the compiler will implement tail-recursive programs as iterations that do not consume any new stack space on each iteration.

So although the existence of destructors usually makes it difficult …

mike_2000_17 commented: Awesome! +2
Nick Evan commented: Cool. Not much of a schemer myself. +16
arkoenig 340 Practically a Master Poster

Is that what is known as "tough love"?

No, it's what is known as getting home late after three hours waiting around to play music for ten minutes at a sound-mixing workshop, and feeling like doing something a little weird.

arkoenig 340 Practically a Master Poster

The point was to give an example of recursion that solved the problem correctly, but that could not reasonably be turned in as homework. I am hoping that the original poster will look at the program, figure out how it works, and then rewrite it in a simpler style.

arkoenig 340 Practically a Master Poster

@arkoenig: I understand that the OP didn't show enough effort towards solving the problem himself that it shouldn't be dignified with an answer. You could just have said so. With all due respect, posting a bad solution and implying that he should make it even worse so that he can hand it in without suspicion is a bit of a nasty trick, IMO. It sort-of locks him out of finding the actual solution to the problem, which I won't post, obviously, but I just wanted to point it out.

But it's not a bad solution. It's just an unnecessarily clever one. In fact, it is the preferred solution in some programming languages, such as Scheme, that automatically translate proper tail recursion into iteration.

arkoenig 340 Practically a Master Poster

Here's a recursive solution that's that's tricky enough that if you turn it in, your teacher will probably realize that you did not write it yourself:

void cubes1(int i, int n)
{
    if (i <= n) {
        cout << i*i*i << ' ';
        cubes1(i+1, n);
    }
}

void cubes(int n)
{
    cubes1(1, n);
}

Now it's up to you to figure out what I did that's so tricky, and simplify it so that you can legitimately claim that it's your own work.

arkoenig 340 Practically a Master Poster

Here's the information you need.

NathanOliver commented: excelent link +2
arkoenig 340 Practically a Master Poster

Of course you don't have to read a vector to return a random element. You do, however, have to know how many elements the vector has.

If the "vector" is really a file, the only way to know how many lines it has is to read the entire file. You can guess how many lines it has by reading part of the file and assuming that the average line length for the entire file is the same as it is for the part that you read, but without reading the entire file, you have no way of knowing how accurate your guess is.

arkoenig 340 Practically a Master Poster

This algorithm looks like it requires enough memory to hold the entire sample--that is, enough to hold enough information to identify which lines are in the sample. That could be as little as one integer per line; those integers would just record the line numbers.

The question is whether this algorithm actually solves the problem. It is hard to answer that question without knowing what the problem is. If, for example, you want to choose a random line from the file a lot of times, and you do not know in advance how many times you are going to have to do it, this algorithm will not solve the problem.

If you know in advance how many lines are in the entire file, you can choose n random elements from the file without even storing any of the elements.

This is another example that shows why there is little point in discussing solutions to a problem until you know for sure what the problem is and what constitutes an acceptable solution.

arkoenig 340 Practically a Master Poster

The trouble is that you might not know the value of K at the time you start reading the file.

arkoenig 340 Practically a Master Poster

It's not easy to replace a line of text in a file, because if the replacement line is a different size than the original, you have to change the positions of the remaining characters in the file.

Unless you want to get into databases, the easiest most reliable way to change the contents of a file is to copy the entire file into a new file with a different name. Once that step is complete, you can delete the original file and rename the new file to have the same name as the original.

Obviously this technique does not scale well to very large files, but if you are going to deal in this way with very large files, you will have other problems to solve as well.

arkoenig 340 Practically a Master Poster

In your search function, counter should be a reference, not a pointer. So you should delete the * on line 17, and in line 98, you should change "int * counter" to "int & counter".

arkoenig 340 Practically a Master Poster

Without reading the whole file, you cannot select every line with equal probability. One reason is that you have no way of knowing whether the portions of the file you did not read consist of one long line each, or a whole bunch of little tiny lines each.

arkoenig 340 Practically a Master Poster

It is very easy to estimate the performance of a solution, it is called complexity class.

It is impossible to estimate the performance of a solution without knowing whether a particular program actually is a solution.

It is reasonable to claim that an algorithm that selects every line with the same probability is a solution. However, an algorithm that seeks to a random offset in the file, and then extracts a line near that offset, generally does not select every line with the same probability. So the question then becomes whether that selection is reasonable, and why.

So far, you have said that the selection is reasonable, but you haven't said why. That is, you haven't said what criteria you are using to distinguish reasonable results from unreasonable results. Until you do that, there's no point in talking about the performance of individual programs, because there's no way to know whether those programs even produce correct results.

arkoenig 340 Practically a Master Poster

I don't see any point in talking about algorithm details until you explain what probability distribution you're willing to accept.

arkoenig 340 Practically a Master Poster

The problem doesn't ask how many lines are in the files, it asks to get a random line.

Usually someone who says "get a random line" implies that each line should have the same probability of being selected as any other line. If you don't care about that requirement, then you'll have to explain what probability distributions you're willing to accept. In particular, the technique you suggested has the property that any line with 100 characters in it is 100 times as likely to be selected as any line with only a single character in it. Such behavior does not normally fall into what people mean when they say "random."

arkoenig 340 Practically a Master Poster

A problem I see by the suggested solutions so far that is, it is inefficient if one calls the function numerous times. So in the long run it will probably be better to read it into a vector and use that vector
to extract lines.

I don't see how you can estimate how likely such a solution is to be better without knowing something about (a) the circumstances under which the program will be run, or (b) the definition of "better."

As Narue has pointed out, there is no a priori reason to believe that you even have enough memory to store the whole file. Even if you do, there is a different tradeoff that seems at least as attractive at first glance: Read through the file and store a vector for which each element indicates the starting position of a line, stored as a seek offset. You still have to reread the lines that you wind up selecting, but at least you don't have to store the lines that you never select.

This strategy has the additional advantage that instead of storing a seek pointer to the beginning of every line, you can store a seek pointer to the beginning of every nth line, for a suitable value of n. Then you can still get to any line by seeking to the last known spot before the beginning of that line, and you still don't have to read the whole file. You can adjust the space/time …

arkoenig 340 Practically a Master Poster

rvalue-lvalue qualification is not the same as cv-qualification. You can have a const or non-const, lvalue or rvalue variable.

I think you're missing the point.

The C++ standard, clause 5.2.11, says that the result of const_cast<T>(e) is an lvalue if type T is a reference type; otherwise the result is an rvalue. So const_cast<myclass*>(anything) is an rvalue, because the type myclass* is not a reference type.

But, in theory, just reading or writing the value of the this pointer (not what it points to) is "valid" (if the compiler permits it), just like this code is valid, but useless:

void someFunction(int* some_ptr) {
  delete some_ptr;
  std::cout << "The pointer I just deleted had address: " << some_ptr << std::endl;
  some_ptr = NULL; // reading or writing some_ptr is valid and well-behaved after the delete. This also applies to the this pointer.
};

I still believe you are mistaken. One of the interesting aspects of pointers is that it is undefined behavior even to copy the value of a pointer to memory that has been freed. Clause 3.7.3.2 of the C++ standard says that once memory has been deallocated, any pointer to that memory becomes invalid, and any use of such a pointer, even to pass it to a function, results in undefined behavior.

This behavior is inherited from C, and comes from some processor architectures that use different registers for addresses and other data. In such processors, the mere act of putting an address into an …

mike_2000_17 commented: Very good point! Glad to know that! +2
arkoenig 340 Practically a Master Poster

This is a roundabout way of saying that the problem is not impossible to solve if you solve a different problem instead.

And, of course, your suggestion depends on running on an operating system that can determine the length of a file without reading. Not all systems can do it; and the ones that can are not always able to do so on all devices. In particular, you can't determine the length of a "file" that is typed in from a keyboard as your program is running, because the person doing the typing may not even have decided yet how long the file will be.

arkoenig 340 Practically a Master Poster

To give you an idea of how many worms are in this particular can, I'd like to isolate two lines from the OP's code:

delete this;
const_cast<myclass *>(this) = 0;

The program is already broken at this point, because once you have executed "delete this;" any subsequent use of this is invalid.

Come to think of it, the second statement is invalid anyway, even without "delete this", because this is not an lvalue. And even if it were, casting it to myclass* yields an rvalue, to which you can't assign.

If your compiler accepts this program, your compiler is broken.

arkoenig 340 Practically a Master Poster

I was looking for a method of picking a random line without reading the whole file.

That's impossible, unless you already know how many lines are in the file.
Because until you reach the end of the file, you don't know how many lines remain unread.

arkoenig 340 Practically a Master Poster

Both of the previous algorithms are seriously flawed. They never choose the first line, while the second line has a 50% probability of being picked up. For any subsequent line, the probability of being picked up is decreasing.

This comment is wrong, because in the last algorithm presented (repeated here for convenience, with the parameter type changed from ifstream to ifstream& because you can't copy an object of type ifstream):

string random_line(ifstream& file)
{
    string line;
    string selected_line;
    int nlines = 0;
    while (getlinefile, line))
        if ((rand() % ++nlines) == 0)
            selected_line = line;      
    return selected_line;
}

at the beginning of the first trip through the loop, nlines is zero. ++nlines increments it to 1, which means that we start by computing rand() % 1, which is always zero. This fact means that the first time through the loop, the statement

selected_line = line;

will always be executed. If the values of the random numbers are such that this statement is not executed again, then the algorithm will select the first line.

The second misstatement is more subtle. As Accelerated C++ points out, using rand() % n for a random number between 0 and n-1 gives results that are generally not random, even if rand() itself is perfect.

To see why, assume that rand() returns a 32-bit signed value, i.e. a value between 0 and 2147483647, inclusive, and that you are computing rand() % 190.

There are 2147483648 possible values that rand() can take on, …

arkoenig 340 Practically a Master Poster

Nice info, but 4 years too late.

And wrong, too, for at least two reasons that come to mind immediately.

arkoenig 340 Practically a Master Poster

Ah... Now you know what you need to learn.

arkoenig 340 Practically a Master Poster

What do you think end() returns for a container?

arkoenig 340 Practically a Master Poster

It sounds to me that perhaps you're thinking about the program's output rather than the properties that you want your iterators to have.

If you have an iterator that refers to the last element of a list, and you apply ++ to it, what does that iterator refer to? Whatever your answer is to that question, the value returned by end() has to have the same answer.

arkoenig 340 Practically a Master Poster

Are you saying that the code I commented out was actually correct?

It looked correct at first glance. I haven't analyzed the code thoroughly, as I've been doing about three other things at the same time today.

arkoenig 340 Practically a Master Poster

On line 62, you define and initialize a local variable named begin. You never do anything with that variable again, which means that it serves no purpose. On the same line, you have a return statement that is commented out. Why?

Independently of these problems, your uses of first_link + 1 in lines 68 and 75 are just wrong. The only time it makes sense to add an integer and a pointer is when that pointer points to an element of an array, and first_link does not do so in this case.

I am reluctant to tell you how to solve these problems, because I don't want to do your homework for you. If you make a serious effort to do so yourself, and explain what you did, I may change my mind.

arkoenig 340 Practically a Master Poster

You're welcome.

Tracking down what part of the program is causing a compilation error during template instantiation can be a pain.

arkoenig 340 Practically a Master Poster

Your begin, end, rbegin, and rend functions don't have any return statements in them, so they return garbage.

arkoenig 340 Practically a Master Poster
T& operator*(){return *elem;}
const T& operator*()const {return *elem;}
arkoenig 340 Practically a Master Poster

Never mind, I think I found your problem. Lines 23 and 24:

T operator*(){return *elem;}
const T operator*()const {return *elem;}

have to return references.

arkoenig 340 Practically a Master Poster

(yeah when I double click the error message ... it takes me to the memory header)

So we need to see the code in the memory header that it's complaining about.

arkoenig 340 Practically a Master Poster

And what are the types of data and avail?

Meanwhile... Are you saying that line 287 in the memory header is the call to uninitialized_fill?