So, I am sure everybody has been here, but just to describe it properly, a short story:

I am writing a (to me) somewhat complicated program. By changing the input parameters I suddenly start getting a run-time error.
After several hours of placing more or less meaningful printing-commands around the code, I locate the place that I can't get past.
However, nothing seems to take place there.
I print all the usual variables, but nothing seems to be wrong.
I start simplifying the code. Removing all member-functions that are not used, all variables, slowly replacing all variables with dummies, while at the same time checking if the error still occurs.
Suddenly, after having removed about 4/5 of the program, the bug disappears. However, when rolling back the most recent changes, it does not reappear.

This has happened before during other projects, and I have spoken to other people who also knew of it. My question is, what do you guys do to locate such an error, when it stops occurring? The original version also seems to run atm. I have not much experience with debugging programs, but I tried running it through gdb - it just reported that the program had exited fine. Do I just have to wait for it to reappear, and hope I have more luck next time I go hunting? It's incredibly frustrating, both having wasted so much time not solving it, and also knowing that it's still in there somewhere.

From my experience, this looks like a memory corruption problem. Running the program under valgrind may help you. Also, try printing the pointer addresses you use, and verify each dynamic memory allocation to make sure you never use an uninitialized pointer. Check the bounds of your for-loops too, to make sure you are not writing to memory out of bounds. Valgrind can help you with all of that.
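To illustrate the loop-bounds advice: a minimal sketch, not taken from the original poster's program, of how a checked access turns a silent out-of-bounds write into an error at the exact faulty line. The function name fill_squares is hypothetical.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes i*i into the first `count` slots of `v`.
// v.at(i) checks the index and throws std::out_of_range on a bad one,
// whereas v[i] with a bad index is undefined behaviour and may silently
// overwrite unrelated memory.
bool fill_squares(std::vector<int>& v, std::size_t count)
{
    try {
        for (std::size_t i = 0; i < count; ++i) {
            v.at(i) = static_cast<int>(i * i);
        }
    } catch (const std::out_of_range&) {
        return false;  // the bad index was caught at the faulty access
    }
    return true;
}
```

Calling fill_squares on a vector of size 4 with count 5 returns false instead of corrupting memory. Swapping at() in temporarily while hunting a bug like this is one option; valgrind is still the more general tool.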

When you make changes until the bug disappears, and then roll back and it doesn't reappear, the likely explanation is that the memory layout of your program has been slightly altered, so the corruption no longer causes a visible problem. That doesn't mean the problem really disappeared.

Is this a multi-threaded application? Pseudo-random-like errors often occur when the program is multi-threaded and the memory is not properly shielded with mutexes.

nope, not multi-threaded. didn't know about valgrind, will look into that. had the pointer-problem until recently, so I did check for that this time.

but thanks for sharing =)

valgrind is a wonderful program =) I could track down the error very reliably.

>> After several hours of placing more or less meaningful printing-commands around the code
You don't need to do that, because gdb has a command called print.
You just break on the line you are interested in. For example, from the code that I'm currently
debugging:

20
(gdb) continue
Continuing.

Breakpoint 2, WinMain (hInstance=0x400000, hPrevInstace=0x0,
    lpszCmdLine=0x241f0d "", iCmdShow=10) at scroll_bar.cc:58
58              wndclass.style=CS_HREDRAW|CS_VREDRAW;
(gdb) print hInstance
$1 = 0x400000
(gdb) list
53
54              WNDCLASS wndclass;
55              MSG msg;
56              HWND hWnd;
57
58              wndclass.style=CS_HREDRAW|CS_VREDRAW;
59              wndclass.cbClsExtra=0;
60              wndclass.cbWndExtra=0;
61              wndclass.hInstance=hInstance;
62              wndclass.hIcon=LoadIcon(NULL,IDI_APPLICATION);
(gdb)

But putting an assert at the places that you think your program will never
reach is a really good idea.
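As a rough sketch of that advice (the Shape enum and corner_count function are made up for illustration), an assert placed after a switch that is supposed to handle every case:

```cpp
#include <cassert>

enum class Shape { Circle, Square };

// Hypothetical example: returns the number of corners of a shape.
int corner_count(Shape s)
{
    switch (s) {
    case Shape::Circle: return 0;
    case Shape::Square: return 4;
    }
    // Every valid Shape has already returned above. Reaching this line
    // means `s` holds a value outside the enum, e.g. after memory
    // corruption, so stop right here instead of carrying on.
    assert(false && "corner_count: unreachable, invalid Shape value");
    return -1;  // only reached when asserts are disabled (NDEBUG)
}
```

Note that assert is compiled out when NDEBUG is defined, so this only helps in debug builds.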

>>I start simplifying the code. Removing all member-functions that are not used, all variables, slowly replacing all variables with dummies, while at the same time checking if the error still occurs.

Errr! Do not break the code. Very bad idea!

>> I have not much experience with debugging programs, but I tried running it through gdb - it just reported that the program had exited fine.

If you are new to gdb, don't give up on it. The only answer is to read about it from start to end.
If your program doesn't crash (for example by trying to access an invalid memory
location), gdb will never say anything. You are better off using the
break command to set a breakpoint where you want to inspect data, and then hitting
run. For example:

(gdb) break scroll_bar.cc:60
Breakpoint 3 at 0x401447: file scroll_bar.cc, line 60.
(gdb)

or break WinMain, break main etc.

Please note that if you want to use the program's symbols inside the debugger at run
time, you need to pass the -g option when compiling and linking.

so, if I understand correctly, if I put in gdb break-points around the code, I can make it run to a certain point, and I will then have access to all variables within the scope? would be very nice, I was raised on matlab, and you really get to miss the workspace.

>> so, if I understand correctly, if I put in gdb break-points around the code, I can make it run to a certain point, and I will then have access to all variables within the scope? would be very nice.

Oh yeah, that's the whole point of gdb. What other way were you using it? There are also a lot of IDEs that integrate gdb into the user interface (like KDevelop), while others have their own debugging environment (like VC++). Otherwise, you have to drive gdb from the command line, which is actually a bit hard to use.
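If you want something small to practise on, here is a sketch (the function mean is made up, not from anyone's code in this thread). Call it from your own main, compile with -g, then in gdb use break mean, run, next, print sum and info locals to look at the variables in scope.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical function to practise on: arithmetic mean of the values,
// or 0.0 for an empty input.
double mean(const std::vector<double>& values)
{
    double sum = 0.0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        // Stopped inside this loop, "print sum" and "print i" in gdb
        // show the current values, much like a workspace view.
        sum += values[i];
    }
    if (values.empty()) {
        return 0.0;  // avoid dividing by zero
    }
    return sum / static_cast<double>(values.size());
}
```

The backtrace command additionally shows how execution got to the breakpoint, which is handy once the program has more than one function.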
