## Clockowl 56

Hiya fellas,

I just made this Knight's Tour program, and wish to speed it up without making it non-brute force. So it shouldn't have any intelligence.

I did most I could, and was wondering if you guys know more ways to speed this program up. First thing I thought about is changing the recursive "solveit" to an iterative function, but will this really speed the program up, since it's maximum "call depth" is 64?

The code is attached as well as below. Any other comments are welcome as well.

Nick

PS:
Code is attached and below:

``````#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>

#define NODEBUG

#define TIMEELAPSED {printf("\tTime elapsed: %lu\n", GetTickCount() - startTime);}

const int horizontal[8] = {2, 1, -1, -2, -2, -1, 1, 2};
const int vertical[8] = {-1, -2, -2, -1, 1, 2, 2, 1};

unsigned int boardsSolved = 0;
unsigned char board[8*8] = {1};
unsigned int startTime;

void printBoard(const unsigned char *board){
unsigned int i, j;

for(i = 0; i < 8; i++){
for(j = 0; j < 8; j++){
printf("%02u ", board[i*8+j]);
}
putchar('\n');
}
putchar('\n');

return;
}

int solveit(const unsigned int *knightPos, const unsigned int steps){
if(steps == 64){
printf("Board (%u) solved:\n", ++boardsSolved);
printBoard(board);
TIMEELAPSED
return 0;
}

/*Algo:
calc xpos
if xpos is valid:
calc ypos
if ypos is valid:
if board[x][y] is not stepped on
set board[x][y] to steps
call self with updated position and board
*/

unsigned int i;
unsigned int curPos[2];

for(i = 0; i < 8; i++){
curPos[0] = knightPos[0] + horizontal[i];
if(curPos[0] > 7){
continue;
}
else{
curPos[1] = knightPos[1] + vertical[i];
if(curPos[1] > 7){
continue;
}
else{
if(board[curPos[0] + curPos[1]*8] == 0){
board[curPos[0] + curPos[1]*8] = steps+1;

#ifdef DEBUG
printf("Board after %u steps: \n", steps);
printBoard(board);
#endif
solveit(curPos, steps+1);
board[curPos[0] + curPos[1]*8] = 0;
}
else{
continue;
}
}
}
}

#ifdef DEBUG
else{
printf("All possibilities tried, returning.\n");
}
#endif

return -1;
}

int main(){
printf("Knights Tour Finder v0.1\n");
startTime = GetTickCount();

unsigned int startPos[2] = {0};

printf("Starting with this board: \n");
printBoard(board);

solveit(startPos, 1);

printf("Done");
return 0;
}``````

Recalculating the time in every recursive call, get's quite expensive. Perhaps depth mod 8 == 0 would be enough?

Also, you re-draw the board for every move. You only need to redraw two squares after any move - the from square, and the to square. It also gets rid of any possible screen flicker.

## Clockowl 56

That's not entirely correct I'm afraid.

I only display the board when it is solved, that's also when I calculate the difference in time.

Notice the #ifdef DEBUG in front of the "print the board after every move" line. That line only gets compiled when DEBUG is defined, which it isn't. It was there for, naturally, debugging purposes.

That's not entirely correct I'm afraid.

I only display the board when it is solved, that's also when I calculate the difference in time.

Notice the #ifdef DEBUG in front of the "print the board after every move" line. That line only gets compiled when DEBUG is defined, which it isn't. It was there for, naturally, debugging purposes.

Then you need to start bringing the facts forward, and start including data: How long does your program take to run, and on what hardware?

Have you profiled it yet, and what were the results?

What compiler are you using, and with what flags?

I'm sure you could google around and find other versions of this program. How does your program compare to them, when run on the same hardware and compiler?

That will give you real insight into the performance of your program.

## Clockowl 56

It's doesn't matter how long it takes to run, I want it to run faster. Improving this algorithm.

And relax, I was only saying you made a mistake in your previous post and explained it.

I'm using GCC with the -o3 and -Wall flag. That's all.

## jephthah 1,888

before i go any further, i want to know what kind of speeds you're currently getting. time per solution.

also, do you require closed solutions, or are open solutions acceptable?

## Clockowl 56

Why does it matter what speeds I'm getting? I want it to be faster. You can't meassure this speed very well since it will "hang" on the same solution for quite a long time and then sometimes spawn 500 solutions in 2 seconds. It gets to 3498th solution in about 24 minutes on my system.

Open solutions are acceptable.

There's a good benchmark at solution nr. 823. It'll get there in about 70 seconds here and then hang for some time.

## jephthah 1,888

Why does it matter what speeds I'm getting?

why so hostile?

i want to know if its worth my time to post my solution. if you're getting speeds as fast as mine, i wont bother.

anyhow, the Warnsdorff algorithm, as ive coded it, solves each of the 64 positions on the board in 15 milliseconds or less. This is a 1.8 GHz machine.

mind you, this is just "solving" any given position. I could keep running it at that rate indefinitely, but if i had to keep track of unique solutions, it would add some overhead.

## Clockowl 56

Sorry if that sounded hostile, I was honestly wondering why it mattered.

Are you saying that for any given position you find *all* the solutions in 15ms? Warnsdorff only finds a set, no? If so, please post your implementation so I can study it. Thank you.

## jephthah 1,888

i dont have my code available to me here at work, its on my laptop at home.

all i did was write code to implement the Warnsdorff Algorithm. its a couple hundred years old, and it's no secret.

you could implement it yourself and get the same results i did.

the pros of this method is that it's accurate (~98%) and its FAST. when i said 15 milliseconds or less, that was the maximum total time to find one solution per square, for ALL of the 64 squares in succession. so, approximately 250 microseconds per solution, or less. (and actually that was on my laptop, an AMD Athelon X2 (2.something GHz per core)

the real flaw with it is that it won't find all possible solutions, which may be unacceptable depending on your requirements. it can find multiple solutions per square, some closed and some open, but it wont find them all.

.

## Clockowl 56

Okay.

I'm doing this for fun and training myself, I wanted to create an algorithm that finds every possible Knight's Tour from any given position in a reasonable amount of time. I'd like it to be in a minute, but I'm not sure if that's possible.

I'm working on an improvement, so hang on. I'm curious how much faster it'll run, but it will run faster.

## jephthah 1,888

I wanted to create an algorithm that finds every possible Knight's Tour from any given position in a reasonable amount of time. I'd like it to be in a minute, but I'm not sure if that's possible.

um, yeah, good luck with that.

there are 26,534,728,821,064 possible CLOSED solutions to the Knight's Tour on an 8x8 board, plus an as-yet uncounted number of undoubtedly many orders of magnitude greater than that for the OPEN solutions.

check these guys out for help: http://www.cray.com/products/xt5/index.html , and meanwhile, Ill keep an eye out for your results when you're published in the journals.

:P

.

## Clockowl 56

No way, THAT much? Well... erm..

Maybe a minute per position then? hahahaha.. Okay, it won't be a minute, but erm.. Well fast. Really fast.

Testing the improvement.

## Clockowl 56

I am stunned.

Here are two algorithms for you, they are cut down to their differences.

I'll give you that the above algorithm will loop less. A lot less. When the second one loops 8 times, the top one may loop 2, 3, 4, 6 or 8 times.

Algorithm 1:

``````for(moves = 0; moves < maxMoves; moves++){
curMove = board[curX + curY*8].availableMoves[moves];

curPos[0] = curX + horizontal[curMove];
curPos[1] = curY + vertical[curMove];
}``````

Algorithm 2:

``````for(i = 0; i < 8; i++){
curPos[0] = knightPos[0] + horizontal[i];
if(curPos[0] < 7){
curPos[1] = knightPos[1] + vertical[i];
if(curPos[1] < 7){
}
}
}``````

To my surprise, the second one executed a lot faster than the first. I'll try and undo the structuring of my code, that seems to be killing the first one, don't you think?

## Clockowl 56

And another surprise struck me! I had this code:

``````if(curPos[0] > 7){
continue;
}
else{
curPos[1] = knightPos[1] + vertical[i];
if(curPos[1] > 7){
continue;
}
else{``````

And figured I could change it to this:

``````curPos[0] = knightPos[0] + horizontal[i];
if(curPos[0] < 8){
curPos[1] = knightPos[1] + vertical[i];
if(curPos[1] < 8){``````

Lesson learned? The top one is about 30% faster when compiled with GCC. Is this a bad of C, or a bad of GCC? Bad branch prediction by the CPU? I'd really like to know.

## Clockowl 56

Hey,

I made a bit of a change to my program, it runs a bit faster now.

Can anyone else come up with more ways to speed up this specific algorithm? So no other ones?

Does anyone know a good way to measure how much time is spend ONLY calling? So not the function itself, purely the call?

As to why: I figured the iterative implementation of the function will only delete *that* overhead. Nothing more. So if little time is spend on the call itself, I wouldn't bother writing an iterative function.

Here's the code, enjoy:

``````#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <windows.h>

#define NODEBUG

#define TIMEELAPSED {printf("\tTime elapsed: %lu\n", GetTickCount() - startTime);}

static unsigned int startTime;

void printBoard(const unsigned char *board){
unsigned int i, j;

for(i = 0; i < 8; i++){
for(j = 0; j < 8; j++){
printf("%02u ", board[i*8+j]);
}
putchar('\n');
}
putchar('\n');

return;
}

__attribute ((fastcall)) static int solveit(const unsigned int knightPosx, const unsigned int knightPosy, const unsigned int steps){
__attribute((aligned(16))) static const int horizontal[8] = {2, 1, -1, -2, -2, -1, 1, 2};
__attribute((aligned(16))) static const int vertical[8] = {-1, -2, -2, -1, 1, 2, 2, 1};

static unsigned int boardsSolved = 0;
__attribute((aligned(16))) static unsigned char board[8*8] = {1};

if(steps == 64){
printf("Board (%u) solved:\n", ++boardsSolved);
printBoard(board);
TIMEELAPSED
return 0;
}

//i is used for selecting the positions.
register unsigned int i;
register unsigned int curPosx;
register unsigned int curPosy;

/*Algo:
calc xpos
if xpos is valid:
calc ypos
if ypos is valid:
if board[x][y] is not stepped on
create new board, copy from old board
set new board[x][y] to 1
call self with updated position and board
*/

for(i = 0; i < 8; i++){
curPosx = knightPosx + horizontal[i];
if(curPosx > 7){
continue;
}
else{
curPosy = knightPosy + vertical[i];
if(curPosy > 7){
continue;
}
else{
if(!board[curPosx + (curPosy<<3)]){
board[curPosx + (curPosy<<3)] = (steps+1);

#ifdef DEBUG
printf("Board after %u steps: \n", steps);
printBoard(board);
getchar();
#endif
solveit(curPosx, curPosy, (steps+1));
board[curPosx + (curPosy<<3)] = 0;
}
}
}
}

#ifdef DEBUG
printf("All possibilities tried, returning.\n");
#endif

return 0;
}

int main(){
printf("Knights Tour Finder v0.1\n");
startTime = GetTickCount();

unsigned int startPos[2] = {0};

solveit(startPos[0], startPos[1], 1);

printf("Done");
return 0;
}``````

## Jishnu 160

Does anyone know a good way to measure how much time is spend ONLY calling? So not the function itself, purely the call?

Time.h has a function clock() which returns a structure of the type clock_t, which is again defined in time.h. Use that.

Lesson learned? The top one is about 30% faster when compiled with GCC. Is this a bad of C, or a bad of GCC? Bad branch prediction by the CPU? I'd really like to know.

Me too.

Edited: After seeing your optimizations, I don't think you'll call the clock() function. You'll get another overhead. You're very enthusiastic about the problem at hand... Keep it up!!

## Clockowl 56

I can't see how I could place the calls to clock in such a way I can only measure the time spent calling. If you do, please post some example code so I can learn from it.

I think a profiler should be more convenient for this, but I don't have any experience with them. Can anyone recommend me a good profiler?

## Jishnu 160

profiler

I've not heard that word before. I'm interested too!

Place two calls to clock(). One before calling the function and another at the starting of the function. Take the difference between the two values returned and you'll get the required time.

## Clockowl 56

Heh, that's not what I meant I'm afraid.

You see, I want to make an iterative function. The only advantage that has over this recursive function, that it doesn't call "itself". So, it's in essence the same function WITHOUT "call solveit()". To see if writing an iterative function is worth the trouble, I first want to know how much that command (and that command only) takes. (Calling a function also takes time).

A profiler "examines" your code while it runs. Some hold counters of function calls, others also display time spend in each function, etc. You should check it out on wikipedia/google if you're interested, they can greatly help you find bottlenecks in a program.

## Jishnu 160

Okay. Did you use one with your code what did you observe? Did it help you in further optimization?