Sudoku Solver
I am writing a program to solve Sudoku Puzzles.
This thread will contatin my progress throughout the project and hold any questions and replies.

Part 2 - Checking for Validity
This part is confusing and I will require some help.

I know in Sudoku there is 9 Blocks, in a 3x3 Grid, each containing a 3x3 Grid. The rules are that the same number cannot be in the same row, column or block. You must fill in each space with a number from 1 - 9.

Checking if the numbers are between 1 - 9 is pretty easy, however checking if theres a number in the same row, column or block is the part I am stuck on.

Code So Far:

        //Check ROWS
        for(int i = 0; i < 9; i++)
        {
            for(int j = 0; j < 9; j++)
            {

            }
        }
        //Check COLS
        for(int l = 0; l < 9; l++)
        {
            for(int k = 0; k < 9; k++)
            {

            }
        }

This step is not done.

Just because Soduku is displayed in a 2D grid that doesn't mean you have to store it internally like that. And there are very good reasons why you shoudn't.

You have 81 cells. You want to look at groups of cells like {1,2,3,4,5,6,7,8,9} - a "horizontal row", or {1,10,19...} - a "vertical row" or {1,2,3,10,11,12,19,20,21) - a "3x3 block".
You're going to spend a lot of time iterating through the members of those groups, and doing it all with x,y coordinates is going to be very tedious.

Why not just have int[81] cells to hold the "9x9" grid of numbers, and arrays to hold the rows/cols/blocks. There are 27 rows/cols/blocks, amd each is defined by a list of 9 cell numbers, ie int[27][9] allRCBs containing cell numbers (like the examples above). Just look at how quick and easy processing those groups will be now....

(It's a classic example of why you should start by understanding the model before worryng about the user interface)

The next step is to start to define some useful methods to encapsulate access to that data structure - eg

getCell(row, col) - convenience method for user interface - makes it easy to print or build a GUI as a 9x9 grid

isValid(int[] rcb) - checks a row/col/block for validity (numbers 1-9 exactly once) rcb is a list of the 9 cells that make up the row/col/block, as in

for (int[] rcb : allRCBs) {
   if (isValid(rcb)) ...

Edited 2 Years Ago by JamesCherrill

Comments
Well put

I trust you're new solution, however I'm having trouble understanding it.

I don't understand why you set it up like 'int[27][9] allRCBs' or how that's a better way. I understand where you got 27 from and I think I understand where the 9 comes from (all the cell blocks) but why count them twice? Once in 27 and then another on its own.

Also, in you're example method getCell(), aren't you just specifying an x, y anyway?

How does it check for multiple numbers inside isValid()? How do you pass a column? row? block?
Could this check a block to see if there's no duplicates and for numbers only between 1-9?

Also I should've noted this in the OP but this is a console application as difficult as it is will be.

Edited 2 Years Ago by Doogledude123

allRCBs lists the cells that are part of each row/col/block. The members of the arrays are cell numbers.
The cells array holds the actual numbers 1-9 (or "empty").

To understand why, just compare the incomplete code in your part 2 with the code in my last post. All I need to do is write a small method to check if the cells identified in an array of 9 cell numbers have unique values. Then I can use allRCBs to call that method 27 times in a trivial loop that does all the rows, columns and blocks in one easy pass.

The getCell method is there to show that having a 1D array internally doesn't impact the ease of doing the user interface.

Finally, ideally you will write a "model" that implements all the workings of Sudoku, and provides a few simple get/set methods that a separate UI can call. Then you can write a console UI, GUI, or HTML user interface afterwards (which would also be a great learning exercise).

How you represent your board really depends on how you're planning to solve it. A good way to decide how you should save the board is to decide which operations you'll be performing on the board, and then deciding on how you want to store it.

A 9x9 array will work fine, but it may not be optimal, depending on how your planning on solving it. If your planning on using backtracking, this will get the job done.

James, I'm not sure if storring an array of groups will be that much more effective (correct me if Im wrong). You can iterate through board[row][y] for 0 < y < 10, and board [x][col] for 0 < x < 10 easily. 3x3 blocks are a little more tricky I'll admit. Making an array for each row, column and block will make iterating through the blocks easier, but it will also increase the length of time required for inserting and deleting elements by a factor of 3. Also, for large boards, cache performance will deteriorate quicker.

One thing that would make this faster though is saving it in boolean arrays. Ie, allRCBs[7][4] would be true if 4 was in the 7th column say. Then you could check if there are inconsitencies in O(n) time fairly easily. Also, you can check if a number "fits" into a group in O(1) time, giving it an improvement over a 9x9 array.

Other ways I can think of for representing the boards is to write the suduko puzzle as a cover problem, represent it as a 2D doubly-linked list, and use the dancing links algorithm to solve it. You can also write it as a sat problem, and solve that. Another method might even to be to represent it as in linear program and solve it using the simplex algorithm.

To check if a bunch of integers are unique, you can store them in a counting array, ie, for 0 < n < 10, count[board[n][row]]++, and check to see that all the values in count are 1. This works in O(n) time.

Hi Hiroshe

I think you may be over-elaborating this. Discussion of execution scaling would be appropriate to a project of "arbitrary sized games like sudoku", but I don't think that's what DoogleDude had in mind.

The game of Sudoku has a board of exactly 81 cells, which are considered in exactly 27 subsets of 9 cells each. I also don't get the point about inserting or deleting - the entire mapping is fixed by the rules of the game - what are you inserting or deleting?

At this point I should confess that for every single hour I have ever spent trying to solve efficiency problems, I have spent months trying to fix code that's incomprehensible and buggy. My obsession in software design is to make the code as obvious as possible. Personally I would prefer to implement this as a collection of 81 cells, plus a collection of 27 Subsets, where a subset is a class with a list of 9 cells as a member and methods to return whether the subset was complete and/or valid. That enables me to write solution logic that looks a lot like the English rules of Sudoku, without worrying about multiple loops to access rows/cols/3x3 blocks of 2D arrays.

Anyway, as always in architecture discussions, that's just my opinion, and here in DaniWeb everyone's welcome to contribute to the discussion.

:)
J

ps: Checking for uniqueness of ints 1-9 in some collection of ints:
yes, you can use a count, but a boolean array is more direct in my opinion, something like (not exact Java code)

boolean hasAllUniqueValues(listOfValues) {
   boolean[] found = new boolean[9];
   for (int value : listOfValues) {
      if (found[value])  return false; // duplicated value
      found[value] = true;
   }
   return true;
 }

Edited 2 Years Ago by JamesCherrill

Sorry, I didn't mean inserting and deleting new cells. I ment inserting and deleting values in cells. If he's using a backtacking method, he'll need to make guesses on the board, and check for inconsistencies as he's going along. If that's the case, then checking if an inividual value causes an inconsistency in O(1) time would be quite desirable. Reducing time on the overhead of insert and delete would be nice, but it's a trade off.

I wonder if I didn't explain my original proposal very well because it imposes zero overhead on changing cell values. At the risk of boring everybody (stop reading here if you like!) I'll go through it with greater care:
The game requires us to store 81 numbers, grouped in 27 overlappinng subsets (rows/cols/3x3 blocks) of 9 numbers each. We have to check that each subset contains no duplicated values. Proposed implementation (slightly pseudo-java):

int[81] cells;  // contains the numbers. 1D array because that halves the number of loops and indexes we will need
int[27][9] rcbs;  // contains indexes into the cells array to define which cells are in which row/col/block subgroup - this array is fixed by the rules

adding/deleting/changing a number:

cells[i] = new value; // can't be any easier than that!

now a simple method to see if an r/c/b subgroup (defined by 9 indexes into the cells array) has duplicates

 boolean hasAllUniqueValues(int[] cellIndexes) {
    boolean[] found = new boolean[9];
    for (int i = 0 to 8) {
       int value = cells[cellIndexes[i]];
       if (found[value]) return false; // duplicated value
          found[value] = true;
       }
       return true; // no duplicates
    }
 }

now all we need to check the entire game is

   boolean isComplete() {
        // is the current game a valid complete configuration?
        for (int i = 0 to 26) {  // check each rcb subgroup
            if (! hasAllUniqueValues(rcbs[i]))
                return false;
        }
        return true;
    }

So that's the entire code for checking a complete valid solution. Personally I doubt that there exists a significantly simpler (or faster) Java solution, but maybe someone out here is about to demonstrate that I'm wrong...?

Edited 2 Years Ago by JamesCherrill

Ah! I see what your doing. Yes, indeed that introduces no overhead, and it does make checking much easier, and will result in less code too.

Checking if adding a single element creates an inconsistency still takes O(n) time in this, but for 9x9, it doesn't matter too too much. Checking the entire board is O(n).

If speed was a concern, my suggestion would be to use boolean groups[27][9], which still uses O(n) time to check if the entire board is valid, but it takes O(1) time to check if a new cell is valid, ie:

boolean newCellValid(boolean[][] groups, int x, int y, int value)
{
    int r = x;                          // Row
    int c = y + 9;                      // Column
    int b = (x/3) + 3*((y/3)-1) + 18;   // Block. This is a little tricky.
    return !(groups[r][value] || groups[c][value] || groups[b][value]);
}

Yes. I assume you are thinking of variable-sized boards as having more blocks of the same size? If so, checking the rows/cols may be O(sqrt(n)), but checking the 3x3 is O(1). But anyway, I was only thinking of the standard game.

I did think that for re-checking after adding/changing a single number we could use a link from each cell to the the three RCBs it belongs to (eg int[81][3] array), so checking if that new value causes an invalidation looks like

  boolean hasCausedAnInvalidConfiguration(int cellThatWasChanged) {

    for (int i = 0 to 2) { // re-check each rcb this cell belongs to
       if (! hasAllUniqueValues(rcbs[containingRCBs[cellThatWasChanged][i]]))
          return true;
       }
    return false;
    }

which is 1/3 the effort of re-checking the whole board.

ps: Before someone notices, I just posted code for checking a completed board (all 81 numbers filled in). Obviously there's almost identical code for incomplete ones that just ignores empty (value==0) cells

just for fun, and for people who say Java is slow...
that code for checking completness of the whole grid (but with the R/c/b blocks as a class for clarity) timed over one million calls, on my very ordinary Core i3 PC:

complete valid grid: 0.64 seconds / million calls (worst case)
grid with first cell invalid 0.024 seconds / million calls (best case)

(the difference is because the algorithm terminates as soon as it finds any invalid subgroup - which will usually be the case when searching for solutions).

I tried to find a valid solution by starting with a grid populated by 9x1, 9x2 ... 9x9 and swapping random pairs until a solution is found. I don't recommend it. ten billion tries (1 minute 9 secs) and no solutions...

Hey guys, sorry for not replying right away. I'm currently busy with personal life.
But after reading all the replies and understanding almost none of it, I've thought about what I really don't understand about it.

I believe it's because of the int [27] [9] allRCBs. I understand the 27 as (number of columns + number of rows + number of blocks) and the 9 as (number of blocks). Now if this is correct, how do you know which column row or block your referring to out of the 27? How do you get a cell based on that?

If I cannot seem to understand this, can we just move on with a way that I already have it coded out? From the 2D array standpoint?

9 is number of cells in each row/col/block. Each entry in the array is a cell number. Eg row 1 consists of cells 0, 1, 2, 3...8

If you don't get it, don't worry. There are lots of ways to do this. It's your project, don't let us hijack it! Try your own way and at the very least you'll learn a lot.

Edited 2 Years Ago by JamesCherrill

Alright, I decided to keep it my way after endlessly trying to understand to no avail.

I think I have a working method(untested) for testing rows, column is as easy as changing some letters for the loop to check vertically instead of horizontally.

In the if statement, what should I do? I was thinking of having a counter and the counter increments if the row is valid, and at the end if the counter equals 9, all rows valid, then return true or something?
Obviously I cant just return true in the if statment.

Anyways, heres what I've come up with.

        int[] tempRow;

        for(int i = 0; i < 9; i++)
        {
            tempRow = new int[9];

            for(int j = 0; j < 9; j++)
            {
                tempRow[j] = board[i][j];
            }
            for(int k = 0; k < 9; k++)
            {
                for(int l = 0; l < 9; l++)
                {
                    if(tempRow[k] != 0 & (tempRow[k] != tempRow[l]))
                    {

                    }
                }
            }
        }

I think I've figured out a way to check Blocks aswell. Will post code to that later.

Edited 2 Years Ago by Doogledude123

Obviously I cant just return true in the if statment

Yes. Yet another good reason to put that in a method! (You could pass tempRow as a parameter so it could handle rows cols and blocks identically)

ps: That code will always find a duplicate entry whenever it processes a value of l equal to the value of k, ie tempRow[0] == tempRow[0], tempRow[1] == tempRow[1] etc

pps: using "l" as a variable name is perfectly valid, but depending on the font you are using can be easily confused with "1" - causing total bafflement for whoever is reading it!

Edited 2 Years Ago by JamesCherrill

Yeah, but that requires different code which could be broken down a bit, but it would just be more confusing I think.

Nice catch! I'll just have to skip when k is equal to l.

Understood, however in this particular program I do not think it matters much because I am using i, j, k, l in a forward sucess.

that requires different code which could be broken down a bit, but it would just be more confusing

Not really - you can just refactor your existing code to extract lines 11-20 as a method

boolean isValid(int[] temp) {
 for(int k = 0; k < 9; k++)
   {
   for(int l = 0; l < 9; l++)
     {
        if(temp[k] != 0 & (temp[k] != temp[l]))
          {
          ????? still to be done 
          }
     }
   }
}

Then what's left is

 for(int i = 0; i < 9; i++)
   {
      tempRow = new int[9];
      for(int j = 0; j < 9; j++)
         {
            tempRow[j] = board[i][j];
         }
   if (isValid(tempRow) ....
 }

... which you then have similar code for columns and blocks without having to repeat the code I just factored out.

This is a good example of using methods for clarity and eliminating duplication - you;ll need to do more and more like this as you progress.

ps; Sorry about the indentation. I hate that style and always use the Sun/Oracle standard (see the Java API source code). Others may disagree.

Edited 2 Years Ago by JamesCherrill

This article has been dead for over six months. Start a new discussion instead.