Generic Sudoku Solver
B. Pieters
The generic sudoku solver (gss) is a program that solves and generates sudokus. The program not only solves standard sudokus but can handle many different sudoku topologies, such as X-, jigsaw-, and NRC-sudokus. The gss input format allows you to input and create all sorts of sudoku topologies. This document describes gss and how to use it. You can view this document online at https://bartp5.github.io/gss/.
The gss program was developed to solve a wide variety of sudokus. Furthermore, I wanted to write a solver that applies logic rather than the usual basic elimination and backtracking (systematic guessing) that most solvers seem to apply. The gss solver uses various logic strategies to solve a sudoku (see Section 2.3). The implemented logic is not sufficient to solve all sudokus so gss also uses backtracking as a last resort.
Before we continue some terminology that will be used throughout this document. A sudoku consists of elements. Each element is a member of at the very least one but generally more than one blocks. The blocks in a sudoku all have the same size. The block size of a sudoku equals the number of symbols in the sudoku. In a solved sudoku each element is assigned one symbol and the elements within each block together contain the complete set of symbols in the sudoku, i.e. two elements within one block may not share the same symbol. A standard 9x9 sudoku has 9 symbols (1-9), 81 elements arranged in a 9x9 square, and the blocks are 9 rows, 9 columns, and 9 non-obverlapping squares of 3x3 elements.
The sudokus gss can handle may have block sizes up to 128 symbols (if your platform supports 128 bit integers and gss was compiled with 128bit integer support, see Section 2.2).
The input format of gss allows you to specify the size of the sudoku, the block size and each block and its member elements, and the element symbol (if known). Valid sudoku files may also be used as a template for creating new sudokus. Furthermore, the gss format allows you to specify the output (ASCII) format in a flexible way, such that you can have gss display your sudoku in ASCII-art or let gss produce an input file for a 3rd party program which can then produce a prettier version of your sudoku.
To build gss first get the latest source code from https://github.com/bartp5/gss.
To compile gss type “make” and “make install”. In the Makefile you may change some options. The size option may have the following values:
This size indicate the maximum block size (i.e. the number of symbols). For M32bit the maximum block size is 32 symbols and for M128bit it is 128. This means that if you want to solve a 100x100 sudoku you need gss to be compiled with the M128bit option. The gss program extensively uses the popcount function on integers. You may use gss own popcount implementations or use the gcc popcount implementations (which are generally faster). To use a gcc popcount function you need to specify GCC_POPCNT<bits>, where <bits> matches the integer size. For 128 bit integers you may both specify GCC_POPCNT64 or GCC_POPCNT128. This is as with 128 bit integers a 128 bit popcount is optionally constructed using two 64 bit popcount operations.
To verify the correct behavior of gss use “make test”. This will invoke gss with a database with about 50000 sudokus with known solutions and report back whether gss correctly solved the sudokus.
The following strategies are used for solving the sudokus.
The different strategies and the maximum level at which the various strategies are applied can be controlled. Per default all strategies are enabled at the maximum level (blocksize-1). For standard sudokus, basic elimination and backtracking is usually faster than using higher level logic, i.e. if your aim is to solve many standard sudokus very fast you should disable the higher level logic (or consider getting a specialized sudoku solver for the task). However, the higher level logic has an important advantage over backtracking. The performance of the logic strategies scales better with sudoku size as compared to backtracking. For some large sudokus backtracking performance is quite poor. Furthermore, the logic strategies give you some sort of metric for the difficulty of the sudoku (the maximum level used to solve the sudoku), thus you can also generate a sudoku with a specific difficulty rating.
The syntax for specifying a sudoku file works with several tags. We first discuss the tags for defining the size and block structure of the sudoku.
There are several defaults that gss assumes if nothing else is defined. If no size is given gss assumes the size to be 81. If no block size is given gss assumes it to be 9. if no blocks are defined gss assumes <standardblocks>.
The following tags are to define the sudoku array itself and the output format of gss.
Some examples are described in Section 3.
the gss program is called from the command line as follows:
gss [options] <sudoku input files> ...
the following options are supported
In this section we present several examples of gss input files for various types of sudokus. You can find example input files for gss in the examples directory.
A basic input file for a standard sudoku could look like shown in Figure 1. As no size and blocks are specified gss assumes the standard size and block structure. The sudoku array is formatted such in ASCII that the sudoku is somewhat readable. The input formatting will be copied by gss for the puzzle and solution output (i.e. gss just copies one to one whatever follows the <sudoku> tag, editing only those characters it recognizes as elements in the sudoku array). This copying of the formatting makes it relatively easy to define your input file such that the output format is what you want it to be.
To solve this sudoku you can run gss sudoku.gss
An example of how you can create a specific sudoku variant can be found in Figure 2. This file is for an “NRC” sudoku, a sudoku with 4 additional blocks on top of the normal sudoku structure. In this example the additional blocks are defined using the <matrix> tag. If you do any type of non-standard block definition you always have to define all blocks as gss will not make any assumptions about the structure. As the remaining blocks are blocks in a standard sudoku we can use the <standardblocks> tag to define all remaining blocks with one tag. The sudoku array itself is again formatted in ASCII.
Another very common variation is the size of the sudoku. The basic structure of a 9x9 standard sudoku can be maintained for a series of sizes starting with a 4x4, 9x9, 16x16, 25x25, etc. In Figure 3 an example of a 16x16 sudoku is shown. As this time the size is not standard we need to tell gss the total number of elements in the sudoku with the <size>-tag, and the block size with the <blocksize>-tag. If the standard structure fits to the indicated size and block size, the <standardblocks>-tag can again be used to specify the all blocks for the standard sudoku structure. Depending on how gss is compiled gss has a maximum block-size of 128, i.e. the largest sudoku with the standard structure that is supported by gss is 121x121.
There are many databases around with many sudokus. It is useful to be able
to use gss on such a database to benchmark and verify gss. With the
--compact-format option you can pass a sudoku database. With this option gss
will read multiple sudokus from one file. It will simply read the file until it read 81
number characters, and take the result as one sudoku. Unknowns may be specified
by either a zero or a period. If gss encounters a “#” it will skip the remainder of
the line so you can add comments. Some commonly used sudoku databases are a
list of minimal sudokus (with only 17 hints) which can be downloaded from
http://staffhome.ecm.uwa.edu.au/~00013890/sudokumin.php.
Several other lists of hard sudokus can be found at
http://forum.enjoysudoku.com/the-hardest-sudokus-new-thread-t6539.html.
When it comes to benchmarking you will find that gss is considerably slower than many other solvers. The primary reason for this is the use of higher level logic. Per default gss tries to use logic and uses backtracking only as a last resort. As it is, however, backtracking is not all that inefficient on the relatively small grid of a standard 9x9 sudoku. In fact we can easily boost gss performance by disabling the use of higher level logic (either pass -S brute or -l 1). Without higher level logic gss performance improves quite a bit on standard sudokus. Note however, that on larger sudokus backtracking may perform very bad as compared to higher level logic.
The previous two examples use ASCII formatting for the sudoku. In general gss only handles ASCII in and output. However, the ASCII format is designed such that gss may provide the input for other progranms to produce prettier output. You can, for example, use gss in conjunction with the Asymptote program to create high quality sudoku graphics. To this end there are two tags in gss that are important. First of all the “<pattern>” tag, this tag specifies a pattern for a sudoku element. This way gss can tell elements apart from other numbers that may be part format requirements from another program. The second is the “<emptychar>” tag. Per default gss replaces unknowns with periods. This is clear in ASCII formatting but may not be desirable in conjunction with other programs. In figures 4.a , and 4.b an example is shown which was created with gss in conjunction with asymptote. In Fig. 4.a we show a pretty version of the same sudoku as in Figure 2. The input file to create these graphics can be found in Appendix A.
To create new sudokus with gss you have to supply gss with a template sudoku, i.e. an input file describing a sudoku. The template sudoku is first solved by gss and subsequently randomized. After randomization gss eliminates elements again to create the new sudoku. For this procedure one needs a valid sudoku as a template. In some cases you may not have a valid sudoku (e.g. if you create a new sudoku layout you may not know a valid solution for the grid). In this case you can have gss try to fill your sudoku for you. There are two algorithms for filling sudokus. The first algorithm (invoked using the -f option) starts with an empty grid and using backtracking until a valid solution is found. This works well for standard size sudokus but for larger grids this method becomes unusable as it becomes too slow. In this case you can try fill the sudoku with an optimization algorithm (invoked with -F). Note, that not all sudoku structures you may come up with have a valid solution. The backtracking algorithm will tell you if no valid solution is found. However, the optimizer has no way of telling as it is not systematic in its search for a solution. If the optimizer fails you may have to try the backtracker to verify a solution exists. In the following two sections I discuss JigSawMRF, a random field generator to aid the creation of jigsaw sudokus and the MakeBook.sh script which demonstrates the creation of various types of sudokus. You can use the script as a tool to create puzzle books or study it to learn how to make your own sudokus with gss.
A common variety of sudokus are jigsaw sudokus where some blocks are irregularly shaped. Usually a jigsaw sudoku consists of some regular blocks, such as rows and columns, and irregular blocks. To obtain a somewhat playable sudoku the cells in an irregular block are generally all connected, which makes it possible to make clear graphical representation of the irregular blocks. Naturally, gss has no problem solving a jigsaw sudoku as we can simply define the irregular blocks with the <block>or <matrix>tags. However, gss cannot create a new jigsaw structure. To generate new jigsaw sudoku structures the utility JigSawMRF is provided. The JigSawMRF utility generates a random field, which separates a field into a number of equal sized and connected blocks.
The elements in the random fields JigSawMRF generate all have an ID number and a coordinate in an N dimensional space. The ID number can be used in gss to specify which cells belong to which block. The coordinates were introduced to aid the graphical representation of the field. The number of dimensions is at least one but can otherwise be chosen freely (so yes, you can create hyper dimensional jigsaw sudokus).
The JigSawMRF program starts with a specification of the field, i.e. which elements are in it, and for each element, which elements are its neighbors. The elements are than assigned random ”phases”, i.e. are assigned to a random block. Then JigSawMRF iteratively swaps element phases until all elements of a specific phase are directly connected or connected via neighbors of the same phase.
The JigSawMRF program is called from the command line as follows:
The following options with arguments are supported:
where, (<x>,<y>,...) specifies the coordinate of each element in n dimensional space. You can provide an arbitrary number of dimensions provided all elements coordinates are specified with the same number of dimensions. The element ID’s specify the neighbors of the element where the element ID’s are numbered in the order of specification the file, starting at 1.
The script MakeBook.sh can create pdf’s with puzzles and solutions for various types of sudokus. The script relies on pdflatex, asymptote, and naturally also on gss and JigSawMRF. The script is located in a directory with several gss input files and other shell scripts which are requires for the proper functioning of the script. Usage:
The following options with arguments are supported:
The MakeBook.sh script knows several sudoku types. Samples of the sudoku types can be found below. The sudoku types are: