Make it learn. First 2/5th's of the game should be easy, and that's a big
piece of the problem.
locally defined terms:
swath == x by y sequence of positions contained by at least two
empty positions, horizontally and vertically, on all sides,
and by at least one empty position diagonally. x>=3, y>=3,
x*y<=81. This is at Nikken Tobi, or Kogeima, or less
area == 4 by d+1 strip of positions, not limited by played stones in
any way. d is length of the side of the board. a 19x19 board
has 36 (?) areas - not too much to check
+ swaths can be operated upon largely independently. If it is found that
they cannot, they should be redefined to do so
+ swaths of different sizes can easily be considered, and should be,
particularly because the large swath histogram would take relatively
long to fill up. Look for largest swath first, than fall back on
successively smaller swaths
+ let empty be 0, black be 1, white be 2, edge piece be 3.
+ 2 bits per vertex is feasible, but packed formats are slow. 1 byte would
be faster.
+ 81 bytes for a single 9x9 swath isn't bad. However, 531K or 132K is highish
for an exhaustive mapping of all possibilities within a 9x9.
+ histograms should not be expected to become fully populated. Some moves
are illegal. Note that dumb moves would probably be represented, tho more
"slowly" than good moves.
+ in the sparse (one byte per) format, space required for a 100% populated
histogram is x^6 bytes, where "x" is the length of a swath's side. for
the sparse format, of course, that's (x^6)/4 bytes.
+ combining swaths may reduce to
+ random placements may be desirable for a while, at Least
+ some kind of basic "this is a legal move" checking should be added, if
nothing else just to get a good reality check into the histogram-driven
moves
+ consider digging up a ton of SGF files to do a histogram of "winning
swaths"
+ things that make a swath or area "good":
+ eventually, the game is won. This should deter overaggressive play.
+ a high number of the opponent's stones (defined by units that may
extend outside the swath) are captured from it
+ at the end of the game, a high number of empty spaces are bordered by
portions of this swath - partially or wholly. wholly might count for
more
+ swaths and areas may need different rules for what makes them "good"
+ this "good swath" things should probably be stored separately, so that
they can be weighted, and these weightings could be adjusted on the fly
+ stored swaths applied to swaths on-board count more than stored areas
applied to areas on-board
+ stored areas applied to unknown (or too large) on-board swaths count
more than stored swaths applied to same
+ functions for rotating swaths and areas are important. 0, 90, 180 and 270.
if feeling lazy, 90 could be used to construct 180 and 270, but that should
be encapsulated
+ consider storing swath's (or whatever) in a sparse unix file. All that's
stored is data describing how good the swath is. The offset of the swath,
is all the describes the swath itself
+ consider making the canonicalization function, invert the colors in the
swath half the time (based on, say, the color of the piece closest to the
upper left, scanning toward the right). This could halve the size of the
databases
+ consider making the canonicalization function, sometimes rotate. It's
obvious when to do this if one edge is included, or if a corner is
included. It's less obvious for swaths having no edge pieces at all,
but may be very beneficial there too. this could potentially reduce
the database size to 25% of what it would be otherwise
+ possible way of canonicalizing stuff with and without an edge: go ahead
and rotate a swath 0, 90, 180 and 270. Then treat each as a number, base
4. Pick the smallest number. This should encompass rotation for edges.
Ensure that empty is 0, as there should be plenty of them; this should
concentrate the swaths at the beginning of the file storing the database.
Also make edge 3, as there should never be more of them than the other
three combined. Then again, reverse that: it'll make the file look longer
to ls, but it should make it actually take less room on disk to have the
least significant bytes varying most frequently
+ possibility for making stored swaths more applicable to endgame: if a
suggested move is on the vary edge of the in-database swath, don't choose
it.
----------------------------------------------------------------------
SGF
+ it is apparently common (but deprecated) practice to add stones to a board
when commenting it. Hence, when parsing up SGF (or MGT) format files, it
may be important to apply a consistency check to ensure that the sequence
of steps is valid. If any are invalid, the whole thing should probably be
thrown out
+ sgf spec is at http://nobi.ethz.ch/martin/sgfspec.html
+ converting sgf to other formats:
+ http://www.cwi.nl/~jansteen/sgf2misc/sgf2misc.html
+ prog/sg2ishi05.sh.Z (problem with variations, and short format)
+ prog/gobase20.zip (does variations .exe and C source, portable?)
+ smart go (sgf) and ishi (sf) are most common. also liberty
----------------------------------------------------------------------
Finding an archive of nicely-formatted, hopefully skillfully played,
games.
+ collectors (without public info?)
+ Dave Dyer apparently collects games.
http://www.andromeda.com/people/ddyer/go-program.html. They weren't
available when I looked.
+ http://www.cwi.nl/~jansteen/gobase/gobase.html, joseki database, other
+ archives of available data
+ http://ltiwww.epfl.ch/~warkent/go/stuff/definitions.html http, sgf
ftp://igs.nuri.net/Go/games/
ftp://igs.nuri.net/Go/pro/
----------------------------------------------------------------------
lots of good info about computerizing go:
http://www.psy.uq.edu.au:8080/~jay/go/CS-TR-339.html