Make it learn. First 2/5th's of the game should be easy, and that's a big piece of the problem. locally defined terms: swath == x by y sequence of positions contained by at least two empty positions, horizontally and vertically, on all sides, and by at least one empty position diagonally. x>=3, y>=3, x*y<=81. This is at Nikken Tobi, or Kogeima, or less area == 4 by d+1 strip of positions, not limited by played stones in any way. d is length of the side of the board. a 19x19 board has 36 (?) areas - not too much to check + swaths can be operated upon largely independently. If it is found that they cannot, they should be redefined to do so + swaths of different sizes can easily be considered, and should be, particularly because the large swath histogram would take relatively long to fill up. Look for largest swath first, than fall back on successively smaller swaths + let empty be 0, black be 1, white be 2, edge piece be 3. + 2 bits per vertex is feasible, but packed formats are slow. 1 byte would be faster. + 81 bytes for a single 9x9 swath isn't bad. However, 531K or 132K is highish for an exhaustive mapping of all possibilities within a 9x9. + histograms should not be expected to become fully populated. Some moves are illegal. Note that dumb moves would probably be represented, tho more "slowly" than good moves. + in the sparse (one byte per) format, space required for a 100% populated histogram is x^6 bytes, where "x" is the length of a swath's side. for the sparse format, of course, that's (x^6)/4 bytes. + combining swaths may reduce to + random placements may be desirable for a while, at Least + some kind of basic "this is a legal move" checking should be added, if nothing else just to get a good reality check into the histogram-driven moves + consider digging up a ton of SGF files to do a histogram of "winning swaths" + things that make a swath or area "good": + eventually, the game is won. This should deter overaggressive play. + a high number of the opponent's stones (defined by units that may extend outside the swath) are captured from it + at the end of the game, a high number of empty spaces are bordered by portions of this swath - partially or wholly. wholly might count for more + swaths and areas may need different rules for what makes them "good" + this "good swath" things should probably be stored separately, so that they can be weighted, and these weightings could be adjusted on the fly + stored swaths applied to swaths on-board count more than stored areas applied to areas on-board + stored areas applied to unknown (or too large) on-board swaths count more than stored swaths applied to same + functions for rotating swaths and areas are important. 0, 90, 180 and 270. if feeling lazy, 90 could be used to construct 180 and 270, but that should be encapsulated + consider storing swath's (or whatever) in a sparse unix file. All that's stored is data describing how good the swath is. The offset of the swath, is all the describes the swath itself + consider making the canonicalization function, invert the colors in the swath half the time (based on, say, the color of the piece closest to the upper left, scanning toward the right). This could halve the size of the databases + consider making the canonicalization function, sometimes rotate. It's obvious when to do this if one edge is included, or if a corner is included. It's less obvious for swaths having no edge pieces at all, but may be very beneficial there too. this could potentially reduce the database size to 25% of what it would be otherwise + possible way of canonicalizing stuff with and without an edge: go ahead and rotate a swath 0, 90, 180 and 270. Then treat each as a number, base 4. Pick the smallest number. This should encompass rotation for edges. Ensure that empty is 0, as there should be plenty of them; this should concentrate the swaths at the beginning of the file storing the database. Also make edge 3, as there should never be more of them than the other three combined. Then again, reverse that: it'll make the file look longer to ls, but it should make it actually take less room on disk to have the least significant bytes varying most frequently + possibility for making stored swaths more applicable to endgame: if a suggested move is on the vary edge of the in-database swath, don't choose it. ---------------------------------------------------------------------- SGF + it is apparently common (but deprecated) practice to add stones to a board when commenting it. Hence, when parsing up SGF (or MGT) format files, it may be important to apply a consistency check to ensure that the sequence of steps is valid. If any are invalid, the whole thing should probably be thrown out + sgf spec is at http://nobi.ethz.ch/martin/sgfspec.html + converting sgf to other formats: + http://www.cwi.nl/~jansteen/sgf2misc/sgf2misc.html + prog/sg2ishi05.sh.Z (problem with variations, and short format) + prog/gobase20.zip (does variations .exe and C source, portable?) + smart go (sgf) and ishi (sf) are most common. also liberty ---------------------------------------------------------------------- Finding an archive of nicely-formatted, hopefully skillfully played, games. + collectors (without public info?) + Dave Dyer apparently collects games. http://www.andromeda.com/people/ddyer/go-program.html. They weren't available when I looked. + http://www.cwi.nl/~jansteen/gobase/gobase.html, joseki database, other + archives of available data + http://ltiwww.epfl.ch/~warkent/go/stuff/definitions.html http, sgf ftp://igs.nuri.net/Go/games/ ftp://igs.nuri.net/Go/pro/ ---------------------------------------------------------------------- lots of good info about computerizing go: http://www.psy.uq.edu.au:8080/~jay/go/CS-TR-339.html