Connect Four Hash Function: Map close elements to close hash keys - algorithm

I'm writing a Connect Four game engine. Currently I'm using Zobrist hashing to generate hash keys for different Connect Four board positions (evaluated board positions are stored in a hash table so the same work isn't done twice). The board positions evaluated (nodes in a minimax tree) are always close to each other. Unfortunately, close board positions are mapped to uniformly distributed hash keys, leading to a lot of CPU cache misses.
Is it possible to build a hash function which maps close board positions to close hash keys?
A board position for one player is represented by a bitboard of following structure:
. . . . . . . TOP
5 12 19 26 33 40 47
4 11 18 25 32 39 46
3 10 17 24 31 38 45
2 9 16 23 30 37 44
1 8 15 22 29 36 43
0 7 14 21 28 35 42
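For reference, the keys are generated in the standard Zobrist way: one random 64-bit number per (square, player), XORed together for every stone on the board. A minimal sketch of that scheme (illustrative only, not my actual engine code):

import random

random.seed(42)                      # fixed seed so the keys are reproducible
NUM_SQUARES = 49                     # 7 columns x 7 bits (6 rows + TOP sentinel row)
ZOBRIST = [[random.getrandbits(64) for _ in range(NUM_SQUARES)] for _ in range(2)]

def zobrist_hash(bitboards):
    """bitboards[player] is that player's bitboard, bit i = stone on square i."""
    h = 0
    for player in (0, 1):
        bb = bitboards[player]
        while bb:
            square = (bb & -bb).bit_length() - 1   # index of the lowest set bit
            h ^= ZOBRIST[player][square]
            bb &= bb - 1                           # clear that bit
    return h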
I don't know if it is even possible.
Thanks for your help!

I don't think this is possible. A good hash key (like Zobrist hashing for board games) will most likely have pseudo-random properties to achieve a uniform distribution of keys in the transposition table. Keeping the keys of "close" positions close to each other in the table contradicts this.
Consider this: even if you mapped your board positions one-to-one to a table with (2^7-1)^7 entries, you would not be able to map "close" board positions to close memory locations: if a piece at a low index changes, the positions land near each other, but each step up in piece index doubles the distance, and changes at the highest indices end up many terabytes apart ;-)
As the author of a chess engine I know this problem. AFAIK nobody has solved it yet, and everybody uses Zobrist hashing, maybe with some minor modifications.
Anyway, good luck solving Connect Four... I know it has been done before, but it is more satisfying to do it yourself ;-)

Here is how to modify your presumably nearly uniformly random hash function to bias it so that similar board positions are somewhat likely to end up at nearby hashes.
Let hash(gamestate) be your existing function. We'll create a newhash(gamestate) that uses hash for the random behavior, but has a reasonably high probability of generating hashes that are near each other for closely related game states.
Let the 'color' of a board state be the next player to move. If you want to find the hash key for a white-to-move position, use newhash(board) = hash(board). If you want to find the hash for a black-to-move position, find the black piece with the maximal number according to your ordering, say at position i. Remove piece i from the game state and call the modified state probableparent. Then use newhash(board) = hash(probableparent) + i. If you order the positions by likely order of placement (higher cells come later as a first-order criterion, maybe the middle columns come earlier as a second criterion? I don't really know good Connect Four strategy), then it is somewhat likely that the white turn before this black turn was at probableparent, hence it is already in your cache and i lands nearby. Also, the (at most) seven possible black moves will likely share the same probableparent state and hence get nearby hash locations.
You can extend this idea to roll back more than one ply at a time. Say, if the current turn % 3 == 2, remove the two maximal moves at board positions i and j, and then use newhash(board) = hash(board-two-removals-ago) + i*48 + j.
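A minimal sketch of the one-ply version in Python, assuming the bitboard layout from the question; base_hash stands in for whatever Zobrist-style hash you already have, and the highest set bit is used as the "maximal" piece:

def newhash(white_bb, black_bb, white_to_move, base_hash):
    """Locality-biased wrapper around an existing hash base_hash(white_bb, black_bb)."""
    if white_to_move or black_bb == 0:
        return base_hash(white_bb, black_bb)      # no black stone to strip yet
    # Black to move: strip the black piece with the highest index (the probable parent),
    # then offset by that index so positions sharing that parent land close together.
    i = black_bb.bit_length() - 1                 # maximal piece index in this ordering
    probable_parent_black = black_bb & ~(1 << i)
    return base_hash(white_bb, probable_parent_black) + i

Positions that differ only in where that last black piece went all map to hash(probableparent) plus a small offset i, so they cluster in the transposition table instead of scattering across it.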

Related

How to get powerset from a set with 3600 elements using as little memory as possible

I have been looking for a language and code to help me calculate all possible subsets of a set of 3600 elements. At first my search started with Python, then I went through JavaScript and then came to Perl. I know that using Perl to calculate all subsets as shown in https://rosettacode.org/wiki/Power_set with 16 GB of RAM means significant memory consumption, but I'm not sure whether there is anything better than Perl or the script below:
MY MWE:
use ntheory "forcomb";
my @S = qw/1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30/;
forcomb { print "[@S[@_]] " } scalar(@S);
print "\n";
There is no calculator that can handle so many elements in memory.
The number of possible subsets starting from a set of 3600 elements is 2^3600.
This number is very big. Consider that
2^10 is close to 1,000
2^20 is close to 1,000,000
2^30 is close to 1,000,000,000
Basically every extra 10 in the exponent adds three more zeros, so 2^3600 is a number with roughly 1080 zeros: an unimaginably big number of different combinations.
You can't solve this problem even by saving the data to disk and using all the computers on Earth.
With all the computers existing on Earth (a number close to 2,000,000,000, so about 2^31 computers), and imagining a terabyte of disk space for each of them (2^40 bytes), you could store information for a set of about 71 elements (71, not 3600), using a single byte to store each number and without considering the extra space needed to store the set information... draw your own conclusions from that.
Alternatively, you could imagine giving a sort order to all the possible subsets and coding an algorithm that gives you the n-th subset according to that order. This can be done because you don't need to calculate and store all possible subsets, just compute a single one from a rule. If you are interested, we can try to evaluate such a solution.
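For example, if the sort order is simply the binary value of the subset's index (bit i set means element i is in the subset), the n-th subset can be computed directly without generating any of the others. A sketch in Python, whose integers are arbitrary precision, so n can be anything up to 2^3600 - 1:

def nth_subset(elements, n):
    """Return the n-th subset of `elements`, where bit i of n selects elements[i]."""
    return [e for i, e in enumerate(elements) if (n >> i) & 1]

universe = list(range(1, 3601))                       # a 3600-element set
print(nth_subset(universe, 5))                        # [1, 3]: bits 0 and 2 of 5 are set
print(nth_subset(universe, 2**3600 - 1) == universe)  # True: the last index is the full set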
For a set s (with size |s|), the size of its power set P(s) is |P(s)| = 2^|s|.
Never mind the memory. You'd need 2^3600 iterations to calculate each value.
This is totally computationally intractable in this universe.
Take Java (or another compiled language like Pascal with some bit support).
It has BitSet, so 3600 elements are represented with approximately 3600/8 = 450 bytes. All possibilities would be 2^3600: too much to iterate. One could iterate with a BigInteger, every i-th bit representing an element.
Simply iterating with a BigInteger up to 2^3600 - 1 would be your (and your descendants') life's work. Be aware that this kind of problem is something for quantum computing.
But I assume you have a very smart algorithm pruning most possibilities.
It would be nice to have dependencies like in sudoku. Then maybe a logic language or some rule engine might do.
Should 3600 be the number of seconds in an hour that you have to combine, please consider spending that hour otherwise. 😉

Get all possible valid positions of ships in battleship game

I'm creating a probability assistant for the Battleship game - in essence, for a given game state (field state and available ships), it would produce a field where every free cell is assigned a probability of containing a hit.
My current approach is a Monte-Carlo-like computation: pick a random free cell, a random ship and a random ship rotation, check whether this placement is valid, and if so continue with the next ship from the available set. If the available set becomes empty, add the resulting ship arrangement to an output stack. Repeat this many times and use the outputs to compute the probability of each cell.
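In code, that sampling loop looks roughly like this (a simplified sketch: the fleet composition is just an example, and only known misses are taken into account, not confirmed hits):

import random
from collections import Counter

SIZE = 10
SHIPS = [5, 4, 4, 3, 3, 3, 2, 2, 2, 2]            # example fleet, adjust to the variant

def random_layout(blocked, tries=1000):
    """Try to place the whole fleet on a SIZE x SIZE board, avoiding `blocked` cells."""
    for _ in range(tries):
        occupied = set()
        complete = True
        for length in SHIPS:
            placed = False
            for _ in range(100):                   # random restarts per ship
                horizontal = random.random() < 0.5
                r = random.randrange(SIZE - (0 if horizontal else length - 1))
                c = random.randrange(SIZE - (length - 1 if horizontal else 0))
                cells = {(r, c + i) if horizontal else (r + i, c) for i in range(length)}
                if not (cells & occupied) and not (cells & blocked):
                    occupied |= cells
                    placed = True
                    break
            if not placed:
                complete = False
                break
        if complete:
            return occupied
    return None

heat = Counter()                                   # hit-probability heatmap accumulator
misses = set()                                     # cells already shot and found empty
for _ in range(2000):
    layout = random_layout(misses)
    if layout:
        heat.update(layout)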
Is there a sane algorithm to process all possible ship placements for a given field state?
An exact solution is possible. But it does not qualify as sane in my books.
Still, here is the idea.
There are many variants of the game, but let's say that we start with a worst case scenario of 1 ship of size 5, 2 of size 4, 3 of size 3 and 4 of size 2.
The "discovered state" of the board is all spots where shots have been taken, or ships have been discovered, plus the number of remaining ships. The discovered state naively requires 100 bits for the board (10x10, any can be shot) plus 1 bit for the count of remaining ships of size 5, 2 bits for the remaining ships of size 4, 2 bits for remaining ships of size 3 and 3 bits for remaining ships of size 2. This makes 108 bits, which fits in 14 bytes.
Now conceptually the idea is to figure out the map by shooting each square in turn in the first row, the second row, and so on, and recording the game state along with transitions. We can record the forward transitions and counts to find how many ways there are to get to any state.
Then find the end state of everything finished and all ships used and walk the transitions backwards to find how many ways there are to get from any state to the end state.
Now walk the data structure forward, knowing the probability of arriving at any state while on the way to the end, but this time we can figure out the probability of each way of finding a ship on each square as we go forward. Sum those and we have our probability heatmap.
Is this doable? In memory, no. In a distributed system it might be though.
Remember that I said that recording a state took 14 bytes? Adding a count to that takes another 8 bytes which takes us to 22 bytes. Adding the reverse count takes us to 30 bytes. My back of the envelope estimate is that at any point in our path there are on the order of a half-billion states we might be in with various ships left, killed ships sticking out and so on. That's 15 GB of data. Potentially for each of 100 squares. Which is 1.5 terabytes of data. Which we have to process in 3 passes.

How to generate/iterate all "Connect-Four" games?

I need to iterate all different "Connect-Four" games possible.
The grid has 42 cells, and there are 21 red and 21 yellow pieces.
Every game generated must use every piece, and all pieces of the same color are indistinguishable (e.g. if you swap two reds in a solution, it doesn't count as another solution).
From that I can draw the conclusion that there are 42! / (21! * 21!) = 538,257,874,440 distinct ways to fill the grid.
I'm thinking about generating binary strings containing 21 zeros and 21 ones, but besides generating every 42-character-long binary string and testing them one by one, I don't have any idea how to do that. That would be 42! (1.4050061e+51) strings to test, so that's not an option.
How would you go about generating all these possible games ?
It seems that you do not care that some of these games would have ended early. To simply generate all of the possible combinations, you should think of the board as a matrix, where a 1 represents a yellow piece and a 0 represents a red one.
Now if we vectorize the matrix of values for a full board, then we will get something like
[0,1,1,0,...]
where the exact order depends on the permutation. Now since we have 21 of each color, that means that you are essentially asking for all of the possible permutations of the vector
[ones(1,21),zeros(1,21)]
(in Matlab and Python notation). In Matlab, you would then generate the list of all permutations by using the function
perms([ones(1,21),zeros(1,21)])
I am not sure what you want here because obviously it is not feasible to enumerate all of these in practice. If you are just interested in how to do it, I would suggest that you look in the Matlab implementation. It looks like 10 lines of pretty simple code.
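Since it is the 21-of-each-colour constraint that makes a plain perms call explode with duplicates, here is a sketch (in Python rather than Matlab) that enumerates only the distinct arrangements, by choosing which 21 of the 42 cells hold a 1:

from itertools import combinations
from math import comb

CELLS = 42
print(comb(CELLS, 21))                   # 538,257,874,440 distinct filled boards

def full_boards():
    """Yield each distinct filled board as a 42-character string of 1s and 0s."""
    for ones in combinations(range(CELLS), 21):
        chosen = set(ones)
        yield "".join("1" if i in chosen else "0" for i in range(CELLS))

boards = full_boards()                   # lazily enumerated; far too many to materialise
for _ in range(3):
    print(next(boards))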

Generating Settlers of Catan Numbers?

I am trying to generate a Settlers of Catan game board and am stuck trying to create an efficient implementation of hex numbers.
The goal is to randomly generate a set of numbers from 2-12 (with only one instance of 2 and 12, and two instances of every number in between, except 7, which is never placed), ensuring that 6 and 8 are not adjacent to one another on the hex grid. 6 and 8 are special because they are the numbers you are most likely to roll, so the game does not want these next to one another, as players would get disproportionately more resources of that kind. A 7 means you have to discard resources.
The expected result: http://imgur.com/Ng7Siy8
Right now I have a working brute force implementation that is very slow and I am hoping to optimize it, but I am not sure how. The implementation is in VBA, which has constrained the data structures I can use.
In pseudo code I am doing something like this:
For each of the 19 hexes
    Loop until we have a valid number
        Generate a random number between 1 and 12
        Check
            Have we already placed too many of that number?
            Is the number equal to 6 or 8?
            Is the number being placed on a hex next to another hex with 6 or 8 placed on it?
        If valid
            Place
        If invalid
            Regenerate random number
It's very manual and depends on the random number generator, which means a run can be anywhere from really short to really, really long (compounded over 19 hexes).
Note: How my numbers are being placed seems important. I start at the outside of the gameboard (see here http://imgur.com/Ng7Siy8) on the gray hex with number 6, and then move counter clockwise around the board inward. This means that my next hex is 2 light green, 4 light orange...continuing around to 9 dark green and then coming inwards to 4 light orange.
This pattern limits the number of comparisons I need to make.
There are several optimizations you can do. First of all, you know exactly how many of each number are present across the tiles: 2,3,3,4,4,5,5,6,6,8,8,9,9,10,10,11,11,12. So start off with this set of numbers - you eliminate the check for whether a number has been placed too many times. Now you can do a random shuffle of this set and check whether the result is "valid". This will still result in too many negative checks I think, but it should perform better than your current approach.
Place the 8 first, calculate which of the remaining tiles you'd be happy to place the 6 on (i.e. non-adjacent ones), then choose one at random for the 6. Then place the rest.
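Combining the two suggestions, a sketch in Python for the standard two-6s/two-8s chit set; HEXES and ADJACENT are placeholders you would fill in with your board's actual hex indices and adjacency:

import random

CHITS = [2, 3, 3, 4, 4, 5, 5, 6, 6, 8, 8, 9, 9, 10, 10, 11, 11, 12]
HEXES = list(range(18))                  # the 18 numbered hexes (desert excluded) - illustrative
ADJACENT = {h: set() for h in HEXES}     # assumption: fill with the real hex adjacency

def place_numbers():
    """Reserve mutually non-adjacent hexes for the 6s and 8s, then shuffle in the rest."""
    while True:
        hexes = HEXES[:]
        random.shuffle(hexes)
        hot, rest = [], []
        for h in hexes:
            if len(hot) < 4 and not (ADJACENT[h] & set(hot)):
                hot.append(h)            # candidate spot for a 6 or an 8
            else:
                rest.append(h)
        if len(hot) < 4:
            continue                     # unlucky shuffle, try again
        board = dict(zip(hot, [6, 6, 8, 8]))
        others = [n for n in CHITS if n not in (6, 8)]
        random.shuffle(others)
        board.update(zip(rest, others))
        return board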

How to work out the complexity of the game 2048?

Edit: This question is not a duplicate of What is the optimal algorithm for the game 2048?
That question asks 'what is the best way to win the game?'
This question asks 'how can we work out the complexity of the game?'
They are completely different questions. I'm not interested in which steps are required to move towards a 'win' state - I'm interested in finding out whether the total number of possible steps can be calculated.
I've been reading this question about the game 2048 which discusses strategies for creating an algorithm that will perform well playing the game.
The accepted answer mentions that:
the game is a discrete state space, perfect information, turn-based game like chess
which got me thinking about its complexity. For deterministic games like chess, it's possible (in theory) to work out all the possible moves that lead to a win state and work backwards, selecting the best moves that keep leading towards that outcome. I know this leads to a huge number of possible moves (something in the range of the number of atoms in the universe)... but is 2048 more or less complex?
Pseudocode:
for the current arrangement of tiles
- work out the possible moves
- work out what the board will look like if the program adds a 2 to the board
- work out what the board will look like if the program adds a 4 to the board
- move on to working out the possible moves for the new state
At this point I'm thinking I will be here a while waiting on this to run...
So my question is - how would I begin to write this algorithm - what strategy is best for calculating the complexity of the game?
The big difference I see between 2048 and chess is that the program can select randomly between a 2 and a 4 when adding new tiles - which seems to add a massive number of additional possible moves.
Ultimately I'd like the program to output a single figure showing the number of possible permutations in the game. Is this possible?!
Let's determine how many possible board configurations there are.
Each tile can be either empty, or contain a 2, 4, 8, ..., 512 or 1024 tile.
That's 12 possibilities per tile. There are 16 tiles, so we get 16^12 = 2^48 possible board states - and this most likely includes a few unreachable ones.
Assuming we could store all of these in memory, we could work backwards from all board states that would generate a 2048 tile in the next move, doing a constant amount of work to link reachable board states to each other, which should give us a probabilistic best move for each state.
To store all bits in memory, let's say we'd need 4 bits per tile, i.e. 64 bits = 8 bytes per board state.
2^48 board states would then require 8 * 2^48 = 2,251,799,813,685,248 bytes = 2048 TB (not to mention the added overhead of keeping track of the best boards). That's a bit beyond what a desktop computer has these days, although it might be possible to cleverly limit the number of boards required at any given time so as to get down to something that will fit on, say, a 3 TB hard drive, or perhaps even in RAM.
For reference, chess has an upper bound of 2^155 possible positions.
If we were to actually calculate, from the start, every possible move (in a breadth-first search-like manner), we'd get a massive number.
This isn't the exact number, but rather a rough estimate of the upper bound.
Let's make a few assumptions: (which definitely aren't always true, but, for the sake of simplicity)
There are always 15 open squares
You always have 4 moves (left, right, up, down)
Once the total sum of all tiles on the board reaches 2048, it will take the minimum number of combinations to get a single 2048 (so, if placing a 2 makes the sum 2048, the combinations will be 2 -> 4 -> 8 -> 16 -> ... -> 2048, i.e. taking 10 moves)
A 2 will always get placed, never a 4 - the algorithm won't assume this, but, for the sake of calculating the upper bound, we will.
We won't consider the fact that there may be duplicate boards generated during this process.
To reach 2048, there needs to be 2048 / 2 = 1024 tiles placed.
You start with 2 randomly placed tiles, then repeatedly make a move and another tile gets placed, so there's about 1022 'turns' (a turn consisting of making a move and a tile getting placed) until we get a sum of 2048, then there's another 10 turns to get a 2048 tile.
In each turn, we have 4 moves, and there can be one of two tiles placed in one of 15 positions (30 possibilities), so that's 4*30 = 120 possibilities.
This would, in total, give us 120^1032 possible states.
If we instead assume a 4 will always get placed, we get 120^519 states.
Calculating the exact number will likely involve working our way through all these states, which won't really be viable.
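Just to put those two estimates into perspective, a couple of lines of exact big-integer arithmetic:

print(len(str(120 ** 1032)))   # the 120^1032 bound has 2146 decimal digits
print(len(str(120 ** 519)))    # the 120^519 bound has 1080 decimal digits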
