Download the code for this article here.
To implement a computer player for the game "Connect Four", I started out by building an environment, a state model and a simple console-based interface to test it.
The console-based interface works by showing the current state as 7 columns with 6 rows each. When it's the user's turn, he/she chooses one of the 7 columns by entering a number from 1 to 7. It is a very simple and basic UI because I wanted to spend my time on developing the search algorithm that determines the computer's moves.
To run the code, compile the project with a C# 2.0 compiler and run the executable. I've included solution and project files for VS2005 as well.
The console program also contains some experimental functionality that allows you to set up the computer to play against itself. This code was used when I was optimizing the weights of the evaluation algorithm, but is now commented out.
The Connect Four game has been "solved" by Victor Allis, and there is a "perfect play" path you can follow that will force a win for the starting player. However, I have chosen to disregard that playing strategy in this implementation, since the goal was to learn about AI game-playing.
Connect4: The main game class, which holds the current state and controls the flow of the game. It calls each of the two players in turn, typically Computer and Human, and executes their actions.
StateType: A state, containing the board, the number of moves made, and whose turn it is next. It also contains methods to clone itself and perform a move, as well as to evaluate whether it is in a game-over state.
Computer: The class that holds all the logic related to the search algorithm. It is called like a player, via the PlayerTurn(StateType state) method, and then performs a modified MiniMax search with Alpha-Beta pruning to determine which action to choose. Its functionality is described below.
Human: The human player. The PlayerTurn(StateType state) method is called by Connect4 every time it's the user's turn. It presents the state on the console and waits for input. When an action has been entered, it is returned, the move is made and the turn changes.
The Search algorithm is a modified MiniMax algorithm with Alpha-Beta pruning. It is implemented in the "Computer" class, which controls the computer player.
Using double recursion, it performs a depth-first search of the state space until the cut-off method evaluates to true. At that point it uses the heuristic evaluation method to determine how close it (the computer player) is to winning. This value is passed up through the state tree, where each node selects either the maximum or the minimum of its children's values, depending on whether it represents the computer's turn or the player's turn. The procedure is in fact quite similar to how a typical human player subconsciously plays the game: first enumerate the possible actions from the current state, then estimate the opponent's response to each of the resulting states, and decide which would leave you in the most favorable position. However, human (non-expert) players will typically only look 2-3 moves ahead, while the computer can search a much larger state space.
To further improve performance, Alpha-Beta pruning has been applied. Essentially, this means remembering the best (or worst) option found so far at each node, so no time is wasted exploring branches of the tree that have already been identified as paths that will never be played.
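The shape of the search can be sketched as follows. This is a generic depth-limited minimax with alpha-beta pruning over a tiny hard-coded game tree, not the actual Computer class (and it is written in Java here for illustration, while the project itself is C# 2.0); the tree, its leaf values and the two-children-per-node layout are all assumptions made for the sketch:

```java
public class AlphaBeta {
    // Leaf values of a tiny complete binary game tree of depth 3
    // (a stand-in for evaluated Connect Four states at the cut-off depth).
    static int[] leaves = {3, 5, 6, 9, 1, 2, 0, -1};

    // Minimax with alpha-beta pruning over the implicit tree.
    // 'node' indexes the current subtree; 'depth' counts remaining plies.
    static int search(int node, int depth, int alpha, int beta, boolean maximizing) {
        if (depth == 0) return leaves[node];          // cut-off: evaluate leaf
        int best = maximizing ? Integer.MIN_VALUE : Integer.MAX_VALUE;
        for (int child = 0; child < 2; child++) {
            int value = search(node * 2 + child, depth - 1, alpha, beta, !maximizing);
            if (maximizing) {
                best = Math.max(best, value);
                alpha = Math.max(alpha, best);        // best option for MAX so far
            } else {
                best = Math.min(best, value);
                beta = Math.min(beta, best);          // best option for MIN so far
            }
            if (beta <= alpha) break;                 // prune the remaining siblings
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(search(0, 3, Integer.MIN_VALUE, Integer.MAX_VALUE, true)); // prints 5
    }
}
```

The "double recursion" in the article corresponds to the single recursive method above flipping the `maximizing` flag on each ply; splitting it into explicit min and max methods is equivalent.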
Heuristic Evaluation method
The evaluation method is supposed to evaluate how good a given state is for the computer, i.e. how close it is to winning the game. After having experimented with several different models, I found an approach for evaluating a state that suited me best. In order to win, a player must have 4 fields in a row, horizontally, vertically or diagonally. To find out how close both players were to this, I decided to examine all possible combinations of 4 consecutive fields on the 7×6 board. First I break the board down into lines, horizontally, vertically and diagonally, and evaluate each line. All lines examined must of course be 4 fields or longer.
As an example, a line with 7 fields would lead to the following possible winning combinations:
1 2 3 4 5 6 7 => 1234, 2345, 3456, 4567
For each of these combinations, my evaluation method examines how many of its fields are taken by each player. If neither player, or both players, have pieces within the same 4-field combination, the combination is instantly discarded (it can never become 4-in-a-row for either side). Otherwise it adds to a state score depending on the number of fields taken, i.e. how close it is to being a 4-in-a-row. Depending on that number, and on whether the fields belong to the computer or the player, a certain weight is applied.
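The sliding-window scoring of a single line can be sketched like this (again in Java for illustration, not the actual StateType/Computer code). The 0/1/2 encoding for empty/computer/human and the squared weights are assumptions for the sketch; the squared weighting matches the initial weights described in the next section:

```java
public class LineScore {
    // Cell values: 0 = empty, 1 = computer, 2 = human (assumed encoding).
    // Scores one line of the board by sliding a 4-field window across it.
    static int scoreLine(int[] line) {
        int score = 0;
        for (int start = 0; start + 4 <= line.length; start++) {
            int computer = 0, human = 0;
            for (int i = start; i < start + 4; i++) {
                if (line[i] == 1) computer++;
                else if (line[i] == 2) human++;
            }
            // A window containing pieces from both players (or from neither)
            // can never become 4-in-a-row, so it is discarded immediately.
            if ((computer > 0) == (human > 0)) continue;
            // Squared weight: computer windows count positive, human negative.
            if (computer > 0) score += computer * computer;
            else score -= human * human;
        }
        return score;
    }

    public static void main(String[] args) {
        // Windows: [1,1,0,0] -> +4, [1,0,0,2] -> discarded (mixed),
        //          [0,0,2,0] -> -1, [0,2,0,0] -> -1; total 2.
        System.out.println(scoreLine(new int[]{1, 1, 0, 0, 2, 0, 0})); // prints 2
    }
}
```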
Determining the evaluation weights
Initially I started out with weights that were the square of the number of occupied fields in each block of four (e.g. 3-in-a-row scores 9, 2-in-a-row scores 4, and a single field scores 1), and if a state contained 4 in a row, it would get a big bonus or punishment of 10000/-10000.
However, it quickly became evident that I would at least need to raise the punishment when the player had 4 in a row, or take into account whose turn it was, to keep the computer player from focusing solely on its own block-building. After that change it became increasingly difficult to fine-tune the weights manually, since no matter how I adjusted them, I still couldn't beat the computer.
So I set up a game that allows two computer players to play against each other. The first player had the starting advantage and the current weights. The second player had almost the same weights, except for one small difference at a time. If the second player won in spite of the first player's starting advantage, I would adopt the new weights as the current weights.
After having gone through this manual iteration 15-20 times, I decided the weights were fairly well optimized. If time had permitted, I could imagine automating this process, perhaps using simulated annealing. However, I felt that this was outside the scope of the current assignment.
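The tuning loop described above is essentially hill climbing on the weight vector, and automating it would look something like the following sketch (Java for illustration). Since real self-play matches can't be reproduced here, the "challenger beats the champion" test is replaced by a toy stand-in, a hidden target vector, which is purely an assumption of this sketch:

```java
import java.util.Arrays;

public class WeightTuning {
    // Toy stand-in for "the second player wins despite the first player's
    // starting advantage": the challenger wins if it is strictly closer to
    // a hidden target weight vector. In reality this would be a full game.
    static final double[] TARGET = {1.0, 4.0, 9.0};

    static boolean challengerWins(double[] champion, double[] challenger) {
        return distance(challenger) < distance(champion);
    }

    static double distance(double[] w) {
        double d = 0;
        for (int i = 0; i < w.length; i++) d += Math.abs(w[i] - TARGET[i]);
        return d;
    }

    static double[] tune() {
        double[] current = {1.0, 1.0, 1.0};
        // One small change at a time: nudge a single weight up or down and
        // keep the challenger's weights only if the challenger wins.
        for (int round = 0; round < 200; round++) {
            double[] challenger = current.clone();
            challenger[round % current.length] += (round % 2 == 0) ? 0.5 : -0.5;
            if (challengerWins(current, challenger)) current = challenger;
        }
        return current;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tune())); // prints [1.0, 4.0, 9.0]
    }
}
```

Simulated annealing, as mentioned above, would extend this by occasionally accepting a losing challenger with a decreasing probability, to escape local optima.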
Since the state space will typically be too large to explore in reasonable time, it's necessary to cut off the search at a certain point.
First of all, the cut-off method should detect whether we've actually reached a winning state, in which case it should always cut off; otherwise it should determine whether it is feasible to stop the search at the current level.
After some experimentation I decided on a simple cut-off mechanism that cuts off the search after looking 6 moves ahead. 7 moves also seems possible within the allowed time frame, but I'm too impatient to wait more than 1-2 seconds for every move when playing.
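A cut-off test in the spirit of the two rules above could be sketched as follows (Java sketch; the method name and the `gameOver` parameter, standing in for StateType's own game-over check, are hypothetical):

```java
public class CutOff {
    static final int MAX_DEPTH = 6; // the depth limit chosen above

    // Always stop on a terminal (won/lost/drawn) state; otherwise stop
    // once the fixed depth limit has been reached.
    static boolean cutOff(boolean gameOver, int depth) {
        if (gameOver) return true;   // nothing meaningful to search past a win
        return depth >= MAX_DEPTH;   // otherwise respect the depth limit
    }

    public static void main(String[] args) {
        System.out.println(cutOff(true, 2));  // prints true  (terminal state)
        System.out.println(cutOff(false, 3)); // prints false (keep searching)
        System.out.println(cutOff(false, 6)); // prints true  (depth limit hit)
    }
}
```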
An ideal approach would of course be a more intelligent cut-off mechanism that keeps searching until all available time is used (for example, iterative deepening), but time did not permit such a solution.