Game Theory
Game theory is the study of how decision makers will act and react in various situations like negotiating business deals. It is used quite a bit in the study of economics and politics. This is the field that has recently gained some fame with the ‘A Beautiful Mind’ book and movie about John Nash. Much of game theory is built on the Nash Equilibrium. John Von Neumann laid much of the ground work for game theory. Using simplified models, often based only a few rules, many behaviors of people in various situations can be predicted.In these game models it is assumed that the players will make rational choices in each decision. Rational choice is defined as: ‘the action chosen by a decision maker is at least as good as every other available action’. A player is presented with a set of actions from which she chooses one. She may prefer one action over another or may consider some actions to be equally preferable. The only restriction on the actions preferences is that if a player prefers action A over action B, and she prefers action B over actions C then she must also prefer action A over C. A payoff function is used to describe the benefit of each action from the player’s point of view. For example I may visit a used car lot and find a car that is worth 1,000 by my valuation, you may value that car at 500. So the payoff function for the car for me is 1,000 for you it is 500. If I see another car I like that I value at 250, it only means I prefer the first car to the second car. It does not mean that I prefer it 4 times as much. The values are only to show ordering of choices.
The Nash Equilibrium is the place where each player can do no better, no matter what the other person decides to do.
In the well known game of ‘Prisoner’s Dilemma’ we have two crooks who worked together on a crime and have each been caught and are being held separately from each other. They each have a choice of finking or remaining quiet. If one finks, he walks with 1 years time in jail, the other person gets 4 years in jail. If neither finks there is enough evidence to put each away for 2 years time.
| Crook | quiet | fink |
| 2 | 2,2 | 0,3 |
| 1 | 3,0 | 1,1 |
The highest score of all the plays is for both crooks remain quiet and each receives 2 years jail time. But if Crook one finks, he gets a score of 3 (1 years time ) against the other player who remains quiet or a score of 1(and gets 3 years jail time) So his best bet is to fink, as is the other crooks. The Nash equilibrium is at fink/fink (1,1) since finking is the best move for each player individually.
The payoff function for this game is the same for each player and is: f(Fink, Quiet) f(Quiet, Quiet) f(Fink, Fink) f(Quiet, Fink) So we could as easily score it 3, 2, 1, 0 rather than counting years out or 32, 21, 9, 1 the score only serves to order the choices. A clearer example is a game in which we have 2 players moving pieces on a 3-d game board. Each player can move in the x, y, or z direction.
| X1 | Y1 | Z1 | |
| X2 | 2*,1* | 0,1* | 0,0 |
| Y2 | 0,0 | 2*,1 | 1*,2* |
| Z2 | 1,2* | 1,0 | 0,1 |
The * is the best move for each player considering what the other player does. If player 1 moves in the Y direction player 2′s best move is also in the Y direction (2*,1) The squares with both players have *s are X1 X2 and Z1 Y 2 . These are both Nash equilibriums. Finding the Nash Equilibrium this way is called ‘Best Response’
Suppose instead of set numbers I have a function that describes the payoff for each player. I could have A(x) = y 2 andB(y) = 1 2 x + xy then to find the Nash equilibrium I take the derivative of each function, set it to zero, solve and plot. Any and all places the functions cross on the plot are Nash equilibriums.
It is not possible, even for most simple games, to search all the possible routes the game can take. Too much time and hardware would be needed. For example tic-tac-toe is one of the simplest games we all know. A game tree that mapped all possible moves from start to finish would be 9! or 362,880 nodes large. The first player would have 9 choices of which box to play in, the second 8 choices since the first player had taken one, the first player’s second move would have 7 choices, etc. So the top level of nodes would have 9 choices. Each level in the tree represents a turn in the game. The second level would have 8 nodes off of each of the original 9 nodes and so on. So you can imagine what chess or other more complicated games have as the number of possible moves. All games and all competitions can be represented by trees.
Pruning is used to take sections off of the search tree that make no difference to play. Heuristic (rule of thumb) evaluations allow approximations to save search time. For instance in the tic-tac-toe tree described above once the first player chooses a position to play then the other 8 nodes of the top layer can be trimmed off and only the 8 trees under that node need to be searched.
Since it is not usually practical to calculate each possible outcome a cut off is usually put in place. As an example look at a chess game. For each board in play we can calculate the advantage by adding up the point value of the pieces on the board or adding points for position. Then the program can see which of those gives the program a higher score. Then the program need only calculate five or so moves ahead, calculate the advantage at each node and choose the best path. Rather than calculate ahead a set number of moves, the program can use an iterative deepening approach and calculate until time runs out. A quiescent search restricts the above approach. This eliminates moves that are likely to cause wild swings in the score. The horizon problem occurs when searches do not look ahead to the end of the game. This is a current unsolved problem in game programming.
The min max algorithm assumes a ‘zero sum game’, such as tic-tac-toe where what is good for one player is bad for the other player. This algorithm assumes that both players will play perfectly and attempt to maximize their scores. The algorithm only generates the trees on the nodes that are likely to be played. Max is the computer, min is the opposing player. It is assumed max will get first turn.
-generate entire game tree down to the maximum level to check generate each terminal state value, high values are most beneficial to max, negative values are most beneficial to min, zero holds no advantage for either player.
-go up one level, give the node above the previous layer the best score from the prior layer
-continue up the tree one level at a time until top is reached
-pick the node with the highest score.
The Alpha-Beta method determines whether an evaluation should be made of the top node by the min-max algorithm. It searches all of the nodes, like min max, then eliminates (prunes) those that are never going to reached. The program begins by proceeding with the min-max algorithm systematically through the nodes of a tree. First we go down a branch of the tree and calculate the score for that node. Then we proceed down the next branch. If the score at one of the leaves is lower than the score obtained in a previous branch of the tree we don’t finish evaluating all the nodes of the branch, rather we move onto the next branch. The search can be shallow rather than deep saving time. Further gains in speed can be made by caching the information from branches in a look up table, re-ordering results, extending some and shortening other searches, or using probabilities rather than actual numbers for cutoffs and using parallel algorithms.
Ordering may be used to save time as well. In chess captures would be considered first, followed by forward moves, followed by backward moves. Or, ordering can consider the nodes with the highest values first.
The program must try to find a winning strategy that does not depend on the human user’s moves. Humans often make small goals and consider moves that work toward that goal, i.e. capture the queen. David Wilkins Paradise is the only program so far to do this successfully. Another approach is to use book learning. Several boards are loaded into a table in memory and if the same board comes into play the computer can look up what to do from there. The Monte Carlo simulation has been used successfully in games with non-deterministic information, such as; Scrabble, dice, and card games.
Temporal-difference learning is derived from Samuel’s machine learning re search. Several games are played out and kept in a database. This works well with board games like backgammon, chess and checkers. Neural nets can be trained to play games this way, TD Gammon being one the more famous ones.
More information:
Game Theory
Martin J Osborne
Gam3r 7h30ry
Distinguishing the opponents in the prisoner dilemma in well-mixed populations
Towards Network Games with Social Preferences
An introduction to quantum game theory
Games on social networks: On a problem posed by Goyal
Backward Induction
Are mathematics the new artificial intelligence?
Backward Induction
Leave a Reply
You must be logged in to post a comment.