Rock paper scissors is a game about predicting the opponent. This is a hard problem. Someone new to this may ask, “Why not just play randomly?” It is well known that it is impossible to consistently beat a random player. Then why bother? As it turns out, a random strategy will only win 50% of its matches. However, a good predicting algorithm can exploit patterns in not-so-random opponents (including humans) and beat them more often. In fact, some of the strongest engines on the leaderboard have a win rate of over 80%!
Suppose you have 10 opponents, of which 9 play randomly and 1 always plays rock. If you also play randomly, you’ll win 50% of the time. If you always play paper, however, you will win 55% of the time, since you’ll always beat the guy that always plays rock, while still faring the same against the random players. A win rate of 55% is better than a win rate of 50%. Importantly, as long as there is even a single non-random player in the pool, an algorithm that exploits the patterns will always do better.
Also, as a New York Times article reported, humans are pretty bad at being random.
To this end, this page is about describing various approaches to writing a rock paper scissors algorithm. In the Rock Paper Scissors Programming Competition, my entries, or variants thereof, are consistently ranked near the top of the leaderboard.
2 Primary strategies
There are some basic strategies, in roughly ascending order of strength.
2.1 Fixed move
The simplest strategy is to always play the same move. An example is Always Rock. Not very interesting…
2.2 Frequency counting
This algorithm counts the previous moves of the opponent to determine if the opponent prefers playing one type of move over the others. Essentially, it keeps a “score” for each of rock, paper, and scissors. After each move, the respective score of the opponent’s move is incremented.
Such an algorithm would easily defeat fixed move algorithms. An example is The Frequentist.
It is possible to make more sophisticated variants of this strategy. The algorithm Boltzmann Counter, for example, has a decaying score instead. This makes it more influenced by recent outcomes than past ones and enables it to adapt to a slowly changing opponent. It furthermore reduces the score of the move that would have lost. So, for example, if the opponent played rock, the score for paper would increase, but the score for scissors would decrease.
In rock paper scissors, a rotation is a modular addition.
|Move||Rotation by 0||Rotation by 1||Rotation by 3|
A move rotated by 0 (or equivalently by 3, 6, or any number that is 0 modulo 3) is simply the move itself. A move rotated by 1 is the move that beats it. A move rotated by 2 is the move that loses to it.
An example is Beat Last Input which replies to the opponent’s previous move by a rotation of 1.
This algorithm assumes the opponent is using one of the three possible rotations. It calculates the rotation needed to get the opponent’s last move from its own move before that, and determines the rotation that the opponent must have used. Then, it beats this rotation of its own last move. For example, if the opponent rotated the last move by 1, this algorithm would rotate its own last move by 2 just to one-up the opponent.
An example is Probably not very strong 3.3.
2.5 History string matching
This can be considered the simplest case of prediction by partial matching. The algorithm records all the moves made so far, as a string for example. To determine the next move to play, it uses a substring matching algorithm to find the most recent time that the last sequence of moves have been played.
Then, it plays the move that counters what the opponent played last time after that sequence. In Python, the function
rfind is useful to conduct such a search. For example, if the history looked like this:
RPSRRPRSPRPSRPRSPSPSPRRPSRPR RPSRPR RPSRPR
I have repeated the latest match of the last sequence on a second line for clarity. The letter coming after the match is an
S, so you might guess the opponent might play S next.
An example is rfind. Many variants exist, including matching one’s own moves, a combination of one’s own moves and the opponent’s moves, finding the earliest instead of the latest match, or finding all matches.
Each of the previously discussed strategies has a counter. It is easy to counter any of them, because they all behave predictably. But even a pure counter can be countered by rotating the outcome once more. In fact, for each strategy, there are six possible ways of using it. Such metastrategies were used in Dan Egnor’s Iocaine Powder program.
|P0: naive application||Play the strategy naively.|
|P’0: beat P0||Exchange the position of you and your opponent. Then, play to beat the output of the naive strategy in your opponent’s shoes.|
|P1: beat P’0||Rotate P0 by two, that is play the move that loses to the output of P0.|
|P’1: beat P1||Rotate P’0 by two, same as above.|
|P2: beat P’1||Rotate P1 by two.|
|P’2: beat P2||Rotate P’1 by two.|
Note that P0 beats P’2, thus going a full circle.
3.1 Metastrategy selection
If each strategy has six possible metastrategies, how do we choose which one to use? What if we also have more than one strategy to begin with? The way to do this is to select based on past performance. If a strategy has been successful, we use that one. If a strategy always loses, we don’t use it. Here are some ways to implement this:
- Naive scoring: A number is stored for each predictor. After each round, if the predictor successfully predicted the opponent’s move, its score is incremented by one. If the prediction loses to the opponent’s move, its score is decremented by one. An example is Variable Turbine Geometry.
- Decayed scoring: This works just like naive scoring, except all scores are also multiplied by a number between 0 and 1 after each round (usually around 0.9), so that it gradually “forgets” about the performance of each predictor a long time ago. Thus, if the opponent occasionally switches strategies, this should be able to cope. An example is DNA werfer 2.
- Drop switch: Just like naive scoring, except if a predictor loses even once, its score is reset to zero. This allows for much faster response to opponents with switching strategies than decayed scoring, but can be vulnerable if the opponent adds some sort of noise (e.g. by playing a random move every tenth move). An example is DNA werfer 5 M0.
- Random drop switch: Just like drop switch, except the predictor score is only reset to zero randomly with a probability of, say, 0.5. An example is Random Drop-Switch.
Of course, the predictor with the highest score is chosen to be used. There are also some other selectors, such as resetting the score to zero whenever the predictor does not win (i.e., even when it draws). Some other selectors work like naive scoring or decayed scoring, but also decrement the score by a small value on a draw. It is interesting to note that the frequency counting strategy is in fact simply a fixed move strategy coupled with a naive scoring selector.
Notice that some of the basic strategies are in fact simpler strategies with a metastrategy. For example, Frequency Counting is simply the Fixed Move strategy with a naive scoring: it chooses between the Fixed Move strategies (always play rock, always play paper, or always play scissors) based on which one was the most successful.
3.2 Selector selection
With so many strategy selection methods to choose from, how do we choose which one is the best? Hence, we have another layer: selector selection.
These are implemented in the same way as before. Examples are Centrifugal Bumblepuppy 13 and switching18.
4 Defense mechanisms
It is a well-known fact that a truly random strategy cannot possibly lose consistently. Hence, it is useful to switch to a random strategy once the algorithm detects that it is losing. One way to do this is to simply include the random strategy as one of the predictors alongside other strategies like history matching. Then if the other strategies all start losing, they would all have pretty negative scores, except the random one, which will presumably have a score of close to zero. In this event, the selector would choose the random strategy over the other strategies and hopefully avoid a terrible loss.