Games between Programs: The Ruliology of Competition—Stephen Wolfram Writings

This web page was created programmatically, to learn the article in its authentic location you’ll be able to go to the hyperlink bellow:
https://writings.stephenwolfram.com/2026/06/games-between-programs-the-ruliology-of-competition/
and if you wish to take away this text from our website please contact us


Games between Programs: The Ruliology of Competition

The Basic Setup

Whether one’s coping with biology, economics, politics or a number of different fields, it’s widespread to come across conditions that may be modeled as involving two brokers that repeatedly compete with one another. One imagines that at every step every agent can take one in every of a sure set of actions, and that then—in a basic game theory approach—every agent (or “player”) will get a sure fastened “payoff” primarily based on the motion they and their opponent take. But how do the brokers determine what motion to take? We think about that every agent has a sure fastened process—or “strategy”—for making its selections. And we think about that the enter to every of these selections is the sequence of previous actions that the agent and its opponent have taken.

There’s been numerous work finished over the course of practically a century on specific decisions of methods. But one thing I’ve long been curious about is what occurs if one systematically considers all doable methods. And if we consider methods as applications this turns into a query to which we are able to instantly apply ruliological strategies. Which is what I’m going to do right here.

To be extra particular in regards to the setup, let’s assume that at every step, every agent takes one in every of two doable actions, indicated by and . And for now let’s take the payoffs to be those for the basic “match-or-not” (“matching pennies”) sport—wherein participant 1 has the larger payoff when there’s a match, and participant 2 has the larger payoff when there isn’t a match:

So what occurs when brokers repeatedly play this sport? Well, it is determined by their methods. Here are just a few examples for a number of completely different decisions of every agent’s technique:

Plotting the cumulative payoffs for the 2 brokers (represented by and ) in every of those instances we get:

Often we’ll contemplate the “winning agent” to be the one which has the numerically largest cumulative payoff (i.e. is finally on prime in these plots) after a sure variety of steps. And with a criterion like this, we’ll be capable to rank completely different applications towards one another—and normally discover the ruliology of competitors.

With the fundamental setup we’re utilizing, we are able to symbolize all doable sequences of actions by a multiway graph:

For any given sequence of actions, there may be then a cumulative payoff for every agent for our match-or-not sport:

If every agent adopts a specific technique, it will outline a specific path by the multiway graph. For the methods used within the examples above, the paths are:

What does it take to have a successful technique? In what follows, we’ll contemplate methods primarily based on a number of various kinds of applications. But one primary query we are able to all the time ask is whether or not what turn into the successful methods are typically primarily based on applications which can be extra difficult, or much less so—or to indicate habits that’s extra difficult, or much less so.

In different phrases, if you wish to win, must you usually be making an attempt to construct up one thing difficult? Or must you as an alternative anticipate to have the ability to discover a “simple hack” that may “crack the game” and—a minimum of normally—allow you to win? In impact, we’re asking whether or not competitors tends to result in complexity, or simplicity.

I’ve lately checked out minimal fashions of each organic evolution and machine studying, wherein one is adaptively evolving applications so as to maximize some externally imposed health operate. And what I’ve discovered is that even when the health operate one makes use of is easy, the habits of the applications that maximize it’s usually fairly advanced. In different phrases, adaptive evolution will are inclined to make even a easy, fastened goal be achieved in a sophisticated approach.

So what if as an alternative of getting a hard and fast, externally imposed goal, our aim is simply broadly to win towards different brokers? Does such—probably open-ended—competitors lead us to extra advanced habits (or extra advanced applications), or not? That’s the sort of query we’re going to have the ability to discover right here by trying on the ruliology of competitors.

Strategies from Finite State Machines

Finite state machines could be considered defining very simple applications (which may mannequin pathways in biology, determination processes in economics, and so forth.). And to start out our investigation of the ruliology of competitors we’re going to have a look at methods outlined by finite state machines.

A typical instance of a finite state machine (right here with 3 states) is:

We’re going to make use of this finite state machine to outline a technique for an agent. To see how this works, let’s say that the sequence of actions taken by the agent’s opponent have been:

The thought is to make use of this sequence of actions to outline a path within the finite-state-machine graph, then to find out the following motion from the colour of the state reached. We begin on the vertex with the incoming arrow, then successively observe the sting whose colour matches the following transfer made by the opponent:

At the tip of this course of we’ll attain some vertex within the graph (i.e. some state within the finite state machine). In the actual case proven right here, the state we attain is . And then we take the output of the technique—i.e. the following motion for the agent to take—to be .

It’s typically handy to indicate the states of the finite state machine organized on a line:

And then we are able to summarize the trail taken with a sure enter by displaying the successive states reached:

So what occurs if two finite state machines compete? The primary thought is that the successive outputs from one machine develop into the successive inputs to the opposite, and vice versa. If our second machine is

then we are able to symbolize the habits of the machines by:

If the payoffs we use are for the match-or-not sport, then their cumulative values for these machines are

in order that in the long run agent 2 could be thought-about the winner.

It’s essential to notice right here that within the setup we’re utilizing, every part is deterministic: at each step, every agent takes an motion that’s deterministically computed utilizing its technique from the previous historical past of strikes. It’s a distinct setup from what’s most frequently studied in sport concept, the place every transfer is in impact thought-about independently, however the place there could be possibilities for various actions (“mixed strategies”)—and the place in the long run averaging is completed over “different possible rolls of the dice”.

The Space of Possible Finite State Machines

The variety of doable graphs for finite state machines with s states is (2 s2)s. But a few of these graphs correspond to machines with an identical habits—in order that the variety of distinct machines is smaller:

2-State Machines

In the 2-state case, the 22 distinct machines are

the place we’ve recognized every machine by a quantity.

So what occurs if pairs of those machines compete? Here are just a few examples, the place in every case we’re figuring out the typical payoff (right here for 10 rounds of the match-or-not sport):

(In all competitions between pairs of finite state machines, the sequence of strikes finally has to develop into periodic—with a interval equal at most to the product of the variety of states in every machine.)

What occurs if every of the 22 distinct 2-state machines competes towards every of the opposite ones? We can summarize the outcomes by displaying the imply (long-term) payoff for each pair of machines (the payoff is for every machine “playing as agent 1”; in match-or-not, the payoff is negated if “playing as agent 2”):

So what machine is the “overall winner”? One solution to assess that is to have a look at the typical of the imply payoffs achieved by a given machine when competing with all different (distinct) machines:

The winner by this measure is then machine 26:

Running this machine towards all (distinct) 2-state machines we get the next imply payoffs:

The precise habits in every case—which doesn’t itself rely on the payoffs, solely on the machines concerned—is:

What are the “runners-up” to the successful machine? Here are all of the distinct machines, ranked by their imply payoffs:

Here’s what occurs if we play the highest 3 runners-up towards all machines:

We can summarize how a machine behaves by displaying the historical past of its habits when taking part in towards all different machines (or, in impact, by placing collectively the primary columns in photos like those above). Here are the outcomes for all of the machines (for 15 steps), ordered from highest common rating down:

(Once once more, these photos are fully decided simply from the machines concerned; the payoffs within the match-or-not sport decide solely their ordering.)

One footnote to what we’ve been saying right here has to do with what number of steps of competitors we’re getting the machines to do. For all finite-state machines, the habits should finally develop into periodic—and for 2-state machines the utmost interval is 4 steps, with a most transient of three steps. But the precise common imply payoffs range with the overall variety of steps one considers:

It’s notable that as a minimum for the primary few steps, the rankings transfer round:

But on this case it doesn’t take too many steps for the last word winner to be clear (afterward we’ll see examples the place it takes for much longer).

(There are different subtleties as nicely. One of them is that we’re computing common payoffs by taking part in each machine towards each different distinct machine. In precept we may additionally embrace different equal machines—which might barely change the weighting of our averages. But since we’re actually involved with methods, not machines as such, the scheme we’re utilizing appears extra acceptable.)

3-State Machines

For the 956 distinct machines with s = 3 states, the corresponding “competitive array” (after 1000 steps) is:

The common imply payoff for every of the machines (i.e. the typical throughout every row within the “competitive array”) is then

whereas the distribution of those common imply payoffs is:

The prime few machines for the match-or-not sport are then:

Running the highest machine (s = 3 machine 1164) towards all (distinct) 3-state machines we get the next imply payoffs:

The distribution of doable limiting imply payoffs right here is:

And the most typical types of habits seen are:

The most doable interval for a contest between two 3-state machines is 9. Machine 1164 by no means fairly achieves this; its most interval of seven happens when competing with machines 2546 and 2755 (each giving limiting imply payoff –1):

If one appears in any respect doable pairs of 3-state machines, there turn into 792 that yield period-9 habits, examples being:

(These don’t have any transients; the utmost transient for 3-state machines seems to be 8.)

An Aside: What Do We Mean by “Average”?

We’ve talked about how a machine does “on average” when competing with all different (distinct) machines. But what can we imply by “on average”? So far, we’ve taken the “average” to be the imply of the payoffs obtained by competing with one another machine (and the payoffs listed here are themselves means throughout successive steps). But what if we use the median as an alternative of the imply? Here are the median payoffs from operating every machine for 1000 steps towards all different machines:

The standout successful machine right here is machine 1172:

The imply payoffs and their distributions on this case are:

And the median is “anomalously high” as a result of with this machine precisely 1/2 of all imply payoffs are +1. (The corresponding imply is pulled down by the “left tail” within the distribution of imply payoffs.)

The Complexity of Winning

Let’s look (principally as above) on the precise habits of every of the distinct 2-state finite state machines when competing towards all different 2-state machines, ordered from smallest common imply payoff to largest:

The instances with 0 common imply payoff look easy of their habits. But for different common imply payoffs, the habits of a given machine competing towards all others appears extra difficult.

We can get some sense of this complexity by trying on the compressed dimension (as obtained from Compress) of the array of habits proven above:

Here’s the corresponding consequence for the 956 distinct 3-state machines—displaying no robust correlation between common imply payoff and our estimate of the complexity of habits:

And certainly amongst machines with the very best common imply payoffs there may be nonetheless fairly a range of ranges of complexity in habits

with the “behavior traces” of the machines indicated being

and

In different phrases, a minimum of on this case, we actually can’t say that successful machines are characterised both by being notably advanced of their habits, or notably easy. It appears that it’s detailed construction, somewhat than total options, that determines what machines will win.

Competitions between Machines of Different Sizes

Can finite state machines with extra states systematically do higher (i.e. obtain bigger payoffs) than ones with fewer states? The finest common imply payoff any 2-state machine can obtain when competing with all different 2-state machines is about 0.151. But if, for instance, we contemplate 3-state machines competing (for 1000 rounds) towards 2-state machines, the most effective common imply payoff is as an alternative 0.593:

Looking on the distribution of doable common imply payoffs, we see that the distribution of common imply payoffs is wider for 3-state machines than for 2-state ones—a truth that’s a minimum of partly only a consequence of there being many extra doable 3-state machines than 2-state ones:

But one thing that’s notable is that the very broadest distribution is for 3-state machines competing towards 2-state ones: in impact evidently with their bigger assortment of doable methods, the 3-state machines can do higher at “outmaneuvering” the 2-state ones.

The 3-state machine that does the most effective total towards 2-state machines is machine 1234:

It doesn’t all the time definitively win (with imply payoff +1), however does so the vast majority of the time:

How does it obtain this? Basically, for many completely different 2-state machines, this specific 3-state machine manages to behave simply as they do:

In some sense, there are aspects of the 3-state machine that “resonate” with many 2-state ones:

How about 4-state machines? The 4-state machine that does finest total towards 2-state machines is machine 109828:

Out of the 22 2-state machines, it solely will get lower than payoff +1 in 6 instances:

Here’s the habits for all 22 instances:

And as soon as once more we are able to consider the 4-state machine as efficiently “covering” many of the 2-state behaviors:

Adaptive Evolution of Finite State Machines

In many sensible conditions the place there’s competitors, there’s a approach for the brokers which can be competing to evolve. So can we make a minimal mannequin of this utilizing finite state machines?

In what we’ve finished to this point, we’ve all the time been an area of all doable finite state machines. But what about sequences of machines discovered by adaptive evolution? Is there, for instance, a solution to adaptively evolve machines to do progressively higher in competitions?

The first step in doing that is to see how we’d make successive mutations to finite state machines. A easy method is to say that any given mutation can have an effect on both a random vertex or a random edge within the graph of a machine. For a vertex, the mutation simply reverses its colour. For an edge, it both reverses the colour, or “reroutes” the sting to a distinct vertex (with the constraint that doing so doesn’t disconnect the graph). Applying a sequence of such mutations at random provides for instance

or, with a distinct graph rendering:

(Note that we’re mutating machines in no matter kind we discover them; we’re not worrying about equivalences between machines, or the canonicalization of machines.)

Imagine we have now an opponent machine—like 3-state machine 1165—that normally forces a lose, i.e. limiting payoff –1 (for instance about half the time when competing with different 3-state machines):

Now we are able to ask whether or not we are able to adaptively evolve a machine that may win towards this opponent. In order to offer our adaptive evolution course of some “room to maneuver” we’ll use a 4-state machine. We can begin with a random such machine, say

which “loses” (all the time having payoff –1) towards machine 1165:

To do adaptive evolution, we now make successive random mutations to this machine, “accepting” a mutation if it doesn’t lower the imply payoff, and in any other case rejecting it. The result’s a typical “fitness curve” wherein most mutations (indicated by purple dots) don’t result in enchancment within the payoff—however there are some that result in “breakthroughs” the place the payoff will increase (typically solely by a small quantity), with the payoff finally reaching the utmost worth of +1:

The varied “breakthroughs” progressively converge on a “perfect solution” with payoff +1:

Concatenating the successive outcomes over the course of the adaptive evolution course of, we are able to see the eventual convergence to the right resolution the place the actions of the 2 brokers all the time match:

With completely different random mutations, the “fitness curve” will likely be completely different intimately, although may have the identical basic kind. And the identical is true with completely different particular opponents.

By the best way, utilizing our approach of numbering finite state machines, we are able to make a plot of how the method of adaptive evolution “moves the machine around in rule space”:

But what occurs if we do as we have now finished above, and ask in regards to the imply payoff averaged over all doable finite-state-machine opponents of a given dimension? For instance, how nicely can 4-state machines do towards all doable 2-state machines?

Starting with the identical random 4-state machine as earlier than, a typical health curve is:

The health right here will increase, however by no means reaches +1. The habits of successive “breakthrough” machines taking part in towards all size-2 machines is:

And we are able to see that even the most effective machine we get nonetheless loses to among the 2-state machines, yielding in the long run a median imply payoff of about 0.62.

So what occurs if we have a look at machines which have extra states? With 10 states, for instance, it’s doable to adaptively evolve to a machine that achieves limiting payoff +1 towards each single 2-state machine:

The ultimate machine obtained on this case

could be considered a sort of (2-state) “universal winner”—that finally wins towards all 2-state machines:

How does it do it? In some sense the machine is sufficiently big that it may have completely different “specialized parts” for various opponents. And if we have a look at how the machine behaves we certainly see that with completely different opponents the machine settles into completely different subsets of its full area of states:

And even when we contemplate all 956 3-state machines as opponents, our machine continues to do nicely. It doesn’t win in all instances, nevertheless it nonetheless achieves a median imply payoff of +0.603:

Some examples the place the machine doesn’t win—in impact as a result of it doesn’t include as a submachine one thing to take care of a specific opponent—embrace:

So far we’ve thought-about the adaptive evolution of a single machine competing both towards a single fastened opponent, or towards a group of fastened opponents. But what if each the machine and its opponent are present process adaptive evolution?

For instance, let’s say that on alternating adaptive evolution steps we do a mutation on a machine and on its opponent. We maintain the mutation for every machine if the (imply) payoff for that machine doesn’t lower; in any other case we reject it.

With this setup, right here’s the evolution of imply payoffs for 2 (initially an identical) 4-state machines:

There are intervals the place one machine wins, and intervals the place its opponent wins—as seen within the precise successive behaviors of the machines:

The precise machines discovered by adaptive evolution transfer round in rule area—quickly dropping reminiscence of what they initially had been:

Not a lot adjustments if the variety of states within the machines change, or aren’t the identical—although there may be usually much less alternation of winners for machines with extra states, presumably as a result of every particular person mutation tends to have much less impact on habits if there are extra states.

What About Prisoner’s Dilemma?

Everything we’ve finished to this point has been primarily based on the notably easy sport of match-or-not (“matching pennies”). So what occurs with different video games? And particularly with the well-known “prisoner’s dilemma” sport? Here are the payoffs for this sport

the place within the traditional narrative for the sport one interprets as “defect” and as “cooperate”.

Just as above, we are able to think about defining methods for the prisoner’s dilemma sport primarily based on finite state machines. Here are just a few examples of iterated video games between 2-state machines—now with payoffs decided by the prisoner’s dilemma sport:

In the case of match-or-not, it was visually straightforward to inform whether or not a specific payoff was ±1 or 0 simply by seeing whether or not the actions of the brokers matched at a specific step. Here it’s not fairly so visually apparent.

But utilizing the payoffs for the prisoner’s dilemma sport we are able to compute the cumulative payoffs for these examples (and, not like in match-or-not, which is a zero-sum sport, the payoffs for the 2 brokers don’t sum to zero at every step):

Much as we did earlier than, we are able to now contemplate competitions between brokers whose methods are primarily based on all doable 2-state finite state machines (for match-or-not the zero-sum nature of the sport makes the ensuing array of payoffs symmetrical; right here there’s symmetry solely from the truth that the payoffs stay the identical if one interchanges the roles of agent 1 and agent 2):

With this setup, we are able to now ask what machine is the “overall winner”—say within the sense that it has the most important common imply payoff taking part in towards all different (distinct) 2-state machines:

The reply seems to be machine 30:

In the literature of prisoner’s dilemma that is typically known as “grim trigger”, as a result of it yields a technique that begins with , then repeats this till its opponent first provides —after which it all the time provides .

Running this machine towards all different 2-state machines we get the next behaviors

equivalent to the next imply payoffs:

Looking on the common imply payoff for all 2-state machines, the rating of those machines is:

It’s notable that machine 22 (which corresponds to the famous “tit-for-tat” strategy)

is kind of far down on this rating, though it’s typically recognized as probably the most profitable in collections of human-suggested methods.

The rankings we’ve simply given are primarily based on common imply payoffs obtained after many iterations of the prisoner’s dilemma sport. But if we do just a few iterations, the rankings could be completely different:

Zooming in at the start we are able to then see that machine 30 solely begins to win after 13 steps:

Machine 20 provides a relentless common imply payoff of –1 obtained from

whereas machine 30 yields a median imply payoff given by –, limiting to – ≈ –0.86.

So what about 3-state machines? This provides the typical imply prisoner’s dilemma payoff for every of those machines:

The distribution of those common imply payoffs is:

The machines with the very best final common imply payoffs are:

But this ordering emerges solely after greater than 500 steps

with the crossover of common imply payoffs being surprisingly advanced:

(The seemingly fairly random variation of common imply payoffs displays the combining of many alternative intervals within the always-ultimately-periodic habits of competitions between machines.)

So how do 3-state machines do in comparison with 2-state machines within the prisoner’s dilemma sport? Running 2-state machines towards one another, machine 30 will get the very best common imply payoff of about –0.866. Meanwhile, for 3-state machines operating towards one another, the very best common imply payoff achieved is the very barely smaller –0.885. What about 2-state machines operating towards 3-state ones? They don’t do nicely. Machine 30 does the most effective—however now it provides a median imply payoff not of –0.866 however as an alternative of about –0.97.

But now, operating 3-state machines towards 2-state ones, the most effective common imply payoff is bigger—about –0.80, as achieved by machine 2743

with the imply payoffs obtained by operating it towards every doable 2-state machines being:

How about 4-state machines? Running all these towards 2-state machines, the general winner is machine 336766 with common imply payoff –0.77:

The imply payoffs towards every 2-state machine on this case

are similar to these for the successful 3-state machine, the one completely different behaviors occurring when the opponents are 2-state machines 20 and 30:

Summarizing these outcomes, the successful machines with small numbers of states that we’ve discovered for prisoner’s dilemma are:

But what about machines with extra states—that we’d discover by adaptive evolution? Here’s an instance of adaptive evolution for 10 states, competing towards all 2-state machines:

After 1000 steps of this adaptive evolution, we get the 10-state machine

with common imply payoff –0.73.

The habits of this machine competing with all 2-state machines is:

The Space of All Possible Games

We’ve now checked out two particular examples of video games—match-or-not and prisoner’s dilemma—and we’ve seen very related phenomena in each instances. But what about different video games?

If we enable payoffs –1 and +1 (as in match-or-not) there are a complete of 256 doable video games:

Of these, 16 are zero sum (like match-or-not)—within the sense that the sum of the payoffs for the 2 brokers is all the time zero), and 16 are symmetric (like prisoner’s dilemma)—within the sense that the payoff for the 2 brokers is all the time the identical.

For every of the 256 doable video games, we are able to compute the typical imply payoffs for every doable 2-state finite state machine competing with all 2-state machines:

The successful common imply payoffs for these 256 video games are all the time –1, 0 or +1:

In most instances, many machines obtain the utmost payoff; throughout all video games, that is the variety of instances every machine is a winner:

What about after we have a look at extra video games—for instance ones with payoffs –1, 0, +1? There are 6561 such video games. And the story may be very a lot the identical, with some slight variations:

Cellular Automaton Strategies

Everything we’ve finished right here to this point has been primarily based on utilizing finite state machines as our supply of methods. Now we’re going to show to a different supply of methods: cellular automata.

The setup we’re going to make use of takes the actions of our brokers to be decided by operating mobile automaton guidelines. The primary thought is that at every step the preliminary circumstances for the mobile automaton are given by the sequence of actions taken by the opponent to this point. The subsequent motion of our agent is then decided by the worth of the cell obtained by operating the mobile automaton for as many steps as there have been actions taken to this point by the opponent.

More particularly, let’s say the principles for our mobile automaton are:

And let’s say the actions taken by the opponent to this point have been:

Then the concept is to run the mobile automaton with these as preliminary circumstances

and to extract the ultimate cell worth to find out the following motion to take.

So, for instance, if our two competing mobile automata have guidelines

then the successive steps in operating them towards one another give

the place in our photos every part in regards to the second rule has been reversed. The actions taken on every step can now be learn off both from the opponent preliminary circumstances, or from the outer diagonals of the ultimate sample generated:

To analyze “competition” between guidelines we are able to assign payoffs, say from the match-or-not sport:

And on this case we get the next cumulative payoffs:

There are altogether 16 doable mobile automaton guidelines of the sort we’re utilizing right here:

Running each towards each different we get the next array of limiting imply payoffs:

Some notable “competitions” embrace:

The cumulative imply (match-or-not) payoffs in these instances are:

For most of those pairs of guidelines the winner shortly turns into clear. But for the case of rule 6 vs. rule 7 it’s extra difficult—and after 500 steps it’s nonetheless in no way clear which rule will win:

The underlying habits is:

On their very own, these two guidelines behave in somewhat easy methods (certainly, rule 7 is simply XOR):

But once they’re arrange in competitors, the efficient rule that emerges has far more advanced—and apparently unpredictable—habits, with no signal, for instance, of periodicity.

Looking throughout all the principles, the one with the most important common imply payoff seems to be rule 14:

In a way, rule 14 finds a really “simple solution”, producing both fixed or period-2 habits, and forcing its opponent to do likewise—and in the long run giving a median imply payoff of precisely – ≈ –0.69:

What about with extra difficult mobile automaton guidelines? Are the winners nonetheless ones with easy habits?

Let’s have a look at the 3-color analogs of our mobile automaton guidelines. There are 332 = 19683 of those. And in every case we are able to “make a decision about the next action” by trying on the ultimate worth mod 2. Running all these guidelines towards the 16 2-color guidelines the distribution of scores is:

And as soon as once more the best-performing guidelines (resembling rule 15911) behave in somewhat easy methods:

Looking—as we did for finite state machines—on the compressed dimension of patterns versus the typical imply payoff within the corresponding competitors

we see that the very best payoff guidelines are inclined to behave in less complicated methods.

The guidelines with probably the most difficult habits (a minimum of by this measure) have common imply payoffs close to zero. A typical instance is rule 11948:

Some of the extra difficult competitions on this case are:

What about completely different video games with completely different payoffs? The underlying habits of specific guidelines competing with one another will all the time be the identical. But their payoffs will likely be completely different. And so, for instance, in prisoner’s dilemma, the cumulative payoffs for 2-color rule 6 vs. 2-color rule 7 are actually:

Playing every 2-color rule towards all others the typical imply payoffs obtained are:

Rule 13 has the very best common imply payoff (of –1), and reveals pretty easy habits:

Looking at compressed dimension versus common imply payoff for video games between 3-color and 2-color guidelines, the phenomenon of excessive payoff being related to less complicated habits appears much more marked for prisoner’s dilemma than for match-or-not:

Cellular Automata vs. Finite State Machines

We’ve checked out finite state machines competing with finite state machines, and mobile automata competing with mobile automata. But what about mobile automata competing with finite state machines?

Here’s an instance of a specific step in a contest between a mobile automaton and a finite state machine

and listed here are the cumulative payoffs on this case for the match-or-not sport:

Running all 16 mobile automaton guidelines of this sort towards all 2-state finite state machines the imply payoffs are:

Averaging over all finite state machines, the imply payoffs for the doable mobile automata are:

Rather boringly, the successful mobile automaton is rule 0, which generates in response to something any finite state machine does:

This yields a median imply payoff of solely +0.181. But what if we use 3-color mobile automata? Here are the typical imply payoffs in that case—with the successful case highlighted:

Summarizing the varied competitions between various kinds of methods, we see that—operating towards 2-state finite state machines—probably the most profitable rivals are, by a small margin, 3-color mobile automata:

Adaptive Evolution of Cellular Automaton Strategies

Just as we did above for finite state machines, we are able to contemplate adaptive evolution of mobile automaton guidelines (which can be one thing I’ve studied in different contexts considerably extensively elsewhere). As a primary case, let’s contemplate adaptively evolving a 4-color mobile automaton rule to get the most effective imply payoff towards probably the most profitable 3-state finite state machine above, machine 1165. At every step of adaptive evolution, we’ll randomly change one of many 42 = 16 instances within the mobile automaton rule, protecting this mutation if it will get us a minimum of the payoff we had earlier than. We get a typical adaptive evolution health curve, with the imply payoff limiting to +1:

The “breakthroughs” correspond to the next guidelines:

And as is usually the case, the early breakthroughs are considerably difficult, however in the long run the “solution” that emerges reveals somewhat easy habits—one thing we are able to see a minimum of some proof for if we put the outcomes at successive mutation steps collectively:

What about adapting mobile automata to compete with different mobile automata? As an instance, let’s use adaptive evolution to discover a 6-color mobile automaton with the most important common imply payoff when competing with all 16 of the 2-color mobile automata we’ve thought-about. Here’s a typical health curve for this case:

After 1000 mutation steps, it’s reached a rule that offers common imply payoff 0.91. And right here’s what occurs when that rule competes with all our 2-color guidelines:

What if (as for finite state machines above) each a rule and its opponent are present process adaptive evolution—say on alternating steps? Here’s an instance of the successive payoffs one will get with a pair of 4-color guidelines:

And listed here are the corresponding precise behaviors:

What are the underlying mobile automata doing? Here are outcomes at a sequence of mutation steps—illustrating that adaptive evolution can choose each guidelines with quite simple habits and ones with considerably extra advanced habits:

Turing Machine Strategies

We’ve checked out methods primarily based on finite state machines and techniques primarily based on mobile automata. Now let’s discuss methods primarily based on Turing machines. For our functions, we are able to consider Turing machines as in some methods interpolating between finite state machines and mobile automata—although additionally they introduce some totally new options.

Our primary setup will likely be to make use of the opponent’s actions as preliminary values on a Turing machine tape, with the most recent worth on the correct, which is the place the Turing machine head is initially positioned. We then run the Turing machine till its head goes additional to the correct than it’s ever gone earlier than, at which level we decide the following motion from the worth that seems on the preliminary head place.

For instance, contemplate a Turing machine outlined by the rule:

Then think about that the sequence of opponent actions to this point is:

Running the Turing machine with this as its preliminary situation we get the next:

And from this we are able to then learn off “the next move” in accordance with our “Turing machine strategy”, on this case .

In our finite state machine and mobile automaton setups we did only one step of evolution for every step in our sport. In our Turing machine setup, at each step in our sport we’re operating the Turing machine for as many steps because it takes for the pinnacle to go additional to the correct than it began.

Here’s what occurs if we take a specific pattern 3-state finite state machine

and have it compete with the Turing machine above:

With match-or-not the cumulative imply payoffs listed here are:

There are a complete of 4096 Turing machines of the kind we’re utilizing right here (with s = 2 states and ok = 2 colours). Running every of those towards our pattern 3-state machine the imply payoffs within the match-or-not sport for all of the Turing machines are:

There are a number of Turing machines which have limiting imply payoffs of +1. An instance is machine 2529:

There’s a difficult subject that comes up right here, although. Our Turing machine technique works by operating a Turing machine till its head goes additional to the correct than it began—in order that we are able to contemplate that it halts. But what if it by no means halts, as in:

For our functions we’re simply saying that on this case, the payoff is undefined. And if such an undefined payoff ever happens in a specific sport, we assume the imply payoff for the entire sport is undefined—leaving a spot within the plot above.

What if we have now Turing machines compete towards, say, all distinct 2-state finite state machines? Here are the typical imply payoffs in that case (the gaps are for machines that don’t halt):

The most of +0.4 is achieved for Turing machine 2403

which yields the next behaviors and limiting payoffs when
competing with every of the 22 distinct 2-state finite state machines:

So what about Turing machines competing with Turing machines? To maintain issues manageable, we are able to have a look at 1-state Turing machines, of which there are solely 16 (with ok = 2). Running every of those machines towards one another, the array of imply payoffs is (the grey entries correspond to instances the place one of many Turing machines doesn’t halt):

The common imply payoff for every of those machines is given by:

The “winner” among the many machines is Turing machine 13:

Running this machine towards all different s = 1, ok = 2 Turing machines the behaviors we get are:

If we have a look at the cumulative payoffs, we see that many give imply payoffs that method 1, although some don’t, yielding in the long run a median imply payoff of about +0.81:

A typical competitors between 2-state Turing machines is

which yields a barely extra difficult sample of cumulative payoffs:

What occurs if 2-state and 1-state Turing machines compete? Here’s the array of imply payoffs for all 4096 2-state machines operating towards the 16 1-state machines:

The common imply payoffs for 2-state machines are as follows—once more with most 0.81:

Discussion

We’ve now seen many examples of the ruliology of competitors. And, maybe greater than the rest, it’s now clear that if we glance—ruliologically—in any respect doable applications of specific sorts, the image of how competitors works is kind of difficult, even when all of the applications concerned are easy.

In a way, it is a typical results of computational irreducibility: to understand how competitions between applications will work out, there’s principally no selection however to run them and see what occurs.

Sometimes the applications that win achieve this in quite simple methods—in impact “exploiting simple hacks”. But in different instances, issues are extra difficult. Sometimes two competing applications with each present advanced habits, and in a way, it’ll “just so happen” that one in every of them wins. But typically the win will likely be extra systematic. And usually this occurs as a result of the habits successfully plugs into some pocket of computational reducibility that systematically out-competes opponents of a sure sort.

We’ve largely checked out very simple applications which in some sense inevitably should “expose the same rules” to each competitor. But notably if we have now a reasonably small assortment of rivals, a sufficiently massive program can in impact expose a distinct a part of its guidelines for various rivals, and so have a “customized substrategy” that individually wins towards completely different doable rivals.

In adaptive evolution of methods we’ve typically handled bigger applications. And we’ve usually seen that the adaptive evolution could be fairly profitable at discovering successful methods. But—as is often the case with adaptive evolution—there’s no apparent solution to “describe the mechanism” of the methods which can be produced. Instead, it’s extra like what we’ve seen in different research of adaptive evolution: the method of evolution places collectively sure “lumps of irreducible computation” that in our case right here in impact “just happen” to be competitively profitable.

Different video games—equivalent to completely different patterns of payoffs—result in outcomes which can be completely different intimately. And if one constructs an in depth narrative in regards to the course of a sport, it might nicely appear completely different for various video games. But at an total degree, there appears to be outstanding similarity between completely different video games—and the important thing phenomena appear very a lot the identical.

What does this all say about sensible conditions the place there’s competitors between brokers? One factor is that it’s usually going to be troublesome to “predict in advance” or “prove a theorem” about what the most effective technique will likely be. There’s sufficient computational irreducibility that one will principally simply should attempt operating completely different competitions and seeing what occurs. And in a way the very range of habits we’ve seen right here helps the concept ruliological investigation is essential. Finding some easy parametrization of doable methods gained’t be sufficient to get an correct sense of every part that may occur. There’s no selection however to systematically enumerate some model of “all computationally possible strategies”. Which is what we are able to do in our ruliological investigations.

And, sure, what we’ve finished right here simply scratches the floor of finding out the ruliology of competitors. For a begin, one can scale up the scale of the applications, and see what new phenomena happen. One can anticipate that largely issues would be the identical—with computational irreducibility the dominant power. But there could also be new and surprising pockets of reducibility, maybe every with their very own “paths to competitive success”.

One may think about investigating completely different sorts of computational programs—that function metamodels acceptable for various purposes. The Principle of Computational Equivalence means that there’ll be a sure universality to the general outcomes. But particulars will likely be completely different. And these particulars will probably be essential, notably in deciphering outcomes for very completely different domains. Even if what issues for final functions of competitors is nicely captured by finite state machines—or a mobile automata—the best way one will get to those from microscopic biology, human determination making, societal interactions, AI competitors, and so forth. could also be very completely different.

Historical & Personal Notes

There’s a protracted historical past to formal research of video games—and certainly early developments in areas like combinatorics and likelihood had been largely pushed by them. The trendy discipline often known as game theory emerged within the Forties, concentrating on the query of optimum methods given specific patterns of payoffs. Most typically the concept is to research what occurs when every participant makes a single transfer—albeit maybe a probabilistic one, with averages taken over many situations. Fairly full (although typically difficult) mathematical outcomes have been derived for this type of setup (and are actually, for instance, implemented in the Wolfram Language). But what about repeated, or iterated, video games of the sort we’ve been discussing right here? In the early days of sport concept there was dialogue about defining methods as arbitrary mappings from histories to actions—and varied somewhat summary mathematical outcomes had been proved, notably for purposes in economics.

But by the Seventies there began to emerge the concept one ought to mannequin brokers as having “bounded rationality”, and equivalent to restricted computational programs. And by the tip of the Seventies laptop experiments had been being finished on competitors between what amounted to easy applications. A notable instance was the event organized by Bob Axelrod for the prisoner’s dilemma sport. In this event, a group of specific applications had been submitted by completely different people, and run towards one another. The conclusion was that the “tit for tat” technique (that may be considered a finite state machine) got here out finest—a consequence from which a lot has been made in regards to the worth of cooperation, and so forth.

I have to admit that I used to be all the time suspicious of the consequence. It appeared very unscientific to have simply checked out applications folks occurred to have submitted for the event. Why not as an alternative systematically enumerate all doable applications and see what occurs? In my very own work—beginning at the start of the Eighties—I was routinely doing this kind of thing, notably for mobile automata. I all the time discovered the setup for sport concept just a little arbitrary, and fiddly, and I used to be discovering greater than I may sustain with simply investigating the habits of particular person applications, with out making an attempt to have them compete with one another. Still, lastly, within the mid-Nineties, I did take a look at what occurs when a spread of doable applications (in that case, mobile automata) compete with one another. I summarized the lead to a small word on the finish of my e book A New Kind of Science:

Games between programs

I all the time meant to come back again and have a look at this in additional element. And lastly my current work within the foundations of organic evolution made me suppose it was time to do it. I came upon that there was some literature on utilizing fashions like finite state machines as methods for iterated video games. But as far as I may inform, the sort of systematic ruliological investigation I had imagined had by no means been finished. Which is why I lately determined it was lastly time to do it…

Thanks

Thanks to Willem Nielsen, Brian Ashiundu and Júlia Campolim of the Wolfram Institute for his or her intensive assist. Several individuals at our summer time applications have finished tasks about video games between applications that I’ve prompt: Rodrigo Bazaes, Kantaporn Danchaivijitr and Aziz Sahibnazarov. Over the course of a few years, I’ve mentioned sport concept and associated concepts with fairly just a few folks, together with Brian Arthur, Bob Axelrod, Seth Chandler, Roger Germundsson, Paul Harrald, Jozsef Konczer, Pedro Marquez-Zacarias, Eric Maskin, Zsombor Méder, Chrystopher Nehaniv, Scott Page, Jordan Pollack, John Maynard Smith, Stan Reiter, Nassim Taleb, Valeriu Ungureanu and Marc Vicuna. (Notable sport theorist John Nash was a long-time consumer of what’s now Wolfram Language, and attended conferences about it, however I by no means personally met him.)


This web page was created programmatically, to learn the article in its authentic location you’ll be able to go to the hyperlink bellow:
https://writings.stephenwolfram.com/2026/06/games-between-programs-the-ruliology-of-competition/
and if you wish to take away this text from our website please contact us