Not so long ago firstPerson started a series of competition questions for the users here. It was fun, so I was thinking of doing another.

I've always been somewhat interested in A.I., so I thought that would be a good place to start. This is probably a more advanced competition, but everyone is welcome.

Rules
You are to develop an AI for playing Tic-Tac-Toe, with a catch: You're not allowed to embed any knowledge of the game into your creation beyond how to make legal moves and recognizing that it has won or lost.

Your program is expected to "learn" from the games it plays, until it masters the game and can play flawlessly.

-Your program will play 5000 games against itself (maybe this is too many, I don't know), improving its A.I. database.
-Your program will then play against the other entrants' programs in 10-game matches in a knock-out style tournament.
-Your program must be written in C++.
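To make the "no embedded knowledge" rule concrete, here is one possible shape for a self-play learner in C++: it only knows legal moves and win detection, and nudges a value table toward each game's final result (a crude Monte-Carlo update). This is just a sketch of my own, not a required design; every name in it is made up for illustration:

```cpp
#include <cassert>
#include <cstdlib>
#include <map>
#include <string>
#include <vector>

// Board is a 9-char string of 'X', 'O' and '.'.
char winner(const std::string& b) {
    static const int L[8][3] = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},
                                {1,4,7},{2,5,8},{0,4,8},{2,4,6}};
    for (auto& l : L)
        if (b[l[0]] != '.' && b[l[0]] == b[l[1]] && b[l[1]] == b[l[2]])
            return b[l[0]];
    return '.';
}

// value[state] estimates how good a position is for the player who just
// moved into it; it starts at 0 and is nudged after every game.
std::map<std::string, double> value;

void selfPlayGame(double alpha, double epsilon) {
    std::string b(9, '.');
    std::vector<std::string> xStates, oStates;
    char turn = 'X';
    while (winner(b) == '.' && b.find('.') != std::string::npos) {
        std::vector<int> moves;
        for (int i = 0; i < 9; ++i)
            if (b[i] == '.') moves.push_back(i);
        int pick = moves[std::rand() % moves.size()];       // explore...
        if ((std::rand() % 100) / 100.0 > epsilon) {        // ...or exploit
            for (int m : moves) {
                std::string next = b, best = b;
                next[m] = turn;
                best[pick] = turn;
                if (value[next] > value[best]) pick = m;
            }
        }
        b[pick] = turn;
        (turn == 'X' ? xStates : oStates).push_back(b);
        turn = (turn == 'X') ? 'O' : 'X';
    }
    double xr = (winner(b) == 'X') ? 1.0 : (winner(b) == 'O' ? -1.0 : 0.0);
    for (auto& s : xStates) value[s] += alpha * ( xr - value[s]);  // Monte-Carlo
    for (auto& s : oStates) value[s] += alpha * (-xr - value[s]);  // style update
}
```

Running `selfPlayGame` a few thousand times is the "play 5000 games against itself" phase; the resulting `value` map is the A.I. database.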

I will write a universal, simple GUI so we can see the games played. Therefore I will state later the protocols your program needs to support (i.e. what functions and parameters you need to pass).

Good luck! Suggestions are welcome.

Edited 5 Years Ago by iamthwee: n/a

Is it allowed to generate "knowledge" of the game, i.e., with an algorithm before it plays any game, or is it supposed to ONLY learn by playing? I suppose the two ideas are actually pretty similar.

Sounds interesting... maybe I can do some fun coding instead of my office tasks! But I left college a long time ago, so maybe I will write some brute force! :)

>>Is it allowed to generate "knowledge" of the game, i.e., with an algorithm before it plays any game, or is it supposed to ONLY learn by playing? I suppose the two ideas are actually pretty similar.

Actually, I was wondering this before I posted the rules. The original idea is for the AI to learn and improve by itself (otherwise known as unsupervised learning, e.g. self-organising maps (SOMs)). Hence why I stated it could play a set number of games by itself and then use this as a database to play opponents.

Now that I come to think of it, I guess it could just play a set number of games against the opponents instead. After all the programs have played the same number of games against all their competitors, you could then enter the tournament. I guess it is the same thing anyway.

>>Sounds interesting... maybe I can do some fun coding instead of my office tasks! But I left college a long time ago, so maybe I will write some brute force!

Yes it is good to exercise the mind! Hopefully there will be enough interest to get this going. The prize being... awesomeness!

Oh, here are a few links to read up on:
http://en.wikipedia.org/wiki/Artificial_neural_network


I'm just planning the universal protocol the engine should support, reading from stdin and writing to stdout. Something along the lines of:
http://www.cc.gatech.edu/classes/cs4385_99_spring/handouts/t3po.txt

Will post these up shortly!


Is this too easy... Looks like the minimax algo more or less achieves perfect play once it plays all the games...

>>Is this too easy... Looks like the minimax algo more or less achieves perfect play once it plays all the games...

Yeah, exactly. You don't even have to use minimax; following a simple series of rules can produce an unbeatable tic-tac-toe AI.

I agree, the tic-tac-toe game is pretty trivial to solve. The set of possible situations a player can face is very limited. And, as firstPerson mentioned, if you write a decision tree for the optimal strategy, it is very simple and unbeatable; that's too easy. Typically, to really be able to test any kind of AI, you need a scenario in which there is no unbeatable strategy, where even the stupidest strategy (usually random decisions) is expected to win at least some of the time, and where no simple strategy is optimal (in AI terms, the value function is far from simple).

There are several such open-ended problems that could be fun to tackle. For instance, a while ago I took a machine learning course and people had all sorts of diverse AI projects to present at the end. The problem I did my class project on was a Pursuit-Evasion game. This is very simple to set up: you simulate a fast-moving, slow-turning pursuer and a slow-moving, highly-maneuverable evader in a 2D or 3D environment (just empty-space environments). Then you get the pursuer to try to catch the evader, each starting at random points, train both AIs to learn their respective optimal strategies, and count the number of successes of each party (if the evader is still alive after X period of time, he wins; otherwise, well... he's dead). I personally tested several AI algorithms on a slightly more complex scenario, and it isn't easy to solve (but luckily, there exists a simple, but non-linear, optimal strategy to test against and compare to). That's just an idea I'm throwing out there.

Another simple idea is to make a rock-paper-scissors game. It fits perfectly. There is no optimal algorithm in the sense of one that will never lose, but one can code up a learning machine that finds patterns in a human's inputs and hence tries to predict their next move.

>>Yeah, exactly. You don't even have to use minimax; following a simple series of rules can produce an unbeatable tic-tac-toe AI.

Indeed, which is why I stated the program you create shouldn't hard-code any rules... which is the whole point.

Although, having said that, you can build the entire decision tree using the minimax algorithm in almost no time at all. The trivial nature of tic-tac-toe probably makes the competition redundant, or not much of a challenge at all.
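As a concrete illustration of how little code perfect play takes, here is a plain minimax sketch for tic-tac-toe, written from scratch for this thread (the names are my own):

```cpp
#include <algorithm>
#include <array>
#include <cassert>

using Board = std::array<char, 9>;  // 'X', 'O' or '.'

char winner(const Board& b) {
    static const int L[8][3] = {{0,1,2},{3,4,5},{6,7,8},{0,3,6},
                                {1,4,7},{2,5,8},{0,4,8},{2,4,6}};
    for (auto& l : L)
        if (b[l[0]] != '.' && b[l[0]] == b[l[1]] && b[l[1]] == b[l[2]])
            return b[l[0]];
    return '.';
}

// Score from the point of view of the side to move:
// +1 = forced win, 0 = draw, -1 = forced loss.
int minimax(Board& b, char toMove) {
    if (winner(b) != '.') return -1;  // the opponent's last move just won
    if (std::none_of(b.begin(), b.end(),
                     [](char c) { return c == '.'; }))
        return 0;                     // board full: draw
    int best = -2;
    for (int i = 0; i < 9; ++i) {
        if (b[i] != '.') continue;
        b[i] = toMove;
        best = std::max(best, -minimax(b, toMove == 'X' ? 'O' : 'X'));
        b[i] = '.';
    }
    return best;
}
```

The full game tree is small enough that this searches every position from the empty board in well under a second, which is exactly why tic-tac-toe is trivial.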

@Mike that sounds like a good alternative, although maybe a bit much for a simple DaniWeb competition.

I like firstPerson's idea about rock-paper-scissors, but seeing as the decisions are totally random and winning is down to pure luck, it seems a bit redundant to create an A.I. for it.

Maybe an A.I for four-in-a-row or...


>>I like firstPerson's idea about rock-paper-scissors, but seeing as the decisions are totally random and winning is down to pure luck, it seems a bit redundant to create an A.I. for it.

I beg to differ


Haha, I beat the computer. When I was playing against the big database, I easily predicted what the computer would do, so I won really easily. Then I played against the learning mode, and it played much better, but I still won 30 - 16 - 29. I was predicting what the computer would do, too. If I were playing normal rock-paper-scissors, I would definitely lose to the planning of the computer.

>>but seeing as the decisions are totally random and winning is down to pure luck

@iamthwee: You are making the strong assumption that humans playing rock-paper-scissors actually make random choices. That is demonstrably false. Humans are no better at generating random numbers than computers; in fact, they are much worse. Humans are not truly capable of coming up with a random pick: there is always an underlying logic or strategy (even in situations where a strategy is meaningless), because we simply cannot "compute" the pick any other way.

This is a well-known fact, especially in the realm of mentalism (magic, or illusionism, based on mental tricks). For instance, if you ask a person for an arbitrary number between 1 and 100, you have about a 40% chance that the number picked will be 37, because even if nothing makes one number more random than another, we humans tend in that situation to avoid numbers that don't seem random enough to us (all the even numbers, all the numbers below 20 or so, all the numbers that appear in the multiplication tables we learned as children, etc.), and the result of that elimination, without going too far into the high numbers, is 37. Similarly, many mentalists use "suggestion", which just means they say a few innocent- and inconsequential-sounding things before they ask you to pick a card or make a drawing, but that is enough to suggest to your mind a particular logic that leads to a particular "random" choice, and so goes the trick. In AI, this can be modeled as an HMM (Hidden Markov Model); training an ANN or other techniques can be used to learn someone's HMM and thus predict his/her picks.


I stand corrected.

That is very interesting. I wonder how that flash app firstPerson posted works. I guess it has just monitored hundreds of thousands of games and returns the statistically best winning move (given the previous moves).

But there are so many other things to consider as well. Does it time how long the human takes to click an option? Does it track where the user hovers over his previous choice?
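Even the crudest version of "statistically best move given the previous moves" is easy to sketch: a first-order Markov counter that tracks which move the human tends to play after their own last move, then plays the counter to the prediction. This is my own toy illustration, not how the flash app actually works:

```cpp
#include <cassert>

// Moves: 0 = rock, 1 = paper, 2 = scissors; (m + 1) % 3 beats m.
struct MarkovPredictor {
    int count[3][3] = {};  // count[prev][next]: observed transition frequencies
    int prev = -1;         // the human's last observed move

    // Record the human's move and update the transition counts.
    void observe(int move) {
        if (prev >= 0) count[prev][move]++;
        prev = move;
    }
    // Predict the human's next move: the most frequent follower of their last.
    int predict() const {
        if (prev < 0) return 0;  // no data yet: arbitrary guess
        int best = 0;
        for (int m = 1; m < 3; ++m)
            if (count[prev][m] > count[prev][best]) best = m;
        return best;
    }
    // Play whatever beats the prediction.
    int counterMove() const { return (predict() + 1) % 3; }
};
```

Against a truly random opponent this gains nothing, which is exactly Mike's point: it only wins because humans are not random.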

@Mike totally agree. I once got a book on magic and how to influence people. I guess we assume our actions are totally independent of others, but in reality that's far from the truth. We are being subconsciously influenced all the time, from the adverts on T.V. to the opinions of the friends in our social groups.

>>I guess it has just monitored hundreds of thousands of games and returns the best statistically winning move (given the previous moves).

Not exactly. Such gross statistics are not good enough in general. As I said earlier, these programs typically do an online estimation of a Hidden Markov Model (HMM). The hidden state represents the "state of mind" of the player (usually modeled by a vector of numbers large enough to account for the complexity of the decisions involved). The result of the previous game (the computer's move) modifies the state of mind of the player, and that next state of mind determines (with some probability) the most likely next move. The job of the computer doing the learning is to try to determine the two relationships involved: the state-transition function and the state-output function. In other words, the learning algorithm has to estimate, from enough evidence, how its previous move causes a state transition in the human player's state of mind, and then how a particular state of mind translates into probabilities of an output (the next move by the player). Standard algorithms for this are typically classified as EM (Expectation Maximization), a kind of double-step algorithm (like predictor-corrector algorithms), the best and simplest example of which is the Viterbi algorithm (a dynamic-programming version of EM). The idea is usually to assume a model for one of the two things (state transition or state-output), adapt the second model so that it yields maximum expectation with respect to the recorded games (fits as well as possible), then hold that second model fixed, find the first model that maximizes expectation, and repeat. This can be modified to run online as well (not batch processing). There are, of course, other algorithms, but the basic idea is to estimate the processes involved by inferring them from the sequence of game results.
At first, you can assume that all states are equally likely and that there is no correlation between the state and the output, or between the previous state and the next state. Over time, the computer can infer a correlation between state and output, and between the previous state, the previous opponent's move, and the next state. That eventually leads to a model useful enough at predicting the next state and output that the computer gains a significant edge and wins the game.

But the above is the general HMM method. In the case of rock-paper-scissors, this is much simpler. Here, you can model the state as being equal to the output (the next pick). So the "belief state", which is the probability distribution over the possible states of the human, can be modeled as a vector of 3 values (the probabilities of rock, paper, and scissors) which sum up to 1.0. The previous result (the computer's previous move) can be modeled as a binary vector of 3 values (again, the probabilities of rock-paper-scissors, but this time one entry is 100% (the one picked by the computer) and the others are zero). Now the state-transition function can simply be modeled as a matrix that multiplies the previous belief-state vector and the previous computer's move to produce the next belief state. Then all the algorithm has to do is find the transition matrix that best fits the data observed so far (with EM) and apply it to predict the next move.
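The simplified rock-paper-scissors model above can be sketched in a few lines of C++. Here I replace the EM fit with plain counting conditioned on the computer's previous move, just to show the shape of the belief-state idea; the struct and all its names are my own illustration, not a real implementation of the full method:

```cpp
#include <array>
#include <cassert>

// Belief over the human's next pick (0 = rock, 1 = paper, 2 = scissors),
// conditioned on the computer's previous move. The transition "matrix" is
// estimated here by simple counting instead of EM, to keep the sketch short.
struct BeliefModel {
    double counts[3][3];  // counts[compPrev][humanNext], start at 1 (uniform)

    BeliefModel() {
        for (auto& row : counts)
            for (double& c : row) c = 1.0;
    }
    // Record one round: our previous move and the human's reply to it.
    void observe(int compPrev, int humanNext) {
        counts[compPrev][humanNext] += 1.0;
    }
    // Normalised belief vector given the computer's previous move.
    std::array<double, 3> belief(int compPrev) const {
        const double* r = counts[compPrev];
        double s = r[0] + r[1] + r[2];
        return {r[0] / s, r[1] / s, r[2] / s};
    }
    // Play the counter to the most probable human pick.
    int counterMove(int compPrev) const {
        std::array<double, 3> b = belief(compPrev);
        int best = 0;
        for (int m = 1; m < 3; ++m)
            if (b[m] > b[best]) best = m;
        return (best + 1) % 3;
    }
};
```

Starting the counts at 1 gives a uniform prior, so early play is effectively random until enough rounds accumulate to tilt the belief.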

