Exploring Different Styles of Pong Agent AI

Pong is a classic game, you could even say it is the classic game. I have been looking for things to help me learn Scratch recently, which is used to teach kids to code, as well as things to help me procrastinate exam study. So I decided to see if I could make the best damn Pong AI in Scratch that I could. This isn’t exactly simple, Scratch doesn’t really have “functions” like a regular programming language, so things get messy really quick. As Wikipedia says;

There is a strong contrast between the powerful multimedia functions and multi-threaded programming style and the rather limited scope of the Scratch programming language.

The Definition

So before we start, let’s define the world and its parameters. I have just come fresh out of an AI unit which did not have nearly enough coding, so I am itching to actually apply my knowledge. We will start off with a PEAS definition:

Performance: We want to minimize the number of balls which we miss, and maximize the number of balls which the opponent misses. AKA we want Max(Computer Score – Player Score).
Environment: We are playing in a 480×360 playing field. The top and bottom are walls where the ball will bounce off and when the ball hits the sides that player loses. When hitting a wall, the ball’s Y vector is reflected across the X axis. When hitting a players paddle, the paddles velocity is added to the Y velocity (assuming the Y velocity of the ball isn’t the maximum) and then the remainder speed is the X velocity (going in the reverse direction). The ball is always going at a constant speed. Lets have a look at the environment characteristics, The world is:
- Fully Observable: All variables are available to the AI (Scratch hasn’t really caught on to Information Hiding principals)
- Deterministic: The only thing the AI cannot predict is the movement of the Player Paddle, everything it can figure out (there is no randomness)
- Episodic: After each paddle hit the AI can practically forget everything that just happened.
- Dynamic: The Paddles and the Ball are all constantly moving.
- Continuous: Because positions and velocities can go down to the decimal, the data is continuous, this isn’t a big deal though because it doesn’t actually affect the decisions much (you can just round).
- Multiple Agent: There are two agents, the Player (a person) and the Computer (the AI). They are competitive agents, one agent succeeding means the other agent fails.
Actuators: We have two actuators in this game, the up accelerator which when on it increases velocity until it reaches the maximum and the down accelerator which decreases velocity until it reaches the negative maximum
Sensors: We can sense the X and Y coordinates of both paddles and the ball. We can also sense the Y velocity of both paddles and the X and Y velocity of the ball.

The Agents

I developed four different pong agents, each building on the lessons from the last one. I also updated and refined the world in which they interact. The first agent uses very hacky code such as bounce “when touching color red”. For a more complex AI to work well, we will need the game to be as deterministic as possible.

The Reflex Agents

Reflex agents are pretty simple. They read in sensor data and react to it. In our case my first agent will read in the position of the ball and if it is above, fire the up accelerator and if it is below fire the down accelerator. This is a very naive approach. It works well when the Y velocity of the ball is less than the maximum Y velocity of the paddle, but once it goes over, the paddle can’t cope. Furthermore, the paddle can’t just change directions, it must first decelerate. This means a ball with a high Y velocity will whiz past the paddle before it can turn around.

We can definitely do better than that. One issue is that the paddle starts following the ball too early and goes too far up or down to get back. So in the second agent, the paddle returns to the centre after hitting the ball and then only following the ball when it passes a certain point. This doesn’t really help it that much, and the steep Y moves still get the better of it. It looks smarter though.

The Model Based Agent

So it seems a bit stupid to have the paddle start moving upwards if the ball ends up below it. The thing is, we can figure out where the ball ends up using data gained from our “sensors”. Now the agent can just calculate the position the ball will hit based on its vector and move into position. This is almost perfect. I had to change the world around to keep the challenge on the agent. Now when the ball hits a paddle, its speed increases. This means the eventually the paddle won’t be able to get to it in time if it is too far away. I also cleaned up quite a bit of the code, relying more on variables and less on hit boxes to make it more consistent.

See if you can beat the agent. It’s interesting that the big issue for the first two agents, balls with high Y velocity, are a walk in the park for this agent. What this agent really struggles with is balls with a high X velocity aimed at its corners.

The Utility Based Agent

So now the AI is pretty good, but there is one problem. Right now the agent only plays defense, which makes it rely on the Player making a mistake. As the cheesy movie actors say, the best defense is a good offense. In order to allow a good offense, we want to design a utility based agent.

Utility based agents have a happiness function, which evaluates how happy a particular action will make the agent. The best offensive moves are moves that get the ball the furthest away from the other player with the highest X velocity, hopefully making it impossible for the Player to reach the ball on time. Note that actually hitting the ball isn’t a part of the happiness function – any plan which involves the AI missing the ball is immediately thrown out. Happiness is based on increasing our chances of winning and if we miss the ball the chance of winning is zero.

There are only 21 different velocities a paddle can hit the ball at, so to plan a move, we just generate all the states (assuming the ball would be able to be hit), test to see which will make the agent the happiest and then make a plan for the agent to follow. If the plan reaches a dead end, it finds the next happiest state. If it can’t find a state where it reaches the ball, it is quite likely the ball is too far away and going too fast. Having the paddle sit there without a plan looks a bit weird to a Player, they want to see it try and fail! So if it doesn’t find a state, it goes to the default state, which is moving as fast as possible toward the point where the ball is predicted to land. The agent would think this is futile, but it must appease the silly humans. And who knows, maybe a rounding error could mean it actually just makes it.

This agent is the most polished and challenging by far. I have also cleaned up the code some more, while simultaneously making it super dirty. Give it a go!

So that’s the best AI we have so far. If anyone has any comments on improving it (or if you found any bugs), or if you would like me to explain in more detail the algorithms used, leave a comment! When I have time, I plan on making it semi modular, so I can pit AIs against each other.

If you want to view that code for these projects, check them out on Scratch [1] [2] [3] [4].

Sigh. If only we had this kind of programming integrated in our Artificial Machine Intelligence unit.

The Definition

The Agents

The Reflex Agents

The Model Based Agent

The Utility Based Agent

Leave a Reply Cancel reply