Battlespace: Using AI to Understand Friendly vs. Hostile Decision Dynamics in MDO

Hare, J. Zachary, B. Christopher Rinderspacher, Sue E. Kase, Simon Su, and Chou P. Hung. “Battlespace: using AI to understand friendly vs. hostile decision dynamics in MDO.” In Artificial Intelligence and Machine Learning for Multi-Domain Operations Applications III, vol. 11746, pp. 268-277. SPIE, 2021.

What is the motivation of this paper?

Future Multi Domain Operation (MDO) wargaming will rely on AI / ML algorithms to aid and accelerate complex Command and Control decision-making.
Complex MDO decision-making requires new algorithms that can
- operate in dynamic environments with
  - changing rules
  - uncertainty
  - individual biases
  - changing cognitive biases
- rapidly mitigate unexpected hostile capabilities
- rapidly exploit friendly technological capabilities
Existing algorithms are inadequate to address the complexity of wargame modeling of MDO.

What is the hypothesis of this paper?

New algorithms for learning and reasoning under uncertainty, game theory with three or more players, and interpretable AI can aid in complex MDO decision-making.

What is the goal of this paper?

To develop a flexible wargaming platform for developing novel reinforcement learning algorithm for multi-player decision-aids with three or more adversarial players.

Research Questions:

How can human decision-making be leveraged by and synergized with AI?
How coordinated actions by the human teammates relate to known game theoretic principles?
How may these actions be extended to three or more adversarial players?
How does a simple agent mimicking these human action heuristics beat a random agent?

What are the main contributions of this paper?

Battlespace, a new flexible MDO warfighter machine interface game.
Experiments with human vs. random AI agents operating in a fixed environment with fixed rules, where the overall goal of human players is to collaboratively capture the opponents’ flags or eliminate all of their units.
Analyses of the evolution of the games and identification of key features characterizing human players’ strategies and their overall goal.
Modeling of the human strategies and development of heuristic strategies for a simple AI agent.

What are the main characteristics of Battlespace wargaming platform?

twenty game plays
two human players vs. two random adversarial AI agents
two types of objectives:
- Capture-the-Flag: maneuver the ground units into the enemy territory to capture the opposing teams’ flag (flag locations are unknown)
- Annihilation: discover and attack all of the enemy units
visualization of the wargame using the Augmented REality Sandtable (ARES) platform

What is the game design?

2 teams: red, blue
multiple players per team
each team is analogous to an alliance
each player represents an actor
two domains: “Air”, “Land”
sa common battlespace designed as a set of boards for each domain
board comprises square grids
each player has a set of pieces (“Units”)
4 ground units: a Soldier, a Tank, a Truck, a Flag
1 air unit: an Airplane
action spaces:
- Soldier, Tank, Truck:
  - doNothing: stay in the current location and keep the current orientation
  - turnH: rotate the orientation by degrees, {-135, -90, -45, 0, 45, 90, 135}
  - advance1: move one cell forward in the cirection of the current orientation
  - shoot: shoot a projectile in the direction of the current orientation, where the projectile continues to advance by one cell forward until it either collides with another unit or travels outside of the game board
  - ram: advance the unit one cell forward in the direction of the current orientation, while perform shooting
- Airplane:
  - doNothing: same as above
  - turnH: same as above
  - advanceX,Y: move cells along the East-West axis and cells along the North-South axis
  - shoot: same as above
  - bomb: shoot a projectile directly below the airplane onto the land game board
- Flag: unable to move and remain in the its location until the end of the game or it is captured

What are the rules of the game?

Repeat until game completion:

Each player places their pieces on the section (evenly divided) of the game board, where the location of each unit is unknown to the other players.
Each unit observes if there are any other units within their visible range (one square of the current location).
Onnce each unit observes, the players on the same team collaborate to identify the set of actions they would like to take for each of their units.
Each player selects an action for each unit. The actions chosen are only known to the player and their teammates.
After all players have selected an action, a game resolution is applied to move the pieces according to their actions and resolves whether any units have been attacked or collided with each other. If a unit is attacked by or collided with another unit, it is removed from the board.

Criteria for game completion:

Capture-the-Flag game: Game is terminated once all of the enemy flags have been captured.
Annihilation game: Game is terminated once all of the enemy units are eliminated.

High uncertainty due to limited visibility of enemy units and flags.

Game Visualization

Why is the development of a visualization component important?

Text-based formatted table:
- lack of understanding
- difficulty in interpretation
- general sustained confusion among human players during testing
- poor decision-making (same team unit collision, friendly fire unit destruction etc.)
Battle command and situational awareness require decision-making based on information visualization of the current and future operational states.

Goals:

improve battlespace visualization
provide a user-defined common operating picture
increase decision-making capabilities

The Augmented REality Sandtable (ARES) Platform

a system for visualizing a battlespace possessing key attributes such as customizability and interactivity of geospatial terrains
a map of an area of operations based on available data from Google Maps, OpenStreetMaps, OpenTopoMap, or USGS
2D (top-down) or 3D view
three visualization modes: virtual reality (VR), augmented reality (AR), mixed reality
blue-colored symbols: Friendly team; red-colored symbols: Hostile team
deployed on DoD Supercomputing Resource Center (DSRC)

How are the experiments conducted?

participants: 5 human subjects, 2 AI agents in total
During each turn, each random agent selects an action for each unit from a fixed categorical distribution.
,
Each game consists of two human subjects vs. two random agents.
20 games, 10 for Capture-the-Flag, and 10 for Annihilation.
Once all 20 games have been conducted, analyze the collected data to identify human strategies.

Game Data Analysis

Study the frequency of actions taken by the human players:

For Capture-the-Flag game, ground units are more likely to choose an Advance and Attack approach via the ram action to search for the flag.
The action doNothing is chosen relatively more frequently.
Human subjects are more likely to choose the advance1, -2 action for the air unit to advance the unit into enemy territory to search for the flag.

For Annihilation game, human subjects are more likely to choose attack-related actions, i.e.
shoot for the ground units, as well as shoot and bomb for the air unit.

Differences in the results for the two game types:

number of projectiles: Annihilation > Capture-the-Flag
total number of turns: Annihilation >> Capture-the-Flag
(Reason: Human agents are able to find the flag faster than to find the enemy units and eliminate them.)

Learning from Historical Game Data

Model the probability of taking an action as a categorical distribution:
Why does the simple frequentist approach to model not work?
- limited data some action-state pairs may not exist in the collected game data epistemic uncertainty in must be incorporated
How to incorporate the epistemic uncertainty in instead?
- By utilizing a prior on the distribution of following a Bayesian approach.
How to identify the prior?
- Count the number of times a unit taking an action in a particular state, using the collected game data :
- Model the distribution of according to a Dirichlet distribution: where : a multivariate beta function
- Why Dirichlet distribution?
  - It is the conjugate prior of a categorical distribution.
- How to determine the hyperparameters and ?
  - Using Bayesian Bootstrap and imprecise Dirichlet models.
Compute the probability of taking an action as the posterior predictive distribution (i.e. the expected value of the action probabilities conditioned on the posterior Dirichlet distribution, conditioned on the collected game data):
- Why using posterior predictive distribution instead of frequency distribution?
  - The probability of an action-state pair is always positive, even if it is not collected in the game data.
Compute the probabilities of joint actions and states:

: number of times the joint actions and states are taken,
: hyperparameter

Performance of a Simple AI Agent

Algorithm for the AI agent:

Repeate during each turn until game terminates:

Randomly place its units in the designated area of the board.
Identify the state of each unit.
Given the state and the goal of the game, draw an action for each unit from a predefined distribution

Winning proportion of the simple AI agent against two random agents:

In both two state distribution and nine state distribution, the Capture-the-Flag strategy outperforms the Annihilation strategy, since Annihiliation is more likely to select the shoot action which results in more frequent friendly fires due to random initial placement.
The winning percentage increases when considering additional states of the units.

Future Work

Collect additional game play data.
Develop deep RL algorithms that integrate Theory-of-Mind to learn asymmetric reward expectations under 3+ player Nash equilibria.
Study complex decision-making when the underlying rules of the Battlespace game change.