The goal of this project is to develop explainable AI (XAI) systems that advance machine learning and human-computer interaction (HCI) toward two objectives.
Goal 1: Developing interpretable models that integrate, and thus realize the benefits of, three representational paradigms studied in machine learning, computer vision, autonomy, and AI; a minimal sketch of how these paradigms can be combined follows the list below.
- Deep Neural Networks (DNNs), for their rich features that ground symbols in raw signals and their high performance on a range of specific tasks, achieved through deep learning on large-scale training data.
- And-Or Graphs (AOGs), for their compositional graph structures along the spatial, temporal, and causal dimensions, which enable interpretable probabilistic inference under uncertainty.
- Predicate Calculus (PC), for its capability for deduction and reasoning over long spatial and temporal ranges and for communicating with human users at higher levels of abstraction.
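To make the intended integration concrete, the following is a minimal, illustrative sketch (all class and function names are hypothetical, not part of any existing system): terminal nodes are grounded on raw signals by DNN scores, And-nodes and Or-nodes compose them into an AOG, and sufficiently confident groundings are exported as predicate-calculus facts for higher-level reasoning.

```python
# Illustrative sketch only: hypothetical names, simplified scoring.
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class TerminalNode:
    """Leaf node grounded on raw signals, e.g. by a DNN detector head."""
    name: str
    score_fn: Callable[[object], float]

    def score(self, signal) -> float:
        return self.score_fn(signal)


@dataclass
class AndNode:
    """Composition: all children must be present; scores combine multiplicatively."""
    name: str
    children: List[object] = field(default_factory=list)

    def score(self, signal) -> float:
        s = 1.0
        for c in self.children:
            s *= c.score(signal)
        return s


@dataclass
class OrNode:
    """Alternative: pick the best-scoring child (a switch under uncertainty)."""
    name: str
    children: List[object] = field(default_factory=list)

    def score(self, signal) -> float:
        return max(c.score(signal) for c in self.children)


def to_predicates(node, signal, threshold=0.5):
    """Export sufficiently confident groundings as predicate-calculus facts."""
    facts = []
    if node.score(signal) >= threshold:
        facts.append(f"Detected({node.name})")
    for c in getattr(node, "children", []):
        facts.extend(to_predicates(c, signal, threshold))
    return facts


# Hypothetical usage: an "open_door" event composed of two detections.
door = TerminalNode("door", score_fn=lambda sig: 0.9)      # stand-in for a DNN head
person = TerminalNode("person", score_fn=lambda sig: 0.8)
open_door = AndNode("open_door", children=[person, door])
print(open_door.score(None), to_predicates(open_door, None))
```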
Goal 2: Developing explanation models and interfaces that communicate effectively with human users, e.g., analysts and users collaborating with the XAI system, so that users gain insight and trust by understanding the inner workings and inference traces through which the system derives its results and decisions. We will develop explanations at three levels of increasing depth.
- Concept compositions, represented by fragments of the parse graph, which show how information is aggregated from constituents and context, how decisions are made at individual nodes under uncertainty, and the confidence levels of those decisions.
- Causal and counterfactual reasoning, realized by extracting causal diagrams from the spatial, temporal, and causal AOG (STC-AOG), which predicts what will happen and what could have happened had alternative actions been performed, and thus answers the “how” and “what if” questions.
- Utility (state value, decision loss, and action cost), which provides the ultimate answer to why the system makes its decisions in comparison with alternative actions and choices; a minimal sketch follows this list.
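The following sketch illustrates the utility level of explanation, under the simplifying assumption that utility can be summarized as state value minus decision loss minus action cost; the action names and numbers are purely illustrative, not outputs of the system.

```python
# Illustrative sketch: justify the chosen action by comparing expected utility
# (state value - decision loss - action cost) against the alternatives considered.
from dataclasses import dataclass
from typing import List


@dataclass
class ActionChoice:
    name: str
    state_value: float    # value of the state the action is expected to reach
    decision_loss: float  # expected loss from residual uncertainty
    action_cost: float    # cost of executing the action

    @property
    def utility(self) -> float:
        return self.state_value - self.decision_loss - self.action_cost


def explain_choice(choices: List[ActionChoice]) -> str:
    """Produce a textual 'why this action' explanation from utility comparisons."""
    best = max(choices, key=lambda c: c.utility)
    lines = [f"Chose '{best.name}' (utility {best.utility:.2f}) because:"]
    for c in sorted(choices, key=lambda c: -c.utility):
        if c is not best:
            lines.append(
                f"  - '{c.name}' scores {c.utility:.2f} "
                f"({best.utility - c.utility:.2f} lower than '{best.name}')"
            )
    return "\n".join(lines)


# Hypothetical example: why open the door rather than wait or go around.
print(explain_choice([
    ActionChoice("open door", state_value=1.0, decision_loss=0.1, action_cost=0.2),
    ActionChoice("wait",      state_value=0.4, decision_loss=0.3, action_cost=0.0),
    ActionChoice("go around", state_value=1.0, decision_loss=0.1, action_cost=0.7),
]))
```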
We will develop the XAI system on two closely related task domains with a shared representation.
Domain I: Multi-modal analytics for complex event understanding. The system ingests videos captured by a network of cameras (indoor, outdoor, mobile, infrared) together with text input from human intelligence; reconstructs and composes 3D scenes; infers objects, human poses, actions, attributes, and group activities in the global context of the scene; and outputs spatial and temporal parse graphs with probabilities attached to their nodes.
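As a toy illustration only (not the actual output format of the system), such a parse graph could be serialized as a nested structure with a probability on each node, so that an explanation interface can surface the low-confidence decisions; the event names and probabilities below are invented.

```python
# Toy spatial-temporal parse graph for a simple event, with per-node probabilities.
parse_graph = {
    "node": "group_activity:delivery", "prob": 0.81,
    "temporal_children": [                       # ordered sub-events
        {"node": "event:approach_building", "prob": 0.92,
         "spatial_children": [
             {"node": "object:person",  "prob": 0.97},
             {"node": "object:vehicle", "prob": 0.88},
         ]},
        {"node": "event:hand_over_package", "prob": 0.74,
         "spatial_children": [
             {"node": "object:person",  "prob": 0.97},
             {"node": "object:package", "prob": 0.69},
         ]},
    ],
}


def low_confidence_nodes(pg, threshold=0.75, path=""):
    """Walk the parse graph and list nodes whose confidence falls below threshold."""
    here = f"{path}/{pg['node']}"
    flagged = [(here, pg["prob"])] if pg["prob"] < threshold else []
    for key in ("temporal_children", "spatial_children"):
        for child in pg.get(key, []):
            flagged += low_confidence_nodes(child, threshold, here)
    return flagged


print(low_confidence_nodes(parse_graph))
```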
Domain II: Autonomy by recognition, reasoning, and planning. Leveraging recent advances in virtual and augmented reality, we will construct a multi-modal virtual reality platform, in addition to a robot platform developed under the SIMPLEX project.
Acknowledgments
This work is supported by DARPA Award N66001-17-2-4029.