
Anunnaki: A Modular Framework for Developing Trusted Artificial Intelligence

Published: 13 September 2024

Abstract

Trustworthy artificial intelligence (Trusted AI) is of utmost importance when learning-enabled components (LECs) are used in autonomous, safety-critical systems. When reliant on deep learning, these systems need to address the reliability, robustness, and interpretability of learning models. In addition to developing strategies to address these concerns, appropriate software architectures are needed to coordinate LECs and ensure they deliver acceptable behavior even under uncertain conditions. This work describes Anunnaki, a model-driven framework comprising loosely-coupled modular services designed to monitor and manage LECs with respect to Trusted AI assurance concerns when faced with different sources of uncertainty. More specifically, the Anunnaki framework supports the composition of independent, modular services to assess and improve the resilience and robustness of AI systems. The design of Anunnaki was guided by several key software engineering principles (e.g., modularity, composability, and reusability) in order to facilitate its use and maintenance and to support different aggregate monitoring and assurance analysis tools for learning-enabled systems (LESs) and their respective data sets. We demonstrate Anunnaki on two autonomous platforms: a terrestrial rover and an unmanned aerial vehicle. Our studies show how Anunnaki can be used to manage the operations of different autonomous learning-enabled systems with vision-based LECs while exposed to uncertain environmental conditions.

1 Introduction

When artificial intelligence (AI) is used for safety-critical tasks, stakeholders must be able to trust AI systems to perform as intended, despite many uncertainties due to changing operational contexts [68], as well as those that are unique to learning-enabled components (LECs) [88]. Data-driven LECs, such as deep neural networks (DNNs) [28], are often black boxes, more complex than traditional software, and require a “leap of faith” from stakeholders [91]. However, using inadequate AI in safety-critical applications can be detrimental, possibly leading to human injury or casualties (e.g., autonomous driving accidents [56]). Various high-level “Trusted AI” guidelines have been proposed to systematically address AI assurance concerns [55, 75, 77]. As “best practice” guidelines, these frameworks increase awareness of safety issues unique to AI systems by decomposing assurance topics into categories such as reliability, fairness, robustness, interpretability, and uncertainty quantification [75]. However, because the specific techniques used to address each assurance category are highly application-dependent, it can be challenging to generalize solutions across multiple applications. As machine learning software matures from a largely academic, research-focused domain to industrial software, we need to apply well-established software engineering practices to Trusted AI [10, 48]. This article describes a modular and composable approach to develop and support run-time management of Trusted AI that handles the different dimensions of uncertainty recognized by existing guidelines [88].
Current solutions to Trusted AI concerns, such as adversarial robustness and adversarial detection [43], are typically tightly-coupled to specific problem domains, leading to monolithic applications that are difficult to scale, reuse, and maintain [25]. When “robustifying” DNNs, techniques have been proposed to augment training data, training procedures, or network topologies, with updates interwoven into a single, monolithic learning model [76]. Because these proposed solutions are tightly-coupled to a base learning model, it can be challenging to repurpose them for alternative learning models. Furthermore, when addressing uncertainty for Trusted AI systems, many context-dependent solutions are needed to mitigate the various forms of uncertainty (e.g., robustness to adverse weather effects versus cybersecurity concerns). With monolithic solutions, any change with respect to a single form of uncertainty can require the entire learning model to be retrained and validated. As new adversarial conditions are uncovered, monolithic solutions also require extensive updates to the entire learning model.
This article describes a modular, composable approach to address multiple dimensions of uncertainty in Trusted AI. Rather than using monolithic solutions to address all issues of uncertainty for Trusted AI in a single development environment, this article proposes a framework that comprises loosely-coupled services, each of which is responsible for individual assurance concerns (e.g., robustness, resilience, interpretability). In contrast to monolithic architectures, microservice [52] architectures realize software as a collection of independently-deployable services that interact using a common interface and communication protocol [2, 21]. Microservices can be executed on separate hardware platforms and implemented with a variety of technologies to enable composability and replaceability with code reuse [52]. In the spirit of microservices, Trusted AI systems can be realized as service-oriented architectures, with separate reusable services deployed to manage the reliability, robustness, and interpretability of underlying LECs, rather than incorporating the cumulative functionality into a single component.
This article describes Anunnaki, a framework comprising model-driven services to manage Trusted AI assurance concerns at run time when LECs are exposed to uncertainty. This work, including our preliminary studies [37], is the first to explore Trusted AI as an aggregation of services to address multiple dimensions of uncertainty [51, 54, 61, 73]. Specifically, the Anunnaki framework comprises services for domain detection (e.g., adversarial detection [27]), multi-domain training (e.g., adversarial training [63]), and autonomic management [34], where the term domain refers to a subset of LEC inputs that share common attributes or characteristics (e.g., presence of rain, low-light conditions). Here, the term multi-domain training refers to techniques that can be used to generate domain-specific data for improving model robustness (via retraining) in some target domain (e.g., Enki [38], DeepRoad [94], DeepXplore [59], DeepTest [80]). The Anunnaki framework uses an autonomic manager to support run-time adaptation of AI-based systems. To this end, we developed Utu, a model-driven autonomic manager that coordinates LECs at run time with respect to requirements that model assurance concerns (e.g., KAOS [81] goal models). This work extends our preliminary work [37] in the following key areas. First, we have extended the Anunnaki framework to support goal models that use the uncertainty-aware requirements specification language RELAX [86], increasing the flexibility of a learning-enabled system (LES) to account for sources of uncertainty. Second, we have redefined two core services of the Anunnaki framework (i.e., domain detection and multi-domain training) to improve the framework’s overall generalizability and support for reuse. The Anunnaki framework leverages domain detection techniques (e.g., behavior oracles [39] or out-of-distribution methods [27]) to detect the presence of adverse phenomena (e.g., rain, fog). Multi-domain training techniques are used to robustify LECs to adverse phenomena [11, 36]. Finally, we have conducted an additional empirical study to demonstrate the use of Anunnaki in a new application area with a different set of safety requirements. The Anunnaki framework and its aggregate services are independent of the internal functionality of the managed AI system, which promotes reusability, portability, and extensibility for more flexible run-time monitoring. During execution, Anunnaki services run in parallel with and independently of managed LECs. As such, the Anunnaki framework enables developers to reuse common services to generate robust alternatives to LECs, detect when LECs have entered untrusted states, and mitigate the use of LECs in untrusted states.
To demonstrate the Anunnaki framework and its aggregate services, we have applied them to two different managed AI systems with vision-based LECs: an autonomous terrestrial rover and an autonomous unmanned aerial vehicle. We first implemented the Anunnaki framework for an autonomous terrestrial rover that must navigate its environment while detecting pedestrians and avoiding collisions. Next, we demonstrate how Anunnaki may be instantiated and configured to develop and manage at run time an autonomous unmanned aerial vehicle designed for resource delivery missions. The unique requirements and platform-dependent components of each system illustrate the domain-agnostic properties of the proposed framework. By default, the obstacle detectors exhibit a reasonable degree of accuracy on known validation data. However, uncertainties arise when new adverse phenomena are considered (e.g., lighting changes, occluded visibility). This article demonstrates how aggregate services within the Anunnaki framework can be leveraged to reconfigure learning-enabled autonomous systems and mitigate the use of their object detectors under conditions deemed untrustworthy at run time. Through the use of independent services to assess trustworthiness and enact changes in system behavior, the Anunnaki framework provides a modular, composable, and reusable approach to addressing robustness and resilience to uncertainty in AI-based systems.
The remainder of this article is organized as follows. Section 2 reviews background topics and enabling technologies applicable to this work. Section 3 overviews the Anunnaki architecture and describes the aggregate elements. Section 4 demonstrates two use cases of Anunnaki on distinct autonomous platforms, one terrestrial and the other aerial. Section 5 discusses key findings from the two demonstrations. Section 6 reviews work related to the Anunnaki framework. Finally, Section 7 summarizes this work and discusses future plans to extend this work.

2 Background

This section overviews background topics for this work, including assurance concerns with deep learning systems, robotic control software, goal models, the RELAX [86] requirements specification language, and self-adaptive frameworks.

2.1 Uncertainties in Deep Learning

For DNNs, uncertainty can arise in both data acquisition and model construction [27]. Uncertainties with respect to the dataset can be due to variability in real-world environments or measurement error/noise. Uncertainties with respect to the model can be due to errors in the model structure, errors in the training procedure, or errors caused by unknown data. Two common trust issues to address when implementing Trusted AI with respect to uncertainty are reliability and robustness [88].
For this work, reliability relates to a DNN’s performance under routine (or known) operating conditions, where a DNN’s output is expected to be consistent with ground truth results [28]. Typical test procedures for DNNs include a cross-validation step with test data that is independent and identically distributed with respect to training data. Evaluation metrics are task-dependent. For regression problems, metrics such as mean squared error measure the deviation between a DNN’s output and the ground truth. For classification problems, metrics such as cross entropy and accuracy can be used for evaluation. Furthermore, for object detection problems, where a distinction between false positives and false negatives is of interest, additional metrics such as precision and recall can be considered. Depending on the evaluation metrics chosen, a reliable AI system is expected to correctly interpolate and produce results that are consistent with known training data.
Adversarial detection techniques address uncertainty with respect to the reliability of a DNN [43]. Model inference enables the creation of behavior models to map specific run-time conditions of a software component to corresponding patterns of expected behavior [53]. When leveraged at run time, behavior models can be used to proactively mitigate failures resulting from the use of a DNN expected to fail [38]. Out-of-distribution techniques can also assess a degree of confidence by comparing a DNN’s run-time inputs to its training data distribution [27]. Run-time inputs that are found to fall outside of a DNN’s training distribution can be marked as uncertain and trigger alternative actions to prevent use of the DNN in a potentially hazardous state. Thus, adversarial detection techniques help to ensure DNNs are used only in contexts (sufficiently) comparable to those that have been previously validated.
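To make the out-of-distribution idea concrete, the following is a minimal sketch (not the specific methods cited above) that flags inputs whose features lie far from the training distribution; the feature representation, the Mahalanobis-distance score, and the threshold value are all illustrative assumptions.

```python
import numpy as np

def fit_training_statistics(train_features):
    """Estimate the mean and (pseudo-)inverse covariance of
    training-set features (one row per sample)."""
    mu = train_features.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(train_features, rowvar=False))
    return mu, cov_inv

def is_out_of_distribution(features, mu, cov_inv, threshold=3.0):
    """Flag an input as uncertain when its Mahalanobis distance to the
    training distribution exceeds a calibrated threshold (assumed here
    to be 3.0), triggering an alternative action instead of the DNN."""
    diff = features - mu
    distance = float(np.sqrt(diff.dot(cov_inv).dot(diff)))
    return distance > threshold
```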
Robustness relates to a DNN’s performance in the presence of previously unseen (or unknown) operating conditions, where the DNN’s output is expected to be consistent with sufficiently similar known conditions. Deviant conditions may cover any perturbation in an AI system’s input data, whether malicious interference (e.g., jamming or image-spoofing) or inadvertent interference (e.g., environmental phenomena such as rainfall or fog). The discovery of adversarial examples [78] has demonstrated that DNNs can be highly sensitive to human-imperceptible noise and exploited into producing erroneous outputs by targeted data manipulation. Furthermore, the tendency of DNNs to latch onto superficial regularities in training data casts doubt on their ability to generalize to semantically valid abstractions [30]. Robust AI systems are expected to extrapolate and produce correct results for data that is reasonably different from training data.
To address the robustification of DNNs, techniques have been proposed to automatically generate synthetic data for retraining DNNs when real examples of adverse interference are absent (i.e., known unknown phenomena) [44, 80, 90, 94]. Typically, synthetic data of simulated interference is generated by transforming existing real-world data. Naïve techniques generate interference by adding random perturbations to given inputs (i.e., fuzzing) [57]. More sophisticated techniques use search-based methods to uncover interference patterns that maximize certain aspects of the AI system (e.g., neuron coverage, Kullback–Leibler divergence). Retraining DNNs with synthetically augmented data has been demonstrated to improve robustness [38].
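As a minimal sketch of the naïve fuzzing-style augmentation described above (the function names and noise model are illustrative; search-based tools such as Enki replace the random sampling with guided exploration):

```python
import numpy as np

def fuzz_image(image, noise_scale=0.05):
    """Naive augmentation: add bounded random perturbations to an
    image whose pixel values are assumed normalized to [0, 1]."""
    noise = np.random.uniform(-noise_scale, noise_scale, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

def augment_dataset(images, labels, per_image=3):
    """Pair each real sample with several perturbed variants so the
    DNN can be retrained on synthetically adverse data."""
    return [(fuzz_image(img), lbl)
            for img, lbl in zip(images, labels)
            for _ in range(per_image)]
```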

2.2 Domain Adaptation

A key challenge addressed by this work is how to enable AI systems to operate across distinct domains. In this work, we consider a domain to be a subset of model inputs that share common attributes or characteristics and serve as a means to categorize training samples. A domain D is typically defined by three components: a feature (input) space X, a label (output) space Y, and a probability distribution \(p(x,y)\) [22]. The concept of domain adaptation was introduced by Ben-David et al. [6] in the context of natural language processing, with the goal of effectively reusing an existing language model to identify malicious e-mail for a wide range of users. Domain adaptation is needed when a neural network trained to perform a task on a dataset \(D_s\) (the source domain) must perform the same task on a dataset \(D_t\) (the target domain) [6, 7]. More recent efforts have adopted domain adaptation as a technique for improving object classification and detection models [82]. Domain adaptation (DA) is closely related to transfer learning [84], but rather than reusing a learned model on a different task, DA reuses a learned model in a different, but closely related, domain. We assume that label and feature spaces remain consistent across domains. The characteristics of each domain may be described at various levels of abstraction. For an image processing model, we can describe a set of unique domains based on environmental conditions (e.g., rainy, low-light), levels of noise (e.g., peak signal-to-noise ratio metrics), or resulting model behavior. It is important to note that the proposed framework is not limited to the provided examples.
Multi-domain training tools such as Enki [38] aim to increase model robustness for a given target domain \(D_t\) by generating synthetic data belonging to \(D_t\) and retraining an existing model in this new domain. Enki inputs environmental conditions (e.g., the rain domain) and corresponding contexts (e.g., raindrop positions, appearance) and then uses an evolutionary algorithm to generate a diverse archive of environmental contexts, where diversity is defined with respect to system behavior. Enki can then be used to (i) assess the robustness of a DNN in a given domain, and (ii) robustify the DNN by retraining on the previously generated archive of diverse synthetic data.
Domain detection models such as the behavior oracles generated by Enlil [36] map incoming data samples \(x_i\) to a corresponding target domain \(D_t\). Enlil takes environmental conditions, corresponding contexts, and behavior category specifications as input for an evolutionary algorithm to generate diverse archives of environmental contexts for each behavior category. The generated archives can then be used to (i) assess model robustness for a given domain and then (ii) train a behavior oracle that predicts the behavior category of an incoming data sample. The behavior oracle can inform an autonomous system of its current operating context (domain) and enable more informed adaptation decisions (e.g., ensuring the applicability [and utility] of a given LEC for a given operating context).

2.3 Service-oriented Architecture for Robot Control

In order to manage heterogeneous hardware and enable software reuse in robotic applications, many developers implement the control logic of sensors and actuators as components of a robot middleware [19]. The Robot Operating System (ROS) [64] is an open source robot middleware that has been widely adopted by both academia and industry [35]. The fundamental elements of a ROS-based system are nodes, topics, and services. ROS enables the controlling algorithms for a single application to be divided into multiple independent processes (i.e., ROS nodes). ROS nodes can publish/subscribe to data unidirectionally through message buses (i.e., ROS topics) and also handle bidirectional request/reply interactions (i.e., ROS services). As a peer-to-peer network of nodes, a ROS-based system can be implemented over multiple processing units with a common registry service to facilitate communication between nodes (illustrated in Figure 1). Ultimately, ROS enables developers to abstract away from individual robotic components to focus on their software architecture.
Fig. 1.
Fig. 1. Typical ROS configuration [64]. Software for a ROS-based system executes ROS nodes over multiple onboard and offboard processors that communicate over a wireless bridge.
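For readers unfamiliar with ROS, the following is a minimal sketch of a ROS node (written for rospy, as used by ROS Melodic) that subscribes to one topic and publishes on another; the node and topic names are illustrative, not part of Anunnaki:

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def callback(msg, publisher):
    # React to each incoming message by republishing a transformed copy.
    publisher.publish(String(data=msg.data.upper()))

if __name__ == "__main__":
    rospy.init_node("example_node")  # register with the ROS master
    pub = rospy.Publisher("/example/out", String, queue_size=10)
    rospy.Subscriber("/example/in", String, callback, callback_args=pub)
    rospy.spin()  # process messages until shutdown
```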
In order to facilitate the systematic development of ROS-based systems, Malavolta et al. [45] empirically identified a set of guidelines (by mining ROS projects and surveying experts in the field) to support developers in applying good design principles to meet quality requirements and mitigate common ROS-specific software problems. Importantly, ROS-based systems that follow these guidelines align well with software engineering principles such as modularity, composability, and reusability.

2.4 Goal-based Modeling

Early in the development process, requirements engineers must identify and specify the needs and constraints of the target system to be built. Requirements engineers work closely with stakeholders to identify their goals and objectives in order to create a set of requirements that orient and guide the development process, ensuring the system being built will meet the needs of relevant parties. Stakeholder goals are often qualitative in nature (e.g., the system operates safely) and difficult to formalize for automatic satisfaction guarantees. Furthermore, specifying requirements becomes increasingly difficult when working with cyberphysical systems because of the inherent uncertainty present in unknown environments [12].
Goal-oriented requirements engineering techniques such as KAOS [81] have emerged in response to the aforementioned challenges to enable rigorous requirements specification. KAOS provides a goal-based approach to model system objectives and hierarchically decompose high-level goals into leaf-level requirements of the system [36], represented by a directed acyclic graph. Figure 2(a) presents a graphical depiction of objects in the KAOS goal model ecosystem. KAOS goals are declarative statements describing objectives that the system under consideration should achieve [41]. Goals may exist at various levels of abstraction, and the decomposition of high-level goals into lower-level sub-goals is depicted with refinement arrows. As shown in Figure 2(b), KAOS supports two types of refinement, AND-refinement and OR-refinement. A parent goal can only be satisfied if the Boolean conjunction of its children evaluates to true in the case of AND-refinement, or if the Boolean disjunction of its children evaluates to true in the case of OR-refinement. KAOS also supports obstacles, defined as any behavior or goal that prevents the satisfaction of another goal [41]. Obstacles may be resolved by including resolution goals that provide alternative ways a blocked goal may be achieved given the presence of an obstacle. Previously, we extended KAOS goal modeling to include utility functions [13] (i.e., functions that map system attributes to Boolean or real scalar values [36]), which are attached to goals to quantify their satisfaction and thereby enable design-time and run-time assessment of system behavior with respect to requirements satisfaction [65]. Finally, KAOS supports agents, represented by white hexagons and defined as entities responsible for achieving system requirements and overcoming obstacles. Both the human and non-human components of a system can be represented as agents in KAOS. The combination of these entities into a logical structure enables developers to refine high-level goals into low-level requirements and explicitly define the intended behavior and functionality of the system to be built.
Fig. 2.
Fig. 2. Overview of KAOS Goal Model Notation 2(a) and KAOS refinement types 2(b).

2.5 Uncertainty-aware Requirements Specification Languages

One challenge that arises when specifying the requirements of an autonomous system is how to strategically address environmental uncertainty. If requirements are too rigid, then the system may unnecessarily reconfigure and/or enter a failure mode, thereby preventing successful mission completion. To this end, requirements specification languages such as RELAX [86] and FLAGS [4] have been proposed to explicitly account for various sources of uncertainty by adding flexibility to system requirements. Developers can use the RELAX language to formally define and extend existing goal models to account for sources of uncertainty by “relaxing” system requirements through a set of RELAX operators. An overview of the RELAX language and its corresponding definitions are presented in Table 1. The semantics of the RELAX language have been specified in terms of a set of fuzzy logic propositions. Correspondingly, RELAX-ed requirements can be annotated with fuzzy-logic based utility functions in the KAOS goal model [65]. While KAOS obstacles are useful for identifying what factors may cause a goal to become violated, RELAX enables a developer to specify the tolerable impact of uncertainty on requirements satisficement, rather than identifying the specific causes/sources of uncertainty.
Modal operators:
SHALL: A requirement must hold.
MAY...OR: A requirement specifies one or more alternatives.

Temporal operators:
EVENTUALLY: A requirement must hold eventually.
UNTIL: A requirement must hold until a future position.
BEFORE/AFTER: A requirement must hold before or after a particular event.
AS EARLY AS POSSIBLE: A requirement specifies something that should hold as soon as possible.
AS LATE AS POSSIBLE: A requirement specifies something that should be delayed as long as possible.
AS CLOSE AS POSSIBLE TO [frequency t]: A requirement specifies something that happens repeatedly, though the frequency may be relaxed.

Ordinal operators:
AS FEW/MANY AS POSSIBLE: A requirement specifies a countable quantity, though the exact count may be relaxed.
AS CLOSE AS POSSIBLE TO [quantity q]: A requirement specifies a countable quantity, though the exact count may be relaxed.

Table 1. Overview of RELAX Vocabulary [66, 86]

2.6 Self-managing Systems

The concept of autonomic computing has become more commonly used with increasing system complexity in deployed software systems that must operate continuously, even under uncertain conditions [34]. Systems comprising numerous interconnected components can be difficult to configure and maintain. Autonomic computing proposes that such systems should manage themselves according to high-level objectives provided by system administrators [34]. These self-managing systems commonly use a feedback controller (i.e., an autonomic manager) to observe and adapt managed components of the larger system [8]. Figure 3 illustrates a common realization of an autonomic manager called the Monitor-Analyze-Plan-Execute over a Knowledge base (MAPE-K) loop [34]. A MAPE-K loop comprises steps to monitor system components, analyze the system state, plan what adaptive actions need to be taken to maintain optimal performance, and execute the plan to realize the corresponding system reconfiguration. Adaptation tactics are methods for realizing adaptations [14]; each tactic has preconditions, postconditions, and a set of actions [14]. A shared knowledge base acts as a repository for any data that can inform each MAPE-K step (e.g., adaptation goals, tactics). For autonomous systems, a MAPE-K controller can automate system adaptations to achieve optimal performance in response to changing environments.
Fig. 3.
Fig. 3. High-level depiction of a MAPE-K autonomic manager to monitor, analyze, plan, and execute reconfigurations of managed components.
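A minimal, framework-agnostic sketch of a MAPE-K loop follows; this illustrates the general pattern rather than Utu's implementation, and the goal and tactic interfaces are assumptions:

```python
class MapeK:
    """Illustrative MAPE-K controller: every step reads from and
    writes to a shared knowledge base."""

    def __init__(self, knowledge):
        self.k = knowledge  # holds goals, tactics, and latest observations

    def monitor(self, sensors):
        # Monitor: sample the managed components' state.
        self.k["state"] = {name: read() for name, read in sensors.items()}

    def analyze(self):
        # Analyze: find goals the observed state does not satisfy.
        return [g for g in self.k["goals"] if not g.satisfied(self.k["state"])]

    def plan(self, violations):
        # Plan: pick the first tactic whose precondition matches.
        return next((t for t in self.k["tactics"] if t.applies(violations)), None)

    def execute(self, tactic):
        # Execute: reconfigure the managed components.
        if tactic is not None:
            tactic.run()

    def step(self, sensors):
        self.monitor(sensors)
        self.execute(self.plan(self.analyze()))
```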

3 Methodology

This section provides a high-level overview of the Anunnaki framework. The services coordinated by the Anunnaki framework collectively manage the operation of LECs in the presence of uncertain conditions in order to mitigate faults resulting from their use in untrusted conditions. Figure 4 depicts the major processes within the Anunnaki framework with a data flow diagram (DFD), where processes are depicted as interconnected circles. Rectangles depict systems external to the Anunnaki framework. Labeled arrows show data flow between processes, and persistent data stores are shown within parallel lines. Each process shown in Figure 4 is a separate service executed in parallel with and independently of the managed AI system. After introducing our terrestrial demonstration platform that is used as a running example, the remainder of this section describes each of these processes.
Fig. 4.
Fig. 4. A high-level DFD of the Anunnaki framework, comprising internal processes (circles) and external systems (rectangles). Labeled arrows show data transmitted between processes, with persistent data stores bound by parallel lines.
Terrestrial Demonstration Platform. As a demonstration platform, we consider an autonomous rover as shown in Figure 5, which comprises subsystems for navigation (i.e., obstacle detection and avoidance), pedestrian communication (i.e., via light and sound signals), and remote monitoring/control. The rover’s subsystems are supported by sensors that include a forward-facing camera, an ultrasonic range finder, and a touch-sensitive bumper. The rover can be controlled either autonomously or manually by a remote operator. When operating autonomously, the rover uses both an ultrasonic range finder and a vision-based object detector to detect obstacles and avoid collisions. In this instance, the entire rover is considered as an LES, while the onboard camera and associated object detection models constitute individual LECs.
Fig. 5.
Fig. 5. For demonstration, an autonomous rover has been assembled to explore deep learning on embedded systems. Sensors include a camera and an ultrasonic range finder.

3.1 Goal Modeling

Anunnaki requires goal models described in the KAOS [81] format to specify the expected system requirements. This section describes the development and run-time monitoring of KAOS goal models to address the robustness and resilience of LESs using the Anunnaki framework and its aggregate services.

3.1.1 Constructing Goal Models.

Our running example is an autonomous rover equipped with a vision-based object detector that must navigate (i.e., detect and avoid obstacles) through an environment to fulfill a mission objective where safety is a top priority. In order to produce a set of requirements outlining the intended functionality of the system, many aspects of the system need to be considered (e.g., mission objective, operating domain, sources of uncertainty). For example, when the rover is operating autonomously, we want to ensure the system can detect and warn nearby pedestrians. We may also want the rover to detect when such capabilities become degraded in order to trigger a fail-safe mechanism. These requirements can be explicitly defined via KAOS goal models. A corresponding KAOS goal model is shown in Figure 6, comprising system objectives for the managed learning-enabled rover. Blue parallelograms represent system goals (e.g., G12: “Rover warns nearby pedestrians.”). Any potential hazards or obstacles that could prevent the satisfaction of a goal are shown as red parallelograms (e.g., O1: “Object detector is degraded/compromised.”). The Anunnaki framework includes a microservice that analyzes the utility functions attached to each goal. At the leaf level, agents are shown as white hexagons to indicate which system components are responsible for achieving associated goals (e.g., A1: controller, A2: camera, A3: ultrasonic sensor).
Fig. 6.
Fig. 6. An example KAOS goal model to graphically depict system requirements of a robot rover as a hierarchy of logically interconnected goals. Blue parallelograms represent system goals and red parallelograms represent potential obstacles to the satisfaction of goals. White hexagons represent system components responsible for achieving leaf-level goals. Agents can be associated with specific message topics to inform the Utu monitor process. Yellow ellipses represent utility functions when attached to parallelograms and message topics when attached to hexagons.

3.1.2 RELAX-ing Requirements.

In order to account for the potential impact of environmental uncertainty and increase system flexibility at run time, we extended Anunnaki to support goal models that include RELAX [86] goals. Consider goal G6 in Figure 7(a), which specifies a minimum allowed sample rate for the rover’s onboard ultrasonic sensor. At run time, several external and internal conditions may impact the update frequency of the ultrasonic sensor. In practice, our system should be flexible to certain variations in operating conditions within some acceptable range to avoid unnecessary fail-safe procedures. Figure 7(b) presents goal G6 after the requirement has been “relaxed”. Originally, the associated utility function would return 0 or 1, indicating whether the sensor update frequency f was above an acceptable value (\(f \ge 5.0\)). The RELAX-ed requirement instead returns 0.0 when \(f \le 4.5\), 1.0 when \(f \ge 5.0\), and \(\frac{(f-4.5)}{0.5}\) when \(f \in (4.5,5.0)\) (Figure 7(c)). This RELAXation provides greater system flexibility to sources of uncertainty by increasing the fidelity of observable system properties.
Fig. 7.
Fig. 7. Requirements for the rover’s ultrasonic sensor (goal G6) have been “relaxed” to account for uncertainty resulting from variable external sensor throughput. This increased flexibility prevents unnecessary fail-safe adaptation procedures while maintaining safety assurance.
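Expressed as code, the strict and RELAX-ed utility functions for G6 might look as follows (a minimal sketch mirroring the piecewise definition above; the function names are illustrative):

```python
def utility_g6_strict(f):
    """Original Boolean requirement: sensor updates at >= 5.0 Hz."""
    return 1.0 if f >= 5.0 else 0.0

def utility_g6_relaxed(f):
    """RELAX-ed requirement: fully satisfied at >= 5.0 Hz, fully
    violated at <= 4.5 Hz, with a linear fuzzy ramp in between."""
    if f >= 5.0:
        return 1.0
    if f <= 4.5:
        return 0.0
    return (f - 4.5) / 0.5
```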

3.2 Resiliency Through Predictive Behavior

In this work, we consider a system to be resilient if it can mitigate different sources of uncertainty to maintain safe behavior [40]. To this end, the Anunnaki framework leverages model inference and behavior models of an LEC to support domain detection (Figure 4, Steps 1 and 2). Adverse interference can include any malicious noise or environmental phenomena that result in undesirable behavior from an LEC. Behavior models are used to predict the impact of adverse conditions absent from existing training/validation data, thus enabling the Anunnaki framework to prevent the use of LECs under conditions in which they would normally perform unreliably (e.g., poor lighting conditions). As an abstract service, the Domain Detection service (Figure 4, Step 1) can implement any behavior modeling technique that takes raw sensor data and detects the presence of adverse interference.
One example model inference method for domain detection is Enlil [39], which constructs behavior models of an LEC by assessing the impact of various environmental phenomena within an external simulator. Enlil generates a behavior model that can be executed independently of the LEC as a behavior oracle. One or more behavior oracles can run in parallel to the managed AI system and subscribe to the same sensor data received by managed LECs. As sensor data is received, behavior oracles output behavior assessments (Figure 4, Step 1), which include both a perceived context for any apparent adversarial noise and an inferred behavior category to summarize the impact of the adversarial noise. As adversarial detection services, behavior oracles publish behavior assessments to any other subscribing service, thus enabling the Anunnaki framework to detect and respond to adverse run-time conditions.
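A minimal sketch of how a behavior oracle might run as an independent ROS service is shown below. The /utu/oracle/output topic and /adv_detector node name appear later in Section 4; the camera topic, message types, and the stand-in oracle are illustrative assumptions (a deployed oracle would also publish the perceived context, not only a category):

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import Int32

class StandInOracle:
    """Placeholder for a behavior model trained by a tool like Enlil."""
    def predict(self, image_msg):
        return 0  # 0 = no adverse impact (always, in this stub)

class OracleNode:
    def __init__(self, oracle):
        self.oracle = oracle
        self.pub = rospy.Publisher("/utu/oracle/output", Int32, queue_size=1)
        # Subscribe to the same sensor feed as the managed LEC.
        rospy.Subscriber("/camera/image_raw", Image, self.on_image)

    def on_image(self, msg):
        # Publish the inferred behavior category (e.g., 0 = unaffected,
        # 1 = degraded, 2 = compromised) for subscribing services.
        self.pub.publish(Int32(data=self.oracle.predict(msg)))

if __name__ == "__main__":
    rospy.init_node("adv_detector")
    OracleNode(StandInOracle())
    rospy.spin()
```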

3.3 Robustifying Learning Models

To address the robustness of LECs, the Anunnaki framework can use robustified alternative learning models, created through any multi-domain training technique, such as adversarial training or retraining on data synthetically augmented to include adverse phenomena (e.g., rain, fog). For example, Enki is a method proposed for robustifying LECs to known unknown adverse environmental phenomena [38]. Using Enki, robust learning models are generated by running a simulator to uncover examples of adverse phenomena that lead to a diverse array of behavior patterns for the given LEC. The diverse collection of adversarial examples is then used to retrain the default learning model [38].
At run time, a Learning Model Manager service (Figure 4, Step 2) enables the managed AI system’s LECs to swap default learning models with alternative, robustified learning models created through adversarial training (e.g., Enki). This service-oriented approach enables separate learning models to be robustified with respect to specific forms of adverse interference, and applicable learning models can be swapped in based on the behavior oracle’s assessment of run-time contexts. When no adverse interference is detected, the default learning model is activated. By decoupling the problem of robustification from a single learning model into separate, independent learning models, the Anunnaki framework gives developers more flexibility in deciding which forms of adverse phenomena are addressed by any given implementation of the managed AI system. Furthermore, this approach enables developers to maintain and augment specific context-dependent models without needing to retrain and validate the base learning model. For example, if rainy environments are a concern for an LEC, an additional robustified learning model can be provided to the Anunnaki framework at run time to handle rain without needing to retrain/validate the default learning model. Furthermore, additional robustified models can be created for alternative phenomena (e.g., foggy weather, poor lighting) that are also independent from each other and the default learning model. Thus, the Anunnaki framework provides a modular and composable solution to robustifying LECs.
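The swapping logic can be sketched as follows; this illustrates the Learning Model Manager's role rather than its actual implementation, and the domain keys and model interface are assumptions:

```python
class LearningModelManager:
    """Keep one robustified model per adverse domain and activate the
    one matching the currently detected domain."""

    def __init__(self, default_model, robust_models):
        self.default = default_model
        self.robust = robust_models  # e.g., {"rain": rain_dnn, "hsl": hsl_dnn}
        self.active = default_model

    def on_domain_detected(self, domain):
        # Swap in the matching robustified model; fall back to the
        # validated default when no adverse domain is detected.
        self.active = self.robust.get(domain, self.default)

    def infer(self, sensor_input):
        return self.active(sensor_input)
```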

3.4 Run-time Monitoring and Management

To monitor and control the managed AI system, the Anunnaki framework uses Utu to monitor, analyze, and reconfigure the use of LECs in response to uncertain environmental conditions (Figure 4, Step 3). In order to mitigate faults from the use of LECs in untrusted conditions, Utu assesses the run-time state of the managed AI system and issues reconfiguration requests in response to the run-time environment. Namely, Utu follows the MAPE-K model for autonomic management, comprising five separate services to support run-time decision-making; see Figure 8 for a high-level description of each of these services. The remainder of this section describes how Anunnaki uses the Utu services to ensure run-time goal satisfaction.
Fig. 8.
Fig. 8. An overview of the five separate services of the MAPE-K loop that make up Utu.
Run-time Goal Monitoring. Utu takes as input a KAOS goal model (Figure 8, Knowledge Manager Service) and analyzes the utility functions [16] associated with each goal and obstacle as logic propositions [70] to support run-time monitoring of goal model satisfaction (Figure 8, Monitor Service). For example, the utility function “A1.buzzer == true” is attached to goal G14. Thus, when the “buzzer” attribute of agent A1 is set to true, goal G14 is evaluated as satisfied. Through the use of utility functions, the Anunnaki framework can interpret a KAOS goal model as a logic tree of run-time system checks to determine the satisfaction of high-level system objectives (Figure 8, Analyze Service). For example, Figure 9 shows a logic tree interpretation of the KAOS goal model in Figure 6. The Anunnaki framework also extends goal models by enabling message channels to be associated with each agent to specify the channels on which each agent publishes state data. For example, the message channel “/utu/oracle/output” is attached to agent A4, indicating that attributes for the behavior oracle can be monitored by observing the corresponding message channel. These extensions enable developers to map the same goal model to different platforms by simply redefining the associated message channels and system attributes.
Fig. 9.
Fig. 9. A logic tree representation of the KAOS goal model in Figure 6. The Anunnaki framework automatically parses and interprets goal models as logic trees of utility functions for run-time evaluation of goal satisfaction.
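To illustrate, a goal tree such as the one in Figure 9 can be evaluated recursively (a minimal sketch; the dictionary-based goal representation is an illustrative assumption):

```python
def evaluate(goal, state):
    """Evaluate a KAOS goal tree bottom-up: leaf goals apply their
    utility functions to monitored attributes; AND/OR refinements
    combine the results of their children."""
    if goal["type"] == "leaf":
        return goal["utility"](state)
    results = [evaluate(child, state) for child in goal["children"]]
    return all(results) if goal["type"] == "and" else any(results)

# Example: goal G14 is satisfied when agent A1's buzzer is on.
g14 = {"type": "leaf", "utility": lambda s: s["A1.buzzer"] is True}
root = {"type": "and", "children": [g14]}
print(evaluate(root, {"A1.buzzer": True}))  # True -> goal model satisfied
```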
Run-time Mitigation. Utu also takes a predefined set of tactics to determine what actions should be taken to mitigate faults resulting from violated goals [14] (Figure 8, Plan Service). Because system objectives and tactics are not hard-coded into Utu, but instead are model-driven, the Anunnaki framework can be deployed with alternative goals and adaptation tactics by simply re-instantiating Utu at design time with new goal models and tactics. Figure 10 shows an example adaptation tactic [14], specified in Extensible Markup Language (XML) format. Tactics are defined with a set of preconditions, actions, and postconditions. In the given example, a “fail-safe” tactic is defined with a precondition to trigger when G3 in Figure 6 is found to be unsatisfied. For the example fail-safe tactic, the actions are to (1) request a mode-change to “manual” mode for the rover and (2) e-mail a notification to the user. Finally, a postcondition is given in the example to state that goal G3 is expected to be satisfied upon execution of the given actions. Thus, when a reconfiguration is needed due to goal violations, Utu realizes the specific actions defined by the corresponding adaptation tactic (Figure 8, Execute Service) to ensure continued goal satisfaction at run time.
Fig. 10.
Fig. 10. Example adaptation tactic. This “fail-safe” tactic triggers when precondition goal G3 (from Figure 6) is unsatisfied. Actions include a request to switch the managed system to “manual” mode and to notify the user. The postcondition states that goal G3 is expected to be satisfied upon completion.
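Since Figure 10 is not reproduced here, the following is an illustrative reconstruction of how such a tactic might be encoded in XML; the element and attribute names are assumptions based on the description above:

```xml
<!-- Hypothetical encoding of the fail-safe tactic described in Figure 10. -->
<tactic name="fail-safe">
  <preconditions>
    <condition goal="G3" satisfied="false"/>  <!-- trigger: G3 violated -->
  </preconditions>
  <actions>
    <action type="mode_change" value="manual"/>  <!-- switch rover mode -->
    <action type="notify" method="email" target="user"/>
  </actions>
  <postconditions>
    <condition goal="G3" satisfied="true"/>  <!-- expected after execution -->
  </postconditions>
</tactic>
```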

4 Demonstration

To demonstrate the use of the Anunnaki framework and the Utu autonomic manager to develop Trusted AI, we have implemented two autonomous cyberphysical systems, one terrestrial and one aerial, both of which operate in environments with uncertain run-time conditions. This section describes the implementation of these systems, the potential impact of known unknowns on their LECs, and how the Anunnaki framework may be used to mitigate faults from using an LEC in the presence of adverse conditions. Our demonstration addresses the following research questions:
RQ1:
Is it possible to use a modular approach to support the automated assessment and improvement of the robustness and resilience of LESs?
RQ2:
Is the Anunnaki framework data and model agnostic?

4.1 Autonomous Rover Case Study

This section describes how the Anunnaki framework has been implemented for the autonomous rover presented in Section 3. First, we describe the hardware and software used in the rover case study. Next, we present results obtained during the implementation, instantiation, and execution of the Anunnaki framework (see Figure 4) and how they pertain to increased robustness to environmental uncertainty. Finally, we outline the implementation of the Utu autonomic manager and how the aggregate components of Anunnaki are integrated to provide a robust and resilient learning-enabled autonomous system.

4.1.1 Implementation of Autonomous Rover.

For demonstration purposes, a robotic rover has been assembled with a suite of sensors and actuators to enable autonomous behavior. As photographed in Figure 5, the rover measures approximately \(30.5 \times 20.5 \times 22.0\) centimeters. The rover includes an NVIDIA Jetson Nano processor to support efficient onboard deep learning computations [23]. Control software for the rover is implemented using the Melodic [71] distribution of ROS packages.
In autonomous mode, the rover relies on computer vision to identify the types of obstacles present in its environment. The rover’s vision-based object detector is implemented as a RetinaNet [42] DNN, using PyTorch [62] deep learning libraries. The object detector has been trained to detect objects from two-dimensional images taken from the rover’s forward-facing camera. For each object detection, both a category label and bounding box are given to identify the type of object and what region of the image it covers.
To train and validate the object detector, 2,500 labeled images were manually collected from the rover’s onboard camera, using replica objects of humans and deer scattered in the operating environment. Two thousand images were reserved for training and 500 for validation only. The object detector was trained until its training error converged to a minimum (after 25 epochs). When evaluated against the reserved validation images, the object detector was found to correctly detect images of humans and deer with a precision of \(98.8\%\), a recall of \(94.8\%\), and an F-score of 96.8%.
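For reference, the reported F-score follows as the harmonic mean of precision and recall: \(F = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.988 \times 0.948}{0.988 + 0.948} \approx 0.968\).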
Despite the promising results when testing the object detector with validation images, uncertainty remains with respect to the robustness and reliability of the object detector in the presence of phenomena missing from both training and validation images. For example, Figure 11 shows examples of the object detector’s performance in a variety of lighting conditions. In Figure 11(a), the impact of dimmed lighting is shown. As light intensity decreases from Figure 11(a)i. to 11(a)iii., the object detector’s ability to detect obstacles diminishes. However, the exact threshold and conditions at which this degradation occurs are unknown. Similarly, in Figure 11(b), the object detector’s ability is degraded as a bright light source is introduced into the scene, either behind the camera (Figure 11(b)ii.) or behind the obstacles (Figure 11(b)iii.). Though the object detector has been observed to have high precision and recall under known conditions, it remains unclear how it will perform in these known unknown conditions.
Fig. 11.
Fig. 11. Examples of real adverse phenomena for vision-based object detection. Objects are correctly identified in normal lighting ((a)i. and (b)i.). Detection is degraded in dim lighting ((a)ii. and (a)iii.). Detection is also degraded when a light is placed behind the camera ((b)ii.) or behind the objects ((b)iii.). The boundary leading to degraded performance is unknown.

4.1.2 Creating Behavior Oracles for Autonomous Rover.

Because the threshold between acceptable and unacceptable environmental conditions (Figure 11(a)i. and 11(a)ii., respectively) is unknown for the rover’s object detector, we need a method to determine when resulting object detections can be trusted. The Anunnaki framework can leverage Enlil to create a behavior oracle to determine this threshold (Figure 4, Step 1). Enlil creates an oracle by automatically assessing the object detector’s performance boundaries under simulated environmental conditions. For example, Enlil can automatically assess the object detector’s performance under a range of hue, saturation, and lightness (HSL) conditions and create a behavior oracle to predict the object detector’s performance under any given HSL context. When additional known unknown phenomena are discovered (e.g., a raindrop occluding the camera’s view), additional behavior oracles can be generated to predict how the rover’s object detector will be impacted by each respective phenomenon.
The scatter plot in Figure 12(a) shows Enlil’s automated behavior assessments under a range of HSL contexts, with each point corresponding to a different context. Green points represent cases in which the object detector’s performance was not impacted (i.e., less than a 5% decrease in the default object detector’s F-score). Yellow points represent cases in which the object detector’s performance is degraded (i.e., a 5-10% decrease in F-score). Red points represent cases in which the object detector’s performance is compromised (i.e., more than a 10% decrease in F-score). From these results, Enlil can generate a behavior oracle that correctly predicts the behavior of the object detector under any HSL context with 83% accuracy. Similarly, Figure 12(b) shows Enlil’s assessments of the object detector’s performance when its view has been occluded by raindrops placed on the camera lens, where raindrop_x and raindrop_y represent the (center) position of a raindrop within an image, and raindrop_radius represents the size of the raindrop. Enlil can generate a behavior oracle that correctly predicts the impact of a raindrop occluding the view of the rover’s object detector with 87% accuracy. The Anunnaki framework can leverage these behavior oracles to prevent the rover from relying on its object detector under environmental conditions in which it is expected to fail.
Fig. 12.
Fig. 12. Scatter plots of Enlil’s automatic assessment of an object detector’s response to HSL variations (a) and raindrop occlusion (b). Points represent unique contexts of the respective phenomena, including acceptable (green), degraded (yellow), and fully compromised (red) conditions. With this data, Enlil creates behavior oracles for each respective phenomenon.

4.1.3 Creating Robustified Learning Models for Autonomous Rover.

Instead of updating the rover’s object detector to be robust to all environmental conditions, our approach is to create a range of context-dependent operational modes, with separate DNNs robustified for each respective known unknown phenomenon. The Anunnaki framework uses Enki to create these robustified DNNs. When exposing the default object detector to a random sampling of HSL variations, we found that its F-score decreased from 96.8% to only 2%. This significant decrease demonstrates that the object detector is not sufficiently robust to different lighting conditions. However, using Enki to generate diverse synthetic data, we were able to retrain the default learning model to create a robustified version of the object detector’s DNN that achieves an F-score of 60.7% under random HSL variations. Similarly, we found that the default object detector was not very robust to raindrop occlusion, observing that its F-score decreased from 96.8% to 5% when evaluated with a random sampling of occluding raindrops. Using Enki, we were able to train and create a separate DNN more robust to raindrops, with an F-score of 87% for random raindrops. Under both environmental contexts (i.e., HSL variations and raindrop occlusions), we observe a significant decrease in F-scores when evaluating the default object detector. This decrease in performance can be explained by an increase in false negative predictions. Namely, as environmental conditions distort sensor inputs, the DNN fails to detect objects in the scene because it was never exposed to the observed variations during training. By using separate DNNs that target each respective environmental phenomenon, the integrity of the default object detector is preserved (i.e., it is not influenced by Enki or any synthetic data). However, if an adverse condition is uncovered at run time and the object detector’s default DNN is expected to fail, then the corresponding robustified DNN created by Enki can be used in place of the default DNN. Switching in different learning models to handle changing environmental conditions is analogous to the mode changes commonly used in adaptive automotive systems and traditional transportation systems [69].

4.1.4 Implementing Anunnaki Services for Autonomous Rover.

The Anunnaki framework has been implemented with ROS. Each of the services depicted in Figure 4 (e.g., Step 1: Domain Detection, Step 2: Learning Model Manager, Step 3a: Knowledge Manager) can be instantiated as separate ROS nodes within a single Anunnaki ROS package. Figure 13 provides a graph of Anunnaki ROS nodes (shown as ellipses) and ROS message topics (shown as rectangles) used for communication.
Fig. 13.
Fig. 13. Anunnaki ROS graph of ROS nodes (ellipses) and ROS topics/services (rectangles). Anunnaki nodes dynamically publish/subscribe to topics of the managed LES by referencing agents found in given goal models.
When executed on the same network as the autonomous rover, Anunnaki ROS nodes can publish and subscribe to ROS topics provided by the rover in order to monitor and reconfigure the behavior of the rover. ROS nodes are instantiated for each behavior oracle created by Enlil (e.g., /adv_detector in Figure 13). As ROS nodes, behavior oracles can continuously monitor the rover’s sensor data and predict how the object detector will perform at run time, publishing behavior assessments to any ROS node on the same network. Separate ROS nodes are also instantiated for each Utu MAPE-K step (i.e., /utu_monitor, /utu_analyze, /utu_plan, /utu_execute, and /utu_knowledge). The /utu_monitor node monitors ROS message traffic published by the rover and any behavior oracles that have been instantiated. The /utu_analyze node evaluates the active goal model and selects an adaptation tactic when the goal model is not satisfied. The /utu_plan and /utu_execute nodes then translate the selected tactic into ROS messages that can be published to the rover or into ROS services that can be requested from the rover. Additionally, a /lm_manager node is instantiated to handle the swapping of Enki learning models when an adaptation tactic requests that a robustified model be substituted for the default learning model. Thus, the Anunnaki framework is realized as a package of coordinated ROS node services that automatically monitor and control the rover’s object detector with respect to user-defined goal models.
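A launch file along the following lines could bring up these services as separate nodes (an illustrative sketch; the package name and script file names are assumptions):

```xml
<!-- Hypothetical launch file instantiating the Anunnaki services. -->
<launch>
  <node pkg="anunnaki" type="adv_detector.py"  name="adv_detector"/>
  <node pkg="anunnaki" type="lm_manager.py"    name="lm_manager"/>
  <node pkg="anunnaki" type="utu_monitor.py"   name="utu_monitor"/>
  <node pkg="anunnaki" type="utu_analyze.py"   name="utu_analyze"/>
  <node pkg="anunnaki" type="utu_plan.py"      name="utu_plan"/>
  <node pkg="anunnaki" type="utu_execute.py"   name="utu_execute"/>
  <node pkg="anunnaki" type="utu_knowledge.py" name="utu_knowledge"/>
</launch>
```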
Modularity-driven Approach for ROS-based Systems. In order to illustrate the benefits (i.e., modularity, composability, and reusability) of Anunnaki, we have identified a subset of the ROS-based architectural guidelines proposed by Malavolta et al. [46] that are reflected in our ROS-based implementation of the framework. While we did not have access to these guidelines during the development process, a retrospective analysis of our codebase indicates that our ROS-based implementation of Anunnaki aligns well with a majority of them, particularly those that support software engineering principles. Table 2 provides an overview of the identified guidelines as manifested in our implementation, including each guideline’s ID and description. In most cases, it is sufficient to indicate that a particular guideline was followed (denoted by a checkmark), but for a few guidelines that are more nuanced or include more than one implementation option, we also provide a brief explanation of our design decision and its manifestation in our implementation. Furthermore, our ROS-based implementation of Anunnaki adheres to the guidelines most related to modularity (e.g., C2, N1-N4, I2), maintainability (e.g., N9, I1, I6, H1), and robustness (e.g., C9, S1, S2, S4).
Table 2.
ID | Guideline | Realization
Communication and networking (C)
C1 | Use standardized ROS message formats, possibly supporting also their legacy versions. | ✓
C2 | ROS nodes should be agnostic of underlying communication mechanisms. | ✓
C5 | Nodes that potentially produce/consume large amounts of messages should be configurable in terms of their publish/subscribe rates. | ✓
C6 | Selectively limit the data exchanged between nodes to provide only the information that is strictly necessary for completing tasks. | ✓
C8 | Develop adapter components when data exchanged between nodes is not compatible (semantically), incorrect, out-of-order, or redundant. | ✓
C9 | Use services when starting up robots (instead of publishing to topics) so that the status of the system can be checked before operation. | ✓
C11 | Frequent messages should be exchanged either via services with persistent connections or via topic-based communication. | Frequent messages use topics.
C12 | Run multiple nodes in a single process when the overhead due to interprocess communication is too high both in terms of frequency of messages and payload. | ✓
C13 | Manage topics to avoid unnecessary publishing and subscribing. | ✓
Node responsibilities within the system (N)
N1 | Group nodes and interfaces into cohesive sets, each with its own responsibilities and well-defined dependencies. | ✓
N2 | Each ROS package should be responsible for one and only one feature of the system or robot capability and provide a well-defined interface. | Packages are separated for Utu and the terrestrial rover.
N3 | Decouple nodes with responsibilities that naturally work at different rates and use different rates for different purposes. | Nodes can be configured at different rates independently.
N4 | By design, limit unnecessary computationally-heavy operations by carefully analyzing the execution scenarios across ROS nodes. | ✓
N5 | Transform data only when it is used, for efficiency in terms of computation and bandwidth. | ✓
N6 | Design each single node so that it is runnable (and testable) in isolation. | ✓
N8 | Use a dedicated node to store and represent globally-relevant data (e.g., the physical environment where the system operates) and use it as the single source of truth for all the other nodes in the system. | The Knowledge Manager (see Figure 4, Step 3a) stores global information.
N9 | Keep the number of nodes as low as possible to support the basic execution scenarios and extend the architecture for managing corner cases. | ✓
Internal behavior of the nodes (B)
B2 | Nodes with high-frequency operations should be configurable so that they can operate according to available computational resources. | ✓
B5 | Nodes with configuration errors should fail explicitly at bringup time. | ✓
B6 | If a node is computationally expensive, then ensure that it only executes when it is strictly needed. | ✓
Interface to external users and third-party developers (I)
I1 | Assign meaningful names to architectural elements and group them by adopting standard prefixes/suffixes. | ✓
I2 | When possible, core algorithms, libraries, and other generic software components should be ROS-agnostic. | Core algorithms (e.g., Enki, Enlil) are ROS-agnostic.
I6 | Logging should be standardized across the project and follow well-defined guidelines. | ✓
Interaction with hardware and other lower-level entities (H)
H1 | Nodes interacting with simulators and hardware devices should provide identical ROS messaging interfaces to the rest of the system. | ✓
H2 | When possible, design the system to be hardware-independent. | ✓
Safety-critical concerns (S)
S1 | ROS nodes should be resilient with respect to the amount and frequency of data received by sensors. | ROS nodes only process data at configured data rates, set upon instantiation.
S2 | Use different communication channels and different (hardware and software) platforms depending on the criticality and real-time requirements of the nodes. | Deployment topology can be configured as necessary.
S4 | Provide at least one globally-reachable node capable of receiving run-stop messages and stopping/resetting the whole system. | ✓
Data persistence (P)
P1 | Avoid persisting raw data if only part of it will be used. | Media is compressed to reduce overhead and is not persisted.
P3 | Use a dedicated node for persisting and querying long-term data. | ✓
Table 2. Overview of the ROS-based Guidelines [46] Manifested in the ROS-based Implementation of Anunnaki as Identified through Retrospective Analysis
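To illustrate how guidelines such as C5, N3, and S1 manifest in practice, the following minimal sketch shows a rate-configurable ROS node in the style of our Python-based implementation; all node, topic, and parameter names are illustrative rather than Anunnaki's actual interfaces.

```python
#!/usr/bin/env python3
"""Sketch of a rate-configurable ROS node (cf. guidelines C5, N3, S1).
All node, topic, and parameter names are illustrative."""
import rospy
from sensor_msgs.msg import Image


class RateLimitedMonitor:
    """Processes sensor data only at the rate configured upon instantiation."""

    def __init__(self):
        # Processing rate is a launch-time parameter (guideline C5).
        self.rate_hz = rospy.get_param("~process_rate_hz", 10)
        self.latest_msg = None
        # Keep only the most recent frame; stale frames are dropped (S1).
        rospy.Subscriber("camera/image_raw", Image, self.on_image, queue_size=1)
        self.pub = rospy.Publisher("monitor/assessed_image", Image, queue_size=1)

    def on_image(self, msg):
        self.latest_msg = msg  # cache only; work happens in spin()

    def spin(self):
        rate = rospy.Rate(self.rate_hz)  # node-specific rate (guideline N3)
        while not rospy.is_shutdown():
            if self.latest_msg is not None:
                # Placeholder for the node's actual analysis of the frame.
                self.pub.publish(self.latest_msg)
                self.latest_msg = None
            rate.sleep()


if __name__ == "__main__":
    rospy.init_node("rate_limited_monitor")
    RateLimitedMonitor().spin()
```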
To facilitate interpretability of the managed AI system, a graphical user interface (GUI) of the Anunnaki framework is provided for users to visually and dynamically observe the Utu MAPE-K controller at run time (see Figure 14 and Figure 15 for example scenarios corresponding to ideal and adverse conditions, respectively). Throughout system deployment, the instantiated GUI provides run-time visualizations to monitor system behavior, observe utility value measurements, and obtain explicit reasoning behind adaptive actions. Figure 14 shows an example of the autonomous rover operating in an ideal lighting condition, where the rover’s object detector can properly detect all pedestrians. In Figure 14, the Anunnaki GUI displays the state of each Utu MAPE-K step, the output of each behavior oracle, and the current evaluation of the active goal model (from Figure 6). The goal model is shown as a logic tree of goals, each of which has an associated utility function. At run time, individual goals are highlighted in green when satisfied and red when unsatisfied. In Figure 14, the behavior oracle predicts that the current environment has no adverse impact on the object detector (i.e., Category 0). Thus, the overall goal model in Figure 14 is satisfied (i.e., root goal G1 is green), and no adaptation is selected to reconfigure the rover. In contrast, Figure 15 shows an example of the rover operating in a dim lighting condition, where the rover’s object detector fails to recognize two of the pedestrians in front of the rover. The output of the behavior oracle in Figure 15 indicates that the object detector is degraded (i.e., Category 1). The resulting evaluation shows that the goal model is unsatisfied (i.e., root goal G1 is red), and therefore the “fail-safe” tactic from Figure 10 is executed to switch the rover from autonomous operation to a manual mode. Thus, the Anunnaki framework can prevent a pedestrian collision that would otherwise result from the use of the rover’s object detector in dim lighting.
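The goal-model evaluation described above can be summarized with the following sketch, in which a KAOS-style goal tree is evaluated over monitored values; the class structure, threshold, and variable names are illustrative simplifications of Utu's actual goal-model representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Sketch of evaluating a KAOS-style goal tree over monitored values.
# Leaf goals carry a utility function; interior goals are AND-refined.

@dataclass
class Goal:
    gid: str
    utility: Optional[Callable[[Dict[str, float]], float]] = None
    children: List["Goal"] = field(default_factory=list)

    def satisfied(self, monitored: Dict[str, float]) -> bool:
        if self.utility is not None:
            # Leaf goal: satisfied when its utility reaches 1.0.
            return self.utility(monitored) >= 1.0
        # Interior goal: satisfied only if all refinements are satisfied.
        return all(child.satisfied(monitored) for child in self.children)

# Example: the root goal holds only when the behavior oracle reports
# Category 0 (no adverse interference on the object detector).
g2 = Goal("G2", utility=lambda m: 1.0 if m["oracle_category"] == 0 else 0.0)
g1 = Goal("G1", children=[g2])
print(g1.satisfied({"oracle_category": 0}))  # True  -> G1 highlighted green
print(g1.satisfied({"oracle_category": 1}))  # False -> G1 highlighted red
```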
Fig. 14. Anunnaki monitoring an autonomous rover with a GUI to show the state of each service. Utu evaluates the goal model in Figure 9, highlighting satisfied (green) and unsatisfied (red) goals. A behavior oracle detects no adverse interference (Cat. 0) and the overall goal model is satisfied.
Fig. 15. Anunnaki reconfigures the rover to prevent use of its object detector in poor lighting with a GUI to show state of each service. Utu evaluates the goal model in Figure 9, highlighting satisfied (green) and unsatisfied (red) goals. The behavior oracle detects an adverse condition that degrades detection (Cat. 1), and a fail-safe tactic is executed to reconfigure the rover into manual operation.

4.2 Autonomous Unmanned Aerial Vehicle Case Study

To further illustrate how Anunnaki supports modularity, reusability, and extensibility when developing learning-enabled autonomous systems, we present a second empirical study implementing the Anunnaki framework for use in an unmanned aerial vehicle (UAV). The new operating domain necessitates a new set of safety requirements, a different LEC architecture, and a different goal model when compared with the autonomous rover. To assess Anunnaki in a UAV, we implemented a simulation environment and UAV controller using the Webots autonomous vehicle simulator [49]. A 3D render of the UAV model used in our demonstration is provided in Figure 16. The UAV’s sensors include a forward-facing camera, an altitude meter, and an onboard GPS. The UAV can be controlled both manually and autonomously. A comprehensive overview of differences between the Anunnaki framework instantiations and customizations for the respective applications (e.g., DNN type, hardware) is presented in Table 3, where the shaded rows indicate application differences.
Table 3.
Feature | Rover | UAV
Platform | Jetson Nano | WEBOTS Robot
Object Detector | RetinaNet | YoloV5
Training Data | Custom | VisDrone
Application Type | Terrestrial | Aerial
Multi-domain Training | Enki | Enki
Domain Detection | Enlil | Enlil
Autonomic Manager | Utu | Utu
Table 3. Comparison between the Managed Rover and UAV Autonomous Systems
Grey shading indicates application differences. This table highlights the Anunnaki framework’s support for reusability and extensibility, both of which facilitate rigorous development of trusted autonomous system software.
Fig. 16. For the UAV case study, we use a replica of the commercial-grade DJI Mavic 2 Pro Drone [50] provided by the WEBOTS simulation software.
When operating autonomously, the UAV relies on computer vision to perceive its environment via an onboard camera. In contrast to the autonomous rover, the UAV’s object detector is implemented as a YOLOv5 [31] DNN architecture. Previously, Zhan et al. [92] used this architecture successfully for onboard UAV image detection. YOLOv5 offers faster inference times than other single-shot detection models such as RetinaNet [79], thus making YOLOv5 a popular choice for real-time object detection on mobile devices with limited resources. The greater inference speed comes at the cost of lower overall accuracy due to localization errors. To train and validate the UAV’s object detector, we use the professional-grade VisDrone-2019 dataset [96]. This dataset consists of 6,471 training images with 343,205 object labels; 548 validation images with 38,800 labels; and 1,060 unlabeled test images. Each labeled object belongs to one of the following 10 classes: pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, and motor. We use the pretrained weights provided by the open source YOLOv5 implementation [31] and train our model until convergence. When evaluated against the unseen validation sample, the model correctly identified labeled bounding boxes for the 10 classes with a precision of \(60.6\%\), recall of \(62.5\%\), and an F-score of \(61.3\%\).
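For reference, the following sketch outlines this training and inference workflow using the public YOLOv5 tooling [31]; the commands follow the open source repository's documented interface, but the exact hyperparameters and file paths shown are illustrative rather than those used in our study.

```python
# Command-line workflow (per the public YOLOv5 repository):
#
#   git clone https://github.com/ultralytics/yolov5 && cd yolov5
#   python train.py --data VisDrone.yaml --weights yolov5s.pt --img 640
#   python val.py   --data VisDrone.yaml --weights runs/train/exp/weights/best.pt
#
# Loading the trained model for inference via torch.hub:
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/exp/weights/best.pt")
results = model("aerial_frame.jpg")  # detect objects in a single frame
results.print()                      # summary of detections per class
```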

4.2.1 Creating Behavior Oracles for Autonomous UAV.

The UAV may encounter certain environmental conditions that would render its perception mechanism untrustworthy. When such known unknowns are encountered, it is important for the UAV to recognize potential performance impacts and implement mitigation procedures. Although the UAV’s perception mechanism is based on a different architecture and trained on a different dataset, the Anunnaki framework significantly reduces development effort through code reuse. We demonstrate this feature of Anunnaki by reusing Enlil to create behavior oracles for the UAV. We set up communication channels between the default learning model and the domain detection microservice and automatically generate behavior oracles for a target operating domain. Figure 17(a) displays an example of a training sample after a randomly sampled fog transformation has been applied to it. Figure 17(b) displays a scatter plot of Enlil’s automated behavior assessment under a range of fog contexts, where fog_density represents the number of fog layers (i.e., “depth”) and fog_intensity represents the opacity of each fog layer; red points indicate DNN failure, yellow points indicate DNN degradation, and green points indicate default DNN behavior. The set of diverse environmental contexts generated by Enlil is used to train a behavior oracle to predict expected object detector degradation (\(\gt 5\%\) decrease in F1-score) and object detector failure (\(\gt 10\%\) decrease in F1-score) with an accuracy of \(74\%\).
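The category labeling used by the behavior oracle can be summarized as follows; the thresholds match the degradation and failure criteria above, while the function name and the use of a relative F1 decrease are illustrative assumptions.

```python
def behavior_category(baseline_f1: float, context_f1: float) -> int:
    """Label the detector's response to a generated context.
    0 = default behavior (green), 1 = degraded (yellow, >5% F1 decrease),
    2 = failed (red, >10% F1 decrease). A relative decrease is assumed."""
    decrease = (baseline_f1 - context_f1) / baseline_f1
    if decrease > 0.10:
        return 2
    if decrease > 0.05:
        return 1
    return 0
```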
Fig. 17. Figure 17(a) shows an operational context discovered by Enlil applied to a random training sample. Figure 17(b) shows a scatter plot of Enlil’s automatic assessment of the UAV’s object detector’s response to the generated archive of operational contexts.

4.2.2 Creating Robustified Learning Models for Autonomous UAV.

The aerial domain poses unique challenges (e.g., heavy winds, extreme cold, atmospheric clouds) for learning-enabled autonomous systems, specifically UAVs [26]. To mitigate uncertainty from adverse environmental phenomena, the Anunnaki framework makes use of Enki for (i) generating a diverse set of domain-specific environmental contexts and (ii) generating a robustified DNN for the corresponding domain. To integrate Enki with the updated DNN architecture and dataset, we need only configure context generation parameters and set up communication channels by implementing a wrapper class for the YOLOv5 model and corresponding dataset. This modularity is a key feature of Anunnaki that reduces development costs while promoting increased system robustness. When the UAV’s default object detector is exposed to a combination of Enki-generated diverse weather phenomena (including adverse conditions), such as raindrops and varying brightness levels, the F-score decreases from \(61.3\%\) to \(27\%\). This decrease in performance demonstrates that the UAV’s default object detector is not robust to known-unknown adverse weather phenomena. After assessing the default DNN’s robustness, we can use Enki to generate a robustified model for the above-described domain. Retraining the default DNN on a random sample of Enki-generated raindrop and brightness-level contexts yields a new model with an improved F-score of \(42\%\). Likewise, we can use Enki’s data to assess the default DNN’s robustness when operating in the fog domain. When the default model is assessed in Enki-generated diverse fog contexts, the F-score decreases to \(39\%\), indicating the default model is not robust to the fog domain. To improve our managed AI system’s robustness to fog, we use Enki’s data to retrain the object detector on a random sample of fog contexts, which yields a robustified model with an improved F-score of \(47\%\) for the fog domain.
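The following sketch suggests the shape of such a wrapper class; the method names and the Enki-facing interface are assumptions made for illustration, as only the need for an adapter between Enki and the platform-specific model/dataset is prescribed by the framework.

```python
import torch

class YoloV5Wrapper:
    """Adapter exposing a uniform interface between Enki and the
    platform-specific YOLOv5 model (method names are illustrative)."""

    def __init__(self, weights_path: str):
        self.model = torch.hub.load("ultralytics/yolov5", "custom",
                                    path=weights_path)

    def predict(self, images):
        """Run detection on a batch of (possibly transformed) images."""
        return self.model(images)

    def score(self, images, labels) -> float:
        """Return an F-score for Enki to judge context severity.
        (Metric computation omitted in this sketch.)"""
        raise NotImplementedError

    def retrain(self, augmented_dataset):
        """Fine-tune on Enki-generated contexts to yield a robustified model."""
        raise NotImplementedError
```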

4.2.3 Implementing Anunnaki Services for Autonomous UAV.

This section describes how the Utu elements (e.g., goal models, autonomic manager, adaptation tactics) and the corresponding KAOS goal model have been instantiated and configured to deploy a resource-delivery UAV. Control software for the UAV was implemented using the ros-noetic distribution of ROS and Python3. Simulations were carried out on a computer running Ubuntu 20.04, with 32 GB of RAM, an Intel i7 CPU, and a 12 GB NVIDIA RTX 3060 GPU.
For demonstration purposes, we consider a scenario where a UAV must deliver resources to an area not accessible by ground vehicles. Such situations may arise during natural disasters such as floods, wildfires, and earthquakes. Although a location may not be safe for humans or autonomous terrestrial vehicles, stranded victims may require life-saving resources such as food, water, or medical supplies [1, 18]. To this end, UAVs can be used to deliver crucial supplies to the target locations [20, 72]. However, many sources of uncertainty must be considered in order to safely and securely deliver a package to a target location (e.g., environmental conditions, obstacles, battery power limitations). An example goal model for a supply delivery mission is shown in Figure 18. In contrast to the rover’s goal model (see Figure 6), the new goal model includes several RELAX-ed requirements (highlighted green), thus increasing system flexibility with respect to known-unknown sources of uncertainty. The top-level goal of this mission is represented by goal G1: “UAV successfully completes package delivery.” We consider two explicit obstacles that may prevent the satisfaction of the goal model: O1 (“Object detector compromised”) and O2 (“UAV has inadequate power level”).
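To clarify how a RELAX-ed requirement differs from a crisp goal, the following sketch quantifies a RELAX-ed power-level goal with a fuzzy membership function [86]; the specific goal, bounds, and shape are illustrative and do not reproduce the exact functions in Figure 18.

```python
def relaxed_power_utility(battery_pct: float) -> float:
    """Utility for a power-level goal RELAX-ed to tolerate battery levels
    AS CLOSE AS POSSIBLE TO a 50% reserve: 1.0 at or above 50%, degrading
    linearly toward the 20% hard floor, and 0.0 (violated) below it."""
    hard_floor, ideal = 20.0, 50.0
    if battery_pct >= ideal:
        return 1.0
    if battery_pct <= hard_floor:
        return 0.0
    return (battery_pct - hard_floor) / (ideal - hard_floor)

# A crisp goal yields only 0 or 1; the RELAX-ed form lets Utu tolerate
# transient dips (utility strictly between 0 and 1) before triggering an
# adaptation tactic such as returning to base.
```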
Fig. 18. KAOS goal model for the managed UAV. This goal model is loaded into the UTU autonomic manager pre-deployment and is used to inform the system when a goal is violated and an adaptation is needed.
Our simulation has been designed as a waypoint-directed delivery mission for the model UAV inside a Webots environment.4 We utilize the Utu GUI to monitor the mission at run time. To instantiate the GUI, we need only configure the ROS-based communication channels that enable two-way communication with the simulation software and the corresponding persistent data stores. The framework’s GUI depicts the aggregate run-time elements of Anunnaki (see Figure 19), including run-time monitoring and analysis of KAOS goal models, Utu MAPE-K elements, and behavior oracle decisions, as well as a visualization of the UAV camera feed. A multi-pane display visually depicts the Utu MAPE-K loop, where the (i) Knowledge-pane shows KAOS agents and corresponding utility values, (ii) Monitor-pane shows each agent’s published topics, (iii) Analyze-pane shows goal violation patterns and available adaptation tactics, (iv) Plan-pane shows a priority-sorted queue of selected adaptation tactics, and (v) Execute-pane shows published adaptation tactics. The Utu GUI also includes the (i) Oracle-pane that shows a predicted behavior category as obtained from the behavior oracle during the Monitor step, (ii) Goal Model-pane that shows the KAOS goal model with satisfied goals highlighted green and violated goals highlighted red, and (iii) Camera Feed-pane that shows real-time image and object-detection data, as published by the managed system.
Fig. 19. Anunnaki monitors the autonomous UAV at run time and a GUI shows the state of each service. Utu evaluates the goal model in Figure 18, highlighting satisfied (green) and unsatisfied (red) goals. A behavior oracle detects no adverse interference (Cat. 0) and the overall goal model is satisfied.
Figures 19 and 20 depict run-time snapshots of ideal and adverse conditions, respectively, as observed via the Utu GUI during an execution of a simulated UAV mission. Early in the mission, no adverse environmental conditions are present, and the UAV’s top-level goal remains satisfied. Figure 19 displays a live camera feed under ideal conditions and a satisfied goal model with the top-level goal G1 highlighted green. As the UAV approaches the target location, we dynamically introduce a synthetic fog effect to demonstrate the adaptive capabilities of the Anunnaki-managed system. Figure 20 displays the impact of the synthetic fog on the live camera feed and an unsatisfied goal model with the top-level goal G1 highlighted red. During the monitor phase, Anunnaki’s domain detector correctly identifies object detector failure. The Analyze node uses the updated utility values to obtain a corresponding adaptation tactic (Failsafe F-1). Utu sends a signal to the UAV in real time to implement Failsafe F-1, mitigating adverse weather performance impacts.
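The adaptation path just described can be sketched as follows; message types, topic names, and the tactic identifier are illustrative stand-ins for the actual Utu interfaces.

```python
import rospy
from std_msgs.msg import Int32, String

tactic_pub = None  # set in main before messages arrive


def on_oracle_category(msg: Int32):
    # Analyze: a failure category violates the top-level goal G1.
    if msg.data >= 2:
        # Plan: select the highest-priority tactic for this violation.
        tactic = "FAILSAFE_F1"  # e.g., revert the UAV to manual control
        # Execute: publish the tactic for the managed system to apply.
        tactic_pub.publish(String(data=tactic))


if __name__ == "__main__":
    rospy.init_node("utu_failsafe_sketch")
    tactic_pub = rospy.Publisher("utu/adaptation_tactic", String, queue_size=1)
    rospy.Subscriber("oracle/behavior_category", Int32, on_oracle_category,
                     queue_size=1)
    rospy.spin()
```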
Fig. 20. Anunnaki continues monitoring the autonomous UAV and a behavior oracle detects adverse interference (Cat. 2) leading to a violation of the top-level goal G1 (now highlighted red). Utu analyzes the updated utility values to plan and execute the Failsafe-F1 adaptation tactic autonomously.

5 Discussion

An increased use of LECs for safety-critical tasks requires rigorous software engineering principles to support the deployment of trustworthy systems. A monolithic system may fail to address increasing safety concerns due to the need for context-dependent implementations of uncertainty mitigation techniques. Additional changes in hardware, software, and run-time environments can require extensive updates to existing codebases to provide continued safety assurance. We have applied the Anunnaki framework to two learning-enabled autonomous systems to illustrate how the proposed framework may be used in practice. The remainder of this section considers our demonstration results in the context of RQ1 and RQ2.

5.1 (RQ1) Modular Approach for Trusted AI

A goal of this work was to explore how fundamental principles of software design (e.g., modularity, composability, reusability) [5, 32] can be used to develop tools and techniques that address trusted AI concerns. A key feature of modular systems is their flexibility and ability to manage complexity and uncertainty [3], both of which apply when attempting to assess and improve the robustness and resilience of LESs. Namely, a modular system comprises hierarchical units that are well-defined, have high cohesion (internal interconnectedness), and have low coupling (units are independent of other units) [3]. Additionally, previous work in service-oriented architecture [52, 83] has demonstrated how decomposing a monolithic application into smaller individual services that can be developed, monitored, and reconfigured independently leads to greater system modularity. By constructing Anunnaki following key tactics outlined by Bass et al. [5], Johnson et al. [32], and Baldwin et al. [3] (e.g., low coupling, service-oriented architecture, interoperability), the framework provides a core set of application-independent configurable services with support for future modifications, thereby making it well aligned with the definition of a modular architecture [3]. In addition to the structure of a software system, Baldwin et al. explain that modular software should implicitly support a set of actions or operations that are unique to modular systems, including substitution, augmentation, and porting.
We next explore how Anunnaki supports each of these modular operations to address trusted AI concerns. Specifically, Anunnaki targets two dimensions of trust, robustness and resilience, both of which can be addressed in a variety of ways for different sources of uncertainty. In our first demonstration, the autonomous terrestrial rover’s goal model (see Figure 6) used standard utility functions [65] to inform the autonomic manager of goal violations. However, when greater flexibility to sources of uncertainty was needed for the autonomous UAV, a new microservice was added to Anunnaki to support the RELAX language (applying modularity’s support for augmentation and substitution), thus enabling run-time management of an LES with a goal model containing RELAX-ed goals (see Figure 18). Additionally, we have shown how developers can reuse Anunnaki services to address assurance concerns for different sources of uncertainty. Namely, Anunnaki enabled run-time adaptations (e.g., swapping the active learning model, executing a fail-safe tactic) for an autonomous terrestrial rover exposed to poor lighting conditions and for an autonomous UAV exposed to fog conditions. We anticipated the autonomous terrestrial rover might encounter poor lighting conditions and leveraged the Enlil domain detection service to improve system resilience to this phenomenon. To implement a similar service for a completely different use case (i.e., the autonomous UAV), we only needed to change Enlil parameters (e.g., behavior category specification, weather phenomena specifications, model and dataset addresses), thus saving a significant amount of development time (benefiting from modularity’s support for porting), as sketched below. This modification illustrates the low coupling of the domain detection module, as the implemented changes did not impact the majority of the framework.
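The porting operation described above can be illustrated with the kind of configuration delta involved; the keys and values below are illustrative, as the actual Enlil parameter schema is not reproduced here.

```python
# Rover instantiation of the domain detection service.
ROVER_ENLIL_CONFIG = {
    "model": "models/retinanet_rover.pt",
    "dataset": "data/rover_custom/",
    "phenomena": {"hsl_lighting": {"hue": (0.0, 1.0),
                                   "saturation": (0.0, 1.0),
                                   "lightness": (0.0, 1.0)}},
    "categories": {"degraded": 0.05, "failed": 0.10},  # F1-decrease thresholds
}

# UAV instantiation: only the configuration changes, not Enlil's code.
UAV_ENLIL_CONFIG = {
    "model": "models/yolov5_visdrone.pt",     # different DNN architecture
    "dataset": "data/visdrone2019/",          # different training data
    "phenomena": {"fog": {"fog_density": (1, 10),
                          "fog_intensity": (0.0, 1.0)}},
    "categories": {"degraded": 0.05, "failed": 0.10},  # thresholds reused
}
```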
The Anunnaki model-driven framework and its aggregate microservices facilitated modular operations to improve LES robustness to interference and resilience with respect to known unknown sources of uncertainty (i.e., lighting variations, rain, fog conditions). Therefore, we have shown that a modular approach can be used to support the automated assessment and improvement of the robustness and resilience of LESs (RQ1).

5.2 (RQ2) Data and Model-agnostic Approach to Robustness and Resilience

A system that can be used to develop and manage LESs at run time for diverse applications and their respective data sets, without needing to rebuild or retrain the entire system, is considered data- and model-agnostic [87, 93]. To this end, we have shown how the Anunnaki framework enables run-time adaptation in response to adverse environmental conditions for two managed AI systems deployed for two different applications. In the first case study, we leveraged the Enki microservice to generate specialized RetinaNet models with increased robustness to HSL lighting variations for our custom real-world dataset. When the autonomous rover encountered adverse lighting conditions, Utu’s autonomic manager used an Enlil behavior oracle to detect LEC degradation and reconfigure the system to execute a fail-safe procedure. In the second case study, we demonstrated how Enki was used to generate specialized YoloV5 models with increased robustness to fog conditions for the VisDrone dataset. We also demonstrated how Utu used an Enlil behavior oracle to detect LEC failure and then automatically reconfigured the managed UAV to apply a fail-safe procedure, preventing potential mission failure. Anunnaki supported the generation of robustified models and the detection of the operating domain for each set of uncertainties through isolated configuration changes to the corresponding microservices. Therefore, we have shown that Anunnaki is data- and model-agnostic, as each system’s LEC relied on a different set of learning models with distinct DNN architectures and different datasets (RQ2).

6 Related Work

This section overviews related work in developing trustworthy AI for use in self-adaptive systems.

6.1 Trustworthy AI

Many previous works propose techniques to address trustworthy AI concerns such as robustness [58, 60, 89, 95] and/or resilience [24, 53] with respect to environmental uncertainty. For example, DeepCert [58] uses formally defined environmental contexts and image perturbation levels to verify contextually-dependent DNN robustness. DeepCert further supports the selection of the best DNN from a set of developer-provided models for a given operational context. Given the large variety of tools that can support trusted AI concerns (e.g., Enki [38], Enlil [39], DeepRoad [95], DeepXplore [60], BESTEST [24]), Anunnaki is intended as a loosely-coupled collection of services rather than a fixed set of hard-coded tools. For example, to instantiate alternative services (e.g., replace Enki with DeepCert), developers need only configure the appropriate interface (e.g., published ROS messages) for the new service. Each service-type supported by Anunnaki can be interchanged with alternative techniques, to meet evolving stakeholder requirements, without requiring (potentially extensive) architecture or code changes.
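The following sketch illustrates the substitution point implied above: any domain detection technique can replace Enlil provided it satisfies the expected assessment interface; the abstraction shown is illustrative rather than Anunnaki's literal API.

```python
from abc import ABC, abstractmethod

class DomainDetectionService(ABC):
    """Contract an alternative service (e.g., one built on DeepCert) would
    need to satisfy to replace Enlil in an Anunnaki instantiation."""

    @abstractmethod
    def assess(self, frame) -> int:
        """Return a behavior category (0 nominal, 1 degraded, 2 failed)."""

    def publish(self, category: int) -> None:
        # In the ROS implementation, this would publish the category on the
        # topic that Utu's Monitor step subscribes to.
        ...
```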

6.2 Configurable Frameworks and Learning-enabled Systems

Several existing works propose configurable frameworks that use AI to support adaptive systems. For example, Weyns et al. [85] propose an architecture that uses AI to model sources of environmental uncertainty, a MAPE-K loop and user-provided goals to make adaptation decisions, and control theory to realize the selected adaptations. Specifically, Weyns et al. use DNNs to build models of sources of environmental uncertainty to enable adaptation decisions that are relevant to the current operating context. Caldas et al. [9] use AI to first optimize an algorithm that searches over a system’s adaptation space and then use AI to generate controller configurations that can realize the discovered adaptations. Jamshidi et al. [29] propose a technique that uses a simulator to generate a range of operating contexts and then uses transfer learning to train a performance model that predicts an adaptive system’s performance in a given context. They show how the performance model can instantiate a knowledge base in a MAPE-K loop to inform context-dependent configuration changes. The learned performance model may be considered an implicit behavior oracle and could potentially be used for domain detection in an instantiation of Anunnaki.
Table 4 provides an overview of the specific differences between the related frameworks, where rows correspond to frameworks, columns represent features of the frameworks, and checkmarks indicate whether a framework supports the respective feature. First, Table 4a overviews the role of AI in each framework. Next, Table 4b overviews design-time services supported by each framework, including the use of hierarchical goal models to represent both functional and non-functional system requirements and inform system adaptations (column D4.) and the use of optimization algorithms to generate adaptation configurations (column D5.). Model Robustification (column D3.) indicates a framework’s support for generating alternative robustified learning models to address different sources of uncertainty, which is a distinguishing feature of Anunnaki. Finally, Table 4c overviews run-time services supported by each framework. Model Inference (column R1.) indicates support for a service that can assess a managed LEC’s behavior and output behavioral assessments (e.g., perceived operating context, behavior category) to inform adaptation decisions. Model Management (column R2.) indicates support for switching the active learning model (e.g., switching the default learning model to a context-specific robustified model) in response to environmental uncertainty. Online Learning (column R3.) indicates support for incremental training or updating of LECs from incoming data in an online (i.e., during run-time) setting [74]. Quantifying Functional Goals (column R4.) signifies a framework’s ability to evaluate functional goal satisficement (e.g., via utility functions).
A key difference between existing approaches and this work is the role of AI. Specifically, while previous work focuses on using AI to support one or more steps of the adaptation process for self-adaptive systems, this work focuses on using modularity and other foundational software engineering principles to address assurance for self-adaptive systems that contain one or more AI components.
Table 4. Comparison of Configurable Frameworks for Self-adaptive Systems

6.3 Threats to Validity

We consider several threats to the validity of our study as outlined in this section.
External. For the demonstration of the resource delivery UAV, we rely on several external sources. These include the open source YOLOv5 model, the VisDrone dataset, and the Webots simulator with its corresponding UAV model. We have reviewed the source code of the YOLOv5 model to confirm its implementation follows the theoretical architecture outlined in previous research [31, 92]. The dataset used for training has been used for validating state-of-the-art techniques and model training competitions [17], both of which make it useful as a benchmark dataset for aerial object detection. Likewise, the Webots simulator has been used for numerous robotics applications [15, 67].5 As such, we believe that the external software used poses a minimal risk to the validity of the obtained results.
Simulation. We note that there exists a reality gap between the simulated environments (i.e., the Webots environment and Enki-generated environmental contexts) and the physical world. The Webots simulator is well-regarded as a robotics testing platform, enabling us to demonstrate a proof-of-concept case study. We argue that any physical inconsistencies do not directly impact the results presented in this work. Additionally, our demonstration of the autonomous rover provides an implementation of the proposed framework in the physical world. We acknowledge that simulated phenomena, such as those produced by Enki, may differ from their appearance in the physical world. While the simulated contexts for environmental phenomena may not perfectly reflect the real world, previous work has shown that they can provide significant insights to developers, leading to a more targeted approach to real-world data collection [38]. Finally, we note that Enki is an instantiation of an Anunnaki service rather than a core feature of the framework. Development teams may use a different tool for multi-domain training, and doing so will not impact the architecture of the proposed framework.
Stochastic Variations. The results obtained from learning-enabled services are subject to variations due to stochastic functions present in their implementations. It is important to note that this work does not seek to promote a specific learning-based technique; rather, it draws from techniques that are well studied in previous research and implemented as provided for our demonstrations. Any refactoring of these components was conducted through a configurable wrapper class to integrate the various framework services with platform-dependent technologies and support seamless communication while keeping the underlying architecture unaltered.
Computational Complexity. In our demonstration of the Anunnaki framework, we rely on configurable services (i.e., domain detection and multi-domain training) that generate/use context-specific models (e.g., a model robustified for rain conditions) to address LES robustness and resilience for different sources of uncertainty. Naturally, developers may want to robustify an LES with respect to combinations of operating contexts (e.g., a model robustified to fog and rain and lighting variations), which may lead to an exponential increase in the number of models, training/memory costs, and complexity of goal model/adaptation specifications. While Enki was used to study context composition in previous work [38], there are still concerns regarding the scalability of such an approach as more contexts are considered. By using evolutionary algorithms that can effectively explore complex spaces, services such as Enki and Enlil can mitigate some concerns regarding algorithmic complexity. Moreover, the Anunnaki framework enables developers to specify how many sources of uncertainty are considered and the granularity of the search procedures. Nonetheless, as the number of sources of uncertainty and the granularity of information increase, so do the computational challenges. We will explore techniques to address this limitation in future work.

7 Conclusion

When autonomous AI systems are deployed in uncertain environments, we need to prevent system failures resulting from the inappropriate use of LECs in adverse contexts. In contrast to existing monolithic techniques for adversarial detection and robustness, the Anunnaki framework provides a more modular, service-oriented approach. The Anunnaki framework can detect adverse run-time contexts for LECs, monitor and control the use of LECs with respect to user-defined goal models, and leverage robust alternative learning models for adverse phenomena. The composability of Anunnaki services enables developers to modularly add, remove, or update behavior oracles and goal models to address new or changing assurance concerns without retraining or rebuilding LECs. Furthermore, the loose coupling of services enables the Anunnaki framework to run in parallel with, and independently of, managed AI systems. This article has demonstrated how the Anunnaki framework can be deployed in autonomous systems, such as a terrestrial rover to prevent obstacle collision and a UAV to deliver resources to a target location, while mitigating uncertainty resulting from the use of LECs in adverse environmental conditions (i.e., poor lighting and heavy fog, respectively). As described, the Anunnaki framework requires user-defined goal models and adaptation tactics. Future work will investigate alternative machine learning applications (e.g., ensemble models for model and uncertainty management, reinforcement learning for dynamic adaptation tactics), additional microservices, and run-time evaluation of dynamic goal models and services for both cybersecurity and performance concerns.

Footnotes

1. One meaning associated with the term “Anunnaki” is a collection of ancient Mesopotamian deities including Enki and Enlil.
2. In this work, the hyphenated form of this term denotes its use as an adjective (e.g., “run-time”), whereas the space-separated form denotes a noun (e.g., “run time”) [47].
3. In contrast to many multi-domain learning techniques that aim at improving the performance of a single model [33], multi-domain training seeks to generate a set of domain-specific specialized models.
4. Our proof-of-concept demonstration supports soft real-time requirements. In practice, Anunnaki can be instantiated with alternative services that support the hard real-time command requirements of UAVs.
5. A collection of publications relying on Webots can be accessed at: https://github.com/cyberbotics/webots/discussions/2621

References

[1]
Ludovic Apvrille, Tullio Tanzi, and Jean-Luc Dugelay. 2014. Autonomous drones for assisting rescue services within the context of natural disasters. In Proceedings of the 2014 XXXIth URSI General Assembly and Scientific Symposium. 1–4. DOI:
[2]
Armin Balalaie, Abbas Heydarnoori, and Pooyan Jamshidi. 2016. Microservices architecture enables DevOps: Migration to a cloud-native architecture. IEEE Software 33, 3 (2016). DOI:
[3]
Carliss Young Baldwin and Kim B. Clark. 2000. Design Rules: The Power of Modularity. MIT Press.
[4]
Luciano Baresi, Liliana Pasquale, and Paola Spoletini. 2010. Fuzzy goals for requirements-driven adaptation. In Proceedings of the 2010 18th IEEE International Requirements Engineering Conference. IEEE Computer Society, USA, 125–134. DOI:
[5]
Len Bass, Paul Clements, and Rick Kazman. 2012. Software Architecture in Practice (3rd ed.). Addison-Wesley.
[6]
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning 79, 1–2 (2010), 151–175. DOI:
[7]
Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. 2006. Analysis of representations for domain adaptation. In Proceedings of the Advances in Neural Information Processing Systems. B. Schölkopf, J. Platt, and T. Hoffman (Eds.), Vol. 19, MIT Press. Retrieved from https://proceedings.neurips.cc/paper/2006/file/b1b0432ceafb0ce714426e9114852ac7-Paper.pdf
[8]
Yuriy Brun, Giovanna Di Marzo Serugendo, Cristina Gacek, Holger Giese, Holger Kienle, Marin Litoiu, Hausi Müller, Mauro Pezzè, and Mary Shaw. 2009. Engineering Self-Adaptive Systems through Feedback Loops. Springer. DOI:
[9]
Ricardo Diniz Caldas, Arthur Rodrigues, Eric Bernd Gil, Genaína Nunes Rodrigues, Thomas Vogel, and Patrizio Pelliccione. 2020. A hybrid approach combining control theory and AI for engineering self-adaptive systems. In Proceedings of the IEEE/ACM 15th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. 9–19.
[10]
Anita D. Carleton, Erin Harper, Tim Menzies, Tao Xie, Sigrid Eldh, and Michael R. Lyu. 2020. The AI effect: Working at the intersection of AI and SE. IEEE Software 37, 4 (2020), 26–35. DOI:
[11]
Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. 2018. Adversarial attacks and defences: A survey. arXiv:1810.00069. Retrieved from https://arxiv.org/abs/1810.00069
[12]
Betty H. C. Cheng, Pete Sawyer, Nelly Bencomo, and Jon Whittle. 2009. A goal-based modeling approach to develop requirements of an adaptive system with environmental uncertainty. In Proceedings of the Model Driven Engineering Languages and Systems. Andy Schürr and Bran Selic (Eds.), Vol. 5795, Springer, Berlin, 468–483. DOI:
[13]
Shang-Wen Cheng. 2008. Rainbow: Cost-Effective Software Architecture-Based Self-Adaptation. Ph.D. Dissertation. Carnegie Mellon University. Advisor(s) Garlan, David.
[14]
Shang-Wen Cheng, David Garlan, and Bradley Schmerl. 2006. Architecture-based self-adaptation in the presence of multiple objectives. In Proceedings of the International Workshop on Self-Adaptation and Self-Managing Systems. ACM. DOI:
[15]
Jeff Craighead, Robin Murphy, Jenny Burke, and Brian Goldiez. 2007. A survey of commercial & open source unmanned vehicle simulators. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation. 852–857. DOI:
[16]
Paul deGrandis and Giuseppe Valetto. 2009. Elicitation and utilization of application-level utility functions. In Proceedings of the 6th International Conference on Autonomic Computing. ACM. DOI:
[17]
Dawei Du, et al. 2019. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 2019.
[18]
Margaret Eichleay, Emily Evens, Kayla Stankevitz, and Caleb Parker. 2019. Using the unmanned aerial vehicle delivery decision tool to consider transporting medical supplies via drone. Global Health: Science and Practice 7, 4 (2019), 500–506. DOI:
[19]
Ayssam Elkady and Tarek Sobh. 2012. Robotics middleware: A comprehensive literature survey and attribute-based bibliography. Journal of Robotics 2012 (2012), 1–15. DOI:
[20]
Mario Arturo Ruiz Estrada and Abrahim Ndoma. 2019. The uses of unmanned aerial vehicles –UAV’s- (or drones) in social logistic: Natural disasters response and humanitarian relief aid. Procedia Computer Science 149 (2019), 375–383. DOI:
[21]
Chen-Yuan Fan and Shang-Pin Ma. 2017. Migrating monolithic mobile application to microservice architecture: An experiment report. In Proceedings of the International Conference on AI Mobile Services. IEEE. DOI:
[22]
Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, and Hamid R. Arabnia. 2021. A brief review of domain adaptation. In Proceedings of the Advances in Data Science and Information Engineering (Transactions on Computational Science and Computational Intelligence). Robert Stahlbock, Gary M. Weiss, Mahmoud Abou-Nasr, Cheng-Ying Yang, Hamid R. Arabnia, and Leonidas Deligiannidis (Eds.), Springer International Publishing, Cham, 877–894. DOI:
[23]
Dustin Franklin. 2019. Jetson Nano Brings AI Computing to Everyone. Retrieved 18 March 2019 from https://developer.nvidia.com/blog/jetson-nano-ai-computing/
[24]
Gordon Fraser and Neil Walkinshaw. 2015. Assessing and generating test sets in terms of behavioural adequacy. Software Testing, Verification and Reliability 25, 8(2015), 749–780. DOI:
[25]
Jonas Fritzsch, Justus Bogner, Alfred Zimmermann, and Stefan Wagner. 2018. From monolith to microservices: A classification of refactoring approaches. In Proceedings of the 1st International Workshop on Software Engineering Aspects of Continuous Development and New Paradigms of Software Production and Deployment. Springer. DOI:
[26]
Mozhou Gao, Chris H. Hugenholtz, Thomas A. Fox, Maja Kucharczyk, Thomas E. Barchyn, and Paul R. Nesbit. 2021. Weather constraints on global drone flyability. Scientific Reports 11, 1 (2021), 12092. DOI:
[27]
Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, and others. 2023. A survey of uncertainty in deep neural networks. Artificial Intelligence Review 56, Suppl 1 (2023), 1513–1589.
[28]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT.
[29]
Pooyan Jamshidi, Miguel Velez, Christian Kästner, Norbert Siegmund, and Prasad Kawthekar. 2017. Transfer learning for improving model predictions in highly configurable software. In Proceedings of the 2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, 31–41.
[30]
Jason Jo and Yoshua Bengio. 2017. Measuring the tendency of CNNs to learn surface statistical regularities. arXiv:1711.11561. Retrieved from https://arxiv.org/abs/1711.11561
[31]
Glenn Jocher, Ayush Chaurasia, Alex Stoken, Jirka Borovec, Yonghye Kwon, Kalen Michael, Jiacong Fang, Zeng Yifu, Colin Wong, Diego Montes, and others. 2022. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo (2022).
[32]
Ralph E. Johnson and Brian Foote. 1988. Designing reusable classes. Journal of Object-oriented Programming 1, 2 (1988), 22–35.
[33]
Mahesh Joshi, Mark Dredze, William Cohen, and Carolyn Rose. 2012. Multi-domain learning: When do domains matter?. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 1302–1312.
[34]
Jeffrey O. Kephart and David M. Chess. 2003. The vision of autonomic computing. Computer 36, 1 (2003), 41–50. DOI:
[35]
Sophia Kolak, Afsoon Afzal, Claire Le Goues, Michael Hilton, and Christopher Steven Timperley. 2020. It takes a village to build a robot: An empirical study of the ROS ecosystem. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution. IEEE. DOI:
[36]
Michael Austin Langford, Kenneth H. Chan, Jonathon Emil Fleck, Philip K. McKinley, and Betty H. C. Cheng. 2021. MoDALAS: Model-driven assurance for learning-enabled autonomous systems. In Proceedings of the 24th International Conference on Model Driven Engineering Languages and Systems. ACM.
[37]
Michael Austin Langford and Betty H.C. Cheng. 2022. A modular and composable approach to develop trusted artificial intelligence. In (To Appear in) Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems.
[38]
Michael Austin Langford and Betty H. C. Cheng. 2021. Enki: A diversity-driven approach to test and train robust learning-enabled systems. ACM Transactions on Autonomous and Adaptive Systems 15, 2 (2021), 1–32. DOI:
[39]
Michael A. Langford and Betty H. C. Cheng. 2021. “Know What You Know”: Predicting behavior for learning-enabled systems when facing uncertainty. In Proceedings of the 16th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. ACM.
[40]
Michael A. Langford, Glen A. Simon, Philip K. McKinley, and Betty H. C. Cheng. 2019. Applying evolution and novelty search to enhance the resilience of autonomous systems. In Proceedings 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. ACM. DOI:
[41]
Alexei Lapouchnian. 2005. Goal-Oriented Requirements Engineering: An Overview of the Current Research. Technical Report. University of Toronto. Retrieved from http://www.cs.utoronto.ca/alexei/pub/Lapouchnian-Depth.pdf
[42]
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision. IEEE. DOI:
[43]
Ninghao Liu, Mengnan Du, Ruocheng Guo, Huan Liu, and Xia Hu. 2021. Adversarial attacks and defenses: An interpretation perspective. ACM SIGKDD Explorations Newsletter 23, 1 (2021), 86–99. DOI:
[44]
Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, and Yang Liu. 2018. DeepGauge: Multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM. DOI:
[45]
Ivano Malavolta, Grace A. Lewis, Bradley Schmerl, Patricia Lago, and David Garlan. 2021. Mining guidelines for architecting robotics software. Journal of Systems and Software 178 (2021). DOI:
[46]
Ivano Malavolta, Grace A. Lewis, Bradley Schmerl, Patricia Lago, and David Garlan. 2021. Mining guidelines for architecting robotics software. Journal of Systems and Software 178 (2021), 110969.
[47]
P.K. McKinley, S.M. Sadjadi, E.P. Kasten, and B.H.C. Cheng. 2004. Composing adaptive software. Computer 37, 7 (2004), 56–64. DOI:
[48]
Tim Menzies. 2020. The five laws of SE for AI. IEEE Software 37, 1 (2020), 81–85. DOI:
[49]
O. Michel. 2004. Webots: Professional mobile robot simulation. Journal of Advanced Robotics Systems 1, 1 (2004), 39–42. Retrieved from http://www.ars-journal.com/International-Journal-of-Advanced-Robotic-Systems/Volume-1/39-42.pdf
[50]
Olivier Michel. 2004. Cyberbotics ltd. webots: professional mobile robot simulation. International Journal of Advanced Robotic Systems 1, 1 (2004), 5.
[51]
Vinod Muthusamy, Aleksander Slominski, and Vatche Ishakian. 2018. Towards enterprise-ready AI deployments: Minimizing the risk of consuming AI models in business applications. In Proceedings of the 1st International Conference on Artificial Intelligence for Industries. IEEE.
[52]
Irakli Nadareishvili, Ronnie Mitra, Matt McLarty, and Mike Amundsen. 2016. Microservice Architecture: Aligning Principles, Practices, and Culture. O’Reilly Media.
[53]
Muhammad Naeem Irfan, Catherine Oriat, and Roland Groz. 2013. Model inference and testing. Advances in Computers 89 (2013), 89–139. DOI:
[54]
Maziar Nekovee, Sachin Sharma, Navdeep Uniyal, Avishek Nag, Reza Nejabati, and Dimitra Simeonidou. 2020. Towards AI-enabled microservice architecture for network function virtualization. In Proceedings of the 8th International Conference on Communications and Networking. IEEE. DOI:
[55]
NIST. 2019. U.S. Leadership In AI: A Plan for Federal Engagement in Developing Technical Standards and Related Tools. Technical Report. U.S. National Institute of Standards and Technology.
[56]
NTSB. 2019. Highway Accident Report, Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian. Technical Report NTSB/HAR-19/03. U.S. National Transportation Safety Board.
[57]
Augustus Odena, Catherine Olsson, David Andersen, and Ian Goodfellow. 2019. TensorFuzz: Debugging neural networks with coverage-guided fuzzing. In Proceedings of the 36th International Conference on Machine Learning.
[58]
Colin Paterson, Haoze Wu, John Grese, Radu Calinescu, Corina S. Păsăreanu, and Clark Barrett. 2021. Deepcert: Verification of contextually relevant robustness for neural network image classifiers. In Computer Safety, Reliability, and Security: 40th International Conference, SAFECOMP 2021, York, UK, September 8–10, 2021, Proceedings 40. Springer, 3–17.
[59]
Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2017. DeepXplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles. ACM. DOI:
[60]
Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2017. Deepxplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles. 1–18.
[61]
Felipe Pontes and Edward Curry. 2021. Cloud-edge microservice architecture for DNN-based distributed multimedia event processing. In Proceedings of the Advances in Service-Oriented and Cloud Computing. ESOCC 2020. Communications in Computer and Information Science. Springer. DOI:
[62]
PyTorch.org. 2022. PyTorch Documentation. Retrieved January 2022 from https://pytorch.org/docs/stable/index.html
[63]
Zhuang Qian, Kaizhu Huang, Qiu-Feng Wang, and Xu-Yao Zhang. 2022. A survey of robust adversarial training in pattern recognition: Fundamental, theory, and methodologies. Pattern Recognition 131 (2022), 108889.
[64]
Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Ng. 2009. ROS: An open-source robot operating system. In Proceedings of the Internatinal Conference on Robotics and Automation Workshop on Open Source Software. IEEE.
[65]
Andres J. Ramirez and Betty H. C. Cheng. 2011. Automatic derivation of utility functions for monitoring software requirements. In Model Driven Engineering Languages and Systems, 14th International Conference, MODELS 2011, Wellington, New Zealand, October 16–21, 2011, Proceedings, Jon Whittle, Tony Clark, and Thomas Kühne (Eds.), Lecture Notes in Computer Science, Vol. 6981, Springer, 501–516. DOI:
[66]
Andres J. Ramirez, Erik M. Fredericks, Adam C. Jensen, and Betty H. C. Cheng. 2012. Automatically RELAXing a goal model to cope with uncertainty. In Search Based Software Engineering, David Hutchison, Takeo Kanade, Josef Kittler, Jon M. Kleinberg, Friedemann Mattern, John C. Mitchell, Moni Naor, Oscar Nierstrasz, C. Pandu Rangan, Bernhard Steffen, Madhu Sudan, Demetri Terzopoulos, Doug Tygar, Moshe Y. Vardi, Gerhard Weikum, Gordon Fraser, and Jerffeson Teixeira de Souza (Eds.). Vol. 7515. Springer, Berlin, 198–212. DOI:
[67]
Andres J. Ramirez, Adam C. Jensen, Betty H. C. Cheng, and David B. Knoester. 2011. Automatically exploring how uncertainty impacts goal satisfaction. In Proceedings of the International Conference on Automated Software Engineering. 568–571.
[68]
Andres J. Ramirez, Adam C. Jensen, and Betty H. C. Cheng. 2012. A taxonomy of uncertainty for dynamically adaptive systems. In Proceedings of the 2012 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, Zurich, Switzerland, 99–108. DOI:
[69]
Jorge Real and Alfons Crespo. 2004. Mode change protocols for real-time systems: A survey and a new proposal. Real-time systems 26, C (2004), 161–197.
[70]
Arthur Rodrigues, Ricardo Diniz Caldas, Genaína Nunes Rodrigues, Thomas Vogel, and Patrizio Pelliccione. 2018. A learning approach to enhance assurances for real-time self-adaptive systems. In Proceedings of the 13th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. ACM. DOI:
[71]
ROS.org. 2022. ROS Melodic Morenia Documentation. Retrieved January 2022 from http://wiki.ros.org/melodic
[72]
Piotr Rudol and Patrick Doherty. 2008. Human body detection and geolocalization for UAV search and rescue missions using color and thermal imagery. In Proceedings of the 2008 IEEE Aerospace Conference. 1–8. DOI:
[73]
Patrik Sabol and Peter Sincak. 2018. AI bricks: A microservices-based software for a usage in the cloud robotics. In Proceedings of the World Symposium on Digital Intelligence for Systems and Machines. IEEE. DOI:
[74]
Doyen Sahoo, Quang Pham, Jing Lu, and Steven C. H. Hoi. 2018. Online deep learning: Learning deep neural networks on the fly. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2660–2666.
[75]
Irfan Saif and Beena Ammanath. 2020. ‘Trustworthy AI’ is a framework to help manage unique risk. MIT Technology Review (2020).
[76]
Connor Shorten and Taghi M. Khoshgoftaar. 2019. A survey on image data augmentation for deep learning. Journal of Big Data 6, 1 (2019), 1–48.
[77]
Philip C. Slingerland and Lauren H. Perry. 2021. A Framework for Trusted Artificial Intelligence in High-Consequence Environments. Technical Report ATR-2021-01456. The Aerospace Corporation.
[78]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv:1312.6199. Retrieved from https://arxiv.org/abs/1312.6199
[79]
Lu Tan, Tianran Huangfu, Liyao Wu, and Wenying Chen. 2021. Comparison of RetinaNet, SSD, and YOLO v3 for real-time pill identification. BMC Medical Informatics and Decision Making 21, 1 (2021), 324. DOI:
[80]
Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering. ACM. DOI:
[81]
Axel van Lamsweerde and Emmanuel Letier. 2004. From object orientation to goal orientation: A paradigm shift for requirements engineering. In Proceedings of the Radical Innovations of Software and Systems Engineering in the Future. Lecture Notes in Computer Science Vol. 2941. Springer-Verlag. DOI:
[82]
Hemanth Venkateswara, Shayok Chakraborty, and Sethuraman Panchanathan. 2017. Deep-learning systems for domain adaptation in computer vision: Learning transferable feature representations. IEEE Signal Processing Magazine 34, 6 (2017), 117–129. DOI:
[83]
Mario Villamizar, Oscar Garces, Harold Castro, Mauricio Verano, Lorena Salamanca, Rubby Casallas, and Santiago Gil. 2015. Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud. In Proceedings of the 2015 10th Computing Colombian Conference. IEEE, Bogota, Colombia, 583–590. DOI:
[84]
Karl Weiss, Taghi M. Khoshgoftaar, and DingDing Wang. 2016. A survey of transfer learning. Journal of Big data 3, 1 (2016), 1–40.
[85]
Danny Weyns, Bradley Schmerl, Masako Kishida, Alberto Leva, Marin Litoiu, Necmiye Ozay, Colin Paterson, and Kenji Tei. 2021. Towards better adaptive systems by combining mape, control theory, and machine learning. In Proceedings of the 2021 International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, 217–223.
[86]
Jon Whittle, Pete Sawyer, Nelly Bencomo, Betty H.C. Cheng, and Jean-Michel Bruel. 2009. RELAX: Incorporating uncertainty into the specification of self-adaptive systems. In Proceedings of the 2009 17th IEEE International Requirements Engineering Conference. 79–88. DOI:
[87]
Chathurika S. Wickramasinghe, Kasun Amarasinghe, Daniel L. Marino, Craig Rieger, and Milos Manic. 2021. Explainable unsupervised machine learning for cyber-physical systems. IEEE Access 9 (2021), 131824–131843. DOI:
[88]
Jeannette M. Wing. 2021. Trustworthy AI. Communications of the ACM 64, 10 (2021), 64–71. DOI:
[89]
Haoze Wu, Teruhiro Tagomori, Alexander Robey, Fengjun Yang, Nikolai Matni, George Pappas, Hamed Hassani, Corina Pasareanu, and Clark Barrett. 2023. Toward certified robustness against real-world distribution shifts. In Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning. IEEE, 537–553.
[90]
Xiaofei Xie, Lei Ma, Felix Juefei-Xu, Minhui Xue, Hongxu Chen, Yang Liu, Jianjun Zhao, Bo Li, Jianxiong Yin, and Simon See. 2019. DeepHunter: A coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM. DOI:
[91]
Becky Yerak and Tatyana Shumsky. 2019. More companies flag a new risk: Artificial intelligence. Wall Street Journal 129 (2019).
[92]
Wei Zhan, Chenfan Sun, Maocai Wang, Jinhui She, Yangyang Zhang, Zhiliang Zhang, and Yong Sun. 2022. An improved Yolov5 real-time detection method for small objects captured by UAV. Soft Computing 26, 1 (2022), 361–373. DOI:
[93]
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations.
[94]
Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid. 2018. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM. DOI:
[95]
Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid. 2018. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 132–142.
[96]
Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, Qinghua Hu, and Haibin Ling. 2021. Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1. DOI:
