
Anunnaki: A Modular Framework for Developing Trusted Artificial Intelligence

Published: 13 September 2024

Abstract

Trustworthy artificial intelligence (Trusted AI) is of utmost importance when learning-enabled components (LECs) are used in autonomous, safety-critical systems. When reliant on deep learning, these systems need to address the reliability, robustness, and interpretability of learning models. In addition to developing strategies to address these concerns, appropriate software architectures are needed to coordinate LECs and ensure they deliver acceptable behavior even under uncertain conditions. This work describes Anunnaki, a model-driven framework comprising loosely-coupled modular services designed to monitor and manage LECs with respect to Trusted AI assurance concerns when faced with different sources of uncertainty. More specifically, the Anunnaki framework supports the composition of independent, modular services to assess and improve the resilience and robustness of AI systems. The design of Anunnaki was guided by several key software engineering principles (e.g., modularity, composability, and reusability) in order to facilitate its use and maintenance and to support different aggregate monitoring and assurance analysis tools for learning-enabled systems (LESs) and their respective data sets. We demonstrate Anunnaki on two autonomous platforms: a terrestrial rover and an unmanned aerial vehicle. Our studies show how Anunnaki can be used to manage the operations of different autonomous learning-enabled systems with vision-based LECs while exposed to uncertain environmental conditions.

1 Introduction

When artificial intelligence (AI) is used for safety-critical tasks, stakeholders must be able to trust AI systems to perform as intended, despite many uncertainties due to changing operational contexts [68], as well as those that are unique to learning-enabled components (LECs) [88]. Data-driven LECs, such as deep neural networks (DNNs) [28], are often black boxes, more complex than traditional software, and require a “leap of faith” from stakeholders [91]. However, using inadequate AI in safety-critical applications can be detrimental, possibly leading to human injury or casualties (e.g., autonomous driving accidents [56]). Various high-level “Trusted AI” guidelines have been proposed to systematically address AI assurance concerns [55, 75, 77]. As “best practice” guidelines, these frameworks increase awareness of safety issues unique to AI systems by decomposing assurance topics into categories such as reliability, fairness, robustness, interpretability, and uncertainty quantification [75]. However, because the specific techniques used to address each assurance category are highly application-dependent, it can be challenging to generalize solutions across multiple applications. As machine learning software matures from a largely academic, research-focused domain to industrial software, we need to apply well-established software engineering practices to Trusted AI [10, 48]. This article describes a modular and composable approach to develop and support run-time management of Trusted AI that handles the different dimensions of uncertainty recognized by existing guidelines [88].
Current solutions to Trusted AI concerns, such as adversarial robustness and adversarial detection [43], are typically tightly-coupled to specific problem domains, leading to monolithic applications that are difficult to scale, reuse, and maintain [25]. When “robustifying” DNNs, techniques have been proposed to augment training data, training procedures, or network topologies, with updates interwoven into a single, monolithic learning model [76]. Because these proposed solutions are tightly-coupled to a base learning model, it can be challenging to repurpose them for alternative learning models. Furthermore, when addressing uncertainty for Trusted AI systems, many context-dependent solutions are needed to mitigate the various forms of uncertainty (e.g., robustness to adverse weather effects versus cybersecurity concerns). With monolithic solutions, any change with respect to a single form of uncertainty can require the entire learning model to be retrained and validated. As new adversarial conditions are uncovered, monolithic solutions also require extensive updates to the entire learning model.
This article describes a modular, composable approach to address multiple dimensions of uncertainty in Trusted AI. Rather than using monolithic solutions to address all issues of uncertainty for Trusted AI in a single development environment, this article proposes a framework that comprises loosely-coupled services, each of which is responsible for individual assurance concerns (e.g., robustness, resilience, interpretability). In contrast to monolithic architectures, microservice [52] architectures realize software as a collection of independently-deployable services that interact using a common interface and communication protocol [2, 21]. Microservices can be executed on separate hardware platforms and implemented with a variety of technologies to enable composability and replaceability with code reuse [52]. In the spirit of microservices, Trusted AI systems can be realized as service-oriented architectures, with separate reusable services deployed to manage the reliability, robustness, and interpretability of underlying LECs, rather than incorporating the cumulative functionality into a single component.
This article describes Anunnaki, a framework comprising model-driven services to manage Trusted AI assurance concerns at run time when LECs are exposed to uncertainty. This work, including our preliminary studies [37], is the first to explore Trusted AI as an aggregation of services to address multiple dimensions of uncertainty [51, 54, 61, 73]. Specifically, the Anunnaki framework comprises services for domain detection (e.g., adversarial detection [27]), multi-domain training (e.g., adversarial training [63]), and autonomic management [34], where the term domain refers to a subset of LEC inputs that share common attributes or characteristics (e.g., presence of rain, low-light conditions). Here, the term multi-domain training refers to techniques that can be used to generate domain-specific data for improving model robustness (via retraining) in some target domain (e.g., Enki [38], DeepRoad [94], DeepXplore [59], DeepTest [80]). The Anunnaki framework uses an autonomic manager to support run-time adaptation of AI-based systems. To this end, we developed Utu, a model-driven autonomic manager that coordinates LECs at run time with respect to requirements that model assurance concerns (e.g., KAOS [81] goal models). This work extends our preliminary work [37] in the following key areas. First, we have extended the Anunnaki framework to support goal models that use the uncertainty-aware requirements specification language RELAX [86], increasing the flexibility of a learning-enabled system (LES) to account for sources of uncertainty. Second, we have redefined two core services of the Anunnaki framework (i.e., domain detection and multi-domain training) to improve the framework’s overall generalizability and support for reuse. The Anunnaki framework leverages domain detection techniques (e.g., behavior oracles [39] or out-of-distribution methods [27]) to detect the presence of adverse phenomena (e.g., rain, fog). Multi-domain training techniques are used to robustify LECs to adverse phenomena [11, 36]. Finally, we have conducted an additional empirical study to demonstrate the use of Anunnaki in a new application area with a different set of safety requirements. The Anunnaki framework and its aggregate services are independent of the internal functionality of the managed AI system, which promotes reusability, portability, and extensibility for more flexible run-time monitoring. During execution, Anunnaki services run in parallel with and independently of managed LECs. As such, the Anunnaki framework enables developers to reuse common services to generate robust alternatives to LECs, detect when LECs have entered untrusted states, and mitigate the use of LECs in untrusted states.
To demonstrate the Anunnaki framework and its aggregate services, we have applied them to two different managed AI systems with vision-based LECs: an autonomous terrestrial rover and an autonomous unmanned aerial vehicle. We first implemented the Anunnaki framework for an autonomous terrestrial rover that must navigate its environment while detecting pedestrians and avoiding collisions. Next, we demonstrate how Anunnaki may be instantiated and configured to develop and manage at run time an autonomous unmanned aerial vehicle designed for resource delivery missions. The unique requirements and platform-dependent components of each system illustrate the domain-agnostic properties of the proposed framework. By default, the obstacle detectors exhibit a reasonable degree of accuracy on known validation data. However, uncertainties arise when new adverse phenomena are considered (e.g., lighting changes, occluded visibility). This article demonstrates how aggregate services within the Anunnaki framework can be leveraged to reconfigure learning-enabled autonomous systems and mitigate the use of their object detectors under conditions deemed untrustworthy at run time. Through the use of independent services to assess trustworthiness and enact changes in system behavior, the Anunnaki framework provides a modular, composable, and reusable approach to addressing robustness and resilience to uncertainty in AI-based systems.
The remainder of this article is organized as follows. Section 2 reviews background topics and enabling technologies applicable to this work. Section 3 overviews the Anunnaki architecture and describes the aggregate elements. Section 4 demonstrates two use cases of Anunnaki on distinct autonomous platforms, one terrestrial and the other aerial. Section 5 discusses key findings from the two demonstrations. Section 6 reviews work related to the Anunnaki framework. Finally, Section 7 summarizes this work and discusses future plans to extend this work.

2 Background

This section overviews background topics for this work, including assurance concerns with deep learning systems, robotic control software, goal models, the RELAX [86] requirements specification language, and self-adaptive frameworks.

2.1 Uncertainties in Deep Learning

For DNNs, uncertainty can arise in both data acquisition and model construction [27]. Uncertainties with respect to the dataset can be due to variability in real-world environments or measurement error/noise. Uncertainties with respect to the model can be due to errors in the model structure, errors in the training procedure, or errors caused by unknown data. Two common trust issues to address when implementing Trusted AI with respect to uncertainty are reliability and robustness [88].
For this work, reliability relates to a DNN’s performance under routine (or known) operating conditions, where a DNN’s output is expected to be consistent with ground truth results [28]. Typical test procedures for DNNs include a cross-validation step with test data that is independent and identically distributed with respect to training data. Evaluation metrics are task-dependent. For regression problems, metrics such as mean squared error measure the deviation between a DNN’s output and the ground truth. For classification problems, metrics such as cross entropy and accuracy can be used for evaluation. Furthermore, for object detection problems, where a distinction between false positives and false negatives is of interest, additional metrics such as precision and recall can be considered. Depending on the evaluation metrics chosen, a reliable AI system is expected to correctly interpolate and produce results that are consistent with known training data.
Adversarial detection techniques address uncertainty with respect to the reliability of a DNN [43]. Model inference enables the creation of behavior models to map specific run-time conditions of a software component to corresponding patterns of expected behavior [53]. When leveraged at run time, behavior models can be used to proactively mitigate failures resulting from the use of a DNN expected to fail [38]. Out-of-distribution techniques can also assess a degree of confidence by comparing a DNN’s run-time inputs to its training data distribution [27]. Run-time inputs that are found to fall outside of a DNN’s training distribution can be marked as uncertain and trigger alternative actions to prevent use of the DNN in a potentially hazardous state. Thus, adversarial detection techniques help to ensure DNNs are used only in contexts (sufficiently) comparable to those that have been previously validated.
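To make the out-of-distribution idea concrete, the following is a minimal sketch (not the specific methods cited above) that flags inputs whose features lie far from the training distribution; the feature representation, the Mahalanobis-distance score, and the threshold value are all illustrative assumptions.

```python
import numpy as np

def fit_training_statistics(train_features):
    """Estimate the mean and (pseudo-)inverse covariance of
    training-set features (one row per sample)."""
    mu = train_features.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(train_features, rowvar=False))
    return mu, cov_inv

def is_out_of_distribution(features, mu, cov_inv, threshold=3.0):
    """Flag an input as uncertain when its Mahalanobis distance to the
    training distribution exceeds a calibrated threshold (assumed here
    to be 3.0), triggering an alternative action instead of the DNN."""
    diff = features - mu
    distance = float(np.sqrt(diff.dot(cov_inv).dot(diff)))
    return distance > threshold
```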
Robustness relates to a DNN’s performance in the presence of previously unseen (or unknown) operating conditions, where the DNN’s output is expected to be consistent with sufficiently similar known conditions. Deviant conditions may cover any perturbation in an AI system’s input data, whether malicious interference (e.g., jamming or image-spoofing) or inadvertent interference (e.g., environmental phenomena such as rainfall or fog). The discovery of adversarial examples [78] has demonstrated that DNNs can be highly sensitive to human-imperceptible noise and exploited into producing erroneous outputs by targeted data manipulation. Furthermore, the tendency of DNNs to latch onto superficial regularities in training data casts doubt on their ability to generalize to semantically valid abstractions [30]. Robust AI systems are expected to extrapolate and produce correct results for data that is reasonably different from training data.
To address the robustification of DNNs, techniques have been proposed to automatically generate synthetic data for retraining DNNs when real examples of adverse interference are absent (i.e., known unknown phenomena) [44, 80, 90, 94]. Typically, synthetic data of simulated interference is generated by transforming existing real-world data. Naïve techniques generate interference by adding random perturbations to given inputs (i.e., fuzzing) [57]. More sophisticated techniques use search-based methods to uncover interference patterns that maximize certain aspects of the AI system (e.g., neuron coverage, Kullback–Leibler divergence). Retraining DNNs with synthetically augmented data has been demonstrated to improve robustness [38].
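As a minimal sketch of the naïve fuzzing-style augmentation described above (the function names and noise model are illustrative; search-based tools such as Enki replace the random sampling with guided exploration):

```python
import numpy as np

def fuzz_image(image, noise_scale=0.05):
    """Naive augmentation: add bounded random perturbations to an
    image whose pixel values are assumed normalized to [0, 1]."""
    noise = np.random.uniform(-noise_scale, noise_scale, size=image.shape)
    return np.clip(image + noise, 0.0, 1.0)

def augment_dataset(images, labels, per_image=3):
    """Pair each real sample with several perturbed variants so the
    DNN can be retrained on synthetically adverse data."""
    return [(fuzz_image(img), lbl)
            for img, lbl in zip(images, labels)
            for _ in range(per_image)]
```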

2.2 Domain Adaptation

A key challenge addressed by this work is how to enable AI systems to operate across distinct domains. In this work, we consider a domain to be a subset of model inputs that share common attributes or characteristics and serve as a means to categorize training samples. A domain D is typically defined by three components: a feature (input) space X, a label (output) space Y, and a probability distribution \(p(x,y)\) [22]. The concept of domain adaptation was introduced by Ben-David et al. [6] in the context of natural language processing, with the goal of effectively reusing an existing language model to identify malicious e-mail for a wide range of users. Domain adaptation is needed when a neural network trained to perform a task on a dataset \(D_s\) (the source domain) must perform the same task on a dataset \(D_t\) (the target domain) [6, 7]. More recent efforts have adopted domain adaptation as a technique for improving object classification and detection models [82]. Domain adaptation (DA) is closely related to transfer learning [84], but rather than reusing a learned model on a different task, DA reuses a learned model in a different, but closely related, domain. We assume that label and feature spaces remain consistent across domains. The characteristics of each domain may be described at various levels of abstraction. For an image processing model, we can describe a set of unique domains based on environmental conditions (e.g., rainy, low-light), levels of noise (e.g., peak signal-to-noise ratio metrics), or resulting model behavior. It is important to note that the proposed framework is not limited to the provided examples.
Multi-domain training tools such as Enki [38] aim to increase model robustness for a given target domain \(D_t\) by generating synthetic data belonging to \(D_t\) and retraining an existing model in this new domain. Enki inputs environmental conditions (e.g., the rain domain) and corresponding contexts (e.g., raindrop positions, appearance) and then uses an evolutionary algorithm to generate a diverse archive of environmental contexts, where diversity is defined with respect to system behavior. Enki can then be used to (i) assess the robustness of a DNN in a given domain, and (ii) robustify the DNN by retraining on the previously generated archive of diverse synthetic data.
Domain detection models such as the behavior oracles generated by Enlil [36] map incoming data samples \(x_i\) to a corresponding target domain \(D_t\). Enlil takes environmental conditions, corresponding contexts, and behavior category specifications as input for an evolutionary algorithm to generate diverse archives of environmental contexts for each behavior category. The generated archives can then be used to (i) assess model robustness for a given domain and then (ii) train a behavior oracle that predicts the behavior category of an incoming data sample. The behavior oracle can inform an autonomous system of its current operating context (domain) and enable more informed adaptation decisions (e.g., ensuring the applicability [and utility] of a given LEC for a given operating context).

2.3 Service-oriented Architecture for Robot Control

In order to manage heterogeneous hardware and enable software reuse in robotic applications, many developers implement the control logic of sensors and actuators as components of a robot middleware [19]. The Robot Operating System (ROS) [64] is an open source robot middleware that has been widely adopted by both academia and industry [35]. The fundamental elements of a ROS-based system are nodes, topics, and services. ROS enables the controlling algorithms for a single application to be divided into multiple independent processes (i.e., ROS nodes). ROS nodes can publish/subscribe to data unidirectionally through message buses (i.e., ROS topics) and also handle bidirectional request/reply interactions (i.e., ROS services). As a peer-to-peer network of nodes, a ROS-based system can be implemented over multiple processing units with a common registry service to facilitate communication between nodes (illustrated in Figure 1). Ultimately, ROS enables developers to abstract away from individual robotic components to focus on their software architecture.
Fig. 1.
Fig. 1. Typical ROS configuration [64]. Software for a ROS-based system executes ROS nodes over multiple onboard and offboard processors that communicate over a wireless bridge.
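For readers unfamiliar with ROS, the following is a minimal sketch of a ROS node (written for rospy, as used by ROS Melodic) that subscribes to one topic and publishes on another; the node and topic names are illustrative, not part of Anunnaki:

```python
#!/usr/bin/env python
import rospy
from std_msgs.msg import String

def callback(msg, publisher):
    # React to each incoming message by republishing a transformed copy.
    publisher.publish(String(data=msg.data.upper()))

if __name__ == "__main__":
    rospy.init_node("example_node")  # register with the ROS master
    pub = rospy.Publisher("/example/out", String, queue_size=10)
    rospy.Subscriber("/example/in", String, callback, callback_args=pub)
    rospy.spin()  # process messages until shutdown
```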
In order to facilitate the systematic development of ROS-based systems, Malavolta et al. [45] empirically identified a set of guidelines (by mining ROS projects and surveying experts in the field) to support developers in applying good design principles to meet quality requirements and mitigate common ROS-specific software problems. Importantly, ROS-based systems that follow these guidelines align well with software engineering principles such as modularity, composability, and reusability.

2.4 Goal-based Modeling

Early in the development process, requirements engineers must identify and specify the needs and constraints of the target system to be built. Requirements engineers work closely with stakeholders to identify their goals and objectives in order to create a set of requirements that orient and guide the development process, ensuring the system being built will meet the needs of relevant parties. Stakeholder goals are often qualitative in nature (e.g., the system operates safely) and difficult to formalize for automatic satisfaction guarantees. Furthermore, specifying requirements becomes increasingly difficult when working with cyberphysical systems because of the inherent uncertainty present in unknown environments [12].
Goal-oriented requirements engineering techniques such as KAOS [81] have emerged in response to the aforementioned challenges to enable rigorous requirements specification. KAOS provides a goal-based approach to model system objectives and hierarchically decompose high-level goals into leaf-level requirements of the system [36], represented by a directed acyclic graph. Figure 2(a) presents a graphical depiction of objects in the KAOS goal model ecosystem. KAOS goals are declarative statements describing objectives that the system under consideration should achieve [41]. Goals may exist at various levels of abstraction, and the decomposition of high-level goals into lower-level sub-goals is depicted with refinement arrows. As shown in Figure 2(b), KAOS supports two types of refinement, AND-refinement and OR-refinement. A parent goal can only be satisfied if the Boolean conjunction of its children evaluates to true in the case of AND-refinement, or if the Boolean disjunction of its children evaluates to true in the case of OR-refinement. KAOS also supports obstacles, defined as any behavior or goal that prevents the satisfaction of another goal [41]. Obstacles may be resolved by including resolution goals that provide alternative ways a blocked goal may be achieved given the presence of an obstacle. Previously, we extended KAOS goal modeling to include utility functions [13] (i.e., functions that map system attributes to Boolean or real scalar values [36]), which are attached to goals to quantify their satisfaction and thereby enable design-time and run-time assessment of system behavior with respect to requirements satisfaction [65]. Finally, KAOS supports agents, represented by white hexagons and defined as entities responsible for achieving system requirements and overcoming obstacles. Both the human and non-human components of a system can be represented as agents in KAOS. The combination of these entities into a logical structure enables developers to refine high-level goals into low-level requirements and explicitly define the intended behavior and functionality of the system to be built.
Fig. 2.
Fig. 2. Overview of KAOS Goal Model Notation 2(a) and KAOS refinement types 2(b).

2.5 Uncertainty-aware Requirements Specification Languages

One challenge that arises when specifying the requirements of an autonomous system is how to strategically address environmental uncertainty. If requirements are too rigid, then the system may unnecessarily reconfigure and/or enter a failure mode, thereby preventing successful mission completion. To this end, requirements specification languages such as RELAX [86] and FLAGS [4] have been proposed to explicitly account for various sources of uncertainty by adding flexibility to system requirements. Developers can use the RELAX language to formally define and extend existing goal models to account for sources of uncertainty by “relaxing” system requirements through a set of RELAX operators. An overview of the RELAX language and its corresponding definitions are presented in Table 1. The semantics of the RELAX language have been specified in terms of a set of fuzzy logic propositions. Correspondingly, RELAX-ed requirements can be annotated with fuzzy-logic based utility functions in the KAOS goal model [65]. While KAOS obstacles are useful for identifying what factors may cause a goal to become violated, RELAX enables a developer to specify the tolerable impact of uncertainty on requirements satisficement, rather than identifying the specific causes/sources of uncertainty.
Modal operators:
SHALL: A requirement must hold.
MAY...OR: A requirement specifies one or more alternatives.

Temporal operators:
EVENTUALLY: A requirement must hold eventually.
UNTIL: A requirement must hold until a future position.
BEFORE/AFTER: A requirement must hold before or after a particular event.
AS EARLY AS POSSIBLE: A requirement specifies something that should hold as soon as possible.
AS LATE AS POSSIBLE: A requirement specifies something that should be delayed as long as possible.
AS CLOSE AS POSSIBLE TO [frequency t]: A requirement specifies something that happens repeatedly, though the frequency may be relaxed.

Ordinal operators:
AS FEW/MANY AS POSSIBLE: A requirement specifies a countable quantity, though the exact count may be relaxed.
AS CLOSE AS POSSIBLE TO [quantity q]: A requirement specifies a countable quantity, though the exact count may be relaxed.

Table 1. Overview of RELAX Vocabulary [66, 86]

2.6 Self-managing Systems

The concept of autonomic computing has become more commonly used with increasing system complexity in deployed software systems that must operate continuously, even under uncertain conditions [34]. Systems comprising numerous interconnected components can be difficult to configure and maintain. Autonomic computing proposes that such systems should manage themselves according to high-level objectives provided by system administrators [34]. These self-managing systems commonly use a feedback controller (i.e., an autonomic manager) to observe and adapt managed components of the larger system [8]. Figure 3 illustrates a common realization of an autonomic manager called the Monitor-Analyze-Plan-Execute over a Knowledge base (MAPE-K) loop [34]. A MAPE-K loop comprises steps to monitor system components, analyze the system state, plan what adaptive actions need to be taken to maintain optimal performance, and execute the plan to realize the corresponding system reconfiguration. Adaptation tactics are methods for realizing adaptations [14]; each tactic has preconditions, postconditions, and a set of actions [14]. A shared knowledge base acts as a repository for any data that can inform each MAPE-K step (e.g., adaptation goals, tactics). For autonomous systems, a MAPE-K controller can automate system adaptations to achieve optimal performance in response to changing environments.
Fig. 3.
Fig. 3. High-level depiction of a MAPE-K autonomic manager to monitor, analyze, plan, and execute reconfigurations of managed components.
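A minimal, framework-agnostic sketch of a MAPE-K loop follows; this illustrates the general pattern rather than Utu's implementation, and the goal and tactic interfaces are assumptions:

```python
class MapeK:
    """Illustrative MAPE-K controller: every step reads from and
    writes to a shared knowledge base."""

    def __init__(self, knowledge):
        self.k = knowledge  # holds goals, tactics, and latest observations

    def monitor(self, sensors):
        # Monitor: sample the managed components' state.
        self.k["state"] = {name: read() for name, read in sensors.items()}

    def analyze(self):
        # Analyze: find goals the observed state does not satisfy.
        return [g for g in self.k["goals"] if not g.satisfied(self.k["state"])]

    def plan(self, violations):
        # Plan: pick the first tactic whose precondition matches.
        return next((t for t in self.k["tactics"] if t.applies(violations)), None)

    def execute(self, tactic):
        # Execute: reconfigure the managed components.
        if tactic is not None:
            tactic.run()

    def step(self, sensors):
        self.monitor(sensors)
        self.execute(self.plan(self.analyze()))
```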

3 Methodology

This section provides a high-level overview of the Anunnaki framework. The services coordinated by the Anunnaki framework collectively manage the operation of LECs in the presence of uncertain conditions in order to mitigate faults resulting from their use in untrusted conditions. Figure 4 depicts the major processes within the Anunnaki framework with a data flow diagram (DFD), where processes are depicted as interconnected circles. Rectangles depict systems external to the Anunnaki framework. Labeled arrows show data flow between processes, and persistent data stores are shown within parallel lines. Each process shown in Figure 4 is a separate service executed in parallel with and independently of the managed AI system. After introducing our terrestrial demonstration platform that is used as a running example, the remainder of this section describes each of these processes.
Fig. 4.
Fig. 4. A high-level DFD of the Anunnaki framework, comprising internal processes (circles) and external systems (rectangles). Labeled arrows show data transmitted between processes, with persistent data stores bound by parallel lines.
Terrestrial Demonstration Platform. As a demonstration platform, we consider an autonomous rover as shown in Figure 5, which comprises subsystems for navigation (i.e., obstacle detection and avoidance), pedestrian communication (i.e., via light and sound signals), and remote monitoring/control. The rover’s subsystems are supported by sensors that include a forward-facing camera, an ultrasonic range finder, and a touch-sensitive bumper. The rover can be controlled either autonomously or manually by a remote operator. When operating autonomously, the rover uses both an ultrasonic range finder and a vision-based object detector to detect obstacles and avoid collisions. In this instance, the entire rover is considered as an LES, while the onboard camera and associated object detection models constitute individual LECs.
Fig. 5.
Fig. 5. For demonstration, an autonomous rover has been assembled to explore deep learning on embedded systems. Sensors include a camera and an ultrasonic range finder.

3.1 Goal Modeling

Anunnaki requires goal models described in the KAOS [81] format to specify the expected system requirements. This section describes the development and run-time monitoring of KAOS goal models to address the robustness and resilience of LESs using the Anunnaki framework and its aggregate services.

3.1.1 Constructing Goal Models.

Our running example is an autonomous rover equipped with a vision-based object detector that must navigate (i.e., detect and avoid obstacles) through an environment to fulfill a mission objective where safety is a top priority. In order to produce a set of requirements outlining the intended functionality of the system, many aspects of the system need to be considered (e.g., mission objective, operating domain, sources of uncertainty). For example, when the rover is operating autonomously, we want to ensure the system can detect and warn nearby pedestrians. We may also want the rover to detect when such capabilities become degraded in order to trigger a fail-safe mechanism. These requirements can be explicitly defined via KAOS goal models. A corresponding KAOS goal model is shown in Figure 6, comprising system objectives for the managed learning-enabled rover. Blue parallelograms represent system goals (e.g., G12: “Rover warns nearby pedestrians.”). Any potential hazards or obstacles that could prevent the satisfaction of a goal are shown as red parallelograms (e.g., O1: “Object detector is degraded/compromised.”). The Anunnaki framework includes a microservice that analyzes the utility functions attached to each goal. At the leaf level, agents are shown as white hexagons to indicate which system components are responsible for achieving associated goals (e.g., A1: controller, A2: camera, A3: ultrasonic sensor).
Fig. 6.
Fig. 6. An example KAOS goal model to graphically depict system requirements of a robot rover as a hierarchy of logically interconnected goals. Blue parallelograms represent system goals and red parallelograms represent potential obstacles to the satisfaction of goals. White hexagons represent system components responsible for achieving leaf-level goals. Agents can be associated with specific message topics to inform the Utu monitor process. Yellow ellipses represent utility functions when attached to parallelograms and message topics when attached to hexagons.

3.1.2 RELAX-ing Requirements.

In order to account for the potential impact of environmental uncertainty and increase system flexibility at run time, we extended Anunnaki to support goal models that include RELAX [86] goals. Consider goal G6 in Figure 7(a), which specifies a minimum allowed sample rate for the rover’s onboard ultrasonic sensor. At run time, several external and internal conditions may impact the update frequency of the ultrasonic sensor. In practice, our system should be flexible to certain variations in operating conditions within some acceptable range to avoid unnecessary fail-safe procedures. Figure 7(b) presents goal G6 after the requirement has been “relaxed”. Originally, the associated utility function would return 0 or 1, indicating whether the sensor update frequency f was above an acceptable value (\(f \ge 5.0\)). The RELAX-ed requirement instead returns 0.0 when \(f \le 4.5\), 1.0 when \(f \ge 5.0\), and \(\frac{(f-4.5)}{0.5}\) when \(f \in (4.5,5.0)\) (Figure 7(c)). This RELAXation provides greater system flexibility to sources of uncertainty by increasing the fidelity of observable system properties.
Fig. 7.
Fig. 7. Requirements for the rover’s ultrasonic sensor (goal G6) have been “relaxed” to account for uncertainty resulting from variable external sensor throughput. This increased flexibility prevents unnecessary fail-safe adaptation procedures while maintaining safety assurance.
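Expressed as code, the strict and RELAX-ed utility functions for G6 might look as follows (a minimal sketch mirroring the piecewise definition above; the function names are illustrative):

```python
def utility_g6_strict(f):
    """Original Boolean requirement: sensor updates at >= 5.0 Hz."""
    return 1.0 if f >= 5.0 else 0.0

def utility_g6_relaxed(f):
    """RELAX-ed requirement: fully satisfied at >= 5.0 Hz, fully
    violated at <= 4.5 Hz, with a linear fuzzy ramp in between."""
    if f >= 5.0:
        return 1.0
    if f <= 4.5:
        return 0.0
    return (f - 4.5) / 0.5
```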

3.2 Resiliency Through Predictive Behavior

In this work, we consider a system to be resilient if it can mitigate different sources of uncertainty to maintain safe behavior [40]. To this end, the Anunnaki framework leverages model inference and behavior models of an LEC to support domain detection (Figure 4, Steps 1 and 2). Adverse interference can include any malicious noise or environmental phenomena that result in undesirable behavior from an LEC. Behavior models are used to predict the impact of adverse conditions absent from existing training/validation data, thus enabling the Anunnaki framework to prevent the use of LECs under conditions in which they would normally perform unreliably (e.g., poor lighting conditions). As an abstract service, the Domain Detection service (Figure 4, Step 1) can implement any behavior modeling technique that takes raw sensor data and detects the presence of adverse interference.
One example model inference method for domain detection is Enlil [39], which constructs behavior models of an LEC by assessing the impact of various environmental phenomena within an external simulator. Enlil generates a behavior model that can be executed independently of the LEC as a behavior oracle. One or more behavior oracles can run in parallel to the managed AI system and subscribe to the same sensor data received by managed LECs. As sensor data is received, behavior oracles output behavior assessments (Figure 4, Step 1), which include both a perceived context for any apparent adversarial noise and an inferred behavior category to summarize the impact of the adversarial noise. As adversarial detection services, behavior oracles publish behavior assessments to any other subscribing service, thus enabling the Anunnaki framework to detect and respond to adverse run-time conditions.
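A minimal sketch of how a behavior oracle might run as an independent ROS service is shown below. The /utu/oracle/output topic and /adv_detector node name appear later in Section 4; the camera topic, message types, and the stand-in oracle are illustrative assumptions (a deployed oracle would also publish the perceived context, not only a category):

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import Int32

class StandInOracle:
    """Placeholder for a behavior model trained by a tool like Enlil."""
    def predict(self, image_msg):
        return 0  # 0 = no adverse impact (always, in this stub)

class OracleNode:
    def __init__(self, oracle):
        self.oracle = oracle
        self.pub = rospy.Publisher("/utu/oracle/output", Int32, queue_size=1)
        # Subscribe to the same sensor feed as the managed LEC.
        rospy.Subscriber("/camera/image_raw", Image, self.on_image)

    def on_image(self, msg):
        # Publish the inferred behavior category (e.g., 0 = unaffected,
        # 1 = degraded, 2 = compromised) for subscribing services.
        self.pub.publish(Int32(data=self.oracle.predict(msg)))

if __name__ == "__main__":
    rospy.init_node("adv_detector")
    OracleNode(StandInOracle())
    rospy.spin()
```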

3.3 Robustifying Learning Models

To address the robustness of LECs, the Anunnaki framework can use robustified alternative learning models, created through any multi-domain training technique, such as adversarial training or retraining on data synthetically augmented to include adverse phenomena (e.g., rain, fog). For example, Enki is a method proposed for robustifying LECs to known unknown adverse environmental phenomena [38]. Using Enki, robust learning models are generated by running a simulator to uncover examples of adverse phenomena that lead to a diverse array of behavior patterns for the given LEC. The diverse collection of adversarial examples is then used to retrain the default learning model [38].
At run time, a Learning Model Manager service (Figure 4, Step 2) enables the managed AI system’s LECs to swap default learning models with alternative, robustified learning models created through adversarial training (e.g., Enki). This service-oriented approach enables separate learning models to be robustified with respect to specific forms of adverse interference, and applicable learning models can be swapped in based on the behavior oracle’s assessment of run-time contexts. When no adverse interference is detected, the default learning model is activated. By decoupling the problem of robustification from a single learning model into separate, independent learning models, the Anunnaki framework gives developers more flexibility in deciding which forms of adverse phenomena are addressed by any given implementation of the managed AI system. Furthermore, this approach enables developers to maintain and augment specific context-dependent models without needing to retrain and validate the base learning model. For example, if rainy environments are a concern for an LEC, an additional robustified learning model can be provided to the Anunnaki framework at run time to handle rain without needing to retrain/validate the default learning model. Furthermore, additional robustified models can be created for alternative phenomena (e.g., foggy weather, poor lighting) that are also independent from each other and the default learning model. Thus, the Anunnaki framework provides a modular and composable solution to robustifying LECs.
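The swapping logic can be sketched as follows; this illustrates the Learning Model Manager's role rather than its actual implementation, and the domain keys and model interface are assumptions:

```python
class LearningModelManager:
    """Keep one robustified model per adverse domain and activate the
    one matching the currently detected domain."""

    def __init__(self, default_model, robust_models):
        self.default = default_model
        self.robust = robust_models  # e.g., {"rain": rain_dnn, "hsl": hsl_dnn}
        self.active = default_model

    def on_domain_detected(self, domain):
        # Swap in the matching robustified model; fall back to the
        # validated default when no adverse domain is detected.
        self.active = self.robust.get(domain, self.default)

    def infer(self, sensor_input):
        return self.active(sensor_input)
```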

3.4 Run-time Monitoring and Management

To monitor and control the managed AI system, the Anunnaki framework uses Utu to monitor, analyze, and reconfigure the use of LECs in response to uncertain environmental conditions (Figure 4, Step 3). In order to mitigate faults from the use of LECs in untrusted conditions, Utu assesses the run-time state of the managed AI system and issues reconfiguration requests in response to the run-time environment. Namely, Utu follows the MAPE-K model for autonomic management, comprising five separate services to support run-time decision-making; see Figure 8 for a high-level description of each of these services. The remainder of this section describes how Anunnaki uses the Utu services to ensure run-time goal satisfaction.
Fig. 8.
Fig. 8. An overview of the five separate services of the MAPE-K loop that make up Utu.
Run-time Goal Monitoring. Utu takes as input a KAOS goal model (Figure 8, Knowledge Manager Service) and analyzes the utility functions [16] associated with each goal and obstacle as logic propositions [70] to support run-time monitoring of goal model satisfaction (Figure 8, Monitor Service). For example, the utility function “A1.buzzer == true” is attached to goal G14. Thus, when the “buzzer” attribute of agent A1 is set to true, goal G14 is evaluated as satisfied. Through the use of utility functions, the Anunnaki framework can interpret a KAOS goal model as a logic tree of run-time system checks to determine the satisfaction of high-level system objectives (Figure 8, Analyze Service). For example, Figure 9 shows a logic tree interpretation of the KAOS goal model in Figure 6. The Anunnaki framework also extends goal models by enabling message channels to be associated with each agent to specify the channels on which each agent publishes state data. For example, the message channel “/utu/oracle/output” is attached to agent A4, indicating that attributes for the behavior oracle can be monitored by observing the corresponding message channel. These extensions enable developers to map the same goal model to different platforms by simply redefining the associated message channels and system attributes.
Fig. 9.
Fig. 9. A logic tree representation of the KAOS goal model in Figure 6. The Anunnaki framework automatically parses and interprets goal models as logic trees of utility functions for run-time evaluation of goal satisfaction.
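To illustrate, a goal tree such as the one in Figure 9 can be evaluated recursively (a minimal sketch; the dictionary-based goal representation is an illustrative assumption):

```python
def evaluate(goal, state):
    """Evaluate a KAOS goal tree bottom-up: leaf goals apply their
    utility functions to monitored attributes; AND/OR refinements
    combine the results of their children."""
    if goal["type"] == "leaf":
        return goal["utility"](state)
    results = [evaluate(child, state) for child in goal["children"]]
    return all(results) if goal["type"] == "and" else any(results)

# Example: goal G14 is satisfied when agent A1's buzzer is on.
g14 = {"type": "leaf", "utility": lambda s: s["A1.buzzer"] is True}
root = {"type": "and", "children": [g14]}
print(evaluate(root, {"A1.buzzer": True}))  # True -> goal model satisfied
```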
Run-time Mitigation. Utu also takes a predefined set of tactics to determine what actions should be taken to mitigate faults resulting from violated goals [14] (Figure 8, Plan Service). Because system objectives and tactics are not hard-coded into Utu, but instead are model-driven, the Anunnaki framework can be deployed with alternative goals and adaptation tactics by simply re-instantiating Utu at design time with new goal models and tactics. Figure 10 shows an example adaptation tactic [14], specified in Extensible Markup Language (XML) format. Tactics are defined with a set of preconditions, actions, and postconditions. In the given example, a “fail-safe” tactic is defined with a precondition to trigger when G3 in Figure 6 is found to be unsatisfied. For the example fail-safe tactic, the actions are to (1) request a mode-change to “manual” mode for the rover and (2) e-mail a notification to the user. Finally, a postcondition is given in the example to state that goal G3 is expected to be satisfied upon execution of the given actions. Thus, when a reconfiguration is needed due to goal violations, Utu realizes the specific actions defined by the corresponding adaptation tactic (Figure 8, Execute Service) to ensure continued goal satisfaction at run time.
Fig. 10.
Fig. 10. Example adaptation tactic. This “fail-safe” tactic triggers when precondition goal G3 (from Figure 6) is unsatisfied. Actions include a request to switch the managed system to “manual” mode and to notify the user. The postcondition states that goal G3 is expected to be satisfied upon completion.
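Since Figure 10 is not reproduced here, the following is an illustrative reconstruction of how such a tactic might be encoded in XML; the element and attribute names are assumptions based on the description above:

```xml
<!-- Hypothetical encoding of the fail-safe tactic described in Figure 10. -->
<tactic name="fail-safe">
  <preconditions>
    <condition goal="G3" satisfied="false"/>  <!-- trigger: G3 violated -->
  </preconditions>
  <actions>
    <action type="mode_change" value="manual"/>  <!-- switch rover mode -->
    <action type="notify" method="email" target="user"/>
  </actions>
  <postconditions>
    <condition goal="G3" satisfied="true"/>  <!-- expected after execution -->
  </postconditions>
</tactic>
```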

4 Demonstration

To demonstrate the use of the Anunnaki framework and the Utu autonomic manager to develop Trusted AI, we have implemented two autonomous cyberphysical systems, one terrestrial and one aerial, both of which operate in environments with uncertain run-time conditions. This section describes the implementation of these systems, the potential impact of known unknowns on their LECs, and how the Anunnaki framework may be used to mitigate faults from using an LEC in the presence of adverse conditions. Our demonstration addresses the following research questions:
RQ1:
Is it possible to use a modular approach to support the automated assessment and improvement of the robustness and resilience of LESs?
RQ2:
Is the Anunnaki framework data and model agnostic?

4.1 Autonomous Rover Case Study

This section describes how the Anunnaki framework has been implemented for the autonomous rover presented in Section 3. First, we describe the hardware and software used in the rover case study. Next, we present results obtained during the implementation, instantiation, and execution of the Anunnaki framework (see Figure 4) and how they pertain to increased robustness to environmental uncertainty. Finally, we outline the implementation of the Utu autonomic manager and how the aggregate components of Anunnaki are integrated to provide a robust and resilient learning-enabled autonomous system.

4.1.1 Implementation of Autonomous Rover.

For demonstration purposes, a robotic rover has been assembled with a suite of sensors and actuators to enable autonomous behavior. As photographed in Figure 5, the rover measures approximately \(30.5 \times 20.5 \times 22.0\) centimeters. The rover includes an NVIDIA Jetson Nano processor to support efficient onboard deep learning computations [23]. Control software for the rover is implemented using the Melodic [71] distribution of ROS packages.
In autonomous mode, the rover relies on computer vision to identify the types of obstacles present in its environment. The rover’s vision-based object detector is implemented as a RetinaNet [42] DNN, using PyTorch [62] deep learning libraries. The object detector has been trained to detect objects from two-dimensional images taken from the rover’s forward-facing camera. For each object detection, both a category label and bounding box are given to identify the type of object and what region of the image it covers.
To train and validate the object detector, 2,500 labeled images were manually collected from the rover’s onboard camera, using replica objects of humans and deer scattered in the operating environment. Two thousand images were reserved for training and 500 for validation only. The object detector was trained until its training error converged to a minimum (after 25 epochs). When evaluated against the reserved validation images, the object detector was found to correctly detect images of humans and deer with a precision of \(98.8\%\), a recall of \(94.8\%\), and an F-score of 96.8%.
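For reference, the reported F-score follows as the harmonic mean of precision and recall: \(F = \frac{2 \cdot P \cdot R}{P + R} = \frac{2 \times 0.988 \times 0.948}{0.988 + 0.948} \approx 0.968\).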
Despite the promising results when testing the object detector with validation images, uncertainty remains with respect to the robustness and reliability of the object detector in the presence of phenomena missing from both training and validation images. For example, Figure 11 shows examples of the object detector’s performance in a variety of lighting conditions. In Figure 11(a), the impact of dimmed lighting is shown. As light intensity decreases from Figure 11(a)i. to 11(a)iii., the object detector’s ability to detect obstacles diminishes. However, the exact threshold and conditions at which this degradation occurs are unknown. Similarly, in Figure 11(b), the object detector’s ability is degraded as a bright light source is introduced into the scene, either behind the camera (Figure 11(b)ii.) or behind the obstacles (Figure 11(b)iii.). Though the object detector has been observed to have high precision and recall under known conditions, it remains unclear how it will perform in these known unknown conditions.
Fig. 11.
Fig. 11. Examples of real adverse phenomena for vision-based object detection. Objects are correctly identified in normal lighting ((a)i. and (b)i.). Detection is degraded in dim lighting ((a)ii. and (a)iii.). Detection is also degraded when a light is placed behind the camera ((b)ii.) or behind the objects ((b)iii.). The boundary leading to degraded performance is unknown.

4.1.2 Creating Behavior Oracles for Autonomous Rover.

Because the threshold between acceptable and unacceptable environmental conditions (Figure 11(a)i. and 11(a)ii., respectively) is unknown for the rover’s object detector, we need a method to determine when resulting object detections can be trusted. The Anunnaki framework can leverage Enlil to create a behavior oracle to determine this threshold (Figure 4, Step 1). Enlil creates an oracle by automatically assessing the object detector’s performance boundaries under simulated environmental conditions. For example, Enlil can automatically assess the object detector’s performance under a range of hue, saturation, and lightness (HSL) conditions and create a behavior oracle to predict the object detector’s performance under any given HSL context. When additional known unknown phenomena are discovered (e.g., a raindrop occluding the camera’s view), additional behavior oracles can be generated to predict how the rover’s object detector will be impacted by each respective phenomenon.
The scatter plot in Figure 12(a) shows Enlil’s automated behavior assessments under a range of HSL contexts, with each point corresponding to a different context. Green points represent cases in which the object detector’s performance was not impacted (i.e., less than a 5% decrease in the default object detector’s F-score). Yellow points represent cases in which the object detector’s performance is degraded (i.e., a 5-10% decrease in F-score). Red points represent cases in which the object detector’s performance is compromised (i.e., more than a 10% decrease in F-score). From these results, Enlil can generate a behavior oracle that correctly predicts the behavior of the object detector under any HSL context with 83% accuracy. Similarly, Figure 12(b) shows Enlil’s assessments of the object detector’s performance when its view has been occluded by raindrops placed on the camera lens, where raindrop_x and raindrop_y represent the (center) position of a raindrop within an image, and raindrop_radius represents the size of the raindrop. Enlil can generate a behavior oracle that correctly predicts the impact of a raindrop occluding the view of the rover’s object detector with 87% accuracy. The Anunnaki framework can leverage these behavior oracles to prevent the rover from relying on its object detector under environmental conditions in which it is expected to fail.
Fig. 12.
Fig. 12. Scatter plots of Enlil’s automatic assessment of an object detector’s response to HSL variations (a) and raindrop occlusion (b). Points represent unique contexts of the respective phenomena, including acceptable (green), degraded (yellow), and fully compromised (red) conditions. With this data, Enlil creates behavior oracles for each respective phenomenon.

4.1.3 Creating Robustified Learning Models for Autonomous Rover.

Instead of updating the rover’s object detector to be robust to all environmental conditions, our approach is to create a range of context-dependent operational modes, with separate DNNs robustified for each respective known unknown phenomenon. The Anunnaki framework uses Enki to create these robustified DNNs. When exposing the default object detector to a random sampling of HSL variations, we found that its F-score decreased from 96.8% to only 2%. This significant decrease demonstrates that the object detector is not sufficiently robust to different lighting conditions. However, using Enki to generate diverse synthetic data, we were able to retrain the default learning model to create a robustified version of the object detector’s DNN that achieves an F-score of 60.7% under random HSL variations. Similarly, we found that the default object detector was not very robust to raindrop occlusion, observing that its F-score decreased from 96.8% to 5% when evaluated with a random sampling of occluding raindrops. Using Enki, we were able to train and create a separate DNN more robust to raindrops, with an F-score of 87% for random raindrops. Under both environmental contexts (i.e., HSL variations and raindrop occlusions), we observe a significant decrease in F-scores when evaluating the default object detector. This decrease in performance can be explained by an increase in false negative predictions. Namely, as environmental conditions distort sensor inputs, the DNN fails to detect objects in the scene because it was never exposed to the observed variations during training. By using separate DNNs that target each respective environmental phenomenon, the integrity of the default object detector is preserved (i.e., it is not influenced by Enki or any synthetic data). However, if an adverse condition is uncovered at run time and the object detector’s default DNN is expected to fail, then the corresponding robustified DNN created by Enki can be used in place of the default DNN. Switching in different learning models to handle changing environmental conditions is analogous to the mode changes commonly used in adaptive automotive systems and traditional transportation systems [69].

4.1.4 Implementing Anunnaki Services for Autonomous Rover.

The Anunnaki framework has been implemented with ROS. Each of the services depicted in Figure 4 (e.g., Step 1: Domain Detection, Step 2: Learning Model Manager, Step 3a: Knowledge Manager) can be instantiated as separate ROS nodes within a single Anunnaki ROS package. Figure 13 provides a graph of Anunnaki ROS nodes (shown as ellipses) and ROS message topics (shown as rectangles) used for communication.
Fig. 13.
Fig. 13. Anunnaki ROS graph of ROS nodes (ellipses) and ROS topics/services (rectangles). Anunnaki nodes dynamically publish/subscribe to topics of the managed LES by referencing agents found in given goal models.
When executed on the same network as the autonomous rover, Anunnaki ROS nodes can publish and subscribe to ROS topics provided by the rover in order to monitor and reconfigure the behavior of the rover. ROS nodes are instantiated for each behavior oracle created by Enlil (e.g., /adv_detector in Figure 13). As ROS nodes, behavior oracles can continuously monitor the rover’s sensor data and predict how the object detector will perform at run time, publishing behavior assessments to any ROS node on the same network. Separate ROS nodes are also instantiated for each Utu MAPE-K step (i.e., /utu_monitor, /utu_analyze, /utu_plan, /utu_execute, and /utu_knowledge). The /utu_monitor node monitors ROS message traffic published by the rover and any behavior oracles that have been instantiated. The /utu_analyze node evaluates the active goal model and selects an adaptation tactic when the goal model is not satisfied. The /utu_plan and /utu_execute nodes then translate the selected tactic into ROS messages that can be published to the rover or into ROS services that can be requested from the rover. Additionally, a /lm_manager node is instantiated to handle the swapping of Enki learning models when an adaptation tactic requests that a robustified model be substituted for the default learning model. Thus, the Anunnaki framework is realized as a package of coordinated ROS node services that automatically monitor and control the rover’s object detector with respect to user-defined goal models.
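A launch file along the following lines could bring up these services as separate nodes (an illustrative sketch; the package name and script file names are assumptions):

```xml
<!-- Hypothetical launch file instantiating the Anunnaki services. -->
<launch>
  <node pkg="anunnaki" type="adv_detector.py"  name="adv_detector"/>
  <node pkg="anunnaki" type="lm_manager.py"    name="lm_manager"/>
  <node pkg="anunnaki" type="utu_monitor.py"   name="utu_monitor"/>
  <node pkg="anunnaki" type="utu_analyze.py"   name="utu_analyze"/>
  <node pkg="anunnaki" type="utu_plan.py"      name="utu_plan"/>
  <node pkg="anunnaki" type="utu_execute.py"   name="utu_execute"/>
  <node pkg="anunnaki" type="utu_knowledge.py" name="utu_knowledge"/>
</launch>
```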
Modularity-driven Approach for ROS-based Systems. In order to illustrate the benefits (i.e., modularity, composability, and reusability) of Anunnaki, we have identified a subset of the ROS-based architectural guidelines proposed by Malavolta et al. [46] that are reflected in our ROS-based implementation of the framework. While we did not have access to these guidelines during the development process, a retrospective analysis of our codebase indicates that our ROS-based implementation of Anunnaki aligns well with a majority of them, particularly those that support software engineering principles. Table 2 provides an overview of the identified guidelines as manifested in our implementation, including each guideline’s ID and description. In most cases, it is sufficient to indicate that a particular guideline was followed (denoted by a checkmark), but for a few guidelines that are more nuanced or include more than one implementation option, we also provide a brief explanation of our design decision and its manifestation in our implementation. Furthermore, our ROS-based implementation of Anunnaki adheres to the guidelines most related to modularity (e.g., C2, N1-N4, I2), maintainability (e.g., N9, I1, I6, H1), and robustness (e.g., C9, S1, S2, S4).
Table 2.
ID | Guideline | Realization
Communication and networking (C)
C1 | Use standardized ROS message formats, possibly supporting also their legacy versions. | ✓
C2 | ROS nodes should be agnostic of underlying communication mechanisms. | ✓
C5 | Nodes that potentially produce/consume large amounts of messages should be configurable in terms of their publish/subscribe rates. | ✓
C6 | Selectively limit the data exchanged between nodes to provide only the information that is strictly necessary for completing tasks. | ✓
C8 | Develop adapter components when data exchanged between nodes is not compatible (semantically), incorrect, out-of-order, or redundant. | ✓
C9 | Use services when starting up robots (instead of publishing to topics) so that the status of the system can be checked before operation. | ✓
C11 | Frequent messages should be exchanged either via services with persistent connections or via topic-based communication. | Frequent messages use topics.
C12 | Run multiple nodes in a single process when the overhead due to interprocess communication is too high both in terms of frequency of messages and payload. | ✓
C13 | Manage topics to avoid unnecessary publishing and subscribing. | ✓
Node responsibilities within the system (N)
N1 | Group nodes and interfaces into cohesive sets, each with its own responsibilities and well-defined dependencies. | ✓
N2 | Each ROS package should be responsible for one and only one feature of the system or robot capability and provide a well-defined interface. | Packages are separated for Utu and the terrestrial rover.
N3 | Decouple nodes with responsibilities that naturally work at different rates and use different rates for different purposes. | Nodes can be configured at different rates independently.
N4 | By design, limit unnecessary computationally-heavy operations by carefully analyzing the execution scenarios across ROS nodes. | ✓
N5 | Transform data only when it is used, for efficiency in terms of computation and bandwidth. | ✓
N6 | Design each single node so that it is runnable (and testable) in isolation. | ✓
N8 | Use a dedicated node to store and represent globally-relevant data (e.g., the physical environment where the system operates) and use it as the single source of truth for all the other nodes in the system. | The Knowledge Manager (see Figure 4, Step 3a) stores global information.
N9 | Keep the number of nodes as low as possible to support the basic execution scenarios and extend the architecture for managing corner cases. | ✓
Internal behavior of the nodes (B)
B2 | Nodes with high-frequency operations should be configurable so that they can operate according to available computational resources. | ✓
B5 | Nodes with configuration errors should fail explicitly at bringup time. | ✓
B6 | If a node is computationally expensive, then ensure that it only executes when it is strictly needed. | ✓
Interface to external users and third-party developers (I)
I1 | Assign meaningful names to architectural elements and group them by adopting standard prefixes/suffixes. | ✓
I2 | When possible, core algorithms, libraries, and other generic software components should be ROS-agnostic. | Core algorithms (e.g., Enki, Enlil) are ROS-agnostic.
I6 | Logging should be standardized across the project and follow well-defined guidelines. | ✓
Interaction with hardware and other lower-level entities (H)
H1 | Nodes interacting with simulators and hardware devices should provide identical ROS messaging interfaces to the rest of the system. | ✓
H2 | When possible, design the system to be hardware-independent. | ✓
Safety-critical concerns (S)
S1 | ROS nodes should be resilient with respect to the amount and frequency of data received by sensors. | ROS nodes only process data at configured data rates, set upon instantiation.
S2 | Use different communication channels and different (hardware and software) platforms depending on the criticality and real-time requirements of the nodes. | Deployment topology can be configured as necessary.
S4 | Provide at least one globally-reachable node capable of receiving run-stop messages and stopping/resetting the whole system. | ✓
Data persistence (P)
P1 | Avoid persisting raw data if only part of it will be used. | Media is compressed to reduce overhead and is not persisted.
P3 | Use a dedicated node for persisting and querying long-term data. | ✓
Table 2. Overview of the ROS-based Guidelines [46] Manifested in the ROS-based Implementation of Anunnaki as Identified through Retrospective Analysis
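To illustrate how guidelines such as C5, N3, and S1 manifest in practice, the following minimal sketch shows a rate-configurable ROS node in the style of our Python-based implementation; all node, topic, and parameter names are illustrative rather than Anunnaki's actual interfaces.

```python
#!/usr/bin/env python3
"""Sketch of a rate-configurable ROS node (cf. guidelines C5, N3, S1).
All node, topic, and parameter names are illustrative."""
import rospy
from sensor_msgs.msg import Image


class RateLimitedMonitor:
    """Processes sensor data only at the rate configured upon instantiation."""

    def __init__(self):
        # Processing rate is a launch-time parameter (guideline C5).
        self.rate_hz = rospy.get_param("~process_rate_hz", 10)
        self.latest_msg = None
        # Keep only the most recent frame; stale frames are dropped (S1).
        rospy.Subscriber("camera/image_raw", Image, self.on_image, queue_size=1)
        self.pub = rospy.Publisher("monitor/assessed_image", Image, queue_size=1)

    def on_image(self, msg):
        self.latest_msg = msg  # cache only; work happens in spin()

    def spin(self):
        rate = rospy.Rate(self.rate_hz)  # node-specific rate (guideline N3)
        while not rospy.is_shutdown():
            if self.latest_msg is not None:
                # Placeholder for the node's actual analysis of the frame.
                self.pub.publish(self.latest_msg)
                self.latest_msg = None
            rate.sleep()


if __name__ == "__main__":
    rospy.init_node("rate_limited_monitor")
    RateLimitedMonitor().spin()
```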
To facilitate interpretability of the managed AI system, a graphical user interface (GUI) of the Anunnaki framework is provided for users to visually and dynamically observe the Utu MAPE-K controller at run time (see Figure 14 and Figure 15 for example scenarios corresponding to ideal and adverse conditions, respectively). Throughout system deployment, the instantiated GUI provides run-time visualizations to monitor system behavior, observe utility value measurements, and obtain explicit reasoning behind adaptive actions. Figure 14 shows an example of the autonomous rover operating in an ideal lighting condition, where the rover’s object detector can properly detect all pedestrians. In Figure 14, the Anunnaki GUI displays the state of each Utu MAPE-K step, the output of each behavior oracle, and the current evaluation of the active goal model (from Figure 6). The goal model is shown as a logic tree of goals, each of which has an associated utility function. At run time, individual goals are highlighted in green when satisfied and red when unsatisfied. In Figure 14, the behavior oracle predicts that the current environment has no adverse impact on the object detector (i.e., Category 0). Thus, the overall goal model in Figure 14 is satisfied (i.e., root goal G1 is green), and no adaptation is selected to reconfigure the rover. In contrast, Figure 15 shows an example of the rover operating in a dim lighting condition, where the rover’s object detector fails to recognize two of the pedestrians in front of the rover. The output of the behavior oracle in Figure 15 indicates that the object detector is degraded (i.e., Category 1). The resulting evaluation shows that the goal model is unsatisfied (i.e., root goal G1 is red), and therefore the “fail-safe” tactic from Figure 10 is executed to switch the rover from autonomous operation to a manual mode. Thus, the Anunnaki framework can prevent a pedestrian collision that would otherwise result from the use of the rover’s object detector in dim lighting.
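The goal-model evaluation described above can be summarized with the following sketch, in which a KAOS-style goal tree is evaluated over monitored values; the class structure, threshold, and variable names are illustrative simplifications of Utu's actual goal-model representation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Sketch of evaluating a KAOS-style goal tree over monitored values.
# Leaf goals carry a utility function; interior goals are AND-refined.

@dataclass
class Goal:
    gid: str
    utility: Optional[Callable[[Dict[str, float]], float]] = None
    children: List["Goal"] = field(default_factory=list)

    def satisfied(self, monitored: Dict[str, float]) -> bool:
        if self.utility is not None:
            # Leaf goal: satisfied when its utility reaches 1.0.
            return self.utility(monitored) >= 1.0
        # Interior goal: satisfied only if all refinements are satisfied.
        return all(child.satisfied(monitored) for child in self.children)

# Example: the root goal holds only when the behavior oracle reports
# Category 0 (no adverse interference on the object detector).
g2 = Goal("G2", utility=lambda m: 1.0 if m["oracle_category"] == 0 else 0.0)
g1 = Goal("G1", children=[g2])
print(g1.satisfied({"oracle_category": 0}))  # True  -> G1 highlighted green
print(g1.satisfied({"oracle_category": 1}))  # False -> G1 highlighted red
```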
Fig. 14. Anunnaki monitoring an autonomous rover with a GUI to show the state of each service. Utu evaluates the goal model in Figure 9, highlighting satisfied (green) and unsatisfied (red) goals. A behavior oracle detects no adverse interference (Cat. 0) and the overall goal model is satisfied.
Fig. 15. Anunnaki reconfigures the rover to prevent use of its object detector in poor lighting with a GUI to show state of each service. Utu evaluates the goal model in Figure 9, highlighting satisfied (green) and unsatisfied (red) goals. The behavior oracle detects an adverse condition that degrades detection (Cat. 1), and a fail-safe tactic is executed to reconfigure the rover into manual operation.

4.2 Autonomous Unmanned Aerial Vehicle Case Study

To further illustrate how Anunnaki supports modularity, reusability, and extensibility when developing learning-enabled autonomous systems, we present a second empirical study implementing the Anunnaki framework for use in an unmanned aerial vehicle (UAV). The new operating domain necessitates a new set of safety requirements, a different LEC architecture, and a different goal model when compared with the autonomous rover. To assess Anunnaki in a UAV, we implemented a simulation environment and UAV controller using the Webots autonomous vehicle simulator [49]. A 3D render of the UAV model used in our demonstration is provided in Figure 16. The UAV’s sensors include a forward-facing camera, an altitude meter, and an onboard GPS. The UAV can be controlled both manually and autonomously. A comprehensive overview of differences between the Anunnaki framework instantiations and customizations for the respective applications (e.g., DNN type, hardware) is presented in Table 3, where the shaded rows indicate application differences.
Table 3.
Feature | Rover | UAV
Platform | Jetson Nano | WEBOTS Robot
Object Detector | RetinaNet | YoloV5
Training Data | Custom | VisDrone
Application Type | Terrestrial | Aerial
Multi-domain Training | Enki | Enki
Domain Detection | Enlil | Enlil
Autonomic Manager | Utu | Utu
Table 3. Comparison between the Managed Rover and UAV Autonomous Systems
Grey shading indicates application differences. This table highlights the Anunnaki framework’s support for reusability and extensibility, both of which facilitate rigorous development of trusted autonomous system software.
Fig. 16. For the UAV case study, we use a replica of the commercial-grade DJI Mavic 2 Pro Drone [50] provided by the WEBOTS simulation software.
When operating autonomously, the UAV relies on computer vision to perceive its environment via an onboard camera. In contrast to the autonomous rover, the UAV’s object detector is implemented as a YOLOv5 [31] DNN architecture. Previously, Zhan et al. [92] used this architecture successfully for onboard UAV image detection. YOLOv5 offers faster inference times than other single-shot detection models such as RetinaNet [79], thus making YOLOv5 a popular choice for real-time object detection on mobile devices with limited resources. The greater inference speed comes at the cost of lower overall accuracy due to localization errors. To train and validate the UAV’s object detector, we use the professional-grade VisDrone-2019 dataset [96]. This dataset consists of 6,471 training images with 343,205 object labels; 548 validation images with 38,800 labels; and 1,060 unlabeled test images. Each labeled object belongs to one of the following 10 classes: pedestrian, people, bicycle, car, van, truck, tricycle, awning-tricycle, bus, and motor. We use the pretrained weights provided by the open source YOLOv5 implementation [31] and train our model until convergence. When evaluated against the unseen validation sample, the model correctly identified labeled bounding boxes for the 10 classes with a precision of \(60.6\%\), recall of \(62.5\%\), and an F-score of \(61.3\%\).
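For reference, the following sketch outlines this training and inference workflow using the public YOLOv5 tooling [31]; the commands follow the open source repository's documented interface, but the exact hyperparameters and file paths shown are illustrative rather than those used in our study.

```python
# Command-line workflow (per the public YOLOv5 repository):
#
#   git clone https://github.com/ultralytics/yolov5 && cd yolov5
#   python train.py --data VisDrone.yaml --weights yolov5s.pt --img 640
#   python val.py   --data VisDrone.yaml --weights runs/train/exp/weights/best.pt
#
# Loading the trained model for inference via torch.hub:
import torch

model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/exp/weights/best.pt")
results = model("aerial_frame.jpg")  # detect objects in a single frame
results.print()                      # summary of detections per class
```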

4.2.1 Creating Behavior Oracles for Autonomous UAV.

The UAV may encounter certain environmental conditions that would render its perception mechanism untrustworthy. When such known unknowns are encountered, it is important for the UAV to recognize potential performance impacts and implement mitigation procedures. Although the UAV’s perception mechanism is based on a different architecture and trained on a different dataset, the Anunnaki framework significantly reduces development effort through code reuse. We demonstrate this feature of Anunnaki by reusing Enlil to create behavior oracles for the UAV. We set up communication channels between the default learning model and the domain detection microservice and automatically generate behavior oracles for a target operating domain. Figure 17(a) displays an example of a training sample after a randomly sampled fog transformation has been applied to it. Figure 17(b) displays a scatter plot of Enlil’s automated behavior assessment under a range of fog contexts, where fog_density represents the number of fog layers (i.e., “depth”) and fog_intensity represents the opacity of each fog layer; red points indicate DNN failure, yellow points indicate DNN degradation, and green points indicate default DNN behavior. The set of diverse environmental contexts generated by Enlil is used to train a behavior oracle to predict expected object detector degradation (\(\gt 5\%\) decrease in F1-score) and object detector failure (\(\gt 10\%\) decrease in F1-score) with an accuracy of \(74\%\).
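The category labeling used by the behavior oracle can be summarized as follows; the thresholds match the degradation and failure criteria above, while the function name and the use of a relative F1 decrease are illustrative assumptions.

```python
def behavior_category(baseline_f1: float, context_f1: float) -> int:
    """Label the detector's response to a generated context.
    0 = default behavior (green), 1 = degraded (yellow, >5% F1 decrease),
    2 = failed (red, >10% F1 decrease). A relative decrease is assumed."""
    decrease = (baseline_f1 - context_f1) / baseline_f1
    if decrease > 0.10:
        return 2
    if decrease > 0.05:
        return 1
    return 0
```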
Fig. 17. Figure 17(a) shows an operational context discovered by Enlil applied to a random training sample. Figure 17(b) shows a scatter plot of Enlil’s automatic assessment of the UAV’s object detector’s response to the generated archive of operational contexts.

4.2.2 Creating Robustified Learning Models for Autonomous UAV.

The aerial domain poses unique challenges (e.g., heavy winds, extreme cold, atmospheric clouds) for learning-enabled autonomous systems, specifically UAVs [26]. To mitigate uncertainty from adverse environmental phenomena, the Anunnaki framework makes use of Enki for (i) generating a diverse set of domain-specific environmental contexts and (ii) generating a robustified DNN for the corresponding domain. To integrate Enki with the updated DNN architecture and dataset, we need only configure context generation parameters and set up communication channels by implementing a wrapper class for the YOLOv5 model and corresponding dataset. This modularity is a key feature of Anunnaki that reduces development costs while promoting increased system robustness. When the UAV’s default object detector is exposed to a combination of Enki-generated diverse weather phenomena (including adverse conditions), such as raindrops and varying brightness levels, the F-score decreases from \(61.3\%\) to \(27\%\). This decrease in performance demonstrates that the UAV’s default object detector is not robust to known-unknown adverse weather phenomena. After assessing the default DNN’s robustness, we can use Enki to generate a robustified model for the above-described domain. Retraining the default DNN on a random sample of Enki-generated raindrop and brightness-level contexts yields a new model with an improved F-score of \(42\%\). Likewise, we can use Enki’s data to assess the default DNN’s robustness when operating in the fog domain. When the default model is assessed in Enki-generated diverse fog contexts, the F-score decreases to \(39\%\), indicating the default model is not robust to the fog domain. To improve our managed AI system’s robustness to fog, we use Enki’s data to retrain the object detector on a random sample of fog contexts, which yields a robustified model with an improved F-score of \(47\%\) for the fog domain.
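The following sketch suggests the shape of such a wrapper class; the method names and the Enki-facing interface are assumptions made for illustration, as only the need for an adapter between Enki and the platform-specific model/dataset is prescribed by the framework.

```python
import torch

class YoloV5Wrapper:
    """Adapter exposing a uniform interface between Enki and the
    platform-specific YOLOv5 model (method names are illustrative)."""

    def __init__(self, weights_path: str):
        self.model = torch.hub.load("ultralytics/yolov5", "custom",
                                    path=weights_path)

    def predict(self, images):
        """Run detection on a batch of (possibly transformed) images."""
        return self.model(images)

    def score(self, images, labels) -> float:
        """Return an F-score for Enki to judge context severity.
        (Metric computation omitted in this sketch.)"""
        raise NotImplementedError

    def retrain(self, augmented_dataset):
        """Fine-tune on Enki-generated contexts to yield a robustified model."""
        raise NotImplementedError
```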

4.2.3 Implementing Anunnaki Services for Autonomous UAV.

This section describes how the Utu elements (e.g., goal models, autonomic manager, adaptation tactics) and the corresponding KAOS goal model have been instantiated and configured to deploy a resource-delivery UAV. Control software for the UAV was implemented using the ros-noetic distribution of ROS and Python3. Simulations were carried out on a computer running Ubuntu 20.04, with 32 GB of RAM, an Intel i7 CPU, and a 12 GB NVIDIA RTX 3060 GPU.
For demonstration purposes, we consider a scenario where a UAV must deliver resources to an area not accessible by ground vehicles. Such situations may arise during natural disasters such as floods, wildfires, and earthquakes. Although a location may not be safe for humans or autonomous terrestrial vehicles, stranded victims may require life-saving resources such as food, water, or medical supplies [1, 18]. To this end, UAVs can be used to deliver crucial supplies to the target locations [20, 72]. However, many sources of uncertainty must be considered in order to safely and securely deliver a package to a target location (e.g., environmental conditions, obstacles, battery power limitations). An example goal model for a supply delivery mission is shown in Figure 18. In contrast to the rover’s goal model (see Figure 6), the new goal model includes several RELAX-ed requirements (highlighted green), thus increasing system flexibility with respect to known-unknown sources of uncertainty. The top-level goal of this mission is represented by goal G1: “UAV successfully completes package delivery.” We consider two explicit obstacles that may prevent the satisfaction of the goal model: O1 (“Object detector compromised”) and O2 (“UAV has inadequate power level”).
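To clarify how a RELAX-ed requirement differs from a crisp goal, the following sketch quantifies a RELAX-ed power-level goal with a fuzzy membership function [86]; the specific goal, bounds, and shape are illustrative and do not reproduce the exact functions in Figure 18.

```python
def relaxed_power_utility(battery_pct: float) -> float:
    """Utility for a power-level goal RELAX-ed to tolerate battery levels
    AS CLOSE AS POSSIBLE TO a 50% reserve: 1.0 at or above 50%, degrading
    linearly toward the 20% hard floor, and 0.0 (violated) below it."""
    hard_floor, ideal = 20.0, 50.0
    if battery_pct >= ideal:
        return 1.0
    if battery_pct <= hard_floor:
        return 0.0
    return (battery_pct - hard_floor) / (ideal - hard_floor)

# A crisp goal yields only 0 or 1; the RELAX-ed form lets Utu tolerate
# transient dips (utility strictly between 0 and 1) before triggering an
# adaptation tactic such as returning to base.
```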
Fig. 18. KAOS goal model for the managed UAV. This goal model is loaded into the UTU autonomic manager pre-deployment and is used to inform the system when a goal is violated and an adaptation is needed.
Our simulation has been designed as a waypoint-directed delivery mission for the model UAV inside a Webots environment.4 We utilize the Utu GUI to monitor the mission at run time. To instantiate the GUI, we need only configure the ROS-based communication channels that enable two-way communication with the simulation software and the corresponding persistent data stores. The framework’s GUI depicts the aggregate run-time elements of Anunnaki (see Figure 19), including run-time monitoring and analysis of KAOS goal models, Utu MAPE-K elements, and behavior oracle decisions, as well as a visualization of the UAV camera feed. A multi-pane display visually depicts the Utu MAPE-K loop, where the (i) Knowledge-pane shows KAOS agents and corresponding utility values, (ii) Monitor-pane shows each agent’s published topics, (iii) Analyze-pane shows goal violation patterns and available adaptation tactics, (iv) Plan-pane shows a priority-sorted queue of selected adaptation tactics, and (v) Execute-pane shows published adaptation tactics. The Utu GUI also includes the (i) Oracle-pane that shows a predicted behavior category as obtained from the behavior oracle during the Monitor step, (ii) Goal Model-pane that shows the KAOS goal model with satisfied goals highlighted green and violated goals highlighted red, and (iii) Camera Feed-pane that shows real-time image and object-detection data, as published by the managed system.
Fig. 19. Anunnaki monitors the autonomous UAV at run time and a GUI shows the state of each service. Utu evaluates the goal model in Figure 18, highlighting satisfied (green) and unsatisfied (red) goals. A behavior oracle detects no adverse interference (Cat. 0) and the overall goal model is satisfied.
Figures 19 and 20 depict run-time snapshots of ideal and adverse conditions, respectively, as observed via the Utu GUI during an execution of a simulated UAV mission. Early in the mission, no adverse environmental conditions are present, and the UAV’s top-level goal remains satisfied. Figure 19 displays a live camera feed under ideal conditions and a satisfied goal model with the top-level goal G1 highlighted green. As the UAV approaches the target location, we dynamically introduce a synthetic fog effect to demonstrate the adaptive capabilities of the Anunnaki-managed system. Figure 20 displays the impact of the synthetic fog on the live camera feed and an unsatisfied goal model with the top-level goal G1 highlighted red. During the monitor phase, Anunnaki’s domain detector correctly identifies object detector failure. The Analyze node uses the updated utility values to obtain a corresponding adaptation tactic (Failsafe F-1). Utu sends a signal to the UAV in real time to implement Failsafe F-1, mitigating adverse weather performance impacts.
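The adaptation path just described can be sketched as follows; message types, topic names, and the tactic identifier are illustrative stand-ins for the actual Utu interfaces.

```python
import rospy
from std_msgs.msg import Int32, String

tactic_pub = None  # set in main before messages arrive


def on_oracle_category(msg: Int32):
    # Analyze: a failure category violates the top-level goal G1.
    if msg.data >= 2:
        # Plan: select the highest-priority tactic for this violation.
        tactic = "FAILSAFE_F1"  # e.g., revert the UAV to manual control
        # Execute: publish the tactic for the managed system to apply.
        tactic_pub.publish(String(data=tactic))


if __name__ == "__main__":
    rospy.init_node("utu_failsafe_sketch")
    tactic_pub = rospy.Publisher("utu/adaptation_tactic", String, queue_size=1)
    rospy.Subscriber("oracle/behavior_category", Int32, on_oracle_category,
                     queue_size=1)
    rospy.spin()
```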
Fig. 20. Anunnaki continues monitoring the autonomous UAV and a behavior oracle detects adverse interference (Cat. 2) leading to a violation of the top-level goal G1 (now highlighted red). Utu analyzes the updated utility values to plan and execute the Failsafe-F1 adaptation tactic autonomously.

5 Discussion

An increased use of LECs for safety-critical tasks requires rigorous software engineering principles to support the deployment of trustworthy systems. A monolithic system may fail to address increasing safety concerns due to the need for context-dependent implementations of uncertainty mitigation techniques. Additional changes in hardware, software, and run-time environments can require extensive updates to existing codebases to provide continued safety assurance. We have applied the Anunnaki framework to two learning-enabled autonomous systems to illustrate how the proposed framework may be used in practice. The remainder of this section considers our demonstration results in the context of RQ1 and RQ2.

5.1 (RQ1) Modular Approach for Trusted AI

A goal of this work was to explore how fundamental principles of software design (e.g., modularity, composability, reusability) [5, 32] can be used to develop tools and techniques that address trusted AI concerns. A key feature of modular systems is their flexibility and ability to manage complexity and uncertainty [3], both of which apply when attempting to assess and improve the robustness and resilience of LESs. Namely, a modular system comprises hierarchical units that are well-defined, have high cohesion (internal interconnectedness), and have low coupling (units are independent of other units) [3]. Additionally, previous work in service-oriented architecture [52, 83] has demonstrated how decomposing a monolithic application into smaller individual services that can be developed, monitored, and reconfigured independently leads to greater system modularity. By constructing Anunnaki following key tactics outlined by Bass et al. [5], Johnson et al. [32], and Baldwin et al. [3] (e.g., low coupling, service-oriented architecture, interoperability), the framework provides a core set of application-independent configurable services with support for future modifications, thereby making it well aligned with the definition of a modular architecture [3]. In addition to the structure of a software system, Baldwin et al. explain that modular software should implicitly support a set of actions or operations that are unique to modular systems, including substitution, augmentation, and porting.
We next explore how Anunnaki supports each of these modular operations to address trusted AI concerns. Specifically, Anunnaki targets two dimensions of trust, robustness and resilience, both of which can be addressed in a variety of ways for different sources of uncertainty. In our first demonstration, the autonomous terrestrial rover’s goal model (see Figure 6) used standard utility functions [65] to inform the autonomic manager of goal violations. However, when greater flexibility to sources of uncertainty was needed for the autonomous UAV, a new microservice was added to Anunnaki to support the RELAX language (applying modularity’s support for augmentation and substitution), thus enabling run-time management of an LES with a goal model containing RELAX-ed goals (see Figure 18). Additionally, we have shown how developers can reuse Anunnaki services to address assurance concerns for different sources of uncertainty. Namely, Anunnaki enabled run-time adaptations (e.g., swapping the active learning model, executing a fail-safe tactic) for an autonomous terrestrial rover exposed to poor lighting conditions and for an autonomous UAV exposed to fog conditions. We anticipated the autonomous terrestrial rover might encounter poor lighting conditions and leveraged the Enlil domain detection service to improve system resilience to this phenomenon. To implement a similar service for a completely different use case (i.e., the autonomous UAV), we only needed to change Enlil parameters (e.g., behavior category specification, weather phenomena specifications, model and dataset addresses), thus saving a significant amount of development time (benefiting from modularity’s support for porting), as sketched below. This modification illustrates the low coupling of the domain detection module, as the implemented changes did not impact the majority of the framework.
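The porting operation described above can be illustrated with the kind of configuration delta involved; the keys and values below are illustrative, as the actual Enlil parameter schema is not reproduced here.

```python
# Rover instantiation of the domain detection service.
ROVER_ENLIL_CONFIG = {
    "model": "models/retinanet_rover.pt",
    "dataset": "data/rover_custom/",
    "phenomena": {"hsl_lighting": {"hue": (0.0, 1.0),
                                   "saturation": (0.0, 1.0),
                                   "lightness": (0.0, 1.0)}},
    "categories": {"degraded": 0.05, "failed": 0.10},  # F1-decrease thresholds
}

# UAV instantiation: only the configuration changes, not Enlil's code.
UAV_ENLIL_CONFIG = {
    "model": "models/yolov5_visdrone.pt",     # different DNN architecture
    "dataset": "data/visdrone2019/",          # different training data
    "phenomena": {"fog": {"fog_density": (1, 10),
                          "fog_intensity": (0.0, 1.0)}},
    "categories": {"degraded": 0.05, "failed": 0.10},  # thresholds reused
}
```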
The Anunnaki model-driven framework and its aggregate microservices facilitated modular operations to improve LES robustness to interference and resilience with respect to known unknown sources of uncertainty (i.e., lighting variations, rain, fog conditions). Therefore, we have shown that a modular approach can be used to support the automated assessment and improvement of the robustness and resilience of LESs (RQ1).

5.2 (RQ2) Data and Model-agnostic Approach to Robustness and Resilience

A system that can be used to develop and manage LESs at run time for diverse applications and their respective data sets, without needing to rebuild or retrain the entire system, is considered data- and model-agnostic [87, 93]. To this end, we have shown how the Anunnaki framework enables run-time adaptation in response to adverse environmental conditions for two managed AI systems deployed for two different applications. In the first case study, we leveraged the Enki microservice to generate specialized RetinaNet models with increased robustness to HSL lighting variations for our custom real-world dataset. When the autonomous rover encountered adverse lighting conditions, Utu’s autonomic manager used an Enlil behavior oracle to detect LEC degradation and reconfigure the system to execute a fail-safe procedure. In the second case study, we demonstrated how Enki was used to generate specialized YoloV5 models with increased robustness to fog conditions for the VisDrone dataset. We also demonstrated how Utu used an Enlil behavior oracle to detect LEC failure and then automatically reconfigured the managed UAV to apply a fail-safe procedure, preventing potential mission failure. Anunnaki supported the generation of robustified models and the detection of the operating domain for each set of uncertainties through isolated configuration changes to the corresponding microservices. Therefore, we have shown that Anunnaki is data- and model-agnostic, as each system’s LEC relied on a different set of learning models with distinct DNN architectures and different datasets (RQ2).

6 Related Work

This section overviews related work in developing trustworthy AI for use in self-adaptive systems.

6.1 Trustworthy AI

Many previous works propose techniques to address trustworthy AI concerns such as robustness [58, 60, 89, 95] and/or resilience [24, 53] with respect to environmental uncertainty. For example, DeepCert [58] uses formally defined environmental contexts and image perturbation levels to verify contextually-dependent DNN robustness. DeepCert further supports the selection of the best DNN from a set of developer-provided models for a given operational context. Given the large variety of tools that can support trusted AI concerns (e.g., Enki [38], Enlil [39], DeepRoad [95], DeepXplore [60], BESTEST [24]), Anunnaki is intended as a loosely-coupled collection of services rather than a fixed set of hard-coded tools. For example, to instantiate alternative services (e.g., replace Enki with DeepCert), developers need only configure the appropriate interface (e.g., published ROS messages) for the new service. Each service-type supported by Anunnaki can be interchanged with alternative techniques, to meet evolving stakeholder requirements, without requiring (potentially extensive) architecture or code changes.
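The following sketch illustrates the substitution point implied above: any domain detection technique can replace Enlil provided it satisfies the expected assessment interface; the abstraction shown is illustrative rather than Anunnaki's literal API.

```python
from abc import ABC, abstractmethod

class DomainDetectionService(ABC):
    """Contract an alternative service (e.g., one built on DeepCert) would
    need to satisfy to replace Enlil in an Anunnaki instantiation."""

    @abstractmethod
    def assess(self, frame) -> int:
        """Return a behavior category (0 nominal, 1 degraded, 2 failed)."""

    def publish(self, category: int) -> None:
        # In the ROS implementation, this would publish the category on the
        # topic that Utu's Monitor step subscribes to.
        ...
```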

6.2 Configurable Frameworks and Learning-enabled Systems

Several existing works propose configurable frameworks that use AI to support adaptive systems. For example, Weyns et al. [85] propose an architecture that uses AI to model sources of environmental uncertainty, a MAPE-K loop and user-provided goals to make adaptation decisions, and control theory to realize the selected adaptations. Specifically, Weyns et al. use DNNs to build models of sources of environmental uncertainty to enable adaptation decisions that are relevant to the current operating context. Caldas et al. [9] use AI to first optimize an algorithm that searches over a system’s adaptation space and then use AI to generate controller configurations that can realize the discovered adaptations. Jamshidi et al. [29] propose a technique that uses a simulator to generate a range of operating contexts and then uses transfer learning to train a performance model that predicts an adaptive system’s performance in a given context. They show how the performance model can instantiate a knowledge base in a MAPE-K loop to inform context-dependent configuration changes. The learned performance model may be considered an implicit behavior oracle and could potentially be used for domain detection in an instantiation of Anunnaki.
Table 4 provides an overview of the specific differences between the related frameworks, where rows correspond to frameworks, columns represent features of the frameworks, and checkmarks indicate whether a framework supports the respective feature. First, Table 4a overviews the role of AI in each framework. Next, Table 4b overviews design-time services supported by each framework, including the use of hierarchical goal models to represent both functional and non-functional system requirements and inform system adaptations (column D4.) and the use of optimization algorithms to generate adaptation configurations (column D5.). Model Robustification (column D3.) indicates a framework’s support for generating alternative robustified learning models to address different sources of uncertainty, which is a distinguishing feature of Anunnaki. Finally, Table 4c overviews run-time services supported by each framework. Model Inference (column R1.) indicates support for a service that can assess a managed LEC’s behavior and output behavioral assessments (e.g., perceived operating context, behavior category) to inform adaptation decisions. Model Management (column R2.) indicates support for switching the active learning model (e.g., switching the default learning model to a context-specific robustified model) in response to environmental uncertainty. Online Learning (column R3.) indicates support for incremental training or updating of LECs from incoming data in an online (i.e., during run-time) setting [74]. Quantifying Functional Goals (column R4.) signifies a framework’s ability to evaluate functional goal satisficement (e.g., via utility functions).
A key difference between existing approaches and this work is the role of AI. Specifically, while previous work focuses on using AI to support one or more steps of the adaptation process for self-adaptive systems, this work focuses on using modularity and other foundational software engineering principles to address assurance for self-adaptive systems that contain one or more AI components.
Table 4. Comparison of Configurable Frameworks for Self-adaptive Systems

6.3 Threats to Validity

We consider several threats to the validity of our study as outlined in this section.
External. For the demonstration of the resource delivery UAV, we rely on several external sources. These include the open source YOLOv5 model, the VisDrone dataset, and the Webots simulator with its corresponding UAV model. We have reviewed the source code of the YOLOv5 model to confirm its implementation follows the theoretical architecture outlined in previous research [31, 92]. The dataset used for training has been used for validating state-of-the-art techniques and model training competitions [17], both of which make it useful as a benchmark dataset for aerial object detection. Likewise, the Webots simulator has been used for numerous robotics applications [15, 67].5 As such, we believe that the external software used poses a minimal risk to the validity of the obtained results.
Simulation. We note that there exists a reality gap between the simulated environments (i.e., the Webots environment and Enki-generated environmental contexts) and the physical world. The Webots simulator is well-regarded as a robotics testing platform, enabling us to demonstrate a proof-of-concept case study. We argue that any physical inconsistencies do not directly impact the results presented in this work. Additionally, our demonstration of the autonomous rover provides an implementation of the proposed framework in the physical world. We acknowledge that simulated phenomena, such as those produced by Enki, may differ from their appearance in the physical world. While the simulated contexts for environmental phenomena may not perfectly reflect the real world, previous work has shown that they can provide significant insights to developers, leading to a more targeted approach to real-world data collection [38]. Finally, we note that Enki is an instantiation of an Anunnaki service rather than a core feature of the framework. Development teams may use a different tool for multi-domain training, and doing so will not impact the architecture of the proposed framework.
Stochastic Variations. The results obtained from learning-enabled services are subject to variations due to stochastic functions present in their implementations. It is important to note that this work does not seek to promote a specific learning-based technique; rather, it draws from techniques that are well studied in previous research and implemented as provided for our demonstrations. Any refactoring of these components was conducted through a configurable wrapper class to integrate the various framework services with platform-dependent technologies and support seamless communication while keeping the underlying architecture unaltered.
Computational Complexity. In our demonstration of the Anunnaki framework, we rely on configurable services (i.e., domain detection and multi-domain training) that generate/use context-specific models (e.g., a model robustified for rain conditions) to address LES robustness and resilience for different sources of uncertainty. Naturally, developers may want to robustify an LES with respect to combinations of operating contexts (e.g., a model robustified to fog and rain and lighting variations), which may lead to an exponential increase in the number of models, training/memory costs, and complexity of goal model/adaptation specifications. While Enki was used to study context composition in previous work [38], there are still concerns regarding the scalability of such an approach as more contexts are considered. By using evolutionary algorithms that can effectively explore complex spaces, services such as Enki and Enlil can mitigate some concerns regarding algorithmic complexity. Moreover, the Anunnaki framework enables developers to specify how many sources of uncertainty are considered and the granularity of the search procedures. Nonetheless, as the number of sources of uncertainty and the granularity of information increase, so do the computational challenges. We will explore techniques to address this limitation in future work.

7 Conclusion

When autonomous AI systems are deployed in uncertain environments, we need to prevent system failures resulting from the inappropriate use of LECs in adverse contexts. In contrast to existing monolithic techniques for adversarial detection and robustness, the Anunnaki framework provides a more modular, service-oriented approach. The Anunnaki framework can detect adverse run-time contexts for LECs, monitor and control the use of LECs with respect to user-defined goal models, and leverage robust alternative learning models for adverse phenomena. The composability of Anunnaki services enables developers to modularly add, remove, or update behavior oracles and goal models to address new or changing assurance concerns without retraining or rebuilding LECs. Furthermore, the loose coupling of services enables the Anunnaki framework to run in parallel with, and independently of, managed AI systems. This article has demonstrated how the Anunnaki framework can be deployed in autonomous systems, such as a terrestrial rover to prevent obstacle collision and a UAV to deliver resources to a target location, while mitigating uncertainty resulting from the use of LECs in adverse environmental conditions (i.e., poor lighting and heavy fog, respectively). As described, the Anunnaki framework requires user-defined goal models and adaptation tactics. Future work will investigate alternative machine learning applications (e.g., ensemble models for model and uncertainty management, reinforcement learning for dynamic adaptation tactics), additional microservices, and run-time evaluation of dynamic goal models and services for both cybersecurity and performance concerns.

Footnotes

1. One meaning associated with the term “Anunnaki” is a collection of ancient Mesopotamian deities including Enki and Enlil.
2. In this work, the hyphenated form of this term denotes its use as an adjective (e.g., “run-time”), whereas the space-separated form denotes a noun (e.g., “run time”) [47].
3. In contrast to many multi-domain learning techniques that aim at improving the performance of a single model [33], multi-domain training seeks to generate a set of domain-specific specialized models.
4. Our proof-of-concept demonstration supports soft real-time requirements. In practice, Anunnaki can be instantiated with alternative services that support the hard real-time command requirements of UAVs.
5. A collection of publications relying on Webots can be accessed at: https://github.com/cyberbotics/webots/discussions/2621

References

[1]
Ludovic Apvrille, Tullio Tanzi, and Jean-Luc Dugelay. 2014. Autonomous drones for assisting rescue services within the context of natural disasters. In Proceedings of the 2014 XXXIth URSI General Assembly and Scientific Symposium. 1–4. DOI:
[2]
Armin Balalaie, Abbas Heydarnoori, and Pooyan Jamshidi. 2016. Microservices architecture enables DevOps: Migration to a cloud-native architecture. IEEE Software 33, 3 (2016). DOI:
[3]
Carliss Young Baldwin and Kim B. Clark. 2000. Design Rules: The Power of Modularity. MIT Press.
[4]
Luciano Baresi, Liliana Pasquale, and Paola Spoletini. 2010. Fuzzy goals for requirements-driven adaptation. In Proceedings of the 2010 18th IEEE International Requirements Engineering Conference. IEEE Computer Society, USA, 125–134. DOI:
[5]
Len Bass, Paul Clements, and Rick Kazman. 2012. Software Architecture in Practice (3rd ed.). Addison-Wesley.
[6]
Shai Ben-David, John Blitzer, Koby Crammer, Alex Kulesza, Fernando Pereira, and Jennifer Wortman Vaughan. 2010. A theory of learning from different domains. Machine Learning 79, 1–2 (2010), 151–175. DOI:
[7]
Shai Ben-David, John Blitzer, Koby Crammer, and Fernando Pereira. 2006. Analysis of representations for domain adaptation. In Proceedings of the Advances in Neural Information Processing Systems. B. Schölkopf, J. Platt, and T. Hoffman (Eds.), Vol. 19, MIT Press. Retrieved from https://proceedings.neurips.cc/paper/2006/file/b1b0432ceafb0ce714426e9114852ac7-Paper.pdf
[8]
Yuriy Brun, Giovanna Di Marzo Serugendo, Cristina Gacek, Holger Giese, Holger Kienle, Marin Litoiu, Hausi Müller, Mauro Pezzè, and Mary Shaw. 2009. Engineering Self-Adaptive Systems through Feedback Loops. Springer. DOI:
[9]
Ricardo Diniz Caldas, Arthur Rodrigues, Eric Bernd Gil, Genaína Nunes Rodrigues, Thomas Vogel, and Patrizio Pelliccione. 2020. A hybrid approach combining control theory and AI for engineering self-adaptive systems. In Proceedings of the IEEE/ACM 15th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. 9–19.
[10]
Anita D. Carleton, Erin Harper, Tim Menzies, Tao Xie, Sigrid Eldh, and Michael R. Lyu. 2020. The AI effect: Working at the intersection of AI and SE. IEEE Software 37, 4 (2020), 26–35. DOI:
[11]
Anirban Chakraborty, Manaar Alam, Vishal Dey, Anupam Chattopadhyay, and Debdeep Mukhopadhyay. 2018. Adversarial attacks and defences: A survey. arXiv:1810.00069. Retrieved from https://arxiv.org/abs/1810.00069
[12]
Betty H. C. Cheng, Pete Sawyer, Nelly Bencomo, and Jon Whittle. 2009. A goal-based modeling approach to develop requirements of an adaptive system with environmental uncertainty. In Proceedings of the Model Driven Engineering Languages and Systems. Andy Schürr and Bran Selic (Eds.), Vol. 5795, Springer, Berlin, 468–483. DOI:
[13]
Shang-Wen Cheng. 2008. Rainbow: Cost-Effective Software Architecture-Based Self-Adaptation. Ph.D. Dissertation. Carnegie Mellon University. Advisor(s) Garlan, David.
[14]
Shang-Wen Cheng, David Garlan, and Bradley Schmerl. 2006. Architecture-based self-adaptation in the presence of multiple objectives. In Proceedings of the International Workshop on Self-Adaptation and Self-Managing Systems. ACM. DOI:
[15]
Jeff Craighead, Robin Murphy, Jenny Burke, and Brian Goldiez. 2007. A survey of commercial & open source unmanned vehicle simulators. In Proceedings of the 2007 IEEE International Conference on Robotics and Automation. 852–857. DOI:
[16]
Paul deGrandis and Giuseppe Valetto. 2009. Elicitation and utilization of application-level utility functions. In Proceedings of the 6th International Conference on Autonomic Computing. ACM. DOI:
[17]
Dawei Du, et al. 2019. VisDrone-DET2019: The vision meets drone object detection in image challenge results. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops. 2019.
[18]
Margaret Eichleay, Emily Evens, Kayla Stankevitz, and Caleb Parker. 2019. Using the unmanned aerial vehicle delivery decision tool to consider transporting medical supplies via drone. Global Health: Science and Practice 7, 4 (2019), 500–506. DOI:
[19]
Ayssam Elkady and Tarek Sobh. 2012. Robotics middleware: A comprehensive literature survey and attribute-based bibliography. Journal of Robotics 2012 (2012), 1–15. DOI:
[20]
Mario Arturo Ruiz Estrada and Abrahim Ndoma. 2019. The uses of unmanned aerial vehicles –UAV’s- (or drones) in social logistic: Natural disasters response and humanitarian relief aid. Procedia Computer Science 149 (2019), 375–383. DOI:
[21]
Chen-Yuan Fan and Shang-Pin Ma. 2017. Migrating monolithic mobile application to microservice architecture: An experiment report. In Proceedings of the International Conference on AI Mobile Services. IEEE. DOI:
[22]
Abolfazl Farahani, Sahar Voghoei, Khaled Rasheed, and Hamid R. Arabnia. 2021. A brief review of domain adaptation. In Proceedings of the Advances in Data Science and Information Engineering (Transactions on Computational Science and Computational Intelligence). Robert Stahlbock, Gary M. Weiss, Mahmoud Abou-Nasr, Cheng-Ying Yang, Hamid R. Arabnia, and Leonidas Deligiannidis (Eds.), Springer International Publishing, Cham, 877–894. DOI:
[23]
Dustin Franklin. 2019. Jetson Nano Brings AI Computing to Everyone. Retrieved 18 March 2019 from https://developer.nvidia.com/blog/jetson-nano-ai-computing/
[24]
Gordon Fraser and Neil Walkinshaw. 2015. Assessing and generating test sets in terms of behavioural adequacy. Software Testing, Verification and Reliability 25, 8(2015), 749–780. DOI:
[25]
Jonas Fritzsch, Justus Bogner, Alfred Zimmermann, and Stefan Wagner. 2018. From monolith to microservices: A classification of refactoring approaches. In Proceedings of the 1st International Workshop on Software Engineering Aspects of Continuous Development and New Paradigms of Software Production and Deployment. Springer. DOI:
[26]
Mozhou Gao, Chris H. Hugenholtz, Thomas A. Fox, Maja Kucharczyk, Thomas E. Barchyn, and Paul R. Nesbit. 2021. Weather constraints on global drone flyability. Scientific Reports 11, 1 (2021), 12092. DOI:
[27]
Jakob Gawlikowski, Cedrique Rovile Njieutcheu Tassi, Mohsin Ali, Jongseok Lee, Matthias Humt, Jianxiang Feng, Anna Kruspe, Rudolph Triebel, Peter Jung, Ribana Roscher, and others. 2023. A survey of uncertainty in deep neural networks. Artificial Intelligence Review 56, Suppl 1 (2023), 1513–1589.
[28]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT.
[29]
Pooyan Jamshidi, Miguel Velez, Christian Kästner, Norbert Siegmund, and Prasad Kawthekar. 2017. Transfer learning for improving model predictions in highly configurable software. In Proceedings of the 2017 IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, 31–41.
[30]
Jason Jo and Yoshua Bengio. 2017. Measuring the tendency of CNNs to learn surface statistical regularities. arXiv:1711.11561. Retrieved from https://arxiv.org/abs/1711.11561
[31]
Glenn Jocher, Ayush Chaurasia, Alex Stoken, Jirka Borovec, Yonghye Kwon, Kalen Michael, Jiacong Fang, Zeng Yifu, Colin Wong, Diego Montes, and others. 2022. ultralytics/yolov5: v7. 0-yolov5 sota realtime instance segmentation. Zenodo (2022).
[32]
Ralph E. Johnson and Brian Foote. 1988. Designing reusable classes. Journal of Object-oriented Programming 1, 2 (1988), 22–35.
[33]
Mahesh Joshi, Mark Dredze, William Cohen, and Carolyn Rose. 2012. Multi-domain learning: When do domains matter?. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. 1302–1312.
[34]
Jeffrey O. Kephart and David M. Chess. 2003. The vision of autonomic computing. Computer 36, 1 (2003), 41–50. DOI:
[35]
Sophia Kolak, Afsoon Afzal, Claire Le Goues, Michael Hilton, and Christopher Steven Timperley. 2020. It takes a village to build a robot: An empirical study of the ROS ecosystem. In Proceedings of the IEEE International Conference on Software Maintenance and Evolution. IEEE. DOI:
[36]
Michael Austin Langford, Kenneth H. Chan, Jonathon Emil Fleck, Philip K. McKinley, and Betty H. C. Cheng. 2021. MoDALAS: Model-driven assurance for learning-enabled autonomous systems. In Proceedings of the 24th International Conference on Model Driven Engineering Languages and Systems. ACM.
[37]
Michael Austin Langford and Betty H.C. Cheng. 2022. A modular and composable approach to develop trusted artificial intelligence. In (To Appear in) Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems.
[38]
Michael Austin Langford and Betty H. C. Cheng. 2021. Enki: A diversity-driven approach to test and train robust learning-enabled systems. ACM Transactions on Autonomous and Adaptive Systems 15, 2 (2021), 1–32. DOI:
[39]
Michael A. Langford and Betty H. C. Cheng. 2021. “Know What You Know”: Predicting behavior for learning-enabled systems when facing uncertainty. In Proceedings of the 16th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. ACM.
[40]
Michael A. Langford, Glen A. Simon, Philip K. McKinley, and Betty H. C. Cheng. 2019. Applying evolution and novelty search to enhance the resilience of autonomous systems. In Proceedings 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. ACM. DOI:
[41]
Alexei Lapouchnian. 2005. Goal-Oriented Requirements Engineering: An Overview of the Current Research. Technical Report. University of Toronto. Retrieved from http://www.cs.utoronto.ca/alexei/pub/Lapouchnian-Depth.pdf
[42]
Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, and Piotr Dollár. 2017. Focal loss for dense object detection. In Proceedings of the IEEE International Conference on Computer Vision. IEEE. DOI:
[43]
Ninghao Liu, Mengnan Du, Ruocheng Guo, Huan Liu, and Xia Hu. 2021. Adversarial attacks and defenses: An interpretation perspective. ACM SIGKDD Explorations Newsletter 23, 1 (2021), 86–99. DOI:
[44]
Lei Ma, Felix Juefei-Xu, Fuyuan Zhang, Jiyuan Sun, Minhui Xue, Bo Li, Chunyang Chen, Ting Su, Li Li, and Yang Liu. 2018. DeepGauge: Multi-granularity testing criteria for deep learning systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM. DOI:
[45]
Ivano Malavolta, Grace A. Lewis, Bradley Schmerl, Patricia Lago, and David Garlan. 2021. Mining guidelines for architecting robotics software. Journal of Systems and Software 178 (2021). DOI:
[46]
Ivano Malavolta, Grace A. Lewis, Bradley Schmerl, Patricia Lago, and David Garlan. 2021. Mining guidelines for architecting robotics software. Journal of Systems and Software 178 (2021), 110969.
[47]
P.K. McKinley, S.M. Sadjadi, E.P. Kasten, and B.H.C. Cheng. 2004. Composing adaptive software. Computer 37, 7 (2004), 56–64. DOI:
[48]
Tim Menzies. 2020. The five laws of SE for AI. IEEE Software 37, 1 (2020), 81–85. DOI:
[49]
O. Michel. 2004. Webots: Professional mobile robot simulation. Journal of Advanced Robotics Systems 1, 1 (2004), 39–42. Retrieved from http://www.ars-journal.com/International-Journal-of-Advanced-Robotic-Systems/Volume-1/39-42.pdf
[50]
Olivier Michel. 2004. Cyberbotics ltd. webots: professional mobile robot simulation. International Journal of Advanced Robotic Systems 1, 1 (2004), 5.
[51]
Vinod Muthusamy, Aleksander Slominski, and Vatche Ishakian. 2018. Towards enterprise-ready AI deployments: Minimizing the risk of consuming AI models in business applications. In Proceedings of the 1st International Conference on Artificial Intelligence for Industries. IEEE.
[52]
Irakli Nadareishvili, Ronnie Mitra, Matt McLarty, and Mike Amundsen. 2016. Microservice Architecture: Aligning Principles, Practices, and Culture. O’Reilly Media.
[53]
Muhammad Naeem Irfan, Catherine Oriat, and Roland Groz. 2013. Model inference and testing. Advances in Computers 89 (2013), 89–139. DOI:
[54]
Maziar Nekovee, Sachin Sharma, Navdeep Uniyal, Avishek Nag, Reza Nejabati, and Dimitra Simeonidou. 2020. Towards AI-enabled microservice architecture for network function virtualization. In Proceedings of the 8th International Conference on Communications and Networking. IEEE. DOI:
[55]
NIST. 2019. U.S. Leadership In AI: A Plan for Federal Engagement in Developing Technical Standards and Related Tools. Technical Report. U.S. National Institute of Standards and Technology.
[56]
NTSB. 2019. Highway Accident Report, Collision Between Vehicle Controlled by Developmental Automated Driving System and Pedestrian. Technical Report NTSB/HAR-19/03. U.S. National Transportation Safety Board.
[57]
Augustus Odena, Catherine Olsson, David Andersen, and Ian Goodfellow. 2019. TensorFuzz: Debugging neural networks with coverage-guided fuzzing. In Proceedings of the 36th International Conference on Machine Learning.
[58]
Colin Paterson, Haoze Wu, John Grese, Radu Calinescu, Corina S. Păsăreanu, and Clark Barrett. 2021. Deepcert: Verification of contextually relevant robustness for neural network image classifiers. In Computer Safety, Reliability, and Security: 40th International Conference, SAFECOMP 2021, York, UK, September 8–10, 2021, Proceedings 40. Springer, 3–17.
[59]
Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2017. DeepXplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles. ACM. DOI:
[60]
Kexin Pei, Yinzhi Cao, Junfeng Yang, and Suman Jana. 2017. Deepxplore: Automated whitebox testing of deep learning systems. In Proceedings of the 26th Symposium on Operating Systems Principles. 1–18.
[61]
Felipe Pontes and Edward Curry. 2021. Cloud-edge microservice architecture for DNN-based distributed multimedia event processing. In Proceedings of the Advances in Service-Oriented and Cloud Computing. ESOCC 2020. Communications in Computer and Information Science. Springer. DOI:
[62]
PyTorch.org. 2022. PyTorch Documentation. Retrieved January 2022 from https://pytorch.org/docs/stable/index.html
[63]
Zhuang Qian, Kaizhu Huang, Qiu-Feng Wang, and Xu-Yao Zhang. 2022. A survey of robust adversarial training in pattern recognition: Fundamental, theory, and methodologies. Pattern Recognition 131 (2022), 108889.
[64]
Morgan Quigley, Ken Conley, Brian Gerkey, Josh Faust, Tully Foote, Jeremy Leibs, Rob Wheeler, and Andrew Ng. 2009. ROS: An open-source robot operating system. In Proceedings of the Internatinal Conference on Robotics and Automation Workshop on Open Source Software. IEEE.
[65]
Andres J. Ramirez and Betty H. C. Cheng. 2011. Automatic derivation of utility functions for monitoring software requirements. In Model Driven Engineering Languages and Systems, 14th International Conference, MODELS 2011, Wellington, New Zealand, October 16–21, 2011, Proceedings, Jon Whittle, Tony Clark, and Thomas Kühne (Eds.), Lecture Notes in Computer Science, Vol. 6981, Springer, 501–516. DOI:
[66]
Andres J. Ramirez, Erik M. Fredericks, Adam C. Jensen, and Betty H. C. Cheng. 2012. Automatically RELAXing a goal model to cope with uncertainty. In Search Based Software Engineering, David Hutchison, Takeo Kanade, Josef Kittler, Jon M. Kleinberg, Friedemann Mattern, John C. Mitchell, Moni Naor, Oscar Nierstrasz, C. Pandu Rangan, Bernhard Steffen, Madhu Sudan, Demetri Terzopoulos, Doug Tygar, Moshe Y. Vardi, Gerhard Weikum, Gordon Fraser, and Jerffeson Teixeira de Souza (Eds.). Vol. 7515. Springer, Berlin, 198–212. DOI:
[67]
Andres J. Ramirez, Adam C. Jensen, Betty H. C. Cheng, and David B. Knoester. 2011. Automatically exploring how uncertainty impacts goal satisfaction. In Proceedings of the International Conference on Automated Software Engineering. 568–571.
[68]
Andres J. Ramirez, Adam C. Jensen, and Betty H. C. Cheng. 2012. A taxonomy of uncertainty for dynamically adaptive systems. In Proceedings of the 2012 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, Zurich, Switzerland, 99–108. DOI:
[69]
Jorge Real and Alfons Crespo. 2004. Mode change protocols for real-time systems: A survey and a new proposal. Real-time systems 26, C (2004), 161–197.
[70]
Arthur Rodrigues, Ricardo Diniz Caldas, Genaína Nunes Rodrigues, Thomas Vogel, and Patrizio Pelliccione. 2018. A learning approach to enhance assurances for real-time self-adaptive systems. In Proceedings of the 13th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. ACM. DOI:
[71]
ROS.org. 2022. ROS Melodic Morenia Documentation. Retrieved January 2022 from http://wiki.ros.org/melodic
[72]
Piotr Rudol and Patrick Doherty. 2008. Human body detection and geolocalization for UAV search and rescue missions using color and thermal imagery. In Proceedings of the 2008 IEEE Aerospace Conference. 1–8. DOI:
[73]
Patrik Sabol and Peter Sincak. 2018. AI bricks: A microservices-based software for a usage in the cloud robotics. In Proceedings of the World Symposium on Digital Intelligence for Systems and Machines. IEEE. DOI:
[74]
Doyen Sahoo, Quang Pham, Jing Lu, and Steven C. H. Hoi. 2018. Online deep learning: Learning deep neural networks on the fly. In Proceedings of the 27th International Joint Conference on Artificial Intelligence. 2660–2666.
[75]
Irfan Saif and Beena Ammanath. 2020. ‘Trustworthy AI’ is a framework to help manage unique risk. MIT Technology Review (2020).
[76]
Connor Shorten and Taghi M. Khoshgoftaar. 2019. A survey on image data augmentation for deep learning. Journal of Big Data 6, 1 (2019), 1–48.
[77]
Philip C. Slingerland and Lauren H. Perry. 2021. A Framework for Trusted Artificial Intelligence in High-Consequence Environments. Technical Report ATR-2021-01456. The Aerospace Corporation.
[78]
Christian Szegedy, Wojciech Zaremba, Ilya Sutskever, Joan Bruna, Dumitru Erhan, Ian Goodfellow, and Rob Fergus. 2013. Intriguing properties of neural networks. arXiv:1312.6199. Retrieved from https://arxiv.org/abs/1312.6199
[79]
Lu Tan, Tianran Huangfu, Liyao Wu, and Wenying Chen. 2021. Comparison of RetinaNet, SSD, and YOLO v3 for real-time pill identification. BMC Medical Informatics and Decision Making 21, 1 (2021), 324. DOI:
[80]
Yuchi Tian, Kexin Pei, Suman Jana, and Baishakhi Ray. 2018. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering. ACM. DOI:
[81]
Axel van Lamsweerde and Emmanuel Letier. 2004. From object orientation to goal orientation: A paradigm shift for requirements engineering. In Proceedings of the Radical Innovations of Software and Systems Engineering in the Future. Lecture Notes in Computer Science Vol. 2941. Springer-Verlag. DOI:
[82]
Hemanth Venkateswara, Shayok Chakraborty, and Sethuraman Panchanathan. 2017. Deep-learning systems for domain adaptation in computer vision: Learning transferable feature representations. IEEE Signal Processing Magazine 34, 6 (2017), 117–129. DOI:
[83]
Mario Villamizar, Oscar Garces, Harold Castro, Mauricio Verano, Lorena Salamanca, Rubby Casallas, and Santiago Gil. 2015. Evaluating the monolithic and the microservice architecture pattern to deploy web applications in the cloud. In Proceedings of the 2015 10th Computing Colombian Conference. IEEE, Bogota, Colombia, 583–590. DOI:
[84]
Karl Weiss, Taghi M. Khoshgoftaar, and DingDing Wang. 2016. A survey of transfer learning. Journal of Big data 3, 1 (2016), 1–40.
[85]
Danny Weyns, Bradley Schmerl, Masako Kishida, Alberto Leva, Marin Litoiu, Necmiye Ozay, Colin Paterson, and Kenji Tei. 2021. Towards better adaptive systems by combining mape, control theory, and machine learning. In Proceedings of the 2021 International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE, 217–223.
[86]
Jon Whittle, Pete Sawyer, Nelly Bencomo, Betty H.C. Cheng, and Jean-Michel Bruel. 2009. RELAX: Incorporating uncertainty into the specification of self-adaptive systems. In Proceedings of the 2009 17th IEEE International Requirements Engineering Conference. 79–88. DOI:
[87]
Chathurika S. Wickramasinghe, Kasun Amarasinghe, Daniel L. Marino, Craig Rieger, and Milos Manic. 2021. Explainable unsupervised machine learning for cyber-physical systems. IEEE Access 9 (2021), 131824–131843. DOI:
[88]
Jeannette M. Wing. 2021. Trustworthy AI. Communications of the ACM 64, 10 (2021), 64–71. DOI:
[89]
Haoze Wu, Teruhiro Tagomori, Alexander Robey, Fengjun Yang, Nikolai Matni, George Pappas, Hamed Hassani, Corina Pasareanu, and Clark Barrett. 2023. Toward certified robustness against real-world distribution shifts. In Proceedings of the 2023 IEEE Conference on Secure and Trustworthy Machine Learning. IEEE, 537–553.
[90]
Xiaofei Xie, Lei Ma, Felix Juefei-Xu, Minhui Xue, Hongxu Chen, Yang Liu, Jianjun Zhao, Bo Li, Jianxiong Yin, and Simon See. 2019. DeepHunter: A coverage-guided fuzz testing framework for deep neural networks. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis. ACM. DOI:
[91]
Becky Yerak and Tatyana Shumsky. 2019. More companies flag a new risk: Artificial intelligence. Wall Street Journal 129 (2019).
[92]
Wei Zhan, Chenfan Sun, Maocai Wang, Jinhui She, Yangyang Zhang, Zhiliang Zhang, and Yong Sun. 2022. An improved Yolov5 real-time detection method for small objects captured by UAV. Soft Computing 26, 1 (2022), 361–373. DOI:
[93]
Hongyi Zhang, Moustapha Cisse, Yann N. Dauphin, and David Lopez-Paz. 2018. mixup: Beyond empirical risk minimization. In Proceedings of the International Conference on Learning Representations.
[94]
Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid. 2018. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. ACM. DOI:
[95]
Mengshi Zhang, Yuqun Zhang, Lingming Zhang, Cong Liu, and Sarfraz Khurshid. 2018. DeepRoad: GAN-based metamorphic testing and input validation framework for autonomous driving systems. In Proceedings of the 33rd ACM/IEEE International Conference on Automated Software Engineering. 132–142.
[96]
Pengfei Zhu, Longyin Wen, Dawei Du, Xiao Bian, Heng Fan, Qinghua Hu, and Haibin Ling. 2021. Detection and tracking meet drones challenge. IEEE Transactions on Pattern Analysis and Machine Intelligence (2021), 1–1. DOI:
