research-article

Generative AI for Self-Adaptive Systems: State of the Art and Research Roadmap

Published: 30 September 2024

Abstract

Self-adaptive systems (SASs) are designed to handle changes and uncertainties through a feedback loop with four core functionalities: monitoring, analyzing, planning, and execution. Recently, generative artificial intelligence (GenAI), especially the area of large language models, has shown impressive performance in data comprehension and logical reasoning. These capabilities are highly aligned with the functionalities required in SASs, suggesting a strong potential to employ GenAI to enhance SASs. However, the specific benefits and challenges of employing GenAI in SASs remain unclear. Yet, providing a comprehensive understanding of these benefits and challenges is complex for several reasons: limited publications in the SAS field, the technological and application diversity within SASs, and the rapid evolution of GenAI technologies. To that end, this article aims to provide researchers and practitioners with a comprehensive snapshot that outlines the potential benefits and challenges of employing GenAI within SASs. Specifically, we gather, filter, and analyze literature from four distinct research fields and organize it into two main categories of potential benefits: (i) enhancements to the autonomy of SASs centered around the specific functions of the MAPE-K feedback loop, and (ii) improvements in the interaction between humans and SASs within human-on-the-loop settings. From our study, we outline a research roadmap that highlights the challenges of integrating GenAI into SASs. The roadmap starts by outlining key research challenges that need to be tackled to exploit the potential for applying GenAI in the field of SAS. The roadmap concludes with a practical reflection, elaborating on current shortcomings of GenAI and proposing possible mitigation strategies.

1 Introduction

Self-Adaptive Systems (SASs) are designed to manage changes and uncertainties within their environment, themselves, and their goals [Weyns, 2020]. To that end, these systems are equipped with a feedback loop that typically acts without human intervention, yet, if preferred, humans may be involved in certain function(s) of the feedback loop. The concept of self-adaptation relates to various fields, including autonomic computing systems, control systems, context-aware systems, auto-tuning systems, and digital twins, and has been actively applied in the software industry [Weyns et al., 2023]. Effective self-adaptation typically relies on a set of four crucial functions or capabilities [Kephart and Chess, 2003]: (i) to monitor their operational environment and their own state; (ii) to analyze the current situation, determine whether the goals are achieved, and, if not, evaluate the options to adapt the system; (iii) to plan an adaptation of the system based on the best adaptation option; and (iv) to execute the plan and adapt the system accordingly. The four basic functions of self-adaptation, along with the knowledge they share, are often referred to as MAPE-K.
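To make the loop concrete, the four functions and their shared knowledge can be sketched as a minimal control loop. All names, the latency goal, and the adaptation rule below are illustrative inventions for a toy managed system, not part of any established MAPE-K framework:

```python
from dataclasses import dataclass, field

@dataclass
class Knowledge:
    """K: shared knowledge -- goals and the latest observations."""
    goal_latency_ms: float = 100.0
    observations: dict = field(default_factory=dict)

def monitor(system_state: dict, k: Knowledge) -> None:
    """M: sense the managed system and environment, update shared knowledge."""
    k.observations.update(system_state)

def analyze(k: Knowledge) -> bool:
    """A: decide whether the goals are still met; True means adaptation is needed."""
    return k.observations.get("latency_ms", 0.0) > k.goal_latency_ms

def plan(k: Knowledge) -> list[str]:
    """P: choose an adaptation plan (here: a single fixed rule)."""
    return ["add_replica"] if analyze(k) else []

def execute(plan_steps: list[str], system_state: dict) -> None:
    """E: enact the plan on the managed system (toy effect model)."""
    if "add_replica" in plan_steps:
        system_state["replicas"] = system_state.get("replicas", 1) + 1
        system_state["latency_ms"] = system_state["latency_ms"] / 2  # simplistic

# One pass through the loop over a toy managed system.
k = Knowledge()
state = {"latency_ms": 180.0, "replicas": 1}
monitor(state, k)
execute(plan(k), state)
print(state)  # latency goal violated, so a replica was added
```

In a real SAS the effect model in `execute` would be replaced by actuation on the managed system, and `analyze`/`plan` by one of the analysis and planning techniques discussed in Section 4.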
Generative Artificial Intelligence (GenAI) leverages AI to learn patterns and structures from training data and generate new data that exhibits similar characteristics [OpenAI, 2023]. Advances in Transformer technology, a deep learning approach capable of processing long-range data dependencies, have significantly propelled GenAI. As a representative example, Large Language Models (LLMs) have ignited widespread interest in various research fields. From the perspectives of Semiotics and Linguistics, language serves not only as a symbolic system for representing and comprehending the real world [Campbell et al., 2019], but also as a framework reflecting the logic of human thought, known as linguistic determinism [Hickmann, 2000]. Trained on extensive corpora of human language data, LLMs “naturally” exhibit remarkable performance in (i) understanding the semantic meaning within textual data and (ii) performing logical reasoning using text.1
When comparing the core capabilities required to realize self-adaptation and the features offered by GenAI, it is clear that GenAI has the potential to significantly improve and enhance the capabilities of SASs. Some studies have taken initial steps to explore this potential. For instance, Nascimento et al. [2023] investigate the use of LLMs as analyzers and planners for reasoning and generating adaptation plans, and Sarda [2023] employs LLMs to automatically adapt configurations and deployments, addressing faults in microservice systems. However, we still lack a systematic treatment of the potential of GenAI for SASs, and that is the objective of this article.
Yet, obtaining a comprehensive understanding of the benefits and challenges of employing GenAI in SASs is challenging due to three primary factors. Firstly, while GenAI-related research is abundant in fields such as AI and Software Engineering (SE), it is notably limited in leading conferences and journals on SASs like SEAMS, ACSOS, and TAAS. This necessitates searching and analyzing literature from adjacent disciplines to discern their direct and potential contributions to SASs. Secondly, the methodological diversity within the field of SASs—spanning a broad variety of analysis and planning techniques and algorithms [Weyns, 2020]—and the technological diversity in application domains—from microservices [Nunes et al., 2024] to recent advances in cyber-physical systems [Chu et al., 2024; Sanchez et al., 2024] and digital twins [Kamburjan et al., 2024]—complicates a comprehensive understanding of GenAI’s potential for SASs. Thirdly, the rapidly evolving and expanding field of GenAI, as evidenced by the increasing number of publications across various research fields, illustrated in Figure 1, adds another layer of complexity. Although these challenges reveal the difficulty of recognizing the potential of GenAI for SASs, they also underpin its necessity and urgency.
Fig. 1.
Fig. 1. Trend in the number of articles with different title keywords in conferences across various fields. Here, the solid line represents articles with “Transformer” in the title, while the dashed line represents articles with “language model” or “LLM” in the title. We can observe that Transformer has consistently been a prominent research topic in AI and robotics, while the application of language models in each research field shows rapid growth from 2023 to 2024.
To that end, this article aims to provide researchers and practitioners with a comprehensive snapshot of (i) the potential of applying GenAI to SASs, and (ii) the challenges of employing GenAI in SASs. Regarding the first point, we aim to provide a concise overview that broadly covers GenAI’s potential applications within SASs, as summarized in Table 1. Specifically, we adopt two complementary perspectives to select, filter, and categorize the state-of-the-art literature. The first perspective highlights GenAI’s potential to enhance the functions and autonomy of SASs. It specifically explores how GenAI might improve the modules for the construction of SASs, including aspects of monitoring, analysis, planning, execution, and the knowledge these modules share. The second perspective explores how GenAI could improve interactions between humans and SASs in a Human-on-the-Loop (HOTL) setting. Although SASs were initially designed to operate automatically with minimal human intervention [Kephart and Chess, 2003], integrating humans into the decision-making loop may offer considerable benefits [Cámara et al., 2015] and enhance trustworthiness [Calinescu et al., 2018; Weyns et al., 2023]. Specifically, we discuss three key directions that are fundamental design principles in HOTL settings [Gil et al., 2019]: (i) user preference acquisition to enable user-centric adaptation and increase user satisfaction, (ii) system transparency to enhance explainability and enable effective user supervision, and (iii) human-system collaboration to capitalize on the respective strengths of both humans and systems for greater efficiency. Regarding the second point, we aim to outline the challenges in employing GenAI in SASs.
Specifically, we discuss the challenges from two perspectives: the first focuses on interesting opportunities for future research, consolidating the insights obtained from this study into a research roadmap that outlines the deficiencies of current studies and potential future research directions for applying GenAI in SASs. The second perspective focuses on the use of GenAI in practice, discussing the inherent shortcomings of GenAI and potential mitigation strategies.
Table 1. A Brief Summary of GenAI’s Potential Applications in SASs

MAPE-K
  Monitor
    Understand context
      - LLM: transform unstructured data into a structured format, e.g., log parsing, SQL automation
      - TF/LLM: detect anomalies by capturing contextual and positional semantics within the text
    Predict context
      - TF/LLM/DM: forecast time series data or impute missing values
      - TF/LLM: predict event sequences by capturing event relations or by semantic reasoning
  Analyzer/Planner
    Enhance existing approaches
      - LLM: enable reasoning based on natural language (e.g., API documents) for architecture-based and requirement-driven adaptation
      - LLM/DM: provide prior knowledge (as metaheuristics) or act as a data synthesizer to enhance performance or reduce cost in learning-based and search-based approaches
      - LLM: translate models to reduce design cost for formal and control-based approaches
    New planning paradigm
      - LLM: employ multiple LLM agents with different roles that collaborate for comprehensive, multi-perspective decision-making
      - LLM: enable self-reflection (e.g., analyzing failed experiences from environment feedback) and self-evolution (e.g., summarizing and reusing successful experiences as skills)
      - TF/DM: serve as the policy expression or act as a planner (in model-based reinforcement learning)
  Executor
      - LLM: enable end-to-end robotic manipulation and navigation via vision-language-action models
  Knowledge
      - LLM: utilize the LLM’s inherent knowledge and environment feedback to build or refine the SAS’s knowledge in the form of knowledge graphs, system models, or world models
      - LLM: translate given (natural language-based) descriptions into (domain-specific language-based) models
HOTL
  Preference acquisition
      - LLM: reason about or infer user preferences from user feedback
  Transparency
      - LLM: explain code, logs, or decision-making models
      - LLM: generate intuitive visualizations and interactions
  Collaboration
      - LLM: decompose tasks and allocate them to machines or humans
      - LLM: infer or summarize the user’s intentions or action patterns to provide support
      - LLM: visualize both intermediate steps and change impacts for user correction

TF represents Transformer; DM represents Diffusion Model.
The remainder of this article is structured as follows: Section 2 provides the necessary background and related work. Section 3 introduces our methods for searching and filtering literature. Sections 4 and 5 address the current state-of-the-art GenAI literature for SASs, focusing on the perspectives of MAPE-K and HOTL, respectively. Section 6 outlines and discusses challenges built on the above foundation, reified in a roadmap. Section 7 discusses threats to the validity of this study, and Section 8 concludes our study.

2 Background and Related Work

This section starts by introducing the foundational MAPE-K reference model of SASs. Next, we provide a brief history of GenAI and introduce several generative models targeted in this study, including the Transformer, LLMs, and the diffusion model. Additionally, we discuss relevant surveys and reviews pertinent to this article.

2.1 SAS with MAPE-K Feedback Loop

From an external system perspective, self-adaptation equips a software system with the ability to adjust itself to meet user-defined goals in response to uncertainties and changes that are hard or impossible to deal with before runtime [Weyns, 2020]. From an internal perspective, an SAS comprises a dual structure: the managed system, which interacts with the environment to address domain-specific user concerns, and the managing system, which comprises a feedback loop that coordinates with the managed system to address adaptation concerns, i.e., concerns about the domain concerns. On this basis, the managing system generally includes the key elements of Monitor, Analyzer, Planner, Executor, and shared Knowledge, collectively known as the MAPE-K loop [Dobson et al., 2006; Kephart and Chess, 2003; Weyns et al., 2012b], as illustrated in Figure 2. In Section 4, we employ the MAPE-K reference model to categorize and discuss the relevant literature. For a comprehensive introduction to self-adaptation, we refer the reader to [Weyns, 2020].
Fig. 2.
Fig. 2. SAS with MAPE-K feedback loop [Andersson et al., 2009].

2.2 History and Scope of GenAI

GenAI has a longstanding history, with its earliest developments traceable back to Hidden Markov Models [Knill and Young, 1997] and Gaussian Mixture Models [Reynolds and Rose, 1995], used for generating time-series data. With the emergence of deep learning, each domain has developed its own methods. In the Natural Language Processing (NLP) field, classic works include N-grams [Bengio et al., 2000], Recurrent Neural Networks (RNNs) [Mikolov et al., 2010], and Long Short-Term Memory (LSTM) [Graves, 2012]. In Computer Vision (CV), representative studies involve Generative Adversarial Networks [Goodfellow et al., 2020], Variational Autoencoders [Kingma and Welling, 2022], and diffusion models [Ho et al., 2020]. Subsequently, these two fields intersected in the Transformer architecture, initially utilized for NLP tasks but later introduced into the CV domain, as in the Vision Transformer [Dosovitskiy et al., 2021] and Swin Transformer [Liu et al., 2021]. Additionally, the versatility of Transformers further spurred the development of multi-modal models like Contrastive Language–Image Pre-training [Radford et al., 2021]. Later, with the introduction of the Generative Pre-trained Transformer (GPT) and the continuous increase in the number of parameters and the amount of training text, GPT-3 [Brown et al., 2020] demonstrated astonishing generalization capabilities; models at this scale came to be referred to as LLMs.
In this article, we primarily focus on recent achievements such as Transformers, LLMs, and diffusion models, driven by the timeliness and relevance of this study. These technologies provide invaluable inspiration for advancing the development of SASs.

2.3 Transformer

The Transformer [Vaswani et al., 2017] is a deep learning architecture optimized for sequence-to-sequence tasks, which forms the basis for several advanced models like Bidirectional Encoder Representations from Transformers (BERT) [Devlin et al., 2019], GPT-3 [Brown et al., 2020], and DALL-E-2 [Ramesh et al., 2022]. Unlike its predecessors (RNNs), the Transformer excels in managing long-range dependencies within texts. This capability stems from its self-attention mechanism, which evaluates the relationship between all pairs in an input sequence. A Transformer model consists of an encoder and a decoder. The encoder transforms the input sequence into a set of vectors representing the text in a high-dimensional space (called hidden representations). The decoder then generates output tokens, using the context from the encoder and the part of the output it has already produced. Notably, the Transformer allows for parallel training of its components, significantly enhancing the efficiency and scalability of model training for large datasets.
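The self-attention computation described above can be illustrated in a few lines of NumPy. This is a minimal single-head sketch that omits the learned query/key/value projections, masking, and multi-head structure of a real Transformer:

```python
import numpy as np

def self_attention(X: np.ndarray) -> np.ndarray:
    """Single-head scaled dot-product self-attention.

    X has shape (sequence_length, d_model); for simplicity, the queries,
    keys, and values are X itself (no learned projection matrices).
    """
    d_k = X.shape[-1]
    scores = X @ X.T / np.sqrt(d_k)  # relevance of every token pair
    # Row-wise softmax turns the scores into attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ X  # each output token mixes the whole sequence

X = np.random.default_rng(0).standard_normal((5, 8))  # 5 tokens, d_model = 8
out = self_attention(X)
print(out.shape)  # (5, 8)
```

Because every output position attends to every input position in a single matrix product, long-range dependencies are captured without the step-by-step recurrence of an RNN, and the computation parallelizes over the sequence.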
BERT, or Bidirectional Encoder Representations from Transformers, builds upon the Transformer’s encoder and integrates a bidirectional self-attention mechanism. This enhancement enables BERT to gain a more comprehensive understanding of the context surrounding each word in the text. It is crucial to clarify that BERT includes only the encoder component of the Transformer, rendering it a “non-generative” model focused on producing sophisticated language representations for downstream tasks like classification.

2.4 LLMs

LLMs refer to Transformer-based models with billions of parameters, pre-trained on vast amounts of text data.2 For example, GPT-3 is equipped with 175 billion parameters and utilizes a pre-processed dataset of 570GB [Brown et al., 2020]. The data sources for pre-training typically include web pages, conversational texts, books, multilingual and scientific texts, and program code, all subjected to quality filtering, de-duplication, and removal of privacy-sensitive data to enhance quality and protect privacy.
Architecture. The architecture of LLMs primarily falls into one of three possible categories: (i) encoder-only models like BERT [Devlin et al., 2019]; (ii) encoder-decoder models such as T5 (Text-to-Text Transfer Transformer) [Raffel et al., 2023]; and (iii) decoder-only models, exemplified by GPT-3. Another notable architecture is the Mixture of Experts, speculated to be used in GPT-4, which employs multiple specialized sub-models, or “experts,” to improve scalability.
Fine-Tuning. LLMs undergo fine-tuning with domain-specific datasets to enhance performance on particular tasks [Howard and Ruder, 2018]. A notable instance is OpenAI’s Codex [Chen et al., 2021b], based on GPT-3 and fine-tuned for coding tasks. Additionally, LLMs typically require: (i) Instruction tuning: training with instruction-formatted datasets to better follow user-given natural language instructions; (ii) Alignment tuning: employing methods like Reinforcement Learning from Human Feedback (RLHF) [Christiano et al., 2023] to better align the models with human values such as helpfulness, harmlessness, and honesty [Liu et al., 2024e].
Basic Prompting Strategies. LLMs are applied to various tasks. Enhancing their effectiveness, especially for complex tasks, involves developing effective prompting strategies [Sahoo et al., 2024]. Basic strategies include: (i) In-Context Learning (ICL): based on task descriptions, with added examples or demonstrations [Brown et al., 2020]. Techniques like Retrieval-Augmented Generation (RAG) [Lewis et al., 2020] are used to provide appropriate examples; (ii) Chain-of-Thought (CoT): includes zero-shot CoT [Kojima et al., 2022] with prompts like “Let’s think step by step,” and few-shot CoT [Wei et al., 2023a], which integrates intermediate reasoning steps into prompts. More complex strategies, such as Tree-of-Thought (ToT) [Yao et al., 2023a], Graph-of-Thought (GoT) [Besta et al., 2024], and self-consistency [Wang et al., 2023g], are also employed to enhance CoT.
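These basic strategies amount to different ways of assembling the prompt string before it is sent to the model. The sketch below illustrates zero-shot CoT and few-shot ICL prompt construction; the function names and the toy arithmetic task are invented for illustration:

```python
def zero_shot_cot(question: str) -> str:
    """Zero-shot CoT: append a reasoning trigger to the bare question."""
    return f"Q: {question}\nA: Let's think step by step."

def few_shot_icl(question: str, examples: list[tuple[str, str]]) -> str:
    """ICL: prepend worked (question, answer) demonstrations to the task.

    In RAG-style setups, the demonstrations would be retrieved from a
    corpus based on similarity to the question rather than fixed here.
    """
    demos = "\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{demos}\nQ: {question}\nA:"

examples = [("2 + 2?", "4"), ("3 + 5?", "8")]  # toy demonstrations
prompt = few_shot_icl("7 + 6?", examples)
print(prompt)
```

Few-shot CoT combines both ideas: the demonstrations themselves contain intermediate reasoning steps, so the model imitates the reasoning format as well as the answer format.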
Abilities of LLMs. As summarized in Zhao et al. [2023b], LLMs demonstrate diverse capabilities: (i) Language generation, or conditional text generation, involves creating text that meets specific requirements for tasks like summarization, translation, and question answering, including text in natural languages, mathematical formulas, or program code; (ii) Knowledge utilization refers to the ability of LLMs to accomplish knowledge-intensive tasks (e.g., common sense question answering) based on supporting factual evidence. Specifically, it requires LLMs to properly utilize the rich factual knowledge from the pre-training data or retrieve external data when necessary; (iii) Reasoning refers to the ability to understand and utilize supporting evidence or logic to derive conclusions or make decisions. The main types of reasoning include knowledge reasoning to use logical relations and knowledge to answer the given question, symbolic reasoning to manipulate symbols in a formal rule setting to fulfill some specific goal, and mathematical reasoning to utilize mathematical knowledge and logic for solving mathematical problems or generating proof statements; (iv) Human alignment, ensuring models conform to human values like truthfulness, honesty, and safety; (v) Interaction with the external environment, enabling models to receive feedback and perform actions based on behavioral instructions, for instance, generating detailed action plans in natural language or other formats based on the natural language-based feedback; and (vi) Tool manipulation, using external tools like search engines, calculators, and APIs to enhance task performance. Note that these capabilities, while categorized, often overlap in practical applications, with tasks frequently requiring a combination of different abilities.
Multimodal LLMs (MLLMs). MLLMs extend the capabilities of traditional text-based LLMs to include understanding multiple modalities like text, images, audio, and video. These models enhance context understanding and interaction by integrating data across these modalities. There are two main approaches for handling multimodal information within MLLMs. In early studies, fusion mechanisms were explored to integrate features from the various modalities at different stages (early, mid-level, or late fusion), depending on the purpose. Recently, unified modeling [Hu and Singh, 2021], such as OpenAI’s GPT-4o [OpenAI, 2024] and Google’s Project Astra [Deepmind, 2024], which processes various data types through a consistent framework rather than at specific fusion points, is becoming mainstream. For instance, Transformers can manage inputs from different modalities by adjusting their input layers, such as using position encoding for text and spatial encoding for images.

2.5 Diffusion Models

Diffusion models [Ho et al., 2020; Sohl-Dickstein et al., 2015] represent a class of generative models that simulate a diffusion process to generate data. This process typically involves two phases: a noise-adding phase that gradually introduces noise until the data becomes completely random, and a denoising phase that reconstructs the original data from the noise. The field primarily builds on three key publications. Denoising Diffusion Probabilistic Models (DDPM) [Ho et al., 2020] are generally considered pioneering in the field, introducing noise through an ordered Markov chain and reversing this process by learning a step-by-step denoising model. Simultaneously, Noise Conditional Score Networks [Song and Ermon, 2019] introduce a score matching-based method that uses conditional score networks to estimate the gradient (score) of data at various noise levels; this score guides the data from a noisy state back to a clean state. Following this, score-based Stochastic Differential Equations (SDEs) [Song et al., 2021] place the score-based diffusion model within the framework of SDEs, directly simulating a continuous-time SDE to generate or denoise data.
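A useful property of the DDPM forward (noise-adding) process is that the noisy sample x_t can be drawn directly from the original data x_0 in closed form: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, where alpha_bar_t is the cumulative product of (1 - beta) up to step t and eps is standard Gaussian noise. A minimal NumPy sketch (the schedule and data below are toy values):

```python
import numpy as np

def forward_diffuse(x0: np.ndarray, t: int, betas: np.ndarray,
                    rng: np.random.Generator) -> np.ndarray:
    """Sample x_t ~ q(x_t | x_0) in closed form (DDPM forward process)."""
    alpha_bar = np.prod(1.0 - betas[: t + 1])  # cumulative signal retention
    eps = rng.standard_normal(x0.shape)        # noise a model would learn to predict
    return np.sqrt(alpha_bar) * x0 + np.sqrt(1.0 - alpha_bar) * eps

rng = np.random.default_rng(0)
betas = np.linspace(1e-4, 0.02, 1000)  # linear noise schedule, as in DDPM
x0 = rng.standard_normal(16)           # toy "data" sample
x_noisy = forward_diffuse(x0, t=999, betas=betas, rng=rng)  # near-pure noise
```

Training a DDPM then reduces to teaching a network to predict eps from (x_t, t); generation runs the learned denoising steps in reverse, which is why sampling requires many iterations.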
Diffusion models facilitate two types of generation: unconstrained generation, which synthesizes samples from pure noise without guiding data, and constrained generation, which utilizes additional information such as class labels, text descriptions, or other images to steer the output toward specific results. Due to the inherent strengths of diffusion models, their applications have broadened well beyond initial CV tasks. These advantages include (1) effective capture of the complexity of high-dimensional data distributions, (2) support for various data types, and (3) a gradual denoising generation process that facilitates the production of high-quality, complex samples. Consequently, diffusion models are now applied to a diverse array of generative tasks. These applications extend to natural language [Zou et al., 2023], tabular data [Kim et al., 2022], 3D models [Lin et al., 2023b], medical design [Xie et al., 2022], and even (imitation-based motion) planning [Janner et al., 2022]. The primary drawback of diffusion models is their computational speed and cost, as these models typically necessitate hundreds of iterations in the denoising process.

2.6 Related Surveys and Reviews

The rapidly expanding research field of GenAI encompasses a variety of surveys and reviews that delineate the state of the art and directions for future research.
Firstly, there is a wealth of literature reviewing LLMs and diffusion models, encompassing general surveys [Yang et al., 2023e] as well as more specialized reviews focusing on particular technical aspects, such as hallucinations [Huang et al., 2023c] and trustworthiness [Sun et al., 2024a] in LLMs. These reviews provide specific technical details and applications of GenAI, enhancing our basic understanding of these technologies. In the field of SE, extensive literature [Fan et al., 2023b; Zhang et al., 2023a, 2024a] details the application of LLMs to enhance processes across the software development lifecycle, including requirements engineering, design, development, quality assurance, and maintenance. These studies illuminate potential advantages for engineering adaptive systems. Within the context of autonomous systems, several works [Guo et al., 2024a; Xi et al., 2023; Wang et al., 2024b] discuss the augmentation of agent components by LLMs, covering profiling (which defines an agent’s role) [Chen et al., 2024f], perception, memory, decision-making, and action modules. These enhancements are particularly advantageous for the analysis and planning stages of adaptive systems. In the sphere of Human–Computer Interaction (HCI), research [Shi et al., 2024a] reviews interactions between humans and GenAI. It explores GenAI’s generative capabilities across various modalities (textual, 2D visual, audio, and 3D graphics) and their applications in fields like writing, programming, and education. These insights are invaluable for integrating HOTL approaches in adaptive systems. Furthermore, LLMs are increasingly prevalent in specialized application areas such as intelligent transportation systems [Yan and Li, 2023], autonomous driving [Cui et al., 2023; Yang et al., 2023b], and robotics [Zeng et al., 2023b]. The innovative approaches from these fields may provide transferable insights into general methodologies for adaptive systems.
Additionally, improvements in specific technologies like evolutionary computation [Cai et al., 2024b] and Reinforcement Learning (RL) [Pternea et al., 2024; Zhu et al., 2024] offer enhanced planning within SASs. While there will be some overlap with the previously mentioned surveys and reviews in the selection of literature, this article is dedicated to providing a literature overview and discussing future research challenges, with a distinct perspective on SASs.
Additionally, another directly relevant study is [Li et al., 2024c], where we initially explored the potential of LLMs in SASs. This article expands that initial study in the following aspects: Firstly, within the MAPE-K loop, we introduce three new categories: enhancing planning methods (Section 4.2.6), LLMs as planners (Section 4.2.7), and diffusion models as planners (Section 4.2.8), which are specifically relevant to aspects of autonomy in SASs. Secondly, we further refine the categories within both MAPE-K and HOTL, detailing the specific contributions of each study referenced; in our initial study, each category typically highlighted only one piece of literature. Finally, this article introduces a new section that outlines potential issues and discusses research challenges for the integration of GenAI into SASs, reified in a roadmap.

3 Literature Search and Selection Methodology

This section outlines our methodology for systematically searching and selecting GenAI literature relevant to SASs, focusing on targeted conferences, specific keywords, and rigorous selection criteria to ensure the inclusion of the most relevant and timely studies.

3.1 Literature Search

Given the rapid expansion of the GenAI research field, comprehensively covering all existing literature is impractical. Therefore, our literature search strategy focuses on sourcing GenAI publications relevant to SASs from top conferences across related fields and categorizing this literature. We conducted our literature search using the following criteria.
Target Conference. Given the topics of MAPE-K and HOTL, we conducted a literature search targeting leading conferences across various related fields. These included SAS (SEAMS, ACSOS), SE (ICSE, ASE, FSE, RE), AI (AAMAS, ACL, ICLR, IJCAI, NeurIPS, AAAI, ICML, GECCO), HCI (CHI, UIST), and Robotics (CoRL, ICRA). Additionally, workshops and companion proceedings of these conferences were also included.
Keywords. We used the keywords Transformer, BERT, T5, GPT, pre-train, language model, LLM, ChatGPT, generative, and diffusion, as these terms relate directly to the topic of GenAI discussed in this article.
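A title-level keyword screen of this kind is straightforward to script. The sketch below uses case-insensitive substring matching over the keywords listed above; the article titles are invented for illustration, and a real pipeline would add the semantic filtering described in Section 3.2:

```python
# Keywords from the search protocol, lowercased for matching.
KEYWORDS = ["transformer", "bert", "t5", "gpt", "pre-train", "language model",
            "llm", "chatgpt", "generative", "diffusion"]

def matches_keywords(title: str) -> bool:
    """Case-insensitive substring match of any search keyword in the title."""
    t = title.lower()
    return any(kw in t for kw in KEYWORDS)

titles = [
    "Self-Healing Microservices with LLM-Guided Planning",  # hypothetical
    "A Survey of Runtime Models in Adaptive Systems",       # hypothetical
    "Diffusion Policies for Robot Navigation",              # hypothetical
]
hits = [t for t in titles if matches_keywords(t)]
print(hits)  # the first and third titles match
```

Substring matching deliberately over-selects (e.g., "diffusion" in a physics sense would match); the subsequent abstract-level screening step removes such false positives.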
Publication Year. We collected literature from 2017 until June 2024, as the Transformer was introduced in 2017 [Vaswani et al., 2017]. For diffusion, we collected literature from 2020 onwards, as the concept of denoising diffusion is generally considered to have been proposed in 2020 [Ho et al., 2020]. Additionally, to ensure timeliness, our policy was to include up-to-date literature as much as possible, with SEAMS, ICSE, AAMAS, ICLR, AAAI, CHI, and ICRA including literature from 2024. For RE, we only included main conference articles and not workshop articles, as the workshop articles were not yet publicly available when we searched for literature.
Source. For articles published in conference proceedings, we obtained the articles from official databases such as IEEE Xplore and the ACM Digital Library. For articles that had not yet been officially published, we first collected article titles from the official conference program and then collected the full articles through preprint platforms such as OpenReview or arXiv. In total, 18 articles (5 from RE and 13 from ICRA) were not available on these platforms and were not included in our review.
Search Results. As a result of our literature search, we obtained a total of 5,874 pieces of literature. The breakdown is as follows: 3 from SAS [Li et al., 2024c; Nascimento et al., 2023; Sarda, 2023], 302 from SE, 5061 from AI, 245 from HCI, and 228 from Robotics.

3.2 Literature Selection and Categorization

We screened and categorized the searched literature based on the following steps.
Relevance to GenAI. To confirm the relevance to GenAI, we initially scrutinized the abstracts to ensure that terms like “language model,” “generative,” and “diffusion” used in the titles align with the context of GenAI discussed in this article. In this process, we primarily filtered out non-transformer-based language models, such as LSTM and RNN, and instances where these terms are used with different implications, such as “diffusion” which refers to a physical concept in dynamics rather than a denoising process of data. In this step, we excluded 1,401 pieces of literature, leaving a total of 4,473 articles remaining for the next analysis.
Relevance to SAS. After confirming the relevance of the selected works to GenAI, we further assessed their direct connection to SASs. Our primary focus here was to determine whether these studies were relevant to the topics of MAPE-K or HOTL.
To determine the relevance of articles, we started by excluding studies that focus on specifics of GenAI only (e.g., improvements in Transformer, application of LLMs in text generation, collaboration in writing). We then applied the following set of selection rules. Firstly, we omitted dataset-related literature that, while potentially enhancing the evaluation of SASs, does not directly improve their functionality. Secondly, for the topic of “monitor,” we excluded literature focused on vision-based scene perception (such as object recognition, scene segmentation, and entity relation extraction), and predictions in various applications, including action prediction, posture prediction, and biochemical predictions like proteins and weather forecasting. However, we retained more general detection and prediction methods and literature that is more relevant to SAS domains, such as traffic flow prediction and log detection. Thirdly, for “knowledge,” we did not consider natural language-based knowledge. For instance, Hou et al. [2024] encapsulates factors like context and occurrence time into one “memory” to simulate human recall of past experiences. Fourthly, in relation to “analysis & planning” (i.e., decision-making), we primarily focused on two types of studies. The first type pertains to leveraging GenAI to strengthen the “seven waves” [Weyns, 2020], i.e., the established approaches for engineering SASs. The second type mainly involves utilizing GenAI to realize or enhance decision-making. We excluded studies on the application of Transformers in communication, as they are often very technical in nature, although they may have some potential applications in distributed planning settings. For example, literature such as [Inala et al., 2020] explored the use of Transformers to generate communication graphs, which aim to minimize communication in multi-agent planning scenarios.
In relation to HOTL, and in particular “preference acquisition,” we excluded the literature related to emotion detection, such as text-based empathy detection and depression detection, as well as literature focusing on the human alignment of Transformers and LLMs. Fifthly and lastly, we did not include research on code generation, a prominent topic in SE. While we acknowledge that automatic code generation may assist in the development and evolution of adaptive systems, its contributions are deemed indirect.
To confirm the relevance, two authors independently reviewed each article, and a third author was involved to discuss and resolve conflicts. As a result, we obtained a final set of 219 publications.
First- and Second-Level Categorization. We began by establishing a preliminary categorization, using MAPE-K and HOTL as the two fundamental first-level categories. For further refinement, we subdivided MAPE-K into monitor, analyzer and planner, executor, and knowledge. Due to the frequent difficulty of distinguishing between analyzer and planner in various studies, we combined them into one category. For HOTL, we devised a secondary categorization comprising preference acquisition, transparency, and collaboration. These categories directly correspond to the three main purposes of integrating HOTL within an SAS. Based on these primary and secondary categories, we proceeded with a preliminary classification of the literature.
Further Categorization Refinement. Subsequently, based on the primary and secondary categorizations mentioned above, we discussed further subdivisions of the literature. Since the criteria for these subdivisions vary for each secondary category, we refrain from detailing the methods for further subdivision in this section. Instead, the criteria will be introduced separately in the subsequent sections, tailored to the specific nuances of each category.
Results of Literature Selection and Categorization. Figure 3 summarizes the results of our literature selection and classification, where the numbers following each category represent the number of articles within that category. We have made the specific classification and the complete list of literature publicly available, which can be accessed at https://github.com/545659928/GenAI4SAS.
Fig. 3.
Fig. 3. Literature categorization overview. One piece of literature may be involved in multiple categories.

4 Enhancing the Modules in MAPE-K Feedback Loops

This section discusses GenAI’s potential enhancements to the MAPE-K modules, derived from our review of the literature. Figure 4 summarizes the results. Note that we deal with the analyzer and planner together, as the distinction between these roles is often blurred in studies.
Fig. 4.
Fig. 4. Overview of empowerment of MAPE-K modules via GenAI.

4.1 Monitor

In SASs, the primary tasks of the monitor are: (i) detecting changes within the managed system and its context, i.e., its operational environment, and reflecting these changes in the runtime models; and (ii) determining whether to trigger the analyzer [Weyns, 2020]. A well-known work that puts the emphasis on the monitoring function (and context relevance in particular) is DYNAMICO [Villegas et al., 2013]. Traditionally, both monitoring tasks are realized by manually defined mechanisms. Leveraging the Transformer’s ability to handle long-range dependencies in data, as well as the reasoning capabilities of LLMs, has the potential to enhance self- and context-awareness of an SAS significantly.

4.1.1 Context Understanding.

Context understanding involves two key areas: data structuring and anomaly detection, with the former typically translating machine-unreadable data into a machine-readable form, and the latter often serving as a trigger for adaptation.
Data Structuring. Data structuring transforms unstructured data into a structured format to aid in storing observed data as knowledge and to support decision-making. In the realm of SE, there has been notable progress in using LLMs for log parsing. For instance, LLMParser [Ma et al., 2024a] enhances parsing accuracy through fine-tuning different open-source LLMs, and [Le and Zhang, 2023; Wang et al., 2023d] explore the impact of various prompt strategies. Chen et al. [2024c] utilizes a prefix-tree to help LLMs match the most suitable log output format, thereby enhancing parsing efficiency and accuracy. The above studies report a notable 95% average accuracy, significantly higher than state-of-the-art parsers. However, research by Astekin et al. [2024] highlights the lack of determinism in LLM outputs for log parsing, showing that the results across six LLMs and 16 system logs are unstable even at a temperature setting of zero. Additionally, some studies have explored the use of LLMs as interfaces to data repositories to automate data operations. For example, Li et al. [2023b] investigated the potential of translating text to SQL queries over real-world “dirty” datasets. The study involved 95 datasets (33.4 GB) across 37 professional fields, revealing that even with GPT-4, the execution accuracy is only 54.89%.
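To make the parsing step concrete, the sketch below shows the shape of a few-shot template-extraction prompt and a deterministic stand-in for the model call. The prompt wording and the `mask_variables` heuristic are illustrative only and are not taken from any of the cited tools.

```python
import re

def build_parse_prompt(log_line):
    # Illustrative few-shot prompt shape used by LLM-based log parsers.
    return (
        "Abstract the variable parts of the log line as <*>.\n"
        "Log: Connected to 10.3.5.4\nTemplate: Connected to <*>\n"
        f"Log: {log_line}\nTemplate:"
    )

def mask_variables(log_line):
    # Deterministic stand-in for the LLM call: mask IPs, hex tokens, and numbers.
    return re.sub(r"\b(?:\d+\.\d+\.\d+\.\d+|0x[0-9a-fA-F]+|\d+)\b", "<*>", log_line)
```

A real parser would send `build_parse_prompt(...)` to the model and cache the returned template, for example in the prefix-tree structure of Chen et al. [2024c].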
Anomaly Detection. Many studies have utilized Transformers for unsupervised anomaly detection by capturing contextual and positional semantics within text [Le and Zhang, 2021; Xu et al., 2022b]. For instance, Xu et al. [2022b] enhances the anomaly detection capabilities of Transformers by implementing an Anomaly-Attention mechanism that amplifies the distinguishability between normal and abnormal patterns. Specifically for log-based anomaly detection, Ma et al. [2024c] fine-tunes BERT to learn universal log representations through three fine-tuning tasks: (1) leveraging abbreviations to enhance the understanding of abbreviations, (2) leveraging natural language descriptions of logs to enhance the understanding of domain-specific terminology, and (3) utilizing log templates conveying the same semantics across different vendors (e.g., WIFI router logs from Cisco and Huawei). The results indicate that this method achieves an average F1 score of 0.8 in the task of risk log identification. Regarding LLM-based log anomaly detection, Liu et al. [2024d] addresses the online scenario where logs originate from diverse application environments. These logs often change in format and content due to regular software updates. The study introduces a set of prompt strategies tailored for log analysis tasks. The effectiveness of these strategies was evaluated through their performance in log parsing, achieving an average F1 score of 0.797, and in anomaly detection, with an average F1 score of 0.412.
Others. Additionally, Transformers are also used in sensor information fusion [Shao et al., 2022] and (unobservable) state estimation [Yoneda et al., 2024]. For instance, Yoneda et al. [2024] employs LLMs to maintain, through reasoning, an estimate of the world state, which is usually unobservable. For example, after the action of moving a cup, the changed position of the cup is inferred.

4.1.2 Context Prediction.

Context prediction is important because it can identify potential future target violations, thereby proactively triggering adaptation. Here we discuss two types: time series data, which is usually quantitative and often measured at regular intervals, and event sequences, which emphasize the order and timing of events without necessarily adhering to a uniform time scale.
LLM-Based Time Series Forecasting. In earlier studies, many improvements to Transformer-based forecasting have been proposed [Cao et al., 2023; Chen et al., 2024g; Huang et al., 2023a; Jiang et al., 2023a; Liu et al., 2022; López-Ruiz et al., 2022; McDermott et al., 2023; Tang and Matteson, 2021; Tang and Zhang, 2023; Wen et al., 2023b; Wu et al., 2020; Zhang and Yan, 2023; Zeng et al., 2023a; Zhou et al., 2022]. As the state of the art among generalized Transformer-based models, researchers from Google introduced TimesFM (Time-series Foundation Model) [Das et al., 2024b]. This model showcases competitive zero-shot performance across various public datasets, highlighting its robustness across different forecasting history lengths, prediction lengths, and temporal granularities.
Regarding LLM-based methods, LLMTIME [Gruver et al., 2023] proposes a zero-shot time series forecaster that encodes numbers as text and samples possible extrapolations as text completions. Additionally, the article presents two interesting findings: (i) LLMs can naturally accommodate missing data; and (ii) the uncertainty calibration of GPT-4 (how well a model’s predicted probabilities reflect the actual likelihood of outcomes) is less reliable than that of GPT-3, likely due to interventions such as RLHF. Zhou et al. [2023b] trains a language model that achieves state-of-the-art or comparable performance in all major types of time series analysis tasks, including short/long-term forecasting, imputation, and few-shot and zero-shot forecasting. Furthermore, both theoretically and empirically, they found that the self-attention mechanism performs a function similar to Principal Component Analysis (PCA), which helps explain the universality of Transformer models in handling various data analysis tasks. Jin et al. [2024] introduces Time-LLM, which translates time series into text prototype representations that are more naturally processed by LLMs. This approach augments the input context with declarative prompts, such as domain expert knowledge, to guide the LLM’s forecasting. Cao et al. [2024] proposes a more complex processing pipeline for time series analysis that includes (a) decomposing the time series input into trend, seasonality, and residual components, (b) embedding and feeding each component into a pre-trained GPT model separately, and (c) recombining the outputs to form the final prediction. The majority of these studies demonstrate that LLM-based forecasting surpasses mainstream specialized models.
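The number-to-text encoding behind LLMTIME can be illustrated with a small sketch. This is a simplification: the actual implementation also rescales values and adapts the digit separator to the tokenizer at hand.

```python
def encode_series(values, precision=1):
    # LLMTIME-style encoding: fixed precision, drop the decimal point,
    # space-separate digits so the tokenizer treats each digit uniformly.
    encoded = []
    for v in values:
        digits = f"{v:.{precision}f}".replace(".", "")
        encoded.append(" ".join(digits))
    return " , ".join(encoded)

def decode_series(text, precision=1):
    # Inverse mapping, applied to the LLM's sampled text completion.
    return [int(chunk.replace(" ", "")) / 10 ** precision
            for chunk in text.split(" , ")]
```

The forecaster then feeds `encode_series(history)` to the model as a plain text prompt and decodes the sampled continuation back into numbers.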
Diffusion Model-Based Time Series Forecasting. The concept of “diffusion” has effectively extended to time series analysis, demonstrating significant performance in time series forecasting and imputation. TimeGrad [Rasul et al., 2021], the pioneering DDPM-based work in this area, injects noise into data at each predictive time point, followed by a gradual denoising process using a backward transition kernel conditioned on historical time series data. Subsequent studies focus on improving the performance and reducing the training costs of this method [Fan et al., 2024b; Kollovieh et al., 2023; Shen et al., 2024]. In addition to these methods for multivariate time series, diffusion models have also been adapted for spatio-temporal graphs, which incorporate time and spatial relationships between different entities, as in traffic prediction. Notable works include DiffSTG [Wen et al., 2023a] and Graph Convolution Recurrent Denoising Diffusion [Li et al., 2023c]. These models have shown their effectiveness across thousands of dimensions and have achieved state-of-the-art performance on multiple real-world datasets.
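As a minimal illustration of the forward process these DDPM-based forecasters build on, the sketch below computes the closed-form noising q(x_t | x_0); the learned denoising network that, e.g., TimeGrad conditions on historical data is omitted, and the function names are ours.

```python
import math, random

def forward_diffuse(x0, t, betas, rng):
    # Closed-form DDPM forward process q(x_t | x_0):
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
    # with alpha_bar_t the cumulative product of (1 - beta_s) for s <= t.
    alpha_bar = 1.0
    for beta in betas[: t + 1]:
        alpha_bar *= 1.0 - beta
    return [math.sqrt(alpha_bar) * x + math.sqrt(1.0 - alpha_bar) * rng.gauss(0.0, 1.0)
            for x in x0]
```

Training teaches a network to invert this corruption step by step; forecasting then runs the learned reverse chain starting from pure noise.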
Diffusion model-based prediction and generation have also been specifically applied across various application domains. In SE, Maat [Lee et al., 2023] uses diffusion models to forecast future performance metrics in cloud services and employs an additional detector to identify impending anomalies. In autonomous driving, Generative AI for Autonomy (GAIA-1) [Hu et al., 2023b] explores leveraging video, text, and action inputs to generate realistic driving scenarios in the manner of video. Specifically, GAIA-1 demonstrates its ability to understand and finely control static and dynamic concepts such as the distribution of buildings and traffic lights, comprehend 3D assemblies like pitch and roll induced by road irregularities, and grasp decision causality, such as the reactions of road users. In traffic scenarios, diffusion models have been applied to predict and generate information including the distribution of vehicle poses, orientations, and trajectories across different geographical regions [Lu et al., 2024; Pronovost et al., 2023; Zhong et al., 2023a, 2023b].
Furthermore, the imputation of time series data has also been extensively explored with methods such as Conditional Score-based Diffusion for Imputation [Tashiro et al., 2021]. Yang et al. [2023d] utilizes Diffusion+, a sample-efficient diffusion model, to impute data that trains another (non-diffusion) prediction model, thereby enhancing cloud failure prediction at Microsoft 365.
LLM-Based Event Sequence Prediction. Regarding Transformer-based prediction, Zhu et al. [2023b] focuses on script event prediction by incorporating event-level knowledge into the fine-tuning of Transformers, thus capturing inter-event relationships more effectively. GraphBERT [Du et al., 2022] automatically constructs event graphs, which are similar to state machines, from natural language descriptions. Shou et al. [2023] incorporates causal reasoning over time-event sequences into Transformers to enhance predictive accuracy. Regarding LLM-based methods, Shi et al. [2023] introduces the Language Model in Event Prediction framework. This framework employs an event sequence model to generate multiple prediction candidates, which are then evaluated through abductive reasoning by LLMs. The LLMs match patterns against actual previous events and retrieve the most pertinent sequences. A ranking model then selects the predictions with the strongest support from the retrieved evidence. Additionally, the Geometrically Grounding LLM (GG-LLM) [Graule and Isler, 2024] forecasts human activities to aid human-aware task planning. For instance, if a human is observed holding a laundry basket, GG-LLM would advise a cleaning robot against cleaning the laundry room at that time. GG-LLM incorporates a semantic map detailing room locations and item placements and is fine-tuned on extensive text corpora describing typical human behaviors, enabling it to learn likely sequences of human actions and activities.
Summary—Monitor. GenAI, and LLMs in particular, offer a huge potential to support the monitor function of SASs in two particular directions: context understanding and context prediction. Regarding context understanding, LLMs have the potential to enhance the structuring of unstructured data collected by the monitor and to facilitate anomaly detection, crucial features for making sense of the growing amounts of data that systems face. Regarding context prediction, LLM-based and diffusion-based methods offer the potential to enhance the monitor function with time series forecasting and event sequence prediction, which are key to identifying potential future target violations.

4.2 Analyzer and Planner

The analyzer and planner play pivotal roles in self-adaptation. The main tasks of the analyzer are exploring possible configurations for adaptation (i.e., adaptation options) and evaluating them, while the main tasks of the planner are selecting the best adaptation option based on the adaptation goals and generating a plan to adapt the managed system to this new configuration [Weyns, 2020]. However, it is often not easy to distinguish between these roles, as the functions of the analyzer and the planner may be integrated (often referred to as decision-making). Hence, we deal with them together. We start by describing how LLMs have the potential to enhance different aspects of engineering SASs based on the “seven waves” of research interests within the research community [Weyns, 2020]. This discussion extends from Section 4.2.1 through Section 4.2.5. Next, we examine how LLMs have the potential to augment planning techniques that are generally used in SASs in Section 4.2.6. Finally, we introduce two new planning paradigms for SASs that leverage the direct use of LLMs and diffusion models as planners, respectively. These paradigms are described in Sections 4.2.7 and 4.2.8.

4.2.1 Architecture-Based Adaptation.

Architecture-based adaptation centers on leveraging software architecture to realize self-adaptation, reflected in two complementary functions. First, architecture allows abstracting the design of SASs through layers and system components. A seminal model in this approach is the three-layer reference model of Kramer and Magee [Sykes et al., 2008], which delineates the system’s operations across three layers: (a) the goal management layer, responsible for generating action plans; (b) the change management layer, tasked with configuring components per these plans; and (c) the component layer, handling the operations of these components. The Formal Model for Self-Adaptation (FORMS) formalizes this structure [Weyns et al., 2012b]. Second, architecture enables the system to exploit high-level models to reason about adaptation options, potentially system-wide. Characteristic works in this area over time include Rainbow [Garlan et al., 2004], Models at Runtime [Blair et al., 2009], QoSMOS [Calinescu et al., 2011], proactive adaptation [Moreno et al., 2015], and ActivFORMS [Iftikhar and Weyns, 2014].
Recent developments in LLMs reflect similar principles, which can be considered as chains of thought (CoTs) with external tool calls [Inaba et al., 2023]. For specific problems or goals, an LLM initially segments the problem into sub-problems, either sequentially or hierarchically, selects appropriate components (often APIs) for addressing each sub-problem, and accurately deploys and calls these components. For instance, HuggingGPT [Shen et al., 2023] uses LLMs as controllers to orchestrate existing AI models with language interfaces. HuggingGPT selects AI models based on their functional descriptions from Hugging Face and employs them for executing complex tasks across language, vision, and speech domains, demonstrating robust performance. As another example, ToolLLM [Qin et al., 2024] tackles problem-solving by generating sequences of API calls from a pool of 16,464 real-world APIs.
Furthermore, some studies focus specifically on the aspect of component selection, akin to the change management layer. Schick et al. [2023] introduces Toolformer, a model trained explicitly to determine which APIs to use, the timing of their invocation, and the parameters to be passed. Zohar et al. [2023] introduces Language-Only Vision Model Selection, which facilitates model selection and performance prediction based solely on textual descriptions of the application. Alsayed et al. [2024] proposes MicroRec, a framework designed to recommend or select microservices using information from README files and Dockerfiles. Lastly, Zhuang et al. [2024] models the API call space as a decision tree, where nodes represent API function calls with their cost functions, and uses the A* algorithm to find efficient call paths.
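The decision-tree view of Zhuang et al. [2024] can be sketched as a standard A* search over API nodes. The graph, cost table, and function names below are hypothetical stand-ins, not the paper's actual interface.

```python
import heapq

def a_star_calls(graph, costs, heuristic, start, goal):
    # A* over a hypothetical API-call graph: nodes are API calls, the edge cost
    # is the invocation cost of the successor, and `heuristic` estimates the
    # remaining cost to the goal (admissible => optimal path).
    frontier = [(heuristic(start), 0, start, [start])]
    visited = set()
    while frontier:
        _, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path, g
        if node in visited:
            continue
        visited.add(node)
        for nxt in graph.get(node, []):
            g2 = g + costs[nxt]
            heapq.heappush(frontier, (g2 + heuristic(nxt), g2, nxt, path + [nxt]))
    return None, float("inf")
```

With a trivial zero heuristic this degenerates to uniform-cost search; an LLM-derived estimate of the remaining calls would play the heuristic role.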

4.2.2 Requirements-Driven Adaptation.

Requirement-driven adaptation puts the emphasis on the requirements as the driver of adaptation, treating them as first-class citizens. Notable methods include RELAX, a language that facilitates the relaxation of requirements to address uncertainties [Whittle et al., 2009], and awareness and evolution requirements reified in the ZANSHIN framework, which introduced meta-requirements for determining adaptation and its actual execution respectively [Silva Souza et al., 2011]. We explore the potential of GenAI through three key aspects of requirement management: specification, operationalization, and change.
Requirement Specification. Specifying requirements involves defining the objectives that the system should fulfill. Central to self-adaptation are quality requirements [Weyns et al., 2012a]. In this context, LLMs may significantly alleviate the modeling burden. For example, LLMs have been used to convert requirements expressed in natural language into formal specification languages such as Linear Temporal Logic (LTL) or a user-given domain-specific modeling language, as demonstrated in Izquierdo et al. [2024] and Yang et al. [2024b].
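In practice, this translation step amounts to prompting the model and validating its output against a formula parser. The toy stand-in below covers just two recurring phrasings; a real system would call an LLM instead, and the function name is ours.

```python
def nl_to_ltl(sentence):
    # Toy stand-in for the LLM translation step: maps two common natural
    # language patterns to the standard LTL operators G (always) and F (eventually).
    s = sentence.lower().strip(".")
    if s.startswith("always "):
        return "G (" + s[len("always "):] + ")"
    if s.startswith("eventually "):
        return "F (" + s[len("eventually "):] + ")"
    raise ValueError("pattern not covered by this sketch")
```

The appeal of the LLM-based approaches cited above is precisely that they generalize beyond such fixed patterns while an external checker guards against ill-formed formulas.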
Requirement Operationalization and Traceability. This aspect refers to aligning or synchronizing system elements with dynamic requirements, which is essential in requirement-driven adaptation [Sawyer et al., 2010]. As the traceability between high-level goals and components has been discussed under architecture-based adaptation, here we discuss linking within requirements and linking from requirements to the code level. For linking within requirements, Preda et al. [2024] applies LLMs to the task of high-level to low-level requirements coverage reviewing, confirming their strong ability to map between high-level abstract requirements and low-level scenario-specific requirements. For linking to the code level, T-BERT [Lin et al., 2021] effectively creates trace links between source code and natural language artifacts, achieving F1 scores between 0.71 and 0.93 across various datasets. Similarly, BERT4RE [Ajagbe and Zhao, 2022] fine-tunes BERT to support establishing traceability links for a wide range of requirements.
Requirement Change. Requirement change is a crucial aspect of an adaptive system’s capability to modify its objectives based on changes, particularly in the environmental context, representing a significant challenge within requirement-driven adaptation [Weyns, 2020]. LLMs have shown promising potential in addressing this challenge from three perspectives. Firstly, LLMs have been extensively utilized in RL, particularly in dynamic and complex environments, with a focus on reward design and reward shaping; the generated rewards have been shown to outperform manually designed ones. For instance, Kwon et al. [2023] validates the consistency between LLM-generated rewards and users’ objectives under zero-shot or few-shot conditions. Xie et al. [2024] emphasizes generating dense reward functions based on natural language descriptions of system goals and environmental representations. These ideas can be directly applied to dynamic requirement adjustments in adaptive systems. Secondly, requirements extraction and analysis often require inputs from multiple perspectives, including end-users, engineers, and domain experts. To address this, Nakagawa and Honiden [2023] propose a multi-LLM agent framework that enables LLM agents to assume various roles and iteratively refine system requirements through discussions. Originally designed for the requirements engineering phase, this framework is equally applicable to runtime requirements adaptation by equipping agents with up-to-date runtime context. Moreover, in situations involving requirements conflicts, negotiation- or debate-based approaches [Chan et al., 2024a; Hunt et al., 2024] have been shown to be potentially more effective than traditional discussion methods. Finally, leveraging LLMs’ capabilities for natural language interaction allows them to effectively capture user preferences from runtime feedback and integrate them into the system’s requirements; this aspect is covered in detail in Section 5.1.
Additionally, Transformers and LLMs have also been used for requirement classification [Hassani, 2024; Luo et al., 2023; Mehder and Aydemir, 2022; Varenov and Gabdrahmanov, 2021], dependency classification [Deshpande et al., 2021], and inconsistency detection [Fantechi et al., 2023; Feng et al., 2024]. These automation techniques could potentially assist in extending requirements engineering into the runtime phase as used in SASs.

4.2.3 Guarantees under Uncertainty.

“Guarantees under uncertainty” focuses on ensuring that an SAS complies with its adaptation goals despite the inherent uncertainties it faces. Formal verification techniques such as quantitative verification [Calinescu et al., 2011], statistical model checking [Weyns and Iftikhar, 2023], and proactive adaptation using probabilistic model checking [Moreno et al., 2016] are extensively studied for their ability to provide runtime evidence that an SAS complies with its requirements.
To the best of our knowledge, there is no research on using LLMs to directly enhance verification processes. However, several studies demonstrate how LLMs can automate or assist the modeling activities for model checking, potentially lowering entry barriers for developers. For instance, Yang and Wang [2024] employs LLMs to convert natural language network protocol descriptions into quantifiable dependency graphs and formal models, aiding the formal verification of next-generation network protocols. Other studies aim at converting natural language into LTL specifications [Izquierdo et al., 2024; Mavrogiannis et al., 2024; Yang et al., 2024b].
Furthermore, initial efforts have been made on the use of LLMs in theorem proving (in the context of both mathematics and programs). Welleck et al. [2022] fine-tunes GPT-3 for mathematical proof generation, reporting a correctness rate of about 40% on short proofs (2–6 steps). Han et al. [2022] extracts training data from kernel-level proofs to improve the Transformer’s next-step proof-tactic prediction, addressing the scarcity of training data for formal theorem proving. Thor [Jiang et al., 2022] allows a language model-based theorem prover to additionally call automated theorem provers (namely hammers [Czajka and Kaliszyk, 2018]) for premise selection, achieving performance comparable to the existing state of the art while reducing computational demand. First et al. [2023] proposes Baldur, a fine-tuned LLM for generating entire proofs, which proves to be as effective as search-based techniques but without the associated high costs. Baldur has also demonstrated capabilities in proof repair by utilizing additional context from previous failed attempts and error messages, proving an additional 8.7% of theorems compared to Thor. Additionally, [Wu et al., 2022a; Zhou et al., 2024c] attempt to automatically translate mathematical problems into formal specifications such as Isabelle (a formal theorem proving environment). Regarding program verification, LEMUR [Wu et al., 2023] combines LLMs and automated reasoners, where LLMs are employed to propose program invariants in the form of sub-goals, and reasoners are then used to verify their Boolean properties. Yao et al. [2023b] explores the use of LLMs to synthesize invariants and other proof structures necessary for demonstrating program correctness within the Verus framework, significantly reducing the effort required to (manually) write entry-level proof code.
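The division of labor in LEMUR-style approaches can be sketched as a propose-and-check loop, with a bounded checker standing in for the automated reasoner. All names below are illustrative, not LEMUR's actual interface.

```python
def propose_and_verify(candidate_invariants, verifier):
    # LEMUR-style division of labor: the LLM proposes candidate invariants
    # (here: a precomputed list of predicates), and an automated reasoner
    # (here: the `verifier` callable) accepts or rejects each one in turn.
    for inv in candidate_invariants:
        if verifier(inv):
            return inv
    return None

def bounded_check(inv, states):
    # Stand-in verifier: exhaustively check the invariant on a finite set of
    # reachable states (a real reasoner would discharge this symbolically).
    return all(inv(s) for s in states)
```

The key point is that the LLM never has to be trusted: unsound proposals are simply filtered out by the verifier.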

4.2.4 Control-Based Software Adaptation.

Control-based adaptation leverages the mathematical principles of control theory to implement and analyze adaptive systems, ensuring their key properties are maintained [Shevtsov et al., 2018]. A pioneering work in this area is the so-called push-button methodology that automatically generates and adjusts a controller at runtime [Filieri et al., 2014]. However, to the best of our knowledge, there is to date no research on using LLMs to directly augment control theory or its direct applications.
As outlined in [Weyns, 2020], the application of control theory to software adaptation encounters two main challenges: (i) the difficulty in precisely formulating a system model, particularly the mathematical model (usually linear) that captures the dynamic behavior of software systems, including defining key variables and the equations that govern their interactions; and (ii) the challenge of defining bidirectional mapping between SE’s non-functional requirements (such as performance and cost) and control theory’s properties (such as stability and overshoot). Given LLMs’ vast knowledge base regarding software, and their capability to identify important feature variables [Hollmann et al., 2023], LLMs could potentially contribute to overcoming these challenges.

4.2.5 Learning from Experience.

Learning from experience in SASs refers to the use of Machine Learning (ML) techniques to manage the growing scale and increasing complexity of uncertainty [Gheibi et al., 2021a]. A representative example is reducing large search or adaptation spaces, thereby enabling formal methods to efficiently complete analysis and planning within a designated time window [Gheibi et al., 2021b; Jamshidi et al., 2019]. We present three potential aspects of integrating LLMs and diffusion models to enhance ML applications in SASs: (i) using LLMs to boost ML model performance, (ii) utilizing LLMs to improve RL, and (iii) employing LLMs or diffusion models to reduce the adaptation space.
Enhancing ML. The literature in this domain can be categorized into four types, each aiming to automate different aspects of ML: (i) ML pipeline generation: Literature such as [Xu et al., 2024b; Zhang et al., 2023b] focuses on automating the entire ML pipeline, from data processing to model architecture and hyperparameter tuning, enhancing overall ML performance. (ii) Data annotation: [Ding et al., 2022] explores the performance of GPT-3 in automating data labeling. (iii) Algorithm and model selection: MLCopilot [Zhang et al., 2024f] applies experiential reasoning to recommend effective models for new tasks by analyzing historical data on task performance, code, and accuracy. (iv) Feature engineering automation: Tools like CAAFE [Hollmann et al., 2023] automate feature engineering by generating context-aware features based on dataset characteristics and iteratively updating features based on performance feedback. Integrating LLMs into the ML model construction process can not only reduce the manual effort required but also potentially improve the model’s performance. Additionally, such LLM-based automated ML has the potential to facilitate lifelong learning and model updates at the runtime phase [Gheibi and Weyns, 2024; Silver et al., 2013].
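The accept/reject loop that CAAFE applies to proposed features can be sketched generically. Here `score` and `propose` are placeholders for a validation metric and the LLM proposal call; neither name comes from the tool itself.

```python
def feature_search(score, propose, rounds=5):
    # CAAFE-style loop: keep a proposed feature only if the validation score
    # improves; `propose` stands in for the LLM call that suggests a feature
    # given the features kept so far.
    kept, best = [], score([])
    for i in range(rounds):
        candidate = propose(kept, i)
        s = score(kept + [candidate])
        if s > best:
            kept, best = kept + [candidate], s
    return kept, best
```

Because every proposal is gated by measured performance, a hallucinated or useless feature is simply discarded rather than degrading the model.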
Enhancing RL. RL is highly effective for planning in dynamic environments as it models decision-making through a sequence of actions designed to maximize long-term rewards [Kim and Park, 2009; Li et al., 2022b; Zhang et al., 2021].
LLMs have been used to augment RL in the following ways: (i) Reward function: As previously discussed under requirement change (Section 4.2.2), LLMs can automate the design of reward functions, demonstrating higher performance and faster convergence speed than expert-designed reward functions [Kwon et al., 2023; Sun et al., 2024d; Xie et al., 2024; Yu et al., 2023a]; (ii) Providing sub-goals or skills: LLMs can utilize their high-level planning abilities to guide RL agents by defining intermediate tasks. Exploring with LLMs [Du et al., 2023], for example, encourages agents to explore strategically significant behaviors, like locating a key before attempting to open a door. Relevant studies include [Dalal et al., 2024; Ma et al., 2024b; Melo, 2022; Rocamonde et al., 2024; Shukla et al., 2024; Tan et al., 2024; Zhang et al., 2023d, 2023f]. This type of study could enhance the performance of RL in scenarios that require multiple skills or long-term planning; (iii) Policy: LLMs or Transformers can decrease the expenses associated with offline RL training by directly serving as demonstration policies [Carta et al., 2023; Szot et al., 2024; Wang et al., 2022]; and (iv) State representation or quality function: Transformers can serve as state representations [Hu et al., 2021; Lee and Moon, 2023; Parisotto et al., 2020; Yang et al., 2022; Zhang et al., 2023d] or quality functions [Chebotar et al., 2023; Gallici et al., 2023] to enhance the performance, scalability, and transferability of RL.
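The reward-design idea in (i) can be sketched as wrapping a language model call in a reward function, in the spirit of Kwon et al. [2023]. `llm` below is any prompt-to-"0"/"1" callable; the prompt wording and all names are illustrative, and the usage stubs the model with keyword matching rather than a real LLM.

```python
def make_llm_reward(goal, llm):
    # Binary reward signal obtained by asking a language model whether an
    # outcome description satisfies a natural language goal description.
    def reward(outcome):
        prompt = (f"Goal: {goal}\nOutcome: {outcome}\n"
                  "Does the outcome satisfy the goal? Answer 1 or 0.")
        return int(llm(prompt))
    return reward
```

Because the goal is plain text, runtime requirement changes reduce to constructing a new reward function from the updated description, with no hand-crafted reward engineering.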
Diffusion models have also been explored for enhancing RL, serving in three different roles: (i) Data synthesizer: Diffusion models are employed to synthesize data for training due to the prevalent issue of data scarcity. Multi-Task Diffusion Model [He et al., 2023] leverages the extensive knowledge available in multi-task datasets, performing implicit knowledge sharing among tasks, with experimental results indicating significant enhancements in generating data for unseen tasks. (ii) Policy: Diffusion-QL [Wang et al., 2023c] innovatively employs a conditional diffusion model to express policies, integrating Q-learning guidance into the reverse diffusion chain to optimize action selection. Kang et al. [2023] enhances the sampling efficiency of Diffusion-QL by strengthening the diffusion policy. Similarly, Chen et al. [2023a] decouples policy learning into behavior learning and action evaluation. This approach allows for improving policy expressivity by incorporating the distributional expressivity of a diffusion-based behavior model; (iii) Planner: Diffusion models serve as planners, enhancing model-based RL by estimating action sequences that maximize cumulative rewards [Ni et al., 2023]. Detailed methodologies are discussed in Section 4.2.8.
Adaptation Space Reduction via LLMs. LLMs’ extensive knowledge also offers opportunities to reduce or condense the analysis and planning space of SASs semantically. Nottingham et al. [2023] applies LLMs to hypothesize, verify, and refine an Abstract World Model (AWM), thus abstracting the state space to enhance the training efficiency of RL agents. Rana et al. [2023] uses semantic search in robot planning tasks involving multiple floors and rooms to prune the planning space, thus speeding up traditional planning techniques.

4.2.6 Enhancing Existing Planning Techniques.

This section explores how LLMs have the potential to enhance four existing planning methods.
Search-Based Planning. Search-based planning involves algorithms that systematically explore spaces of possible actions or configurations to identify sequences that achieve specific goals [Harman et al., 2012]. The design of heuristics to improve the practicality and efficiency of these searches is a key focus. For instance, Yu et al. [2023b] proposes a Graph Transformer as a heuristic function for Multi-Agent Planning, which can be trained in environments with fewer agents and generalized to situations with more agents. For LLMs, Shah et al. [2023] utilizes “semantic guesswork” as a guiding heuristic for robot planning, such as guiding the robot to head to the kitchen for the task “find gas stove”. Dai et al. [2024] uses LLMs to generate and translate multi-resolution (i.e., hierarchical) LTL, such as building, floor, and room as different resolutions, within a multi-resolution multi-heuristic A* algorithm. LLAMBO [Liu et al., 2024a] utilizes the knowledge of LLMs to enhance zero-shot warmstarting in Bayesian optimization.
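As a rough illustration of heuristic guidance in the spirit of “semantic guesswork,” the sketch below folds a stand-in semantic prior (in place of an actual LLM query) into a best-first search over rooms; the graph, costs, and prior values are all illustrative.

```python
import heapq

# Illustrative room graph: room -> [(neighbor, travel_cost)].
GRAPH = {
    "hall": [("kitchen", 2), ("bedroom", 2)],
    "kitchen": [("pantry", 1)],
    "bedroom": [("closet", 1)],
    "pantry": [],
    "closet": [],
}

def semantic_prior(room, target):
    # Hypothetical stand-in for an LLM judgment:
    # "how likely is <target> found via <room>?"
    likely = {"kitchen": 0.9, "pantry": 0.8}
    return likely.get(room, 0.1)

def semantic_search(start, target_object, goal_room):
    """Best-first search whose heuristic discounts semantically promising rooms."""
    frontier = [(0.0, 0.0, start, [start])]  # (priority, cost so far, room, path)
    visited = set()
    while frontier:
        _, g, room, path = heapq.heappop(frontier)
        if room == goal_room:
            return path
        if room in visited:
            continue
        visited.add(room)
        for nxt, step in GRAPH[room]:
            h = 1.0 - semantic_prior(nxt, target_object)  # promising rooms look cheaper
            heapq.heappush(frontier, (g + step + h, g + step, nxt, path + [nxt]))
    return None

path = semantic_search("hall", "gas stove", "pantry")
```

The semantic prior simply reorders expansion, so completeness of the underlying search is preserved.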
Evolutionary Algorithms (EAs). Although EAs are a form of search method, they are discussed separately here due to their distinct characteristics and widespread application. EAs, inspired by natural evolution and genetics, are known for their global search capabilities and adaptability to various problem types [Li et al., 2024a; Mc Donnell et al., 2023]. Enhancements via LLMs in EAs focus on search operators like LLM-based crossover, mutation, and selection [Cai et al., 2024b]. A representative example, [Liu et al., 2024b], demonstrates how LLMs can first select parent solutions from the current population, and then facilitate crossover and mutation processes to generate offspring solutions. The experiments indicate achieving competitive performance in small-scale, single-objective problems like the traveling salesman problem with 20 nodes. Similarly, Guo et al. [2024c] employs LLMs as evolutionary search operators to automatically generate optimization algorithms for the traveling salesman problem, showing that LLM-generated heuristic algorithms surpass traditional greedy heuristics. Yang and Li [2023a] proposes a decomposition-based multi-objective EA framework, using LLMs to manage the reproduction of individuals within decomposed subproblems.
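The LLM-as-search-operator idea can be sketched as follows; `llm_offspring` is a stand-in that mimics the crossover-plus-mutation step a real system would delegate to a prompted LLM, applied here to a toy one-max problem rather than the traveling salesman problem.

```python
import random

random.seed(0)

def fitness(solution):
    # Toy objective: maximize the number of ones in a bitstring.
    return sum(solution)

def llm_offspring(p1, p2):
    # A real system would serialize both parents into a prompt and parse the
    # LLM's proposed child; here crossover + mutation mimic that step.
    cut = random.randrange(1, len(p1))
    child = p1[:cut] + p2[cut:]      # "crossover"
    i = random.randrange(len(child))
    child[i] = 1 - child[i]          # "mutation"
    return child

population = [[random.randint(0, 1) for _ in range(10)] for _ in range(8)]
initial_best = max(fitness(s) for s in population)

for _ in range(40):
    parents = sorted(random.sample(population, 4), key=fitness)[-2:]  # tournament
    population.append(llm_offspring(*parents))
    population.remove(min(population, key=fitness))  # keep population size fixed

best = max(population, key=fitness)
```

Since the worst individual is removed each generation, the best fitness in the population never decreases.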
Game Theory. Game theory provides a mathematical framework to analyze strategic interactions among rational decision-makers and is extensively applied in adversarial settings, such as security [Chan et al., 2024b; Li et al., 2024b]. Leveraging the natural language and understanding capabilities of LLMs, game theory can now be “realized” directly through natural language instead of mathematical definitions, broadening its application scope to include areas like social simulation. Fan et al. [2024a] conducted a systematic analysis of LLMs’ rationality in game theory, assessing their performance in three classical games focused on (a) clear desire, (b) belief refinement, and (c) optimal actions. The study highlighted that even advanced models like GPT-4 require enhancements in these areas. Furthermore, developments in game theory benchmarks and platforms have been made to better evaluate LLMs’ game-playing capabilities. Challenges remain, as [Fan et al., 2024a] pointed out, particularly in strengthening the rationality of LLMs in game-theoretic settings. Enhancing LLMs’ performance through targeted prompt engineering, such as incorporating explicit desire and belief information, could significantly improve their rationality. Additionally, while traditional game theory still relies on mathematical definitions, the efficacy of LLMs within this conventional framework has yet to be fully ascertained.
Swarm Algorithm. Inspired by biological phenomena such as ant colonies and fish schooling, swarm intelligence focuses on the collective behavior of decentralized, self-organized systems and has recently seen renewed interest by the research community [Bozhinoski, 2024]. The integration of LLMs into swarm intelligence is still nascent, with [Pluhacek et al., 2023] being the only study we found in our review. This research explores the automation of hybrid swarm intelligence optimization algorithms using LLMs, tackling the challenge posed by the exponential growth in the number of hybrid (swarm) algorithms due to the diversity of base (swarm) algorithms.

4.2.7 Language Model as Planner.

Given the above background, LLMs’ reasoning capabilities and broad knowledge further position them as potentially powerful, generalized planners. We outline four unique paradigms in LLM-based planning:
Transformer as Planner. Prior to the adoption of LLMs for planning, several studies already conceptualized planning as a sequence modeling problem, thereby allowing the use of Transformers as planners. Decision Transformer (DT) [Chen et al., 2021a] is a foundational work in this area. It aligns with RL and trains a Transformer to output optimal actions based on expected returns (rewards), past states, and actions, achieving performance that surpassed the then state-of-the-art model-free offline RL methods. From this foundation, many improvements have been derived: Online DT [Zheng et al., 2022] further combines offline pre-training with online fine-tuning; Weighting Online DT [Ma and Li, 2024] introduces an episodic memory mechanism to enhance sample efficiency during online fine-tuning. Multi-Game DT is trained on large, diverse datasets, enabling near-human performance in up to 46 Atari games. Generalized DT [Furuta et al., 2022] addresses a wide range of “hindsight information-matching problems,” such as imitation learning and state-marginal matching. Hyper-DT [Xu et al., 2023] incorporates an adaptation module into DT, which uses a hyper-network to initialize its parameters based on task demonstrations, effectively adapting to new tasks. Constrained DT [Liu et al., 2023b] achieves dynamic adjustments between safety and performance during deployment. Q-learning DT [Yamagata et al., 2023] enhances DT performance when only sub-optimal trajectories are included in the dataset by using dynamic programming (Q-learning) to label training data. Zhu et al. [2023c] decomposes long-delayed rewards into each timestep, where the decomposition of rewards is described as a globally optimal bi-level optimization problem, thereby enhancing the performance of DT in settings with delayed rewards.
It is important to note that these studies can also be viewed as a new realization of RL, where Transformer pre-training is employed to replace traditional methods of fitting value functions or computing policy gradients.
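The core data layout behind DT can be sketched without any model: trajectories are flattened into (return-to-go, state, action) triples, and the Transformer is trained to predict the next action token given the desired return. The sketch below computes returns-to-go and the flattened token sequence; the Transformer itself is omitted.

```python
# Minimal sketch of the Decision Transformer input layout.

def returns_to_go(rewards):
    """Suffix sums: the reward still obtainable from each timestep onward."""
    rtg, running = [], 0.0
    for r in reversed(rewards):
        running += r
        rtg.append(running)
    return rtg[::-1]

def to_token_sequence(states, actions, rewards):
    # Flatten a trajectory into the (return-to-go, state, action) triples
    # that a Decision Transformer consumes autoregressively.
    sequence = []
    for g, s, a in zip(returns_to_go(rewards), states, actions):
        sequence += [("R", g), ("s", s), ("a", a)]
    return sequence

rtg = returns_to_go([0.0, 0.0, 1.0])
seq = to_token_sequence(["s0", "s1", "s2"], ["a0", "a1", "a2"], [0.0, 0.0, 1.0])
```

At inference time, conditioning on a high initial return-to-go is what steers the model toward high-reward behavior.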
Additionally, Transformers have been utilized as planners in the following applications. Yang et al. [2023a] trains a Recurrent Transformer to enable logical reasoning on constraint satisfaction problems. Takagi [2022] explores the impact of different modalities on Transformer performance, investigating why models pre-trained on image data perform poorly. TIMAT [Kang et al., 2024] extracts temporal information and models Multi-Agent RL (MARL) as a sequential model; its advantage is the ability to plan for an arbitrary number of agents. MetaMorph [Gupta et al., 2022] trains Universal Controllers for exponentially morphable modular robots, demonstrating the Transformer’s combinatorial generalization capabilities.
Collective Intelligence. Collective intelligence, also referred to as crowdsourcing or self-collaboration in some literature, utilizes the wisdom of crowds to achieve consensus-driven decision-making through discussion, debate, or voting [Ferreira et al., 2024]. Here, multiple agents or roles are often enabled by various fine-tuned LLMs or prompted by different contexts. Zhang et al. [2023c] integrates the Actor-Critic concept from RL into LLM multi-agent crowdsourcing, highlighting its potential to cut hallucinations and reduce token usage costs. RoCo [Mandi et al., 2024] promotes information exchange and task reasoning among robots in multi-robot planning by facilitating discussions. Shi et al. [2024b] offers a concept that is similar to the MAPE loop, involving three agents working together to complete tasks, which include (i) observing to collect environmental data, (ii) decomposing instructions for planning, and (iii) using skills to execute tasks. Chen et al. [2024e] explores automated expert recruitment (deciding what kind of domain expert is needed for the task and then generating their persona) and various forms of crowdsourcing (democratic or hierarchical). Guo et al. [2024b] evaluates the impact of designated leadership in LLM-agent organizations, demonstrating several interesting results, including (a) in small teams, higher efficiency can be achieved with less communication cost; (b) agents can elect their own leader and dynamically adjust leadership via communication; and (c) agents spontaneously engage in activities that mimic human behaviors, such as reporting task progress to the leader agent. This study also introduces a criticize-reflect framework to evaluate and adjust organizational structures. Dong [2024] explores the high costs and negative impacts of misinformation in large-scale democratic discussions. This paradigm offers new decision-making avenues, which may be particularly suitable for decentralized SASs [Weyns et al., 2013].
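A minimal sketch of consensus by voting among differently-prompted agents is shown below; the three roles and their fixed answers are illustrative stand-ins for actual LLM calls.

```python
from collections import Counter

# Each role would normally be a differently-prompted (or fine-tuned) model;
# here each agent is a stand-in function returning a fixed, illustrative answer.
AGENTS = {
    "planner": lambda question: "reroute",
    "safety":  lambda question: "reroute",
    "cost":    lambda question: "wait",
}

def vote(question):
    """Each agent answers independently; the majority answer wins."""
    answers = [agent(question) for agent in AGENTS.values()]
    return Counter(answers).most_common(1)[0][0]

decision = vote("Obstacle detected on the planned path; what should the robot do?")
```

Richer schemes replace the single vote with rounds of discussion or debate before tallying, but the consensus step remains the same.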
Experience Accumulation. Experience accumulation, also called lifelong learning in some studies [Silver et al., 2013], enables agents to use LLMs to gather experience from both failures and successes, learning to improve future planning.
For failed experiences, LLMs or human analyses can identify the causes of failures, reflecting on these insights and integrating them into future planning cycles. This approach is also known in some studies as planning with feedback or self-reflection. Madaan et al. [2022] records instances of LLM misunderstandings along with user feedback, enhancing prompt accuracy for future queries by integrating past clarifications. Li et al. [2022a] refers to this as an “active data collection process,” iterating strategies through interactions with the environment based on past failed experiences. Huang et al. [2022] refers to this process as “inner monologue”. Wang et al. [2023a] introduces the Describe, Explain, Plan, and Select framework, where an LLM describes the plan execution process and provides self-explanations upon encountering failures, facilitating effective error correction. Zhang et al. [2024c] propose the Prompt Ensemble learning via Feedback-Reflect-Refine method, which uses a feedback mechanism to reflect on planning inadequacies and generates new prompts for iterative refinement. Yang et al. [2024c] treats LLMs as optimizers to solve optimization problems described in natural language, where previously generated solutions and their outcomes are used to prompt the LLM to generate new solutions.
For successful experiences, LLMs store these in memory or a skill pool for later retrieval and reuse in similar scenarios. Zhu et al. [2023a] introduces a three-step process for LLM-based memory reuse: (a) during each game scenario, once the goal is achieved, the executed plan is stored; (b) summarizing common reference plans from multiple scenarios for more generalized situations; and (c) creating new plans based on these reference plans when similar goals arise. Over time, as these summaries accumulate, the effectiveness of the LLM-based planner increases. Similarly, Zhao et al. [2024] propose ExpeL (Experiential Learning), which enhances task success rates through experience gathering and insight extraction. LLMs As Tool Makers (LATM) [Cai et al., 2024a] approaches from a tool maker’s perspective, enabling LLMs to create and utilize tools, which are implemented as Python functions. Moreover, LATM attempts to utilize different LLMs to create tools of varying complexity, thereby reducing the cost of tool production. AdaPlanner [Sun et al., 2023b] introduces skill filtering, which involves comparing the performance of including versus not including past successful experiences in prompts to determine the generalizability of these experiences.
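The store-and-retrieve pattern for successful experiences can be sketched as a simple skill pool; here a crude word-overlap similarity stands in for the LLM- or embedding-based matching a real system would use, and the goals and plans are illustrative.

```python
# Sketch of a skill pool for successful experiences: executed plans are
# stored under their goal and retrieved for similar future goals.

skill_pool = {}  # goal description -> previously successful plan

def store_skill(goal, plan):
    skill_pool[goal] = plan

def retrieve_skill(new_goal):
    """Return the stored plan whose goal shares the most words with new_goal."""
    def overlap(goal):
        return len(set(goal.split()) & set(new_goal.split()))
    if not skill_pool:
        return None
    best_goal = max(skill_pool, key=overlap)
    return skill_pool[best_goal] if overlap(best_goal) > 0 else None

store_skill("open the wooden door", ["find key", "unlock door", "push door"])
store_skill("cross the river", ["build raft", "paddle across"])
plan = retrieve_skill("open the iron door")
```

Over time the pool grows, and summarization (as in the three-step process above) can merge several stored plans into more general reference plans.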
Optimizing Prompting for Black-Box LLMs. Prompt engineering is crucial in maximizing the planning capabilities of LLMs as it directly impacts the model’s understanding and response to tasks [Sahoo et al., 2024]. However, LLMs often operate as a black box to users, particularly in the context of LLM as a service (e.g., accessing LLMs through an API). Beyond the previously discussed prompt patterns such as CoT, self-consistency, ToTs, and GoT, recent studies have treated prompt design as an optimization problem to enhance the LLM’s planning performance. These studies can be categorized into four types: (i) RL-based optimization: TEMPERA [Zhang et al., 2023e] treats prompt optimization as an RL challenge, where the action space includes editing instructions, in-context examples, and verbalizers. The rewards are gauged by the performance improvements from these edits. Similarly, RLPrompt [Deng et al., 2022] trains a policy network to generate effective prompts, noting that optimized prompts sometimes appear as “gibberish” that defies standard grammatical conventions. Additionally, Prompt-OIRL [Sun et al., 2024b] leverages an expert dataset and inverse RL to derive a reward model that facilitates prompt evaluations; (ii) Evolutionary Algorithm (EA)-based optimization: Employing EAs for gradient-free prompt optimization, several methodologies have emerged. Gradient-free Instructional Prompt Search [Prasad et al., 2023], Genetic Prompt Search [Xu et al., 2022a], and EvoPrompt [Guo et al., 2024c] utilize the robust optimization capabilities of EAs. InstOptima [Yang and Li, 2023a] extends this approach by considering multi-objective goals, evaluating both performance and additional metrics like instruction length; (iii) Incorporating classic planning ideas into prompt: Classic planning principles have also been integrated into prompt engineering. 
PromptAgent [Wang et al., 2024a] treats the design space of prompts as a planning problem and uses Monte Carlo Tree Search to strategically explore high-quality prompts, where experiences of failure during interaction with the environment are used to define the rewards in the search. Hazra et al. [2024] introduces the SayCanPay framework, where LLMs (a) generate candidate actions based on a goal and initial observation (“Say”), (b) an affordance model evaluates the feasibility of these actions (“Can”), and (c) the most feasible and cost-effective plan is selected using a combined score as a heuristic (“Pay”). Here, Can and Pay are independent models that require domain-specific training to ensure the alignment of plans with the current environment. Furthermore, combining hybrid planning (“fast and slow”) [Pandey et al., 2016] and hierarchical planning, Lin et al. [2023a] and Liu et al. [2024f] employ a dual-LLM framework in which a reasoning-focused LLM (“slow mind”) performs detailed planning or interprets teammates’ intentions, while a lightweight LLM (“fast mind”) generates reactive policies and macro actions; and (iv) Self-adaptive prompting: Self-adaptive prompting refers to an approach tailored for zero-shot learning, designed to automatically optimize prompt design. The concept involves initially using LLMs to generate pseudo-demonstrations in a zero-shot setting. Generally, several candidates for pseudo-demonstrations are first generated, and the most effective are then selected for implementing ICL based on metrics such as consistency and logit entropy. Key studies include Consistency-based Self-adaptive Prompting (COSP) [Wan et al., 2023a] and Universal Self-adaptive Prompting (USP) [Wan et al., 2023b]. Experimental results indicate that COSP enhances performance by an average of 15% over the zero-shot baseline, and both COSP and USP have demonstrated comparable or even superior performance to few-shot baselines in certain tasks.
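To illustrate prompt design as black-box optimization in its simplest form, the sketch below greedily appends instruction phrases as long as a stubbed evaluator improves; `score_prompt` is a hypothetical stand-in for scoring the LLM's outputs on a validation set, and the phrase pool is illustrative.

```python
# Candidate instruction phrases the optimizer may append to a base prompt.
PHRASES = ["think step by step", "answer concisely", "cite your sources",
           "double-check the result"]

def score_prompt(prompt):
    # Hypothetical stand-in evaluator; a real one would run the LLM on a
    # validation set. Here it simply rewards two specific phrases.
    return sum(p in prompt for p in ("think step by step", "double-check the result"))

def greedy_prompt_search(base):
    """Greedily append whichever phrase improves the score, until none does."""
    best, best_score = base, score_prompt(base)
    improved = True
    while improved:
        improved = False
        for phrase in PHRASES:
            candidate = best + " " + phrase
            if score_prompt(candidate) > best_score:
                best, best_score = candidate, score_prompt(candidate)
                improved = True
    return best, best_score

prompt, score = greedy_prompt_search("Solve the task.")
```

The RL- and EA-based methods above replace this greedy loop with learned policies or population-based variation, but the black-box objective is the same.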

4.2.8 Diffusion Model as Planner.

Diffusion models have recently been applied to planning tasks. Janner et al. [2022] pioneered this approach by reinterpreting diffusion-based image inpainting as a method for coherent planning strategies, demonstrating the model’s capability in long-horizon decision-making and its adaptability to unseen environments, as shown in 2D maze experiments. Subsequently, diffusion has been extensively applied in motion planning for robotic arms [Mishra and Chen, 2023; Pearce et al., 2023; Ze et al., 2024] and quadruped robots [Liu et al., 2024c], as well as continuous constraint solvers [Yang et al., 2023c].
Additionally, further developments have been made in enhancing different aspects of diffusion models. For enhancing long-range decision-making capabilities, Generative Skill Chaining [Mishra et al., 2023] introduces a method where individual skills are modeled as separate diffusion models and sequentially chained to address long-horizon goals. This chaining process involves generating post-condition states of one skill that satisfy the pre-conditions of the subsequent skill. Regarding uncertainty-aware planning, Dynamics-informed Diffusion [Cachay et al., 2023] couples probabilistic temporal dynamics forecasting with the diffusion steps, and PlanCP [Sun et al., 2023a] quantifies the uncertainty of diffusion dynamics models using Conformal Prediction and modifies the loss function for model training. Chen et al. [2024b] introduces a hierarchical diffuser strategy that employs a “jumpy” high-level planning technique with a broader receptive field and reduced computational demands, effectively directing the lower-level diffuser through strategic sub-goals. Similarly, Li et al. [2023d] proposes a hierarchical diffusion method, which includes a reward-conditional goal diffuser for subgoal discovery and a goal-conditional trajectory diffuser for generating the corresponding action sequence of subgoals. Zhou et al. [2023a] focuses on online replanning, where the timing of replanning is determined based on the diffusion model’s estimated likelihood of existing generated plans, and the replanning is based on existing trajectories to ensure that new plans follow the same goal state as the original trajectory. Jin et al. [2023] introduces a hierarchical semantic graph for fine-grained control of generation, including overall movement, local actions, and action details, to improve the granularity of generated controls.
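As a loose analogy (not an actual diffusion model), planning-as-inpainting can be pictured as clamping the start and goal states and iteratively refining noisy intermediate waypoints into a coherent trajectory; a real diffusion planner instead runs a learned reverse-noising process over trajectories.

```python
import random

random.seed(0)

def inpaint_trajectory(start, goal, length=8, iters=200):
    """Clamp the endpoints, then iteratively smooth noisy interior waypoints."""
    traj = [start] + [random.uniform(-1, 1) for _ in range(length - 2)] + [goal]
    for _ in range(iters):
        for i in range(1, length - 1):  # endpoints stay clamped, like inpainting masks
            traj[i] = 0.5 * traj[i] + 0.25 * (traj[i - 1] + traj[i + 1])
    return traj

# The noisy interior converges toward a coherent 1-D path from 0.0 to 1.0.
traj = inpaint_trajectory(0.0, 1.0)
```

In a learned diffusion planner, the smoothing step is replaced by a trained denoiser, and conditioning (rewards, constraints) guides which coherent trajectory the process converges to.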
Summary—Analyzer and Planner. GenAI techniques offer significant potential in supporting analysis and planning of SASs. In architecture-based adaptation and requirement-driven adaptation, LLMs have potential to support reasoning based on natural language or unstructured data, potentially broadening their application scope. For the application of learning in analysis and planning, LLMs and diffusion models could support generating prior knowledge, enhancing model performance and reducing training/planning costs. For providing guarantees under uncertainty and control-based adaptation that rely on strict mathematical frameworks, LLMs’ translation capabilities may have the potential to reduce the modeling costs associated with using these methods. Furthermore, interesting new planning paradigms for LLMs and diffusion models have emerged: (i) Transformer-based planning methods have strong advantages in offline RL and scalability, potentially suitable for offline learning (i.e., inability to interact with the real environment during training) and large-scale adaptive systems; (ii) Collective Intelligence explores how multiple agents can collaborate and make decisions, offering potential methods for distributed SASs; (iii) Experience accumulation shows a paradigm similar to self-reflection (for failed experiences) and self-evolution (for successful experiences), which can inform lifelong learning and self-evolution for SASs; (iv) diffusion models provide a planning paradigm tailored for high-dimensional and complex constraints.

4.3 Executor

The executor is crucial for enacting the adaptation plan on the managed system, with its specific roles and implementation varying based on the design and the division of responsibilities between the managed and managing systems [Weyns, 2020]. For example, consider a mobile robot with an adaptation plan to “change the movement to the destination.” Here, the executor’s involvement can differ significantly depending on the case at hand: (i) it might simply relay destination coordinates to the managed system, which autonomously completes the movement, or (ii) it might convert the high-level plan into a detailed path or even low-level control parameters for the managed system.
In simpler scenarios like the first, the executor’s role is straightforward, offering limited scope for GenAI to add value. However, in more complex tasks like the second scenario, where translating a high-level plan into specific actions or configurations is required, some research based on Transformers or LLMs has demonstrated their potential for end-to-end transformation. For instance, Google’s LM-Nav [Shah et al., 2022], RT-2 (Robot Transformer 2) [Brohan et al., 2023] and PaLM-E [Driess et al., 2023] are representative works in this area. All of these are vision-language-action models, which interpret user commands such as “pick up the biggest object” together with the corresponding robot observations to directly initiate appropriate robot actions.
In relation to the execution stage in SASs, research areas like embodied agents and robotics are particularly focused on the “physical grounding” capabilities of (M)LLMs. For example, Gao et al. [2024] fine-tuned an MLLM to understand the physical properties of objects (e.g., material, fragility) to improve the success rate of execution.
Summary—Executor. Considering that the implementation of execution in SASs is often straightforward, LLMs offer limited benefits. Yet, for more complex cases where the executor needs to convert plans into concrete actions or configurations, Transformers and LLMs have potential to support end-to-end transformation. Additionally, studies in robotics demonstrate the capability of MLLMs to successfully execute given plans in uncertain environments.

4.4 Knowledge and Runtime Models

In SASs, knowledge is reified as runtime models [Blair et al., 2009; Garlan et al., 2004; Weyns et al., 2012b]: a runtime model serves as a critical runtime abstraction of the system, or of any aspect related to the system, that is used for the purpose of realizing self-adaptation [Weyns, 2020].
Fig. 5. Overview of empowerment of HOTL via GenAI.
Our survey reveals that existing literature primarily employs LLMs for three distinct formats of knowledge: (i) Knowledge graphs: Here, language models are used in two different ways. First, LLMs trained on extensive text corpora act as implicit knowledge bases; approaches such as COMET [Bosselut et al., 2019] and BertNet [Hao et al., 2023] enable the re-extraction of knowledge graphs from LLMs. To improve the precision and robustness of knowledge distillation, Walker et al. [2024] investigate interactions and responsibilities between LLMs and stakeholders (knowledge engineers), and [Potyka et al., 2024] use methods derived from social choice theory to adapt and aggregate ranking queries. Second, LLMs serve as tools to translate information; for instance, Ringwald [2024] translates Wikipedia pages into Resource Description Framework graphs, which consist of subject-predicate-object triples; (ii) System modeling: In studies of SE, LLMs are explored for generating diverse models like requirement models [Hiroyuki Nakagawa, 2023] and architectural models [Hong et al., 2024b]. Additionally, LLMs can transform natural language into Domain-Specific Modeling Languages (DSML) such as LTL [Mavrogiannis et al., 2024; Yang et al., 2024b], Backus-Naur Form [Wang et al., 2023f], and Planning Domain Definition Language (PDDL) [Ding et al., 2024; Guan et al., 2023; Zhou et al., 2024b], reducing the manual effort involved in modeling; (iii) World models: In studies of robotics, LLMs are extensively applied to create world models, also called planning spaces. For instance, LLMs are used to generate “explicit world models” in PDDL, enabling human corrections based on natural language instructions [Guan et al., 2023]. Similarly, Nottingham et al. [2023] utilizes LLMs to develop an AWM for planning and exploration (called the “dream phase”). Subsequently, the RL agent learns and corrects the AWM based on the plans (called the “wake phase”), thereby improving the sample efficiency of learning.
Furthermore, LLMs are also used to generate other paradigms of planning space, such as behavior trees [Saccon et al., 2024; Sakib and Sun, 2024; Zhou et al., 2024a].
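The translation role described in (i) can be sketched as follows; the naive pattern matcher stands in for an LLM prompted to emit subject-predicate-object triples, and the sentences are illustrative.

```python
# Sketch of LLM-as-translator for knowledge formats: a stand-in "model"
# turns simple sentences into subject-predicate-object triples, the format
# used by RDF graphs. A real pipeline would prompt an LLM and parse its output.

def llm_extract_triple(sentence):
    # Hypothetical stand-in: assumes the naive pattern "Subject predicate Object."
    subj, pred, obj = sentence.rstrip(".").split(" ", 2)
    return (subj, pred, obj)

sentences = ["RoomA contains Sensor1.", "Sensor1 monitors Temperature."]
graph = [llm_extract_triple(s) for s in sentences]
```

The same pattern generalizes to richer targets such as PDDL or LTL: the LLM performs the translation, and a parser or validator checks that the output is well-formed before it enters the knowledge base.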
Summary—Knowledge. LLMs offer two primary potential benefits in the realm of knowledge and runtime models. The first benefit is their capacity to establish models by leveraging their extensive, inherent knowledge. However, these models often require further alignment with real-world scenarios, through manual adjustments or LLM-based corrections based on feedback from actual interactions. The second benefit involves the use of LLMs’ translation capabilities to convert descriptions in natural language or other formats into DSML, thereby significantly reducing the costs associated with manual modeling.

5 Enhancing HOTL

While SASs are designed to reduce human intervention and increase automation, incorporating purposeful human interaction remains essential, in particular in relation to trustworthiness [Cámara et al., 2015; Weyns et al., 2023]. The advanced language understanding capabilities of LLMs undoubtedly offer significant potential to enhance HOTL configurations within SASs. In this section, we organize the literature based on the purpose of designing HOTL mechanisms, with each purpose corresponding to varying levels of human involvement in the operations of SASs [Barnes, 2010]. The first category is preference acquisition. Accurately capturing users’ dynamic preferences during operation is necessary for achieving better user-centered adaptation. In this context, humans primarily serve as stakeholders to be satisfied by the system. The second category focuses on transparency, which is critical for helping users understand system behavior and thereby enhancing trust. In this category, it is essential for humans to comprehend the system’s actions and intentions. The final category is collaboration, which is primarily about leveraging human expertise to correct system errors or combining the strengths of both humans and systems to achieve more complex goals. This approach requires humans to actively engage with the SAS, playing a crucial role in its operations.

5.1 Preference Acquisition

Preference acquisition is the process of gathering and interpreting user preferences to tailor system adaptations that better meet user needs [Li et al., 2023e; Zhang et al., 2024b]. This process is essential for improving user experience and personalizing system behavior, thereby enhancing user satisfaction and trust.
In this discussion, we concentrate on explicitly representable user preferences, while excluding the fine-tuning of ICL for human alignment, such as the personalized Transformer [Li et al., 2021b]. PlanCollabNL [Izquierdo et al., 2024], addressing human-robot collaboration, uses LLMs to infer the cost associated with specific user tasks from natural language inputs, such as “I have back pain today.” It translates such information into PDDL formulas, involving definitions of operation objects, (human) agents, and the numeric cost. Similarly, Lou et al. [2024] and Liu et al. [2024f] utilize LLMs’ domain expertise to translate language constraints into well-defined cost functions. These functions are used to determine constraint violations, essentially functioning as the inverse of reward functions, which planning algorithms aim to minimize. Additionally, personas, a commonly employed method for representing user characteristics, have been extensively studied as artifacts to be generated by LLMs. An 11-participant user study [Schuller et al., 2024] verified that personas generated by LLMs are virtually indistinguishable from those written by humans, exhibiting comparable quality and acceptance. Another study, [Sera et al., 2024], explores the dynamic updating of personas during runtime. This study utilizes k-means clustering along with LLMs to analyze attributes and tendencies from actual user clickstream log data. The insights gained from this analysis are then used by LLMs to refine and update manually designed personas.
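The constraint-to-cost translation can be sketched as follows; `llm_infer_penalties` is a hypothetical stand-in for the LLM inference step, and the task names and weights are illustrative.

```python
# Sketch of turning a natural-language preference into planning costs
# (in the spirit of PlanCollabNL): an utterance is mapped to per-task
# cost increments that a planner can then minimize over.

def llm_infer_penalties(utterance):
    # Hypothetical stand-in for the LLM inference step.
    if "back pain" in utterance:
        return {"lift_box": 10.0, "carry_crate": 8.0}
    return {}

def task_cost(task, base_costs, penalties):
    return base_costs[task] + penalties.get(task, 0.0)

base = {"lift_box": 2.0, "carry_crate": 3.0, "sort_items": 1.0}
pen = llm_infer_penalties("I have back pain today.")

# The planner now prefers assigning the human the cheapest remaining task.
best_task = min(base, key=lambda t: task_cost(t, base, pen))
```

In a full pipeline, the inferred penalties would be emitted as numeric fluents in PDDL rather than a Python dictionary.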
Summary and Discussion—Preference Acquisition. LLMs have demonstrated potential in preference acquisition due to their common sense and language understanding capabilities. Specifically, LLMs can infer preferences expressed as hard constraints (e.g., LTL), utility functions, or personas from natural language-based user feedback or user action history. However, support for handling potential conflicts between different needs and preferences in multi-objective settings, such as the tradeoff between non-functional properties like cost and efficiency, which are core to self-adaptation, is still lacking and needs further exploration.

5.2 Transparency

Transparency in SASs, often synonymous with explainability or interpretability, involves making system operations and decision-making processes clear and comprehensible to users. This transparency is crucial as it allows users to better understand the decision-making process of adaptive systems, thereby enabling them to effectively identify errors in system decisions [Li et al., 2020b; Parra-Ullauri et al., 2022]. We categorize the related literature by the object of explanation (code, decision-making module, and log) and the form of expression.

5.2.1 Code Explanation.

Explaining how a piece of code functions is a direct method to enhance system transparency. Although originally intended to boost development efficiency, these explanations could also be applied to runtime system transparency. Initially, several works applied Transformers to Code Summarization. Ahmad et al. [2020] was an early attempt, Tang et al. [2021] introduced Abstract Syntax Tree preprocessing to reduce Transformer computational complexity, and Mastropaolo et al. [2024] fine-tuned Transformers for more granular comments (code snippets or single statements instead of method-level). Regarding LLM-based methods, Ahmed and Devanbu [2023] demonstrated the performance enhancement of Codex (GPT-3) after few-shot training specific to a project. Ahmed et al. [2024] validated from the perspective of prompt engineering that adding additional semantic facts (such as control flow and data flow) can significantly improve LLMs’ code summarization performance. Nam et al. [2024] confirmed through a user experiment with 32 participants that LLM assistance within an integrated development environment helps developers understand code and complete coding tasks more effectively than web searches. Geng et al. [2024] examines multi-intent comment generation, where multi-intent means, for example, creating different comments to explain what the functionality is and when to use it. Furthermore, Khan and Uddin [2023] assesses Codex’s effectiveness in documentation generation.
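The idea of enriching a summarization prompt with semantic facts can be sketched as follows; here Python's `ast` module supplies the facts, and the assembled prompt would be sent to an LLM (the function and prompt wording are illustrative).

```python
import ast

# Sketch of augmenting a code-summarization prompt with extracted semantic
# facts: a parser supplies structural facts, and the prompt is what would
# be handed to an LLM for summarization.

SOURCE = """
def area(width, height):
    return width * height
"""

def extract_facts(src):
    fn = ast.parse(src).body[0]  # assumes the snippet is a single function
    return {"name": fn.name, "params": [a.arg for a in fn.args.args]}

def build_prompt(src):
    facts = extract_facts(src)
    return (f"Summarize this function.\n"
            f"Name: {facts['name']}\n"
            f"Parameters: {', '.join(facts['params'])}\n"
            f"Code:\n{src}")

prompt = build_prompt(SOURCE)
```

Richer variants add control-flow and data-flow facts in the same way, which is the kind of augmentation the prompt-engineering study above evaluates.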

5.2.2 Explanation of Decision-Making Modules.

Explaining decision-making processes is essential, especially when they are executed by opaque grey- or black-box modules. de Zarzà et al. [2023] use LLMs to interpret data from Proportional-Integral-Derivative (PID) control loops, such as control parameters and errors, to elucidate the behavior of PID controllers. Pandya et al. [2024] explores explanations for game theory-based multi-agent collaborative policies (i.e., multiple Nash equilibria) by utilizing LLMs to generate visual task trajectories for different agents.
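As a minimal sketch of this idea, the runtime data of a PID loop can be serialized into a prompt for any chat-completion model. The `llm` callable and the prompt wording are illustrative assumptions, not the actual method of de Zarzà et al. [2023]:

```python
def build_pid_explanation_prompt(kp, ki, kd, setpoint, recent_errors):
    """Format controller gains and recent tracking errors as LLM context."""
    error_trace = ", ".join(f"{e:+.2f}" for e in recent_errors)
    return (
        "You are explaining a PID control loop to an operator.\n"
        f"Gains: Kp={kp}, Ki={ki}, Kd={kd}. Setpoint: {setpoint}.\n"
        f"Recent errors (setpoint - measurement): {error_trace}.\n"
        "State in plain language whether the controller is converging, "
        "oscillating, or diverging, and which gain most likely needs tuning."
    )

def explain_pid_behavior(llm, kp, ki, kd, setpoint, recent_errors):
    """`llm` is any text-completion callable (prompt -> explanation string)."""
    return llm(build_pid_explanation_prompt(kp, ki, kd, setpoint, recent_errors))
```

The key design point is that the explanation is grounded in concrete runtime observations (the error trace) rather than in the model's generic knowledge of PID control.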

5.2.3 Log Explanation.

Logs, which record system events, processes, or communications, are often vital for understanding operational status or traceability. Liu et al. [2024d] investigates LLMs’ capabilities in anomaly detection and explanation. Despite needing improvements in detection accuracy (F1 Ave = 0.412), the explanations received high ratings for usefulness and readability from experts (six experts with over ten years of work experience, Ave = 4.42/5).

5.2.4 Advanced Visualization and Interaction.

Beyond explanatory content, the methods of visualization and interaction are critical for ensuring explanations are easily understood.
For visualization, Jiang et al. [2023c] uses LLMs to create node-link diagrams by extracting entities and relationships from the text, allowing users to adjust and interact with the visual presentation dynamically. Pandya et al. [2024] explores explanations for multi-agent collaborative policies by utilizing LLMs to generate visual task trajectories for different agents. Liu et al. [2023c] introduces visual captions, where LLMs suggest context-relevant visual graphs for the ongoing conversation (e.g., displaying photos of Disneyland and the beach when talking about vacation plans). Similarly, ZINify [Shriram and Pradeep Kumar Sreekala, 2023] transforms research articles into engaging magazines to enhance their comprehensibility. Additionally, LLMs are widely used in the automation of data visualization, potentially supporting the automatic construction and runtime adjustment of dashboards [Arawjo et al., 2024; Chung et al., 2022; Ko et al., 2024; Reif et al., 2024]. AnalogyMate [Chen et al., 2024d] enhances the understandability of unfamiliar data measurements and abstract representations through data analogies. An example is visualizing the size of a pile of bottles stacked up against the Eiffel Tower to explain the meaning of "1.3 billion bottles are sold daily."

For interaction, LLMs facilitate natural language-based or visual-based Q&A or control interactions [Bernstein et al., 2023]. For example, Wang et al. [2023d] shows how LLMs can manage mobile UI tasks through conversational interactions.
Summary and Discussion—Transparency. LLMs have demonstrated potential in explaining code, decision models, and system logs, as well as in creating more intuitive and understandable visualizations. However, the exploration of explaining code and decision models remains preliminary: the former typically covers only static aspects rather than the dynamic behavior of the code, while the latter often prompts LLMs to explain decision models directly, without structured runtime context. An immediate improvement strategy involves providing LLMs with appropriate contexts for different types of decision models—white-box, gray-box, and black-box—incorporating elements such as runtime intermediate results to enhance explanation accuracy. Another promising direction is the use of LLMs for model interpretability, such as employing decision trees as surrogate models to approximate and elucidate complex deep-learning models. In this context, LLMs' common-sense capabilities could be particularly useful in assisting with feature selection and importance analysis.

5.3 Collaboration

Human–computer collaboration involves systems actively participating alongside humans in tasks traditionally performed by people. This partnership leverages the unique strengths of both participants and dynamically adjusts based on the runtime context to enhance efficiency and effectiveness [Cámara et al., 2015; Gheibi and Weyns, 2024; Li et al., 2021a].

5.3.1 Task Allocation.

Task allocation is critical in optimizing the collaborative use of human and machine capabilities, assigning appropriate tasks to the best-suited agent [Ranz et al., 2017]. Chen et al. [2024a] explores the use of LLMs for multi-robot task planning, comparing the task success rate and token efficiency across four multi-agent communication frameworks (centralized, decentralized, and two hybrid forms) in various tasks. Their findings suggest that hybrid frameworks generally achieve higher task success rates and better scalability with an increasing number of agents. MetaGPT [Hong et al., 2024b] uses a pipeline paradigm to assign different roles to various agents, decomposing complex tasks into subtasks that involve multiple agents working collaboratively. This approach has been proven effective in the waterfall software development process from requirements engineering to testing. Similarly, Xiao et al. [2024] introduces Chain-of-Experts (CoE), where each agent is assigned a specific role and endowed with relevant domain knowledge. Additionally, CoE incorporates a conductor who coordinates these agents through a forward-thinking structure and backward reflection mechanism. Liu et al. [2024f] applies an LLM as a human-machine interface in real-time video games to implement natural language-based intent communication for task allocation.
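The pipeline paradigm of frameworks such as MetaGPT can be caricatured as an artifact flowing through a sequence of role-specific agents. In this sketch each "agent" is a plain function standing in for an LLM-backed role; this is an illustration of the pattern, not MetaGPT's actual API:

```python
def run_pipeline(task, roles):
    """Pass the evolving artifact through each role-specific agent in order."""
    artifact = task
    for _role, agent in roles:
        artifact = agent(artifact)  # each role transforms its predecessor's output
    return artifact
```

A usage sketch with hypothetical roles mirroring a waterfall process: `roles = [("analyst", elicit), ("coder", implement), ("tester", verify)]`, where each function consumes the previous artifact and produces the next.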

5.3.2 Cooperative Behavior.

Cooperative behavior focuses on system agents planning and executing tasks in concert with human actions, often requiring more granular coordination than task allocation. ProAgent [Zhang et al., 2024e] uses LLMs to deduce teammates’ intentions from observed actions (called beliefs), and continuously update these beliefs. These updated beliefs then guide LLM-based planning for proactive cooperation. Tanneberg et al. [2024] introduces Attentive Support, utilizing LLMs to decide when and how robots should support humans only when needed, while remaining silent at other times to avoid disturbing users.

5.3.3 User Correction.

User Correction involves users making adjustments to the system’s outputs or processes to correct errors or enhance performance. AI Chains [Wu et al., 2022b] implements a chained processing approach where users can modify the sequence of operations and their intermediate outcomes in a modular fashion. This framework also facilitates the comparison of alternative strategies by allowing users to observe their parallel effects. Furthermore, Cai et al. [2023] integrates manual corrections into their CoT framework, applying a cost-utility analysis model to assess and balance the benefits and costs associated with these interventions. It should be noted that the processing chains or workflows described in these studies have the potential for broader applications to control flow or data flow in a variety of system domains. However, employing these methods in SASs necessitates more detailed considerations. For instance, a notable aspect is assessing whether data corrected by humans might lead to issues such as system overflow.
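A cost-utility gate of this kind can be sketched in a few lines: request a human correction only when the expected benefit of catching an error exceeds the cost of interrupting the user. The decision rule and all numbers are illustrative, not the actual model of Cai et al. [2023]:

```python
def should_request_correction(confidence, error_impact, interruption_cost):
    """Expected benefit = P(output is wrong) * impact avoided by a correction.

    confidence:        system's estimated probability that its output is correct
    error_impact:      cost incurred if an erroneous output goes uncorrected
    interruption_cost: cost of asking the human to intervene
    """
    expected_benefit = (1.0 - confidence) * error_impact
    return expected_benefit > interruption_cost
```

For instance, a low-confidence output with high downstream impact warrants interrupting the user, while a high-confidence output does not.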
Summary and Discussion—Collaboration. LLMs have been applied in task allocation, cooperative behavior, and user correction, where their main function is to infer users’ intentions or behavioral patterns and plan collaborative patterns. However, the use of LLMs in these scenarios is still in its preliminary stages. Future research could potentially explore deeper avenues such as: (i) more advanced intent inference and communication, such as exploring multi-modal inputs and outputs; (ii) more in-depth analysis of user capabilities or the impact of user involvement, which could promote more efficient human-system co-adaptation. These capabilities offer substantial benefits to enhance self-adaptation.

6 Research Roadmap

We now consolidate the insights obtained from the study of the state of the art into a research roadmap. The roadmap comprises key research challenges that need to be tackled to exploit the potential for applying GenAI (particularly LLMs) in the field of SAS. Additionally, it provides a practical reflection on the current shortcomings of GenAI with possible mitigation strategies.
Figure 6 summarizes the research challenges outlined in the roadmap. The concepts on the left-hand side (i) outline key SE aspects that need to be considered in the design and realization of SASs, as discussed in [Cheng et al., 2009; de Lemos et al., 2013]. The concepts in the middle (ii) summarize the challenges of employing GenAI, and LLMs in particular, in SASs, which we discuss in Sections 6.1 through 6.9. The concepts on the right-hand side (iii) highlight the primary functions involved in self-adaptation, with an emphasis on MAPE-K and HOTL. The relations between (i) and (ii) show which specific aspect(s) of the SE of SASs each challenge of employing GenAI and LLMs may involve. The relations between (ii) and (iii) map the challenges of employing LLMs to key functions within SASs. We now elaborate on each of the research challenges of (ii).
Fig. 6.
Fig. 6. Left-hand side: key SE aspects that need to be considered in the design and realization of SASs. Middle: challenges of employing GenAI and LLMs in particular in SASs. Right-hand side: primary functions that are involved in self-adaptation with an emphasis on MAPE-K and HOTL. Mapping expresses the relationships between the concepts and challenges.

6.1 Transfer of Design-Time Methods to Runtime Use

As widely discussed in the area of SE, LLMs have been utilized to (semi-)automate various aspects of realizing software systems. However, existing methods have primarily focused on design time, limiting their direct applicability to runtime settings. This challenge involves transferring methods initially developed for design time so that they can be used at runtime, taking into account the substantial differences between the two phases. Key gaps that need to be closed include: (i) Different Objectives: Design-time methods typically focus on greenfield design or on extending existing systems offline, whereas runtime methods are concerned with adapting or modifying systems during operation. For instance, while design-time requirement elicitation focuses on extracting and analyzing the demands of stakeholders, runtime requirement management often involves adjusting, changing, or even evolving the initially established requirements. (ii) Different Information Sources: At design time, methods generally rely on historical data and knowledge available from domain experts, whereas runtime methods can leverage online operational data as a primary information source. For instance, while the construction and explainability of a system at design time mainly rely on experts' knowledge, assumptions, and static code, the runtime phase can leverage specific, grounded observations and data obtained during execution to adapt the system. (iii) Human Involvement: At design time, decision-making is centered around stakeholders who may leverage GenAI as a supportive tool; for instance, GenAI outputs may be used for manually evaluating scenarios and conducting in-depth discussions. To make timely decisions at runtime, on the other hand, GenAI needs to take on a more autonomous role, supporting on-time decision-making to adapt the system to different contexts and environmental changes.
To address (i) and (ii), a key strategy is to refine the prompting of GenAI methods, and LLMs in particular, by clearly defining the runtime tasks and incorporating relevant execution information as context. Point (iii) aligns closely with the quality assurance of ML: it requires careful design of use cases for LLMs, encompassing rigorous performance evaluations in practical scenarios, and a comprehensive consideration of the overall system's robustness to mitigate the risks associated with erroneous outputs from LLMs.

6.2 Towards LLM as a Service

While LLMs have demonstrated generalized capabilities, they often see enhanced performance when fine-tuned for specific domains, resulting in smaller models with lower usage costs and faster response times [Liu et al., 2023a]. As a response, industry has been developing numerous domain-specific LLMs; for example, an industry analysis report [Nandu Digital Economy Governance Research Center, 2023] notes that out of 190 large models in China, only 45 are general-purpose LLMs, while 145 are tailored for specific domains. Based on this, the trend towards LLMs as a Service (LMaaS) is a promising direction, where LLMs are provided on-demand as a cloud service tailored to specific domains and the tasks at hand.
This emerging future of LMaaS introduces two key challenges. Firstly, within the context of architecture-based adaptation, SASs need to treat LLMs as system components akin to APIs, microservices, and ML models. This calls for enabling effective integration and management of these models within the system architecture. Secondly, and probably more critically, as LLMs become integral to various systems, they introduce new sources of uncertainty. For example, consider a microservice system in which each service is powered by a task-specific LLM. The output of LLMs is inherently probabilistic, i.e., LLMs might produce different outcomes even for the same input. Although studies on prompt engineering and prompt optimization can effectively reduce such uncertainty, from a system-wide perspective it becomes a paramount concern how these LLM-based components contribute to system-level uncertainties, and how to manage those uncertainties within the adaptation process, or through adaptation itself.
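One simple way to damp this probabilistic behavior at the component boundary is self-consistency sampling: query the LMaaS endpoint several times and keep the majority answer. A hedged sketch, with `component` standing in for any LLM-backed service:

```python
from collections import Counter

def majority_vote(component, request, n_samples=5):
    """Sample a stochastic LLM-backed component and return the modal answer.

    `component` is any callable (request -> answer); repeated sampling trades
    extra inference cost for lower output variance at the architecture level.
    """
    answers = [component(request) for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]
```

This does not remove the uncertainty, it only narrows its distribution; the residual variance still has to be accounted for in system-level adaptation.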

6.3 Observation and Representation

In SASs, observation refers to the data the system collects through monitoring, and representation refers to how these data are conceptualized, stored, and utilized as knowledge within the system. LLMs, and MLLMs in particular, have shown the ability to process and interpret data across diverse modalities and fields. However, this versatility also complicates the design and management of system observations and representations.
In terms of observation, MLLMs expand the scope of data that systems can process, such as understanding unstructured, possibly multimodal data. This significantly enlarges the observation design space or modeling dimensions. When considering representation, two primary challenges emerge. Firstly, the influence of prompt syntax on performance. Despite the ability of MLLMs to understand semantically identical information presented in different syntactic forms, the format of representation can significantly affect outcomes. For instance, studies have shown that LLMs perform better with data in HTML or XML formats than in JSON or Markdown, likely due to the structured repetition in tags such as <tr> and </tr>, which enhances attention mechanisms in Transformers [Sui et al., 2024]. Additionally, some studies report that languages more suitable for communication between LLMs (i.e., yielding higher performance) are not necessarily human-readable [Deng et al., 2022]. Secondly, the tradeoff between the semantic richness of the context and the inference cost. Generally, the more contextually rich and relevant the information provided in a prompt, the higher the quality of the LLM's output. However, LLMs typically have a fixed context window (e.g., 16k tokens in GPT-3.5 Turbo), and their inference time increases with the length of the context. Therefore, mechanisms to store and select appropriate context (e.g., knowledge graph-based RAG [Microsoft, 2024]) and to (dynamically) compress context [Li et al., 2023a] are potential research challenges.
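The context-selection tradeoff can be illustrated with a greedy knapsack-style heuristic: rank candidate snippets by relevance and pack them until the token budget is spent. The word-count token estimate is a deliberate simplification; a real system would use the model's own tokenizer:

```python
def select_context(snippets, budget):
    """Greedily pick high-relevance snippets within a token budget.

    snippets: iterable of (text, relevance) pairs
    budget:   maximum number of "tokens" (approximated by word count here)
    """
    chosen, used = [], 0
    for text, relevance in sorted(snippets, key=lambda s: -s[1]):
        cost = len(text.split())          # crude token estimate
        if used + cost <= budget:
            chosen.append(text)
            used += cost
    return chosen
```

More sophisticated variants would score relevance with embeddings or a knowledge graph and compress (rather than drop) low-priority snippets.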
In addition, a related topic is the use of LLMs to integrate human feedback into SASs. Such integration requires careful consideration of how and where this feedback is assimilated into the runtime models, for instance, incorporating preference acquisition into requirement models and integrating human-as-sensor feedback into environmental models.

6.4 Towards LLM-enhanced Decentralized Control

The notion of decentralization of control in SASs, by dividing the responsibilities of the MAPE-K functions across modules, is well-documented [Weyns et al., 2013]. We examine such decentralization from the perspective of LLM-based and LLM-enhanced methods.
In LLM-based methods, specifically the use of LLMs as planners, the notion of collective intelligence has shown promise for distributed planning in LLM Multi-Agent Systems (LLM-MAS). However, there are critical areas for further exploration. Firstly, the common practice of mimicking human interaction, as proposed in the seminal work "Generative Agents: Interactive Simulacra of Human Behavior" [Park et al., 2023], treats each agent as an "independent and complete individual" that does not explicitly share observations and experiences, which inherently limits efficiency. For software systems in general, LLM-MAS could adopt more efficient interaction methods, such as direct data transmission rather than dialogue, and advanced planning techniques that consider synergistic effects of multi-agent interactions. Therefore, optimizing collective intelligence for different problem settings (e.g., depending on the type of communication that is possible) remains a direction worth exploring. Additionally, the scalability of LLM-MAS is currently limited (typically up to five agents), and communication costs increase sharply as the number of agents grows. Hence, another valuable research direction is coordinating large-scale LLM-MAS, e.g., leveraging LLMs to generate efficient communication protocols.
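As an illustration of "direct data transmission rather than dialogue", agents could share structured observations through a blackboard that is serialized once into each agent's prompt context. This is a sketch of the pattern, not an existing framework:

```python
class Blackboard:
    """Shared, structured state for LLM agents: facts are posted directly
    instead of being paraphrased through natural-language dialogue turns."""

    def __init__(self):
        self._facts = {}

    def post(self, agent_id, key, value):
        # Lossless, direct data transmission: no dialogue round-trip needed.
        self._facts[(agent_id, key)] = value

    def observations_for_prompt(self):
        """Serialize the shared state once, for inclusion in agent prompts."""
        return "\n".join(
            f"{agent}.{key} = {value!r}"
            for (agent, key), value in sorted(self._facts.items())
        )
```

Compared with dialogue-based exchange, this avoids both the token cost of conversational turns and the information loss of repeated paraphrasing.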
For LLM-enhanced methods, such as those incorporating RL and search-based planning, current research predominantly focuses on enhancing single-agent planning [Pternea et al., 2024]. LLMs are employed to inject knowledge or commonsense into RL as reward functions and into search algorithms as heuristics. However, leveraging LLMs, such as generating reward functions for cooperation to improve multi-agent planning in frameworks like MARL holds substantial promise and warrants further investigation. Additionally, research is required to transfer such agent-based solutions to MAPE-K based solutions.

6.5 Towards Adaptive and Personalized Interaction

A common setup in existing HOTL studies is in safety-critical domains, where humans are often experts who are well-versed in domain knowledge and system interaction. An example is data acquisition within a control service scenario [Cámara et al., 2015], in which human operators are tasked with adjusting configurations. However, as SASs penetrate daily life and serve more general end-users [Weyns et al., 2023], these assumptions may not hold, introducing new uncertainties related to the users' knowledge and interaction capabilities. Previous research has begun to address these human uncertainties [Li et al., 2020a], integrating them into the planning problem in a formal way, but the topic remains unexplored from an HCI perspective.
LLMs provide a promising avenue for enhancing adaptive and personalized interactions in a flexible and low-cost way. Firstly, LLMs facilitate a deeper understanding of user preferences and behaviors, as explored in Section 5.1. Secondly, the generative nature of LLMs allows for the customization of interactions, such as tailoring explanations to a user’s domain familiarity or adapting user interfaces to the specific task and user context [Huang et al., 2024; Madugalla et al., 2024]. We anticipate that the exploration of this research direction will greatly expand the application fields of HOTL in SASs.

6.6 Ethics and Responsibility

The rise of GenAI introduces new ethical challenges, including potential impacts on the job market [Ghosh and Fossas, 2022], issues of credit allocation between GenAI and humans [Roose, 2022] (e.g., attributing contributions in GenAI-created artworks), and biases in generated content.
In systems that make autonomous decisions, such as SASs, the focus is on how autonomy affects the attribution and definition of decision-making responsibility. The concept of a "responsibility gap" [Sio and Mecacci, 2021] highlights the difficulty of assigning accountability as machine decisions become more autonomous and complex, blurring the lines between the responsibilities of human operators, developers, and the machines themselves. This is rigorously discussed in contexts like autonomous weapon systems [Wood, 2023], where documents such as "Autonomy in Weapon Systems" [U.S. Department of Defense, 2023] discuss appropriate human involvement and the handling of ethical issues during design and deployment. While SASs might not directly involve human lives as weapon systems do, incorrect adaptations and misguidance (such as wrong explanations) can still lead to significant economic losses and performance degradation.
Here, LLMs pose a dual challenge. On the one hand, as grey- or black-box components, they enhance system autonomy and adaptability in a non-transparent way. On the other hand, while enhancing human-machine interaction, they further complicate the assignment of responsibilities [Ehsan et al., 2024]. Discussing and clarifying the attribution and definition of responsibilities under this dual challenge is an important topic for future research.

6.7 Artifacts for Evaluation

Artifacts, including datasets, benchmarks, and exemplars, play a critical role in driving, communicating, and evaluating research in SASs [Weyns et al., 2022b]. In LLM studies, diverse artifacts like the physics engine MUJOCO [Todorov et al., 2012], META-WORLD for multi-task robot learning [Yu et al., 2019], WebShop for information retrieval [Yao et al., 2022], and the BDD-X Dataset featuring driving videos [Kim et al., 2018] have been utilized. Games like Minecraft and OverCooked [AI, 2023] also serve to assess LLMs’ planning and cooperative capabilities in unpredictable environments.
Similarly, the SASs community has developed various exemplars, such as DeltaIoT [Iftikhar et al., 2017] and DARTSim [Moreno et al., 2019], to support research. However, these exemplars often face challenges when used to evaluate LLM-based or LLM-enhanced methods. Firstly, there is a discrepancy between the observation spaces designed for traditional analysis and planning methods and those required for LLMs. Secondly, many exemplar implementations do not fully conform to the MAPE-K structure, complicating the process of deriving observations. For example, some implementations lack a knowledge module, relying instead on direct data transmission between modules (even between the managed system and the planner). These two issues lead to additional costs for evaluating LLM methods, as they require developing additional observations and interfaces for LLMs on top of the existing artifacts. To address these issues, we advocate that future exemplar implementations should aim to explicitly preserve the observation space required for LLMs, even if these observations are not necessary for current algorithms. Furthermore, there is a need for clear modularization of knowledge components within the system architecture to facilitate more effective evaluations of LLM methods.
A further issue is that current exemplars typically assess performance in terms of utility, usually through simulation or testing. Although LLMs can function as end-to-end models (covering all of MAPE-K), they will more commonly be integrated as modules within the architecture of SASs. In such scenarios, system testing can compare performance before and after integrating LLMs in an ablation-style evaluation. However, unit testing of specific LLM-based modules requires further exploration. As studies in other fields show, accurately evaluating the effectiveness of LLM-based modules poses new challenges. First, there is the issue of prompt robustness: LLMs, as stochastic black boxes, may produce varying outputs for the same prompt [Wang et al., 2023b]. Second, measuring the quality of outputs in formats like natural language is inherently difficult. LLM-as-a-Judge, where one LLM assesses the output of another, has been explored for this purpose. However, recent research has highlighted the limitation that LLMs tend to assign higher scores to their own outputs, which may stem from internal similarity preferences or inherent biases [Koo et al., 2023]. This has spurred further research into specialized LLMs designed for multi-dimensional judgment, such as LLM-EVAL [Lin and Chen, 2023] and PandaLM [Wang et al., 2024c].
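One widely used mitigation for positional bias in pairwise LLM-as-a-Judge setups is to query the judge twice with the candidate order swapped and keep only consistent verdicts. A sketch, with `judge` as a placeholder callable returning "first" or "second":

```python
def debiased_compare(judge, a, b):
    """Pairwise comparison with a position-swap consistency check."""
    v1 = judge(a, b)                      # "first" here means a wins
    v2 = judge(b, a)                      # "first" here means b wins
    if v1 == "first" and v2 == "second":
        return "a"
    if v1 == "second" and v2 == "first":
        return "b"
    return "tie"                          # inconsistent verdicts: treat as tie
```

A judge that always prefers whichever candidate appears first will contradict itself under the swap and be reduced to a tie, which is exactly the failure mode this check is meant to expose.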

6.8 Towards Self-Testing

Generally, software testing has always been regarded as a fault-finding process carried out during the development cycle. However, in the context of SASs, the problems are twofold: (i) the systems include a large number of possible contexts, configurations, and adaptation options; and (ii) uncertainties and dynamics are unpredictable at design time, i.e., systems may encounter unforeseen conditions in themselves and their environments at runtime. For the former, traditional offline testing methods are expected to mitigate the problem. However, exploration of the latter, more important issue is still relatively preliminary. Possibly relevant concepts include online testing [Bertolino et al., 2012], runtime testing [da Silva and de Lemos, 2011; Fredericks et al., 2013, 2014; Lahami and Krichen, 2021], field testing [Bertolino et al., 2021; Silva et al., 2024], and in vivo testing [Murphy et al., 2009].
In SE, Transformers and LLMs have been extensively applied to automate testing processes. These applications are primarily categorized into three types. The first category includes the use of language models for fault localization [Ciborowska and Damevski, 2022; Yang et al., 2024a] and vulnerability detection [Mamede et al., 2023; Sun et al., 2024c; Zhang et al., 2024d; Zhou et al., 2024d]. The second category involves the automated generation of test oracles [Tsigkanos et al., 2023a, b], assertions [Tufano et al., 2022], and test cases [Bhatia et al., 2024; Hoffmann and Frister, 2024; Plein et al., 2024; Rao et al., 2023], with further exploration of domain-specific tuning and knowledge enhancement documented in [Arora et al., 2024; Xue et al., 2024]. The third category features LLMs as fuzzers, where they generate abnormal or randomized input data (i.e., fuzz data), and their performance can be incrementally enhanced through iterations [Jha et al., 2024; Xia et al., 2024] or ICL [Deng et al., 2024]. Despite these advancements in automated testing, direct applications to testing for SASs remain scarce. To our knowledge, the only study directly associated with SASs is Co-Evolution of Production-Test Code [Hu et al., 2023a], a Transformer-based approach that identifies outdated test cases and automatically updates them in response to changes in the production code.
As outlined in [Fredericks et al., 2013; Silva et al., 2024], self-testing can also be conceptualized as a MAPE-K loop. This process involves several stages: monitoring to determine whether to trigger testing, analyzing environmental changes, planning new test case strategies or altering monitoring methods (such as adjusting detection targets or frequencies), and executing these tests on the software. The core challenges lie in identifying how environmental changes impact test cases, establishing traceability between requirements and test cases, and devising methods to generate new test cases. Given LLMs' understanding of environmental changes and their capabilities in testing automation, as discussed above, we anticipate that LLMs could facilitate further advancements in self-testing.
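The stages above can be sketched as one pass of a MAPE-K loop over shared test knowledge. The stage functions (`monitor`, `analyze`, `plan`, `execute`) are hypothetical placeholders that an LLM-backed implementation could fill in:

```python
def self_testing_step(knowledge, monitor, analyze, plan, execute):
    """One Monitor-Analyze-Plan-Execute pass for runtime self-testing."""
    event = monitor()                         # Monitor: collect runtime data
    change = analyze(event, knowledge)        # Analyze: detect relevant drift
    if change is None:
        return knowledge                      # no test adaptation this cycle
    new_tests = plan(change, knowledge)       # Plan: derive/update test cases
    knowledge["tests"] = new_tests
    knowledge["last_results"] = execute(new_tests)  # Execute: run in the field
    return knowledge
```

In an LLM-enhanced variant, `analyze` could interpret unstructured environmental observations and `plan` could generate concrete test cases from requirement descriptions, while the loop skeleton stays the same.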

6.9 Towards Self-Evolution

Software maintenance and evolution, in the context of SE, refer to the process of continuously updating software after its initial deployment, primarily aimed at correcting discovered problems, improving system performance, or adding new functionality.
Based on our survey, a substantial body of studies in SE primarily focuses on addressing identified bugs or vulnerabilities. Specifically, these studies leverage LLMs’ capabilities in code understanding and generation for code-level software repair and correction. Specific topics include vulnerability repair [Fu et al., 2022] and automated program repair [Berabi et al., 2021; Chen, 2024; Fan et al., 2023a; Guo et al., 2024d; Gupta et al., 2023; Huang et al., 2023b; Jiang et al., 2023b, 2024; Lajkó et al., 2024; Ribeiro et al., 2023; Santos et al., 2024; Sobania et al., 2023; Wang et al., 2023e; Wei et al., 2023b; Xia et al., 2023a, 2023b; Yuan et al., 2022].
However, in the context of SASs, research on evolution is limited, and the few studies that focus on automatic evolution approach the problem from the viewpoint of scenarios that were not anticipated at design time [Weyns and Andersson, 2023; Weyns et al., 2022a]. We anticipate that LLMs could facilitate two potential paradigms for implementing self-evolution in SASs. The first paradigm involves collective intelligence. The MetaGPT multi-agent collaboration framework [Hong et al., 2024b] allows different LLM agents to assume various roles and has demonstrated the capability to automate the entire waterfall development process, from requirements engineering to testing. Given that software evolution can be viewed as an incremental development problem, especially within agile development contexts, using MetaGPT as a framework for self-evolution represents an evident approach. The second paradigm centers on experience accumulation, a concept that parallels self-evolution, where LLM agents continuously acquire new "skills" for emerging tasks. This approach, combined with architecture-based adaptation, shows promise. For instance, if LLMs identify that existing components or APIs are inadequate for unforeseen runtime conditions, they can autonomously search for and integrate new APIs (from online sources) into the adaptation space and facilitate reasoning about these components through the available API documentation. Furthermore, within the context of LLM agents, Tao et al. [2024] propose a similar concept where LLM agents undergo self-evolution through four stages: experience acquisition, experience refinement, updating, and evaluation. However, a critical gap is that self-evolution in LLM agents often relies on natural language descriptions, while SASs typically reason using knowledge expressed in some form of a DSML. This results in a need to define observations and skills in such a DSML. Self-evolution might also necessitate evolving the DSML itself (e.g., by introducing new actions and events into a Markov Decision Process) and adapting the corresponding reasoning methods.
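The experience-accumulation paradigm can be sketched as a skill registry that grows at runtime: when no stored skill covers a task, a discovery step adds one. Here `discover_skill` is a placeholder for an LLM-driven search over online API documentation:

```python
class SkillRegistry:
    """Runtime store of skills; extended whenever no existing skill applies."""

    def __init__(self):
        self._skills = {}

    def handle(self, task, discover_skill):
        if task not in self._skills:
            # Evolve: acquire a new skill, e.g., via LLM-driven API search,
            # and add it to the adaptation space for future reuse.
            self._skills[task] = discover_skill(task)
        return self._skills[task]()
```

Once discovered, a skill is reused on subsequent occurrences of the task; the critical gap noted above is that in a SAS the acquired skill would also need to be expressed in the system's DSML rather than as a natural-language description.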

6.10 Inherent Shortcomings of LLMs and Mitigation Strategies

While this article does not delve into the technical specifics of LLMs, understanding and addressing their inherent limitations is crucial when employing them in practical SASs.
Firstly, a significant issue with LLMs is hallucination, i.e., the generation of misleading or factually incorrect content, which undermines the overall reliability and trustworthiness of the system. This problem can be mitigated by human verification when LLMs are used at design time, such as during translation and knowledge construction, or by the evaluation mechanisms of other algorithms when LLMs do not directly generate the final outcomes, e.g., when they generate heuristics for search methods. However, when LLMs are employed at runtime to directly produce final outcomes, such as in monitoring, architecture-based adaptation, or HOTL, this issue requires rigorous consideration. As mitigation strategies, techniques like RAG and feedback mechanisms (including human feedback, interactive environmental feedback, and feedback from other LLMs) can help reduce hallucination.
Secondly, LLMs often incur high deployment and operational costs due to their large number of parameters, which necessitate high-performance hardware and generally result in slower inference speeds. This restricts the application of LLMs on locally deployed devices and in scenarios demanding fast or real-time responses. Strategies such as model quantization [Hubara et al., 2017], knowledge distillation [Xu et al., 2024a], and hardware acceleration [Kachris, 2024] are fundamental, general approaches to reducing usage costs or improving response times. Additionally, combining LLMs with other methods can also mitigate these issues. For instance, in question-and-answer systems, a common strategy is to route frequently asked questions to a keyword-triggered rule-based engine and reserve the LLM for the remaining questions. Furthermore, selecting an appropriately sized LLM [Tian et al., 2023] and/or employing a suitable prompt strategy (e.g., whether to enable CoT) [Pan et al., 2024], based on the complexity of the problem (as runtime context), is also a practical deployment strategy.
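The keyword-routing strategy for question-and-answer systems can be sketched in a few lines of Python. The rules, canned answers, and the stubbed LLM call below are purely illustrative.

```python
# FAQ rules: keyword -> canned answer, checked before invoking the LLM.
FAQ_RULES = {
    "opening hours": "We are open 9:00-17:00, Monday to Friday.",
    "reset password": "Use the 'Forgot password' link on the login page.",
}

def answer(question: str, llm) -> str:
    """Route frequent questions to a cheap rule engine; only fall back
    to the slower, costlier LLM for everything else."""
    q = question.lower()
    for keyword, canned in FAQ_RULES.items():
        if keyword in q:
            return canned
    return llm(question)

# Stub standing in for a real LLM call.
llm = lambda q: f"[LLM answer to: {q}]"
```

Because the rule engine answers the high-frequency head of the question distribution at near-zero cost, the LLM is invoked only for the long tail, reducing both token cost and average latency.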
Thirdly, the values embedded within LLMs also constitute a significant concern. As discussed in Section 2.4, human or value alignment in the context of LLMs typically means aligning models with positive values such as honesty and helpfulness. However, similar to the physical-grounding capabilities in execution, overemphasizing these positive values can disconnect decision-making from real-world reality, leading to "biased" outcomes. Lin et al. [2024] describe this issue as an "alignment tax," whereby LLMs may compromise or weaken certain capabilities through RLHF aimed at promoting positive behaviors. Furthermore, Chen et al. [2023b] demonstrate experimentally that while LLMs can accurately judge the truthfulness of negative commonsense knowledge (e.g., answering "No" to questions such as "Do lions live in the ocean?"), their reasoning over such knowledge tends to be overly positive, producing erroneous outcomes such as "Lions live in the ocean." In the context of this article, the concept of alignment tax has several implications. First, requirement-driven adaptation and testing, especially penetration testing [Deng et al., 2023; Happe and Cito, 2023], often require a perspective that considers negative aspects, akin to adversarial thinking in cybersecurity [Hamman et al., 2017]. Second, in collaboration scenarios, particularly in task allocation, Guo et al. [2024b] suggest that a "stricter" leadership style might enhance team efficiency compared to a more lenient one. More broadly, the alignment tax could potentially influence any analysis and planning activities of LLMs. Addressing these complexities involves exploring how LLMs can incorporate a balanced spectrum of human values, including those perceived as negative, and how prompt engineering can support this balance.
Fourthly, security and privacy remain particularly significant and challenging concerns [Das et al., 2024a]. Regarding security, it is crucial to be aware of LLMs' vulnerabilities, which primarily encompass data poisoning and backdoor attacks, as well as instruction-tuning attacks such as jailbreaking and prompt injection. To mitigate these risks during usage, instruction pre-processing and generation post-processing are commonly implemented strategies. Concerning privacy, using third-party LLMs that are not locally deployed poses a risk of data retention: submitted data could be incorporated into future training sets and potentially exposed through model inversion and extraction attacks [Yao et al., 2024]. Moreover, LLMs may inadvertently leak private information during interactions with external tools, such as those occurring in architecture-based adaptation. As countermeasures, local deployment and privacy-preserving prompting techniques [Hong et al., 2024a] can help address these privacy concerns.
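The pre- and post-processing strategies mentioned above can be illustrated with a minimal sketch: the input filter screens for obvious prompt-injection markers, and the output filter redacts one class of private data before it leaves the system. The marker list and regular expression here are deliberately simplistic placeholders; real deployments rely on far more robust detectors.

```python
import re

# Illustrative markers only; production systems use trained classifiers.
INJECTION_MARKERS = ("ignore previous instructions",
                     "disregard the system prompt")
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")

def preprocess_instruction(user_input: str) -> str:
    """Reject inputs containing obvious prompt-injection markers."""
    lowered = user_input.lower()
    if any(marker in lowered for marker in INJECTION_MARKERS):
        raise ValueError("possible prompt injection detected")
    return user_input

def postprocess_output(llm_output: str) -> str:
    """Redact e-mail addresses before the output leaves the system."""
    return EMAIL_PATTERN.sub("[REDACTED]", llm_output)
```

In an SAS, these filters would wrap every LLM call made by the MAPE-K loop, so that neither adversarial instructions reach the model nor sensitive data reaches external tools unchecked.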
Last but not least, the complexity and vast scale of LLMs result in a lack of explainability [Zhao et al., 2023a]. Using LLMs in critical and safety-critical applications therefore necessitates careful consideration. Integrating techniques such as attention visualization [Chefer et al., 2021] and feature attribution [Sundararajan et al., 2017] could help mitigate this issue.

7 Threats to Validity

The first threat to our research methodology is the limited coverage of our literature search, which relied on specific title keywords within particular conferences. We primarily focused on articles in leading conferences for their balance of quality and timeliness, but this approach might have caused us to overlook valuable recent articles published in journals or on preprint platforms like arXiv. Additionally, our search keywords were specifically targeted at GenAI, deliberately avoiding broader keywords such as AI, deep learning, or NLP. This choice minimized the inclusion of irrelevant publications and enhanced the efficiency of our survey, but we recognize the risk that it may have excluded relevant literature that did not explicitly mention GenAI in the title. To mitigate these limitations, we expanded our search to include as many conferences and keywords as possible within our limited timeframe, although a fully systematic literature review would be needed to address this issue more thoroughly.
The second threat stems from our rule-based method of filtering literature for relevance to SASs. These rules may reflect the authors' biases, potentially impacting the objectivity and accuracy of our literature selection. To mitigate this concern, we referred to the MAPE-K loop and the three key design principles for HOTL to judge the literature's relevance to SASs. Additionally, the filtering was conducted collaboratively by two authors, with a third author involved to resolve any discrepancies. Whenever ambiguous cases arose, they were discussed among the authors and the filtering rules were refined and re-applied; this cycle was repeated several times.
The third threat involves potential subjectivity in our categorization. Even with structured guidelines, the classification of literature into categories can be influenced by individual perspectives, which may affect the neutrality of the analysis. To mitigate this, each categorization decision was likewise based on discussions involving two or more authors.
The fourth threat concerns the presentation of the literature. To balance the length of the article against the distribution of content, we simplified the discussion of certain topics. For instance, despite the rich GenAI literature on monitoring and on some specific planning techniques, these topics are not the main focus of research within SASs, and our explanations of them are therefore more concise. Similarly, discussions of the different architectural variants of Transformers and of the various LLM prompt designs for different tasks are simplified, as they would provide less insight for the readers. Such a presentation may reflect the authors' inherent biases. To counteract this, we mention all selected literature in the main text as comprehensively as possible (even at the cost of some redundancy), and we make the survey data publicly accessible, allowing interested readers to consult the original articles for a deeper understanding of the technical details.

8 Conclusion

In this article, we aimed to shed light on the potential use and challenges of applying GenAI in SASs. To that end, we first presented a comprehensive and systematic overview of how GenAI can be used in SASs. Our overview involved gathering literature from four distinct research fields: artificial intelligence, SE, HCI, and robotics. We then conducted a thorough filtering and categorization of this literature. Specifically, we organized the literature into two main categories: the first involves enhancing autonomy and adaptability by augmenting the modules within the MAPE-K feedback loop; the second focuses on improving the interaction between humans and SASs in HOTL settings, in terms of preference acquisition, transparency, and collaboration.
Leveraging these insights, we then outlined a research roadmap that identifies specific challenges and future research directions that remain to be addressed in integrating GenAI into SASs. We concluded with a discussion of the current shortcomings of GenAI, particularly LLMs, and of the mitigation strategies that require consideration for practical deployment.
We hope that this article will serve as a source of inspiration for anyone with an interest at the crossroads of GenAI and self-adaptation. We anticipate that realizing the roadmap will demand a multi-year, concerted research effort by different research groups around the globe.

Acknowledgement

We extend our sincere thanks to the anonymous reviewers for their insightful comments and suggestions, which have greatly enhanced the quality of this manuscript.

Footnotes

Invited article as part of ACM TAAS Editor’s Special Collection to inaugurate a new continuous theme on “Generative AI and Large Language Models for Autonomous and Adaptive Systems.”
1
It is important to note that although the output of LLMs simulates human reasoning processes, this does not imply that LLMs, which are based on neural networks, truly reason in the way symbolic AI approaches do [Fedorenko et al., 2024].
2
Strictly speaking, LLMs such as GPT are specific instances of the Transformer architecture, and large-scale Transformers trained on language text can also be referred to as LLMs. To clarify distinctions within this article, "LLMs" will predominantly denote models trained on large volumes of natural language text, whereas "Transformer" or "BERT" will refer to models trained on domain-specific datasets.
3
We omit “automating tasks,” typically associated with the MAPE-K framework, and “runtime models,” covered in Section 4.4.
4
In this context, the term “feature engineering” pertains to ML rather than SE. Here, “features” denote the attributes or variables that characterize each instance within a dataset, as opposed to the functional components of a software product designed to fulfill specific user requirements.

A Appendix

A Evaluation Metrics

Table A1 briefly summarizes the metrics for evaluation, as employed in the selected literature, to facilitate a comprehensive evaluation of GenAI when applied in SASs.
Table A1. A Brief Summary of Evaluation Metrics

| Module | GenAI's Usage | Evaluation Metrics |
|---|---|---|
| M | data structuring | Log: grouping and parsing accuracy (Precision, Recall, F1), edit distance. DB: exact match, execution match, execution accuracy, valid efficiency score |
| M | anomaly detection | Classification metrics: Precision, Recall, F1. Prediction/regression metrics: MSE (Mean Squared Error), MAPE (Mean Absolute Percentage Error), MAE (Mean Absolute Error) |
| M | time series forecasting | MSE, MAE, MAPE, RMSE (Root Mean Squared Error), NMAE (Normalized MAE), NRMSE (Normalized RMSE), CRPS (Continuous Ranked Probability Score), normalized quantile loss (π-risk), SSR (spread-skill ratio) |
| M | event sequence prediction | Prediction accuracy, MR (Mean Rank), MRR (Mean Reciprocal Rank) |
| AP | performance-related metrics | (Human-normalized) score/return/reward, win rate, success rate (for both tasks solved and optimizations found), # of tasks/problems solved, length of successful plan (shorter is better) |
| AP | cost-related metrics | TF/DM: training speed, sample efficiency, diffusion steps, planning time. LLM: cost-effectiveness, # of API calls, token cost |
| AP | LLM-specific metrics | # of replan attempts (until plan success), # of options/skills created, executability % of plan |
| E | robotic execution | Success rate, efficiency (of the navigated route) |
| K | knowledge construction and translation | # of (errors in) predicates/literals, Miss Ratio, POC (Partial Ordering Count), BLEU (bilingual evaluation understudy), unique objects %, human score, Accuracy %, Novelty %, Kendall's rank correlation coefficient, Spearman's rank correlation coefficient, AWM size (verified through interaction with the environment) |
| H | preference acquisition | Subjective Likert score, # of changes required |
| H | transparency | Code/log: BLEU (BLEU-CU, BLEU-DC), CIDEr, METEOR, ROUGE-L, Flesch-Kincaid Grade Level, subjective Likert score (e.g., usefulness and readability). User correction: # of corrections, matching accuracy (P, R, F1) |
| H | collaboration | Subjective Likert score, human effort (%), reward, game score, successful support rate, success steps, task completeness, sub-goal correctness, token cost, # of API calls |

M, AP, E, K represent the modules within MAPE-K; H represents HOTL; TF represents Transformer; DM represents diffusion model.

References

[1]
Wasi Ahmad, Saikat Chakraborty, Baishakhi Ray, and Kai-Wei Chang. 2020. A transformer-based approach for source code summarization. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, 4998–5007. DOI:
[2]
Toufique Ahmed and Premkumar Devanbu. 2023. Few-shot training LLMs for project-specific code-summarization. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). Article 177, 5 pages. DOI:
[3]
Toufique Ahmed, Kunal Suresh Pai, Premkumar Devanbu, and Earl Barr. 2024. Automatic semantic augmentation of language model prompts (for code summarization). In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 220, 13 pages. DOI:
[4]
Human Compatible AI. 2023. overcooked_ai: A Cooperative Multi-Agent Environment Based on the Overcooked Game. Retrieved May 12, 2024 from https://github.com/HumanCompatibleAI/overcooked_ai
[5]
Muideen Ajagbe and Liping Zhao. 2022. Retraining a BERT model for transfer learning in requirements engineering: A Preliminary study. In Proceedings of the IEEE 30th International Requirements Engineering Conference (RE ’22), 309–315. DOI:
[6]
Ahmed Saeed Alsayed, Hoa Khanh Dam, and Chau Nguyen. 2024. MicroRec: Leveraging large language models for microservice recommendation. In Proceedings of the 21st International Conference on Mining Software Repositories, 419–430.
[7]
Jesper Andersson, Rogério de Lemos, Sam Malek, and Danny Weyns. 2009. Modeling Dimensions of Self-Adaptive Software Systems. Springer, Berlin, 27–47. DOI:
[8]
Ian Arawjo, Chelse Swoopes, Priyan Vaithilingam, Martin Wattenberg, and Elena L. Glassman. 2024. ChainForge: A visual toolkit for prompt engineering and LLM hypothesis testing. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24). Article 304, 18 pages. DOI:
[9]
Chetan Arora, Tomas Herda, and Verena Homm. 2024. Generating test scenarios from NL requirements via LLMs: An industrial study. In Proceedings of the 32nd IEEE International Requirements Engineering 2024 Conference.
[10]
Merve Astekin, Max Hort, and Leon Moonen. 2024. An exploratory study on how non-determinism in large language models affects log parsing. In Proceedings of the 2nd International Workshop on Interpretability, Robustness, and Benchmarking in Neural Software Engineering @ ICSE.
[11]
Michael Barnes. 2010. Human-Robot Interactions in Future Military Operations (Human Factors in Defence). CRC Press.
[12]
Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. 2000. A neural probabilistic language model. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 13. MIT Press.
[13]
Berkay Berabi, Jingxuan He, Veselin Raychev, and Martin Vechev. 2021. TFix: Learning to fix coding errors with a text-to-text transformer. In Proceedings of the 38th International Conference on Machine Learning, Vol. 139. PMLR, 780–791.
[14]
Michael S. Bernstein, Joon Sung Park, Meredith Ringel Morris, Saleema Amershi, Lydia B Chilton, and Mitchell L. Gordon. 2023. Architecting novel interactions with generative AI models. In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23 Adjunct). Article 107, 3 pages. DOI:
[15]
Antonia Bertolino, Pietro Braione, Guglielmo De Angelis, Luca Gazzola, Fitsum Kifetew, Leonardo Mariani, Matteo Orrù, Mauro Pezzè, Roberto Pietrantuono, Stefano Russo, and Paolo Tonella. 2021. A survey of field-based testing techniques. ACM Computing Surveys 54, 5 (May 2021), Article 92, 39 pages. DOI:
[16]
Antonia Bertolino, Guglielmo De Angelis, Sampo Kellomaki, and Andrea Polini. 2012. Enhancing service federation trustworthiness through online testing. Computer 45, 1 (2012), 66–72. DOI:
[17]
Maciej Besta, Nils Blach, Ales Kubicek, Robert Gerstenberger, Michal Podstawski, Lukas Gianinazzi, Joanna Gajda, Tomasz Lehmann, Hubert Niewiadomski, Piotr Nyczyk, and Torsten Hoefler. 2024. Graph of thoughts: Solving elaborate problems with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 17682–17690. DOI:
[18]
Shreya Bhatia, Tarushi Gandhi, Dhruv Kumar, and Pankaj Jalote. 2024. Unit test generation using generative AI: A comparative performance analysis of autogeneration tools. In Proceedings of the 1st International Workshop on Large Language Models for Code.
[19]
Gordon Blair, Nelly Bencomo, and Robert B. France. 2009. Models@ run.time. Computer 42, 10 (2009), 22–27. DOI:
[20]
Antoine Bosselut, Hannah Rashkin, Maarten Sap, Chaitanya Malaviya, Asli Celikyilmaz, and Yejin Choi. 2019. COMET: Commonsense transformers for automatic knowledge graph construction. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 4762–4779. DOI:
[21]
Darko Bozhinoski. 2024. Swarm Intelligence-based Bio-inspired Algorithms. In Proceedings of the 19th Conference on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’24).
[22]
Anthony Brohan, Noah Brown, Justice Carbajal, Yevgen Chebotar, Xi Chen, Krzysztof Choromanski, Tianli Ding, Danny Driess, Avinava Dubey, Chelsea Finn, Pete Florence, Chuyuan Fu, Montse Gonzalez Arenas, Keerthana Gopalakrishnan, Kehang Han, Karol Hausman, Alexander Herzog, Jasmine Hsu, Brian Ichter, Alex Irpan, Nikhil Joshi, Ryan Julian, Dmitry Kalashnikov, Yuheng Kuang, Isabel Leal, Lisa Lee, Tsang-Wei Edward Lee, Sergey Levine, Yao Lu, Henryk Michalewski, Igor Mordatch, Karl Pertsch, Kanishka Rao, Krista Reymann, Michael Ryoo, Grecia Salazar, Pannag Sanketi, Pierre Sermanet, Jaspiar Singh, Anikait Singh, Radu Soricut, Huong Tran, Vincent Vanhoucke, Quan Vuong, Ayzaan Wahid, Stefan Welker, Paul Wohlhart, Jialin Wu, Fei Xia, Ted Xiao, Peng Xu, Sichun Xu, Tianhe Yu, and Brianna Zitkovich. 2023. RT-2: Vision-language-action models transfer web knowledge to robotic control. arXiv:2307.15818 [cs.RO]
[23]
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, Benjamin Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, and Dario Amodei. 2020. Language models are few-shot learners. arXiv:2005.14165 [cs.CL]
[24]
Salva Rühling Cachay, Bo Zhao, Hailey James, and Rose Yu. 2023. DYffusion: A dynamics-informed diffusion model for spatiotemporal forecasting. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[25]
Jinyu Cai, Jinglue Xu, Jialong Li, Takuto Yamauchi, Hitoshi Iba, and Kenji Tei. 2024b. Exploring the improvement of evolutionary computation via large language models. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’24).
[26]
Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, and Denny Zhou. 2024a. Large language models as tool makers. In Proceedings of the 12th International Conference on Learning Representations.
[27]
Zefan Cai, Baobao Chang, and Wenjuan Han. 2023. Human-in-the-loop through chain-of-thought. arXiv:2306.07932 [cs.CL]
[28]
R. Calinescu, L. Grunske, M. Kwiatkowska, R. Mirandola, and G. Tamburrelli. 2011. Dynamic QoS management and optimization in service-based systems. IEEE Transactions on Software Engineering 37, 3 (2011). DOI:
[29]
R. Calinescu, D. Weyns, S. Gerasimou, U. Iftikhar, I. Habli, and T. Kelly. 2018. Engineering trustworthy self-adaptive software with dynamic assurance cases. IEEE Transactions on Software Engineering 44, 11 (2018), 1039–1069. DOI:
[30]
Cary Campbell, Alin Olteanu, and Kalevi Kull. 2019. Learning and knowing as semiosis: Extending the conceptual apparatus of semiotics. Sign Systems Studies 47, 3/4 (Dec. 2019), 352–381. DOI:
[31]
Defu Cao, Furong Jia, Sercan O. Arik, Tomas Pfister, Yixiang Zheng, Wen Ye, and Yan Liu. 2024. TEMPO: Prompt-based generative pre-trained transformer for time series forecasting. In Proceedings of the 12th International Conference on Learning Representations.
[32]
Haizhou Cao, Zhenhao Huang, Tiechui Yao, Jue Wang, Hui He, and Yangang Wang. 2023. InParformer: Evolutionary decomposition transformers with interactive parallel attention for long-term time series forecasting. Proceedings of the AAAI Conference on Artificial Intelligence 37, 6 (Jun. 2023), 6906–6915. DOI:
[33]
Thomas Carta, Clément Romac, Thomas Wolf, Sylvain Lamprier, Olivier Sigaud, and Pierre-Yves Oudeyer. 2023. Grounding large language models in interactive environments with online reinforcement learning. In Proceedings of the 40th International Conference on Machine Learning (ICML ’23). JMLR.org, Article 150, 38 pages.
[34]
Chi-Min Chan, Weize Chen, Yusheng Su, Jianxuan Yu, Wei Xue, Shanghang Zhang, Jie Fu, and Zhiyuan Liu. 2024a. ChatEval: Towards better LLM-based evaluators through multi-agent debate. In Proceedings of the 12th International Conference on Learning Representations.
[35]
Kenneth Chan, Sol Zilberman, Nicholas Polanco, Betty H. C. Cheng, and Josh Siegel. 2024b. SafeDriveRL: Combining non-cooperative game theory with reinforcement learning to explore and mitigate human-based uncertainty for autonomous vehicles. In Proceedings of the 19th Conference on Software Engineering for Adaptive and Self-Managing Systems. SEAMS.
[36]
Yevgen Chebotar, Quan Vuong, Karol Hausman, Fei Xia, Yao Lu, Alex Irpan, Aviral Kumar, Tianhe Yu, Alexander Herzog, Karl Pertsch, Keerthana Gopalakrishnan, Julian Ibarz, Ofir Nachum, Sumedh Anand Sontakke, Grecia Salazar, Huong T Tran, Jodilyn Peralta, Clayton Tan, Deeksha Manjunath, Jaspiar Singh, Brianna Zitkovich, Tomas Jackson, Kanishka Rao, Chelsea Finn, and Sergey Levine. 2023. Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions. In Proceedings of the 7th Annual Conference on Robot Learning.
[37]
Hila Chefer, Shir Gur, and Lior Wolf. 2021. Transformer interpretability beyond attention visualization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR ’21), 782–791. DOI:
[38]
Chang Chen, Fei Deng, Kenji Kawaguchi, Caglar Gulcehre, and Sungjin Ahn. 2024b. Simple hierarchical planning with diffusion. In Proceedings of the 12h International Conference on Learning Representations.
[39]
Huayu Chen, Cheng Lu, Chengyang Ying, Hang Su, and Jun Zhu. 2023a. Offline reinforcement learning via high-fidelity generative behavior modeling. In Proceedings of the 11th International Conference on Learning Representations.
[40]
Jiangjie Chen, Wei Shi, Ziquan Fu, Sijie Cheng, Lei Li, and Yanghua Xiao. 2023b. Say what you mean! Large language models speak too positively about negative commonsense knowledge. arXiv:2305.05976 [cs.CL].
[41]
Jiangjie Chen, Xintao Wang, Rui Xu, Siyu Yuan, Yikai Zhang, Wei Shi, Jian Xie, Shuang Li, Ruihan Yang, Tinghui Zhu, Aili Chen, Nianqi Li, Lida Chen, Caiyu Hu, Siye Wu, Scott Ren, Ziquan Fu, and Yanghua Xiao. 2024f. From persona to personalization: A survey on role-playing language agents. arXiv:2404.18231 [cs.CL].
[42]
Lili Chen, Kevin Lu, Aravind Rajeswaran, Kimin Lee, Aditya Grover, Michael Laskin, Pieter Abbeel, Aravind Srinivas, and Igor Mordatch. 2021a. Decision transformer: Reinforcement learning via sequence modeling. In Proceedings of the Advances in Neural Information Processing Systems.
[43]
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. 2021b. Evaluating large language models trained on code. arXiv:2107.03374 [cs.LG].
[44]
Peng Chen, Yingying ZHANG, Yunyao Cheng, Yang Shu, Yihang Wang, Qingsong Wen, Bin Yang, and Chenjuan Guo. 2024g. Pathformer: Multi-scale transformers with adaptive pathways for time series forecasting. In Proceedings of the 12th International Conference on Learning Representations.
[45]
Qing Chen, Wei Shuai, Jiyao Zhang, Zhida Sun, and Nan Cao. 2024d. Beyond numbers: Creating analogies to enhance data comprehension and communication with generative AI. In Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI ’24). Article 377, 14 pages. DOI:
[46]
Weize Chen, Yusheng Su, Jingwei Zuo, Cheng Yang, Chenfei Yuan, Chi-Min Chan, Heyang Yu, Yaxi Lu, Yi-Hsin Hung, Chen Qian, Yujia Qin, Xin Cong, Ruobing Xie, Zhiyuan Liu, Maosong Sun, and Jie Zhou. 2024e. AgentVerse: Facilitating multi-agent collaboration and exploring emergent behaviors. In Proceedings of the 12th International Conference on Learning Representations.
[47]
Xiaolei Chen, Jie Shi, Jia Chen, Peng Wang, and Wei Wang. 2024c. High-precision online log parsing with large language models. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion ’24), 354–355. DOI:
[48]
Yang Chen. 2024. Flakiness repair in the era of large language models. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion ’24), 441–443. DOI:
[49]
Yongchao Chen, Jacob Arkin, Yang Zhang, Nicholas Roy, and Chuchu Fan. 2024a. Scalable multi-robot collaboration with large language models: Centralized or decentralized systems? arXiv:2309.15943 [cs.RO].
[50]
Betty H. C. Cheng, Rogério de Lemos, Holger Giese, Paola Inverardi, Jeff Magee, Jesper Andersson, Basil Becker, Nelly Bencomo, Yuriy Brun, Bojan Cukic, Giovanna Di Marzo Serugendo, Schahram Dustdar, Anthony Finkelstein, Cristina Gacek, Kurt Geihs, Vincenzo Grassi, Gabor Karsai, Holger M. Kienle, Jeff Kramer, Marin Litoiu, Sam Malek, Raffaela Mirandola, Hausi A. Müller, Sooyong Park, Mary Shaw, Matthias Tichy, Massimo Tivoli, Danny Weyns, and Jon Whittle. 2009. Software Engineering for Self-Adaptive Systems: A Research Roadmap. Springer, Berlin, 1–26. DOI:
[51]
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. 2023. Deep reinforcement learning from human preferences. arXiv:1706.03741 [stat.ML].
[52]
Simon Chu, Justin Koe, David Garlan, and Eunsuk Kang. 2024. Integrating graceful degradation and recovery through requirement-driven adaptation. In Proceedings of the 19th Conference on Software Engineering for Adaptive and Self-Managing Systems.
[53]
John Joon Young Chung, Wooseok Kim, Kang Min Yoo, Hwaran Lee, Eytan Adar, and Minsuk Chang. 2022. TaleBrush: Visual sketching of story generation with pretrained language models. In Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems (CHI EA ’22). Article 172, 4 pages. DOI:
[54]
Agnieszka Ciborowska and Kostadin Damevski. 2022. Fast changeset-based bug localization with BERT. In Proceedings of the 44th International Conference on Software Engineering (ICSE ’22), 946–957. DOI:
[55]
Can Cui, Yunsheng Ma, Xu Cao, Wenqian Ye, Yang Zhou, Kaizhao Liang, Jintai Chen, Juanwu Lu, Zichong Yang, Kuei-Da Liao, Tianren Gao, Erlong Li, Kun Tang, Zhipeng Cao, Tong Zhou, Ao Liu, Xinrui Yan, Shuqi Mei, Jianguo Cao, Ziran Wang, and Chao Zheng. 2023. A survey on multimodal large language models for autonomous driving. arXiv:2311.12320 [cs.AI].
[56]
Javier Cámara, Gabriel Moreno, and David Garlan. 2015. Reasoning about human participation in self-adaptive systems. In Proceedings of the 2015 IEEE/ACM 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 146–156. DOI:
[57]
Carlos Eduardo da Silva and Rogério de Lemos. 2011. Dynamic plans for integration testing of self-adaptive software systems. In Proceedings of the 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’11), 148–157. DOI:
[58]
Zhirui Dai, Arash Asgharivaskasi, Thai Duong, Shusen Lin, Maria-Elizabeth Tzes, George Pappas, and Nikolay Atanasov. 2024. Optimal scene graph planning with large language model guidance. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[59]
Murtaza Dalal, Tarun Chiruvolu, Devendra Singh Chaplot, and Ruslan Salakhutdinov. 2024. Plan-seq-learn: Language model guided RL for solving long horizon robotics tasks. In Proceedings of the 12h International Conference on Learning Representations.
[60]
Abhimanyu Das, Weihao Kong, Rajat Sen, and Yichen Zhou. 2024b. A decoder-only foundation model for time-series forecasting. In Proceedings of the 41st International Conference on Machine Learning (ICML).
[61]
Badhan Chandra Das, M. Hadi Amini, and Yanzhao Wu. 2024a. Security and privacy challenges of large language models: A survey. arXiv:2402.00888 [cs.CL].
[62]
Rogério de Lemos, Holger Giese, Hausi A. Müller, Mary Shaw, Jesper Andersson, Marin Litoiu, Bradley Schmerl, Gabriel Tamura, Norha M. Villegas, Thomas Vogel, Danny Weyns, Luciano Baresi, Basil Becker, Nelly Bencomo, Yuriy Brun, Bojan Cukic, Ron Desmarais, Schahram Dustdar, Gregor Engels, Kurt Geihs, Karl M. Göschka, Alessandra Gorla, Vincenzo Grassi, Paola Inverardi, Gabor Karsai, Jeff Kramer, Antónia Lopes, Jeff Magee, Sam Malek, Serge Mankovskii, Raffaela Mirandola, John Mylopoulos, Oscar Nierstrasz, Mauro Pezzè, Christian Prehofer, Wilhelm Schäfer, Rick Schlichting, Dennis B. Smith, João Pedro Sousa, Ladan Tahvildari, Kenny Wong, and Jochen Wuttke. 2013. Software Engineering for Self-Adaptive Systems: A Second Research Roadmap. Springer, Berlin, 1–32. DOI:
[63]
I. de Zarzà, J. de Curtò, Gemma Roig, and Carlos T. Calafate. 2023. LLM adaptive PID control for B5G truck platooning systems. Sensors 23, 13 (2023). DOI:
[64]
Google Deepmind. 2024. Project Astra. Retrieved May 16, 2024 from https://deepmind.google/technologies/gemini/project-astra/
[65]
Gelei Deng, Yi Liu, Víctor Mayoral-Vilches, Peng Liu, Yuekang Li, Yuan Xu, Tianwei Zhang, Yang Liu, Martin Pinzger, and Stefan Rass. 2023. PentestGPT: An LLM-empowered automatic penetration testing tool. arXiv:2308.06782 [cs.SE].
[66]
Mingkai Deng, Jianyu Wang, Cheng-Ping Hsieh, Yihan Wang, Han Guo, Tianmin Shu, Meng Song, Eric Xing, and Zhiting Hu. 2022. RLPrompt: Optimizing discrete text prompts with reinforcement learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 3369–3391. DOI:
[67]
Yinlin Deng, Chunqiu Steven Xia, Chenyuan Yang, Shizhuo Dylan Zhang, Shujing Yang, and Lingming Zhang. 2024. Large language models are edge-case generators: Crafting unusual programs for fuzzing deep learning libraries. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 70, 13 pages. DOI:
[68]
Gouri Deshpande, Behnaz Sheikhi, Saipreetham Chakka, Dylan Lachou Zotegouon, Mohammad Navid Masahati, and Guenther Ruhe. 2021. Is BERT the new silver bullet? - An empirical investigation of requirements dependency classification. In Proceedings of the IEEE 29th International Requirements Engineering Conference Workshops (REW ’21). 136–145. DOI:
[69]
Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv:1810.04805 [cs.CL].
[70]
Bosheng Ding, Chengwei Qin, Linlin Liu, Lidong Bing, Shafiq R. Joty, and Boyang Albert Li. 2022. Is GPT-3 a good data annotator? In Proceedings of the Annual Meeting of the Association for Computational Linguistics.
[71]
Yan Ding, Xiaohan Zhang, Saeid Amiri, Nieqing Cao, Hao Yang, Andy Kaminski, Chad Esselink, and Shiqi Zhang. 2024. Integrating action knowledge and LLMs for task planning and situation handling in open worlds. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[72]
Simon Dobson, Spyros Denazis, Antonio Fernández, Dominique Gaïti, Erol Gelenbe, Fabio Massacci, Paddy Nixon, Fabrice Saffre, Nikita Schmidt, and Franco Zambonelli. 2006. A survey of autonomic communications. ACM Transactions on Autonomous and Adaptive Systems (TAAS) 1, 2 (Dec. 2006), 223–259. DOI:
[73]
Yihan Dong. 2024. The multi-agent system based on LLM for online discussions. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 2731–2733.
[74]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, Jakob Uszkoreit, and Neil Houlsby. 2021. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the International Conference on Learning Representations.
[75]
Danny Driess, Fei Xia, Mehdi S. M. Sajjadi, Corey Lynch, Aakanksha Chowdhery, Brian Ichter, Ayzaan Wahid, Jonathan Tompson, Quan Vuong, Tianhe Yu, Wenlong Huang, Yevgen Chebotar, Pierre Sermanet, Daniel Duckworth, Sergey Levine, Vincent Vanhoucke, Karol Hausman, Marc Toussaint, Klaus Greff, Andy Zeng, Igor Mordatch, and Pete Florence. 2023. PaLM-E: An embodied multimodal language model. In Proceedings of the 40th International Conference on Machine Learning (ICML’23). Article 340, 20 pages.
[76]
Li Du, Xiao Ding, Yue Zhang, Ting Liu, and Bing Qin. 2022. A graph enhanced BERT model for event prediction. In Findings of the Association for Computational Linguistics (ACL ’22). Association for Computational Linguistics, 2628–2638. DOI:
[77]
Yuqing Du, Olivia Watkins, Zihan Wang, Cédric Colas, Trevor Darrell, Pieter Abbeel, Abhishek Gupta, and Jacob Andreas. 2023. Guiding pretraining in reinforcement learning with large language models. In Proceedings of the 40th International Conference on Machine Learning, Vol. 202. PMLR, 8657–8677.
[78]
Upol Ehsan, Elizabeth A. Watkins, Philipp Wintersberger, Carina Manger, Sunnie S. Y. Kim, Niels Van Berkel, Andreas Riener, and Mark O. Riedl. 2024. Human-centered explainable AI (HCXAI): Reloading explainability in the era of large language models (LLMs). In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Article 477, 6 pages. DOI:
[79]
A. Fan, B. Gokkaya, M. Harman, M. Lyubarskiy, S. Sengupta, S. Yoo, and J. M. Zhang. 2023b. Large language models for software engineering: Survey and open problems. In Proceedings of the IEEE/ACM International Conference on Software Engineering: Future of Software Engineering (ICSE-FoSE ’23), 31–53. DOI:
[80]
Caoyun Fan, Jindou Chen, Yaohui Jin, and Hao He. 2024a. Can large language models serve as rational players in game theory? A systematic analysis. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 17960–17967. DOI:
[81]
Xinyao Fan, Yueying Wu, Chang Xu, Yuhao Huang, Weiqing Liu, and Jiang Bian. 2024b. MG-TSD: Multi-granularity time series diffusion models with guided learning process. In Proceedings of the 12th International Conference on Learning Representations.
[82]
Z. Fan, X. Gao, M. Mirchev, A. Roychoudhury, and S. Tan. 2023a. Automated repair of programs from large language models. In Proceedings of the IEEE/ACM 45th International Conference on Software Engineering (ICSE ’23), 1469–1481. DOI:
[83]
Alessandro Fantechi, Stefania Gnesi, Lucia Passaro, and Laura Semini. 2023. Inconsistency detection in natural language requirements using ChatGPT: A preliminary evaluation. In Proceedings of the IEEE 31st International Requirements Engineering Conference (RE ’23), 335–340. DOI:
[84]
Evelina Fedorenko, Steven T. Piantadosi, and Edward A. F. Gibson. 2024. Language is primarily a tool for communication rather than thought. Nature 630, 8017 (2024), 575–586. DOI:
[85]
Nick Feng, Lina Marsso, S. Getir Yaman, Isobel Standen, Yesugen Baatartogtokh, Reem Ayad, Victória Oldemburgo de Mello, Bev Townsend, Hanne Bartels, Ana Cavalcanti, Radu Calinescu, and Marsha Chechik. 2024. Normative requirements operationalization with large language models. In Proceedings of the 32nd IEEE International Requirements Engineering Conference (RE ’24).
[86]
Silvan Ferreira, Ivanovitch Silva, and Allan Martins. 2024. Organizing a society of language models: Structures and mechanisms for enhanced collective intelligence. arXiv:2405.03825 [cs.AI].
[87]
Antonio Filieri, Henry Hoffmann, and Martina Maggio. 2014. Automated design of self-adaptive software with control-theoretical formal guarantees. In Proceedings of the 36th International Conference on Software Engineering (ICSE ’14), 299–310. DOI:
[88]
Emily First, Markus Rabe, Talia Ringer, and Yuriy Brun. 2023. Baldur: Whole-proof generation and repair with large language models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), 1229–1241. DOI:
[89]
Erik M. Fredericks, Byron DeVries, and Betty H. C. Cheng. 2014. Towards run-time adaptation of test cases for self-adaptive systems in the face of uncertainty. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’14), 17–26. DOI:
[90]
Erik M. Fredericks, Andres J. Ramirez, and Betty H. C. Cheng. 2013. Towards run-time testing of dynamic adaptive systems. In Proceedings of the 8th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’13), 169–174. DOI:
[91]
Michael Fu, Chakkrit Tantithamthavorn, Trung Le, Van Nguyen, and Dinh Phung. 2022. VulRepair: A T5-based automated software vulnerability repair. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’22), 935–947. DOI:
[92]
Hiroki Furuta, Yutaka Matsuo, and Shixiang Shane Gu. 2022. Generalized decision transformer for offline hindsight information matching. In Proceedings of the International Conference on Learning Representations.
[93]
Matteo Gallici, Mario Martin, and Ivan Masmitja. 2023. TransfQMix: Transformers for leveraging the graph structure of multi-agent reinforcement learning problems. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’23), 1679–1687.
[94]
Jensen Gao, Bidipta Sarkar, Fei Xia, Ted Xiao, Jiajun Wu, Brian Ichter, Anirudha Majumdar, and Dorsa Sadigh. 2024. Physically grounded vision-language models for robotic manipulation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[95]
D. Garlan, S. W. Cheng, A. C. Huang, B. Schmerl, and P. Steenkiste. 2004. Rainbow: Architecture-based self-adaptation with reusable infrastructure. Computer 37, 10 (2004), 46–54.
[96]
Mingyang Geng, Shangwen Wang, Dezun Dong, Haotian Wang, Ge Li, Zhi Jin, Xiaoguang Mao, and Xiangke Liao. 2024. Large language models are few-shot summarizers: Multi-intent comment generation via in-context learning. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 39, 13 pages. DOI:
[97]
Omid Gheibi and Danny Weyns. 2024. Dealing with drift of adaptation spaces in learning-based self-adaptive systems using lifelong self-adaptation. ACM Transactions on Autonomous and Adaptive Systems 19, 1 (2024), 5:1–5:57. DOI:
[98]
Omid Gheibi, Danny Weyns, and Federico Quin. 2021a. Applying machine learning in self-adaptive systems: A systematic literature review. ACM Transactions on Autonomous and Adaptive Systems 15, 3, Article 9 (Aug. 2021), 37 pages.
[99]
Omid Gheibi, Danny Weyns, and Federico Quin. 2021b. On the impact of applying machine learning in the decision-making of self-adaptive systems. In Proceedings of the International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’21), 104–110. DOI:
[100]
Avijit Ghosh and Genoveva Fossas. 2022. Can there be art without an artist? arXiv:2209.07667 [cs.AI].
[101]
Miriam Gil, Manoli Albert, Joan Fons, and Vicente Pelechano. 2019. Designing human-in-the-loop autonomous cyber-physical systems. International Journal of Human-Computer Studies 130 (2019), 21–39. DOI:
[102]
Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. 2020. Generative adversarial networks. Communications of the ACM 63, 11 (Oct. 2020), 139–144. DOI:
[103]
Moritz A. Graule and Volkan Isler. 2024. GG-LLM: Geometrically grounding large language models for zero-shot human activity forecasting in human-aware task planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[104]
Alex Graves. 2012. Long Short-Term Memory. Springer, Berlin, 37–45. DOI:
[105]
Nate Gruver, Marc Finzi, Shikai Qiu, and Andrew Gordon Wilson. 2023. Large language models are zero-shot time series forecasters. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems (NeurIPS ’23).
[106]
Lin Guan, Karthik Valmeekam, Sarath Sreedharan, and Subbarao Kambhampati. 2023. Leveraging pre-trained large language models to construct and utilize world models for model-based task planning. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[107]
Qingyan Guo, Rui Wang, Junliang Guo, Bei Li, Kaitao Song, Xu Tan, Guoqing Liu, Jiang Bian, and Yujiu Yang. 2024c. Connecting large language models with evolutionary algorithms yields powerful prompt optimizers. In Proceedings of the 12th International Conference on Learning Representations.
[108]
Taicheng Guo, Xiuying Chen, Yaqi Wang, Ruidi Chang, Shichao Pei, Nitesh V. Chawla, Olaf Wiest, and Xiangliang Zhang. 2024a. Large language model based multi-agents: A survey of progress and challenges. arXiv:2402.01680 [cs.CL].
[109]
Xudong Guo, Kaixuan Huang, Jiale Liu, Wenhui Fan, Natalia Vélez, Qingyun Wu, Huazheng Wang, Thomas L. Griffiths, and Mengdi Wang. 2024b. Embodied LLM agents learn to cooperate in organized teams. arXiv:2403.12482 [cs.AI].
[110]
Xiaoyu Guo, Jianjun Zhao, and Pengzhan Zhao. 2024d. On repairing quantum programs using ChatGPT. In Proceedings of the 5th International Workshop on Quantum Software Engineering (Q-SE ’24).
[111]
Agrim Gupta, Linxi Fan, Surya Ganguli, and Li Fei-Fei. 2022. MetaMorph: Learning universal controllers with transformers. In Proceedings of the International Conference on Learning Representations.
[112]
Priyanshu Gupta, Avishree Khare, Yasharth Bajpai, Saikat Chakraborty, Sumit Gulwani, Aditya Kanade, Arjun Radhakrishna, Gustavo Soares, and Ashish Tiwari. 2023. Grace: Language models meet code edits. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), 1483–1495. DOI:
[113]
Seth T. Hamman, Kenneth M. Hopkinson, Ruth L. Markham, Andrew M. Chaplik, and Gabrielle E. Metzler. 2017. Teaching game theory to improve adversarial thinking in cybersecurity students. IEEE Transactions on Education 60, 3 (2017), 205–211. DOI:
[114]
Jesse Michael Han, Jason Rute, Yuhuai Wu, Edward Ayers, and Stanislas Polu. 2022. Proof artifact co-training for theorem proving with language models. In Proceedings of the International Conference on Learning Representations.
[115]
Shibo Hao, Bowen Tan, Kaiwen Tang, Bin Ni, Xiyan Shao, Hengzhe Zhang, Eric Xing, and Zhiting Hu. 2023. BertNet: Harvesting knowledge graphs with arbitrary relations from pretrained language models. In Findings of the Association for Computational Linguistics (ACL ’23). Association for Computational Linguistics, 5000–5015. DOI:
[116]
Andreas Happe and Jürgen Cito. 2023. Getting pwn’d by AI: Penetration testing with large language models. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), 2082–2086. DOI:
[117]
Mark Harman, S. Afshin Mansouri, and Yuanyuan Zhang. 2012. Search-based software engineering: Trends, techniques and applications. ACM Computing Surveys 45, 1 (Dec. 2012), Article 11, 61 pages. DOI:
[118]
Shabnam Hassani. 2024. Enhancing legal compliance and regulation analysis with large language models. In Proceedings of the 32nd IEEE International Requirements Engineering Conference (RE ’24).
[119]
Rishi Hazra, Pedro Zuidberg Dos Martires, and Luc De Raedt. 2024. SayCanPay: Heuristic planning with large language models using learnable domain knowledge. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 20123–20133. DOI:
[120]
Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, and Xuelong Li. 2023. Diffusion model is an effective planner and data synthesizer for multi-task reinforcement learning. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36. Curran Associates, Inc., 64896–64917.
[121]
Maya Hickmann. 2000. Linguistic relativity and linguistic determinism: Some new directions. Linguistics 38, 2 (2000), 409–434. DOI:
[122]
Hiroyuki Nakagawa and Shinichi Honiden. 2023. MAPE-K loop-based goal model generation using generative AI. In Proceedings of the IEEE 31st International Requirements Engineering Conference Workshops (REW ’23).
[123]
Jonathan Ho, Ajay Jain, and Pieter Abbeel. 2020. Denoising diffusion probabilistic models. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 6840–6851.
[124]
Jacob Hoffmann and Demian Frister. 2024. Generating software tests for mobile applications using fine-tuned large language models. In Proceedings of the 5th ACM/IEEE International Conference on Automation of Software Test (AST ’24), 76–77. DOI:
[125]
Noah Hollmann, Samuel Müller, and Frank Hutter. 2023. Large language models for automated data science: Introducing CAAFE for context-aware automated feature engineering. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[126]
Junyuan Hong, Jiachen T. Wang, Chenhui Zhang, Zhangheng Li, Bo Li, and Zhangyang Wang. 2024a. DP-OPT: Make large language model your privacy-preserving prompt engineer. In Proceedings of the 12th International Conference on Learning Representations.
[127]
Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Jinlin Wang, Ceyao Zhang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, Liyang Zhou, Chenyu Ran, Lingfeng Xiao, Chenglin Wu, and Jürgen Schmidhuber. 2024b. MetaGPT: Meta programming for a multi-agent collaborative framework. In Proceedings of the 12th International Conference on Learning Representations.
[128]
Yuki Hou, Haruki Tamoto, and Homei Miyashita. 2024. “My agent understands me better”: Integrating dynamic human-like memory recall and consolidation in LLM-based agents. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems. Article 7, 7 pages. DOI:
[129]
Jeremy Howard and Sebastian Ruder. 2018. Universal language model fine-tuning for text classification. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Vol. 1: Long Papers. Association for Computational Linguistics, 328–339.
[130]
Anthony Hu, Lloyd Russell, Hudson Yeo, Zak Murez, George Fedoseev, Alex Kendall, Jamie Shotton, and Gianluca Corrado. 2023b. GAIA-1: A generative world model for autonomous driving. arXiv:2309.17080 [cs.CV].
[131]
Ronghang Hu and Amanpreet Singh. 2021. UniT: Multimodal multitask learning with a unified transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV ’21), 1439–1449.
[132]
Siyi Hu, Fengda Zhu, Xiaojun Chang, and Xiaodan Liang. 2021. UPDeT: Universal multi-agent RL via policy decoupling with transformers. In Proceedings of the International Conference on Learning Representations.
[133]
X. Hu, Z. Liu, X. Xia, Z. Liu, T. Xu, and X. Yang. 2023a. Identify and update test cases when production code changes: A transformer-based approach. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE ’23), 1111–1122. DOI:
[134]
Kai Huang, Xiangxin Meng, Jian Zhang, Yang Liu, Wenjie Wang, Shuhao Li, and Yuqing Zhang. 2023b. An empirical study on fine-tuning large language models of code for automated program repair. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE ’23), 1162–1174. DOI:
[135]
Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, and Ting Liu. 2023c. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv:2311.05232 [cs.CL].
[136]
Tao Huang, Pengfei Chen, Jingrun Zhang, Ruipeng Li, and Rui Wang. 2023a. A transferable time series forecasting service using deep transformer model for online systems. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). Article 4, 12 pages. DOI:
[137]
Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Tomas Jackson, Noah Brown, Linda Luu, Sergey Levine, Karol Hausman, and Brian Ichter. 2022. Inner monologue: Embodied reasoning through planning with language models. In Proceedings of the 6th Annual Conference on Robot Learning.
[138]
Yutan Huang, Tanjila Kanij, Anuradha Madugalla, Shruti Mahajan, Chetan Arora, and John Grundy. 2024. Unlocking adaptive user experience with generative AI. arXiv:2404.05442 [cs.HC].
[139]
Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, and Yoshua Bengio. 2017. Quantized neural networks: Training neural networks with low precision weights and activations. Journal of Machine Learning Research 18, 1 (Jan. 2017), 6869–6898.
[140]
William Hunt, Toby Godfrey, and Mohammad D. Soorati. 2024. Conversational language models for human-in-the-loop multi-robot coordination. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 2809–2811.
[141]
Muhammad Usman Iftikhar, Gowri Sankar Ramachandran, Pablo Bollansée, Danny Weyns, and Danny Hughes. 2017. DeltaIoT: A self-adaptive internet of things exemplar. In Proceedings of the IEEE/ACM 12th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’17), 76–82. DOI:
[142]
M. Usman Iftikhar and Danny Weyns. 2014. ActivFORMS: Active formal models for self-adaptation. In Proceedings of the 9th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’14), 125–134.
[143]
Tatsuro Inaba, Hirokazu Kiyomaru, Fei Cheng, and Sadao Kurohashi. 2023. MultiTool-CoT: GPT-3 can use multiple external tools with chain of thought prompting. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, Vol. 2: Short Papers. Association for Computational Linguistics, 1522–1532. DOI:
[144]
Jeevana Priya Inala, Yichen Yang, James Paulos, Yewen Pu, Osbert Bastani, Vijay Kumar, Martin Rinard, and Armando Solar-Lezama. 2020. Neurosymbolic transformers for multi-agent communication. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 13597–13608.
[145]
S. Izquierdo, G. Canal, C. Rizzo, and G. Alenyà. 2024. PlanCollabNL: Leveraging large language models for adaptive plan generation in human-robot collaboration. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[146]
Pooyan Jamshidi, Javier Cámara, Bradley Schmerl, Christian Käestner, and David Garlan. 2019. Machine learning meets quantitative planning: Enabling self-adaptation in autonomous robots. In Proceedings of the IEEE/ACM 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’19), 39–50. DOI:
[147]
Michael Janner, Yilun Du, Joshua Tenenbaum, and Sergey Levine. 2022. Planning with diffusion for flexible behavior synthesis. In Proceedings of the International Conference on Machine Learning.
[148]
Piyush Jha, Joseph Scott, Jaya Sriram Ganeshna, Mudit Singh, and Vijay Ganesh. 2024. BertRLFuzzer: A BERT and reinforcement learning based fuzzer (student abstract). In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 23521–23522. DOI:
[149]
Albert Qiaochu Jiang, Wenda Li, Szymon Tworkowski, Konrad Czechowski, Tomasz Odrzygóźdź, Piotr Miłoś, Yuhuai Wu, and Mateja Jamnik. 2022. Thor: Wielding hammers to integrate language models and automated theorem provers. In Proceedings of the Advances in Neural Information Processing Systems.
[150]
Jiawei Jiang, Chengkai Han, Wayne Xin Zhao, and Jingyuan Wang. 2023a. PDFormer: Propagation delay-aware dynamic long-range transformer for traffic flow prediction. In Proceedings of the 37th AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and 13th Symposium on Educational Advances in Artificial Intelligence (AAAI ’23/IAAI ’23/EAAI ’23). Article 487, 9 pages. DOI:
[151]
N. Jiang, K. Liu, T. Lutellier, and L. Tan. 2023b. Impact of code language models on automated program repair. In Proceedings of the IEEE/ACM 45th International Conference on Software Engineering (ICSE ’23), 1430–1442. DOI:
[152]
Peiling Jiang, Jude Rayan, Steven P. Dow, and Haijun Xia. 2023c. Graphologue: Exploring large language model responses with interactive diagrams. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23). Article 3, 20 pages. DOI:
[153]
Shengbei Jiang, Jiabao Zhang, Wei Chen, Bo Wang, Jianyi Zhou, and Jie M. Zhang. 2024. Evaluating fault localization and program repair capabilities of existing closed-source general-purpose LLMs. In Proceedings of the 1st International Workshop on Large Language Models for Code.
[154]
Ming Jin, Shiyu Wang, Lintao Ma, Zhixuan Chu, James Y. Zhang, Xiaoming Shi, Pin-Yu Chen, Yuxuan Liang, Yuan-Fang Li, Shirui Pan, and Qingsong Wen. 2024. Time-LLM: Time series forecasting by reprogramming large language models. In Proceedings of the 12th International Conference on Learning Representations.
[155]
Peng Jin, Yang Wu, Yanbo Fan, Zhongqian Sun, Yang Wei, and Li Yuan. 2023. Act as you wish: Fine-grained control of motion diffusion model with hierarchical semantic graphs. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[156]
Christoforos Kachris. 2024. A survey on hardware accelerators for large language models. arXiv:2401.09890 [cs.AR].
[157]
Eduard Kamburjan, Riccardo Sieve, Chinmayi Prabhu Baramashetru, Marco Amato, Gianluca Barmina, Eduard Occhipinti, and Einar Broch Johnsen. 2024. GreenhouseDT: An exemplar for digital twins. In Proceedings of the 19th Conference on Software Engineering for Adaptive and Self-Managing Systems.
[158]
Bingyi Kang, Xiao Ma, Chao Du, Tianyu Pang, and Shuicheng Yan. 2023. Efficient diffusion policies for offline reinforcement learning. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[159]
Qitong Kang, Fuyong Wang, Zhongxin Liu, and Zengqiang Chen. 2024. TIMAT: Temporal information multi-agent transformer. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 2321–2323.
[160]
Jeff Kephart and David Chess. 2003. The vision of autonomic computing. Computer 36, 1 (Jan. 2003), 41–50.
[161]
Junaed Younus Khan and Gias Uddin. 2023. Automatic code documentation generation using GPT-3. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). Article 174, 6 pages. DOI:
[162]
Dongsun Kim and Sooyong Park. 2009. Reinforcement learning-based dynamic adaptation planning method for architecture-based self-managed software. In Proceedings of the 2009 ICSE Workshop on Software Engineering for Adaptive and Self-Managing Systems, 76–85. DOI:
[163]
Jayoung Kim, Chaejeong Lee, Yehjin Shin, Sewon Park, Minjung Kim, Noseong Park, and Jihoon Cho. 2022. SOS: Score-based oversampling for tabular data. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’22), 762–772. DOI:
[164]
Jinkyu Kim, Anna Rohrbach, Trevor Darrell, John Canny, and Zeynep Akata. 2018. Textual explanations for self-driving vehicles. In Proceedings of the European Conference on Computer Vision (ECCV ’18).
[165]
Diederik P. Kingma and Max Welling. 2022. Auto-encoding variational bayes. arXiv:1312.6114 [stat.ML].
[166]
K. Knill and S. Young. 1997. Hidden Markov Models in Speech and Language Processing. Springer Netherlands, Dordrecht, 27–68. DOI:
[167]
Hyung-Kwon Ko, Hyeon Jeon, Gwanmo Park, Dae Hyun Kim, Nam Wook Kim, Juho Kim, and Jinwook Seo. 2024. Natural language dataset generation framework for visualizations powered by large language models. In Proceedings of the Conference on Human Factors in Computing Systems (CHI ’24). Article 843, 22 pages. DOI:
[168]
Takeshi Kojima, Shixiang Shane Gu, Machel Reid, Yutaka Matsuo, and Yusuke Iwasawa. 2022. Large language models are zero-shot reasoners. In Proceedings of the Advances in Neural Information Processing Systems 35: Annual Conference on Neural Information Processing Systems (NeurIPS ’22).
[169]
Marcel Kollovieh, Abdul Fatir Ansari, Michael Bohlke-Schneider, Jasper Zschiegner, Hao Wang, and Bernie Wang. 2023. Predict, refine, synthesize: Self-guiding diffusion models for probabilistic time series forecasting. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[170]
Ryan Koo, Minhwa Lee, Vipul Raheja, Jong Inn Park, Zae Myung Kim, and Dongyeop Kang. 2023. Benchmarking cognitive biases in large language models as evaluators. arXiv:2309.17012 [cs.CL].
[171]
Minae Kwon, Sang Michael Xie, Kalesha Bullard, and Dorsa Sadigh. 2023. Reward design with language models. In Proceedings of the 11th International Conference on Learning Representations.
[172]
Mariam Lahami and Moez Krichen. 2021. A survey on runtime testing of dynamically adaptable and distributed systems. Software Quality Journal 29, 2 (2021), 555–593. DOI:
[173]
Márk Lajkó, Viktor Csuvik, Tibor Gyimothy, and László Vidács. 2024. Automated program repair with the GPT family, including GPT-2, GPT-3 and CodeX. In Proceedings of the IEEE/ACM International Workshop on Automated Program Repair (APR ’24).
[174]
V. Le and H. Zhang. 2021. Log-based anomaly detection without log parsing. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE ’21), 492–504. DOI:
[175]
Van-Hoang Le and Hongyu Zhang. 2023. Log parsing: How far can ChatGPT go? In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE ’23). IEEE, 1699–1704. DOI:
[176]
C. Lee, T. Yang, Z. Chen, Y. Su, and M. R. Lyu. 2023. Maat: Performance metric anomaly anticipation for cloud services with conditional diffusion. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE ’23), 116–128. DOI:
[177]
Namyeong Lee and Jun Moon. 2023. Transformer actor-critic with regularization: Automated stock trading using reinforcement learning. In Proceedings of the International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’23), 2815–2817.
[178]
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-Tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 9459–9474.
[179]
Jinyang Li, Binyuan Hui, Ge Qu, Jiaxi Yang, Binhua Li, Bowen Li, Bailin Wang, Bowen Qin, Ruiying Geng, Nan Huo, Xuanhe Zhou, Chenhao Ma, Guoliang Li, Kevin Chang, Fei Huang, Reynold Cheng, and Yongbin Li. 2023b. Can LLM already serve as a database interface? A big bench for large-scale database grounded text-to-SQLs. In Proceedings of the 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
[180]
Jia Li, Shiva Nejati, and Mehrdad Sabetzadeh. 2024a. Using genetic programming to build self-adaptivity into software-defined networks. ACM Transactions on Autonomous and Adaptive Systems 19, 1, Article 2 (Feb. 2024), 35 pages. DOI:
[181]
Jialong Li, Mingyue Zhang, Nianyu Li, Danny Weyns, Zhi Jin, and Kenji Tei. 2024c. Exploring the potential of large language models in self-adaptive systems. In Proceedings of the 19th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’24), 77–83. DOI:
[182]
Jialong Li, Mingyue Zhang, Zhenyu Mao, Haiyan Zhao, Zhi Jin, Shinichi Honiden, and Kenji Tei. 2022b. Goal-oriented knowledge reuse via curriculum evolution for reinforcement learning-based adaptation. In Proceedings of the 29th Asia-Pacific Software Engineering Conference (APSEC ’22), 189–198.
[183]
Lei Li, Yongfeng Zhang, and Li Chen. 2021b. Personalized transformer for explainable recommendation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Vol. 1: Long Papers. Association for Computational Linguistics, 4947–4957.
[184]
Nianyu Li, Sridhar Adepu, Eunsuk Kang, and David Garlan. 2020a. Explanations for human-on-the-loop: A probabilistic model checking approach. In Proceedings of the IEEE/ACM 15th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’20), 181–187.
[185]
Nianyu Li, Javier Cámara, David Garlan, and Bradley Schmerl. 2020b. Reasoning about when to provide explanation for human-involved self-adaptive systems. In Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS ’20), 195–204. DOI:
[186]
Nianyu Li, Javier Cámara, David Garlan, Bradley Schmerl, and Zhi Jin. 2021a. Hey! Preparing humans to do tasks in self-adaptive systems. In Proceedings of the International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’21), 48–58. DOI:
[187]
Nianyu Li, Mingyue Zhang, Jialong Li, Sridhar Adepu, Eunsuk Kang, and Zhi Jin. 2024b. A game-theoretical self-adaptation framework for securing software-intensive systems. ACM Transactions on Autonomous and Adaptive Systems 19, 2 (Apr. 2024), Article 12, 49 pages. DOI:
[188]
Nianyu Li, Mingyue Zhang, Jialong Li, Eunsuk Kang, and Kenji Tei. 2023e. Preference adaptation: User satisfaction is all you need! In Proceedings of the IEEE/ACM 18th Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’23), 133–144.
[189]
Ruikun Li, Xuliang Li, Shiying Gao, S. T. Boris Choy, and Junbin Gao. 2023c. Graph convolution recurrent denoising diffusion model for multivariate probabilistic temporal forecasting. In Proceedings of the Advanced Data Mining and Applications: 19th International Conference (ADMA ’23). Springer-Verlag, Berlin, 661–676. DOI:
[190]
Shuang Li, Xavier Puig, Chris Paxton, Yilun Du, Clinton Wang, Linxi Fan, Tao Chen, De-An Huang, Ekin Akyürek, Anima Anandkumar, Jacob Andreas, Igor Mordatch, Antonio Torralba, and Yuke Zhu. 2022a. Pre-trained language models for interactive decision-making. In Proceedings of the Advances in Neural Information Processing Systems.
[191]
Wenhao Li, Xiangfeng Wang, Bo Jin, and Hongyuan Zha. 2023d. Hierarchical diffusion for offline decision making. In Proceedings of the 40th International Conference on Machine Learning, Vol. 202. PMLR, 20035–20064.
[192]
Yucheng Li, Bo Dong, Frank Guerin, and Chenghua Lin. 2023a. Compressing context to enhance inference efficiency of large language models. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, Singapore, 6342–6353. DOI:
[193]
Bill Yuchen Lin, Yicheng Fu, Karina Yang, Faeze Brahman, Shiyu Huang, Chandra Bhagavatula, Prithviraj Ammanabrolu, Yejin Choi, and Xiang Ren. 2023a. SwiftSage: A generative agent with fast and slow thinking for complex interactive tasks. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[194]
Chen-Hsuan Lin, Jun Gao, Luming Tang, Towaki Takikawa, Xiaohui Zeng, Xun Huang, Karsten Kreis, Sanja Fidler, Ming-Yu Liu, and Tsung-Yi Lin. 2023b. Magic3D: High-resolution text-to-3D content creation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR ’23), 300–309.
[195]
Jinfeng Lin, Yalin Liu, Qingkai Zeng, Meng Jiang, and Jane Cleland-Huang. 2021. Traceability transformed: Generating more accurate links with pre-trained BERT models. In Proceedings of the IEEE/ACM 43rd International Conference on Software Engineering (ICSE ’21), 324–335. DOI:
[196]
Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Jianmeng Liu, Jipeng Zhang, Rui Pan, Haoxiang Wang, Wenbin Hu, Hanning Zhang, Hanze Dong, Renjie Pi, Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, and Tong Zhang. 2024. Mitigating the alignment tax of RLHF. arXiv:2309.06256 [cs.LG].
[197]
Yen-Ting Lin and Yun-Nung Chen. 2023. LLM-eval: Unified multi-dimensional automatic evaluation for open-domain conversations with large language models. arXiv:2305.13711 [cs.CL].
[198]
Jianwei Liu, Maria Stamatopoulou, and Dimitrios Kanoulas. 2024c. DiPPeR: Diffusion-based 2D path planner applied on legged robots. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[199]
Jijia Liu, Chao Yu, Jiaxuan Gao, Yuqing Xie, Qingmin Liao, Yi Wu, and Yu Wang. 2024f. LLM-powered hierarchical language agent for real-time human-AI coordination. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 1219–1228.
[200]
Mingjie Liu, Teo Ene, Robert Kirby, Chris Cheng, Nathaniel Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Brucek Khailany, Kishor Kunal, Xiaowei Li, Hao Liu, Stuart Oberman, Sujeet Omar, Sreedhar Pratty, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P Suthar, Varun Tej, Kaizhe Xu, and Haoxing Ren. 2023a. ChipNeMo: Domain-adapted LLMs for chip design. arXiv:2311.00176 [cs.CL].
[201]
Shengcai Liu, Caishun Chen, Xinghua Qu, Ke Tang, and Yew-Soon Ong. 2024b. Large language models as evolutionary optimizers. In Proceedings of the 12th International Conference on Learning Representations.
[202]
Tennison Liu, Nicolás Astorga, Nabeel Seedat, and Mihaela van der Schaar. 2024a. Large language models to enhance Bayesian optimization. In Proceedings of the 12th International Conference on Learning Representations.
[203]
Xingyu ’Bruce’ Liu, Vladimir Kirilyuk, Xiuxiu Yuan, Peggy Chi, Alex Olwal, Xiang ’Anthony’ Chen, and Ruofei Du. 2023c. Experiencing visual captions: Augmented communication with real-time visuals using large language models. In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23 Adjunct). Article 85, 4 pages. DOI:
[204]
Yilun Liu, Shimin Tao, Weibin Meng, Jingyu Wang, Wenbing Ma, Yuhang Chen, Yanqing Zhao, Hao Yang, and Yanfei Jiang. 2024d. Interpretable online log analysis using large language models with prompt strategies. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension (ICPC ’24), 35–46. DOI:
[205]
Yong Liu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2022. Non-stationary transformers: Exploring the stationarity in time series forecasting. In Proceedings of the Advances in Neural Information Processing Systems.
[206]
Yang Liu, Yuanshun Yao, Jean-Francois Ton, Xiaoying Zhang, Ruocheng Guo, Hao Cheng, Yegor Klochkov, Muhammad Faaiz Taufiq, and Hang Li. 2024e. Trustworthy LLMs: A survey and guideline for evaluating large language models’ alignment. arXiv:2308.05374 [cs.AI].
[207]
Zuxin Liu, Zijian Guo, Yihang Yao, Zhepeng Cen, Wenhao Yu, Tingnan Zhang, and Ding Zhao. 2023b. Constrained decision transformer for offline safe reinforcement learning. In Proceedings of the 40th International Conference on Machine Learning (ICML ’23). JMLR.org, Article 893, 20 pages.
[208]
Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. 2021. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV ’21).
[209]
Samuel López-Ruiz, Carlos Ignacio Hernández-Castellanos, and Katya Rodríguez-Vázquez. 2022. Multi-objective framework for quantile forecasting in financial time series using transformers. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO ’22), 395–403. DOI:
[210]
Xingzhou Lou, Junge Zhang, Ziyan Wang, Kaiqi Huang, and Yali Du. 2024. Safe reinforcement learning with free-form natural language constraints and pre-trained language models. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 1274–1282.
[211]
Jack Lu, Kelvin Wong, Chris Zhang, Simon Suo, and Raquel Urtasun. 2024. SceneControl: Diffusion for controllable traffic scene generation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[212]
Xianchang Luo, Yinxing Xue, Zhenchang Xing, and Jiamou Sun. 2023. PRCBERT: Prompt learning for requirement classification using BERT-based pretrained language models. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). Article 75, 13 pages. DOI:
[213]
Lipeng Ma, Weidong Yang, Bo Xu, Sihang Jiang, Ben Fei, Jiaqing Liang, Mingjie Zhou, and Yanghua Xiao. 2024c. KnowLog: Knowledge enhanced pre-trained language model for log understanding. In Proceedings of the 46th IEEE/ACM International Conference on Software Engineering (ICSE ’24). ACM, 32:1–32:13. DOI:
[214]
Xiao Ma and Wu-Jun Li. 2024. Weighting online decision transformer with episodic memory for offline-to-online reinforcement learning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[215]
Yecheng Jason Ma, William Liang, Guanzhi Wang, De-An Huang, Osbert Bastani, Dinesh Jayaraman, Yuke Zhu, Linxi Fan, and Anima Anandkumar. 2024b. Eureka: Human-level reward design via coding large language models. In Proceedings of the 12th International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=IEduRUO55F
[216]
Zeyang Ma, An Ran Chen, Dong Jae Kim, Tse-Hsun Chen, and Shaowei Wang. 2024a. LLMParser: An exploratory study on using large language models for log parsing. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 99, 13 pages. DOI:
[217]
Aman Madaan, Niket Tandon, Peter Clark, and Yiming Yang. 2022. Memory-assisted prompt editing to improve GPT-3 after deployment. In Proceedings of the ACL 2022 Workshop on Commonsense Representation and Reasoning.
[218]
Anuradha Madugalla, Yutan Huang, John Grundy, Min Hee Cho, Lasith Koswatta Gamage, Tristan Leao, and Sam Thiele. 2024. Engineering adaptive information graphics for disabled communities: A case study with public space indoor maps. arXiv:2401.05659 [cs.HC].
[219]
Cláudia Mamede, Eduard Pinconschi, and Rui Abreu. 2023. A transformer-based IDE plugin for vulnerability detection. In Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering (ASE ’22). Article 149, 4 pages. DOI:
[220]
Zhao Mandi, Shreeya Jain, and Shuran Song. 2024. RoCo: Dialectic multi-robot collaboration with large language models. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[221]
Antonio Mastropaolo, Matteo Ciniselli, Luca Pascarella, Rosalia Tufano, Emad Aghajani, and Gabriele Bavota. 2024. Towards summarizing code snippets using pre-trained transformers. In Proceedings of the 32nd IEEE/ACM International Conference on Program Comprehension.
[222]
Angelos Mavrogiannis, Christoforos Mavrogiannis, and Yiannis Aloimonos. 2024. Cook2LTL: Translating cooking recipes to LTL formulae using large language models. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[223]
Nicola Mc Donnell, Jim Duggan, and Enda Howley. 2023. A genetic programming-based framework for semi-automated multi-agent systems engineering. ACM Transactions on Autonomous and Adaptive Systems 18, 2, Article 6 (May 2023), 30 pages. DOI:
[224]
Matthew B. A. McDermott, Bret Nestor, Peniel N. Argaw, and Isaac S. Kohane. 2023. Event stream GPT: A data pre-processing and modeling library for generative, pre-trained transformers over continuous-time sequences of complex events. In Proceedings of the 37th Conference on Neural Information Processing Systems Datasets and Benchmarks Track.
[225]
Şevval Mehder and Fatma Başak Aydemir. 2022. Classification of issue discussions in open source projects using deep language models. In Proceedings of the IEEE 30th International Requirements Engineering Conference Workshops (REW ’22), 176–182. DOI:
[226]
Luckeciano C. Melo. 2022. Transformers are meta-reinforcement learners. In Proceedings of the International Conference on Machine Learning (ICML ’22).
[227]
Microsoft. 2024. GraphRAG: Graph Retrieval-Augmented Generation. Retrieved July 22, 2024 from https://github.com/microsoft/graphrag
[228]
Tomás Mikolov, Martin Karafiát, Lukás Burget, Jan Cernocký, and Sanjeev Khudanpur. 2010. Recurrent neural network based language model. In Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH ’10). ISCA, 1045–1048. DOI:
[229]
Utkarsh Aashu Mishra and Yongxin Chen. 2023. ReorientDiff: Diffusion model based reorientation for object manipulation. In Proceedings of the RSS 2023 Workshop on Learning for Task and Motion Planning.
[230]
Utkarsh Aashu Mishra, Shangjie Xue, Yongxin Chen, and Danfei Xu. 2023. Generative skill chaining: Long-horizon skill planning with diffusion models. In Proceedings of the 7th Annual Conference on Robot Learning.
[231]
Gabriel Moreno, Cody Kinneer, Ashutosh Pandey, and David Garlan. 2019. DARTSim: An exemplar for evaluation and comparison of self-adaptation approaches for smart cyber-physical systems. In Proceedings of the IEEE/ACM 14th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’19), 181–187. DOI:
[232]
Gabriel A. Moreno, Javier Camara, David Garlan, and Bradley Schmerl. 2016. Efficient decision-making under uncertainty for proactive self-adaptation. In Proceedings of the IEEE International Conference on Autonomic Computing (ICAC ’16), 147–156. DOI:
[233]
Gabriel A. Moreno, Javier Cámara, David Garlan, and Bradley Schmerl. 2015. Proactive self-adaptation under uncertainty: A probabilistic model checking approach. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering, 1–12. DOI:
[234]
Christian Murphy, Gail Kaiser, Ian Vo, and Matt Chu. 2009. Quality assurance of software applications using the in vivo testing approach. In Proceedings of the 2009 International Conference on Software Testing Verification and Validation, 111–120. DOI:
[235]
Hiroyuki Nakagawa and Shinichi Honiden. 2023. MAPE-K loop-based goal model generation using generative AI. In Proceedings of the IEEE 31st International Requirements Engineering Conference Workshops (REW ’23), 247–251. DOI:
[236]
Daye Nam, Andrew Macvean, Vincent Hellendoorn, Bogdan Vasilescu, and Brad Myers. 2024. Using an LLM to help with code understanding. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 97, 13 pages. DOI:
[237]
Nandu Digital Economy Governance Research Center. 2023. Generative AI Development and Governance Observation Report 2023 (Chinese). Observation Report. Nandu Digital Economy Governance Research Center.
[238]
Nathalia Nascimento, Paulo Alencar, and Donald Cowan. 2023. Self-adaptive large language model (LLM)-based multiagent systems. In Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C ’23), 104–109. DOI:
[239]
Fei Ni, Jianye Hao, Yao Mu, Yifu Yuan, Yan Zheng, Bin Wang, and Zhixuan Liang. 2023. MetaDiffuser: Diffusion model as conditional planner for offline meta-RL. In Proceedings of the 40th International Conference on Machine Learning (ICML’23). JMLR.org, Article 1085, 19 pages.
[240]
Kolby Nottingham, Prithviraj Ammanabrolu, Alane Suhr, Yejin Choi, Hannaneh Hajishirzi, Sameer Singh, and Roy Fox. 2023. Do embodied agents dream of pixelated sheep? Embodied decision making using language guided world modelling. In Proceedings of the 40th International Conference on Machine Learning (ICML ’23). Article 1096, 15 pages.
[241]
João Paulo Karol Santos Nunes, Shiva Nejati, Mehrdad Sabetzadeh, and Elisa Yumi Nakagawa. 2024. Self-adaptive, requirements-driven autoscaling of microservices. In Proceedings of the 19th Conference on Software Engineering for Adaptive and Self-Managing Systems.
[242]
OpenAI. 2023. Generative Models. Retrieved May 12, 2023 from https://openai.com/index/generative-models/
[243]
OpenAI. 2024. Hello GPT-4o. Retrieved May 14, 2024 from https://openai.com/index/hello-gpt-4o/
[244]
Jiabao Pan, Yan Zhang, Chen Zhang, Zuozhu Liu, Hongwei Wang, and Haizhou Li. 2024. DynaThink: Fast or slow? A dynamic decision-making framework for large language models. arXiv:2407.01009 [cs.CL].
[245]
Ashutosh Pandey, Gabriel A. Moreno, Javier Cámara, and David Garlan. 2016. Hybrid planning for decision making in self-adaptive systems. In Proceedings of the IEEE 10th International Conference on Self-Adaptive and Self-Organizing Systems (SASO), 130–139. DOI:
[246]
Ravi Pandya, Michelle Zhao, Changliu Liu, Reid Simmons, and Henny Admoni. 2024. Multi-agent strategy explanations for human-robot collaboration. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[247]
Emilio Parisotto, Francis Song, Jack Rae, Razvan Pascanu, Caglar Gulcehre, Siddhant Jayakumar, Max Jaderberg, Raphaël Lopez Kaufman, Aidan Clark, Seb Noury, Matthew Botvinick, Nicolas Heess, and Raia Hadsell. 2020. Stabilizing transformers for reinforcement learning. In Proceedings of the 37th International Conference on Machine Learning, Vol. 119. PMLR, 7487–7498.
[248]
Joon Sung Park, Joseph C. O’Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative agents: Interactive simulacra of human behavior. arXiv:2304.03442 [cs.HC].
[249]
Juan Parra-Ullauri, Antonio García-Domínguez, Nelly Bencomo, and Luis Garcia-Paucar. 2022. History-aware explanations: Towards enabling human-in-the-loop in self-adaptive systems. In Proceedings of the 25th International Conference on Model Driven Engineering Languages and Systems: Companion Proceedings (MODELS ’22), 286–295. DOI:
[250]
Tim Pearce, Tabish Rashid, Anssi Kanervisto, Dave Bignell, Mingfei Sun, Raluca Georgescu, Sergio Valcarcel Macua, Shan Zheng Tan, Ida Momennejad, Katja Hofmann, and Sam Devlin. 2023. Imitating human behaviour with diffusion models. In Proceedings of the 11th International Conference on Learning Representations.
[251]
Laura Plein, Wendkûuni C. Ouédraogo, Jacques Klein, and Tegawendé F. Bissyandé. 2024. Automatic generation of test cases based on bug reports: A feasibility study with large language models. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion ’24), 360–361. DOI:
[252]
Michal Pluhacek, Anezka Kazikova, Tomas Kadavy, Adam Viktorin, and Roman Senkerik. 2023. Leveraging large language models for the generation of novel metaheuristic optimization algorithms. In Proceedings of the Companion Conference on Genetic and Evolutionary Computation (GECCO ’23 Companion), 1812–1820. DOI:
[253]
Nico Potyka, Yuqicheng Zhu, Yunjie He, Evgeny Kharlamov, and Steffen Staab. 2024. Robust knowledge extraction from large language models using social choice theory. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 1593–1601.
[254]
Archiki Prasad, Peter Hase, Xiang Zhou, and Mohit Bansal. 2023. GrIPS: Gradient-free, edit-based instruction search for prompting large language models. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 3845–3864. DOI:
[255]
Anamaria-Roberta Preda, Christoph Mayr-Dorn, Atif Mashkoor, and Alexander Egyed. 2024. Supporting high-level to low-level requirements coverage reviewing with large language models. In Proceedings of the Mining Software Repositories (MSR) Conference.
[256]
Ethan Pronovost, Meghana Reddy Ganesina, Noureldin Hendy, Zeyu Wang, Andres Morales, Kai Wang, and Nicholas Roy. 2023. Scenario diffusion: Controllable driving scenario generation with diffusion. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[257]
Moschoula Pternea, Prerna Singh, Abir Chakraborty, Yagna Oruganti, Mirco Milletari, Sayli Bapat, and Kebei Jiang. 2024. The RL/LLM taxonomy tree: Reviewing synergies between reinforcement learning and large language models. arXiv:2402.01874 [cs.CL].
[258]
Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. 2024. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In Proceedings of the 12th International Conference on Learning Representations.
[259]
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. 2021. Learning transferable visual models from natural language supervision. arXiv:2103.00020 [cs.CV].
[260]
Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu. 2023. Exploring the limits of transfer learning with a unified text-to-text transformer. arXiv:1910.10683 [cs.LG].
[261]
Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. 2022. Hierarchical text-conditional image generation with CLIP latents. arXiv:2204.06125 [cs.CV].
[262]
Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid, and Niko Suenderhauf. 2023. SayPlan: Grounding large language models using 3D scene graphs for scalable robot task planning. In Proceedings of the 7th Annual Conference on Robot Learning.
[263]
Fabian Ranz, Vera Hummel, and Wilfried Sihn. 2017. Capability-based task allocation in human-robot collaboration. Procedia Manufacturing 9 (2017), 182–189. DOI:
[264]
N. Rao, K. Jain, U. Alon, C. Le Goues, and V. J. Hellendoorn. 2023. CAT-LM: Training language models on aligned code and tests. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE ’23), 409–420. DOI:
[265]
Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. 2021. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In Proceedings of the 38th International Conference on Machine Learning, Vol. 139. PMLR, 8857–8868.
[266]
Emily Reif, Crystal Qian, James Wexler, and Minsuk Kahng. 2024. Automatic histograms: Leveraging language models for text dataset exploration. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Article 53, 9 pages. DOI:
[267]
D. A. Reynolds and R. C. Rose. 1995. Robust text-independent speaker identification using Gaussian mixture speaker models. IEEE Transactions on Speech and Audio Processing 3, 1 (1995), 72–83. DOI:
[268]
Francisco Ribeiro, José Nuno Macedo, and Kanae Tsushima. 2023. Beyond code generation: The need for type-aware language models. In Proceedings of the IEEE/ACM International Workshop on Automated Program Repair (APR ’23), 21–22. DOI:
[269]
Celian Ringwald. 2024. Learning pattern-based extractors from natural language and knowledge graphs: Applying large language models to wikipedia and linked open data. In Proceedings of the 38th AAAI Conference on Artificial Intelligence (IAAI ’24), Vol. 38, Student Abstracts, Undergraduate Consortium and Demonstrations (EAAI ’24), Vol. 38, 23411–23412. DOI:
[270]
Juan Rocamonde, Victoriano Montesinos, Elvis Nava, Ethan Perez, and David Lindner. 2024. Vision-language models are zero-shot reward models for reinforcement learning. In Proceedings of the 12th International Conference on Learning Representations.
[271]
Kevin Roose. 2022. An A.I.-Generated Picture Won an Art Prize. Artists Aren’t Happy. The New York Times (02 Sept. 2022). Retrieved May 12, 2024 from https://www.nytimes.com/2022/09/02/technology/ai-artificial-intelligence-artists.html
[272]
Enrico Saccon, Ahmet Tikna, Davide De Martini, Edoardo Lamon, Marco Roveri, and Luigi Palopoli. 2024. When Prolog meets generative models: A new approach for managing knowledge and planning in robotic applications. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[273]
Pranab Sahoo, Ayush Kumar Singh, Sriparna Saha, Vinija Jain, Samrat Mondal, and Aman Chadha. 2024. A systematic survey of prompt engineering in large language models: Techniques and applications. arXiv:2402.07927 [cs.AI].
[274]
Md Sadman Sakib and Yu Sun. 2024. From cooking recipes to robot task trees – Improving planning correctness and task efficiency by leveraging LLMs with a knowledge network. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[275]
Raquel Sanchez, Javier Troya, and Javier Camara. 2024. Automated planning for adaptive cyber-physical systems under uncertainty in temporal availability constraints. In Proceedings of the 19th Conference on Software Engineering for Adaptive and Self-Managing Systems.
[276]
Sofia Santos, João Saraiva, and Francisco Ribeiro. 2024. Large language models in automated repair of haskell type errors. In Proceedings of the IEEE/ACM International Workshop on Automated Program Repair (APR ’24).
[277]
K. Sarda. 2023. Leveraging large language models for auto-remediation in microservices architecture. In Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems Companion (ACSOS-C ’23), 16–18. DOI:
[278]
Pete Sawyer, Nelly Bencomo, Jon Whittle, Emmanuel Letier, and Anthony Finkelstein. 2010. Requirements-aware systems: A research agenda for RE for self-adaptive systems. In Proceedings of the 18th IEEE International Requirements Engineering Conference, 95–103. DOI:
[279]
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. 2023. Toolformer: Language models can teach themselves to use tools. arXiv:2302.04761 [cs.CL].
[280]
Andreas Schuller, Doris Janssen, Julian Blumenröther, Theresa Maria Probst, Michael Schmidt, and Chandan Kumar. 2024. Generating personas using LLMs and assessing their viability. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Article 179, 7 pages. DOI:
[281]
Rie Sera, Hironori Washizaki, Junyan Chen, Yoshiaki Fukazawa, Masahiro Taga, Kazuyuki Nakagawa, Yusuke Sakai, and Kiyoshi Honda. 2024. Development of data-driven persona including user behavior and pain point through clustering with user log of B2B software. In Proceedings of the 17th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE ’24), 1–6.
[282]
Dhruv Shah, Michael Robert Equi, Błażej Osiński, Fei Xia, Brian Ichter, and Sergey Levine. 2023. Navigation with large language models: Semantic guesswork as a heuristic for planning. In Proceedings of the 7th Conference on Robot Learning, Vol. 229. PMLR, 2683–2699.
[283]
Dhruv Shah, Blazej Osinski, Brian Ichter, and Sergey Levine. 2022. LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action. In Proceedings of the 6th Annual Conference on Robot Learning.
[284]
Hao Shao, Letian Wang, Ruobing Chen, Hongsheng Li, and Yu Liu. 2022. Safety-enhanced autonomous driving using interpretable sensor fusion transformer. In Proceedings of the 6th Annual Conference on Robot Learning.
[285]
Lifeng Shen, Weiyu Chen, and James Kwok. 2024. Multi-resolution diffusion models for time series forecasting. In Proceedings of the 12th International Conference on Learning Representations.
[286]
Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. 2023. HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face. arXiv:2303.17580 [cs.CL].
[287]
Stepan Shevtsov, Mihaly Berekmeri, Danny Weyns, and Martina Maggio. 2018. Control-theoretical software adaptation: A systematic literature review. IEEE Transactions on Software Engineering 44, 8 (2018), 784–810. DOI:
[288]
Haochen Shi, Zhiyuan Sun, Xingdi Yuan, Marc-Alexandre Côté, and Bang Liu. 2024b. OPEx: A large language model-powered framework for embodied instruction following. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 2465–2467.
[289]
Jingyu Shi, Rahul Jain, Hyungjun Doh, Ryo Suzuki, and Karthik Ramani. 2024a. An HCI-centric survey and taxonomy of human-generative-AI interactions. arXiv:2310.07127 [cs.HC].
[290]
Xiaoming Shi, Siqiao Xue, Kangrui Wang, Fan Zhou, James Zhang, Jun Zhou, Chenhao Tan, and Hongyuan Mei. 2023. Language models can improve event prediction by few-shot abductive reasoning. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems (NeurIPS ’23).
[291]
Xiao Shou, Debarun Bhattacharjya, Tian Gao, Dharmashankar Subramanian, Oktie Hassanzadeh, and Kristin P. Bennett. 2023. Pairwise causality guided transformers for event sequences. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36. Curran Associates, Inc., 46520–46533.
[292]
Jaidev Shriram and Sanjayan Pradeep Kumar Sreekala. 2023. ZINify: Transforming research papers into engaging zines with large language models. In Adjunct Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST ’23 Adjunct). Article 117, 3 pages. DOI:
[293]
Yash Shukla, Wenchang Gao, Vasanth Sarathy, Alvaro Velasquez, Robert Wright, and Jivko Sinapov. 2024. LgTS: Dynamic task sampling using LLM-generated sub-goals for reinforcement learning agents. In Proceedings of the 23rd International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’24), 1736–1744.
[294]
Samira Silva, Patrizio Pelliccione, and Antonia Bertolino. 2024. Self-adaptive testing in the field. ACM Transactions on Autonomous and Adaptive Systems 19, 1 (Feb. 2024), Article 4, 37 pages. DOI:
[295]
Vítor E. Silva Souza, Alexei Lapouchnian, William N. Robinson, and John Mylopoulos. 2011. Awareness requirements for adaptive systems. In Proceedings of the 6th International Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’11), 60–69. DOI:
[296]
Daniel L. Silver, Qiang Yang, and Lianghao Li. 2013. Lifelong machine learning systems: Beyond learning algorithms. In Proceedings of the Lifelong Machine Learning, Papers from the 2013 AAAI Spring Symposium, Vol. SS-13-05. AAAI. Retrieved from http://www.aaai.org/ocs/index.php/SSS/SSS13/paper/view/5802
[297]
F. Santoni De Sio and Giulio Mecacci. 2021. Four responsibility gaps with artificial intelligence: Why they matter and how to address them. Philosophy & Technology 34, 4 (2021), 1057–1084. DOI:
[298]
D. Sobania, M. Briesch, C. Hanna, and J. Petke. 2023. An analysis of the automatic bug fixing performance of ChatGPT. In Proceedings of the IEEE/ACM International Workshop on Automated Program Repair (APR ’23), 23–30. DOI:
[299]
Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. 2015. Deep unsupervised learning using nonequilibrium thermodynamics. In Proceedings of the 32nd International Conference on Machine Learning, Vol. 37. PMLR, 2256–2265.
[300]
Yang Song and Stefano Ermon. 2019. Generative modeling by estimating gradients of the data distribution. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 32. Curran Associates, Inc.
[301]
Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. 2021. Score-based generative modeling through stochastic differential equations. In Proceedings of the International Conference on Learning Representations.
[302]
Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, and Dongmei Zhang. 2024. Table meets LLM: Can large language models understand structured table data? A benchmark and empirical study. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining (WSDM ’24), 645–654. DOI:
[303]
Hao Sun, Alihan Hüyük, and Mihaela van der Schaar. 2024b. Query-dependent prompt evaluation and optimization with offline inverse RL. In Proceedings of the 12th International Conference on Learning Representations.
[304]
Haotian Sun, Yuchen Zhuang, Lingkai Kong, Bo Dai, and Chao Zhang. 2023b. AdaPlanner: Adaptive planning from feedback with language models. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[305]
Jiankai Sun, Yiqi Jiang, Jianing Qiu, Parth Nobel, Mykel J. Kochenderfer, and Mac Schwager. 2023a. Conformal prediction for uncertainty-aware planning with diffusion dynamics model. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36. Curran Associates, Inc., 80324–80337.
[306]
Jingkai Sun, Qiang Zhang, Yiqun Duan, Xiaoyang Jiang, Chong Cheng, and Renjing Xu. 2024d. Prompt, plan, perform: LLM-based humanoid control via quantized imitation learning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[307]
Lichao Sun, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Yuan Li, Chujie Gao, Yixin Huang, Wenhan Lyu, Yixuan Zhang, Xiner Li, Zhengliang Liu, Yixin Liu, Yijue Wang, Zhikun Zhang, Bertie Vidgen, Bhavya Kailkhura, Caiming Xiong, Chaowei Xiao, Chunyuan Li, Eric Xing, Furong Huang, Hao Liu, Heng Ji, Hongyi Wang, Huan Zhang, Huaxiu Yao, Manolis Kellis, Marinka Zitnik, Meng Jiang, Mohit Bansal, James Zou, Jian Pei, Jian Liu, Jianfeng Gao, Jiawei Han, Jieyu Zhao, Jiliang Tang, Jindong Wang, Joaquin Vanschoren, John Mitchell, Kai Shu, Kaidi Xu, Kai-Wei Chang, Lifang He, Lifu Huang, Michael Backes, Neil Zhenqiang Gong, Philip S. Yu, Pin-Yu Chen, Quanquan Gu, Ran Xu, Rex Ying, Shuiwang Ji, Suman Jana, Tianlong Chen, Tianming Liu, Tianyi Zhou, William Wang, Xiang Li, Xiangliang Zhang, Xiao Wang, Xing Xie, Xun Chen, Xuyu Wang, Yan Liu, Yanfang Ye, Yinzhi Cao, Yong Chen, and Yue Zhao. 2024a. TrustLLM: Trustworthiness in large language models. arXiv:2401.05561 [cs.CL].
[308]
Yuqiang Sun, Daoyuan Wu, Yue Xue, Han Liu, Haijun Wang, Zhengzi Xu, Xiaofei Xie, and Yang Liu. 2024c. GPTScan: Detecting logic vulnerabilities in smart contracts by combining GPT with program analysis. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 166, 13 pages. DOI:
[309]
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. 2017. Axiomatic attribution for deep networks. In Proceedings of the 34th International Conference on Machine Learning, Vol. 70. PMLR, 3319–3328.
[310]
Daniel Sykes, William Heaven, Jeff Magee, and Jeff Kramer. 2008. From goals to components: A combined approach to self-management. In Proceedings of the 2008 International Workshop on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’08), 1–8.
[311]
Andrew Szot, Max Schwarzer, Harsh Agrawal, Bogdan Mazoure, Rin Metcalf, Walter Talbott, Natalie Mackraz, R. Devon Hjelm, and Alexander T. Toshev. 2024. Large language models as generalizable policies for embodied tasks. In Proceedings of the 12th International Conference on Learning Representations.
[312]
Shiro Takagi. 2022. On the effect of pre-training for transformer in different modality on offline reinforcement learning. In Proceedings of the Advances in Neural Information Processing Systems.
[313]
Weihao Tan, Wentao Zhang, Shanqi Liu, Longtao Zheng, Xinrun Wang, and Bo An. 2024. True knowledge comes from practice: Aligning large language models with embodied environments via reinforcement learning. In Proceedings of the 12th International Conference on Learning Representations.
[314]
Binh Tang and David S. Matteson. 2021. Probabilistic transformer for time series analysis. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 23592–23608.
[315]
Peiwang Tang and Xianchao Zhang. 2023. Infomaxformer: Maximum entropy transformer for long time-series forecasting problem. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems (AAMAS ’23), 1670–1678.
[316]
Z. Tang, C. Li, J. Ge, X. Shen, Z. Zhu, and B. Luo. 2021. AST-transformer: Encoding abstract syntax trees efficiently for code summarization. In Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE ’21), 1193–1195. DOI:
[317]
Daniel Tanneberg, Felix Ocker, Stephan Hasler, Joerg Deigmoeller, Anna Belardinelli, Chao Wang, Heiko Wersing, Bernhard Sendhoff, and Michael Gienger. 2024. To help or not to help: LLM-based attentive support for human-robot group interactions. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[318]
Zhengwei Tao, Ting-En Lin, Xiancai Chen, Hangyu Li, Yuchuan Wu, Yongbin Li, Zhi Jin, Fei Huang, Dacheng Tao, and Jingren Zhou. 2024. A survey on self-evolution of large language models. arXiv:2404.14387 [cs.CL].
[319]
Yusuke Tashiro, Jiaming Song, Yang Song, and Stefano Ermon. 2021. CSDI: Conditional score-based diffusion models for probabilistic time series imputation. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 34. Curran Associates, Inc., 24804–24816.
[320]
Xiaoyu Tian, Liangyu Chen, Na Liu, Yaxuan Liu, Wei Zou, Kaijiang Chen, and Ming Cui. 2023. DUMA: A dual-mind conversational agent with fast and slow thinking. arXiv:2310.18075 [cs.CL].
[321]
Emanuel Todorov, Tom Erez, and Yuval Tassa. 2012. MuJoCo: A physics engine for model-based control. In Proceedings of the 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems, 5026–5033. DOI:
[322]
Christos Tsigkanos, Pooja Rani, Sebastian Müller, and Timo Kehrer. 2023a. Variable discovery with large language models for metamorphic testing of scientific software. In Proceedings of the International Conference on Computational Science (ICCS ’23). Springer Nature, 321–335.
[323]
Christos Tsigkanos, Pooja Rani, Sebastian Müller, and Timo Kehrer. 2023b. Large language models: The next frontier for variable discovery within metamorphic testing? In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER ’23), 678–682. DOI:
[324]
Michele Tufano, Dawn Drain, Alexey Svyatkovskiy, and Neel Sundaresan. 2022. Generating accurate assert statements for unit test cases using pretrained transformers. In Proceedings of the 3rd ACM/IEEE International Conference on Automation of Software Test (AST ’22), 54–64. DOI:
[325]
U.S. Department of Defense. 2023. DoD directive 3000.09, autonomy in weapon systems.
[326]
Vasily Varenov and Aydar Gabdrahmanov. 2021. Security requirements classification into groups using NLP transformers. In Proceedings of the IEEE 29th International Requirements Engineering Conference Workshops (REW ’21), 444–450. DOI:
[327]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 30. Curran Associates, Inc.
[328]
Norha M. Villegas, Gabriel Tamura, Hausi A. Müller, Laurence Duchien, and Rubby Casallas. 2013. DYNAMICO: A Reference Model for Governing Control Objectives and Context Relevance in Self-Adaptive Software Systems. Springer, Berlin, 265–293. DOI:
[329]
Johanna Walker, Elisavet Koutsiana, Michelle Nwachukwu, Albert Meroño Peñuela, and Elena Simperl. 2024. The promise and challenge of large language models for knowledge engineering: Insights from a hackathon. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems (CHI EA ’24). Article 318, 9 pages. DOI:
[330]
Xingchen Wan, Ruoxi Sun, Hanjun Dai, Sercan Arik, and Tomas Pfister. 2023a. Better zero-shot reasoning with self-adaptive prompting. In Findings of the Association for Computational Linguistics (ACL ’23). Association for Computational Linguistics, 3493–3514. DOI:
[331]
Xingchen Wan, Ruoxi Sun, Hootan Nakhost, Hanjun Dai, Julian Eisenschlos, Sercan Arik, and Tomas Pfister. 2023b. Universal self-adaptive prompting. arXiv:2305.14926. Retrieved from https://arxiv.org/pdf/2305.14926.pdf
[332]
Bryan Wang, Gang Li, and Yang Li. 2023d. Enabling conversational interaction with mobile UI using large language models. In Proceedings of the Conference on Human Factors in Computing Systems (CHI ’23). Article 432, 17 pages. DOI:
[333]
Bailin Wang, Zi Wang, Xuezhi Wang, Yuan Cao, Rif A. Saurous, and Yoon Kim. 2023f. Grammar prompting for domain-specific language generation with large language models. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[334]
Jindong Wang, Xixu Hu, Wenxin Hou, Hao Chen, Runkai Zheng, Yidong Wang, Linyi Yang, Haojun Huang, Wei Ye, Xiubo Geng, Binxin Jiao, Yue Zhang, and Xing Xie. 2023b. On the robustness of ChatGPT: An adversarial and out-of-distribution perspective. arXiv:2302.12095 [cs.AI].
[335]
Kerong Wang, Hanye Zhao, Xufang Luo, Kan Ren, Weinan Zhang, and Dongsheng Li. 2022. Bootstrapped transformer for offline reinforcement learning. In Proceedings of the Advances in Neural Information Processing Systems.
[336]
Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Jirong Wen. 2024b. A survey on large language model based autonomous agents. Frontiers of Computer Science 18, 6 (2024), 186345. DOI:
[337]
Weishi Wang, Yue Wang, Shafiq Joty, and Steven C.H. Hoi. 2023e. RAP-Gen: Retrieval-augmented patch generation with CodeT5 for automatic program repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), 146–158. DOI:
[338]
Xinyuan Wang, Chenxi Li, Zhen Wang, Fan Bai, Haotian Luo, Jiayou Zhang, Nebojsa Jojic, Eric Xing, and Zhiting Hu. 2024a. PromptAgent: Strategic planning with language models enables expert-level prompt optimization. In Proceedings of the 12th International Conference on Learning Representations.
[339]
Xuezhi Wang, Jason Wei, Dale Schuurmans, Quoc V. Le, Ed H. Chi, Sharan Narang, Aakanksha Chowdhery, and Denny Zhou. 2023g. Self-consistency improves chain of thought reasoning in language models. In Proceedings of the 11th International Conference on Learning Representations (ICLR ’23). OpenReview.net.
[340]
Yidong Wang, Zhuohao Yu, Wenjin Yao, Zhengran Zeng, Linyi Yang, Cunxiang Wang, Hao Chen, Chaoya Jiang, Rui Xie, Jindong Wang, Xing Xie, Wei Ye, Shikun Zhang, and Yue Zhang. 2024c. PandaLM: An automatic evaluation benchmark for LLM instruction tuning optimization. In Proceedings of the 12th International Conference on Learning Representations.
[341]
Zihao Wang, Shaofei Cai, Guanzhou Chen, Anji Liu, Xiaojian Ma, and Yitao Liang. 2023a. Describe, explain, plan and select: Interactive planning with LLMs enables open-world multi-task agents. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[342]
Zhendong Wang, Jonathan J. Hunt, and Mingyuan Zhou. 2023c. Diffusion policies as an expressive policy class for offline reinforcement learning. In Proceedings of the 11th International Conference on Learning Representations.
[343]
Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou. 2023a. Chain-of-thought prompting elicits reasoning in large language models. arXiv:2201.11903 [cs.CL].
[344]
Yuxiang Wei, Chunqiu Steven Xia, and Lingming Zhang. 2023b. Copiloting the copilots: Fusing large language models with completion engines for automated program repair. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), 172–184. DOI:
[345]
Sean Welleck, Jiacheng Liu, Ximing Lu, Hannaneh Hajishirzi, and Yejin Choi. 2022. NaturalProver: Grounded mathematical proof generation with language models. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 35. Curran Associates, Inc., 4913–4927.
[346]
Haomin Wen, Youfang Lin, Yutong Xia, Huaiyu Wan, Qingsong Wen, Roger Zimmermann, and Yuxuan Liang. 2023a. DiffSTG: Probabilistic spatio-temporal graph forecasting with denoising diffusion models. In Proceedings of the 31st ACM International Conference on Advances in Geographic Information Systems (SIGSPATIAL ’23). Article 60, 12 pages. DOI:
[347]
Qingsong Wen, Tian Zhou, Chaoli Zhang, Weiqi Chen, Ziqing Ma, Junchi Yan, and Liang Sun. 2023b. Transformers in time series: A survey. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI ’23), Survey Track, 6778–6786. DOI:
[348]
Danny Weyns. 2020. An Introduction to Self-adaptive Systems: A Contemporary Software Engineering Perspective. Wiley-IEEE Computer Society Press.
[349]
Danny Weyns, Ilias Gerostathopoulos, Nadeem Abbas, Jesper Andersson, Stefan Biffl, Premek Brada, Tomas Bures, Amleto Di Salle, Matthias Galster, Patricia Lago, Grace Lewis, Marin Litoiu, Angelika Musil, Juergen Musil, Panos Patros, and Patrizio Pelliccione. 2023. Self-adaptation in industry: A survey. ACM Transactions on Autonomous and Adaptive Systems 18, 2 (2023), 44 pages.
[350]
Danny Weyns and Jesper Andersson. 2023. From self-adaptation to self-evolution leveraging the operational design domain. In Proceedings of the IEEE/ACM 18th Symposium on Software Engineering for Adaptive and Self-Managing Systems (SEAMS ’23), 90–96. DOI:
[351]
Danny Weyns, Thomas Bäck, René Vidal, Xin Yao, and Ahmed Nabil Belbachir. 2022a. The vision of self-evolving computing systems. Journal of Integrated Design and Process Science 26, 3–4 (2022), 351–367. DOI:
[352]
Danny Weyns, Ilias Gerostathopoulos, Barbora Buhnova, Nicolás Cardozo, Emilia Cioroaica, Ivana Dusparic, Lars Grunske, Pooyan Jamshidi, Christine Julien, Judith Michael, Gabriel Moreno, Shiva Nejati, Patrizio Pelliccione, Federico Quin, Genaina Rodrigues, Bradley Schmerl, Marco Vieira, Thomas Vogel, and Rebekka Wohlrab. 2022b. Guidelines for artifacts to support industry-relevant research on self-adaptation. ACM SIGSOFT Software Engineering Notes 47, 4 (Sep. 2022), 18–24. DOI:
[353]
D. Weyns, U. Iftikhar, S. Malek, and J. Andersson. 2012a. Claims and supporting evidence for self-adaptive systems: A literature study. In Proceedings of the 7th International Symposium on Software Engineering for Adaptive and Self-Managing Systems, 89–98. DOI:
[354]
Danny Weyns and M. Usman Iftikhar. 2023. ActivFORMS: A formally founded model-based approach to engineer self-adaptive systems. ACM Transactions on Software Engineering and Methodology 32, 1, Article 12 (Feb. 2023), 48 pages. DOI:
[355]
Danny Weyns, Sam Malek, and Jesper Andersson. 2012b. FORMS: Unifying reference model for formal specification of distributed self-adaptive systems. ACM Transactions on Autonomous and Adaptive Systems (TAAS) 7, 1, Article 8 (May 2012), 61 pages. DOI:
[356]
Danny Weyns, Sam Malek, and Jesper Andersson. 2012b. FORMS: Unifying reference model for formal specification of distributed self-adaptive systems. ACM Transactions on Autonomous and Adaptive Systems 7, 1 (2012), 8:1–8:61.
[357]
Danny Weyns, Bradley Schmerl, Vincenzo Grassi, Sam Malek, Raffaela Mirandola, Christian Prehofer, Jochen Wuttke, Jesper Andersson, Holger Giese, and Karl M. Göschka. 2013. On Patterns for Decentralized Control in Self-Adaptive Systems. Springer, Berlin, 76–107. DOI:
[358]
Jon Whittle, Pete Sawyer, Nelly Bencomo, Betty H. C. Cheng, and Jean-Michel Bruel. 2009. RELAX: Incorporating uncertainty into the specification of self-adaptive systems. In Proceedings of the 2009 17th IEEE International Requirements Engineering Conference, 79–88.
[359]
Nathan Gabriel Wood. 2023. Autonomous weapon systems and responsibility gaps: A taxonomy. Ethics and Information Technology 25, 1 (2023), 16. DOI:
[360]
Haoze Wu, Clark Barrett, and Nina Narodytska. 2023. Lemur: Integrating large language models in automated program verification. In Proceedings of the 3rd Workshop on Mathematical Reasoning and AI at NeurIPS ’23.
[361]
Sifan Wu, Xi Xiao, Qianggang Ding, Peilin Zhao, Ying Wei, and Junzhou Huang. 2020. Adversarial sparse transformer for time series forecasting. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 33. Curran Associates, Inc., 17105–17115.
[362]
Tongshuang Wu, Michael Terry, and Carrie Jun Cai. 2022b. AI chains: Transparent and controllable human-AI interaction by chaining large language model prompts. In Proceedings of the Conference on Human Factors in Computing Systems (CHI ’22). Article 385, 22 pages.
[363]
Yuhuai Wu, Albert Qiaochu Jiang, Wenda Li, Markus Norman Rabe, Charles E Staats, Mateja Jamnik, and Christian Szegedy. 2022a. Autoformalization with large language models. In Proceedings of the Advances in Neural Information Processing Systems.
[364]
Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, and Tao Gui. 2023. The rise and potential of large language model based agents: A survey. arXiv:2309.07864 [cs.AI].
[365]
C. Xia, Y. Ding, and L. Zhang. 2023a. The plastic surgery hypothesis in the era of large language models. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), 522–534. DOI:
[366]
Chunqiu Steven Xia, Matteo Paltenghi, Jia Le Tian, Michael Pradel, and Lingming Zhang. 2024. Fuzz4All: Universal fuzzing with large language models. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 126, 13 pages. DOI:
[367]
Chunqiu Steven Xia, Yuxiang Wei, and Lingming Zhang. 2023b. Automated program repair in the era of large pre-trained language models. In Proceedings of the IEEE/ACM 45th International Conference on Software Engineering (ICSE ’23), 1482–1494. DOI:
[368]
Ziyang Xiao, Dongxiang Zhang, Yangjun Wu, Lilin Xu, Yuan Jessica Wang, Xiongwei Han, Xiaojin Fu, Tao Zhong, Jia Zeng, Mingli Song, and Gang Chen. 2024. Chain-of-experts: When LLMs meet complex operations research problems. In Proceedings of the 12th International Conference on Learning Representations.
[369]
Tian Xie, Xiang Fu, Octavian-Eugen Ganea, Regina Barzilay, and Tommi S. Jaakkola. 2022. Crystal diffusion variational autoencoder for periodic material generation. In Proceedings of the International Conference on Learning Representations.
[370]
Tianbao Xie, Siheng Zhao, Chen Henry Wu, Yitao Liu, Qian Luo, Victor Zhong, Yanchao Yang, and Tao Yu. 2024. Text2Reward: Reward shaping with language models for reinforcement learning. In Proceedings of the 12th International Conference on Learning Representations.
[371]
Hanwei Xu, Yujun Chen, Yulun Du, Nan Shao, Yanggang Wang, Haiyu Li, and Zhilin Yang. 2022a. GPS: Genetic prompt search for efficient few-shot learning. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 8162–8171. DOI:
[372]
Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. 2022b. Anomaly transformer: Time series anomaly detection with association discrepancy. In Proceedings of the International Conference on Learning Representations.
[373]
Mengdi Xu, Yuchen Lu, Yikang Shen, Shun Zhang, Ding Zhao, and Chuang Gan. 2023. Hyper-decision transformer for efficient online policy adaptation. In Proceedings of the 11th International Conference on Learning Representations.
[374]
Xiaohan Xu, Ming Li, Chongyang Tao, Tao Shen, Reynold Cheng, Jinyang Li, Can Xu, Dacheng Tao, and Tianyi Zhou. 2024a. A survey on knowledge distillation of large language models. arXiv:2402.13116 [cs.CL].
[375]
Zelai Xu, Chao Yu, Fei Fang, Yu Wang, and Yi Wu. 2024b. Language agents with reinforcement learning for strategic play in the Werewolf game. arXiv:2310.18940 [cs.AI].
[376]
Zhiyi Xue, Liangguo Li, Senyue Tian, Xiaohong Chen, Pingping Li, Liangyu Chen, Tingting Jiang, and Min Zhang. 2024. Domain knowledge is all you need: A field deployment of LLM-powered test case generation in FinTech domain. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion ’24), 314–315. DOI:
[377]
Taku Yamagata, Ahmed Khalil, and Raúl Santos-Rodríguez. 2023. Q-learning decision transformer: Leveraging dynamic programming for conditional sequence modelling in offline RL. In Proceedings of the 40th International Conference on Machine Learning (ICML ’23). JMLR.org, Article 1625, 19 pages.
[378]
Huan Yan and Yong Li. 2023. A survey of generative AI for intelligent transportation systems. arXiv:2312.08248 [cs.AI].
[379]
Aidan Z. H. Yang, Claire Le Goues, Ruben Martins, and Vincent Hellendoorn. 2024a. Large language models for test-free fault localization. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering (ICSE ’24). Article 17, 12 pages. DOI:
[380]
Chengrun Yang, Xuezhi Wang, Yifeng Lu, Hanxiao Liu, Quoc V. Le, Denny Zhou, and Xinyun Chen. 2024c. Large language models as optimizers. In Proceedings of the 12th International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=Bb4VGOWELI
[381]
Fangkai Yang, Wenjie Yin, Lu Wang, Tianci Li, Pu Zhao, Bo Liu, Paul Wang, Bo Qiao, Yudong Liu, Mårten Björkman, Saravan Rajmohan, Qingwei Lin, and Dongmei Zhang. 2023d. Diffusion-based time series data imputation for cloud failure prediction at Microsoft 365. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE ’23), 2050–2055. DOI:
[382]
Heng Yang and Ke Li. 2023a. InstOptima: Evolutionary multi-objective instruction optimization via large language model-based instruction operators. In Findings of the Association for Computational Linguistics (EMNLP ’23). Association for Computational Linguistics, 13593–13602. DOI:
[383]
Jingda Yang and Ying Wang. 2024. Toward auto-modeling of formal verification for nextG protocols: A multimodal cross- and self-attention large language model approach. IEEE Access 12 (2024), 27858–27869. DOI:
[384]
Ling Yang, Zhilong Zhang, Yang Song, Shenda Hong, Runsheng Xu, Yue Zhao, Wentao Zhang, Bin Cui, and Ming-Hsuan Yang. 2023e. Diffusion models: A comprehensive survey of methods and applications. ACM Computing Surveys 56, 4, Article 105 (Nov. 2023), 39 pages. DOI:
[385]
Yaodong Yang, Guangyong Chen, Weixun Wang, Xiaotian Hao, Jianye HAO, and Pheng-Ann Heng. 2022. Transformer-based working memory for multiagent reinforcement learning with action parsing. In Proceedings of the Advances in Neural Information Processing Systems.
[386]
Zhun Yang, Adam Ishay, and Joohyung Lee. 2023a. Learning to solve constraint satisfaction problems with recurrent transformer. In Proceedings of the 11th International Conference on Learning Representations.
[387]
Zhenjie Yang, Xiaosong Jia, Hongyang Li, and Junchi Yan. 2023b. LLM4Drive: A survey of large language models for autonomous driving. arXiv:2311.01043 [cs.AI].
[388]
Zhutian Yang, Jiayuan Mao, Yilun Du, Jiajun Wu, Joshua B. Tenenbaum, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. 2023c. Compositional diffusion-based continuous constraint solvers. In Proceedings of the 7th Annual Conference on Robot Learning.
[389]
Ziyi Yang, Shreyas S. Raman, Ankit Shah, and Stefanie Tellex. 2024b. Plug in the safety chip: Enforcing constraints for LLM-driven robot agents. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[390]
Jianan Yao, Ziqiao Zhou, Weiteng Chen, and Weidong Cui. 2023b. Leveraging large language models for automated proof synthesis in rust. arXiv:2311.03739 [cs.FL].
[391]
Shunyu Yao, Howard Chen, John Yang, and Karthik R. Narasimhan. 2022. WebShop: Towards scalable real-world web interaction with grounded language agents. In Proceedings of the Advances in Neural Information Processing Systems.
[392]
Shunyu Yao, Dian Yu, Jeffrey Zhao, Izhak Shafran, Tom Griffiths, Yuan Cao, and Karthik Narasimhan. 2023a. Tree of Thoughts: Deliberate problem solving with large language models. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36. Curran Associates, Inc., 11809–11822.
[393]
Yifan Yao, Jinhao Duan, Kaidi Xu, Yuanfang Cai, Zhibo Sun, and Yue Zhang. 2024. A survey on large language model (LLM) security and privacy: The good, the bad, and the ugly. High-Confidence Computing 4, 2 (Jun. 2024), 100211. DOI:
[394]
Takuma Yoneda, Jiading Fang, Peng Li, Huanyu Zhang, Tianchong Jiang, Shengjie Lin, Ben Picker, David Yunis, Hongyuan Mei, and Matthew R. Walter. 2024. Statler: State-maintaining language models for embodied reasoning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[395]
Chenning Yu, Qingbiao Li, Sicun Gao, and Amanda Prorok. 2023b. Accelerating multi-agent planning using graph transformers with bounded suboptimality. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’23), 3432–3439. DOI:
[396]
Tianhe Yu, Deirdre Quillen, Zhanpeng He, Ryan Julian, Karol Hausman, Chelsea Finn, and Sergey Levine. 2019. Meta-world: A benchmark and evaluation for multi-task and meta reinforcement learning. In Proceedings of the Conference on Robot Learning (CoRL ’19).
[397]
Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, and Fei Xia. 2023a. Language to rewards for robotic skill synthesis. arXiv:2306.08647 [cs.RO].
[398]
Wei Yuan, Quanjun Zhang, Tieke He, Chunrong Fang, Nguyen Quoc Viet Hung, Xiaodong Hao, and Hongzhi Yin. 2022. CIRCLE: Continual repair across programming languages. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’22), 678–690. DOI:
[399]
Yanjie Ze, Gu Zhang, Kangning Zhang, Chenyuan Hu, Muhan Wang, and Huazhe Xu. 2024. 3D diffusion policy: Generalizable visuomotor policy learning via simple 3D representations. In Proceedings of the Workshop on 3D Visual Representations for Robot Manipulation (ICRA ’24).
[400]
Ailing Zeng, Muxi Chen, Lei Zhang, and Qiang Xu. 2023a. Are transformers effective for time series forecasting? In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, 11121–11128. DOI:
[401]
Fanlong Zeng, Wensheng Gan, Yongheng Wang, Ning Liu, and Philip S. Yu. 2023b. Large language models for robotics: A survey. arXiv:2311.07226 [cs.RO].
[402]
Bin Zhang, Hangyu Mao, Jingqing Ruan, Ying Wen, Yang Li, Shao Zhang, Zhiwei Xu, Dapeng Li, Ziyue Li, Rui Zhao, Lijuan Li, and Guoliang Fan. 2023c. Controlling large language model-based agents for large-scale decision-making: An actor-critic approach. arXiv:2311.13884 [cs.AI].
[403]
Chenyuan Zhang, Hao Liu, Jiutian Zeng, Kejing Yang, Yuhong Li, and Hui Li. 2024d. Prompt-enhanced software vulnerability detection using ChatGPT. In Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion ’24), 276–277. DOI:
[404]
Chenrui Zhang, Lin Liu, Chuyuan Wang, Xiao Sun, Hongyu Wang, Jinpeng Wang, and Mingchen Cai. 2024c. PREFER: Prompt ensemble learning via feedback-reflect-refine. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 19525–19532. DOI:
[405]
Ceyao Zhang, Kaijie Yang, Siyi Hu, Zihao Wang, Guanghe Li, Yihang Sun, Cheng Zhang, Zhaowei Zhang, Anji Liu, Song-Chun Zhu, Xiaojun Chang, Junge Zhang, Feng Yin, Yitao Liang, and Yaodong Yang. 2024e. ProAgent: Building proactive cooperative agents with large language models. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 38, 17591–17599. DOI:
[406]
Hao Zhang, Hao Wang, and Zhen Kan. 2023d. Exploiting transformer in sparse reward reinforcement learning for interpretable temporal logic motion planning. IEEE Robotics and Automation Letters 8, 8 (Aug. 2023), 4831–4838. DOI:
[407]
Jesse Zhang, Jiahui Zhang, Karl Pertsch, Ziyi Liu, Xiang Ren, Minsuk Chang, Shao-Hua Sun, and Joseph J. Lim. 2023f. Bootstrap your own skills: Learning to solve new tasks with large language model guidance. In Proceedings of the 7th Annual Conference on Robot Learning.
[408]
Lei Zhang, Yuge Zhang, Kan Ren, Dongsheng Li, and Yuqing Yang. 2024f. MLCopilot: Unleashing the power of large language models in solving machine learning tasks. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics, Vol. 1: Long Papers. Association for Computational Linguistics, 2931–2959.
[409]
Mingyue Zhang, Jialong Li, Nianyu Li, Eunsuk Kang, and Kenji Tei. 2024b. User-driven adaptation: Tailoring autonomous driving systems with dynamic preferences. In Proceedings of the ACM International Conference on Human Factors in Computing Systems. ACM.
[410]
Mingyue Zhang, Jialong Li, Haiyan Zhao, Kenji Tei, Shinichi Honiden, and Zhi Jin. 2021. A meta reinforcement learning-based approach for self-adaptive system. In Proceedings of the IEEE International Conference on Autonomic Computing and Self-Organizing Systems (ACSOS ’21), 1–10.
[411]
Quanjun Zhang, Chunrong Fang, Yang Xie, Yaxin Zhang, Yun Yang, Weisong Sun, Shengcheng Yu, and Zhenyu Chen. 2023a. A survey on large language models for software engineering. arXiv:2312.15223 [cs.SE].
[412]
Shujian Zhang, Chengyue Gong, Lemeng Wu, Xingchao Liu, and Mingyuan Zhou. 2023b. AutoML-GPT: Automatic machine learning with GPT. arXiv:2305.02499 [cs.CL].
[413]
Tianjun Zhang, Xuezhi Wang, Denny Zhou, Dale Schuurmans, and Joseph E. Gonzalez. 2023e. TEMPERA: Test-time prompt editing via reinforcement learning. In Proceedings of the 11th International Conference on Learning Representations.
[414]
Yunhao Zhang and Junchi Yan. 2023. Crossformer: Transformer utilizing cross-dimension dependency for multivariate time series forecasting. In Proceedings of the 11th International Conference on Learning Representations.
[415]
Ziyin Zhang, Chaoyu Chen, Bingchang Liu, Cong Liao, Zi Gong, Hang Yu, Jianguo Li, and Rui Wang. 2024a. Unifying the perspectives of NLP and software engineering: A survey on language models for code. arXiv:2311.07989 [cs.CL].
[416]
Andrew Zhao, Daniel Huang, Quentin Xu, Matthieu Lin, Yong-Jin Liu, and Gao Huang. 2024. ExpeL: LLM agents are experiential learners. In Proceedings of the 38th AAAI Conference on Artificial Intelligence and 36th Conference on Innovative Applications of Artificial Intelligence and 14th Symposium on Educational Advances in Artificial Intelligence (AAAI ’24/IAAI ’24/EAAI ’24). AAAI Press, 19632–19642. DOI:
[417]
Haiyan Zhao, Hanjie Chen, Fan Yang, Ninghao Liu, Huiqi Deng, Hengyi Cai, Shuaiqiang Wang, Dawei Yin, and Mengnan Du. 2023a. Explainability for large language models: A survey. arXiv:2309.01029 [cs.CL].
[418]
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2023b. A survey of large language models. arXiv:2303.18223 [cs.CL].
[419]
Qinqing Zheng, Amy Zhang, and Aditya Grover. 2022. Online decision transformer. In Proceedings of the 39th International Conference on Machine Learning, Vol. 162. PMLR, 27042–27059.
[420]
Ziyuan Zhong, Davis Rempe, Yuxiao Chen, Boris Ivanovic, Yulong Cao, Danfei Xu, Marco Pavone, and Baishakhi Ray. 2023a. Language-guided traffic simulation via scene-level diffusion. arXiv:2306.06344 [cs.RO].
[421]
Ziyuan Zhong, Davis Rempe, Danfei Xu, Yuxiao Chen, Sushant Veer, Tong Che, Baishakhi Ray, and Marco Pavone. 2023b. Guided conditional diffusion for controllable traffic simulation. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’23), 3560–3566. DOI:
[422]
Haotian Zhou, Yunhan Lin, Longwu Yan, Jihong Zhu, and Huasong Min. 2024a. LLM-BT: Performing robotic adaptive tasks based on large language models and behavior trees. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[423]
Jin Peng Zhou, Charles E. Staats, Wenda Li, Christian Szegedy, Kilian Q Weinberger, and Yuhuai Wu. 2024c. Don’t trust: Verify – Grounding LLM quantitative reasoning with autoformalization. In Proceedings of the 12th International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=V5tdi14ple
[424]
Siyuan Zhou, Yilun Du, Shun Zhang, Mengdi Xu, Yikang Shen, Wei Xiao, Dit-Yan Yeung, and Chuang Gan. 2023a. Adaptive online replanning with diffusion models. In Proceedings of the 37th Conference on Neural Information Processing Systems.
[425]
Tian Zhou, Ziqing Ma, Qingsong Wen, Xue Wang, Liang Sun, and Rong Jin. 2022. FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the 39th International Conference on Machine Learning, Vol. 162. PMLR, 27268–27286.
[426]
Tian Zhou, Peisong Niu, Xue Wang, Liang Sun, and Rong Jin. 2023b. One fits all: Power general time series analysis by pretrained LM. In Proceedings of the Advances in Neural Information Processing Systems, Vol. 36 (NeurIPS ’23).
[427]
Xin Zhou, Ting Zhang, and David Lo. 2024d. Large language model for vulnerability detection: Emerging results and future directions. In Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results (ICSE-NIER ’24), 47–51. DOI:
[428]
Zhehua Zhou, Jiayang Song, Kunpeng Yao, Zhan Shu, and Lei Ma. 2024b. ISR-LLM: Iterative self-refined large language model for long-horizon sequential task planning. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA ’24).
[429]
Fangqi Zhu, Jun Gao, Changlong Yu, Wei Wang, Chen Xu, Xin Mu, Min Yang, and Ruifeng Xu. 2023b. A generative approach for script event prediction via contrastive fine-tuning. In Proceedings of the 37th AAAI Conference on Artificial Intelligence and 35th Conference on Innovative Applications of Artificial Intelligence and 13th Symposium on Educational Advances in Artificial Intelligence (AAAI ’23/IAAI ’23/EAAI ’23). Article 1576, 9 pages.
[430]
Tianchen Zhu, Yue Qiu, Haoyi Zhou, and Jianxin Li. 2023c. Towards long-delayed sparsity: Learning a better transformer through reward redistribution. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence (IJCAI ’23), 4693–4701.
[431]
Xizhou Zhu, Yuntao Chen, Hao Tian, Chenxin Tao, Weijie Su, Chenyu Yang, Gao Huang, Bin Li, Lewei Lu, Xiaogang Wang, Yu Qiao, Zhaoxiang Zhang, and Jifeng Dai. 2023a. Ghost in the Minecraft: Generally capable agents for open-world environments via large language models with text-based knowledge and memory. arXiv:2305.17144 [cs.AI].
[432]
Zhengbang Zhu, Hanye Zhao, Haoran He, Yichao Zhong, Shenyu Zhang, Haoquan Guo, Tingting Chen, and Weinan Zhang. 2024. Diffusion models for reinforcement learning: A survey. arXiv:2311.01223 [cs.LG].
[433]
Yuchen Zhuang, Xiang Chen, Tong Yu, Saayan Mitra, Victor Bursztyn, Ryan A. Rossi, Somdeb Sarkhel, and Chao Zhang. 2024. ToolChain*: Efficient action space navigation in large language models with A* search. In Proceedings of the 12th International Conference on Learning Representations.
[434]
Orr Zohar, Shih-Cheng Huang, Kuan-Chieh Wang, and Serena Yeung. 2023. LOVM: Language-only vision model selection. arXiv:2306.08893 [cs.CV].
[435]
Hao Zou, Zae Myung Kim, and Dongyeop Kang. 2023. A survey of diffusion models in natural language processing. arXiv:2305.14671 [cs.CL].
[436]
Łukasz Czajka and Cezary Kaliszyk. 2018. Hammer for Coq: Automation for dependent type theory. Journal of Automated Reasoning 61, 1 (Jun. 2018), 423–453.


Published In

ACM Transactions on Autonomous and Adaptive Systems, Volume 19, Issue 3, September 2024, 242 pages.
EISSN: 1556-4703
DOI: 10.1145/3613578

Publisher

Association for Computing Machinery, New York, NY, United States

Publication History

Published: 30 September 2024
Online AM: 20 August 2024
Accepted: 27 June 2024
Revised: 17 June 2024
Received: 17 June 2024
Published in TAAS Volume 19, Issue 3

Author Tags

  1. Self-Adaptive Systems
  2. MAPE
  3. Generative AI
  4. Large Language Model
  5. diffusion model
  6. survey

Qualifiers

  • Research-article

Funding Sources

  • Grant-in-Aid for Young Scientists (Early Bird) of Waseda Research Institute for Science and Engineering, the Special Research Projects of Waseda University
  • National Natural Science Foundation of China

