The analyzer and planner play pivotal roles in self-adaptation. The main tasks of the analyzer are exploring possible configurations for adaptation (i.e., adaptation options) and evaluating them, while the main tasks of the planner are selecting the best adaptation option based on the adaptation goals and generating a plan to adapt the managed system for this new configuration [
Weyns, 2020]. However, it is often not easy to distinguish between these roles, as the functions of the analyzer and the planner may be integrated (often referred to as decision-making). Hence, we deal with them together. We start by describing how LLMs have the potential to enhance different aspects of engineering SASs, following the “seven waves” of research interests within the research community [
Weyns, 2020].
This discussion extends from
Section 4.2.1 through
Section 4.2.5. Next, we examine how LLMs have the potential to augment current planning that is generally used in SASs in
Section 4.2.6. Finally, we introduce two new planning paradigms for SASs that leverage the direct use of LLMs and diffusion models as planners respectively. These paradigms are described in
Sections 4.2.7 and
4.2.8.
4.2.1 Architecture-Based Adaptation.
Architecture-based adaptation centers on leveraging software architecture to realize self-adaptation, reflected in two complementary functions. First, architecture allows abstracting the design of SASs through layers and system components. A seminal model in this approach is the three-layer reference model of Kramer and Magee [
Sykes et al., 2008], which delineates the system’s operations across three layers: (a) the goal management layer, responsible for generating action plans; (b) the change management layer, tasked with configuring components per these plans; and (c) the component layer, handling the operations of these components.
Formal Model for Self-Adaptation (FORMS) formalizes this structure [
Weyns et al., 2012b]. Second, architecture enables the system to exploit high-level models to reason about the adaptation options, potentially system-wide. Characteristic works in this area over time include Rainbow [
Garlan et al., 2004], Models at Runtime [
Blair et al., 2009], QoSMOS [
Calinescu et al., 2011], proactive adaptation [
Moreno et al., 2015], and ActivFORMS [
Iftikhar and Weyns, 2014].
Recent developments in LLMs reflect similar principles and can be viewed as CoTs with external tool calls [
Inaba et al., 2023]. Given a specific problem or goal, an LLM first segments it into sub-problems, either sequentially or hierarchically, then selects appropriate components, often APIs, to address each sub-problem, and finally deploys and calls these components. For instance, HuggingGPT [
Shen et al., 2023] uses LLMs as controllers to orchestrate existing AI models with language interfaces. HuggingGPT selects AI models based on their functional descriptions from Hugging Face and employs them for executing complex tasks across language, vision, and speech domains, demonstrating robust performance. Another example, ToolLLM [
Qin et al., 2024] tackles problem-solving by generating sequences of API calls from a pool of 16,464 real-world APIs.
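To make this control loop concrete, the following minimal Python sketch illustrates the HuggingGPT/ToolLLM-style pattern of task decomposition, tool selection, and result synthesis. The `llm_complete` helper and the toy tool registry are hypothetical placeholders, not the actual APIs of those systems.

```python
# Minimal sketch of an LLM-as-controller loop (HuggingGPT/ToolLLM style).
# `llm_complete` and the tool registry below are hypothetical placeholders.
import json

TOOLS = {
    "image_caption": lambda image_path: f"caption for {image_path}",   # stand-in vision model
    "translate_en_de": lambda text: f"German translation of: {text}",  # stand-in language model
}

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to any chat/completion endpoint."""
    raise NotImplementedError

def plan_and_execute(task: str) -> str:
    # 1. Task decomposition: ask the LLM to split the task into tool calls.
    plan_prompt = (
        f"Task: {task}\n"
        f"Available tools: {list(TOOLS)}\n"
        'Return a JSON list of steps, each {"tool": ..., "arg": ...}.'
    )
    steps = json.loads(llm_complete(plan_prompt))
    # 2. Component selection and execution: call each selected tool in order.
    results = [TOOLS[step["tool"]](step["arg"]) for step in steps]
    # 3. Response synthesis: let the LLM fuse the intermediate results into an answer.
    return llm_complete(f"Task: {task}\nIntermediate results: {results}\nFinal answer:")
```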
Furthermore, some studies focus specifically on the aspect of component selection, akin to the change management layer.
Schick et al. [2023] introduces Toolformer, a model trained explicitly to determine which APIs to use, the timing of their invocation, and the parameters to be passed.
Zohar et al. [2023] introduces Language-Only Vision Model Selection, which facilitates model selection and performance prediction based solely on textual descriptions of the application. Lastly,
Alsayed et al. [2024] proposes MicroRec, a framework designed to recommend or select microservices using information from README files and Dockerfiles.
Zhuang et al. [2024] considers the API call space as a decision tree, where nodes represent API function calls and their cost functions, and uses the A* algorithm to achieve efficient call paths.
4.2.2 Requirements-Driven Adaptation.
Requirement-driven adaptation puts the emphasis on the requirements as the driver of adaptation, treating them as first-class citizens. Notable methods include RELAX, a language that facilitates the relaxation of requirements to address uncertainties [
Whittle et al., 2009], and awareness and evolution requirements reified in the ZANSHIN framework, which introduced meta-requirements for determining adaptation and its actual execution respectively [
Silva Souza et al., 2011]. We explore the potential of GenAI through three key aspects of requirement management: specification, operationalization, and change.
Requirement Specification. Specifying requirements involves defining the objectives that the system should fulfill. Central to self-adaptation are quality requirements [
Weyns et al., 2012a]. In this context, LLMs may significantly alleviate the modeling burden. For example, LLMs have been used to convert requirements expressed in natural language into formal specification languages such as
Linear Temporal Logic (LTL) or a user-given domain-specific model language, as demonstrated in
Izquierdo et al. [2024] and
Yang et al. [2024b].
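As an illustration of this modeling support, the sketch below prompts an LLM with a few examples to translate a natural-language quality requirement into an LTL formula and applies a cheap well-formedness check before the formula is handed to a verifier. The `llm_complete` helper and the few-shot examples are illustrative assumptions, not taken from the cited works.

```python
# Hedged sketch: LLM-based translation of a natural-language requirement into LTL.
# `llm_complete` is a hypothetical completion helper; the examples are illustrative.
import re

FEW_SHOT = """\
Requirement: "Every request must eventually be answered."
LTL: G (request -> F response)

Requirement: "The system never enters an unsafe state."
LTL: G (!unsafe)
"""

def requirement_to_ltl(requirement: str, llm_complete) -> str:
    prompt = f'{FEW_SHOT}\nRequirement: "{requirement}"\nLTL:'
    formula = llm_complete(prompt).strip()
    # Cheap syntactic sanity check before handing the formula to a model checker.
    if not re.fullmatch(r"[A-Za-z0-9_!&|()\->UFGX<> ]+", formula):
        raise ValueError(f"LLM output is not a recognisable LTL string: {formula}")
    return formula
```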
Requirement Operationalization and Traceability. This aspect refers to aligning or synchronizing system elements with dynamic requirements, which is essential in requirement-driven adaptation [
Sawyer et al., 2010]. As the traceability between high-level goals and components has been discussed in architecture-based adaptation, here we discuss the linking within requirements and the linking from requirements to the code level. For linking within requirements,
Preda et al. [2024] applies LLMs to the task of high-level to low-level requirements coverage reviewing, demonstrating the LLM’s strong ability to map between high-level abstract requirements and low-level scenario-specific requirements. For linking to the code level, T-BERT [
Lin et al., 2021] effectively creates trace links between source code and natural language artifacts, achieving F1 scores between 0.71 and 0.93 across various datasets. Similarly, BERT4RE [
Ajagbe and Zhao, 2022] fine-tunes BERT to support establishing requirements traceability links for a wide range of requirements.
Requirement Change. Requirement change is a crucial aspect of an adaptive system’s capability to modify its objectives based on changes, particularly in the environmental context, representing a significant challenge within requirement-driven adaptation [
Weyns, 2020]. LLMs have shown promising potential in addressing this challenge from three perspectives. Firstly, LLMs have been extensively utilized in RL, particularly in dynamic and complex environments, with a focus on reward design and reward shaping, and LLM-generated rewards have been shown to surpass manually designed ones. For instance,
Kwon et al. [2023] validates the consistency between LLM-generated rewards and user’s objectives under zero-shot or few-shot conditions.
Xie et al. [2024] emphasizes generating dense reward functions based on natural language descriptions of system goals and environmental representations. These ideas can be directly applied to dynamic requirement adjustments in adaptive systems. Secondly, requirements extraction and analysis often require inputs from multiple perspectives, including end-users, engineers, and domain experts. To address this, Nakagawa and Honiden [2023] proposed a multi-LLM agent framework that enables LLM agents to assume various roles and iteratively refine system requirements through discussions. Originally designed for the requirements engineering phase, this framework is equally applicable to runtime requirements adaptations by equipping agents with up-to-date runtime context. Moreover, in situations involving requirements conflicts, negotiation or debate-based approaches [
Chan et al., 2024a;
Hunt et al., 2024] have been shown to be potentially more effective than traditional discussion methods. Finally, leveraging LLMs’ capabilities for natural language interaction allows them to effectively capture user preferences from runtime feedback and integrate them into the system’s requirements. This aspect is not discussed here but is covered in detail in
Section 5.1.
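As a minimal illustration of the reward-design idea discussed above (cf. Kwon et al. [2023]), the sketch below lets an LLM score how well an observed episode matches a natural-language goal and uses that score as a reward signal. The `llm_complete` helper, the prompt format, and the clipping are illustrative assumptions.

```python
# Hedged sketch of LLM-assisted reward design: the LLM judges goal attainment from a
# natural-language goal description. `llm_complete` is a hypothetical helper.
def llm_reward(goal: str, episode_summary: str, llm_complete) -> float:
    prompt = (
        f"Goal: {goal}\n"
        f"Observed behaviour: {episode_summary}\n"
        "On a scale from 0 (goal not met) to 1 (goal fully met), answer with a single number:"
    )
    try:
        return max(0.0, min(1.0, float(llm_complete(prompt).strip())))
    except ValueError:
        return 0.0  # be conservative when the LLM output is not parseable

# Usage: at the end of an episode, the managing system could call
#   r = llm_reward("serve requests within 100 ms", summary, llm_complete)
# and feed r to the learner in place of a hand-designed reward.
```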
4.2.3 Guarantees under Uncertainty.
“Guarantees Under Uncertainty” focuses on ensuring that an SAS complies with its adaptation goals despite the inherent uncertainties it faces. Formal verification techniques such as quantitative verification [
Calinescu et al., 2011], statistical model checking [
Weyns and Iftikhar, 2023], and proactive adaptation using probabilistic model checking [
Moreno et al., 2016] have been extensively studied for their ability to provide evidence that the system complies with its requirements at runtime.
To the best of our knowledge, there is no research on using LLMs to directly enhance verification processes. However, several studies demonstrate how LLMs can automate or assist the modeling activities for model checking, potentially lowering entry barriers for developers. For instance, Yang and Wang [2024] employs LLMs to convert natural language network protocol descriptions into quantifiable dependency graphs and formal models, aiding the formal verification of next-generation network protocols. Other studies aim to convert natural language into LTL specifications [
Izquierdo et al., 2024;
Mavrogiannis et al., 2024;
Yang et al., 2024b].
Furthermore, the use of LLMs in theorem proving (in the context of both mathematics and programs) has also seen initial efforts.
Welleck et al. [2022] fine-tunes GPT-3 for mathematical proof generation, reporting a correctness rate of about 40% on short proofs (2–6 steps).
Han et al. [2022] extracts training data from kernel-level proofs to improve the Transformer’s (next-step) tactic prediction, addressing the scarcity of training data for formal theorem proving. Thor [
Jiang et al., 2022] allows a language model-based theorem prover to additionally call automated theorem provers (namely hammers [
Czajka and Kaliszyk, 2018]) for premise selection, achieving performance comparable to existing SOTA while reducing computational demand.
First et al. [2023] proposes Baldur, a fine-tuned LLM for generating entire proofs, which proves to be as effective as search-based techniques but without the associated high costs. Baldur has also demonstrated capabilities in proof repair by utilizing additional context from previous failed attempts and error messages, proving an additional 8.7% of theorems compared to Thor. Additionally, [
Wu et al., 2022a;
Zhou et al., 2024c] attempt to automatically translate mathematical problems into formal specifications in systems such as Isabelle (a formal theorem-proving environment). Regarding program verification, LEMUR [
Wu et al., 2023] combines LLMs and automated reasoners, where LLMs are employed to propose program invariants in the form of sub-goals, and then reasoners are used to verify their Boolean properties.
Yao et al. [2023b] explore the use of LLMs to synthesize invariants and other proof structures necessary for demonstrating program correctness within the Verus framework, significantly reducing the effort required to (manually) write entry-level proof code.
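The division of labour in LEMUR-style approaches can be sketched with a toy example: an LLM proposes a candidate loop invariant, and an automated reasoner (here Z3) checks whether it is inductive for the loop `x := x + 1` starting from `x == 0`. The `propose_invariant` callback and the encoding are illustrative assumptions, not LEMUR's actual interface.

```python
# Hedged sketch: LLM proposes an invariant, an SMT solver (Z3) checks inductiveness.
from z3 import Int, Solver, And, Not, Implies, unsat

def is_inductive(inv):
    """inv maps a Z3 integer variable to a Z3 Boolean expression, e.g. lambda x: x >= 0."""
    x, x_next = Int("x"), Int("x_next")
    s = Solver()
    # The invariant must hold initially and be preserved by the loop body x_next = x + 1.
    s.add(Not(And(Implies(x == 0, inv(x)),
                  Implies(And(inv(x), x_next == x + 1), inv(x_next)))))
    return s.check() == unsat  # no counterexample found => the invariant is inductive

def verify_with_llm(propose_invariant):
    # `propose_invariant` is a hypothetical LLM call returning a candidate like lambda x: x >= 0.
    candidate = propose_invariant("loop: x starts at 0 and is incremented by 1")
    return is_inductive(candidate)  # on failure, one would re-prompt with the counterexample
```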
4.2.5 Learning from Experience.
Learning from experience in SASs refers to the use of
Machine Learning (ML) techniques to manage the growing scale and increasing complexity of uncertainty [
Gheibi et al., 2021a]. A representative example is reducing large search or adaptation spaces, thereby enabling formal methods to efficiently complete analysis and planning within a designated time window [
Gheibi et al., 2021b;
Jamshidi et al., 2019]. We present three potential aspects of integrating LLMs and diffusion models to enhance ML applications in SASs: (i) using LLMs to boost ML model performance, (ii) utilizing LLMs to improve RL, and (iii) employing LLMs or diffusion models to reduce the adaptation space.
Enhancing ML. The literature in this domain can be categorized into four types, each aiming to automate different aspects of ML: (i) ML pipeline generation: Literature such as [
Xu et al., 2024b;
Zhang et al., 2023b] focuses on automating the entire ML pipeline, from data processing to model architecture and hyperparameter tuning, enhancing overall ML performance. (ii) Data annotation: [
Ding et al., 2022] explores the performance of GPT-3 in automating data labeling. (iii) Algorithm and model selection: MLCopilot [
Zhang et al., 2024f] applies experiential reasoning to recommend effective models for new tasks by analyzing historical data on task performance, code, and accuracy. (iv) Feature engineering automation:
Tools like CAAFE [
Hollmann et al., 2023] automate feature engineering by generating context-aware features based on dataset characteristics and iteratively updating features based on performance feedback. Integrating LLMs into the ML model construction process can not only reduce the manual effort required but may also improve the model’s performance. Additionally, such LLM-based automated ML has the potential to facilitate lifelong learning and model updates at runtime [
Gheibi and Weyns, 2024;
Silver et al., 2013].
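The following sketch illustrates one CAAFE-style iteration: the LLM proposes a new feature as a pandas expression, and the feature is kept only if cross-validated performance improves. The `llm_complete` helper, the prompt, and the use of `eval` (assumed to run in a trusted sandbox) are illustrative assumptions rather than CAAFE's actual implementation.

```python
# Hedged sketch of one LLM-driven feature-engineering step with a keep-if-better check.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

def feature_engineering_step(df: pd.DataFrame, target: str, llm_complete) -> pd.DataFrame:
    X, y = df.drop(columns=[target]), df[target]
    baseline = cross_val_score(RandomForestClassifier(), X, y, cv=3).mean()
    prompt = (f"Columns: {list(df.columns)}. Target: {target}. "
              "Propose one new feature as a single pandas expression over df, "
              "e.g. df['a'] / (df['b'] + 1). Answer with the expression only.")
    expr = llm_complete(prompt).strip()
    candidate = df.copy()
    candidate["llm_feature"] = eval(expr, {"df": candidate})  # assumes a trusted sandbox
    score = cross_val_score(RandomForestClassifier(),
                            candidate.drop(columns=[target]), candidate[target], cv=3).mean()
    return candidate if score > baseline else df  # keep the new feature only if it helps
```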
LLMs have been used to augment RL in the following ways: (i) Reward function: As previously discussed in requirements-driven adaptation (
Section 4.2.2), LLMs can automate the design of reward functions, demonstrating higher performance and faster convergence than expert-designed reward functions [
Kwon et al., 2023;
Sun et al., 2024d;
Xie et al., 2024;
Yu et al., 2023a]; (ii) Providing sub-goals or skills: LLMs can utilize their high-level planning abilities to guide RL agents by defining intermediate tasks. Exploring with LLMs [
Du et al., 2023], for example, encourages agents to explore strategically significant behaviors, like locating a key before attempting to open a door (a minimal sketch of this idea follows this list). Relevant studies include [
Dalal et al., 2024;
Ma et al., 2024b;
Melo, 2022;
Rocamonde et al., 2024;
Shukla et al., 2024;
Tan et al., 2024;
Zhang et al., 2023d, 2023f]. This type of study could enhance the performance of RL in scenarios that require multiple skills or long-term planning; (iii) Policy: LLMs or Transformers can decrease the expenses associated with offline RL training by directly serving as demonstration policies [
Carta et al., 2023;
Szot et al., 2024;
Wang et al., 2022]; and (iv) State representation or quality (Q-)function: Transformers can serve as state representations [
Hu et al., 2021; Lee and Moon, 2023;
Parisotto et al., 2020;
Yang et al., 2022;
Zhang et al., 2023d] or as a quality (Q-)function [
Chebotar et al., 2023;
Gallici et al., 2023] to enhance the performance, scalability, and transferability of RL.
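To make item (ii) concrete, the sketch below lets an LLM propose intermediate sub-goals from a textual state description and turns matches against these sub-goals into an exploration bonus, loosely following the “Exploring with LLMs” idea. The `llm_complete` helper and the string matching are illustrative assumptions.

```python
# Hedged sketch: LLM-suggested sub-goals shape an RL agent's exploration bonus.
def propose_subgoals(task: str, state_description: str, llm_complete) -> list[str]:
    prompt = (f"Task: {task}\nCurrent situation: {state_description}\n"
              "List up to three short, useful intermediate goals, one per line:")
    return [g.strip() for g in llm_complete(prompt).splitlines() if g.strip()]

def exploration_bonus(achieved_event: str, subgoals: list[str]) -> float:
    # Reward the agent when the description of an observed event contains one of the
    # suggested sub-goals (a deliberately crude matching heuristic for this sketch).
    return 1.0 if any(g.lower() in achieved_event.lower() for g in subgoals) else 0.0
```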
Diffusion models have also been explored for enhancing RL, serving in three different roles: (i) Data synthesizer: Diffusion models are employed to synthesize data for training due to the prevalent issue of data scarcity. Multi-Task Diffusion Model [
He et al., 2023] leverages the extensive knowledge available in multi-task datasets, performing implicit knowledge sharing among tasks, with experimental results indicating significant enhancements in generating data for unseen tasks. (ii) Policy: Diffusion-QL [
Wang et al., 2023c] innovatively employs a conditional diffusion model to express policies, integrating Q-learning guidance into the reverse diffusion chain to optimize action selection.
Kang et al. [2023] enhances the sampling efficiency of Diffusion-QL by strengthening the diffusion policy. Similarly,
Chen et al. [2023a] decouples policy learning into behavior learning and action evaluation. This approach allows for improving policy expressivity by incorporating the distributional expressivity of a diffusion-based behavior model; (iii) Planner: Diffusion models serve as planners, enhancing model-based RL by estimating action sequences that maximize cumulative rewards [
Ni et al., 2023]. Detailed methodologies are discussed in
Section 4.2.8.
Adaptation Space Reduction via LLMs. LLMs’ extensive knowledge also offers opportunities to reduce or condense the analysis and planning space of SASs semantically.
Nottingham et al. [2023] applies LLMs to hypothesize, verify, and refine an
Abstract World Model (AWM), thus abstracting the state space to enhance the training efficiency of RL agents.
Rana et al. [2023] uses semantic search in robot planning tasks involving multiple floors and rooms to prune the planning space, thus speeding up traditional planning techniques.
4.2.6 Enhancing Existing Planning Techniques.
This section explores how LLMs have the potential to enhance four existing planning methods.
Search-Based Planning. Search-based planning involves algorithms that systematically explore spaces of possible actions or configurations to identify sequences that achieve specific goals [
Harman et al., 2012]. The design of heuristics to improve the practicality and efficiency of these searches is a key focus. For instance,
Yu et al. [2023b] proposes a Graph Transformer as a heuristic function for multi-agent planning, which can be trained in environments with fewer agents and generalized to situations with more agents. Turning to LLMs,
Shah et al. [2023] utilizes “semantic guesswork” as a guiding heuristic for robot planning, such as guiding the robot to head to the kitchen for the task “find gas stove”.
Dai et al. [2024] uses LLMs to generate and translate multi-resolution (i.e., hierarchical) LTL, for example with building, floor, and room as different resolutions, within a multi-resolution multi-heuristic A* algorithm. LLAMBO [
Liu et al., 2024a] utilizes the knowledge of LLMs to enhance zero-shot warmstarting in Bayesian optimization.
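A minimal sketch of this kind of guidance is shown below: an LLM-derived semantic score replaces a hand-crafted heuristic in A*, so that nodes the LLM deems semantically closer to the goal are explored first. The graph interface, cost model, and `llm_semantic_score` helper are illustrative assumptions; the resulting heuristic is generally not admissible, so optimality is traded for guidance.

```python
# Hedged sketch: A* search where the heuristic comes from an LLM's semantic judgement.
import heapq
import itertools

def a_star(start, goal, neighbours, step_cost, llm_semantic_score):
    """neighbours(n) yields successor nodes; llm_semantic_score(n, goal) returns a value in [0, 1]."""
    counter = itertools.count()  # tie-breaker so the heap never compares nodes directly
    frontier = [(0.0, next(counter), start, [start])]
    best_g = {start: 0.0}
    while frontier:
        _, _, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt in neighbours(node):
            g = best_g[node] + step_cost(node, nxt)
            if g < best_g.get(nxt, float("inf")):
                best_g[nxt] = g
                h = 1.0 - llm_semantic_score(nxt, goal)  # higher score => explored earlier
                heapq.heappush(frontier, (g + h, next(counter), nxt, path + [nxt]))
    return None
```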
Evolutionary Algorithms (EAs). Although EAs are a form of search method, they are discussed separately here due to their distinct characteristics and widespread application. EAs, inspired by natural evolution and genetics, are known for their global search capabilities and adaptability to various problem types [
Li et al., 2024a; McDonnell et al., 2023]. Enhancements via LLMs in EAs focus on search operators like LLM-based crossover, mutation, and selection [
Cai et al., 2024b]. A representative example, [
Liu et al., 2024b], demonstrates how LLMs can first select parent solutions from the current population, and then facilitate crossover and mutation processes to generate offspring solutions. The experiments indicate competitive performance on small-scale, single-objective problems like the traveling salesman problem with 20 nodes. Similarly,
Guo et al. [2024c] employs LLMs as evolutionary search operators to automatically generate optimization algorithms for the traveling salesman problem, showing that LLM-generated heuristic algorithms surpass traditional greedy heuristics. Yang and Li [2023a] proposes a decomposition-based multi-objective EA framework, using LLMs to manage the reproduction of individuals within decomposed subproblems.
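The sketch below shows the basic shape of such an LLM-driven evolutionary loop: the LLM acts as a combined crossover-and-mutation operator over textual solution encodings, while selection here is simple truncation. The `llm_complete` helper, the prompt, and the fitness callback are illustrative assumptions.

```python
# Hedged sketch: an evolutionary loop whose variation operator is an LLM.
import random

def evolve(population: list[str], fitness, llm_complete, generations: int = 10) -> str:
    for _ in range(generations):
        parents = sorted(population, key=fitness, reverse=True)[:2]  # truncation selection
        prompt = ("You are an evolutionary operator. Combine and slightly mutate the two "
                  "parent solutions below into one new candidate.\n"
                  f"Parent A: {parents[0]}\nParent B: {parents[1]}\nChild:")
        child = llm_complete(prompt).strip()
        population[random.randrange(len(population))] = child  # replace a random individual
    return max(population, key=fitness)
```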
Game Theory. Game theory provides a mathematical framework to analyze strategic interactions among rational decision-makers and is extensively applied in adversarial settings, such as security [
Chan et al., 2024b;
Li et al., 2024b]. Leveraging the natural language understanding capabilities of LLMs, game theory can now be “realized” directly through natural language instead of mathematical definitions, broadening its application scope to areas like social simulation.
Fan et al. [2024a] conducted a systematic analysis of LLMs’ rationality in game theory, assessing their performance in three classical games focused on (a) clear desire, (b) belief refinement, and (c) optimal actions. The study highlighted that even advanced models like GPT-4 require enhancements in these areas. Furthermore, developments in game theory benchmarks and platforms have been made to better evaluate LLMs’ game-playing capabilities. Challenges remain, as [
Fan et al., 2024a] pointed out, particularly in strengthening the rationality of LLMs in game-theoretic settings. Enhancing LLMs’ performance through targeted prompt engineering, such as incorporating explicit desire and belief information, could significantly improve their rationality. Additionally, while traditional game theory still relies on mathematical definitions, the efficacy of LLMs within this conventional framework has yet to be fully ascertained.
Swarm Algorithm. Inspired by biological phenomena such as ant colonies and fish schooling, swarm intelligence focuses on the collective behavior of decentralized, self-organized systems and has recently seen renewed interest by the research community [
Bozhinoski, 2024]. The integration of LLMs into swarm intelligence is still nascent, with [
Pluhacek et al., 2023] being the only study we found in our review. This research explores the automation of hybrid swarm intelligence optimization algorithms using LLMs, tackling the challenge posed by the exponential growth in the number of hybrid (swarm) algorithms due to the diversity of base (swarm) algorithms.
4.2.7 Language Model as Planner.
Given the above background, LLMs’ reasoning capabilities and broad knowledge further position them as potentially powerful, generalized planners. We outline four unique paradigms in LLM-based planning:
Transformer as Planner. Prior to the adoption of LLMs for planning, several studies already conceptualized planning as a sequence modeling problem, thereby allowing the use of Transformers as planners.
Decision Transformer (DT) [
Chen et al., 2021a] is a foundational work in this area. It aligns with RL and trains a Transformer to output optimal actions based on expected returns (rewards), past states, and actions, achieving performance that surpassed the then state-of-the-art model-free offline RL methods. From this foundation, many improvements have been derived: Online DT [
Zheng et al., 2022] further combines offline pre-training with online fine-tuning; Weighting Online DT [
Ma and Li, 2024] introduces an episodic memory mechanism to enhance sample efficiency during online fine-tuning; and Multi-Game DT is trained on large, diverse datasets, enabling near-human performance in up to 46 Atari games. Generalized DT [
Furuta et al., 2022] addresses a wide range of “hindsight information-matching problems,” such as imitation learning and state-marginal matching. Hyper-DT [
Xu et al., 2023] incorporates an adaptation module into DT, which uses a hyper-network to initialize its parameters based on task demonstrations, effectively adapting to new tasks. Constrained DT [
Liu et al., 2023b] achieves dynamic adjustments between safety and performance during deployment. Q-learning DT [
Yamagata et al., 2023] enhances DT performance when only sub-optimal trajectories are included in the dataset by using dynamic programming (Q-learning) to label training data.
Zhu et al. [2023c] decomposes long-delayed rewards over individual timesteps, formulating the decomposition as a globally optimal bi-level optimization problem, thereby enhancing the performance of DT in settings with delayed rewards. It is important to note that these studies can also be viewed as a new realization of RL, where Transformer pre-training is employed to replace traditional methods of fitting value functions or computing policy gradients.
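The conditioning mechanism behind DT can be summarized in a short rollout sketch: the model is fed the history of (return-to-go, state, action) tokens and queried for the next action, while the desired return is decremented by the rewards obtained so far. The `dt_model` object and the environment interface are hypothetical; this is not the original implementation.

```python
# Hedged sketch of Decision-Transformer-style action selection as sequence modelling.
def decision_transformer_rollout(env, dt_model, target_return: float, horizon: int):
    returns_to_go, states, actions = [target_return], [env.reset()], []
    for _ in range(horizon):
        # The model sees the whole history and predicts the action for the latest timestep.
        action = dt_model.predict_action(returns_to_go, states, actions)
        state, reward, done = env.step(action)
        actions.append(action)
        states.append(state)
        # Conditioning trick: reduce the desired return by the reward just obtained.
        returns_to_go.append(returns_to_go[-1] - reward)
        if done:
            break
    return actions
```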
Additionally, Transformers have been utilized as planners in the following applications.
Yang et al. [2023a] trains a Recurrent Transformer to enable logical reasoning on constraint satisfaction problems. Takagi [2022] explores the impact of different modalities on Transformer performance, investigating why models pre-trained on image data perform poorly. TIMAT [
Kang et al., 2024] extracts temporal information and models
Multi-Agent RL (MARL) as a sequence model; its advantage is the ability to plan for an arbitrary number of agents. MetaMorph [
Gupta et al., 2022] trained Universal Controllers for exponentially morphable modular robots, demonstrating the Transformer’s combinatorial generalization capabilities.
Collective Intelligence. Collective intelligence, also referred to as crowdsourcing or self-collaboration in some literature, utilizes the wisdom of crowds to achieve consensus-driven decision-making through discussion, debate, or voting [
Ferreira et al., 2024]. Here, multiple agents or roles are often enabled by various fine-tuned LLMs or prompted by different contexts.
Zhang et al. [2023c] integrates the Actor-Critic concept from RL into LLM multi-agent crowdsourcing, highlighting its potential to cut hallucinations and reduce token usage costs. RoCo [
Mandi et al., 2024] promotes information exchange and task reasoning among robots in multi-robot planning by facilitating discussions.
Shi et al. [2024b] offers a concept that is similar to the MAPE loop, involving three agents working together to complete tasks: (i) observing to collect environmental data, (ii) decomposing instructions for planning, and (iii) using skills to execute tasks.
Chen et al. [2024e] explores automated expert recruitment (deciding what kind of domain expert is needed for the task and then generating their persona) and various forms of crowdsourcing (democratic or hierarchical).
Guo et al. [2024b] evaluates the impact of designated leadership in LLM-agent organizations, demonstrating several interesting results: (a) in small teams, higher efficiency can be achieved with less communication cost; (b) agents can elect their own leader and dynamically adjust leadership via communication; and (c) agents spontaneously engage in activities that mimic human behaviors, such as reporting task progress to the leader agent. This study also introduces a criticize-reflect framework to evaluate and adjust organizational structures. Dong [2024] explores the high costs and negative impacts of misinformation in large-scale democratic discussions. This paradigm offers new decision-making avenues, which may be particularly suitable for decentralized SASs [
Weyns et al., 2013].
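A minimal sketch of such consensus-driven decision-making is given below: several LLM “personas” each vote for one adaptation option and the majority wins. The `llm_complete` helper, the personas, and the fallback on unparseable answers are illustrative assumptions; real frameworks add discussion rounds, critiques, or elected leaders.

```python
# Hedged sketch: majority voting among differently prompted LLM roles.
from collections import Counter

def crowd_decide(problem: str, options: list[str], personas: list[str], llm_complete) -> str:
    votes = []
    for persona in personas:
        prompt = (f"You are {persona}.\nProblem: {problem}\n"
                  f"Options: {options}\nAnswer with exactly one option from the list:")
        answer = llm_complete(prompt).strip()
        votes.append(answer if answer in options else options[0])  # fallback on parse failure
    return Counter(votes).most_common(1)[0][0]

# e.g. crowd_decide("latency above threshold",
#                   ["scale out", "degrade video quality"],
#                   ["a performance engineer", "a cost analyst", "an end-user advocate"],
#                   llm_complete)
```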
Experience Accumulation. Experience accumulation, also called lifelong learning in some studies [
Silver et al., 2013], enables agents to use LLMs to gather experience from both failures and successes, learning to improve future planning.
For failed experiences, LLMs or human analysts can identify the causes of failures, reflect on these insights, and integrate them into future planning cycles. This approach is also known in some studies as planning with feedback or self-reflection.
Madaan et al. [2022] records instances of LLM misunderstandings along with user feedback, enhancing prompt accuracy for future queries by integrating past clarifications.
Li et al. [2022a] refers to this as an “active data collection process,” iterating strategies through interactions with the environment based on past failed experiences.
Huang et al. [2022] refers to this process as “inner monologue”.
Wang et al. [2023a] introduces the Describe, Explain, Plan, and Select framework, where an LLM describes the plan execution process and provides self-explanations upon encountering failures, facilitating effective error correction.
Zhang et al. [2024c] propose the Prompt Ensemble learning via Feedback-Reflect-Refine method, which uses a feedback mechanism to reflect on planning inadequacies and generates new prompts for iterative refinement.
Yang et al. [2024c] treats LLMs as optimizers to solve optimization problems described in natural language, where previously generated solutions and their outcomes are used to prompt the LLM to generate new solutions.
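The common loop behind these planning-with-feedback approaches can be sketched as follows: when a plan fails, the LLM is asked to distill the failure into a lesson, which is prepended to the next planning prompt. The `llm_complete` and `execute_plan` helpers are hypothetical placeholders.

```python
# Hedged sketch of a reflect-and-replan loop driven by execution feedback.
def plan_with_reflection(task: str, llm_complete, execute_plan, max_attempts: int = 3):
    reflections = []
    for _ in range(max_attempts):
        lessons = "\n".join(reflections)
        prompt = (f"Task: {task}\n"
                  + (f"Lessons from earlier failures:\n{lessons}\n" if lessons else "")
                  + "Produce a step-by-step plan:")
        plan = llm_complete(prompt)
        ok, error_trace = execute_plan(plan)  # returns (success flag, failure description)
        if ok:
            return plan
        # Reflection: turn the raw failure into a reusable lesson for the next planning cycle.
        reflections.append(llm_complete(
            f"The plan below failed with: {error_trace}\nPlan: {plan}\n"
            "State in one sentence what to do differently next time:"))
    return None
```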
For successful experiences, LLMs store these in memory or a skill pool for later retrieval and reuse in similar scenarios.
Zhu et al. [2023a] introduces a three-step process for LLM-based memory reuse: (a) storing the executed plan once the goal of a game scenario is achieved; (b) summarizing common reference plans from multiple scenarios for more generalized situations; and (c) creating new plans based on these reference plans when similar goals arise. Over time, as these summaries accumulate, the effectiveness of the LLM-based planner increases. Similarly,
Zhao et al. [2024] propose ExpeL (Experiential Learning), which enhances task success rates through experience gathering and insight extraction.
LLMs As Tool Makers (LATM) [
Cai et al., 2024a] approaches from a tool maker’s perspective, enabling LLMs to create and utilize tools, which are implemented as Python functions. Moreover, LATM attempts to utilize different LLMs to create tools of varying complexity, thereby reducing the cost of tool production. AdaPlanner [
Sun et al., 2023b] introduces skill filtering, which involves comparing the performance of including versus not including past successful experiences in prompts to determine the generalizability of these experiences.
Optimizing Prompting for Black-Box LLMs. Prompt engineering is crucial in maximizing the planning capabilities of LLMs as it directly impacts the model’s understanding and response to tasks [
Sahoo et al., 2024]. However, LLMs often operate as a black box to users, particularly in the context of LLM as a service (e.g., accessing LLMs through an API). Beyond the previously discussed prompt patterns such as CoT, self-consistency, ToTs, and GoT, recent studies have treated prompt design as an optimization problem to enhance the LLM’s planning performance. These studies can be categorized into four types: (i) RL-based optimization: TEMPERA [
Zhang et al., 2023e] treats prompt optimization as an RL challenge, where the action space includes editing instructions, in-context examples, and verbalizers. The rewards are gauged by the performance improvements from these edits. Similarly, RLPrompt [
Deng et al., 2022] trains a policy network to generate effective prompts, noting that optimized prompts sometimes appear as “gibberish” that defies standard grammatical conventions. Additionally, Prompt-OIRL [
Sun et al., 2024b] leverages an expert dataset and inverse RL to derive a reward model that facilitates prompt evaluations; (ii)
Evolutionary Algorithm (EA)-based optimization: Employing EAs for gradient-free prompt optimization, several methodologies have emerged. Gradient-free Instructional Prompt Search [
Prasad et al., 2023], Genetic Prompt Search [
Xu et al., 2022a], and EvoPrompt [
Guo et al., 2024c] utilize the robust optimization capabilities of EAs. InstOptima [
Yang and Li, 2023a] extends this approach by considering multi-objective goals, evaluating both performance and additional metrics like instruction length; (iii) Incorporating classic planning ideas into prompt: Classic planning principles have also been integrated into prompt engineering. PromptAgent [
Wang et al., 2024a] treats the design space of prompts as a planning problem and uses Monte Carlo Tree Search to strategically explore high-quality prompts, where experiences of failure during interaction with the environment are used to define the rewards in the search.
Hazra et al. [2024] introduces the SayCanPay framework, where (a) an LLM generates candidate actions based on a goal and initial observation (“Say”), (b) an affordance model evaluates the feasibility of these actions (“Can”), and (c) the most feasible and cost-effective plan is selected using a combined score as a heuristic (“Pay”). Here, Can and Pay are independent models that require domain-specific training to ensure the alignment of plans with the current environment. Furthermore, combining hybrid planning (“fast and slow”) [
Pandey et al., 2016] and hierarchical planning,
Lin et al. [2023a] and
Liu et al. [2024f] employ a dual-LLM framework in which a reasoning-focused LLM (“slow mind”) handles detailed planning or the interpretation of teammates’ intentions, while a lightweight LLM (“fast mind”) generates reactive policies and macro actions; and (iv) Self-adaptive prompting: Self-adaptive prompting is an approach tailored to zero-shot learning, designed to automatically optimize prompt design. The idea is to first use the LLM to generate several candidate pseudo-demonstrations in a zero-shot setting and then select the most effective ones for ICL based on metrics such as consistency and logit entropy. Key studies include
Consistency-based Self-adaptive Prompting (COSP) [
Wan et al., 2023a] and
Universal Self-adaptive Prompting (USP) [
Wan et al., 2023b]. Experimental results indicate that COSP enhances performance by an average of 15% over the zero-shot baseline, and both COSP and USP have demonstrated comparable or even superior performance to few-shot baselines in certain tasks.
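The core selection step of consistency-based self-adaptive prompting can be sketched as follows: the LLM is sampled several times zero-shot, answers that agree with the majority are kept as pseudo-demonstrations, and these are prepended when answering new questions. The `llm_sample` helper and the use of simple majority agreement (rather than COSP's full consistency and entropy scoring) are illustrative simplifications.

```python
# Hedged sketch of consistency-based pseudo-demonstration selection for zero-shot ICL.
from collections import Counter

def build_pseudo_demos(question: str, llm_sample, n_samples: int = 8, k_demos: int = 2):
    answers = [llm_sample(f"Q: {question}\nA:").strip() for _ in range(n_samples)]
    majority, _ = Counter(answers).most_common(1)[0]
    # Keep up to k_demos samples that agree with the majority answer as demonstrations.
    return [f"Q: {question}\nA: {a}" for a in answers if a == majority][:k_demos]

def answer_with_demos(new_question: str, demos: list[str], llm_sample) -> str:
    prompt = "\n\n".join(demos + [f"Q: {new_question}\nA:"])
    return llm_sample(prompt).strip()
```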
4.2.8 Diffusion Model as Planner.
Diffusion models have recently been applied for use in planning tasks.
Janner et al. [2022] pioneered this approach by reinterpreting diffusion-based image inpainting as a method for generating coherent plans, showing the model’s capability in long-horizon decision-making and its adaptability to unseen environments in 2D maze experiments. Subsequently, diffusion models have been extensively applied in motion planning for robotic arms [
Mishra and Chen, 2023;
Pearce et al., 2023;
Ze et al., 2024] and quadruped robots [
Liu et al., 2024c], as well as continuous constraint solvers [
Yang et al., 2023c].
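The inpainting analogy can be sketched in a few lines: a whole trajectory is denoised from random noise while its first and last states are clamped to the current state and the goal, so the reverse-diffusion process fills in a plan between them. The `denoise_model` network, the schedule, and the array shapes are illustrative assumptions, not the implementation of the cited works.

```python
# Hedged sketch of diffusion-based planning via inpainting-style conditioning.
import numpy as np

def diffusion_plan(start_state, goal_state, denoise_model, horizon=32, state_dim=4, steps=50):
    trajectory = np.random.randn(horizon, state_dim)  # start from pure noise
    for t in reversed(range(steps)):
        # Inpainting-style conditioning: keep the known endpoints fixed at every step.
        trajectory[0], trajectory[-1] = start_state, goal_state
        trajectory = denoise_model(trajectory, t)     # one reverse-diffusion step
    trajectory[0], trajectory[-1] = start_state, goal_state
    return trajectory  # intermediate rows are the planned states to track
```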
Additionally, further developments have been made in enhancing different aspects of diffusion models. For enhancing long-range decision-making capabilities, Generative Skill Chaining [
Mishra et al., 2023] introduces a method where individual skills are modeled as separate diffusion models and sequentially chained to address long-horizon goals. This chaining process involves generating post-condition states of one skill that satisfy the pre-conditions of the subsequent skill. Regarding uncertainty-aware planning, Dynamics-informed Diffusion [
Cachay et al., 2023] couples probabilistic temporal dynamics forecasting with the diffusion steps, and PlanCP [
Sun et al., 2023a] quantifies the uncertainty of diffusion dynamics models using Conformal Prediction and modifies the loss function for model training.
Chen et al. [2024b] introduces a hierarchical diffuser strategy that employs a “jumpy” high-level planning technique with a broader receptive field and reduced computational demands, effectively directing the lower-level diffuser through strategic sub-goals. Similarly,
Li et al. [2023d] proposes a hierarchical diffusion method, which includes a reward-conditional goal diffuser for subgoal discovery and a goal-conditional trajectory diffuser for generating the corresponding action sequence of subgoals.
Zhou et al. [2023a] focuses on online replanning, where the timing of replanning is determined based on the diffusion model’s estimated likelihood of existing generated plans, and the replanning is based on existing trajectories to ensure that new plans follow the same goal state as the original trajectory.
Jin et al. [2023] introduces a hierarchical semantic graph for fine-grained control of generation, including overall movement, local actions, and action details, to improve the granularity of generated controls.