How to stream from a question-answering chain
This guide assumes familiarity with the following:
Often in Q&A applications it’s important to show users the sources that were used to generate the answer. The simplest way to do this is for the chain to return the Documents that were retrieved in each generation.
We’ll be using the LLM Powered Autonomous Agents blog post by Lilian Weng for retrieval content this notebook.
Setup
Dependencies
We’ll use an OpenAI chat model and embeddings and a Memory vector store in this walkthrough, but everything shown here works with any ChatModel or LLM, Embeddings, and VectorStore or Retriever.
We’ll use the following packages:
npm install --save langchain @langchain/openai cheerio
We need to set environment variable OPENAI_API_KEY
:
export OPENAI_API_KEY=YOUR_KEY
LangSmith
Many of the applications you build with LangChain will contain multiple steps with multiple invocations of LLM calls. As these applications get more and more complex, it becomes crucial to be able to inspect what exactly is going on inside your chain or agent. The best way to do this is with LangSmith.
Note that LangSmith is not needed, but it is helpful. If you do want to use LangSmith, after you sign up at the link above, make sure to set your environment variables to start logging traces:
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_API_KEY=YOUR_KEY
# Reduce tracing latency if you are not in a serverless environment
# export LANGCHAIN_CALLBACKS_BACKGROUND=true
Chain with sources
Here is Q&A app with sources we built over the LLM Powered Autonomous Agents blog post by Lilian Weng in the Returning sources guide:
import "cheerio";
import { CheerioWebBaseLoader } from "@langchain/community/document_loaders/web/cheerio";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { MemoryVectorStore } from "langchain/vectorstores/memory";
import { OpenAIEmbeddings, ChatOpenAI } from "@langchain/openai";
import { pull } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { formatDocumentsAsString } from "langchain/util/document";
import {
RunnableSequence,
RunnablePassthrough,
RunnableMap,
} from "@langchain/core/runnables";
import { StringOutputParser } from "@langchain/core/output_parsers";
const loader = new CheerioWebBaseLoader(
"https://lilianweng.github.io/posts/2023-06-23-agent/"
);
const docs = await loader.load();
const textSplitter = new RecursiveCharacterTextSplitter({
chunkSize: 1000,
chunkOverlap: 200,
});
const splits = await textSplitter.splitDocuments(docs);
const vectorStore = await MemoryVectorStore.fromDocuments(
splits,
new OpenAIEmbeddings()
);
// Retrieve and generate using the relevant snippets of the blog.
const retriever = vectorStore.asRetriever();
const prompt = await pull<ChatPromptTemplate>("rlm/rag-prompt");
const llm = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });
const ragChainFromDocs = RunnableSequence.from([
RunnablePassthrough.assign({
context: (input) => formatDocumentsAsString(input.context),
}),
prompt,
llm,
new StringOutputParser(),
]);
let ragChainWithSource = new RunnableMap({
steps: { context: retriever, question: new RunnablePassthrough() },
});
ragChainWithSource = ragChainWithSource.assign({ answer: ragChainFromDocs });
await ragChainWithSource.invoke("What is Task Decomposition");
{
question: "What is Task Decomposition",
context: [
Document {
pageContent: "Fig. 1. Overview of a LLM-powered autonomous agent system.\n" +
"Component One: Planning#\n" +
"A complicated ta"... 898 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
},
Document {
pageContent: 'Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are'... 887 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
},
Document {
pageContent: "Agent System Overview\n" +
" \n" +
" Component One: Planning\n" +
" "... 850 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
},
Document {
pageContent: "Resources:\n" +
"1. Internet access for searches and information gathering.\n" +
"2. Long Term memory management"... 456 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
}
],
answer: "Task decomposition is a technique used to break down complex tasks into smaller and simpler steps fo"... 230 more characters
}
Let’s see what this prompt actually looks like. You can also view it in the LangChain prompt hub:
console.log(prompt.promptMessages.map((msg) => msg.prompt.template).join("\n"));
You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
Streaming final outputs
With LCEL, we can stream outputs as they are generated:
for await (const chunk of await ragChainWithSource.stream(
"What is task decomposition?"
)) {
console.log(chunk);
}
{ question: "What is task decomposition?" }
{
context: [
Document {
pageContent: "Fig. 1. Overview of a LLM-powered autonomous agent system.\n" +
"Component One: Planning#\n" +
"A complicated ta"... 898 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
},
Document {
pageContent: 'Task decomposition can be done (1) by LLM with simple prompting like "Steps for XYZ.\\n1.", "What are'... 887 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
},
Document {
pageContent: "Agent System Overview\n" +
" \n" +
" Component One: Planning\n" +
" "... 850 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
},
Document {
pageContent: "(3) Task execution: Expert models execute on the specific tasks and log results.\n" +
"Instruction:\n" +
"\n" +
"With "... 539 more characters,
metadata: {
source: "https://lilianweng.github.io/posts/2023-06-23-agent/",
loc: { lines: [Object] }
}
}
]
}
{ answer: "" }
{ answer: "Task" }
{ answer: " decomposition" }
{ answer: " is" }
{ answer: " a" }
{ answer: " technique" }
{ answer: " used" }
{ answer: " to" }
{ answer: " break" }
{ answer: " down" }
{ answer: " complex" }
{ answer: " tasks" }
{ answer: " into" }
{ answer: " smaller" }
{ answer: " and" }
{ answer: " simpler" }
{ answer: " steps" }
{ answer: "." }
{ answer: " It" }
{ answer: " can" }
{ answer: " be" }
{ answer: " done" }
{ answer: " through" }
{ answer: " various" }
{ answer: " methods" }
{ answer: " such" }
{ answer: " as" }
{ answer: " using" }
{ answer: " prompting" }
{ answer: " techniques" }
{ answer: "," }
{ answer: " task" }
{ answer: "-specific" }
{ answer: " instructions" }
{ answer: "," }
{ answer: " or" }
{ answer: " human" }
{ answer: " inputs" }
{ answer: "." }
{ answer: " Another" }
{ answer: " approach" }
{ answer: " involves" }
{ answer: " outsourcing" }
{ answer: " the" }
{ answer: " planning" }
{ answer: " step" }
{ answer: " to" }
{ answer: " an" }
{ answer: " external" }
{ answer: " classical" }
{ answer: " planner" }
{ answer: "." }
{ answer: "" }
We can add some logic to compile our stream as it’s being returned:
const output = {};
let currentKey: string | null = null;
for await (const chunk of await ragChainWithSource.stream(
"What is task decomposition?"
)) {
for (const key of Object.keys(chunk)) {
if (output[key] === undefined) {
output[key] = chunk[key];
} else {
output[key] += chunk[key];
}
if (key !== currentKey) {
console.log(`\n\n${key}: ${JSON.stringify(chunk[key])}`);
} else {
console.log(chunk[key]);
}
currentKey = key;
}
}
question: "What is task decomposition?"
context: [{"pageContent":"Fig. 1. Overview of a LLM-powered autonomous agent system.\nComponent One: Planning#\nA complicated task usually involves many steps. An agent needs to know what they are and plan ahead.\nTask Decomposition#\nChain of thought (CoT; Wei et al. 2022) has become a standard prompting technique for enhancing model performance on complex tasks. The model is instructed to “think step by step” to utilize more test-time computation to decompose hard tasks into smaller and simpler steps. CoT transforms big tasks into multiple manageable tasks and shed lights into an interpretation of the model’s thinking process.\nTree of Thoughts (Yao et al. 2023) extends CoT by exploring multiple reasoning possibilities at each step. It first decomposes the problem into multiple thought steps and generates multiple thoughts per step, creating a tree structure. The search process can be BFS (breadth-first search) or DFS (depth-first search) with each state evaluated by a classifier (via a prompt) or majority vote.","metadata":{"source":"https://lilianweng.github.io/posts/2023-06-23-agent/","loc":{"lines":{"from":176,"to":181}}}},{"pageContent":"Task decomposition can be done (1) by LLM with simple prompting like \"Steps for XYZ.\\n1.\", \"What are the subgoals for achieving XYZ?\", (2) by using task-specific instructions; e.g. \"Write a story outline.\" for writing a novel, or (3) with human inputs.\nAnother quite distinct approach, LLM+P (Liu et al. 2023), involves relying on an external classical planner to do long-horizon planning. This approach utilizes the Planning Domain Definition Language (PDDL) as an intermediate interface to describe the planning problem. In this process, LLM (1) translates the problem into “Problem PDDL”, then (2) requests a classical planner to generate a PDDL plan based on an existing “Domain PDDL”, and finally (3) translates the PDDL plan back into natural language. Essentially, the planning step is outsourced to an external tool, assuming the availability of domain-specific PDDL and a suitable planner which is common in certain robotic setups but not in many other domains.\nSelf-Reflection#","metadata":{"source":"https://lilianweng.github.io/posts/2023-06-23-agent/","loc":{"lines":{"from":182,"to":184}}}},{"pageContent":"Agent System Overview\n \n Component One: Planning\n \n \n Task Decomposition\n \n Self-Reflection\n \n \n Component Two: Memory\n \n \n Types of Memory\n \n Maximum Inner Product Search (MIPS)\n \n \n Component Three: Tool Use\n \n Case Studies\n \n \n Scientific Discovery Agent\n \n Generative Agents Simulation\n \n Proof-of-Concept Examples\n \n \n Challenges\n \n Citation\n \n References","metadata":{"source":"https://lilianweng.github.io/posts/2023-06-23-agent/","loc":{"lines":{"from":112,"to":146}}}},{"pageContent":"(3) Task execution: Expert models execute on the specific tasks and log results.\nInstruction:\n\nWith the input and the inference results, the AI assistant needs to describe the process and results. The previous stages can be formed as - User Input: {{ User Input }}, Task Planning: {{ Tasks }}, Model Selection: {{ Model Assignment }}, Task Execution: {{ Predictions }}. You must first answer the user's request in a straightforward manner. Then describe the task process and show your analysis and model inference results to the user in the first person. If inference results contain a file path, must tell the user the complete file path.","metadata":{"source":"https://lilianweng.github.io/posts/2023-06-23-agent/","loc":{"lines":{"from":277,"to":280}}}}]
answer: ""
Task
decomposition
is
a
technique
used
to
break
down
complex
tasks
into
smaller
and
simpler
steps
.
It
can
be
done
through
various
methods
such
as
using
prompting
techniques
,
task
-specific
instructions
,
or
human
inputs
.
Another
approach
involves
outsourcing
the
planning
step
to
an
external
classical
planner
.
"answer"
Next steps
You’ve now learned how to stream responses from a QA chain.
Next, check out some of the other how-to guides around RAG, such as how to add chat history.