Build a Support Chatbot on a Custom Knowledge Base with LangChain and Open LLMs

Can you build a chatbot that answers questions about your product or service? What if it could draw its answers from a knowledge base you already have?

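In this tutorial, we'll build a chatbot that answers questions about Skyscanner's services. We'll take the questions and answers from the company's FAQ section, then use the LangChain library to build a chatbot that answers questions from them in real time on a single T4 GPU. Along the way, we'll use only free language and embedding models.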

Setup

Let's start by installing the required dependencies:

!pip install -Uqqq pip --progress-bar off
!pip install -qqq langchain==0.0.228 --progress-bar off
!pip install -qqq chromadb==0.3.26 --progress-bar off
!pip install -qqq sentence-transformers==2.2.2 --progress-bar off
!pip install -qqq auto-gptq==0.2.2 --progress-bar off
!pip install -qqq einops==0.6.1 --progress-bar off
!pip install -qqq unstructured==0.8.0 --progress-bar off
!pip install -qqq transformers==4.30.2 --progress-bar off
!pip install -qqq torch==2.0.1 --progress-bar off

Here are the imports we'll use:

from pathlib import Path

import torch
from auto_gptq import AutoGPTQForCausalLM
from langchain.chains import ConversationalRetrievalChain
from langchain.chains.question_answering import load_qa_chain
from langchain.document_loaders import DirectoryLoader
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.llms import HuggingFacePipeline
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from transformers import AutoTokenizer, GenerationConfig, TextStreamer, pipeline

Data

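The chatbot's custom knowledge comes from the FAQ section of the Skyscanner Help Center. We'll take 12 questions and their answers and create a separate text file for each pair. Let's write a helper function that creates the files: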

questions_dir = Path("skyscanner")
questions_dir.mkdir(exist_ok=True, parents=True)


def write_file(question, answer, file_path):
    text = f"""
Q: {question}
A: {answer}
""".strip()
    with Path(questions_dir / file_path).open("w") as text_file:
        text_file.write(text)

Each file uses the following question-and-answer format:

Q: Sample question?
A: Sample answer.
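Because every file follows this two-part layout, the files can also be parsed back, which is handy for sanity-checking the data. A minimal sketch, assuming the answer always starts on a line beginning with "A: " (read_qa_file is a hypothetical helper, not part of the original code):

```python
import tempfile
from pathlib import Path


def read_qa_file(file_path: Path) -> tuple[str, str]:
    """Parse a file in the 'Q: ...' / 'A: ...' layout back into (question, answer).

    Hypothetical helper: assumes the first line starts with 'Q: ' and the
    answer (possibly several paragraphs long) starts on a line with 'A: '.
    """
    text = file_path.read_text(encoding="utf-8")
    question_part, answer_part = text.split("\nA: ", maxsplit=1)
    if not question_part.startswith("Q: "):
        raise ValueError(f"{file_path} does not start with 'Q: '")
    return question_part[len("Q: "):].strip(), answer_part.strip()


# Round-trip check in a temporary directory, using the same layout as write_file.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_file = Path(tmp_dir) / "question_0.txt"
    sample_file.write_text("Q: Sample question?\nA: Sample answer.", encoding="utf-8")
    parsed_question, parsed_answer = read_qa_file(sample_file)
    print(parsed_question)  # Sample question?
    print(parsed_answer)    # Sample answer.
```

Run against the real data, iterating over questions_dir.glob("*.txt") and calling read_qa_file on each path should yield 12 pairs.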

Time to write the files:

write_file(
question="How do I search for flights on Skyscanner?",
answer="""
Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. For tips on how best to search, please head over to our search tips page.

If you're looking for inspiration for your next trip, why not try our everywhere feature. Or, if you want to hang out and ensure the best price, you can set up price alerts to let you know when the price changes.
""".strip(),
file_path="question_1.txt",
)

write_file(
question="What are mash-ups?",
answer="""
These are routes where you fly with different airlines, because it`s cheaper than booking with just one. For example:

If you wanted to fly London to New York, we might find it`s cheaper to fly out with British Airways and back with Virgin Atlantic, rather than buy a round-trip ticket with one airline. This is called a "sum-of-one-way" mash-up. Just in case you're interested.

Another kind of mash-up is what we call a "self-transfer" or a "non-protected transfer". For example:

If you wanted to fly London to Sydney, we might find it`s cheaper to fly London to Dubai with Emirates, and then Dubai to Sydney with Qantas, rather than booking the whole route with one airline.

Pretty simple, right?

However, what`s really important to bear in mind is that mash-ups are NOT codeshares. A codeshare is when the airlines have an alliance. If anything goes wrong with the route - a delay, say, or a strike - those airlines will help you out at no extra cost. But mash-ups DO NOT involve an airline alliance. So if something goes wrong with a mash-up, it could cost you more money.
""".strip(),
file_path="question_2.txt",
)

write_file(
question="Why have I been blocked from accessing the Skyscanner website?",
answer="""
Skyscanner's websites are scraped by bots many millions of times a day which has a detrimental effect on the service we're able to provide. To prevent this, we use a bot blocking solution which checks to ensure you're using the website in a normal manner.

Occasionally, this may mean that a genuine user may be wrongly flagged as a bot. This can be for a number of potential reasons, including, but not limited to:

You're using a VPN which we have had to block due to excessive bot traffic in the past
You're using our website at super speed which manages to beat our rate limits
You have a plug-in on your browser which could be interfering with how our website interacts with you as a user
You're using an automated browser
If you've been blocked during normal use, please send us your IP address (this website may help: http://www.whatismyip.com/), the website you're accessing (e.g. www.skyscanner.net) and the date/time this happened, via the Contact Us button below and we'll look into it as quickly as possible.
""".strip(),
file_path="question_3.txt",
)

write_file(
question="Where is my booking confirmation?",
answer="""
You should get a booking confirmation by email from the company you bought your travel from. This can sometimes go into your spam/junk mail folder, so it's always worth checking there.

If you still can't find it, try getting in touch with the company you bought from to find out what's going on.

To find out who you need to contact, check the company name next to the charge on your bank account.
""".strip(),
file_path="question_4.txt",
)

write_file(
question="How do I change or cancel my booking?",
answer="""
For all questions about changes, cancellations, and refunds - as well as all other questions about bookings - you'll need to contact the company you bought travel from. They'll have all the info about your booking and can advise you.

You'll find 1000s of travel agencies, airlines, hotels and car rental companies that you can buy from through our site and app. When you buy from one of these travel partners, they will take your payment (you'll see their name on your bank or credit card statement), contact you to confirm your booking, and provide any help or support you might need.

If you bought from one of these partners, you'll need to contact them as they have all the info about your booking. We unfortunately don't have any access to bookings you made with them.
""".strip(),
file_path="question_5.txt",
)

write_file(
question="I booked the wrong dates / times",
answer="""
If you have found that you have booked the wrong dates or times, please contact the airline or travel agent that you booked your flight with as they will be able to help you change your flights to the intended dates or times.

The search box below can help you find the contact details for the travel provider you booked with.

You can search flexible or specific dates on Skyscanner to find your preferred flight, and when you select a flight on Skyscanner you are transferred to the website where you will make and pay for your booking. Once you are redirected to the airline or travel agent website, you might be required to select dates again, depending on the website. In all cases, you will be shown the flight details of your selection and you are required before confirming payment to state that you have checked all details and agreed to the terms and conditions. We strongly recommend that you always check this information carefully, as travel information can be subject to change.
""".strip(),
file_path="question_6.txt",
)

write_file(
question="I entered the wrong email address",
answer="""
Please contact the airline or travel agent you booked with as Skyscanner does not have access to bookings made with airlines or travel agents.

If you can't remember who you booked with, you can check your credit card statement for a company name.

The search box below can help you find the contact details for the travel provider you booked with.
""".strip(),
file_path="question_7.txt",
)

write_file(
question="Luggage",
answer="""
Depending on the flight provider, the rules, conditions and prices for luggage (including sports equipment) do vary.
It's always a good idea to check with the airline or travel agent directly (and you should be shown the options when you make your booking).
""".strip(),
file_path="question_8.txt",
)

write_file(
question="Changes, cancellation and refunds",
answer="""
For changes, cancellations or refunds, we recommend that you contact the travel provider (airline or travel agent) agent that you completed your booking with.

As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information. Depending on the type of ticket you've booked, there may be different options for changes, cancellations and refunds, and the travel provider will be best placed to advise on these.
""".strip(),
file_path="question_9.txt",
)

write_file(
question="Why does the price sometimes change when I am redirected to a flight provider?",
answer="""
Flight prices and availability change constantly, so we make sure the data is updated regularly to reflect this. When you redirect to a travel provider's site, the price is updated again so you can be sure that you will always see the best price available from the airline or travel agent at time of booking.

We make every effort to ensure the information you see on Skyscanner is accurate and up to date, but very occasionally there can be reasons why a price change has not updated accurately on the site. If you see a price difference between Skyscanner and a travel provider, please contact us with all the flight details (from, to, dates, departure times, airline and travel agent if applicable) and we will investigate further.
""".strip(),
file_path="question_10.txt",
)

write_file(
question="Why is Skyscanner free?",
answer="""
Does Skyscanner charge commission?

Nope. Skyscanner is always free to search, and we never charge you any hidden fees.

Want to know how do we do it?

Well, we search through thousands of sites to find you the best deals for flights, hotels and car hire. That includes everything from fancy hotels to low cost airlines, so no matter what your budget is, we'll help you get there.

See a price you like? We'll connect you to that airline or travel company so you can book with them directly. And for this referral, the airline or travel company pays us a small fee.

And that's all there is to it!
""".strip(),
file_path="question_11.txt",
)

write_file(
question="Are my details safe?",
answer="""
We take your privacy and safety online very seriously. We'll never sell, share or pass on your IP details, cookies, personal info and location data to others unless it's required by law, or it's necessary for one of the reasons set out in our Privacy Policy.
""".strip(),
file_path="question_12.txt",
)

Model

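I picked the model using the LMSYS leaderboard. My criteria were strong performance, availability on the HuggingFace Hub, and the ability to run real-time inference on a single T4 GPU. The model I settled on is Nous-Hermes-13b by Nous Research, which was fine-tuned on synthetic data generated by GPT-4. To speed up inference, we'll use a quantized version of it.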

Fortunately, a 4-bit quantized version of the model, published by TheBloke, is available on the HuggingFace Hub, so we can load it with the AutoGPTQ library. Let's load it:

DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

model_name_or_path = "TheBloke/Nous-Hermes-13B-GPTQ"
model_basename = "nous-hermes-13b-GPTQ-4bit-128g.no-act.order"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

model = AutoGPTQForCausalLM.from_quantized(
model_name_or_path,
model_basename=model_basename,
use_safetensors=True,
trust_remote_code=True,
device=DEVICE,
)

generation_config = GenerationConfig.from_pretrained(model_name_or_path)

Note that we have to tell AutoGPTQ which weights file to load by setting the model_basename parameter. We also load the matching tokenizer and generation config.

To test the model, we need to format the prompt the way the model expects, similar to what you'd do with other LLaMA-based models:

question = (
"Which programming language is more suitable for a beginner: Python or JavaScript?"
)
prompt = f"""
### Instruction: {question}
### Response:
""".strip()

Let's run the prompt through the tokenizer and the model:

%%time
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(DEVICE)
with torch.inference_mode():
    output = model.generate(inputs=input_ids, temperature=0.7, max_new_tokens=512)
CPU times: user 3.59 s, sys: 1.1 s, total: 4.69 s
Wall time: 12.4 s

This took about 12 seconds on a single T4 GPU. Let's look at what the model generated:

print(tokenizer.decode(output[0]))
<s> ### Instruction: Which programming language is more suitable for a
beginner: Python or JavaScript?
### Response:Python is generally considered more suitable for beginners due to
its readability and simplicity compared to JavaScript.</s>

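The model produced a relevant response. Note that the decoded output starts with the <s> (beginning-of-sequence) token, ends with the </s> (end-of-sequence) token, and still contains our prompt. These tokens mark where the sequence begins and ends. We need to remove them, along with the prompt, before returning the response to the user.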
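If you call model.generate directly, this cleanup is up to you. One option is to decode only the newly generated token ids with skip_special_tokens=True. The stdlib-only sketch below instead works on the decoded string, assuming it has the "<s> prompt answer</s>" shape shown above. clean_response is a hypothetical helper, and exact-match prompt stripping can fail if decoding changes whitespace:

```python
def clean_response(decoded: str, prompt: str) -> str:
    """Strip the <s>/</s> markers and the echoed prompt from a decoded generation.

    Hypothetical helper: assumes the decoded text is '<s> ' + prompt + answer + '</s>',
    as in the example output above.
    """
    text = decoded.replace("<s>", "").replace("</s>", "").strip()
    if text.startswith(prompt):
        text = text[len(prompt):]
    return text.strip()


# Separate names so the notebook's `prompt` variable is not overwritten.
example_prompt = "### Instruction: Which language is easier?\n### Response:"
example_decoded = "<s> " + example_prompt + "Python is easier to read.</s>"
print(clean_response(example_decoded, example_prompt))  # Python is easier to read.
```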

Finally, let's check the model's generation config to see which parameters it uses:

generation_config
GenerationConfig {
"_from_model_config": true,
"bos_token_id": 1,
"eos_token_id": 2,
"pad_token_id": 0,
"transformers_version": "4.30.2"
}

Building the pipeline

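LangChain makes HuggingFace LLMs easy to use through the HuggingFacePipeline class. Here we'll create a text-generation pipeline from our model and tokenizer. We'll also attach a text streamer so the response is streamed back to the user in real time. The streamer also removes the <s> and </s> tokens and the prompt from the response (skip_special_tokens=True and skip_prompt=True).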

streamer = TextStreamer(
tokenizer, skip_prompt=True, skip_special_tokens=True, use_multiprocessing=False
)

pipe = pipeline(
"text-generation",
model=model,
tokenizer=tokenizer,
max_length=2048,
temperature=0,
top_p=0.95,
repetition_penalty=1.15,
generation_config=generation_config,
streamer=streamer,
batch_size=1,
)

llm = HuggingFacePipeline(pipeline=pipe)

Let's test it by passing the prompt through the pipeline:

response = llm(prompt)
Python is generally considered to be more suitable for beginners due to its readability and simplicity compared to JavaScript.

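Great! The response is essentially the same as the one we got by running the model directly, but it's already cleaned up and ready to return to the user.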

Embedding the documents

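In keeping with our preference for free and open-source models, we'll use the E5-base-v2 embedding model, introduced in the paper "Text Embeddings by Weakly-Supervised Contrastive Pre-training". We chose it mainly for its strong results on the Massive Text Embedding Benchmark (MTEB) leaderboard, where it ranked seventh at the time of writing. Since the model is available on the HuggingFace Hub, we can load it with the HuggingFaceEmbeddings class (the code below loads the multilingual E5-base checkpoint embaas/sentence-transformers-multilingual-e5-base):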

embeddings = HuggingFaceEmbeddings(
model_name="embaas/sentence-transformers-multilingual-e5-base",
model_kwargs={"device": DEVICE},
)

LangChain's DirectoryLoader lets us load the text files as LangChain documents, which makes their contents easy to work with:

loader = DirectoryLoader("./skyscanner/", glob="**/*txt")
documents = loader.load()
len(documents)
12

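Given the model's context limit (at most 2048 tokens) and the length of our documents, we'll split the documents into smaller chunks. CharacterTextSplitter splits them into chunks of up to 512 characters, with no overlap between chunks: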

text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=0)
texts = text_splitter.split_documents(documents)
texts[4]
Document(
page_content='Q: Changes, cancellation and refunds A: For changes, cancellations or refunds, we recommend that you contact the travel provider (airline or travel agent) agent that you completed your booking with.',
metadata={'source': 'skyscanner/question_9.txt'}
)
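To build intuition for what chunk_size=512 and chunk_overlap=0 do, here is a simplified, stdlib-only sketch of separator-based splitting: split on blank lines, then greedily merge the pieces while the chunk stays within the limit. It approximates the idea rather than reproducing LangChain's implementation, which supports overlap and handles edge cases differently. It also suggests why question_6, whose last answer paragraph is longer than 512 characters, ends up in two chunks in the search results below:

```python
def split_text(text: str, chunk_size: int = 512, separator: str = "\n\n") -> list[str]:
    """Greedy separator-based chunking without overlap (simplified sketch).

    Pieces are joined back with the separator; a single piece longer than
    chunk_size is emitted on its own rather than being cut mid-paragraph.
    """
    pieces = [p.strip() for p in text.split(separator) if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_len = 0
    for piece in pieces:
        # Length this piece adds to the current chunk (plus a separator if needed).
        added = len(piece) + (len(separator) if current else 0)
        if current and current_len + added > chunk_size:
            chunks.append(separator.join(current))
            current, current_len = [], 0
            added = len(piece)
        current.append(piece)
        current_len += added
    if current:
        chunks.append(separator.join(current))
    return chunks


sample_text = "A" * 300 + "\n\n" + "B" * 150 + "\n\n" + "C" * 200
for chunk in split_text(sample_text):
    print(len(chunk))  # 452, then 200
```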

To create (and store) the embeddings, we'll use Chroma. Its from_documents method embeds the chunks and builds the database in one step:

db = Chroma.from_documents(texts, embeddings)
db.similarity_search("flight search")
[
Document(
page_content="Q: How do I search for flights on Skyscanner? A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. For tips on how best to search, please head over to our search tips page.\n\nIf you're looking for inspiration for your next trip, why not try our everywhere feature. Or, if you want to hang out and ensure the best price, you can set up price alerts to let you know when the price changes.",
metadata={'source': 'skyscanner/question_1.txt'}
),
Document(
page_content="You're using a VPN which we have had to block due to excessive bot traffic in the past You're using our website at super speed which manages to beat our rate limits You have a plug-in on your browser which could be interfering with how our website interacts with you as a user You're using an automated browser If you've been blocked during normal use, please send us your IP address (this website may help: http://www.whatismyip.com/), the website you're accessing (e.g. www.skyscanner.net) and the date/time this happened, via the Contact Us button below and we'll look into it as quickly as possible.",
metadata={'source': 'skyscanner/question_3.txt'}
),
Document(
page_content='Q: I booked the wrong dates / times A: If you have found that you have booked the wrong dates or times, please contact the airline or travel agent that you booked your flight with as they will be able to help you change your flights to the intended dates or times.\n\nThe search box below can help you find the contact details for the travel provider you booked with.',
metadata={'source': 'skyscanner/question_6.txt'}
),
Document(
page_content='You can search flexible or specific dates on Skyscanner to find your preferred flight, and when you select a flight on Skyscanner you are transferred to the website where you will make and pay for your booking. Once you are redirected to the airline or travel agent website, you might be required to select dates again, depending on the website. In all cases, you will be shown the flight details of your selection and you are required before confirming payment to state that you have checked all details and agreed to the terms and conditions. We strongly recommend that you always check this information carefully, as travel information can be subject to change.',
metadata={'source': 'skyscanner/question_6.txt'}
)
]
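Under the hood, similarity_search embeds the query and returns the stored chunks whose vectors are closest to it (k=4 by default, which is why four documents come back). The sketch below illustrates the idea with a deliberately crude stand-in: bag-of-words count vectors and cosine similarity instead of the dense E5 embeddings and the Chroma index used above. All names here are hypothetical:

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a crude stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def toy_similarity_search(query: str, chunks: list[str], k: int = 4) -> list[str]:
    """Return the k chunks most similar to the query, best match first."""
    query_vec = bag_of_words(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, bag_of_words(c)), reverse=True)
    return ranked[:k]


faq_chunks = [
    "Q: How do I search for flights on Skyscanner? A: Skyscanner helps you find flights.",
    "Q: Where is my booking confirmation? A: Check your email and spam folder.",
    "Q: Why is Skyscanner free? A: Travel companies pay us a small referral fee.",
]
print(toy_similarity_search("flight search", faq_chunks, k=1))
```

Note that "flight" does not match "flights" here; only "search" overlaps. Learned embeddings such as E5 capture that kind of similarity, which is one reason to use them.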

With an agent (the LLM) and a way to retrieve relevant information (similarity search over the database), we have all the building blocks for our chatbot.

Conversational chain

To tie these components together, we'll use LangChain chains. Let's start by defining the prompt:

template = """
### Instruction: You're a travelling support agent that is talking to a customer.

Use only the chat history and the following information
{context}
to answer in a helpful manner to the question. If you don't know the answer -
say that you don't know. Keep your replies short, compassionate and informative.

{chat_history}
### Input: {question}
### Response:
""".strip()

prompt = PromptTemplate(
input_variables=["context", "question", "chat_history"], template=template
)

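Our prompt has three variables:

  • context: relevant information from the retrieved documents (filled in by the chain)
  • chat_history: the conversation so far (provided by the memory)
  • question: the user's current question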
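With chain_type="stuff", the retrieved chunks are simply "stuffed" into {context}. The rough sketch below shows what the final prompt looks like, assuming the chunks are joined with blank lines (believed to match LangChain's default document separator for stuff chains, but treat the exact formatting as an assumption). build_stuff_prompt and example_template are hypothetical names; the template text is copied from above:

```python
example_template = """
### Instruction: You're a travelling support agent that is talking to a customer.

Use only the chat history and the following information
{context}
to answer in a helpful manner to the question. If you don't know the answer -
say that you don't know. Keep your replies short, compassionate and informative.

{chat_history}
### Input: {question}
### Response:
""".strip()


def build_stuff_prompt(chunks: list[str], chat_history: str, question: str) -> str:
    """Fill the template the way a 'stuff' chain would: every chunk goes into {context}."""
    context = "\n\n".join(chunks)  # assumed separator between documents
    return example_template.format(context=context, chat_history=chat_history, question=question)


filled = build_stuff_prompt(
    chunks=["Q: Why is Skyscanner free? A: Travel companies pay a referral fee."],
    chat_history="",
    question="Do you charge commission?",
)
print(filled)
```

Because all chunks go into a single prompt, the chunk size and the number of retrieved chunks together have to fit within the model's 2048-token context.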

For the memory, we'll use LangChain's ConversationBufferMemory class, which stores the conversation history:

memory = ConversationBufferMemory(
memory_key="chat_history",
human_prefix="### Input",
ai_prefix="### Response",
output_key="answer",
return_messages=True,
)

Note that we also set human_prefix and ai_prefix. They label the user's and the AI's turns when the conversation history is formatted for the prompt.
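To see what the prefixes do, here is a rough stdlib imitation of how a buffer memory renders its history as a string when return_messages=False: each turn becomes "<prefix>: <text>" on its own line, mirroring LangChain's get_buffer_string. render_history is a hypothetical stand-in, not the library code. With return_messages=True, as in the chain above, the memory returns message objects instead and ConversationalRetrievalChain formats them itself, which is why its verbose output shows Human:/Assistant: labels:

```python
def render_history(turns: list[tuple[str, str]],
                   human_prefix: str = "### Input",
                   ai_prefix: str = "### Response") -> str:
    """Render (user, assistant) turns as '<prefix>: <text>' lines (hypothetical helper)."""
    lines = []
    for user_text, ai_text in turns:
        lines.append(f"{human_prefix}: {user_text}")
        lines.append(f"{ai_prefix}: {ai_text}")
    return "\n".join(lines)


history = render_history([("How flight search works?", "Skyscanner helps you find flights.")])
print(history)
# ### Input: How flight search works?
# ### Response: Skyscanner helps you find flights.
```

Because the prefixes match the ### Input / ### Response markers in the prompt template, earlier turns appear in the same instruction format the model already follows.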

Let's create a LangChain chain that uses the prompt we defined:

chain = ConversationalRetrievalChain.from_llm(
llm,
chain_type="stuff",
retriever=db.as_retriever(),
memory=memory,
combine_docs_chain_kwargs={"prompt": prompt},
return_source_documents=True,
verbose=True,
)

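This chain combines the LLM, the retriever, and the memory, and it uses our prompt to format the model's input. Setting return_source_documents=True makes the chain also return the documents it retrieved from the database.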

Let's try it:

question = "How flight search works?"
answer = chain(question)

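Chain output

Entering new chain...

Entering new chain...
Prompt after formatting:

### Instruction: You're a travelling support agent that is talking to a customer. Use only the chat history and the following information

Q: How do I search for flights on Skyscanner? A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. For tips on how best to search, please head over to our search tips page.

If you're looking for inspiration for your next trip, why not try our everywhere feature. Or, if you want to hang out and ensure the best price, you can set up price alerts to let you know when the price changes.

As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information. Depending on the type of ticket you've booked, there may be different options for changes, cancellations and refunds, and the travel provider will be best placed to advise on these.

You can search flexible or specific dates on Skyscanner to find your preferred flight, and when you select a flight on Skyscanner you are transferred to the website where you will make and pay for your booking. Once you are redirected to the airline or travel agent website, you might be required to select dates again, depending on the website. In all cases, you will be shown the flight details of your selection and you are required before confirming payment to state that you have checked all details and agreed to the terms and conditions. We strongly recommend that you always check this information carefully, as travel information can be subject to change.

Q: Why does the price sometimes change when I am redirected to a flight provider? A: Flight prices and availability change constantly, so we make sure the data is updated regularly to reflect this. When you redirect to a travel provider's site, the price is updated again so you can be sure that you will always see the best price available from the airline or travel agent at time of booking.

### Input: How flight search works?

### Response:

Finished chain.

Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.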

Great! The chatbot answered the question using our custom knowledge base. Let's look at what the chain returns:

answer.keys()
dict_keys(['question', 'chat_history', 'answer', 'source_documents'])
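Because return_source_documents=True, you can show the user where an answer came from. A small sketch, assuming each entry in answer["source_documents"] has a metadata dict with a "source" key, as in the documents printed earlier. The FakeDoc dataclass only stands in for LangChain's Document so the example runs without LangChain; unique_sources is a hypothetical helper:

```python
from dataclasses import dataclass, field


@dataclass
class FakeDoc:
    """Minimal stand-in for a LangChain Document (only the fields used here)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_sources(result: dict) -> list[str]:
    """Return the distinct 'source' paths of the retrieved documents, in retrieval order."""
    seen: list[str] = []
    for doc in result.get("source_documents", []):
        source = doc.metadata.get("source")
        if source and source not in seen:
            seen.append(source)
    return seen


example_result = {
    "answer": "...",
    "source_documents": [
        FakeDoc("chunk 1", {"source": "skyscanner/question_1.txt"}),
        FakeDoc("chunk 2", {"source": "skyscanner/question_6.txt"}),
        FakeDoc("chunk 3", {"source": "skyscanner/question_6.txt"}),
    ],
}
print(unique_sources(example_result))
# ['skyscanner/question_1.txt', 'skyscanner/question_6.txt']
```

With the real chain, you would call unique_sources(answer) on the dictionary returned above.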

The result contains the question, the chat history, the answer, and the source documents. Let's ask another question:

question = "I bought flight tickets, but I can't find any confirmation. Where is it?"
response = chain(question)

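Chain output

Entering new chain...

Entering new chain...
Prompt after formatting:

Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question, in its original language.

Chat History:

H: How flight search works?

A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

Follow Up Input: I bought flight tickets, but I can't find any confirmation. Where is it?

Standalone question: Can you help me find my flight ticket confirmation?

Finished chain.

Entering new chain...

Entering new chain...

### Instruction: You're a travelling support agent that is talking to a customer. Use only the chat history and the following information

Q: Where is my booking confirmation? A: You should get a booking confirmation by email from the company you bought your travel from. This can sometimes go into your spam/junk mail folder, so it's always worth checking there.

If you still can't find it, try getting in touch with the company you bought from to find out what's going on.

To find out who you need to contact, check the company name next to the charge on your bank account.

Q: I booked the wrong dates / times A: If you have found that you have booked the wrong dates or times, please contact the airline or travel agent that you booked your flight with as they will be able to help you change your flights to the intended dates or times.

The search box below can help you find the contact details for the travel provider you booked with.

Q: I entered the wrong email address A: Please contact the airline or travel agent you booked with as Skyscanner does not have access to bookings made with airlines or travel agents.

If you can't remember who you booked with, you can check your credit card statement for a company name.

The search box below can help you find the contact details for the travel provider you booked with.

As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information. Depending on the type of ticket you've booked, there may be different options for changes, cancellations and refunds, and the travel provider will be best placed to advise on these.

H: How flight search works?

A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

Finished chain.

You should receive an email confirmation from the airline or travel agent you booked with. Sometimes this email might end up in your junk/spam folder. Try searching your inbox and spam folder first before reaching out to the airline or travel agent for assistance.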

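That's a pretty good answer! However, the chain tries to be clever: it first rephrases the question (see the verbose output above) and then answers the rephrased version. We haven't found a way to turn this off in the current version of LangChain. The rephrasing is meant to reduce the load on the agent's memory, but it can make responses slower and doesn't always produce the question we intended. So we'll remove the rephrasing step.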

QA chain with memory

We'll recreate our chain with load_qa_chain:

memory = ConversationBufferMemory(
memory_key="chat_history",
human_prefix="### Input",
ai_prefix="### Response",
input_key="question",
output_key="output_text",
return_messages=False,
)

chain = load_qa_chain(
llm, chain_type="stuff", prompt=prompt, memory=memory, verbose=True
)

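This type of chain doesn't rephrase the question, but it also has no retriever to search the documents, so we have to fetch them ourselves: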

question = "How flight search works?"
docs = db.similarity_search(question)
answer = chain.run({"input_documents": docs, "question": question})
Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers. For tips on how best to search, please visit our Search Tips page. As a travel search engine, Skyscanner doesn't take your booking or payment ourselves. Instead, we pass you through to your chosen airline or travel agent where you make your booking directly. We therefore don't have access or visibility to any of your booking information.

We get the same answer! It takes a bit more work, but this approach is more flexible and lets us reuse the same chain. Let's try another question:

question = "I entered wrong email address during my flight booking. What should I do?"
docs = db.similarity_search(question)
answer = chain.run({"input_documents": docs, "question": question})
Please contact the airline or travel agent you booked with as Skyscanner does not have access to bookings made with airlines or travel agents. If you can't remember who you booked with, you can check your credit card statement for a company name. The search box below can help you find the contact details for the travel provider you booked with.

That works too. Let's package everything into a single class that's convenient to use.

Support chatbot

Let's create a class that makes our chatbot easy to use:

DEFAULT_TEMPLATE = """
### Instruction: You're a travelling support agent that is talking to a customer.

Use only the chat history and the following information
{context}
to answer in a helpful manner to the question. If you don't know the answer -
say that you don't know. Keep your replies short, compassionate and informative.

{chat_history}
### Input: {question}
### Response:
""".strip()


class Chatbot:
    def __init__(
        self,
        text_pipeline: HuggingFacePipeline,
        embeddings: HuggingFaceEmbeddings,
        documents_dir: Path,
        prompt_template: str = DEFAULT_TEMPLATE,
        verbose: bool = False,
    ):
        prompt = PromptTemplate(
            input_variables=["context", "question", "chat_history"],
            template=prompt_template,
        )
        self.chain = self._create_chain(text_pipeline, prompt, verbose)
        self.db = self._embed_data(documents_dir, embeddings)

    def _create_chain(
        self,
        text_pipeline: HuggingFacePipeline,
        prompt: PromptTemplate,
        verbose: bool = False,
    ):
        memory = ConversationBufferMemory(
            memory_key="chat_history",
            human_prefix="### Input",
            ai_prefix="### Response",
            input_key="question",
            output_key="output_text",
            return_messages=False,
        )

        return load_qa_chain(
            text_pipeline,
            chain_type="stuff",
            prompt=prompt,
            memory=memory,
            verbose=verbose,
        )

    def _embed_data(
        self, documents_dir: Path, embeddings: HuggingFaceEmbeddings
    ) -> Chroma:
        loader = DirectoryLoader(documents_dir, glob="**/*txt")
        documents = loader.load()
        text_splitter = CharacterTextSplitter(chunk_size=512, chunk_overlap=0)
        texts = text_splitter.split_documents(documents)
        return Chroma.from_documents(texts, embeddings)

    def __call__(self, user_input: str) -> str:
        docs = self.db.similarity_search(user_input)
        return self.chain.run({"input_documents": docs, "question": user_input})

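The chatbot takes a text pipeline, an embedding model, and a directory of documents. Its two main jobs are building the chain and creating the vector database. We also implement the __call__ method, so using the chatbot is as simple as calling a function.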
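Because the class only talks to its database and chain through similarity_search and run, the retrieve-then-answer flow of __call__ can be exercised without a GPU by swapping in stubs. The sketch below is a hypothetical, duck-typed test setup (StubDB, StubChain, and MiniChatbot are not part of the original code) that reproduces only the __call__ logic:

```python
class StubDB:
    """Stand-in for the Chroma store: returns a canned 'document' for any query."""

    def similarity_search(self, query: str) -> list[str]:
        return [f"doc about {query}"]


class StubChain:
    """Stand-in for the QA chain: records its inputs and returns a canned reply."""

    def __init__(self) -> None:
        self.calls: list[dict] = []

    def run(self, inputs: dict) -> str:
        self.calls.append(inputs)
        return f"answer based on {len(inputs['input_documents'])} document(s)"


class MiniChatbot:
    """Same __call__ logic as Chatbot, but with the store and chain injected."""

    def __init__(self, db, chain) -> None:
        self.db = db
        self.chain = chain

    def __call__(self, user_input: str) -> str:
        docs = self.db.similarity_search(user_input)
        return self.chain.run({"input_documents": docs, "question": user_input})


bot = MiniChatbot(StubDB(), StubChain())
print(bot("luggage"))  # answer based on 1 document(s)
```

Injecting the dependencies is a design choice made here for testability; the original class builds its chain and database itself in __init__.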

Let's create an instance of the class:

chatbot = Chatbot(llm, embeddings, "./skyscanner/")

To chat, we'll use a while loop that keeps asking the user for input. If the user types bye or goodbye, we exit the loop:

import warnings

warnings.filterwarnings("ignore", category=UserWarning)

while True:
    user_input = input("You: ")
    if user_input.lower() in ["bye", "goodbye"]:
        break
    # The TextStreamer attached to the pipeline prints the answer as it is generated.
    answer = chatbot(user_input)
    print()

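You: How flight search works?

A: Skyscanner helps you find the best options for flights on a specific date, or on any day in a given month or even year. Our search algorithm scans hundreds of airlines and travel agents to bring you the most comprehensive range of flight options, including direct and connecting flights, stopovers, and layovers.

You: I bought flight tickets, but I can't find any confirmation. Where is it?

A: It sounds like you bought your tickets from another company or airline rather than through Skyscanner. To find your booking confirmation, please contact the company you bought the tickets from directly. They will be able to give you more details about your booking and its confirmation.

You: I entered wrong email address during my flight booking. What should I do?

A: I'm sorry, but since Skyscanner is not involved in the actual booking process, we can't change the email address directly. You can try contacting the airline or travel agent you booked with and ask whether they can update the email address for you. Alternatively, you can use the search function on our website to look up your booking and see whether you can find and correct it yourself.

You: What are the benefits of a mash-up flight?

A: Mash-up flights offer several advantages over traditional round-trip flights. Here are the main ones:

  1. Cost savings: by combining two separate one-way flights, you can often save money compared with buying a single round-trip ticket.
  2. More flexibility: mash-up flights let you plan your itinerary more flexibly. If you need to change your return date or time, you only have to change that part instead of rebooking the whole trip.
  3. More choice: flying with several airlines gives you a wider range of destinations and departure times, so you can pick the flights that best fit your schedule and preferences.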
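The interactive loop above can also be written as a function that receives its input and reply functions as parameters, which makes the exit condition testable without a keyboard. A sketch under that assumption (run_chat is a hypothetical name); unlike the original loop, it also ignores surrounding whitespace, so " Goodbye " ends the chat too:

```python
from typing import Callable


def run_chat(read_input: Callable[[str], str],
             reply: Callable[[str], str],
             exit_words: tuple[str, ...] = ("bye", "goodbye")) -> list[str]:
    """Ask for input until an exit word is typed; return the replies produced."""
    replies = []
    while True:
        user_input = read_input("You: ")
        if user_input.strip().lower() in exit_words:
            break
        replies.append(reply(user_input))
    return replies


# Scripted input instead of the keyboard; the "bot" just echoes.
scripted = iter(["How flight search works?", "  Goodbye "])
print(run_chat(lambda _prompt: next(scripted), lambda text: f"echo: {text}"))
# ['echo: How flight search works?']
```

With the real bot, run_chat(input, chatbot) reproduces the interactive loop.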

Original article: https://www.mlexpert.io/blog/support-chatbot-using-custom-knowledge-base-with-langchain
