
Parent Document Retriever in Action: Setting Up RAG with Mistral LLM and LangChain

A practical demonstration of setting up a Retrieval-Augmented Generation (RAG) system using a Parent Document Retriever with the LangChain framework and Mistral LLM.

8 min read

Created: Oct 12, 2024 · Last Update: Oct 18, 2024
#Retrieval-Augmented Generation (RAG) #Parent Document Retriever #LangChain #Mistral LLM #Mistral AI #Mistral API #FAISS #Document Chunking #Vector Store #Python #Knowledge Retrieval #AI Workflows #Contextual Retrieval


Introduction

This post demonstrates how to set up a Retrieval-Augmented Generation (RAG) system using LangChain, integrating a Parent Document Retriever with Mistral AI models. It provides implementation details, including Python code, to show how RAG enhances the quality of language model responses. It complements a related post, which presents the complete use case.

Key RAG Architecture

Here are the key components of this RAG system, describing their roles and contributions to the overall architecture:

  • Mistral LLM and embeddings: Uses Mistral embeddings and Mistral Large LLM to generate relevant responses based on external knowledge.
  • Parent Document Retriever: Retrieves smaller chunks of information while referencing their parent documents for context.
  • FAISS Vector Store: Stores embeddings and enables efficient similarity searches.
  • Document Chunking: Splits PDF documents into smaller parts for better retrieval.
  • Naive RAG Chain: Connects the retriever, vector store, and LLM for generating informed responses.

With these components in mind, let's now explore how they work together to create an effective Retrieval-Augmented Generation system.

LangChain Framework

LangChain is the primary framework used for implementing the Retrieval-Augmented Generation (RAG) system. It provides the necessary tools and components to build the RAG application, including integration with vector stores and retrievers.
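As a tiny illustration of the composition style used throughout this post, LangChain's Expression Language (LCEL) lets components be piped together with the `|` operator; the full RAG chain defined later follows exactly this pattern. The snippet below is a minimal, self-contained sketch with toy values, not part of the RAG system itself.

# Minimal LCEL sketch: pipe a prompt template into a lambda that just renders it to text
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda

toy_prompt = ChatPromptTemplate.from_messages([("human", "Say hello to {name}")])
toy_chain = toy_prompt | RunnableLambda(lambda value: value.to_string())
print(toy_chain.invoke({"name": "LangChain"}))  # -> "Human: Say hello to LangChain"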

Now that we have a framework, let's delve deeper into how the ParentDocumentRetriever balances specificity and context in document retrieval.

ParentDocumentRetriever: Balancing Specificity and Context

The ParentDocumentRetriever splits documents into small child chunks, which produce precise embeddings, while keeping track of the larger parent chunks they belong to. At query time it matches against the child chunks but returns their parent documents, so retrieval stays specific without losing the surrounding context.

Next, let's look at how PDF chunking and indexing work together to enhance retrieval performance.

PDF Chunking and FAISS Vector Store Indexing

The following code snippet demonstrates how to prepare PDF documents for RAG by splitting them into chunks and indexing them in a FAISS vector store. This process involves several steps, including defining the chunk sizes, creating embeddings, setting up storage for the documents, and indexing them for similarity searches. The parent and child chunk sizes and overlap sizes are important parameters that influence the granularity of the chunks and the level of context retained, thereby affecting the quality of the retrieval.

Documents are loaded using a PDF loader, and the ParentDocumentRetriever is created to handle the retrieval of document chunks along with their parent documents. The code then iterates over the loaded documents, adding them to the retriever in batches, and finally saves the indexed database locally for future use.

# Imports (import paths may vary slightly across LangChain versions)
import logging
from faiss import IndexFlatL2
from tqdm import tqdm
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import LocalFileStore, create_kv_docstore
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.docstore.in_memory import InMemoryDocstore
from langchain_community.document_loaders import PyPDFDirectoryLoader
from langchain_community.vectorstores import FAISS
from langchain_mistralai import MistralAIEmbeddings

# Constants for chunk sizes and overlaps
CHILD_CHUNK_SIZE = 1024
CHILD_CHUNK_OVERLAP = 100
PARENT_CHUNK_SIZE = 4096
PARENT_CHUNK_OVERLAP = 400

# Create embeddings instance (my_api_key holds the Mistral API key, defined elsewhere)
embeddings = MistralAIEmbeddings(model="mistral-embed", mistral_api_key=my_api_key)

# Settings (paths and index name come from the script's command-line arguments)
index_name = args.index_name
data_files_path = args.data_files_path
dbstore_path = args.dbstore_path
docstore_path = args.docstore_path

# Create the file-backed docstore that will hold the parent document chunks
fs = LocalFileStore(docstore_path)
store = create_kv_docstore(fs)

# Create Parent and Child text splitters
child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHILD_CHUNK_SIZE, chunk_overlap=CHILD_CHUNK_OVERLAP)
parent_text_splitter = RecursiveCharacterTextSplitter(chunk_size=PARENT_CHUNK_SIZE, chunk_overlap=PARENT_CHUNK_OVERLAP)

# Create FAISS vectorstore; embed a dummy query to determine the embedding dimensionality
dimensions = len(embeddings.embed_query("dummy"))

db = FAISS(
  embedding_function=embeddings,
  index=IndexFlatL2(dimensions),
  docstore=InMemoryDocstore(),
  index_to_docstore_id={},
  normalize_L2=False
)

# Load documents
logging.info("Loading documents...")
loader = PyPDFDirectoryLoader(data_files_path)
docs = loader.load()
logging.info(f"Number of document blocks loaded: {len(docs)}")

# Create ParentDocumentRetriever
big_chunks_retriever = ParentDocumentRetriever(
  vectorstore=db,
  docstore=store,
  child_splitter=child_text_splitter,
  parent_splitter=parent_text_splitter
)

# Add documents to retriever
MAX_BATCH_SIZE = 100

for i in tqdm(range(0, len(docs), MAX_BATCH_SIZE)):
  logging.info(f"Start: {i}")
  i_end = min(len(docs), i + MAX_BATCH_SIZE)
  logging.info(f"End: {i_end}")
  batch = docs[i:i_end]
  try:
      big_chunks_retriever.add_documents(batch, ids=None)
  except ValueError as e:
      # If a batch fails (for example, the embedding request is too large), log the error and retry in two halves
      logging.error(e)
      big_chunks_retriever.add_documents(batch[:50], ids=None)
      big_chunks_retriever.add_documents(batch[50:], ids=None)
      continue
  logging.info(f"Number of keys stored in the docstore: {len(list(store.yield_keys()))}")

# Save the database
db.save_local(dbstore_path, index_name)
logging.info("Completed")

Naive RAG Chain Diagram

Below is a diagram that represents the workflow of the Naive RAG system, showcasing how each component interacts to generate informed responses.

Workflow chart

Having illustrated the workflow, let's now look at how to define the RAG chain using LangChain Expression Language (LCEL).

Defining the RAG Chain with LangChain Expression Language (LCEL)

This section explains how to set up the Parent Document Retriever using LangChain, including configuring the vector store and defining text splitters.

The first code snippet demonstrates how to define a function for rebuilding the retriever. This function takes in paths for the document store and database store, as well as the embeddings model and index name. It creates both child and parent text splitters to ensure that documents are appropriately divided for indexing and retrieval.

# Constants (this snippet reuses the imports from the indexing script above)
NUM_CTX = 32768
RETRIEVED_CHUNKS = 20
CHILD_CHUNK_SIZE = 1024
CHILD_CHUNK_OVERLAP = 100
PARENT_CHUNK_SIZE = 4096
PARENT_CHUNK_OVERLAP = 400

def rebuild_retriever(docstore_path, dbstore_path, embeddings, index_name):
  """Reload the persisted FAISS index and docstore, then rebuild the ParentDocumentRetriever."""
  child_text_splitter = RecursiveCharacterTextSplitter(chunk_size=CHILD_CHUNK_SIZE, chunk_overlap=CHILD_CHUNK_OVERLAP)
  parent_text_splitter = RecursiveCharacterTextSplitter(chunk_size=PARENT_CHUNK_SIZE, chunk_overlap=PARENT_CHUNK_OVERLAP)

  fs = LocalFileStore(docstore_path)
  docstore = create_kv_docstore(fs)

  vectordb = FAISS.load_local(
      folder_path=dbstore_path,
      embeddings=embeddings,
      index_name=index_name,
      allow_dangerous_deserialization=True
  )

  big_chunks_retriever = ParentDocumentRetriever(
      vectorstore=vectordb,
      docstore=docstore,
      child_splitter=child_text_splitter,
      parent_splitter=parent_text_splitter,
      search_type="similarity",
      search_kwargs={
          "k": RETRIEVED_CHUNKS
      }
  )
  return big_chunks_retriever

Next, let's define a prompt template that will guide the language model’s response and ensure it only relies on the retrieved context.

# Define Prompt Template
from langchain_core.prompts import ChatPromptTemplate

def get_prompt_template() -> ChatPromptTemplate:
  """
  Define and return a chat prompt template for question-answering tasks.

  Returns:
      ChatPromptTemplate: The prompt template for question-answering tasks.
  """
  system_prompt = """
You are an assistant for question-answering tasks related to a knowledge domain based on a context provided to you.
Answer the question only based on the provided context.
If the context does not contain the information, just say that you don't know and don't give any other response.
Give a response in technical English language and do not translate acronyms in the response.
Include the references at the end of the response, specifying only the name of the document and the page number(s) of the documents in the context used to build the response.
Do not include metadata information in the list of documents used to build the response.
Avoid duplicates in the list of documents used to build the response.

Example of output:

Response goes here

References:
- Document Name: name goes here - Page Number(s): pages go here
  """

  prompt = ChatPromptTemplate.from_messages([
      ("system", system_prompt),
      ("human", "{question}\nContext: {context}\n"),
  ])

  return prompt
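
To see how the template behaves, you can render it with illustrative values for the two placeholders; in the real chain, {context} is filled with the retrieved documents and {question} with the user's query. The values below are placeholders, not actual retrieved content.

# Render the template with illustrative placeholder values
messages = get_prompt_template().format_messages(
  question="What is covered in the corpus?",
  context="[retrieved document chunks would be inserted here]"
)
print(messages[0].content)  # system instructions
print(messages[1].content)  # human message containing the question and context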

Finally, the last snippet shows how to build the entire RAG chain in LCEL, integrating the retriever, prompt, and language model.

# Imports for the chain (import paths may vary slightly across LangChain versions)
from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import RunnablePassthrough
from langchain_mistralai import ChatMistralAI

# Build retriever
big_chunks_retriever = rebuild_retriever(docstore_path, dbstore_path, embeddings, index_name)

# Define prompt template
prompt = get_prompt_template()

llm = ChatMistralAI(model="mistral-large-2407", mistral_api_key=my_api_key, temperature=0.0, num_ctx=NUM_CTX)
# Naive RAG chain: retrieve context, fill the prompt, call the LLM, and parse the output to a string
naive_chain = (
  {
      "context": big_chunks_retriever, "question": RunnablePassthrough()
  }
  | prompt
  | llm
  | StrOutputParser()
)
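
With the chain assembled, invoking it is a single call: the retriever fetches the parent chunks, the prompt injects them as context, and the model produces a grounded answer with references. The question below is purely illustrative.

# Invoke the chain with a hypothetical question
question = "Which topics are covered in the indexed documents?"
answer = naive_chain.invoke(question)
print(answer)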
