graph TD
%%{init: {'theme': 'mc','layout': 'elk'}}%%
ChatOutput-s3yx9[Chat Output]
style ChatOutput-s3yx9 stroke:#a170ff
ChatInput-nbeu2[Question]
style ChatInput-nbeu2 stroke:#a170ff
GDriveFilesComponent-ield1[Get document]
style GDriveFilesComponent-ield1 stroke:#a170ff
OpenAIEmbeddings-8lgaa[OpenAI Embeddings]
style OpenAIEmbeddings-8lgaa stroke:#a170ff
LanguageRecursiveTextSplitter-60k4u[Text splitter]
style LanguageRecursiveTextSplitter-60k4u stroke:#a170ff
AdvancedAgent-8vhpf[Agent]
style AdvancedAgent-8vhpf stroke:#a170ff
OpenAIModel-1ec5c[OpenAI]
style OpenAIModel-1ec5c stroke:#a170ff
AstraDB-0wxah[Astra DB2]
style AstraDB-0wxah stroke:#a170ff
RetrieverTool-9ghrj[RetrieverTool]
style RetrieverTool-9ghrj stroke:#a170ff
AstraDB-jjmpw[Astra DB]
style AstraDB-jjmpw stroke:#a170ff
OpenAIEmbeddings-m1or6[OpenAI Embeddings2]
style OpenAIEmbeddings-m1or6 stroke:#a170ff
GDriveFilesComponent-ield1 -.- LanguageRecursiveTextSplitter-60k4u
ChatInput-nbeu2 -.- AdvancedAgent-8vhpf
AdvancedAgent-8vhpf -.- ChatOutput-s3yx9
OpenAIModel-1ec5c -.- AdvancedAgent-8vhpf
LanguageRecursiveTextSplitter-60k4u -.- AstraDB-0wxah
OpenAIEmbeddings-8lgaa -.- AstraDB-0wxah
RetrieverTool-9ghrj -.- AdvancedAgent-8vhpf
AstraDB-jjmpw -.- RetrieverTool-9ghrj
OpenAIEmbeddings-m1or6 -.- AstraDB-jjmpw
linkStyle 0,1,2,3,4,5,6,7,8 stroke:#a170ff

Chat with Your Documents

🧩 Overview

This workflow enables a conversational AI to answer questions about documents stored in a Google Drive folder.
Documents are automatically ingested, split into language-aware chunks, embedded, and indexed in a vector store.
When a user asks a question, the agent retrieves the most relevant passages, generates an answer, and presents it in a chat interface.

⚙️ Main Features

Automatic document ingestion from Google Drive into a vector store.
Chunking of long documents into language‑aware segments.
Embedding generation using OpenAI models for semantic search.
Retrieval‑augmented generation: the agent queries the vector store before answering.
Conversational UI with chat input and output components.
Scalable, batch‑oriented ingestion and retrieval pipelines.

🔄 Workflow Steps

Component Name	Role in the Workflow	Key Inputs	Key Outputs
GDrive Files Component	Fetches files from a Google Drive folder.	Folder or file selection; optional filters.	Raw file contents.
Language Recursive Text Splitter	Divides raw documents into manageable text chunks.	File contents.	Array of text chunks.
OpenAI Embeddings	Generates vector embeddings for each chunk.	Text chunks.	Embedding vectors.
Astra DB (Ingest)	Stores embeddings and metadata in a vector collection.	Embedding vectors, chunk metadata.	Confirmation of stored records.
Astra DB (Retriever)	Builds a retriever over the indexed collection.	Vector store reference.	Retriever object.
Retriever Tool	Exposes the retriever as a tool for the agent.	Retriever object.	BaseTool for the agent.
OpenAI Model	Provides the language model for the agent.	Model name, API credentials.	LanguageModel instance.
Advanced Agent	Orchestrates the conversation, calling the LLM and tools.	User query, LLM, tools, system prompt.	Agent response message.
Chat Input	Collects user messages from the UI.	User text, optional file attachments.	User Message.
Chat Output	Displays the agent’s response in the chat interface.	Agent response text.	Rendered chat message.
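
The splitting step in the table can be illustrated with a small, self-contained sketch. This is not the component's actual implementation; the function name recursive_split, the separator order, and the character-based chunk_size are all assumptions made for illustration. The idea shown is only the general recursive approach: try the coarsest separator first and fall back to finer ones when a piece is still too long.

```python
from typing import List, Optional

# Hypothetical separator order: paragraphs first, then lines, then words.
DEFAULT_SEPARATORS = ["\n\n", "\n", " "]


def recursive_split(
    text: str, chunk_size: int, separators: Optional[List[str]] = None
) -> List[str]:
    """Split text into chunks of at most chunk_size characters,
    preferring the coarsest separator that keeps chunks within the limit."""
    if separators is None:
        separators = DEFAULT_SEPARATORS
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: fall back to a hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # the piece still fits in the current chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: retry it with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Intro paragraph.\n\nSecond paragraph that is a bit longer.\n\nEnd."
    for chunk in recursive_split(sample, chunk_size=32):
        print(repr(chunk))
```

A real splitter typically also supports overlap between neighbouring chunks and language-specific separators; both are left out here to keep the sketch short.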

Execution Flow

Document Ingestion – The GDrive Files Component retrieves the files, which are passed to the Language Recursive Text Splitter.
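The splitter produces chunks that are passed to Astra DB (Ingest), which is also connected to an OpenAI Embeddings component.
Astra DB (Ingest) uses that embedding model to embed each chunk and stores the resulting vectors with their metadata.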
Astra DB (Retriever), connected to its own OpenAI Embeddings component, creates a retriever over the stored vectors; the Retriever Tool exposes that retriever to the agent.
An OpenAI Model instance is built and supplied to the Advanced Agent.
The user submits a query via Chat Input; the Advanced Agent processes it, using the Retriever Tool to fetch relevant passages.
The agent formulates an answer with the LLM and returns it to Chat Output for display.
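
The two halves of the flow above (ingestion, then retrieval at question time) can be sketched with an in-memory stand-in. Everything here is hypothetical: toy_embed replaces the OpenAI embedding model with a bag-of-words vector, InMemoryVectorStore replaces Astra DB, and answer only returns the retrieved passage where a real agent would hand it to the LLM.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, float]:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return {word: float(n) for word, n in Counter(re.findall(r"\w+", text.lower())).items()}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(value * b.get(key, 0.0) for key, value in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical stand-in for the vector collection."""

    def __init__(self) -> None:
        self.records: List[Tuple[str, Dict[str, float]]] = []

    def add(self, chunks: List[str]) -> None:
        # Ingestion: embed each chunk and keep the text next to its vector.
        for chunk in chunks:
            self.records.append((chunk, toy_embed(chunk)))

    def search(self, query: str, k: int = 2) -> List[str]:
        # Retrieval: embed the query and rank stored chunks by similarity.
        query_vec = toy_embed(query)
        ranked = sorted(self.records, key=lambda r: cosine(query_vec, r[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


def answer(question: str, store: InMemoryVectorStore) -> str:
    """Retrieve first, then answer. A real agent would pass the passages to an LLM."""
    passages = store.search(question, k=1)
    if not passages:
        return "No relevant documents found."
    return f"Based on the documents: {passages[0]}"


if __name__ == "__main__":
    store = InMemoryVectorStore()
    store.add([
        "Invoices are due within 30 days.",
        "The office is closed on public holidays.",
        "Support is available by email.",
    ])
    print(answer("When are invoices due?", store))
```

The point of the sketch is the ordering: the store is filled once during ingestion, and each question only triggers an embed-and-rank step against what is already indexed.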

🧠 Notes

The workflow assumes valid API keys for OpenAI and Astra DB; credentials are injected via the respective components.
The vector store is refreshed only when the ingestion pipeline is triggered; subsequent queries reuse the existing index.
Retrieval uses a cosine similarity metric by default; this can be changed in the Astra DB Retriever configuration.
The system prompt is hard‑coded to instruct the agent to answer based solely on retrieved data, which encourages grounded responses but does not guarantee them.
Conversation history can optionally be stored (via Chat Input’s should_store_message flag) to support context-aware replies.
Batch ingestion and retrieval operations are performed in parallel where supported, improving throughput for large document sets.
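
As a rough illustration of the batch-and-parallelize idea in the last note, the sketch below splits a list of chunks into batches and processes them on a thread pool. It says nothing about how the workflow's components actually parallelize work; make_batches, process_in_parallel, and the embed_batch stand-in (which returns text lengths instead of calling an embedding API) are hypothetical names chosen for this example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def make_batches(items: List[str], size: int) -> List[List[str]]:
    """Group items into consecutive batches of at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [items[i:i + size] for i in range(0, len(items), size)]


def embed_batch(batch: List[str]) -> List[int]:
    """Hypothetical stand-in for one embedding API call; returns text lengths."""
    return [len(text) for text in batch]


def process_in_parallel(
    items: List[str],
    batch_size: int,
    worker: Callable[[List[str]], List[int]],
    max_workers: int = 4,
) -> List[int]:
    """Run `worker` on each batch concurrently and flatten the results.

    ThreadPoolExecutor.map yields results in input order, so the output
    lines up with `items` even if batches finish out of order.
    """
    batches = make_batches(items, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(worker, batches))
    return [value for batch_result in results for value in batch_result]


if __name__ == "__main__":
    chunks = ["a", "bb", "ccc", "dddd", "eeeee"]
    print(process_in_parallel(chunks, batch_size=2, worker=embed_batch))
```

Threads suit this kind of work because embedding and database calls are mostly network-bound; the batch size and worker count would need to respect whatever rate limits the external services impose.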