Chat with your documents
Ask an AI questions about your documents as if you were having a conversation.
```mermaid
%%{init: {'theme': 'mc','layout': 'elk'}}%%
graph TD
ChatOutput-s3yx9[Chat Output]
style ChatOutput-s3yx9 stroke:#a170ff
ChatInput-nbeu2[Question]
style ChatInput-nbeu2 stroke:#a170ff
GDriveFilesComponent-ield1[Get Document]
style GDriveFilesComponent-ield1 stroke:#a170ff
OpenAIEmbeddings-8lgaa[OpenAI Embeddings]
style OpenAIEmbeddings-8lgaa stroke:#a170ff
LanguageRecursiveTextSplitter-60k4u[Text Splitter]
style LanguageRecursiveTextSplitter-60k4u stroke:#a170ff
AdvancedAgent-8vhpf[Agent]
style AdvancedAgent-8vhpf stroke:#a170ff
OpenAIModel-1ec5c[OpenAI]
style OpenAIModel-1ec5c stroke:#a170ff
AstraDB-0wxah[Astra DB2]
style AstraDB-0wxah stroke:#a170ff
RetrieverTool-9ghrj[RetrieverTool]
style RetrieverTool-9ghrj stroke:#a170ff
AstraDB-jjmpw[Astra DB]
style AstraDB-jjmpw stroke:#a170ff
OpenAIEmbeddings-m1or6[OpenAI Embeddings2]
style OpenAIEmbeddings-m1or6 stroke:#a170ff
GDriveFilesComponent-ield1 -.- LanguageRecursiveTextSplitter-60k4u
linkStyle 0 stroke:#a170ff
ChatInput-nbeu2 -.- AdvancedAgent-8vhpf
linkStyle 1 stroke:#a170ff
AdvancedAgent-8vhpf -.- ChatOutput-s3yx9
linkStyle 2 stroke:#a170ff
OpenAIModel-1ec5c -.- AdvancedAgent-8vhpf
linkStyle 3 stroke:#a170ff
LanguageRecursiveTextSplitter-60k4u -.- AstraDB-0wxah
linkStyle 4 stroke:#a170ff
OpenAIEmbeddings-8lgaa -.- AstraDB-0wxah
linkStyle 5 stroke:#a170ff
RetrieverTool-9ghrj -.- AdvancedAgent-8vhpf
linkStyle 6 stroke:#a170ff
AstraDB-jjmpw -.- RetrieverTool-9ghrj
linkStyle 7 stroke:#a170ff
OpenAIEmbeddings-m1or6 -.- AstraDB-jjmpw
linkStyle 8 stroke:#a170ff
```
🧩 Overview
This workflow enables a conversational AI to answer questions about documents stored in a Google Drive folder.
Documents are automatically ingested, split into text chunks, embedded, and indexed in a vector store.
When a user asks a question, the agent retrieves the most relevant passages, generates an answer, and presents it in a chat interface.
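The retrieval idea can be illustrated with a toy, offline sketch. It is not the workflow's implementation: `embed` is a hypothetical bag-of-words stand-in for OpenAI Embeddings (which return dense float vectors from the API), a Python list stands in for the Astra DB collection, and `top_k` is an illustrative helper name. Passages are ranked by cosine similarity to the question and the best `k` are returned.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding (hypothetical stand-in for OpenAI Embeddings):
    a sparse bag-of-words vector of lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(question: str, index: list[tuple[str, Counter]], k: int = 2) -> list[str]:
    """Return the k passages whose vectors are most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [passage for passage, _ in ranked[:k]]


# Ingestion: embed each passage once and keep (passage, vector) pairs.
passages = [
    "Invoices are due within 30 days of receipt.",
    "The office is closed on public holidays.",
    "Late invoices incur a 2% monthly fee.",
]
index = [(p, embed(p)) for p in passages]

print(top_k("When are invoices due?", index, k=1))
# → ['Invoices are due within 30 days of receipt.']
```

The index is built once at ingestion time and reused for every question, which mirrors how the workflow queries an existing collection instead of re-embedding documents per query.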
⚙️ Main Features
- Automatic document ingestion from Google Drive into a vector store.
- Chunking of long documents into language‑aware segments.
- Embedding generation using OpenAI models for semantic search.
- Retrieval‑augmented generation: the agent queries the vector store before answering.
- Conversational UI with chat input and output components.
- Scalable, batch‑oriented ingestion and retrieval pipelines.
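The chunking step follows the recursive splitting idea: try the coarsest separator first (paragraphs) and fall back to finer ones (lines, sentences, words) only for pieces that are still too long. The function below is a simplified sketch of that technique, not LangChain's `RecursiveCharacterTextSplitter`; the separator list and `chunk_size` value are illustrative assumptions, there is no chunk overlap, and separators at chunk boundaries are dropped.

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple[str, ...] = ("\n\n", "\n", ". ", " ")) -> list[str]:
    """Split text into chunks of at most chunk_size characters, preferring
    to break on the coarsest separator that still yields small enough pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separators left: hard-cut the text into fixed-size slices.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]

    sep, finer = separators[0], separators[1:]
    chunks: list[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            # Greedily merge neighbouring pieces while they still fit.
            current = candidate
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # This piece alone is too long: recurse with finer separators.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return [c for c in chunks if c.strip()]


doc = "Intro paragraph.\n\nSecond paragraph is a bit longer than the first one."
print(recursive_split(doc, chunk_size=30))
# → ['Intro paragraph.', 'Second paragraph is a bit', 'longer than the first one.']
```

Paragraph boundaries are kept whenever possible, and only the oversized second paragraph is broken down further, which is what keeps chunks coherent for embedding.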
🔄 Workflow Steps
| Component Name | Role in the Workflow | Key Inputs | Key Outputs |
|---|---|---|---|
| GDrive Files Component | Fetches files from a Google Drive folder. | Folder or file selection; optional filters. | Raw file contents. |
| Language Recursive Text Splitter | Divides raw documents into manageable text chunks. | File contents. | Array of text chunks. |
| OpenAI Embeddings | Generates vector embeddings for each chunk. | Text chunks. | Embedding vectors. |
| Astra DB (Ingest) | Stores embeddings and metadata in a vector collection. | Embedding vectors, chunk metadata. | Confirmation of stored records. |
| Astra DB (Retriever) | Builds a retriever over the indexed collection. | Vector store reference; query embedding model (a second OpenAI Embeddings component). | Retriever object. |
| Retriever Tool | Exposes the retriever as a tool for the agent. | Retriever object. | BaseTool for the agent. |
| OpenAI Model | Provides the language model for the agent. | Model name, API credentials. | LanguageModel instance. |
| Advanced Agent | Orchestrates the conversation, calling the LLM and tools. | User query, LLM, tools, system prompt. | Agent response message. |
| Chat Input | Collects user messages from the UI. | User text, optional file attachments. | User Message. |
| Chat Output | Displays the agent’s response in the chat interface. | Agent response text. | Rendered chat message. |
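Between the splitter and Astra DB (Ingest), each chunk becomes a record that carries its vector plus metadata such as the source file and chunk position. The sketch below shows one way to do that in batches, with batches embedded concurrently. `embed_batch` is a hypothetical placeholder for an embeddings API call (it returns fake 2-dimensional vectors so the code runs offline), and the `Record` fields are illustrative, not Astra DB's document schema.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Record:
    source: str          # file the chunk came from
    chunk_index: int     # position of the chunk within that file
    text: str
    vector: list[float]


def embed_batch(texts: list[str]) -> list[list[float]]:
    """Hypothetical stand-in for an embeddings API call: one vector per text.
    Returns [character count, word count] so the example needs no network."""
    return [[float(len(t)), float(len(t.split()))] for t in texts]


def build_records(source: str, chunks: list[str], batch_size: int = 2,
                  max_workers: int = 4) -> list[Record]:
    """Embed chunks in concurrent batches and attach metadata to each one."""
    batches = [chunks[i:i + batch_size] for i in range(0, len(chunks), batch_size)]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in input order, so vectors line up with chunks.
        vectors = [v for batch_vectors in pool.map(embed_batch, batches)
                   for v in batch_vectors]
    return [Record(source, i, text, vec)
            for i, (text, vec) in enumerate(zip(chunks, vectors))]


records = build_records("handbook.txt", ["alpha beta", "gamma", "delta epsilon zeta"])
for r in records:
    print(r.chunk_index, r.text, r.vector)
# → 0 alpha beta [10.0, 2.0]
# → 1 gamma [5.0, 1.0]
# → 2 delta epsilon zeta [18.0, 3.0]
```

Threads suit this case because a real `embed_batch` would spend most of its time waiting on network I/O; batching reduces the number of API round trips.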
Execution Flow
- Document Ingestion – The GDrive Files Component retrieves the files, which are passed to the Language Recursive Text Splitter.
- The splitter produces chunks that are forwarded to OpenAI Embeddings.
- Generated embeddings are stored by Astra DB (Ingest).
- Astra DB (Retriever) builds a retriever over the stored vectors, using a second OpenAI Embeddings component to embed incoming queries; the Retriever Tool exposes this retriever to the agent.
- An OpenAI Model instance is built and supplied to the Advanced Agent.
- The user submits a query via Chat Input; the Advanced Agent processes it, using the Retriever Tool to fetch relevant passages.
- The agent formulates an answer with the LLM and returns it to Chat Output for display.
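The query-time half of this flow (Chat Input, Advanced Agent, Retriever Tool, LLM, Chat Output) can be sketched as a small loop. Everything here is a hypothetical stand-in: `Tool` mimics exposing a retriever to the agent by name and description, `keyword_retriever` replaces the Astra DB retriever, `fake_llm` replaces the OpenAI model, and the tool name `search_documents` is invented for illustration. A real tool-calling agent lets the model decide whether to call a tool; this sketch always calls it once, for brevity.

```python
import re
from dataclasses import dataclass
from typing import Callable

# Hypothetical system prompt in the spirit of the workflow's grounding instruction.
SYSTEM_PROMPT = (
    "Answer only from the retrieved context. "
    "If the context does not contain the answer, say you don't know."
)

# Hypothetical indexed passages standing in for the Astra DB collection.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday to Friday.",
]


@dataclass
class Tool:
    """A named capability the agent can call (the role of the Retriever Tool)."""
    name: str
    description: str
    run: Callable[[str], str]


def words(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def keyword_retriever(query: str) -> str:
    """Stand-in retriever: passages sharing at least one word with the query."""
    hits = [d for d in DOCS if words(query) & words(d)]
    return "\n".join(hits)


def fake_llm(system: str, user: str) -> str:
    """Stand-in LLM (ignores `system`): returns the first retrieved passage,
    or declines when nothing was retrieved."""
    context = user.split("Context:\n", 1)[1]
    return context.splitlines()[0] if context else "I don't know."


def run_agent(question: str, tools: dict[str, Tool],
              llm: Callable[[str, str], str]) -> str:
    """One turn: call the retriever tool, then pass question and context to the LLM."""
    context = tools["search_documents"].run(question)
    user_message = f"Question: {question}\nContext:\n{context}"
    return llm(SYSTEM_PROMPT, user_message)


retriever_tool = Tool(
    name="search_documents",
    description="Search the indexed Google Drive documents.",
    run=keyword_retriever,
)
tools = {retriever_tool.name: retriever_tool}

print(run_agent("How long do refunds take?", tools, fake_llm))
# → Refunds are processed within 5 business days.
print(run_agent("Where can I park?", tools, fake_llm))
# → I don't know.
```

The second call shows the intended grounded behaviour: when retrieval finds nothing, the answer declines instead of inventing content.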
🧠 Notes
- The workflow assumes valid API keys for OpenAI and Astra DB; credentials are injected via the respective components.
- The vector store is refreshed only when the ingestion pipeline is triggered; subsequent queries reuse the existing index.
- Retrieval uses cosine similarity by default; in Astra DB the similarity metric is a property of the vector collection and is set when the collection is created.
- The agent's system prompt is fixed in the flow and instructs it to answer based solely on retrieved data, which helps keep responses grounded in the documents.
- Conversation history can optionally be stored (Chat Input’s `should_store_message` flag) to support context-aware replies.
- Batch ingestion and retrieval operations run in parallel where supported, improving throughput for large document sets.
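For reference, cosine similarity compares a query embedding $\mathbf{q}$ with a stored chunk embedding $\mathbf{d}$ by the angle between them, ignoring their lengths. The worked example uses made-up 2-dimensional vectors purely for illustration; real OpenAI embeddings have far more dimensions.

$$
\begin{aligned}
\cos(\mathbf{q}, \mathbf{d}) &= \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
= \frac{\sum_{i=1}^{n} q_i d_i}{\sqrt{\sum_{i=1}^{n} q_i^2}\,\sqrt{\sum_{i=1}^{n} d_i^2}} \\[4pt]
\mathbf{q} = (3, 4),\ \mathbf{d} = (4, 3):\quad
\cos(\mathbf{q}, \mathbf{d}) &= \frac{3 \cdot 4 + 4 \cdot 3}{5 \cdot 5} = \frac{24}{25} = 0.96
\end{aligned}
$$

Values close to 1 mean the chunk points in nearly the same direction as the query, so it is ranked as more relevant.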