graph TD
%%{init: {'theme': 'mc','layout': 'elk'}}%%
ChatOutput-s3yx9[Chat Output]
style ChatOutput-s3yx9 stroke:#a170ff
ChatInput-nbeu2[Question]
style ChatInput-nbeu2 stroke:#a170ff
GDriveFilesComponent-ield1[Get document]
style GDriveFilesComponent-ield1 stroke:#a170ff
Prompt-bqb7h[Instructions]
style Prompt-bqb7h stroke:#a170ff
ParseData-u86r4[Get text]
style ParseData-u86r4 stroke:#a170ff
Chroma-rr0og[Upload to DB]
style Chroma-rr0og stroke:#a170ff
OpenAIEmbeddings-8lgaa[OpenAI Embeddings]
style OpenAIEmbeddings-8lgaa stroke:#a170ff
LanguageRecursiveTextSplitter-60k4u[Text splitter]
style LanguageRecursiveTextSplitter-60k4u stroke:#a170ff
Chroma-v9w2i[Retrieve from DB]
style Chroma-v9w2i stroke:#a170ff
OpenAIEmbeddings-3cp2j[OpenAI Embeddings2]
style OpenAIEmbeddings-3cp2j stroke:#a170ff
OpenAIModel-h9hjf[OpenAI]
style OpenAIModel-h9hjf stroke:#a170ff
ParseData-u86r4 -.- Prompt-bqb7h
linkStyle 0 stroke:#a170ff
ChatInput-nbeu2 -.- Prompt-bqb7h
linkStyle 1 stroke:#a170ff
GDriveFilesComponent-ield1 -.- LanguageRecursiveTextSplitter-60k4u
linkStyle 2 stroke:#a170ff
LanguageRecursiveTextSplitter-60k4u -.- Chroma-rr0og
linkStyle 3 stroke:#a170ff
OpenAIEmbeddings-8lgaa -.- Chroma-rr0og
linkStyle 4 stroke:#a170ff
Chroma-v9w2i -.- ParseData-u86r4
linkStyle 5 stroke:#a170ff
OpenAIEmbeddings-3cp2j -.- Chroma-v9w2i
linkStyle 6 stroke:#a170ff
Prompt-bqb7h -.- OpenAIModel-h9hjf
linkStyle 7 stroke:#a170ff
OpenAIModel-h9hjf -.- ChatOutput-s3yx9
linkStyle 8 stroke:#a170ff

Chat with Your Documents

🧩 Overview

The workflow lets a user ask natural‑language questions about documents stored in Google Drive. It retrieves the selected document, splits the text into chunks, embeds the chunks, and stores the embeddings in a vector database. At question time it looks up the most relevant chunks and passes them, together with the question, to an OpenAI language model that answers in a conversational style. The answer is presented as a chat message, giving an AI‑powered Q&A experience over existing documents.

⚙️ Main Features

Document retrieval from Google Drive using a user‑selected file.
Text chunking that respects language structure and avoids breaking sentences.
Embedding generation with OpenAI models for semantic indexing.
Vector store integration (Chroma) for fast similarity search.
Dynamic prompt construction that injects relevant document excerpts.
LLM response generation via OpenAI, outputting natural language answers.
Chat‑style output that displays the answer as a message in the playground.
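
The sentence-preserving chunking mentioned above can be illustrated with a small sketch. This is not the splitter component's actual algorithm (which is language-aware and recursive); it is a simplified stand-in that packs whole sentences greedily. The function name split_into_chunks and the max_chars parameter are hypothetical.

```python
import re


def split_into_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_chars.

    Hypothetical helper for illustration only. A single sentence longer
    than max_chars is kept whole rather than cut in the middle.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow: close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

A smaller max_chars gives more precise retrieval hits but less context per chunk, so the value is usually tuned against the kind of questions being asked.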

🔄 Workflow Steps

Component Name	Role in the Workflow	Key Inputs	Key Outputs
Chat Input	Captures the user’s question.	Text of the question.	Message containing the question.
Google Drive File Retrieval	Loads the selected document from Drive.	File ID or selection.	Raw document data (binary/text).
Text Splitter	Divides the document into language‑aware chunks.	Raw document data.	List of text chunks.
OpenAI Embeddings	Converts text chunks into dense vectors.	List of text chunks.	One embedding vector per chunk.
Chroma Vector Store (Add)	Stores embeddings for later retrieval.	Embedding vectors.	Persisted vector store entry.
Chroma Vector Store (Search)	Finds the most relevant chunks for the user query.	User query, vector store.	Subset of document chunks (relevant excerpts).
Parse Data	Converts retrieved chunks into plain text.	Relevant document chunks.	Concatenated text excerpt.
Prompt Builder	Creates a prompt that includes the excerpt and the question.	Extracted text, user question.	Prompt message ready for the LLM.
OpenAI Model	Generates an answer to the prompt.	Prompt message.	Generated text response.
Chat Output	Presents the answer in the playground as a chat message.	Generated text.	Chat message displayed to the user.
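
The steps in the table can be sketched end to end in plain Python. This is a toy model of the data flow under stated assumptions, not the workflow's implementation: toy_embed is a bag-of-words stand-in for the OpenAI embedding model, InMemoryStore stands in for Chroma, and all names (toy_embed, cosine, InMemoryStore, build_prompt) are hypothetical.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class InMemoryStore:
    """Stand-in for the vector store: keeps (chunk, vector) pairs in a list."""

    def __init__(self) -> None:
        self.entries: list[tuple[str, Counter]] = []

    def add(self, chunks: list[str]) -> None:
        # "Add" step: embed each chunk and store it with its vector.
        for chunk in chunks:
            self.entries.append((chunk, toy_embed(chunk)))

    def search(self, query: str, k: int = 2) -> list[str]:
        # "Search" step: rank stored chunks by similarity to the query.
        query_vec = toy_embed(query)
        ranked = sorted(
            self.entries, key=lambda e: cosine(query_vec, e[1]), reverse=True
        )
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(excerpts: list[str], question: str) -> str:
    """Prompt step: inject the retrieved excerpts and the user's question."""
    context = "\n".join(excerpts)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


store = InMemoryStore()
store.add(["cats sleep a lot", "the invoice is due in march", "dogs bark loudly"])
top = store.search("when is the invoice due", k=1)
prompt = build_prompt(top, "when is the invoice due")
```

In the real workflow the resulting prompt would be sent to the OpenAI model and the reply shown by Chat Output; those two steps are omitted here because they need an API key.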

🧠 Notes

The workflow requires valid Google Drive credentials and an OpenAI API key.
The vector store persists data under the directory chat_wiht_documents; ensure this path is writable.
Only the most recent 100 documents are indexed when using the “All Files in Drive” mode to avoid exceeding API limits.
The embedding operation is limited to 10,000 tokens per request to comply with OpenAI’s token restrictions.
The similarity search uses a cosine similarity threshold of 0.1; this can be tuned for stricter or looser matching.
The OpenAI model defaults to gpt-4.1; users can switch to another model via the model_name parameter, which may change token limits and cost.
All components operate in batch mode where possible, improving throughput for large documents.
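
Two of the limits above, the 10,000-token cap per embedding request and the 0.1 similarity threshold, can be sketched as follows. This is an illustration under assumptions, not the components' code: token counts are approximated by whitespace-separated words (a real tokenizer will give different numbers), the threshold is treated as inclusive, and scores are assumed to be similarities where higher means closer. The names batch_by_token_budget and filter_by_score are hypothetical.

```python
from typing import Callable


def batch_by_token_budget(
    chunks: list[str],
    max_tokens: int = 10_000,
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
) -> list[list[str]]:
    """Group chunks so each batch stays within max_tokens.

    count_tokens defaults to a rough word count (assumption). A single
    chunk larger than the budget still gets a batch of its own.
    """
    batches: list[list[str]] = []
    current: list[str] = []
    used = 0
    for chunk in chunks:
        n = count_tokens(chunk)
        if current and used + n > max_tokens:
            # Budget would be exceeded: start a new batch.
            batches.append(current)
            current, used = [], 0
        current.append(chunk)
        used += n
    if current:
        batches.append(current)
    return batches


def filter_by_score(
    scored: list[tuple[str, float]], threshold: float = 0.1
) -> list[str]:
    """Keep results whose similarity score is at or above the threshold."""
    return [text for text, score in scored if score >= threshold]
```

Raising the threshold returns fewer but more closely matching excerpts; lowering it admits looser matches. If the store reports distances rather than similarities, the comparison would need to be inverted.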