CV Match
A flow that compares CVs against a set of requirements, calculates match scores, and returns the candidate with the best fit.
🧩 Overview
The workflow automates the comparison of candidate CVs against a set of user‑defined requirements. It retrieves CV documents from Google Drive, splits them into manageable chunks, embeds the text, stores the embeddings in a Chroma vector database, retrieves the most relevant CVs, extracts structured data, constructs prompts from user inputs, and finally generates a recommendation for the best‑matching candidate using an OpenAI language model.
⚙️ Main Features
- Retrieves and ingests CV files from a specified Google Drive folder.
- Splits CV text into language‑aware chunks for efficient embedding.
- Generates embeddings for both ingestion and search using OpenAI embeddings.
- Stores embeddings in a Chroma vector store and retrieves relevant CVs.
- Parses retrieved CV data into a plain‑text format.
- Builds dynamic prompts from user inputs (education, experience, etc.).
- Generates matching instructions with an OpenAI language model.
- Displays the final recommendation in a chat output.
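The ingest-and-search loop above can be sketched in a few lines. The following is a minimal, self-contained illustration in which a toy bag-of-words vector and cosine similarity stand in for the OpenAI embeddings and the Chroma store; all names and data are hypothetical:

```python
import math

def toy_embed(text: str, vocab: list[str]) -> list[float]:
    # Toy bag-of-words vector -- a stand-in for OpenAI embeddings.
    words = text.lower().split()
    vec = [float(words.count(w)) for w in vocab]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

# "Subir a DB": ingest CV texts into an in-memory store (stand-in for Chroma).
cvs = {
    "alice.pdf": "Python developer with machine learning experience",
    "bob.pdf": "Accountant with certifications in corporate finance",
}
requirements = "machine learning engineer with Python skills"
vocab = sorted({w for t in [*cvs.values(), requirements] for w in t.lower().split()})
store = {name: toy_embed(text, vocab) for name, text in cvs.items()}

# "Cargar de DB": embed the requirements and rank CVs by similarity.
query = toy_embed(requirements, vocab)
best = max(store, key=lambda name: cosine(query, store[name]))
print(best)  # -> alice.pdf
```

In the actual flow, the embeddings come from the OpenAI API and the similarity search is delegated to Chroma, but the shape of the computation is the same.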
🔄 Workflow Steps
| Component Name | Role in the Workflow | Key Inputs | Key Outputs |
|---|---|---|---|
| Obtener CV | Reads CV files from a Google Drive folder and delivers them to the pipeline. | Google Drive folder path, file selection settings. | CV file content as Data. |
| Language Recursive Text Splitter | Splits the CV text into language‑aware chunks for embedding. | CV Data. | Chunked Data. |
| OpenAI Embeddings (ingest) | Creates embeddings for each chunk to be stored in the vector database. | Chunked Data. | Embedding vectors. |
| Subir a DB | Ingests chunked data and their embeddings into a Chroma vector store. | Chunked Data, Embedding vectors. | Stored embeddings in Chroma. |
| OpenAI Embeddings (search) | Generates embeddings for the search query (CV). | Search query Data (CV). | Embedding vectors for search. |
| Cargar de DB | Retrieves the most similar CVs from Chroma based on the search embeddings. | Search query Data, embedding vectors. | Search results as Data. |
| ParseData | Converts the retrieved CV data into plain text suitable for prompt construction. | Retrieved CV Data. | Plain‑text Data (CV text). |
| Carrera (Text Input) | Captures the required field of study. | User input. | Text. |
| Certificados (Text Input) | Captures the required certifications. | User input. | Text. |
| Educación (Text Input) | Captures the required education details. | User input. | Text. |
| Experiencia (Text Input) | Captures the required work experience. | User input. | Text. |
| Habilidades (Text Input) | Captures the required skills. | User input. | Text. |
| Requisitos | Builds a prompt template containing placeholders for the user inputs. | Carrera, Certificados, Educación, Experiencia, Habilidades. | Prompt message with placeholders. |
| Instrucciones | Generates a user‑friendly prompt that includes the CV text and the user’s input. | CV text (from ParseData), user’s query (from Requisitos). | Final prompt message. |
| OpenAI Model | Generates the recommendation by responding to the prompt. | Prompt message. | Generated text. |
| Chat Output | Displays the recommendation in the playground chat. | Generated text. | Chat message shown to the user. |
All components are executed sequentially as dictated by the directed edges in the workflow diagram.
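The Requisitos and Instrucciones steps boil down to filling templates with the user inputs and the retrieved CV text. A hedged sketch, with hypothetical template wording (the actual templates live inside the flow's components):

```python
# Hypothetical template mirroring the Requisitos component's placeholders.
REQUISITOS_TEMPLATE = (
    "Find the candidate matching these requirements:\n"
    "Field of study: {carrera}\n"
    "Certifications: {certificados}\n"
    "Education: {educacion}\n"
    "Experience: {experiencia}\n"
    "Skills: {habilidades}"
)

# Hypothetical template mirroring the Instrucciones component.
INSTRUCCIONES_TEMPLATE = (
    "You are a recruiting assistant. Given the CVs below, recommend the "
    "candidate that best fits the requirements.\n\n"
    "CVs:\n{cv_text}\n\nRequirements:\n{requisitos}"
)

requisitos = REQUISITOS_TEMPLATE.format(
    carrera="Computer Science",
    certificados="AWS Certified Developer",
    educacion="BSc",
    experiencia="3 years backend development",
    habilidades="Python, SQL",
)
prompt = INSTRUCCIONES_TEMPLATE.format(
    cv_text="<plain text from ParseData>",
    requisitos=requisitos,
)
print(prompt)
```

The resulting prompt is what the OpenAI Model component receives; the Chat Output component then displays the model's reply.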
🧠 Notes
- Credentials: The workflow requires valid Google Drive and OpenAI API credentials.
- Embeddings: Two separate OpenAI embeddings components are used; one for ingesting CV chunks and one for generating search queries.
- Vector Store: Chroma is configured to persist embeddings in the `CV_match` directory and to retrieve up to 10 similar documents.
- Chunking: The splitter uses a chunk size of 1000 tokens with a 200-token overlap to preserve context.
- Prompt Generation: The `Requisitos` component dynamically inserts user inputs into a template that describes the required CV attributes.
- Error Handling: If no matching CVs are found, the ParseData component returns empty text, causing the OpenAI model to respond accordingly.
- Performance: Embedding generation and database ingestion are the most compute‑heavy steps; caching is enabled for these components to reduce repeated calls.
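To make the chunking note concrete, here is a simplified sliding-window splitter using the same size and overlap settings. It splits on characters rather than tokens and ignores the language-aware separator logic of the real Language Recursive Text Splitter, so treat it as an illustration of the overlap behaviour only:

```python
def split_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    # Simplified sliding-window splitter: each chunk repeats the last
    # `overlap` characters of the previous one to preserve context.
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

text = "".join(str(i % 10) for i in range(2500))
chunks = split_text(text)
print(len(chunks), [len(c) for c in chunks])  # -> 3 [1000, 1000, 900]
```

Note how consecutive chunks share their boundary region (`chunks[0][-200:] == chunks[1][:200]`), which is what lets an embedded chunk carry context from its neighbour.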
This documentation provides a concise, functional overview of the workflow, its components, and their interactions without delving into internal implementation details.