Invoice data extractor
This flow automates data extraction from unstructured invoices.
graph TD
%%{init: {'theme': 'mc','layout': 'elk'}}%%
ParseData-8kr6e[Parse Data]
style ParseData-8kr6e stroke:#a170ff
Prompt-0gsjq[Information Extractor]
style Prompt-0gsjq stroke:#a170ff
OpenAIModel-m3qyl[OpenAI]
style OpenAIModel-m3qyl stroke:#a170ff
TextInput-84vxn[Text Input]
style TextInput-84vxn stroke:#a170ff
GDriveFilesComponent-7oth8[Drive File Manager]
style GDriveFilesComponent-7oth8 stroke:#a170ff
TextOutput-0lnck[Text Output]
style TextOutput-0lnck stroke:#a170ff
ParseData-8kr6e -.- Prompt-0gsjq
linkStyle 0 stroke:#a170ff
Prompt-0gsjq -.- OpenAIModel-m3qyl
linkStyle 1 stroke:#a170ff
GDriveFilesComponent-7oth8 -.- ParseData-8kr6e
linkStyle 2 stroke:#a170ff
TextInput-84vxn -.- GDriveFilesComponent-7oth8
linkStyle 3 stroke:#a170ff
OpenAIModel-m3qyl -.- TextOutput-0lnck
linkStyle 4 stroke:#a170ff
🧩 Overview
The Invoice Data Extractor automates the extraction of structured information from unstructured invoice documents stored in Google Drive. The flow retrieves the file, converts it to plain text, sends a dynamically built prompt to an OpenAI model, and displays the extracted data. This reduces manual data entry in billing and accounting and can improve accuracy.
⚙️ Main Features
- Retrieves invoices directly from a specified Google Drive folder.
- Converts document content into plain text using a templated parser.
- Generates a custom extraction prompt that lists all required fields.
- Sends the prompt to an OpenAI language model to extract structured data.
- Outputs the extracted information in a readable text format for end‑users.
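The prompt-building step can be sketched in plain Python. This is an illustration only, not the flow's actual prompt: the field names in `MANDATORY_FIELDS` and `OPTIONAL_FIELDS`, the template wording, and the `build_prompt` helper are all assumptions made for the example.

```python
from string import Template

# Hypothetical field lists; the flow's real prompt may name different fields.
MANDATORY_FIELDS = ["invoice_number", "invoice_date", "vendor_name", "total_amount"]
OPTIONAL_FIELDS = ["purchase_order", "due_date"]

# Assumed prompt wording. $invoice_text is filled with the parsed invoice text.
PROMPT_TEMPLATE = Template(
    "Extract the following fields from the invoice text below.\n"
    "Mandatory fields: $mandatory\n"
    "Optional fields (include only if present): $optional\n"
    "Return a single JSON object and nothing else.\n\n"
    "Invoice text:\n$invoice_text"
)


def build_prompt(invoice_text: str) -> str:
    """Embed the invoice text and the field lists into one prompt string."""
    # substitute() only interprets "$" in the template itself, so currency
    # symbols inside the invoice text are inserted literally.
    return PROMPT_TEMPLATE.substitute(
        mandatory=", ".join(MANDATORY_FIELDS),
        optional=", ".join(OPTIONAL_FIELDS),
        invoice_text=invoice_text,
    )


if __name__ == "__main__":
    print(build_prompt("Invoice #123 Total: $45.00"))
```

Listing the fields explicitly in the prompt keeps the extraction target in one place, so adding a field means editing a list rather than rewriting the instructions.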
🔄 Workflow Steps
| Component Name | Role in the Workflow | Key Inputs | Key Outputs |
|---|---|---|---|
| Text Input | Provides the Google Drive folder URL that contains the invoice file. | Folder URL (text) | Folder ID (used by the Drive File Manager) |
| Drive File Manager | Retrieves the invoice file from the specified folder. | Folder ID | File content (binary or data format) |
| Parse Data | Transforms the raw file content into plain text according to a template. | File content | Invoice text (Message) |
| Prompt | Builds a prompt that embeds the invoice text and lists the fields to extract. | Invoice text | Prompt message |
| OpenAI Model | Executes the prompt with the selected LLM to extract structured data. | Prompt message | Extracted fields (Message) |
| Text Output | Displays the extracted data to the user. | Extracted fields | Visible text output |
Note: The Label Component is used only for display purposes and is not part of the execution flow.
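The first row of the table maps a folder URL to a folder ID. How the flow performs that mapping is not stated here. A minimal sketch, assuming the usual shared-folder link shape in which the ID follows a `/folders/` path segment, might look like this (`folder_id_from_url` is a hypothetical helper, not part of any component):

```python
import re
from typing import Optional

# Assumes the folder ID is the path segment right after "/folders/".
_FOLDER_ID_RE = re.compile(r"/folders/([A-Za-z0-9_-]+)")


def folder_id_from_url(url: str) -> Optional[str]:
    """Return the folder ID found in a Drive folder URL, or None if absent."""
    match = _FOLDER_ID_RE.search(url)
    return match.group(1) if match else None


if __name__ == "__main__":
    print(folder_id_from_url("https://drive.google.com/drive/folders/1AbC_dEf-123?usp=sharing"))
```

Returning `None` instead of raising lets the caller decide whether a malformed URL should stop the flow or fall back to treating the input as a raw ID.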
🧠 Notes
- The workflow relies on valid Google Drive and OpenAI credentials configured in the component settings.
- The OpenAI model operates in text mode; if JSON output is required, ensure the prompt contains a clear instruction to return JSON.
- Parsing accuracy depends on the quality of the PDF or image; OCR can be enabled in the Drive File Manager if needed.
- The extraction prompt lists all mandatory invoice fields; optional fields are included in the output only if they are present in the source.
- The final output is presented as plain text; downstream processes may need to convert it into a structured format (e.g., CSV or JSON) for integration with other systems.
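For downstream integration, the plain-text output can be converted after the flow finishes. The sketch below assumes the prompt asked the model to return a JSON object, and tolerates extra text around it. The function names and the column list are hypothetical and chosen only for illustration.

```python
import csv
import io
import json
from typing import Sequence


def parse_model_output(text: str) -> dict:
    """Pull the outermost {...} span out of a plain-text reply and decode it."""
    start = text.find("{")
    end = text.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in model output")
    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    return json.loads(text[start : end + 1])


def to_csv(record: dict, columns: Sequence[str]) -> str:
    """Render one record as CSV text with a header row."""
    buffer = io.StringIO()
    # Missing optional fields become empty cells; unexpected keys are dropped.
    writer = csv.DictWriter(
        buffer, fieldnames=list(columns), restval="", extrasaction="ignore"
    )
    writer.writeheader()
    writer.writerow(record)
    return buffer.getvalue()


if __name__ == "__main__":
    reply = 'Here is the data: {"invoice_number": "INV-1", "total_amount": "45.00"} Done.'
    record = parse_model_output(reply)
    print(to_csv(record, ["invoice_number", "total_amount", "due_date"]))
```

Because the model replies in text mode, the reply is not guaranteed to be valid JSON, so callers should be prepared to catch `ValueError` and retry or flag the invoice for manual review.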