Google Drive Node
Overview
The Google Drive Node is a file synchronization component that automates fetching and synchronizing files from Google Drive. This node supports various file types, including Google Docs, Slides, Word Documents, PDFs, and text files, enabling regular file synchronization to support vectorization and indexing for Retrieval-Augmented Generation (RAG) flows.
Node Type Information
Type | Description | Status |
---|---|---|
Batch Trigger | Starts the flow on a schedule or batch event. Ideal for periodic data processing. | ✅ True |
Event Trigger | Starts the flow based on external events (e.g., webhook, user interaction). | ❌ False |
Action | Executes a task or logic as part of the flow (e.g., API call, transformation). | ❌ False |
This node is a Batch Trigger node that automates file fetching and synchronization from Google Drive on a scheduled basis.
This node is a Batch Trigger node that streamlines file collection from Google Drive and prepares files for vectorization and indexing to enhance RAG flows.
To use the Google Drive node, you need to create a separate flow to Implement RAG. You can integrate it into this distinct flow.
Features
Key Functionalities
- Batch Trigger: Automates file fetching and synchronization on a schedule using Cron expressions for regular execution.
- File Type Support: Handles multiple file formats including Google Docs, Slides, Word Documents, PDFs, and text files.
- Synchronization Modes: Supports both incremental (process new/updated files only) and full-refresh (reprocess all files) modes.
- Folder-based Processing: Processes files from specified Google Drive folders with configurable access permissions.
Benefits
- Streamlined File Collection: Automates the process of collecting files from Google Drive, reducing manual effort and ensuring consistency.
- RAG Flow Preparation: Prepares files for vectorization and indexing to enhance Retrieval-Augmented Generation workflows.
- Scheduled Synchronization: Enables regular file updates through configurable scheduling, keeping data current.
- Flexible Processing: Supports both incremental and full synchronization modes to optimize processing efficiency.
Prerequisites
Before using Google Drive Node, ensure the following:
- Google Drive Credentials: Valid Google authentication details for accessing the desired folder.
- Target Folder URL: The specific Google Drive folder URL from which files will be fetched.
- Cron Expression Knowledge: Understanding of Cron expressions for scheduling flows.
- RAG Flow Setup: A separate flow configured for implementing Retrieval-Augmented Generation.
Setup
Step 1: Set Up Google Drive Access
-
Create Google Drive Credentials:
- Set up Google OAuth credentials for Drive API access
- Ensure proper permissions for the target folder
- Test folder accessibility and file permissions
-
Identify Target Folder:
- Copy the Google Drive folder URL
- Verify folder contains supported file types
- Ensure folder is accessible with provided credentials
Step 2: Configure Google Drive Credentials
Use the following format to set up your credentials:
Key Name | Description | Example Value |
---|---|---|
Credential Name | Name to identify this set of credentials | my-google-drive-creds |
Google OAuth | Google authentication details for Drive access | Google Drive OAuth |
Step 3: Set Up Lamatic Flow
-
Create a Custom Flow for Google Drive:
- Configure the Google Drive node
- Set up scheduling parameters
- Define synchronization preferences
-
Use Pre-built Usecases from the Lamatic Studio
Configuration Reference
Batch Trigger Configuration
Parameter | Description | Required | Example |
---|---|---|---|
Name | Display name for the node | ✅ | Google Drive |
Credentials | Google authentication details required to connect to the Drive folder | ✅ | my-google-drive-credentials |
Folder | The folder from which files will be fetched | ✅ | https://drive.google.com/drive/folders/{folder-id} |
Sync Mode | Specify either incremental (process new/updated files only) or full-refresh (reprocess all files) | ✅ | incremental |
Sync Schedule | Frequency for running the flow, specified using a Cron expression | ✅ | 0 0 05 1/1 * ? * UTC |
Low-Code Example
triggerNode:
nodeId: triggerNode_1
nodeType: googleDriveNode
nodeName: Google Drive
values:
process: batch
syncMode: incremental_append
credentials: Google Drive OAuth
cronExpression: 0 0 */12 ? * * UTC
folderUrl: https://drive.google.com/drive/folders/YOUR_ID
Testing
For testing purposes, you need to manually provide content
and document_key
as dummy data. Once deployed, these values will be automatically fetched from your configured Google Drive folder.
Test Configuration
{
"document_key": "test_document.pdf",
"content": "This is test content that will be processed for RAG flows"
}
document_key
: Serves as the unique identifier for the document, used to locate or update the specific file or filename.content
: Represents the data or text that will be stored in the document.
Output
Batch Trigger Output
document_key
: String identifier for the document (filename or unique key)content
: String containing the document content or text datametadata
: Additional file information (if available)file_type
: Type of the processed filelast_modified
: Timestamp of last modificationfile_size
: Size of the file in bytes
Troubleshooting
Common Issues
Problem | Solution |
---|---|
Invalid Credentials | Verify that the correct Google Drive credentials are provided |
Folder Not Found | Ensure the folder URL is accurate and accessible |
Sync Not Working | Check the Cron expression for correctness |
File Types Unsupported | Confirm that the files in the folder are of supported types (Google Docs, PDFs, etc.) |
Permission Errors | Verify Google Drive API access permissions and folder sharing settings |
Scheduling Issues | Validate Cron expression syntax and timezone settings |
Debugging
- Check Lamatic Flow logs for detailed error messages
- Verify Google Drive API access permissions
- Test the folder URL to ensure it's valid and accessible
- Validate Cron expression syntax using online Cron validators
- Monitor file processing logs for specific file errors
- Test with a small folder containing few files before scaling up
- Verify network connectivity and API rate limits