OneDrive Integration

Overview

The OneDrive integration in Lamatic connects to your Microsoft OneDrive Business account and automatically syncs documents for processing in Lamatic Flow. It supports several file types and is intended for automated document intelligence and RAG workflows.

Features

✅ Key Functionalities

Document Syncing: Automatically syncs documents from OneDrive drives and folders
File Type Support: Handles PDFs, Word documents, PowerPoint files, plain text, and Markdown (see Supported File Types)
Scheduled Processing: Supports automated sync schedules with cron expressions
Selective Filtering: Use glob patterns to filter specific file types and paths
Multiple Sync Modes: Supports both incremental (new/modified files only) and full-refresh (all files) synchronization modes
Search Scope Control: Configurable search scope including accessible drives, shared items, or all files

✅ Benefits

Automates document collection from OneDrive repositories
Enables RAG workflows with organizational knowledge
Provides granular control over file selection and processing
Supports both incremental and full-refresh synchronization modes

Available Functionality

Event Triggers

✅ Scheduled document syncing from OneDrive drives
✅ Support for multiple file types (PDF, DOCX, PPTX, etc.)
✅ Folder-specific monitoring and filtering
✅ Incremental and full-refresh sync modes

Actions

✅ Parse and extract text from documents
✅ Vectorize content for RAG workflows
✅ Filter files using glob patterns
✅ Schedule automated sync operations

Prerequisites

Before setting up the OneDrive integration, ensure you have:

A Microsoft 365 account with OneDrive Business access
Appropriate permissions to access OneDrive files and drives
Your organization's Tenant ID from Microsoft Entra Admin Center
Understanding of OneDrive folder structure and file organization
API Permissions: Proper OneDrive API permissions for accessing selected files and drives

Setup

Step 1: Set Up Microsoft 365 Credentials

Get Tenant ID: Navigate to the Microsoft Entra Admin Center
Access Azure Active Directory: Go to the Azure Active Directory (now Microsoft Entra ID) section
Copy Tenant ID: Under Tenant Information, copy the Tenant ID (also called Directory ID)

⚠️ Ensure you have appropriate OneDrive API permissions to access the selected files and drives.
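
Before pasting the Tenant ID into the node, it can help to check that the copied value is a well-formed GUID, since a truncated or mistyped ID is a common cause of authentication failures. The following is a minimal sketch using only the Python standard library; the function name `is_valid_tenant_id` is hypothetical and not part of Lamatic, and the check only verifies the format, not that the tenant exists.

```python
import uuid


def is_valid_tenant_id(value: str) -> bool:
    """Return True if value is a GUID in canonical 8-4-4-4-12 form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces or missing hyphens; require the
    # hyphenated form that the admin center displays.
    return str(parsed) == value.lower()


if __name__ == "__main__":
    # Placeholder value for illustration only, not a real tenant.
    print(is_valid_tenant_id("11111111-2222-3333-4444-555555555555"))
```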

Step 2: Configure OneDrive Node

Add OneDrive Node: Drag the OneDrive node to your flow
Enter Credentials: Provide your Microsoft 365 Tenant ID
Configure Drive Name: Enter the name of your OneDrive drive (usually "OneDrive")
Set Folder Path: Specify the folder path within the drive (use "." for all folders)

Step 3: Test and Deploy

Test Connection: Verify the node can access your OneDrive account
Configure Sync Settings: Set up sync mode, schedule, and file filters
Deploy Flow: Activate the flow to start syncing documents

Configuration Reference

OneDrive Node Parameters

Parameter	Description	Required	Default	Example
Credentials	Microsoft 365 credentials with access to OneDrive files	✅	-	`Microsoft 365`
Drive Name	Name of the connected OneDrive drive	✅	-	`OneDrive`
Folder Path	Path within the drive to target. Use `"."` to sync all folders	✅	-	`.`
Globs (Path Patterns)	Glob pattern for matching files	❌	`**`	`**/*.pdf`, `**/*.docx`
Sync Mode	Controls how files are re-indexed: `full_refresh` or `incremental`	✅	`incremental`	`incremental`
Sync Schedule	Schedule for automated syncs	❌	-	`Every 24 hours`
Search Scope	Scope of files to include: `ACCESSIBLE_DRIVES`, `SHARED_ITEMS`, or `ALL`	✅	`ALL`	`ALL`
Parsing Strategy	Strategy for extracting content: `fast`, `ocr_only`, or `hi_res`	✅	`fast`	`hi_res`
Days To Sync If History Is Full	Limit sync to files modified in the last N days if sync state is full	❌	`30`	`30`
Start Date	Ignore files modified before this UTC datetime (ISO format)	❌	-	`2024-01-01T00:00:00.000000Z`
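
The required parameters and enumerated values in the table above can be checked before a flow is deployed. The sketch below is one possible pre-flight check, assuming the node values use the same key names as the low-code example in this document (`sync_mode`, `search_scope`, and so on). The function `validate_node_values` is hypothetical and is not a Lamatic API.

```python
# Allowed values taken from the parameter table.
ALLOWED = {
    "sync_mode": {"incremental", "full_refresh"},
    "search_scope": {"ACCESSIBLE_DRIVES", "SHARED_ITEMS", "ALL"},
    "parsing_strategy": {"fast", "ocr_only", "hi_res"},
}

# Parameters marked as required in the table.
REQUIRED = (
    "credentials",
    "drive_name",
    "folder_path",
    "sync_mode",
    "search_scope",
    "parsing_strategy",
)


def validate_node_values(values: dict) -> list[str]:
    """Return a list of human-readable problems; empty means no issues found."""
    errors = []
    for key in REQUIRED:
        if not values.get(key):
            errors.append(f"missing required parameter: {key}")
    for key, allowed in ALLOWED.items():
        value = values.get(key)
        if value and value not in allowed:
            errors.append(f"{key} must be one of {sorted(allowed)}, got {value!r}")
    return errors


if __name__ == "__main__":
    config = {
        "credentials": "Microsoft 365",
        "drive_name": "OneDrive",
        "folder_path": ".",
        "sync_mode": "incremental",
        "search_scope": "ALL",
        "parsing_strategy": "fast",
    }
    print(validate_node_values(config))
```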

Supported File Types

Only specific file types are currently supported for vectorization and indexing in Lamatic. Using unsupported formats may result in parsing errors during synchronization.

✅ Allowed File Extensions

.pdf — PDF Documents
.txt — Plain Text Files
.docx — Microsoft Word
.pptx — Microsoft PowerPoint
.md — Markdown Files

To avoid sync issues, ensure your glob patterns are configured to include only these types.

🔍 Recommended Glob Pattern

    globs:
    - "**/*.pdf"
    - "**/*.txt"
    - "**/*.docx"
    - "**/*.pptx"
    - "**/*.md"
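
To see which files in a folder listing would fall outside the supported types before a sync runs, a simple extension check is enough. This is a minimal sketch based on the extension list above; `split_by_support` is a hypothetical helper, and the case-insensitive comparison is an assumption about how extensions are treated.

```python
from pathlib import PurePosixPath

# Extensions listed under "Allowed File Extensions".
SUPPORTED_EXTENSIONS = {".pdf", ".txt", ".docx", ".pptx", ".md"}


def is_supported(path: str) -> bool:
    """Check the file extension, ignoring case (assumed behaviour)."""
    return PurePosixPath(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths):
    """Partition paths into (supported, unsupported), preserving order."""
    supported = [p for p in paths if is_supported(p)]
    unsupported = [p for p in paths if not is_supported(p)]
    return supported, unsupported


if __name__ == "__main__":
    ok, skipped = split_by_support(["HR/Policy.PDF", "notes.md", "budget.xlsx"])
    print("supported:", ok)
    print("unsupported:", skipped)
```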

Sync Configuration Options

Sync Modes

# Incremental Sync (recommended)
sync_mode: "incremental"  # Only sync new/modified files
 
# Full Refresh
sync_mode: "full_refresh"  # Re-index all files
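
The difference between the two modes can be pictured with a small selection function. This is only an illustration, assuming the connector remembers the time of the last successful sync; the source does not describe the actual mechanism, and `select_files` and `last_sync` are hypothetical names.

```python
from datetime import datetime, timezone


def select_files(files, sync_mode, last_sync=None):
    """Return the files a sync run would process under the given mode.

    files: list of dicts with a timezone-aware "last_modified" datetime.
    last_sync: time of the previous successful sync, or None on first run.
    """
    if sync_mode not in ("incremental", "full_refresh"):
        raise ValueError(f"unknown sync_mode: {sync_mode!r}")
    if sync_mode == "full_refresh" or last_sync is None:
        # Full refresh (or a first run) re-indexes everything.
        return list(files)
    # Incremental: only files changed since the previous sync.
    return [f for f in files if f["last_modified"] > last_sync]


if __name__ == "__main__":
    files = [
        {"name": "old.pdf", "last_modified": datetime(2024, 1, 1, tzinfo=timezone.utc)},
        {"name": "new.pdf", "last_modified": datetime(2024, 3, 1, tzinfo=timezone.utc)},
    ]
    cutoff = datetime(2024, 2, 1, tzinfo=timezone.utc)
    print([f["name"] for f in select_files(files, "incremental", cutoff)])
```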

Schedule Examples

# Daily at midnight
sync_schedule: "0 0 * * *"
 
# Every 6 hours
sync_schedule: "0 */6 * * *"
 
# Weekdays only at 9 AM
sync_schedule: "0 9 * * 1-5"
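
A malformed cron expression is one reason a sync may never be scheduled, so a quick format check can save a debugging round. The sketch below validates the standard five-field form (minute, hour, day of month, month, day of week) with numbers, ranges, lists, and steps. It assumes Lamatic accepts this common dialect; named months or weekdays and wrap-around ranges are deliberately not handled, and `is_valid_cron` is a hypothetical helper.

```python
# (field name, minimum, maximum) for the five standard cron fields.
FIELD_RANGES = [
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day of month", 1, 31),
    ("month", 1, 12),
    ("day of week", 0, 7),  # both 0 and 7 commonly mean Sunday
]


def _valid_part(part: str, lo: int, hi: int) -> bool:
    """Validate one comma-separated item such as '*', '*/6', '5' or '1-5'."""
    base, _, step = part.partition("/")
    if "/" in part and (not step.isdecimal() or int(step) == 0):
        return False
    if base == "*":
        return True
    start, sep, end = base.partition("-")
    if not start.isdecimal() or (sep and not end.isdecimal()):
        return False
    first = int(start)
    last = int(end) if sep else first
    return lo <= first <= last <= hi


def is_valid_cron(expr: str) -> bool:
    """Return True if expr is a syntactically valid five-field cron string."""
    fields = expr.split()
    if len(fields) != len(FIELD_RANGES):
        return False
    return all(
        all(_valid_part(part, lo, hi) for part in field.split(","))
        for field, (_, lo, hi) in zip(fields, FIELD_RANGES)
    )


if __name__ == "__main__":
    for expr in ("0 0 * * *", "0 */6 * * *", "0 9 * * 1-5", "60 0 * * *"):
        print(expr, "->", is_valid_cron(expr))
```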

File Filtering Patterns

Common Glob Patterns

# All PDF files
globs: ["**/*.pdf"]

# All Word and PowerPoint files
globs: ["**/*.docx", "**/*.pptx"]

# Files in specific folders
globs: ["HR/**/*", "Legal/**/*"]

# Exclude draft folders
globs: ["**/*", "!**/draft/**"]
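
Glob patterns can be tried against sample paths locally before they are used in a sync. The sketch below translates patterns into regular expressions under common globstar conventions: `**/` matches zero or more folders, `*` stays within one path segment, and a leading `!` excludes. These semantics are an assumption; the matcher Lamatic actually uses may differ in edge cases, and `glob_to_regex` and `matches` are hypothetical helpers.

```python
import re


def glob_to_regex(pattern: str) -> re.Pattern:
    """Translate a glob with ** support into a compiled regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append("(?:.*/)?")  # zero or more leading folders
            i += 3
        elif pattern.startswith("**", i):
            out.append(".*")  # anything, including slashes
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")  # anything within one path segment
            i += 1
        elif pattern[i] == "?":
            out.append("[^/]")  # a single non-slash character
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(out))


def matches(path: str, globs: list[str]) -> bool:
    """True if path matches an include pattern and no '!' exclude pattern."""
    includes = [g for g in globs if not g.startswith("!")]
    excludes = [g[1:] for g in globs if g.startswith("!")]
    included = any(glob_to_regex(g).fullmatch(path) for g in includes)
    excluded = any(glob_to_regex(g).fullmatch(path) for g in excludes)
    return included and not excluded


if __name__ == "__main__":
    globs = ["**/*", "!**/draft/**"]
    for path in ("reports/q1.pdf", "reports/draft/q1.pdf"):
        print(path, "->", matches(path, globs))
```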

Search Scope Options

ACCESSIBLE_DRIVES: Only files in drives you have direct access to
SHARED_ITEMS: Files shared with you by others
ALL: All accessible files (recommended)

Low-Code Example

triggerNode:
  nodeId: triggerNode_1
  nodeType: onedriveBusinessNode
  nodeName: OneDrive Business
  values:
    credentials: Microsoft 365
    drive_name: OneDrive
    folder_path: .
    globs: ["**/*.pdf", "**/*.docx"]
    sync_mode: incremental
    sync_schedule: "0 2 * * *"
    search_scope: ALL
    parsing_strategy: fast
    days_to_sync_if_history_full: 30
    start_date: "2024-01-01T00:00:00.000000Z"

Output Schema

Batch Trigger Output

document_key: String identifier for the document (filename)
content: String containing the extracted text content from the document
metadata: Additional information about the document
- file_type: Type of the processed file
- file_size: Size of the file in bytes
- last_modified: Timestamp of last modification
- drive_name: Name of the OneDrive drive
- folder_path: Path within the drive
- parsing_strategy: Strategy used for content extraction

Example Output

{
  "document_key": "document_name.pdf",
  "content": "Extracted text content from the document",
  "metadata": {
    "file_type": "pdf",
    "file_size": 1024000,
    "last_modified": "2024-01-01T00:00:00.000Z",
    "drive_name": "OneDrive",
    "folder_path": "./Documents",
    "parsing_strategy": "fast"
  }
}
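
Downstream code that consumes this output may find it convenient to load each record into a typed structure. The following is a minimal sketch that parses the example above with the Python standard library; the `SyncedDocument` class and `parse_document` function are hypothetical, and the field names are taken from the output schema in this section.

```python
import json
from dataclasses import dataclass
from datetime import datetime


@dataclass
class SyncedDocument:
    document_key: str
    content: str
    file_type: str
    file_size: int
    last_modified: datetime
    drive_name: str
    folder_path: str
    parsing_strategy: str


def parse_document(raw: str) -> SyncedDocument:
    """Parse one batch-trigger record from its JSON text."""
    data = json.loads(raw)
    meta = data["metadata"]
    return SyncedDocument(
        document_key=data["document_key"],
        content=data["content"],
        file_type=meta["file_type"],
        file_size=int(meta["file_size"]),
        # Replace the trailing "Z" so fromisoformat works on older Pythons.
        last_modified=datetime.fromisoformat(
            meta["last_modified"].replace("Z", "+00:00")
        ),
        drive_name=meta["drive_name"],
        folder_path=meta["folder_path"],
        parsing_strategy=meta["parsing_strategy"],
    )


if __name__ == "__main__":
    raw = """{
      "document_key": "document_name.pdf",
      "content": "Extracted text content from the document",
      "metadata": {
        "file_type": "pdf",
        "file_size": 1024000,
        "last_modified": "2024-01-01T00:00:00.000Z",
        "drive_name": "OneDrive",
        "folder_path": "./Documents",
        "parsing_strategy": "fast"
      }
    }"""
    print(parse_document(raw))
```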

Usage Examples

Basic OneDrive Sync

# Basic configuration for syncing all documents
credentials: "Microsoft 365"
drive_name: "OneDrive"
folder_path: "."
globs: ["**/*.pdf", "**/*.docx"]
sync_mode: "incremental"
search_scope: "ALL"
parsing_strategy: "fast"

Advanced Configuration

# Advanced setup with scheduling and filtering
credentials: "Microsoft 365"
drive_name: "OneDrive"
folder_path: "./Documents"
globs: ["**/*.pdf", "**/*.docx", "!**/draft/**"]
sync_mode: "incremental"
sync_schedule: "0 2 * * *"  # Daily at 2 AM
search_scope: "ALL"
parsing_strategy: "hi_res"
days_to_sync_if_history_full: 30
start_date: "2024-01-01T00:00:00.000000Z"

Selective Document Sync

# Sync only specific document types from Work folder
credentials: "Microsoft 365"
drive_name: "OneDrive"
folder_path: "./Work"
globs: ["**/*.pdf", "**/*.docx"]
sync_mode: "incremental"
search_scope: "ACCESSIBLE_DRIVES"
parsing_strategy: "ocr_only"  # Better for scanned documents

Troubleshooting

Common Issues

Problem	Solution
Authentication Failed	Verify Tenant ID is correct and you have OneDrive access permissions
Drive Not Found	Check the Drive Name and ensure you have access to the specified OneDrive
Files Not Syncing	Verify folder path exists and glob patterns are correctly formatted
Permission Denied	Ensure your Microsoft 365 account has appropriate OneDrive API permissions
Sync Not Scheduled	Check cron expression format and ensure sync schedule is properly configured

Debugging Steps

Verify Credentials: Test your Microsoft 365 credentials and Tenant ID
Check Drive Access: Ensure you can access the OneDrive drive in your browser
Validate Folder Path: Confirm the folder path exists and is accessible
Test Glob Patterns: Verify file filtering patterns match your documents
Check Sync Logs: Review Lamatic Flow logs for detailed error information

Best Practices

Use incremental sync mode for better performance
Implement specific glob patterns to avoid syncing unnecessary files
Schedule syncs during off-peak hours to minimize impact
Use hi_res parsing for scanned documents and images
Regularly monitor sync logs for any issues
Set appropriate days_to_sync_if_history_full to limit historical data

Example Use Cases

Document Intelligence Workflows

Business Documents: Sync reports, contracts, and spreadsheets for automated processing
Internal Wikis: Index knowledge bases and team documentation
Legal Documents: Process compliance and audit-related content
Team Collaboration: Automate access to shared folders and project files

RAG Applications

Semantic Search: Enable natural language search across OneDrive documents
Question Answering: Build AI assistants that can answer questions about business documents
Document Summarization: Automatically summarize lengthy reports and documents
Content Discovery: Help users find relevant information across OneDrive repositories