Chunking Node

Overview

The Chunking Node is a data processing component that breaks down large text documents into smaller, manageable chunks. This is essential for processing large documents with AI models that have token limits, enabling efficient text analysis and processing.

chunking.png

Node Type Information

| Type | Description | Status |
|------|-------------|--------|
| Batch Trigger | Starts the flow on a schedule or batch event. Ideal for periodic data processing. | ❌ False |
| Event Trigger | Starts the flow based on external events (e.g., webhook, user interaction). | ❌ False |
| Action | Executes a task or logic as part of the flow (e.g., API call, transformation). | ✅ True |

This node is an Action node that processes large text documents and breaks them into smaller chunks for further processing.

Features

Key Functionalities
  1. Custom Chunking Logic: Configure specific methodologies and separators to divide text into meaningful units while preserving context.

  2. Semantic Preparation: Break data into manageable chunks optimized for vectorization and semantic retrieval processes.

  3. Flexible Integration: Incorporate Lamatic.ai's Chunking node into various flows for tailored data processing solutions.

  4. Contextual Integrity: Ensure the resulting chunks maintain relevance and logical flow for downstream analysis.

  5. Configurable Parameters: Customize chunk size, overlap, and separators to match the needs of your dataset.
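
The chunk size and overlap parameters listed above can be illustrated with a small sketch. This is a hypothetical helper, not the node's actual implementation: it shows how a fixed chunk size (`num_chars`) combines with an overlap (`overlap`) so that consecutive chunks share a few characters of context at their boundary.

```python
def chunk_text(text, num_chars=200, overlap=20):
    """Split text into fixed-size chunks with overlapping characters.

    Illustrative only: each chunk is at most num_chars long, and
    consecutive chunks share `overlap` characters so context is not
    lost at the boundary.
    """
    if overlap >= num_chars:
        raise ValueError("overlap must be smaller than chunk size")
    chunks = []
    step = num_chars - overlap  # advance by chunk size minus overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + num_chars]
        if chunk:
            chunks.append(chunk)
    return chunks
```

With `num_chars=200` and `overlap=20` (the example values from the configuration below), each chunk repeats the last 20 characters of the previous one.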

Benefits
  1. Optimized Data Processing: Enhance the performance of vectorization and semantic retrieval with pre-processed chunks.

  2. Scalable Flows: Manage large datasets effectively, enabling flows to scale seamlessly with increasing data.

  3. Improved Accuracy: Maintain context and relevance in chunked data for better insights during analysis.

  4. Streamlined Automation: Automate text parsing tasks to save time and reduce manual effort.

  5. Enhanced Insights: Facilitate meaningful information extraction from structured and unstructured datasets.

What Can You Build?

  1. Develop a system for efficient text vectorization and semantic retrieval.
  2. Create flows that handle large datasets by breaking them into manageable chunks.
  3. Implement automation processes to parse and prepare data for machine learning models.
  4. Build data processing pipelines that maintain context and relevance through custom chunking methodologies.

Setup

Select the Chunking Node

  1. Fill in the required parameters.
  2. Build the desired flow.
  3. Deploy the project.
  4. Click Setup on the workflow editor to get the automatically generated instructions and add them to your application.

Configuration Reference

| Parameter | Description | Required | Example Value |
|-----------|-------------|----------|---------------|
| Text to chunk | The text data to be split into smaller chunks. | Yes | `${{data}}` |
| Number of Characters | Maximum length (in characters) for each chunk. | Yes | `200` |
| Overlapping Characters | Number of characters shared between consecutive chunks. | Yes | `20` |
| Chunking Type | Method used to split the text into chunks (see below). | Yes | Recursive Character Text Splitter |
| List of separators | Characters or strings used to define chunk boundaries. | No | `\n\n`, `\n` |

Recursive Character Text Splitter splits text into smaller chunks while preserving as much context as possible. It works by recursively breaking down the text, starting with larger logical units like paragraphs, then sentences, and so on, until the desired chunk size is achieved. This approach is ideal for use cases where maintaining coherence within chunks is important, such as natural language processing or document indexing.

Character Text Splitter is a simpler method that divides text into chunks based on a fixed number of characters. It does not consider the structure of the text, such as paragraphs or sentences, and while it is faster and easier to implement, it may cut off sentences or disrupt the logical flow of the text.
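
The recursive strategy described above can be sketched as follows. This is a simplified, hypothetical illustration of the idea (not the node's internal code): split on the coarsest separator first, and only fall back to finer separators, or a hard character cut, when a piece is still larger than the chunk size.

```python
def recursive_split(text, num_chars, separators=("\n\n", "\n", " ")):
    """Recursively split text on progressively finer separators
    until every piece fits within num_chars characters."""
    if len(text) <= num_chars:
        return [text]
    if not separators:
        # No separators left: fall back to a hard character cut.
        return [text[i:i + num_chars] for i in range(0, len(text), num_chars)]
    sep, rest = separators[0], separators[1:]
    chunks, current = [], ""
    for part in text.split(sep):
        candidate = part if not current else current + sep + part
        if len(candidate) <= num_chars:
            # Merge small pieces back together up to the size limit.
            current = candidate
        else:
            if current:
                chunks.append(current)
            if len(part) > num_chars:
                # This piece alone is too big: recurse with finer separators.
                chunks.extend(recursive_split(part, num_chars, rest))
                current = ""
            else:
                current = part
    if current:
        chunks.append(current)
    return chunks
```

Because paragraphs are tried before sentences and words, chunks tend to align with natural boundaries in the text, which is why this mode is the better default for semantic retrieval.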

Low-Code Example

nodes:
  - nodeId: chunkNode_262
    nodeType: chunkNode
    nodeName: chunking
    values:
      chunkField: "{{triggerNode_1.output.content}}"
      numOfChars: 200
      separators:
        - "\n\n"
        - "\n"
        - " "
      chunkingType: recursiveCharacterTextSplitter
      overlapChars: 20
    needs:
      - triggerNode_1

Output

chunks

  • An array of objects, each representing a segmented portion of the input text and its associated metadata.

pageContent

  • The textual content of the chunk, extracted from the input data.

metadata

  • A nested object providing additional details about the chunk's origin or properties.

loc

  • Location details for the chunk within the source input, such as the range it was taken from.
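
Putting the fields above together, the output takes a shape along these lines. The field names (`chunks`, `pageContent`, `metadata`, `loc`) come from the descriptions above; the sample values are illustrative only:

```json
[
  {
    "pageContent": "First chunk of the input text, up to 200 characters...",
    "metadata": {
      "loc": "…"
    }
  },
  {
    "pageContent": "...characters. Second chunk, sharing a 20-character overlap...",
    "metadata": {
      "loc": "…"
    }
  }
]
```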
