Chunking Node Documentation

The Chunking node splits text into smaller, logical units so that it can be vectorized and retrieved semantically. You can configure the chunking method, chunk size, overlap, and separators so that the text is divided in a way that preserves context and relevance. Lamatic.ai provides this node as part of its data processing tools, making it easier to handle large datasets within a workflow.

Features

Key Functionalities
  1. Custom Chunking Logic: Configure specific methodologies and separators to divide text into meaningful units while preserving context.

  2. Semantic Preparation: Break data into manageable chunks optimized for vectorization and semantic retrieval processes.

  3. Flexible Integration: Incorporate Lamatic.ai’s Chunking node into various workflows for tailored data processing solutions.

  4. Contextual Integrity: Ensure the resulting chunks maintain relevance and logical flow for downstream analysis.

  5. Configurable Parameters: Customize chunk size, overlap, and separators to match the needs of your dataset.

Benefits
  1. Optimized Data Processing: Enhance the performance of vectorization and semantic retrieval with pre-processed chunks.

  2. Scalable Workflows: Manage large datasets effectively, enabling workflows to scale seamlessly with increasing data.

  3. Improved Accuracy: Maintain context and relevance in chunked data for better insights during analysis.

  4. Streamlined Automation: Automate text parsing tasks to save time and reduce manual effort.

  5. Enhanced Insights: Facilitate meaningful information extraction from structured and unstructured datasets.

What Can You Build?

  1. Develop a system for efficient text vectorization and semantic retrieval.
  2. Create workflows that handle large datasets by breaking them into manageable chunks.
  3. Implement automation processes to parse and prepare data for machine learning models.
  4. Build data processing pipelines that maintain context and relevance through custom chunking methodologies.

Setup

Select the Chunking Node

  1. Fill in the required parameters.
  2. Build the desired flow.
  3. Deploy the project.
  4. Click Setup in the workflow editor to get the automatically generated instructions, then add them to your application.

Configuration Reference

| Parameter | Description | Required | Example Value |
|---|---|---|---|
| Text to chunk | The text data to be split into smaller chunks. | Yes | ${{data}} |
| Number of Characters | Max length (in characters) for each chunk. | Yes | 200 |
| Overlapping Characters | Number of characters overlapping between chunks. | Yes | 20 |
| Chunking Type | Choose the method to split the text into chunks. The two methods are described below the table. | Yes | Recursive Character Text Splitter |
| List of separators | Characters or strings used to define chunk boundaries. | No | \n\n and \n |

Chunking Type options:

Recursive Character Text Splitter: This method splits text into smaller chunks while preserving as much context as possible. It works by recursively breaking down the text, starting with larger logical units like paragraphs, then sentences, and so on, until the desired chunk size is achieved. This approach is ideal for use cases where maintaining coherence within chunks is important, such as natural language processing or document indexing.

Character Text Splitter: This is a simpler method that divides text into chunks based on a fixed number of characters. It does not consider the structure of the text, such as paragraphs or sentences, and while it is faster and easier to implement, it may cut off sentences or disrupt the logical flow of the text.
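
The difference between the two chunking types can be illustrated with a minimal Python sketch. This is not Lamatic.ai's implementation: the function names `split_by_characters` and `split_recursively` are hypothetical, the recursive variant leaves out overlap for brevity, and separators are assumed to be non-empty strings.

```python
from typing import List


def split_by_characters(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Fixed-size splitting: a window of `chunk_size` characters, ignoring structure."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far the window advances each time
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks


def split_recursively(text: str, chunk_size: int, separators: List[str]) -> List[str]:
    """Try the coarsest separator first; use finer ones only for oversized pieces."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators:
        # No separators left (assumption): fall back to hard cuts.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging small pieces into one chunk
            continue
        if current:
            chunks.append(current)
            current = ""
        if len(piece) <= chunk_size:
            current = piece
        else:
            chunks.extend(split_recursively(piece, chunk_size, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "aaa bbb\n\nccc ddd eee"
print(split_by_characters(sample, 7, 2))
print(split_recursively(sample, 7, ["\n\n", "\n", " "]))
```

On this sample, the fixed-size splitter cuts through words and across the paragraph break, while the recursive splitter keeps `aaa bbb` together and only falls back to splitting on spaces for the second paragraph, which is too long to fit in one chunk.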

Low-Code Example

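nodes:
  - nodeId: chunkNode_262
    nodeType: chunkNode
    nodeName: chunking
    values:
      # Text to chunk, taken here from the trigger node's output
      chunkField: "{{triggerNode_1.output.content}}"
      # Max length (in characters) for each chunk
      numOfChars: 200
      # Characters or strings used to define chunk boundaries
      separators:
        - \n\n
        - \n
        - " "
      chunkingType: recursiveCharacterTextSplitter
      # Number of characters overlapping between chunks
      overlapChars: "20"
    needs:
      - triggerNode_1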
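
To get a feel for how the `numOfChars` and `overlapChars` values above interact, here is a rough Python sketch that estimates the number of chunks for a text of a given length. It assumes fixed windows that advance by chunk size minus overlap; the helper name `estimate_chunk_count` is hypothetical, and the node's actual output may differ, especially with the recursive splitter, whose chunks vary in length.

```python
import math


def estimate_chunk_count(text_length: int, num_of_chars: int, overlap_chars: int) -> int:
    """Estimate how many fixed-size chunks a text of `text_length` characters yields."""
    if num_of_chars <= 0:
        raise ValueError("num_of_chars must be positive")
    if not 0 <= overlap_chars < num_of_chars:
        raise ValueError("overlap_chars must be >= 0 and smaller than num_of_chars")
    if text_length <= 0:
        return 0
    step = num_of_chars - overlap_chars  # new characters covered by each extra chunk
    return max(1, math.ceil((text_length - overlap_chars) / step))


# Values copied from the example above; overlapChars is quoted there, so coerce it.
values = {"numOfChars": 200, "overlapChars": "20"}
count = estimate_chunk_count(1000, int(values["numOfChars"]), int(values["overlapChars"]))
print(count)  # 6
```

With a chunk size of 200 and an overlap of 20, each additional chunk covers 180 new characters, so a 1,000-character text needs about six chunks under this assumption.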

Troubleshooting

Common Issues

| Problem | Solution |
|---|---|
| Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |

Debugging

  1. Check Lamatic Flow logs for error details.
