Multimodal Text Node

The Multimodal Text Node returns textual output from selected large language models (LLMs) and accepts both text and image inputs. It is particularly useful for applications that combine textual and visual data processing, such as image captioning, where text is generated from the content of an image.

![Multimodal Text Node](multimodel.png)

Features

Key Functionalities
  1. Textual and Visual Input Support: Accepts both text and image inputs, enabling seamless integration of multimodal data for processing.
  2. Text Generation: Produces textual outputs from selected large language models (LLMs) based on the provided input.
  3. Image-Based Text Generation: Generates captions and descriptive text based on the content of images, making it suitable for applications like image captioning.
  4. LLM Integration: Supports integration with multiple large language models, offering flexibility to choose the best-suited model for specific use cases.
  5. Context-Aware Processing: Combines textual and visual data inputs to enhance the accuracy and relevance of the generated output.
Benefits
  1. Versatility: Enables applications in various domains, such as image captioning, multimodal analysis, and AI-driven content generation.
  2. Enhanced User Experience: Provides seamless handling of both text and image inputs, improving workflow efficiency.
  3. Scalability: Adapts easily to a wide range of use cases, from simple text generation to complex multimodal data analysis.
  4. Efficiency: Consolidates textual and visual processing into a single node, reducing the need for separate tools and streamlining workflows.
  5. Advanced AI Integration: Leverages the capabilities of modern LLMs to deliver accurate and contextually rich outputs.

What can I build?

  1. Develop applications that seamlessly integrate text and image data processing for tasks like image captioning.
  2. Create tools for automatic generation of descriptive content for visual data, enhancing accessibility.
  3. Build interactive applications where user actions on images are analyzed to generate contextual feedback.
  4. Design systems that maintain consistent tone and style across generated content based on previous interactions.

Setup

Select the Multimodal Text Node

  1. Fill in the required parameters.
  2. Build the desired flow.
  3. Deploy the project.
  4. Click Setup on the workflow editor to get the automatically generated instructions and add them to your application.

Configuration Reference

| Parameter | Description | Example Value |
| --- | --- | --- |
| Generative Model Name | Select the model to generate text based on the prompt. | Gemini Model |
| Prompt Template | Define the instructions for generating the text output. | Tell me something about Bali |
| Attachments | Select the attachments to be used for the multimodal LLM. | `{{triggerNode_1.output.topic}}` |
| System Prompt | System prompt to guide the LLM. | You are a Travel Planner |
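Values like `{{triggerNode_1.output.topic}}` are template variables that are resolved from upstream node outputs at run time. As an illustrative sketch only (Lamatic performs this substitution internally; this is not its actual implementation), the mechanics of the double-brace syntax look roughly like:

```python
import re

def render_template(template: str, context: dict) -> str:
    """Resolve {{node.output.field}} placeholders from a nested context dict."""
    def lookup(match: re.Match) -> str:
        value = context
        for part in match.group(1).strip().split("."):
            value = value[part]  # walk the dotted path: triggerNode_1 -> output -> topic
        return str(value)
    return re.sub(r"\{\{(.*?)\}\}", lookup, template)

context = {"triggerNode_1": {"output": {"topic": "Bali"}}}
print(render_template("Tell me something about {{triggerNode_1.output.topic}}", context))
# prints: Tell me something about Bali
```

Any output field exposed by a node the current node `needs` can be referenced this way.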

Low-Code Example

```yaml
nodes:
  - nodeId: multiModalLLMNode_924
    nodeType: multiModalLLMNode
    nodeName: Multi Modal
    values:
      promptTemplate: Tell me something about ${{triggerNode_1.output.topic}}
      attachments: '{{triggerNode_1.output.topic}}'
      systemPrompt: You are an AI Assistant
      messages: '[]'
      generativeModelName:
        provider_name: mistral
        type: generator/text
        credential_name: Mistral API
        credentialId: 32bf5e3b-a8fc-4697-b95a-b1af3dcf7498
        model_name: mistral/mistral-large-2402
    needs:
      - triggerNode_1
  - nodeId: plus-node-addNode_704346
    nodeType: addNode
    nodeName: ''
    values: {}
    needs:
      - multiModalLLMNode_924
```
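Conceptually, the node pairs the rendered prompt text with its image attachments in a single chat message before calling the provider. The exact wire format is provider-specific; as a hedged sketch, the hypothetical builder below uses the content-array convention several multimodal chat APIs share (Lamatic's internal format may differ):

```python
def build_multimodal_message(prompt: str, image_urls: list[str]) -> dict:
    """Combine a text prompt and image attachments into one chat message.

    Hypothetical helper: the content-array shape follows a common multimodal
    API convention and is shown for illustration only.
    """
    content = [{"type": "text", "text": prompt}]
    for url in image_urls:
        content.append({"type": "image_url", "image_url": {"url": url}})
    return {"role": "user", "content": content}

msg = build_multimodal_message("Tell me something about Bali",
                               ["https://example.com/bali.jpg"])
print(msg["content"][0]["text"])
# prints: Tell me something about Bali
```

If the Attachments field resolves to an empty list, the message degrades gracefully to a plain text prompt.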

Troubleshooting

Common Issues

| Problem | Solution |
| --- | --- |
| Invalid API Key | Ensure the API key is correct and has not expired. |
| Dynamic Content Not Loaded | Increase the Wait for Page Load time in the configuration. |

Debugging

  1. Check Lamatic Flow logs for error details.
  2. Verify that the API key for the selected model provider is valid.
