Multi-Modal

This node uses a multi-modal Large Language Model (LLM) to generate text from instructions and visual inputs. It lets you combine textual and visual data in a single generation step.

Input Parameters

Required Parameters

  • Node Name: A unique identifier for this node within your workflow.
  • Prompt Template: Detailed instructions that guide the multi-modal LLM in generating the desired text output. This should include specific directions on how to interpret and utilize both textual and visual inputs.
  • Generative Model Name: Select the specific multi-modal LLM to be used for text generation. The chosen model should be capable of processing both text and image inputs.
  • Attachments: Specify the visual inputs to be processed by the multi-modal LLM. This can be provided as either:
    • A list of strings (each string representing a file path or URL to an image)
    • A single string (representing a file path or URL to an image)
    • You can also use the Variable Selector to select attachments from previous nodes.
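
Because Attachments accepts either a single string or a list of strings, code that prepares this value before calling the workflow may want to normalize it first. The following is a minimal sketch, not part of the node itself; the helper name normalize_attachments is hypothetical, and the validation rules (non-empty strings only) are an assumption rather than documented behavior.

```python
from typing import List, Union


def normalize_attachments(attachments: Union[str, List[str]]) -> List[str]:
    """Return attachments as a list of non-empty path/URL strings.

    Hypothetical helper: the node's own validation rules are not documented
    here, so the checks below are assumptions.
    """
    # A single string is treated as one attachment.
    if isinstance(attachments, str):
        attachments = [attachments]
    if not isinstance(attachments, list):
        raise TypeError("attachments must be a string or a list of strings")

    cleaned = []
    for item in attachments:
        # Reject non-string or blank entries early, before the workflow runs.
        if not isinstance(item, str) or not item.strip():
            raise ValueError(f"invalid attachment entry: {item!r}")
        cleaned.append(item.strip())
    return cleaned


print(normalize_attachments("https://example.com/cat.png"))
print(normalize_attachments(["a.png", " b.jpg "]))
```

Normalizing to a list up front keeps the calling code the same whether one image or several are supplied.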

Optional Parameters

  • System Prompt: A high-level instruction set that establishes the overall context and behavior for the LLM. This can be used to define the AI's role, set constraints, or provide background information.
  • Messages: A list of previous messages or conversation history to provide context for the current generation task. This allows for more coherent and contextually relevant outputs. For more detailed information on message formatting and usage, please refer to our Messages documentation.

Expected Output

The node will produce two main outputs:

  1. generatedResponse: The primary output text generated by the multi-modal LLM based on the provided prompt, attachments, and any additional context.

  2. _meta: A metadata object containing detailed information about the node's execution, including:

    • prompt_tokens: Number of tokens used in the input
    • completion_tokens: Number of tokens generated in the output
    • total_tokens: Sum of prompt and completion tokens
    • prompt_tokens_details: Breakdown of token usage in different parts of the input (for example, cached_tokens)
    • completion_tokens_details: Breakdown of token usage in the generated output (for example, reasoning_tokens)
    • model_name: Identifier of the specific model used
    • model_provider: Name of the company or service providing the model

Use this metadata to track token consumption per run and to tune prompts and attachments within your workflow.
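
As an illustration of how the metadata might be consumed downstream, the sketch below summarizes a _meta object using the field names shown in the sample output on this page. The function summarize_usage and the keys of the summary it returns are hypothetical, not something the node provides.

```python
import json

# Metadata copied from the sample workflow output on this page.
sample_meta = json.loads("""
{
  "prompt_tokens": 73696,
  "completion_tokens": 9,
  "total_tokens": 73705,
  "prompt_tokens_details": {"cached_tokens": 0},
  "completion_tokens_details": {"reasoning_tokens": 0},
  "model_name": "gpt-4o-mini",
  "model_provider": "openai"
}
""")


def summarize_usage(meta: dict) -> dict:
    """Build a small usage summary from a _meta object (hypothetical helper)."""
    prompt = meta["prompt_tokens"]
    completion = meta["completion_tokens"]
    total = meta["total_tokens"]
    # The details objects may be absent for some providers (assumption).
    cached = meta.get("prompt_tokens_details", {}).get("cached_tokens", 0)
    return {
        "model": f'{meta["model_provider"]}/{meta["model_name"]}',
        # Sanity check: total should equal prompt + completion.
        "total_matches_sum": total == prompt + completion,
        # Fraction of all tokens that were spent on the input.
        "prompt_share": prompt / total if total else 0.0,
        "cached_tokens": cached,
    }


summary = summarize_usage(sample_meta)
print(summary)
```

In the sample run nearly all tokens are prompt tokens, which suggests the image attachments dominate the usage; a summary like this makes that visible at a glance.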

Example Use Case

In this example, a multi-modal LLM generates a response from a prompt and a list of image attachments received through a GraphQL trigger.

Workflow View

Complete Workflow Configuration
triggerNode:
  nodeId: graphqlNode_8rmzr195wo
  nodeName: Graphql
  values:
    responeType: realtime
    advance_schema: |-
      {
        "prompt": "string",
        "attachments": "[string]"
      }
nodes:
  - nodeId: multiModalLLMNode_5191sn539o
    nodeName: Multi Modal
    values:
      attachments: '{{graphqlNode_8rmzr195wo.output.attachments}}'
      promptTemplate: 'Answer the query: {{graphqlNode_8rmzr195wo.output.prompt}}'
      generativeModelName:
        type: generator/text
        model_name: gpt-4o-mini
        credentialId: aeeb2922-a7d3-431a-a376-77e01e725c51
        provider_name: openai
        credential_name: Test OpenAI
    needs:
      - graphqlNode_8rmzr195wo
responseNode:
  nodeId: graphqlResponseNode_3am8g6lws1
  values:
    outputMapping: |-
      {
        "output": "{{multiModalLLMNode_5191sn539o.output.generatedResponse}}",
        "meta": "{{multiModalLLMNode_5191sn539o.output._meta}}"
      }
  needs:
    - multiModalLLMNode_5191sn539o
 
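
The configuration above passes values between nodes with {{nodeId.output.field}} placeholders. The sketch below shows one way such placeholders could be resolved against earlier node outputs. It is an illustration only and not the platform's actual template engine; the resolve function, the PLACEHOLDER pattern, and the shape of the context dictionary are all assumptions.

```python
import re

# Matches {{ dotted.path }} placeholders; the exact syntax rules are assumed.
PLACEHOLDER = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def resolve(template: str, context: dict) -> str:
    """Replace each placeholder with the value found at its dotted path."""

    def lookup(match):
        value = context
        # Walk the path one key at a time, e.g. nodeId -> output -> prompt.
        for key in match.group(1).split("."):
            value = value[key]
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


# Hypothetical context: outputs of earlier nodes, keyed by node ID.
context = {
    "graphqlNode_8rmzr195wo": {
        "output": {"prompt": "are the two images same?"}
    }
}
template = "Answer the query: {{graphqlNode_8rmzr195wo.output.prompt}}"
print(resolve(template, context))  # Answer the query: are the two images same?
```
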
Sample Workflow Input
{
  "prompt": "are the two images same?",
  "attachments": [
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
    "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg"
  ]
}
Sample Workflow Output
{
    "output": "Yes, the two images are the same.",
    "meta": {
        "prompt_tokens": 73696,
        "completion_tokens": 9,
        "total_tokens": 73705,
        "prompt_tokens_details": {
            "cached_tokens": 0
        },
        "completion_tokens_details": {
            "reasoning_tokens": 0
        },
        "model_name": "gpt-4o-mini",
        "model_provider": "openai"
    }
}
