Pipeline Orchestration

Visual workflow orchestration with DataMate

The pipeline orchestration module provides a drag-and-drop visual interface for designing and managing complex data processing workflows.

Features Overview

Pipeline orchestration provides:

  • Visual Designer: Drag-and-drop workflow design
  • Rich Node Types: Data processing, conditions, loops, etc.
  • Flow Execution: Auto-execute and monitor workflows
  • Template Management: Save and reuse flow templates
  • Version Management: Flow version control

Node Types

Data Nodes

| Node | Function | Config |
| --- | --- | --- |
| Input Dataset | Read from a dataset | Select dataset |
| Output Dataset | Write to a dataset | Select dataset |
| Data Collection | Execute a collection task | Select task |
| Data Cleaning | Execute a cleaning task | Select task |
| Data Synthesis | Execute a synthesis task | Select task |

Logic Nodes

| Node | Function | Config |
| --- | --- | --- |
| Condition Branch | Execute different branches based on a condition | Condition expression |
| Loop | Repeat execution | Loop count/condition |
| Parallel | Execute multiple branches in parallel | Branch count |
| Wait | Wait for a specified time | Duration |
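
The Condition Branch node routes execution based on an expression. DataMate's actual expression syntax is not documented here, so the following is only an illustrative sketch assuming simple `field OP value` comparisons (e.g. `quality_score >= 0.8`); the `evaluate` helper and field names are hypothetical.

```python
# Hypothetical sketch of evaluating a Condition Branch expression against a
# record's fields. Supports only simple "field OP value" comparisons.
import operator

OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
       "<": operator.lt, "==": operator.eq, "!=": operator.ne}

def evaluate(expression: str, record: dict) -> bool:
    """Return True if the record satisfies the expression (takes the
    'Satisfied' branch), False otherwise."""
    # Try two-character operators first so ">=" is not matched as ">".
    for op_text in sorted(OPS, key=len, reverse=True):
        if op_text in expression:
            field, value = (part.strip() for part in expression.split(op_text, 1))
            return OPS[op_text](float(record[field]), float(value))
    raise ValueError(f"unsupported expression: {expression!r}")
```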

Quick Start

1. Create Pipeline

Step 1: Enter Pipeline Orchestration Page

Select Pipeline Orchestration in the left navigation.

Step 2: Create Pipeline

Click Create Pipeline.

Step 3: Fill Basic Information

  • Pipeline name: e.g., data_processing_pipeline
  • Description: Pipeline purpose (optional)

Step 4: Design Flow

  1. Drag nodes from the node library on the left onto the canvas
  2. Connect the nodes in execution order
  3. Configure each node's parameters
  4. Save the flow

Example:

Input Dataset → Data Cleaning → Condition Branch
                                    ├── Satisfied → Data Annotation → Output
                                    └── Not Satisfied → Data Synthesis → Output
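
Internally, a branching flow like this can be thought of as a graph of nodes and edges. The following JSON is a hypothetical sketch of such a definition, not DataMate's actual export format; the node `type` values and the `expression` field are assumptions for illustration.

```json
{
  "nodes": [
    {"id": "n1", "type": "input_dataset"},
    {"id": "n2", "type": "data_cleaning"},
    {"id": "n3", "type": "condition_branch", "expression": "quality_score >= 0.8"},
    {"id": "n4", "type": "data_annotation"},
    {"id": "n5", "type": "data_synthesis"},
    {"id": "n6", "type": "output_dataset"}
  ],
  "edges": [
    {"from": "n1", "to": "n2"},
    {"from": "n2", "to": "n3"},
    {"from": "n3", "to": "n4", "when": "satisfied"},
    {"from": "n3", "to": "n5", "when": "not_satisfied"},
    {"from": "n4", "to": "n6"},
    {"from": "n5", "to": "n6"}
  ]
}
```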

2. Execute Pipeline

Step 1: Enter Execution Page

Click the pipeline name to open its details page.

Step 2: Execute Pipeline

Click Execute Now.

Step 3: Monitor Execution

View execution status, progress, and logs.

Advanced Features

Flow Templates

Save as Template

  1. Design flow
  2. Click Save as Template
  3. Enter template name

Use Template

  1. When creating a pipeline, click Use Template
  2. Select a template
  3. The template is loaded into the designer

Parameterized Flow

Define parameters in the pipeline definition:

{
  "parameters": [
    {
      "name": "input_dataset",
      "type": "dataset",
      "required": true
    }
  ]
}
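
Before execution, runtime values are resolved against these definitions. The validation logic below is an illustrative sketch of that resolution step, not DataMate's API; the `validate_parameters` helper is hypothetical.

```python
# Sketch: resolve runtime values against a pipeline's parameter definitions.
# Raises if a required parameter is missing; fills in defaults when present.
def validate_parameters(definitions: list[dict], values: dict) -> dict:
    resolved = {}
    for param in definitions:
        name = param["name"]
        if name in values:
            resolved[name] = values[name]
        elif param.get("required", False):
            raise ValueError(f"missing required parameter: {name}")
        elif "default" in param:
            resolved[name] = param["default"]
    return resolved
```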

Scheduled Execution

Configure scheduled execution:

  • Cron expression: 0 0 2 * * ? (daily at 2:00 AM; Quartz-style, with a leading seconds field and ? meaning "no specific value")
  • Execution parameters
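
To see how the six Quartz-style fields (second, minute, hour, day-of-month, month, day-of-week) map to a trigger time, here is a minimal sketch that checks a timestamp against the subset of the syntax used above; it is not a full cron parser.

```python
# Sketch: check whether a timestamp matches a Quartz-style expression like
# "0 0 2 * * ?". Handles only literal numbers, "*", and a trailing "?".
from datetime import datetime

def matches(expression: str, when: datetime) -> bool:
    fields = expression.split()
    actual = [when.second, when.minute, when.hour, when.day, when.month]
    for spec, value in zip(fields[:5], actual):
        if spec != "*" and int(spec) != value:
            return False
    return True  # "?" in the day-of-week field means "no specific value"
```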

Best Practices

1. Flow Design

Recommended principles:

  • Modular: Split complex flows into smaller, focused sub-flows
  • Reusable: Extract common step sequences into templates
  • Maintainable: Add comments describing each node's purpose
  • Testable: Test each node individually before running the full flow

2. Performance Optimization

Optimize performance:

  • Parallelize: Run independent branches with parallel nodes
  • Reduce data transfer: Process data close to where it is stored
  • Batch operations: Process records in batches rather than one at a time
  • Cache results: Cache intermediate results to avoid recomputation

Common Questions

Q: What should I do when a flow execution fails?

A: Troubleshoot in this order:

  1. View execution logs
  2. Check node configuration
  3. Check data format
  4. Test nodes individually

Last modified February 9, 2026: :memo: add english docs (3868c82)