Pipeline Orchestration
Visual workflow orchestration with DataMate
The Pipeline Orchestration module provides a drag-and-drop visual interface for designing and managing complex data processing workflows.
Features Overview
Pipeline orchestration provides:
- Visual Designer: Drag-and-drop workflow design
- Rich Node Types: Data processing, conditions, loops, etc.
- Flow Execution: Auto-execute and monitor workflows
- Template Management: Save and reuse flow templates
- Version Management: Flow version control
Node Types
Data Nodes
| Node | Function | Config |
|---|---|---|
| Input Dataset | Read from dataset | Select dataset |
| Output Dataset | Write to dataset | Select dataset |
| Data Collection | Execute collection task | Select task |
| Data Cleaning | Execute cleaning task | Select task |
| Data Synthesis | Execute synthesis task | Select task |
Logic Nodes
| Node | Function | Config |
|---|---|---|
| Condition Branch | Route execution to different branches based on a condition | Condition expression |
| Loop | Repeat execution | Loop count/condition |
| Parallel | Execute multiple branches in parallel | Branch count |
| Wait | Wait for specified time | Duration |
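A Condition Branch node evaluates an expression against the current record to decide which branch runs. The sketch below illustrates the idea; the `evaluate_condition` helper, the operator set, and the `quality_score` field are all hypothetical, not part of DataMate's documented expression syntax.

```python
# Hypothetical sketch of condition-branch evaluation.
# DataMate's actual expression syntax may differ.
import operator

OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq, "!=": operator.ne}

def evaluate_condition(record: dict, field: str, op: str, value) -> bool:
    """Return True when the record satisfies `field <op> value`."""
    return OPS[op](record.get(field), value)

record = {"quality_score": 0.87}
# Route high-quality records to annotation, the rest to synthesis.
branch = "annotation" if evaluate_condition(record, "quality_score", ">=", 0.8) else "synthesis"
```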
Quick Start
1. Create Pipeline
Step 1: Enter Pipeline Orchestration Page
Select Pipeline Orchestration in left navigation.
Step 2: Create Pipeline
Click Create Pipeline.
Step 3: Fill Basic Information
- Pipeline name: e.g., data_processing_pipeline
- Description: Pipeline purpose (optional)
Step 4: Design Flow
- Drag nodes from left library to canvas
- Connect nodes
- Configure node parameters
- Save flow
Example:
Input Dataset → Data Cleaning → Condition Branch
├── Satisfied → Data Annotation → Output
└── Not Satisfied → Data Synthesis → Output
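The example flow above can also be thought of as a graph of nodes and edges. The declarative sketch below captures it that way; the node/edge schema and the condition string are illustrative only, not DataMate's actual pipeline format.

```python
# Hypothetical declarative form of the example flow above.
# The node/edge schema is illustrative, not DataMate's actual format.
pipeline = {
    "name": "data_processing_pipeline",
    "nodes": [
        {"id": "input",      "type": "input_dataset"},
        {"id": "clean",      "type": "data_cleaning"},
        {"id": "branch",     "type": "condition_branch",
         "condition": "quality_score >= 0.8"},
        {"id": "annotate",   "type": "data_annotation"},
        {"id": "synthesize", "type": "data_synthesis"},
        {"id": "output",     "type": "output_dataset"},
    ],
    "edges": [
        ("input", "clean"),
        ("clean", "branch"),
        ("branch", "annotate"),    # condition satisfied
        ("branch", "synthesize"),  # condition not satisfied
        ("annotate", "output"),
        ("synthesize", "output"),
    ],
}

# Sanity check: every edge must reference a declared node id.
node_ids = {n["id"] for n in pipeline["nodes"]}
assert all(src in node_ids and dst in node_ids
           for src, dst in pipeline["edges"])
```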
2. Execute Pipeline
Step 1: Enter Execution Page
Click pipeline name to enter details.
Step 2: Execute Pipeline
Click Execute Now.
Step 3: Monitor Execution
View execution status, progress, and logs.
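Besides watching the page, execution status can be monitored programmatically by polling until a terminal state is reached. The loop below is a minimal sketch; `get_execution_status` is a stub standing in for a real status query, since DataMate's monitoring API is not documented here.

```python
# Hypothetical polling loop for pipeline execution status.
import time

def get_execution_status(execution_id: str) -> str:
    """Stub standing in for a real status query against DataMate;
    the endpoint and states here are assumptions, not documented API."""
    return "SUCCEEDED"

def wait_for_pipeline(execution_id: str, poll_seconds: float = 0.0) -> str:
    """Poll until the execution reaches a terminal state."""
    while True:
        status = get_execution_status(execution_id)
        if status in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return status
        time.sleep(poll_seconds)

final_status = wait_for_pipeline("exec-001")
```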
Advanced Features
Flow Templates
Save as Template
- Design flow
- Click Save as Template
- Enter template name
Use Template
- Create pipeline, click Use Template
- Select template
- Load to designer
Parameterized Flow
Define parameters in pipeline:
{
  "parameters": [
    {
      "name": "input_dataset",
      "type": "dataset",
      "required": true
    }
  ]
}
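At execution time, the values supplied for a run can be checked against such a declaration. The sketch below mirrors the JSON example; the `validate_run_args` helper and its error format are hypothetical, not DataMate's validation logic.

```python
# Hypothetical sketch: validating run-time values against the
# parameter declaration shown above.
parameters = [
    {"name": "input_dataset", "type": "dataset", "required": True},
]

def validate_run_args(declared: list, provided: dict) -> list:
    """Return a list of error messages; an empty list means valid."""
    errors = []
    for param in declared:
        if param["required"] and param["name"] not in provided:
            errors.append(f"missing required parameter: {param['name']}")
    return errors

# A run that supplies the required dataset passes validation.
assert validate_run_args(parameters, {"input_dataset": "raw_logs_v1"}) == []
```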
Scheduled Execution
Configure scheduled execution:
- Cron expression: 0 0 2 * * ? (daily at 2 AM)
- Execution parameters
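The expression above follows the six-field Quartz-style layout: seconds, minutes, hours, day-of-month, month, day-of-week. A minimal sketch of splitting it into named fields (the `describe_cron` helper is illustrative, not a DataMate API):

```python
# Hypothetical sketch: naming the fields of a Quartz-style cron
# expression (sec min hour day-of-month month day-of-week).
def describe_cron(expr: str) -> dict:
    fields = expr.split()
    if len(fields) != 6:
        raise ValueError("expected 6 fields: sec min hour dom month dow")
    names = ["second", "minute", "hour", "day_of_month", "month", "day_of_week"]
    return dict(zip(names, fields))

parts = describe_cron("0 0 2 * * ?")
# Second 0, minute 0, hour 2, every day: i.e. daily at 2:00 AM.
```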
Best Practices
1. Flow Design
Recommended principles:
- Modular: Split complex flows
- Reusable: Use templates
- Maintainable: Add comments
- Testable: Test individually
2. Performance Optimization
Optimize performance:
- Parallelize: Use parallel nodes
- Reduce data transfer: Process locally
- Batch operations: Group many small operations into fewer large ones
- Cache results: Cache intermediate results
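Conceptually, parallelizing with a Parallel node means running independent branches concurrently rather than one after another. A minimal sketch of that idea using Python's standard thread pool (the `run_branch` function and branch names are hypothetical):

```python
# Hypothetical sketch: running independent branches concurrently,
# as a Parallel node does, instead of sequentially.
from concurrent.futures import ThreadPoolExecutor

def run_branch(name: str) -> str:
    # A real branch would execute its chain of nodes here.
    return f"{name}: done"

branches = ["cleaning", "synthesis", "annotation"]
with ThreadPoolExecutor(max_workers=len(branches)) as pool:
    # map() preserves input order in its results.
    results = list(pool.map(run_branch, branches))
```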
Common Questions
Q: What should I do when a flow execution fails?
A: Troubleshoot in this order:
- View execution logs
- Check node configuration
- Check data format
- Test nodes individually
Related Documentation
- Data Collection - Collection nodes
- Data Cleaning - Cleaning nodes
- Operator Market - Get more operators