Quick Start

Deploy DataMate in 5 minutes

This guide walks you through deploying the DataMate platform in 5 minutes.

DataMate supports two main deployment methods:

  • Docker Compose: Suitable for quick experience and development testing
  • Kubernetes/Helm: Suitable for production deployment

Prerequisites

Docker Compose Deployment

  • Docker 20.10+
  • Docker Compose 2.0+
  • At least 4GB RAM
  • At least 10GB disk space

Kubernetes Deployment

  • Kubernetes 1.20+
  • Helm 3.0+
  • kubectl configured with cluster connection
  • At least 8GB RAM
  • At least 20GB disk space
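Before installing, the version requirements above can be checked from the shell. The sketch below is illustrative (the `version_ge` helper is not part of DataMate; only the Docker check is shown, and the same pattern works for the other tools):

```shell
# Return success if version $1 >= version $2 (relies on sort -V).
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: check Docker against the documented 20.10+ minimum.
docker_ver=$(docker --version 2>/dev/null | sed -E 's/[^0-9]*([0-9]+\.[0-9]+(\.[0-9]+)?).*/\1/')
if [ -n "$docker_ver" ] && version_ge "$docker_ver" "20.10"; then
  echo "Docker $docker_ver OK"
else
  echo "Docker 20.10+ required"
fi
```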

5-Minute Quick Deployment (Docker Compose)

1. Clone the Code

git clone https://github.com/ModelEngine-Group/DataMate.git
cd DataMate

2. Start Services

Use the provided Makefile for one-click deployment:

make install

After running the command, the system will prompt you to select a deployment method:

Choose a deployment method:
1. Docker/Docker-Compose
2. Kubernetes/Helm
Enter choice:

Enter 1 to select Docker Compose deployment.

3. Verify Deployment

After services start, you can access them at:

  • Frontend: http://localhost:30000
  • API Gateway: http://localhost:8080
  • Database: localhost:5432
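Containers may take a short while to become ready, so it can help to poll the ports above before opening the URLs. A minimal sketch using bash's `/dev/tcp` redirection (the `wait_for_port` helper and timeout are illustrative, not part of DataMate):

```shell
# Poll a TCP port until it accepts connections or the timeout (seconds) expires.
wait_for_port() {
  local host=$1 port=$2 timeout=${3:-60}
  local i
  for i in $(seq 1 "$timeout"); do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      echo "up"; return 0
    fi
    sleep 1
  done
  echo "down"; return 1
}

# Example usage against the endpoints listed above:
# wait_for_port localhost 30000 60 && wait_for_port localhost 8080 60
```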

4. Check Service Status

docker ps

You should see the following containers running:

  • datamate-frontend (Frontend service)
  • datamate-backend (Backend service)
  • datamate-backend-python (Python backend service)
  • datamate-gateway (API gateway)
  • datamate-database (PostgreSQL database)
  • datamate-runtime (Operator runtime)

Optional Components Installation

Install Milvus Vector Database

Milvus is used for vector storage and retrieval in knowledge bases:

make install-milvus

Select Docker Compose deployment method when prompted.

Install Label Studio Annotation Tool

Label Studio is used for data annotation:

make install-label-studio

Access: http://localhost:30001

Default credentials:

Install MinerU PDF Processing Service

MinerU provides enhanced PDF document processing:

make build-mineru
make install-mineru

Install DeerFlow Service

DeerFlow is used for enhanced workflow orchestration:

make install-deer-flow

Using Local Images for Development

If you have modified the code locally, build local images and deploy with them:

make build
make install dev=true

Offline Environment Deployment

For offline environments, download all images first:

make download SAVE=true

Images will be saved in the dist/ directory. Load images on the target machine:

make load-images

Uninstall

Uninstall DataMate

make uninstall

The system will prompt whether to delete volumes:

  • Select 1: Delete all data (including datasets, configurations, etc.)
  • Select 2: Keep volumes

Uninstall Specific Components

# Uninstall Label Studio
make uninstall-label-studio

# Uninstall Milvus
make uninstall-milvus

# Uninstall DeerFlow
make uninstall-deer-flow

Common Questions

Q: What if service startup fails?

First check if ports are occupied:

# Check port usage
lsof -i :30000
lsof -i :8080

If ports are occupied, modify port mappings in deployment/docker/datamate/docker-compose.yml.

Q: How to view service logs?

# View all service logs
docker compose -f deployment/docker/datamate/docker-compose.yml logs

# View specific service logs
docker compose -f deployment/docker/datamate/docker-compose.yml logs -f datamate-backend

Q: Where is data stored?

Data is persisted through Docker volumes:

  • datamate-dataset-volume: Dataset files
  • datamate-postgresql-volume: Database data
  • datamate-log-volume: Log files

View all volumes:

docker volume ls | grep datamate

1 - Installation Guide

Detailed installation and configuration instructions for DataMate

This document provides detailed installation and configuration instructions for the DataMate platform.

System Requirements

Minimum Configuration

Component   Minimum               Recommended
CPU         4 cores               8 cores+
RAM         8 GB                  16 GB+
Disk        50 GB                 100 GB+
OS          Linux/macOS/Windows   Linux (Ubuntu 20.04+)

Software Dependencies

Docker Compose Deployment

  • Docker 20.10+
  • Docker Compose 2.0+
  • Git (optional, for cloning code)
  • Make (optional, for using Makefile)

Kubernetes Deployment

  • Kubernetes 1.20+
  • Helm 3.0+
  • kubectl (matching cluster version)
  • Git (optional, for cloning code)
  • Make (optional, for using Makefile)

Deployment Method Comparison

Feature                 Docker Compose          Kubernetes
Deployment Difficulty   ⭐ Simple                ⭐⭐⭐ Complex
Resource Utilization    ⭐⭐ Fair                 ⭐⭐⭐⭐ High
High Availability       ❌ Not supported         ✅ Supported
Scalability             ⭐⭐ Fair                 ⭐⭐⭐⭐ Strong
Use Case                Dev/test, small scale   Production, large scale

Docker Compose Deployment

Basic Deployment

1. Prerequisites

# Clone code repository
git clone https://github.com/ModelEngine-Group/DataMate.git
cd DataMate

# Check Docker and Docker Compose versions
docker --version
docker compose version

2. Deploy Using Makefile

# One-click deployment (including Milvus)
make install

Select 1. Docker/Docker-Compose when prompted.

3. Use Docker Compose Directly

If Make is not installed:

# Set image registry (optional)
export REGISTRY=ghcr.io/modelengine-group/

# Start basic services
docker compose -f deployment/docker/datamate/docker-compose.yml --profile milvus up -d

4. Verify Deployment

# Check container status
docker ps

# View service logs
docker compose -f deployment/docker/datamate/docker-compose.yml logs -f

# Access frontend
open http://localhost:30000

Optional Components

Milvus Vector Database

# Using Makefile
make install-milvus

# Or Docker Compose
docker compose -f deployment/docker/datamate/docker-compose.yml --profile milvus up -d

Components:

  • milvus-standalone (19530, 9091)
  • milvus-minio (9000, 9001)
  • milvus-etcd

Label Studio Annotation Tool

# Using Makefile
make install-label-studio

# Or Docker Compose
docker compose -f deployment/docker/datamate/docker-compose.yml --profile label-studio up -d

Access: http://localhost:30001

Default credentials:

MinerU PDF Processing

# Build MinerU image
make build-mineru

# Deploy MinerU
make install-mineru

DeerFlow Workflow Service

# Using Makefile
make install-deer-flow

# Or Docker Compose
docker compose -f deployment/docker/datamate/docker-compose.yml --profile deer-flow up -d

Environment Variables

Variable              Default                      Description
DB_PASSWORD           password                     Database password
DATAMATE_JWT_ENABLE   false                        Enable JWT authentication
REGISTRY              ghcr.io/modelengine-group/   Image registry
VERSION               latest                       Image version
LABEL_STUDIO_HOST     -                            Label Studio access URL
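These variables are read from the environment when Compose starts. For example, to pin an image version and override the database password (the values below are placeholders, and the `datamate-backend` image name is taken from the upgrade section of this guide):

```shell
# Placeholder values; substitute your own before running docker compose.
export DB_PASSWORD="a-strong-password"
export REGISTRY="ghcr.io/modelengine-group/"
export VERSION="v1.0.0"

# docker compose -f deployment/docker/datamate/docker-compose.yml up -d
echo "will pull ${REGISTRY}datamate-backend:${VERSION}"
```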

Data Volume Management

DataMate uses Docker volumes for persistence:

# View all volumes
docker volume ls | grep datamate

# View volume details
docker volume inspect datamate-dataset-volume

# Backup volume data
docker run --rm -v datamate-dataset-volume:/data -v $(pwd):/backup \
  ubuntu tar czf /backup/dataset-backup.tar.gz /data
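To keep multiple backups apart, a timestamped archive name is useful. This wrapper is a sketch around the same `docker run` pattern shown above (the `backup_name` helper is illustrative):

```shell
# Print a timestamped archive name for a volume backup, e.g.
# datamate-dataset-volume-20240101-120000.tar.gz
backup_name() {
  local volume=$1
  echo "${volume}-$(date +%Y%m%d-%H%M%S).tar.gz"
}

archive=$(backup_name datamate-dataset-volume)
echo "backing up to $archive"
# docker run --rm -v datamate-dataset-volume:/data -v "$(pwd):/backup" \
#   ubuntu tar czf "/backup/$archive" /data
```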

Kubernetes/Helm Deployment

Prerequisites

# Check cluster connection
kubectl cluster-info
kubectl get nodes

# Check Helm version
helm version

# Create namespace (optional)
kubectl create namespace datamate

Using Makefile

# Deploy DataMate
make install INSTALLER=k8s

# Or deploy to specific namespace
make install NAMESPACE=datamate INSTALLER=k8s

Using Helm

1. Deploy Basic Services

# Deploy DataMate
helm upgrade datamate deployment/helm/datamate/ \
  --install \
  --namespace datamate \
  --create-namespace \
  --set global.image.repository=ghcr.io/modelengine-group/

# Check deployment status
kubectl get pods -n datamate

2. Configure Ingress (Optional)

# Edit values.yaml
cat >> deployment/helm/datamate/values.yaml << EOF
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: datamate.example.com
      paths:
        - path: /
          pathType: Prefix
EOF

# Redeploy
helm upgrade datamate deployment/helm/datamate/ \
  --namespace datamate \
  -f deployment/helm/datamate/values.yaml

3. Deploy Optional Components

# Deploy Milvus
helm upgrade milvus deployment/helm/milvus \
  --install \
  --namespace datamate

# Deploy Label Studio
helm upgrade label-studio deployment/helm/label-studio/ \
  --install \
  --namespace datamate

Offline Deployment

Prepare Offline Images

1. Download Images

# Download all images locally
make download SAVE=true

# Download specific version
make download VERSION=v1.0.0 SAVE=true

Images are saved in the dist/ directory.

2. Package and Transfer

# Package
tar czf datamate-images.tar.gz dist/

# Transfer to target server
scp datamate-images.tar.gz user@target-server:/tmp/

Offline Installation

1. Load Images

# Extract on target server
tar xzf datamate-images.tar.gz

# Load all images
make load-images

2. Modify Configuration

Set REGISTRY to an empty string so the locally loaded images are used:

REGISTRY= docker compose -f deployment/docker/datamate/docker-compose.yml up -d

Upgrade Guide

Docker Compose Upgrade

# 1. Backup data
docker run --rm -v datamate-postgresql-volume:/data -v $(pwd):/backup \
  ubuntu tar czf /backup/postgres-backup.tar.gz /data

# 2. Pull new images
docker pull ghcr.io/modelengine-group/datamate-backend:latest

# 3. Stop services
docker compose -f deployment/docker/datamate/docker-compose.yml down

# 4. Start new version
docker compose -f deployment/docker/datamate/docker-compose.yml up -d

# 5. Verify upgrade
docker ps
docker logs -f datamate-backend

Or use Makefile:

make datamate-docker-upgrade

Kubernetes Upgrade

# 1. Backup data
kubectl exec -n datamate deployment/datamate-database -- \
  pg_dump -U postgres datamate > backup.sql

# 2. Update Helm Chart
helm upgrade datamate deployment/helm/datamate/ \
  --namespace datamate \
  --set global.image.tag=new-version

Uninstall

Docker Compose Complete Uninstall

# Using Makefile
make uninstall

# Choose to delete volumes for complete cleanup

Or manual uninstall:

# Stop and remove containers
docker compose -f deployment/docker/datamate/docker-compose.yml --profile milvus --profile label-studio down -v

# Remove all volumes
docker volume rm datamate-dataset-volume \
  datamate-postgresql-volume \
  datamate-log-volume

# Remove network
docker network rm datamate-network

Kubernetes Complete Uninstall

# Uninstall all components
make uninstall INSTALLER=k8s

# Or use Helm
helm uninstall datamate -n datamate
helm uninstall milvus -n datamate
helm uninstall label-studio -n datamate

# Delete namespace
kubectl delete namespace datamate

Troubleshooting

Common Issues

1. Service Won’t Start

# Check port conflicts
netstat -tlnp | grep -E '30000|8080|5432'

# Check disk space
df -h

# Check memory
free -h

# View detailed logs
docker logs datamate-backend --tail 100

2. Database Connection Failed

# Check database container
docker ps | grep database

# Test connection
docker exec -it datamate-database psql -U postgres -d datamate

2 - System Architecture

DataMate system architecture design documentation

This document details DataMate’s system architecture, tech stack, and design philosophy.

Overall Architecture

DataMate adopts a microservices architecture, splitting the system into multiple independent services, each responsible for specific business functions. This architecture provides good scalability, maintainability, and fault tolerance.

┌─────────────────────────────────────────────────────────────────┐
│                           Frontend Layer                        │
│                    (React + TypeScript)                         │
│                      Ant Design + Tailwind                      │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                        API Gateway Layer                        │
│                    (Spring Cloud Gateway)                       │
│                      Port: 8080                                 │
└────────────────────────┬────────────────────────────────────────┘
                         │
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Java Backend │ │Python Backend│ │   Runtime    │
│   Services   │ │   Service    │ │   Service    │
├──────────────┤ ├──────────────┤ ├──────────────┤
│· Main App    │ │· RAG Service │ │· Operator    │
│· Data Mgmt   │ │· LangChain   │ │  Execution   │
│· Collection  │ │· FastAPI     │ │              │
│· Cleaning    │ │              │ │              │
│· Annotation  │ │              │ │              │
│· Synthesis   │ │              │ │              │
│· Evaluation  │ │              │ │              │
│· Operator    │ │              │ │              │
│· Pipeline    │ │              │ │              │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
       │                │                │
       └────────────────┼────────────────┘
                        ▼
     ┌──────────┐    ┌─────────┐    ┌──────────┐
     │PostgreSQL│    │  Redis  │    │  Milvus  │
     │  (5432)  │    │ (6379)  │    │ (19530)  │
     └──────────┘    └─────────┘    └────┬─────┘
                                         │
                                    ┌────▼─────┐
                                    │  MinIO   │
                                    │  (9000)  │
                                    └──────────┘

Tech Stack

Frontend Tech Stack

Technology      Version   Purpose
React           18.x      UI framework
TypeScript      5.x       Type safety
Ant Design      5.x       UI component library
Tailwind CSS    3.x       Styling framework
Redux Toolkit   2.x       State management
React Router    6.x       Routing management
Vite            5.x       Build tool

Backend Tech Stack (Java)

Technology          Version   Purpose
Java                21        Runtime environment
Spring Boot         3.5.6     Application framework
Spring Cloud        2023.x    Microservices framework
MyBatis Plus        3.x       ORM framework
PostgreSQL Driver   42.x      Database driver
Redis               5.x       Cache client
MinIO               8.x       Object storage client

Backend Tech Stack (Python)

Technology   Version   Purpose
Python       3.11+     Runtime environment
FastAPI      0.100+    Web framework
LangChain    0.1+      LLM application framework
Ray          2.x       Distributed computing
Pydantic     2.x       Data validation

Data Storage

Technology   Version         Purpose
PostgreSQL   15+             Main database
Redis        8.x             Cache and message queue
Milvus       2.6.5           Vector database
MinIO        RELEASE.2024+   Object storage

Microservices Architecture

Service List

Service Name              Port    Tech Stack             Description
API Gateway               8080    Spring Cloud Gateway   Unified entry, routing, auth
Frontend                  30000   React                  Frontend UI
Main Application          -       Spring Boot            Core business logic
Data Management Service   8092    Spring Boot            Dataset management
Data Collection Service   -       Spring Boot            Data collection tasks
Data Cleaning Service     -       Spring Boot            Data cleaning tasks
Data Annotation Service   -       Spring Boot            Data annotation tasks
Data Synthesis Service    -       Spring Boot            Data synthesis tasks
Data Evaluation Service   -       Spring Boot            Data evaluation tasks
Operator Market Service   -       Spring Boot            Operator marketplace
RAG Indexer Service       -       Spring Boot            Knowledge base indexing
Runtime Service           8081    Python + Ray           Operator execution engine
Backend Python Service    18000   FastAPI                Python backend service
Database                  5432    PostgreSQL             Database

Service Communication

Synchronous Communication

  • API Gateway → Backend Services: HTTP/REST
  • Frontend → API Gateway: HTTP/REST
  • Backend Service ↔ Backend Service: HTTP/REST (Feign Client)

Asynchronous Communication

  • Task Execution: Database task queue
  • Event Notification: Redis Pub/Sub

Data Architecture

Data Flow

┌─────────────┐
│  Data       │ Collection task config
│  Collection │ → DataX → Raw data
└──────┬──────┘
       │
       ▼
┌─────────────┐
│  Data       │ Dataset management, file upload
│  Management │ → Structured storage
└──────┬──────┘
       │
       ├──────────────┐
       ▼              ▼
┌─────────────┐  ┌─────────────┐
│  Data       │  │ Knowledge   │
│  Cleaning   │  │ Base        │
│             │  │             │
└──────┬──────┘  └──────┬──────┘
       │                │
       ▼                ▼
┌─────────────┐  ┌─────────────┐
│  Data       │  │ Vector      │
│  Annotation │  │ Index       │
└──────┬──────┘  └──────┬──────┘
       │                │
       ▼                │
┌─────────────┐         │
│  Data       │         │
│  Synthesis  │         │
└──────┬──────┘         │
       │                │
       ▼                ▼
┌─────────────┐  ┌─────────────┐
│  Data       │  │  RAG        │
│  Evaluation │  │ Retrieval   │
└─────────────┘  └─────────────┘

Deployment Architecture

Docker Compose Deployment

┌──────────────────────────────────────────────┐
│               Docker Network                 │
│              datamate-network                │
│                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ Frontend │  │ Gateway  │  │ Backend  │    │
│  │  :30000  │  │  :8080   │  │          │    │
│  └──────────┘  └──────────┘  └──────────┘    │
│                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │ Backend  │  │ Runtime  │  │ Database │    │
│  │  Python  │  │  :8081   │  │  :5432   │    │
│  └──────────┘  └──────────┘  └──────────┘    │
│                                              │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐    │
│  │  Milvus  │  │  MinIO   │  │   etcd   │    │
│  │  :19530  │  │  :9000   │  │          │    │
│  └──────────┘  └──────────┘  └──────────┘    │
└──────────────────────────────────────────────┘

Kubernetes Deployment

┌──────────────────────────────────────────────┐
│             Kubernetes Cluster               │
│                                              │
│  Namespace: datamate                         │
│                                              │
│  ┌────────────┐  ┌────────────┐              │
│  │ Deployment │  │ Deployment │              │
│  │  Frontend  │  │  Gateway   │              │
│  │  (3 Pods)  │  │  (2 Pods)  │              │
│  └─────┬──────┘  └─────┬──────┘              │
│        │               │                     │
│  ┌─────▼───────────────▼──────┐              │
│  │    Service (LoadBalancer)  │              │
│  └────────────────────────────┘              │
│                                              │
│  ┌────────────┐  ┌────────────┐              │
│  │ StatefulSet│  │ Deployment │              │
│  │  Database  │  │  Backend   │              │
│  └────────────┘  └────────────┘              │
└──────────────────────────────────────────────┘

Security Architecture

Authentication & Authorization

JWT Authentication (Optional)

datamate:
  jwt:
    enable: true  # Enable JWT authentication
    secret: your-secret-key
    expiration: 86400  # 24 hours
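The JWT secret should be a long random string. One way to generate one is with `openssl` (assuming it is installed; any cryptographically random source works):

```shell
# Generate a 32-byte random secret, base64-encoded, for the jwt.secret field.
secret=$(openssl rand -base64 32)
echo "secret length: ${#secret}"
```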

API Key Authentication

datamate:
  api-key:
    enable: false

Data Security

Transport Encryption

  • API Gateway supports HTTPS/TLS
  • Internal service communication can be encrypted

Storage Encryption

  • Database: Transparent data encryption (TDE)
  • MinIO: Server-side encryption
  • Milvus: Encryption at rest

3 - Development Environment Setup

Local development environment configuration guide for DataMate

This document describes how to set up a local development environment for DataMate.

Prerequisites

Required Software

Software         Version   Purpose
Node.js          18.x+     Frontend development
pnpm             8.x+      Frontend package management
Java             21        Backend development
Maven            3.9+      Backend build
Python           3.11+     Python service development
Docker           20.10+    Containerized deployment
Docker Compose   2.0+      Service orchestration
Git              2.x+      Version control
Make             4.x+      Build automation

Recommended Tools

  • IDE: IntelliJ IDEA (backend) + VS Code (frontend/Python)
  • Database Client: DBeaver, pgAdmin
  • API Testing: Postman, curl
  • Git Client: GitKraken, SourceTree

Code Structure

DataMate/
├── backend/                 # Java backend
│   ├── services/           # Microservice modules
│   │   ├── main-application/
│   │   ├── data-management-service/
│   │   ├── data-cleaning-service/
│   │   └── ...
│   ├── openapi/            # OpenAPI specs
│   └── scripts/            # Build scripts
├── frontend/               # React frontend
│   ├── src/
│   │   ├── components/    # Common components
│   │   ├── pages/         # Page components
│   │   ├── services/      # API services
│   │   ├── store/         # Redux store
│   │   └── routes/        # Routes config
│   └── package.json
├── runtime/                # Python runtime
│   └── datamate/          # DataMate runtime
└── deployment/             # Deployment configs
    ├── docker/            # Docker configs
    └── helm/              # Helm charts

Backend Development

1. Install Java 21

# macOS (Homebrew)
brew install openjdk@21

# Linux (Ubuntu/Debian)
sudo apt update
sudo apt install openjdk-21-jdk

# Verify
java -version

2. Install Maven

# macOS
brew install maven

# Linux
sudo apt install maven

# Verify
mvn -version

3. Configure IDE (IntelliJ IDEA)

Install Plugins

  • Lombok Plugin
  • MyBatis Plugin
  • Rainbow Brackets
  • GitToolBox

Import Project

  1. Open IntelliJ IDEA
  2. File → Open
  3. Select backend directory
  4. Wait for Maven dependency download

4. Configure Database

Start Local Database (Docker)

# Start database only
docker compose -f deployment/docker/datamate/docker-compose.yml up -d datamate-database

Connection info:

  • Host: localhost
  • Port: 5432
  • Database: datamate
  • Username: postgres
  • Password: password

5. Run Backend Service

Using Maven

cd backend/services/main-application
mvn spring-boot:run

Using IDE

  1. Find Application class
  2. Right-click → Run
  3. Access http://localhost:8080

Frontend Development

1. Install Node.js

# macOS
brew install node@18

# Linux
curl -fsSL https://deb.nodesource.com/setup_18.x | sudo -E bash -
sudo apt-get install -y nodejs

2. Install pnpm

npm install -g pnpm

3. Install Dependencies

cd frontend
pnpm install

4. Configure Dev Environment

Create .env.development:

VITE_API_BASE_URL=http://localhost:8080
VITE_API_TIMEOUT=30000

5. Start Dev Server

pnpm dev

Access http://localhost:3000

Python Service Development

1. Install Python 3.11

# macOS
brew install python@3.11

# Linux
sudo apt install python3.11 python3.11-venv

2. Create Virtual Environment

cd runtime/datamate
python3.11 -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Run Python Service

python operator_runtime.py --port 8081

Local Debugging

Start All Services

Using Docker Compose

# Start base services (database, Redis, etc.)
docker compose -f deployment/docker/datamate/docker-compose.yml up -d \
  datamate-database \
  datamate-redis

# Start Milvus (optional)
docker compose -f deployment/docker/datamate/docker-compose.yml --profile milvus up -d

Start Backend Services

# Terminal 1: Main Application
cd backend/services/main-application
mvn spring-boot:run

# Terminal 2: Data Management Service
cd backend/services/data-management-service
mvn spring-boot:run

Start Frontend

cd frontend
pnpm dev

Start Python Services

# Runtime Service
cd runtime/datamate
python operator_runtime.py --port 8081

# Backend Python Service
cd backend-python
uvicorn main:app --reload --port 18000

Code Standards

Java Code Standards

Naming Conventions

  • Class name: PascalCase UserService
  • Method name: camelCase getUserById
  • Constants: UPPER_CASE MAX_SIZE
  • Variables: camelCase userName

TypeScript Code Standards

Naming Conventions

  • Components: PascalCase UserProfile
  • Types/Interfaces: PascalCase UserData
  • Functions: camelCase getUserData
  • Constants: UPPER_CASE API_BASE_URL

Python Code Standards

Follow PEP 8:

def get_user(user_id: int) -> dict:
    """Get user information.

    Args:
        user_id: User ID

    Returns:
        User information dictionary
    """
    return {"id": user_id}  # placeholder implementation

Common Issues

Backend Won’t Start

  1. Check Java version: java -version
  2. Check port conflicts: lsof -i :8080
  3. View logs
  4. Clean and rebuild: mvn clean install

Frontend Won’t Start

  1. Check Node version: node -v
  2. Delete node_modules: rm -rf node_modules && pnpm install
  3. Check port: lsof -i :3000
