Developer Guide

DataMate architecture and development guide

This developer guide introduces DataMate's technical architecture, development environment, and contribution process.

DataMate is an enterprise-grade data processing platform built on a microservices architecture. It supports large-scale data processing and custom extensions.

Architecture Documentation

Development Guide

Tech Stack

Frontend

Technology      Version   Description
React           18.x      UI framework
TypeScript      5.x       Type safety
Ant Design      5.x       UI component library
Redux Toolkit   2.x       State management
Vite            5.x       Build tool

Backend (Java)

Technology      Version   Description
Java            21        Runtime environment
Spring Boot     3.5.6     Application framework
Spring Cloud    2023.x    Microservices framework
MyBatis Plus    3.x       ORM framework

Backend (Python)

Technology      Version   Description
Python          3.11+     Runtime environment
FastAPI         0.100+    Web framework
LangChain       0.1+      LLM framework
Ray             2.x       Distributed computing

Project Structure

DataMate/
├── backend/                 # Java backend
│   ├── services/           # Microservice modules
│   ├── openapi/            # OpenAPI specs
│   └── scripts/            # Build scripts
├── frontend/               # React frontend
│   ├── src/
│   │   ├── components/    # Common components
│   │   ├── pages/         # Page components
│   │   ├── services/      # API services
│   │   └── store/         # Redux store
│   └── package.json
├── runtime/                # Python runtime
│   └── datamate/          # DataMate runtime
└── deployment/             # Deployment config
    ├── docker/            # Docker config
    └── helm/              # Helm Charts

Quick Start

1. Clone the Repository

git clone https://github.com/ModelEngine-Group/DataMate.git
cd DataMate

2. Start Services

# Start basic services
make install

# Open the frontend in a browser (open is macOS-only; use xdg-open on Linux)
open http://localhost:30000

3. Development Mode

# Backend development
cd backend/services/main-application
mvn spring-boot:run

# Frontend development
cd frontend
pnpm dev

# Python service development
cd runtime/datamate
python operator_runtime.py --port 8081

Core Concepts

Microservices Architecture

DataMate uses a microservices architecture in which each service handles a specific business function:

  • API Gateway: Unified entry, routing, authentication
  • Main Application: Core business logic
  • Data Management Service: Dataset management
  • Data Cleaning Service: Data cleaning
  • Data Synthesis Service: Data synthesis
  • Runtime Service: Operator execution
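The gateway's routing role can be pictured as a lookup from request path to service. The following is a minimal TypeScript sketch of longest-prefix matching. Only the /data-management prefix is documented in this guide; the other prefixes and service IDs are hypothetical, and the real gateway is configured through Spring Cloud Gateway route definitions rather than hand-written code like this.

```typescript
// Hypothetical route table: path prefix -> backend service ID.
// Only "/data-management" comes from this guide; the rest are placeholders.
const routes: ReadonlyArray<[prefix: string, service: string]> = [
  ["/data-management", "data-management-service"],
  ["/data-cleaning", "data-cleaning-service"],
  ["/data-synthesis", "data-synthesis-service"],
  ["/", "main-application"], // fallback for everything else
];

// Returns the service for the longest matching prefix, honoring path-segment
// boundaries so "/data-managementX" does not match "/data-management".
function resolveService(path: string): string | undefined {
  let best: [string, string] | undefined;
  for (const [prefix, service] of routes) {
    const matches =
      prefix === "/" || path === prefix || path.startsWith(prefix + "/");
    if (matches && (!best || prefix.length > best[0].length)) {
      best = [prefix, service];
    }
  }
  return best?.[1];
}

console.log(resolveService("/data-management/datasets/42/files")); // data-management-service
console.log(resolveService("/users/me")); // main-application
```

Longest-prefix matching lets a catch-all "/" route coexist with more specific service routes without depending on declaration order.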

Operator System

Operators are the basic units of data processing:

  • Built-in operators: Common operators provided by the platform
  • Custom operators: Operators developed by users
  • Operator execution: Operators are executed by the Runtime Service
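To make the operator model concrete, here is a minimal sketch assuming operators take a batch of records and return a transformed batch. The Operator interface, the operator names, and the OperatorRuntime class are hypothetical stand-ins, not DataMate's operator API; the guide's own operator examples target the Python runtime.

```typescript
// Hypothetical operator contract: each operator transforms a batch of records.
interface DataRecord {
  text: string;
}

interface Operator {
  readonly name: string;
  process(batch: DataRecord[]): DataRecord[];
}

// A "built-in" style operator: trims surrounding whitespace.
const trimText: Operator = {
  name: "trim_text",
  process: (batch) => batch.map((r) => ({ ...r, text: r.text.trim() })),
};

// A "custom" style operator: drops records shorter than a minimum length.
function minLength(min: number): Operator {
  return {
    name: `min_length_${min}`,
    process: (batch) => batch.filter((r) => r.text.length >= min),
  };
}

// Hypothetical stand-in for the Runtime Service: looks operators up by name
// and applies them in the requested order.
class OperatorRuntime {
  private readonly registry = new Map<string, Operator>();

  register(op: Operator): void {
    this.registry.set(op.name, op);
  }

  run(names: string[], batch: DataRecord[]): DataRecord[] {
    return names.reduce<DataRecord[]>((current, name) => {
      const op = this.registry.get(name);
      if (!op) throw new Error(`unknown operator: ${name}`);
      return op.process(current);
    }, batch);
  }
}

const runtime = new OperatorRuntime();
runtime.register(trimText);
runtime.register(minLength(3));
const out = runtime.run(["trim_text", "min_length_3"], [{ text: "  hello " }, { text: " a " }]);
console.log(out); // [ { text: 'hello' } ]
```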

Pipeline Orchestration

Pipelines are built through visual orchestration:

  • Nodes: Basic units of data processing
  • Connections: Data flow between nodes
  • Execution: Nodes run automatically in the order defined by the workflow
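Running nodes "in the order defined by the workflow" means each node starts only after every node connected upstream of it has finished. A minimal sketch of that ordering uses Kahn's topological sort, which also rejects cyclic pipelines. The node names are hypothetical, and DataMate's scheduler may order or parallelize nodes differently.

```typescript
// A pipeline as a DAG: nodes plus directed connections (from -> to).
interface PipelineEdge {
  from: string;
  to: string;
}

// Kahn's algorithm: repeatedly run nodes whose upstream nodes are all done.
// Throws if a connection names an unknown node or the graph has a cycle.
function executionOrder(nodes: string[], edges: PipelineEdge[]): string[] {
  const inDegree = new Map<string, number>();
  const downstream = new Map<string, string[]>();
  for (const n of nodes) {
    inDegree.set(n, 0);
    downstream.set(n, []);
  }
  for (const { from, to } of edges) {
    if (!inDegree.has(from) || !inDegree.has(to)) {
      throw new Error(`connection references unknown node: ${from} -> ${to}`);
    }
    downstream.get(from)!.push(to);
    inDegree.set(to, inDegree.get(to)! + 1);
  }

  const ready = nodes.filter((n) => inDegree.get(n) === 0);
  const order: string[] = [];
  while (ready.length > 0) {
    const node = ready.shift()!;
    order.push(node);
    for (const next of downstream.get(node)!) {
      const remaining = inDegree.get(next)! - 1;
      inDegree.set(next, remaining);
      if (remaining === 0) ready.push(next);
    }
  }
  if (order.length !== nodes.length) {
    throw new Error("pipeline contains a cycle");
  }
  return order;
}

// Hypothetical pipeline: load -> clean -> (dedupe, tag) -> export
const order = executionOrder(
  ["load", "clean", "dedupe", "tag", "export"],
  [
    { from: "load", to: "clean" },
    { from: "clean", to: "dedupe" },
    { from: "clean", to: "tag" },
    { from: "dedupe", to: "export" },
    { from: "tag", to: "export" },
  ],
);
console.log(order); // [ 'load', 'clean', 'dedupe', 'tag', 'export' ]
```

Nodes that become ready at the same time (dedupe and tag here) have no dependency on each other, so a scheduler is free to run them in parallel.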

Extension Development

Develop Custom Operators

Operator development guide:

  1. Operator Market - Operator usage guide
  2. Python operator development examples
  3. Operator testing and debugging

Integrate External Systems

  • API integration: Integration via REST API
  • Webhook: Event notifications
  • Plugin system: (Coming soon)

Testing

Unit Tests

# Backend tests
cd backend
mvn test

# Frontend tests
cd frontend
pnpm test

# Python tests
cd runtime
pytest

Integration Tests

# Start test environment
make test-env-up

# Run integration tests
make integration-test

# Clean test environment
make test-env-down

Performance Optimization

Backend Optimization

  • Database connection pool configuration
  • Query optimization
  • Caching strategies
  • Asynchronous processing

Frontend Optimization

  • Code splitting
  • Lazy loading
  • Caching strategies

Security

Authentication and Authorization

  • JWT authentication
  • RBAC permission control
  • API Key authentication

Data Security

  • Transport encryption (HTTPS/TLS)
  • Storage encryption
  • Sensitive data masking
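As an illustration of sensitive data masking, this minimal sketch masks e-mail addresses and long digit runs (such as phone or ID numbers) in free text. The regular expressions and the keep-first-character and keep-last-four rules are assumptions chosen for the example, not DataMate's built-in masking rules.

```typescript
// Illustrative rule: keep the first character of the local part and the domain.
function maskEmail(email: string): string {
  const at = email.indexOf("@");
  if (at <= 0) return email;
  return email[0] + "*".repeat(at - 1) + email.slice(at);
}

// Illustrative rule: keep only the last 4 digits of a long digit run.
function maskDigits(run: string): string {
  return "*".repeat(run.length - 4) + run.slice(-4);
}

// Masks e-mail addresses first, then any run of 7 or more digits.
function maskSensitive(text: string): string {
  return text
    .replace(/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g, maskEmail)
    .replace(/\d{7,}/g, maskDigits);
}

console.log(maskSensitive("Contact alice@example.com or 13812345678"));
// Contact a****@example.com or *******5678
```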

1 - Backend Architecture

DataMate Java backend architecture design

The DataMate backend uses a microservices architecture built on Spring Boot 3.x and Spring Cloud.

Architecture Overview

The backend is split into multiple independent services:

┌─────────────────────────────────────────────┐
│              API Gateway                    │
│         (Spring Cloud Gateway)              │
│              Port: 8080                     │
└──────────────┬──────────────────────────────┘
               │
       ┌───────┴───────┬───────────────┐
       ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│   Main       │ │  Data        │ │  Data        │
│ Application  │ │  Management  │ │  Collection  │
└──────────────┘ └──────────────┘ └──────────────┘
       │               │               │
       └───────────────┴───────────────┘
                       │
                       ▼
              ┌────────────────┐
              │   PostgreSQL   │
              │   Port: 5432   │
              └────────────────┘

Tech Stack

Core Frameworks

Technology      Version   Purpose
Java            21        Programming language
Spring Boot     3.5.6     Application framework
Spring Cloud    2023.x    Microservices framework
MyBatis Plus    3.5.x     ORM framework

Support Components

Technology      Version   Purpose
Redis           5.x       Cache and message queue
MinIO           8.x       Object storage
Milvus SDK      2.3.x     Vector database

Microservices List

API Gateway

Port: 8080

Functions:

  • Unified entry point
  • Route forwarding
  • Authentication and authorization
  • Rate limiting and circuit breaking

Tech: Spring Cloud Gateway, JWT authentication
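Rate limiting at a gateway is commonly implemented as a token bucket; Spring Cloud Gateway's Redis-backed RequestRateLimiter filter uses this algorithm. The in-memory sketch below shows the idea for a single client key. The capacity and refill rate are hypothetical values, and with several gateway instances the bucket state would have to live in shared storage such as Redis.

```typescript
// Token bucket: holds up to `capacity` tokens, refilled at `refillPerSecond`.
// Each request takes one token; with no token left, the request is rejected.
class TokenBucket {
  private readonly capacity: number;
  private readonly refillPerSecond: number;
  private tokens: number;
  private lastRefillMs: number;

  constructor(capacity: number, refillPerSecond: number, nowMs: number) {
    this.capacity = capacity;
    this.refillPerSecond = refillPerSecond;
    this.tokens = capacity;
    this.lastRefillMs = nowMs;
  }

  // `nowMs` is passed in (instead of reading the clock) to keep this testable.
  tryAcquire(nowMs: number): boolean {
    const elapsedSec = Math.max(0, nowMs - this.lastRefillMs) / 1000;
    this.tokens = Math.min(this.capacity, this.tokens + elapsedSec * this.refillPerSecond);
    this.lastRefillMs = nowMs;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true; // allowed
    }
    return false; // rejected, e.g. with HTTP 429 Too Many Requests
  }
}

// Hypothetical limit: bursts of 3, refilling 1 token per second.
const bucket = new TokenBucket(3, 1, 0);
const burst = [0, 0, 0, 0].map((t) => bucket.tryAcquire(t));
console.log(burst); // [ true, true, true, false ]
console.log(bucket.tryAcquire(1000)); // true
```

The capacity bounds burst size, while the refill rate bounds the sustained request rate.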

Main Application

Functions:

  • User management
  • Permission management
  • System configuration
  • Task scheduling

Data Management Service

Port: 8092

Functions:

  • Dataset management
  • File management
  • Tag management
  • Statistics

API Endpoints:

  • /data-management/datasets - Dataset management
  • /data-management/datasets/{id}/files - File management

Runtime Service

Port: 8081

Functions:

  • Operator execution
  • Ray integration
  • Task scheduling

Tech: Python + Ray, FastAPI

Database Design

Main Tables

users (User Table)

Field         Type           Description
id            BIGINT         Primary key
username      VARCHAR(50)    Username
password      VARCHAR(255)   Password (encrypted)
email         VARCHAR(100)   Email
role          VARCHAR(20)    Role
created_at    TIMESTAMP      Creation time

datasets (Dataset Table)

Field         Type           Description
id            VARCHAR(50)    Primary key
name          VARCHAR(100)   Name
description   TEXT           Description
type          VARCHAR(20)    Type
status        VARCHAR(20)    Status
created_by    VARCHAR(50)    Creator

Service Communication

Synchronous Communication

Services communicate via HTTP/REST:

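import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// Declarative Feign client; the service name is resolved via Spring Cloud LoadBalancer
@FeignClient(name = "data-management-service")
public interface DataManagementClient {
    // Explicit "id" keeps binding independent of the -parameters compiler flag
    @GetMapping("/data-management/datasets/{id}")
    DatasetResponse getDataset(@PathVariable("id") String id);
}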

Asynchronous Communication

Services use Redis publish/subscribe for asynchronous messaging:

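// Send message: publish taskMessage on the "task.created" channel
redisTemplate.convertAndSend("task.created", taskMessage);

// Receive message: Spring Data Redis has no @RedisListener annotation; the adapter
// bean calls taskHandler.handleTaskCreated(payload) (match its serializer to redisTemplate)
@Bean
MessageListenerAdapter taskCreatedAdapter(TaskHandler taskHandler) {
    return new MessageListenerAdapter(taskHandler, "handleTaskCreated");
}
@Bean
RedisMessageListenerContainer taskListeners(RedisConnectionFactory factory, MessageListenerAdapter taskCreatedAdapter) {
    RedisMessageListenerContainer container = new RedisMessageListenerContainer();
    container.setConnectionFactory(factory);
    container.addMessageListener(taskCreatedAdapter, new ChannelTopic("task.created"));
    return container;
}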
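Redis Pub/Sub, which convertAndSend publishes to, is fire-and-forget: a message reaches only the subscribers connected at that moment and is not stored, so a service that is restarting misses the event. The in-memory sketch below models that delivery behavior; the PubSub class and the TaskMessage shape are hypothetical. If lost task events are unacceptable, a persistent mechanism such as Redis Streams is the usual alternative.

```typescript
type Handler<T> = (message: T) => void;

// Hypothetical model of Redis Pub/Sub delivery: publish() reaches only current
// subscribers and returns how many received the message (like the PUBLISH reply).
class PubSub<T> {
  private readonly channels = new Map<string, Set<Handler<T>>>();

  subscribe(channel: string, handler: Handler<T>): () => void {
    const handlers = this.channels.get(channel) ?? new Set<Handler<T>>();
    this.channels.set(channel, handlers);
    handlers.add(handler);
    return () => {
      handlers.delete(handler); // unsubscribe
    };
  }

  publish(channel: string, message: T): number {
    const handlers = this.channels.get(channel);
    if (!handlers) return 0; // no subscribers: the message is simply dropped
    handlers.forEach((h) => h(message));
    return handlers.size;
  }
}

interface TaskMessage {
  taskId: string; // hypothetical payload field
}

const bus = new PubSub<TaskMessage>();
const received: string[] = [];

bus.publish("task.created", { taskId: "t1" }); // nobody subscribed yet: lost
const unsubscribe = bus.subscribe("task.created", (m) => received.push(m.taskId));
bus.publish("task.created", { taskId: "t2" });
unsubscribe();
bus.publish("task.created", { taskId: "t3" }); // subscriber gone: lost
console.log(received); // [ 't2' ]
```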

Authentication & Authorization

JWT Authentication

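import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;

// Binds JWT settings from application configuration (datamate.jwt.*)
@Configuration
public class JwtConfig {
    // Secret used to sign and verify tokens; keep it out of source control
    @Value("${datamate.jwt.secret}")
    private String secret;

    // Token lifetime (unit as defined by the DataMate configuration)
    @Value("${datamate.jwt.expiration}")
    private Long expiration;
}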
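After a token's signature has been verified (normally by a JWT library rather than hand-written code), the service still has to check its time-based claims. A minimal sketch, assuming RFC 7519 claims where iat and exp are NumericDate values in seconds. The 60-second clock-skew allowance and the helper names are illustrative, and the unit of datamate.jwt.expiration is not stated in this guide.

```typescript
// Registered claims from RFC 7519 used here; NumericDate = seconds since epoch.
interface JwtClaims {
  sub: string;
  iat: number;
  exp: number;
}

// Hypothetical helper: claims for a token that lives `ttlSeconds` from `nowSec`.
function issueClaims(subject: string, nowSec: number, ttlSeconds: number): JwtClaims {
  return { sub: subject, iat: nowSec, exp: nowSec + ttlSeconds };
}

// Rejects expired tokens and (a common extra check) tokens issued in the future,
// tolerating a small clock skew between services (60 s is an illustrative default).
function isTimeValid(claims: JwtClaims, nowSec: number, skewSec = 60): boolean {
  if (claims.iat > nowSec + skewSec) return false;
  return nowSec < claims.exp + skewSec;
}

const claims = issueClaims("alice", 1_700_000_000, 3600);
console.log(isTimeValid(claims, 1_700_000_000 + 1800)); // true
console.log(isTimeValid(claims, 1_700_000_000 + 7200)); // false
```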

RBAC

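import org.springframework.security.access.prepost.PreAuthorize;

// Needs method security enabled (@EnableMethodSecurity on a configuration class);
// hasRole('ADMIN') checks that the caller holds the authority "ROLE_ADMIN"
@PreAuthorize("hasRole('ADMIN')")
public void adminOperation() {
    // Admin operations
}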
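Behind an annotation such as @PreAuthorize, RBAC reduces to two lookups: user to roles, and role to permissions. A minimal sketch; apart from ADMIN, the role names and permission strings are hypothetical, not DataMate's permission model. It mirrors Spring Security's convention that hasRole('ADMIN') checks for the authority ROLE_ADMIN.

```typescript
// Hypothetical permission names.
type DataPermission = "dataset:read" | "dataset:write" | "system:configure";

// Hypothetical role -> permission mapping (only ADMIN appears in this guide).
const rolePermissions = new Map<string, ReadonlySet<DataPermission>>([
  ["ADMIN", new Set<DataPermission>(["dataset:read", "dataset:write", "system:configure"])],
  ["DEVELOPER", new Set<DataPermission>(["dataset:read", "dataset:write"])],
  ["VIEWER", new Set<DataPermission>(["dataset:read"])],
]);

interface User {
  name: string;
  authorities: string[]; // e.g. ["ROLE_ADMIN"], as Spring Security names roles
}

// Mirrors hasRole('X'): role names carry the "ROLE_" prefix as authorities.
function hasRole(user: User, role: string): boolean {
  return user.authorities.includes(`ROLE_${role}`);
}

// Grants access if any of the user's roles carries the permission.
function hasPermission(user: User, permission: DataPermission): boolean {
  return user.authorities
    .filter((a) => a.startsWith("ROLE_"))
    .some((a) => {
      const perms = rolePermissions.get(a.slice("ROLE_".length));
      return perms !== undefined && perms.has(permission);
    });
}

const viewer: User = { name: "bob", authorities: ["ROLE_VIEWER"] };
console.log(hasRole(viewer, "ADMIN")); // false
console.log(hasPermission(viewer, "dataset:read")); // true
console.log(hasPermission(viewer, "dataset:write")); // false
```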

Performance Optimization

Database Connection Pool

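spring:
  datasource:
    hikari:
      maximum-pool-size: 20     # upper bound on open connections (idle + in use)
      minimum-idle: 5           # idle connections Hikari tries to keep ready
      connection-timeout: 30000 # ms to wait for a free connection before failing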

Caching Strategy

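import org.springframework.cache.annotation.Cacheable;

// Requires @EnableCaching; the result is stored in the "datasets" cache under the
// id argument, so later calls with the same id skip the repository
@Cacheable(value = "datasets", key = "#id")
public Dataset getDataset(String id) {
    return datasetRepository.findById(id);
}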
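@Cacheable implements cache-aside: look the key up, and on a miss call the method and store its result. The sketch below models that flow with a per-entry TTL. The TTL is an assumption (in Spring, expiry is configured on the cache provider, for example Redis, not on the annotation), and loadDataset is a hypothetical stand-in for the repository call.

```typescript
interface CacheEntry<V> {
  value: V;
  expiresAtMs: number;
}

// Cache-aside with expiry: hit -> cached value; miss or expired -> load and store.
class TtlCache<V> {
  private readonly entries = new Map<string, CacheEntry<V>>();
  private readonly ttlMs: number;

  constructor(ttlMs: number) {
    this.ttlMs = ttlMs;
  }

  get(key: string, load: (key: string) => V, nowMs: number): V {
    const entry = this.entries.get(key);
    if (entry && entry.expiresAtMs > nowMs) {
      return entry.value; // cache hit: the loader is not called
    }
    const value = load(key);
    this.entries.set(key, { value, expiresAtMs: nowMs + this.ttlMs });
    return value;
  }

  // Counterpart of @CacheEvict: drop the entry after the dataset changes.
  evict(key: string): void {
    this.entries.delete(key);
  }
}

// Hypothetical loader standing in for datasetRepository.findById; counts DB hits.
let dbCalls = 0;
const loadDataset = (id: string) => {
  dbCalls++;
  return { id, name: `dataset-${id}` };
};

const cache = new TtlCache<{ id: string; name: string }>(60_000);
cache.get("42", loadDataset, 0); // miss -> loads
cache.get("42", loadDataset, 30_000); // hit
cache.get("42", loadDataset, 61_000); // expired -> loads again
console.log(dbCalls); // 2
```

Evicting on update keeps readers from seeing stale data for the rest of the TTL.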

2 - Frontend Architecture

DataMate React frontend architecture design

The DataMate frontend is built with React 18 and TypeScript.

Architecture Overview

The frontend is a single-page application (SPA):

┌─────────────────────────────────────────────┐
│              Browser                        │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│              React App                      │
│  ┌──────────────────────────────────────┐  │
│  │         Components                   │  │
│  └──────────────────────────────────────┘  │
│  ┌──────────────────────────────────────┐  │
│  │         State Management             │  │
│  │         (Redux Toolkit)              │  │
│  └──────────────────────────────────────┘  │
│  ┌──────────────────────────────────────┐  │
│  │         Services (API)               │  │
│  └──────────────────────────────────────┘  │
│  ┌──────────────────────────────────────┐  │
│  │         Routing                      │  │
│  └──────────────────────────────────────┘  │
└─────────────────────────────────────────────┘

Tech Stack

Core Frameworks

Technology      Version   Purpose
React           18.x      UI framework
TypeScript      5.x       Type safety
Ant Design      5.x       UI components
Tailwind CSS    3.x       Styling

State Management

Technology      Version   Purpose
Redux Toolkit   2.x       Global state
React Query     5.x       Server state

Project Structure

frontend/
├── src/
│   ├── components/     # Common components
│   ├── pages/          # Page components
│   ├── services/       # API services
│   ├── store/          # Redux store
│   ├── hooks/          # Custom hooks
│   ├── routes/         # Routes config
│   └── main.tsx        # Entry point

Routing Design

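import { createBrowserRouter } from "react-router-dom";

// Data router: each `Component` field takes a component type, not a JSX element.
// Child routes render inside MainLayout's <Outlet />, e.g. /data/management.
const router = createBrowserRouter([
  { path: "/", Component: Home },
  { path: "/chat", Component: AgentPage },
  {
    path: "/data",
    Component: MainLayout,
    children: [
      {
        path: "management",
        Component: DatasetManagement
      }
    ]
  }
]);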

State Management

Redux Toolkit Configuration

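import { configureStore } from "@reduxjs/toolkit";

// Each entry must be a reducer function (for a createSlice() result, its .reducer)
export const store = configureStore({
  reducer: {
    dataManagement: dataManagementSlice,
    user: userSlice,
  },
});

// Types behind the typed useAppDispatch/useAppSelector hooks
export type RootState = ReturnType<typeof store.getState>;
export type AppDispatch = typeof store.dispatch;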

Async Thunk Example

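import { createAsyncThunk } from "@reduxjs/toolkit";

// Dispatches dataManagement/fetchDatasets/pending, then /fulfilled with the
// returned data, or /rejected if the request throws
export const fetchDatasets = createAsyncThunk(
  'dataManagement/fetchDatasets',
  async (params: GetDatasetsParams) => {
    const response = await getDatasets(params);
    return response.data;
  }
);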
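createAsyncThunk dispatches three action types derived from its prefix: dataManagement/fetchDatasets/pending, /fulfilled, and /rejected. A slice normally handles them in extraReducers with builder.addCase(fetchDatasets.pending, ...) and so on. The dependency-free sketch below shows the state transitions such a reducer performs. The state shape is inferred from the datasets and loading fields the page component reads, and treating the payload as a plain array of datasets is an assumption.

```typescript
interface Dataset {
  id: string;
  name: string;
}

// Assumed slice state, matching the fields the page component selects.
interface DataManagementState {
  datasets: Dataset[];
  loading: boolean;
  error: string | null;
}

type FetchAction =
  | { type: "dataManagement/fetchDatasets/pending" }
  | { type: "dataManagement/fetchDatasets/fulfilled"; payload: Dataset[] }
  | { type: "dataManagement/fetchDatasets/rejected"; error: { message?: string } };

const initialState: DataManagementState = { datasets: [], loading: false, error: null };

// Pure reducer: returns a new state object for each lifecycle action.
function dataManagementReducer(
  state: DataManagementState = initialState,
  action: FetchAction,
): DataManagementState {
  switch (action.type) {
    case "dataManagement/fetchDatasets/pending":
      return { ...state, loading: true, error: null };
    case "dataManagement/fetchDatasets/fulfilled":
      return { ...state, loading: false, datasets: action.payload };
    case "dataManagement/fetchDatasets/rejected":
      return { ...state, loading: false, error: action.error.message ?? "Request failed" };
    default:
      return state;
  }
}

let state = dataManagementReducer(undefined, { type: "dataManagement/fetchDatasets/pending" });
console.log(state.loading); // true
state = dataManagementReducer(state, {
  type: "dataManagement/fetchDatasets/fulfilled",
  payload: [{ id: "ds-1", name: "demo" }],
});
console.log(state.loading, state.datasets.length); // false 1
```

On rejection the previous datasets are kept, so the table can keep showing stale data alongside an error message.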

Component Design

Page Component

import React, { useEffect } from "react";

export const DataManagement: React.FC = () => {
  const dispatch = useAppDispatch();
  // Re-renders when the dataManagement slice changes
  const { datasets, loading } = useAppSelector(
    (state) => state.dataManagement
  );

  // Load the first page of datasets once on mount (dispatch is stable)
  useEffect(() => {
    dispatch(fetchDatasets({ page: 0, size: 20 }));
  }, [dispatch]);

  return (
    <div className="p-6">
      <h1>Data Management</h1>
      <DataTable data={datasets} loading={loading} />
    </div>
  );
};

API Services

Axios Configuration

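import axios from "axios";

// Shared HTTP client: base URL from the Vite env variable, 30 s timeout
const request = axios.create({
  baseURL: import.meta.env.VITE_API_BASE_URL,
  timeout: 30000,
});

// Request interceptor: attach the stored token as a Bearer Authorization header
request.interceptors.request.use((config) => {
  const token = localStorage.getItem('token');
  if (token) {
    config.headers.Authorization = `Bearer ${token}`;
  }
  return config;
});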

Performance Optimization

Code Splitting

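import { lazy } from "react";

// Loaded as a separate chunk on first render; needs a default export and a <Suspense> boundary
const DataManagement = lazy(() =>
  import('@/pages/DataManagement/Home/DataManagement')
);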

React.memo

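import React from "react";

// Skips re-rendering while the data prop keeps the same reference (shallow props check)
export const DataCard = React.memo<DataCardProps>(({ data }) => {
  return <div>{data.name}</div>;
});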
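By default, React.memo skips a re-render when every prop is equal to its previous value under Object.is, which is a shallow comparison. The sketch below reimplements that check without React to show why a freshly created object or callback passed on each render defeats memoization; shallowEqual here is an illustrative helper, not an exported React API.

```typescript
type Props = Record<string, unknown>;

// Shallow equality as React.memo applies it by default: same keys, and each
// value identical under Object.is (objects and functions compare by reference).
function shallowEqual(prev: Props, next: Props): boolean {
  const prevKeys = Object.keys(prev);
  const nextKeys = Object.keys(next);
  if (prevKeys.length !== nextKeys.length) return false;
  return prevKeys.every(
    (key) => Object.prototype.hasOwnProperty.call(next, key) && Object.is(prev[key], next[key]),
  );
}

const data = { name: "dataset-1" };
console.log(shallowEqual({ data }, { data })); // true
console.log(shallowEqual({ data }, { data: { ...data } })); // false
console.log(shallowEqual({ onClick: () => {} }, { onClick: () => {} })); // false
```

Keeping such props referentially stable, for example with useMemo or useCallback in the parent, lets a memoized component like DataCard actually skip renders.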