Developer Guide
DataMate architecture and development guide
This developer guide introduces DataMate’s technical architecture, development environment, and contribution process.
DataMate is an enterprise-level data processing platform built on a microservices architecture. It supports large-scale data processing and custom extensions.
Tech Stack
Frontend
| Technology | Version | Description |
|---|---|---|
| React | 18.x | UI framework |
| TypeScript | 5.x | Type safety |
| Ant Design | 5.x | UI component library |
| Redux Toolkit | 2.x | State management |
| Vite | 5.x | Build tool |
Backend (Java)
| Technology | Version | Description |
|---|---|---|
| Java | 21 | Runtime environment |
| Spring Boot | 3.5.6 | Application framework |
| Spring Cloud | 2023.x | Microservices framework |
| MyBatis Plus | 3.x | ORM framework |
Backend (Python)
| Technology | Version | Description |
|---|---|---|
| Python | 3.11+ | Runtime environment |
| FastAPI | 0.100+ | Web framework |
| LangChain | 0.1+ | LLM framework |
| Ray | 2.x | Distributed computing |
Project Structure
DataMate/
├── backend/                 # Java backend
│   ├── services/            # Microservice modules
│   ├── openapi/             # OpenAPI specs
│   └── scripts/             # Build scripts
├── frontend/                # React frontend
│   ├── src/
│   │   ├── components/      # Common components
│   │   ├── pages/           # Page components
│   │   ├── services/        # API services
│   │   └── store/           # Redux store
│   └── package.json
├── runtime/                 # Python runtime
│   └── datamate/            # DataMate runtime
└── deployment/              # Deployment config
    ├── docker/              # Docker config
    └── helm/                # Helm Charts
Quick Start
1. Clone Code
git clone https://github.com/ModelEngine-Group/DataMate.git
cd DataMate
2. Start Services
# Start basic services
make install
# Access frontend
open http://localhost:30000
3. Development Mode
# Backend development (run each of the three blocks below from the repository root, in its own terminal)
cd backend/services/main-application
mvn spring-boot:run
# Frontend development
cd frontend
pnpm dev
# Python service development
cd runtime/datamate
python operator_runtime.py --port 8081
Core Concepts
Microservices Architecture
DataMate uses a microservices architecture in which each service handles a specific business function:
- API Gateway: Unified entry, routing, authentication
- Main Application: Core business logic
- Data Management Service: Dataset management
- Data Cleaning Service: Data cleaning
- Data Synthesis Service: Data synthesis
- Runtime Service: Operator execution
Operator System
Operators are the basic units of data processing:
- Built-in operators: Common operators provided by the platform
- Custom operators: Operators developed by users
- Operator execution: Operators are executed by the Runtime Service
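To make the operator concept concrete, here is a minimal Python sketch of how built-in and custom operators could share one interface. The names `Operator`, `StripWhitespaceOperator`, `LowercaseOperator`, and `run_operators` are hypothetical and are not DataMate's actual operator API; the operator development guide describes the real interface.

```python
from abc import ABC, abstractmethod


class Operator(ABC):
    """Hypothetical base class: one unit of data processing."""

    @abstractmethod
    def execute(self, record: dict) -> dict:
        """Transform one record and return the result."""


class StripWhitespaceOperator(Operator):
    """Stands in for a built-in operator shipped with the platform."""

    def execute(self, record: dict) -> dict:
        return {**record, "text": record["text"].strip()}


class LowercaseOperator(Operator):
    """Stands in for a user-developed custom operator."""

    def execute(self, record: dict) -> dict:
        return {**record, "text": record["text"].lower()}


def run_operators(operators: list[Operator], record: dict) -> dict:
    """Apply operators in order, roughly as a runtime service might."""
    for op in operators:
        record = op.execute(record)
    return record


result = run_operators(
    [StripWhitespaceOperator(), LowercaseOperator()],
    {"id": 1, "text": "  Hello DataMate  "},
)
```

Because both kinds of operator implement the same `execute` method, the code that runs them does not need to know which ones are built in and which are custom.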
Pipeline Orchestration
Pipelines are implemented through visual orchestration:
- Nodes: Basic units of data processing
- Connections: Data flow between nodes
- Execution: Nodes run automatically in the order defined by the workflow
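The relationship between nodes, connections, and execution order can be sketched as a small dependency graph. This is an illustration only: the node names and the `run_pipeline` helper are hypothetical, and DataMate's orchestration engine may schedule work quite differently (for example, in parallel on Ray).

```python
from graphlib import TopologicalSorter

# Hypothetical nodes: each maps a name to a function over a list of rows.
nodes = {
    "load": lambda rows: rows + [" Alice ", "BOB", ""],
    "clean": lambda rows: [r.strip() for r in rows if r.strip()],
    "lower": lambda rows: [r.lower() for r in rows],
}

# Connections: each node lists the upstream nodes it reads from.
connections = {"load": [], "clean": ["load"], "lower": ["clean"]}


def run_pipeline(nodes, connections):
    """Execute nodes in dependency order and return every node's output."""
    outputs = {}
    for name in TopologicalSorter(connections).static_order():
        upstream = connections[name]
        # Concatenate upstream outputs; source nodes start from an empty list.
        rows = [row for dep in upstream for row in outputs[dep]]
        outputs[name] = nodes[name](rows)
    return outputs


outputs = run_pipeline(nodes, connections)
```

A topological sort guarantees that a node runs only after every node it depends on, which is the property that "automatic execution according to the workflow" relies on.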
Extension Development
Develop Custom Operators
Operator development guide:
- Operator Market - Operator usage guide
- Python operator development examples
- Operator testing and debugging
Integrate External Systems
- API integration: Integration via REST API
- Webhook: Event notifications
- Plugin system: (Coming soon)
Testing
Unit Tests
# Backend tests (each block below starts from the repository root)
cd backend
mvn test
# Frontend tests
cd frontend
pnpm test
# Python tests
cd runtime
pytest
Integration Tests
# Start test environment
make test-env-up
# Run integration tests
make integration-test
# Clean test environment
make test-env-down
Backend Optimization
- Database connection pool configuration
- Query optimization
- Caching strategies
- Asynchronous processing
Frontend Optimization
- Code splitting
- Lazy loading
- Caching strategies
Security
Authentication and Authorization
- JWT authentication
- RBAC permission control
- API Key authentication
Data Security
- Transport encryption (HTTPS/TLS)
- Storage encryption
- Sensitive data masking
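As an illustration of sensitive data masking, the sketch below hides fully sensitive fields and partially masks email addresses before a record is logged or displayed. The helper names, the choice of fields, and the masking format are assumptions made for the example, not DataMate's actual masking rules.

```python
import re

# First character of the local part, the rest of the local part, then the domain.
EMAIL_RE = re.compile(r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+)")


def mask_email(text: str) -> str:
    """Keep the first character of the local part and the domain."""
    return EMAIL_RE.sub(r"\1***@\2", text)


def mask_record(record: dict, sensitive_fields: set[str]) -> dict:
    """Replace fully sensitive fields, then mask emails in remaining strings."""
    masked = {}
    for key, value in record.items():
        if key in sensitive_fields:
            masked[key] = "******"
        elif isinstance(value, str):
            masked[key] = mask_email(value)
        else:
            masked[key] = value
    return masked


masked = mask_record(
    {"username": "alice", "password": "s3cret", "email": "alice@example.com"},
    sensitive_fields={"password"},
)
```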
1 - Backend Architecture
DataMate Java backend architecture design
The DataMate backend adopts a microservices architecture built on Spring Boot 3.x and Spring Cloud.
Architecture Overview
The backend is split into multiple independent services behind a single gateway:
┌─────────────────────────────────────────────┐
│                 API Gateway                 │
│           (Spring Cloud Gateway)            │
│                 Port: 8080                  │
└──────────────┬──────────────────────────────┘
               │
       ┌───────┴───────┬───────────────┐
       ▼               ▼               ▼
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│     Main     │ │     Data     │ │     Data     │
│ Application  │ │  Management  │ │  Collection  │
└──────────────┘ └──────────────┘ └──────────────┘
       │               │               │
       └───────────────┴───────────────┘
                       │
                       ▼
               ┌────────────────┐
               │   PostgreSQL   │
               │   Port: 5432   │
               └────────────────┘
Tech Stack
Core Frameworks
| Technology | Version | Purpose |
|---|---|---|
| Java | 21 | Programming language |
| Spring Boot | 3.5.6 | Application framework |
| Spring Cloud | 2023.x | Microservices framework |
| MyBatis Plus | 3.5.x | ORM framework |
Support Components
| Technology | Version | Purpose |
|---|---|---|
| Redis | 5.x | Cache and message queue |
| MinIO | 8.x | Object storage |
| Milvus SDK | 2.3.x | Vector database |
Microservices List
API Gateway
Port: 8080
Functions:
- Unified entry point
- Route forwarding
- Authentication and authorization
- Rate limiting and circuit breaking
Tech: Spring Cloud Gateway, JWT authentication
Main Application
Functions:
- User management
- Permission management
- System configuration
- Task scheduling
Data Management Service
Port: 8092
Functions:
- Dataset management
- File management
- Tag management
- Statistics
API Endpoints:
- /data-management/datasets - Dataset management
- /data-management/datasets/{id}/files - File management
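A client needs to turn these paths into full URLs. The sketch below builds them with the standard library. It assumes the service is reachable directly at `localhost:8092` and that list requests take `page` and `size` query parameters; both are assumptions for illustration, and a real deployment may route these paths through the API Gateway instead.

```python
from urllib.parse import quote, urlencode

# Assumption: direct access to the Data Management Service on its own port.
BASE_URL = "http://localhost:8092"


def datasets_url(page: int = 0, size: int = 20) -> str:
    """URL for the dataset collection; page/size are assumed query parameters."""
    query = urlencode({"page": page, "size": size})
    return f"{BASE_URL}/data-management/datasets?{query}"


def dataset_files_url(dataset_id: str) -> str:
    """URL for the files of one dataset; the id is percent-encoded."""
    return f"{BASE_URL}/data-management/datasets/{quote(dataset_id, safe='')}/files"


list_url = datasets_url()
files_url = dataset_files_url("ds 001")
```

Percent-encoding the identifier keeps the path valid even when an id contains spaces or slashes.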
Runtime Service
Port: 8081
Functions:
- Operator execution
- Ray integration
- Task scheduling
Tech: Python + Ray, FastAPI
Database Design
Main Tables
users (User Table)
| Field | Type | Description |
|---|---|---|
| id | BIGINT | Primary key |
| username | VARCHAR(50) | Username |
| password | VARCHAR(255) | Password (encrypted) |
| email | VARCHAR(100) | Email |
| role | VARCHAR(20) | Role |
| created_at | TIMESTAMP | Creation time |
datasets (Dataset Table)
| Field | Type | Description |
|---|---|---|
| id | VARCHAR(50) | Primary key |
| name | VARCHAR(100) | Name |
| description | TEXT | Description |
| type | VARCHAR(20) | Type |
| status | VARCHAR(20) | Status |
| created_by | VARCHAR(50) | Creator |
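The two tables above can be tried out locally with the sketch below. It uses an in-memory SQLite database only to stay self-contained; DataMate itself runs on PostgreSQL, and constraints, indexes, and the sample values (`TEXT`, `ACTIVE`) are placeholders rather than the project's real schema or enumerations.

```python
import sqlite3

# SQLite keeps the sketch self-contained; the real database is PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE users (
        id         INTEGER PRIMARY KEY,   -- BIGINT in the table above
        username   VARCHAR(50),
        password   VARCHAR(255),          -- stored encrypted, per the table above
        email      VARCHAR(100),
        role       VARCHAR(20),
        created_at TIMESTAMP
    );
    CREATE TABLE datasets (
        id          VARCHAR(50) PRIMARY KEY,
        name        VARCHAR(100),
        description TEXT,
        type        VARCHAR(20),
        status      VARCHAR(20),
        created_by  VARCHAR(50)
    );
    """
)

# Sample row with made-up values.
conn.execute(
    "INSERT INTO datasets (id, name, type, status, created_by) VALUES (?, ?, ?, ?, ?)",
    ("ds-001", "demo", "TEXT", "ACTIVE", "alice"),
)
row = conn.execute(
    "SELECT name, status FROM datasets WHERE id = ?", ("ds-001",)
).fetchone()
```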
Service Communication
Synchronous Communication
Services communicate via HTTP/REST:
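// Using a Feign client (spring-cloud-starter-openfeign)
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

@FeignClient(name = "data-management-service")
public interface DataManagementClient {
    @GetMapping("/data-management/datasets/{id}")
    DatasetResponse getDataset(@PathVariable("id") String id);
}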
Asynchronous Communication
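Asynchronous messaging uses Redis publish/subscribe:
// Send message (Redis pub/sub via RedisTemplate)
redisTemplate.convertAndSend("task.created", taskMessage);
// Receive message
// Note: @RedisListener is not a Spring Data Redis annotation; it is presumably
// project-specific. With plain Spring Data Redis, register the handler through
// a RedisMessageListenerContainer instead.
@RedisListener(topic = "task.created")
public void handleTaskCreated(TaskMessage message) {
    // Handle task creation event
}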
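The publish/subscribe semantics can be illustrated without a Redis server. The sketch below is an in-process stand-in, not a Redis client: `InMemoryBroker` and the `task.finished` topic are hypothetical, and the message is a plain dictionary rather than the `TaskMessage` type used in the backend code.

```python
from collections import defaultdict
from typing import Any, Callable


class InMemoryBroker:
    """In-process stand-in for a pub/sub broker; not a Redis client."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], None]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], None]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, message: Any) -> int:
        """Deliver to current subscribers only and return how many received it."""
        handlers = self._handlers.get(topic, [])
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = InMemoryBroker()
received = []
broker.subscribe("task.created", received.append)

delivered = broker.publish("task.created", {"taskId": "t-1"})
dropped = broker.publish("task.finished", {"taskId": "t-1"})  # no subscriber
```

Like Redis pub/sub, this model is fire-and-forget: a message published while nobody is subscribed is simply lost, so events that must not be missed need a durable mechanism.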
Authentication & Authorization
JWT Authentication
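import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Configuration;

@Configuration
public class JwtConfig {
    // Signing secret, read from the datamate.jwt.secret property
    @Value("${datamate.jwt.secret}")
    private String secret;

    // Token lifetime, read from the datamate.jwt.expiration property
    @Value("${datamate.jwt.expiration}")
    private Long expiration;
}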
RBAC
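// Requires method security to be enabled, e.g. @EnableMethodSecurity.
// hasRole('ADMIN') matches the authority ROLE_ADMIN (default ROLE_ prefix).
@PreAuthorize("hasRole('ADMIN')")
public void adminOperation() {
    // Admin operations
}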
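The idea behind a role check on a method can be sketched in a few lines of Python. The decorator below is a hypothetical analogue of the annotation above, not how Spring Security works internally; `require_role`, `PermissionDenied`, and the shape of the `user` dictionary are assumptions for the example.

```python
from functools import wraps


class PermissionDenied(Exception):
    """Raised when the caller lacks the required role."""


def require_role(role: str):
    """Decorator: allow the call only if the user holds `role`."""
    def decorator(func):
        @wraps(func)
        def wrapper(user: dict, *args, **kwargs):
            if role not in user.get("roles", ()):
                raise PermissionDenied(f"{user.get('username')} lacks role {role}")
            return func(user, *args, **kwargs)
        return wrapper
    return decorator


@require_role("ADMIN")
def admin_operation(user: dict) -> str:
    return f"admin action by {user['username']}"


ok = admin_operation({"username": "root", "roles": ["ADMIN"]})
try:
    admin_operation({"username": "guest", "roles": ["USER"]})
    denied = False
except PermissionDenied:
    denied = True
```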
Database Connection Pool
spring:
  datasource:
    hikari:
      maximum-pool-size: 20
      minimum-idle: 5
      connection-timeout: 30000  # milliseconds
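The effect of a maximum pool size and a connection timeout can be seen in a toy pool. The sketch below is not HikariCP: `BoundedPool` is hypothetical, it does not model `minimum-idle`, and it uses plain objects in place of database connections.

```python
import queue
from contextlib import contextmanager


class BoundedPool:
    """Toy pool: at most `max_size` connections, created lazily."""

    def __init__(self, factory, max_size: int, timeout_s: float) -> None:
        self._factory = factory
        self._timeout_s = timeout_s
        self._idle = queue.LifoQueue()
        self._slots = queue.Queue(maxsize=max_size)  # one token per live connection
        self.created = 0

    @contextmanager
    def connection(self):
        try:
            conn = self._idle.get_nowait()        # reuse an idle connection
        except queue.Empty:
            try:
                self._slots.put_nowait(None)      # is there room for a new one?
            except queue.Full:
                # Pool exhausted: wait for a release, up to the timeout.
                conn = self._idle.get(timeout=self._timeout_s)
            else:
                conn = self._factory()
                self.created += 1
        try:
            yield conn
        finally:
            self._idle.put(conn)                  # return it for reuse


pool = BoundedPool(factory=object, max_size=2, timeout_s=0.05)
with pool.connection() as a, pool.connection() as b:
    try:
        with pool.connection():
            timed_out = False
    except queue.Empty:
        timed_out = True                          # third borrower gave up

with pool.connection() as c:
    reused = c is a or c is b
```

With both connections borrowed, the third request waits for the timeout and then fails, which is the behaviour `connection-timeout` governs; after release, later requests reuse existing connections instead of creating new ones.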
Caching Strategy
@Cacheable(value = "datasets", key = "#id")
public Dataset getDataset(String id) {
    return datasetRepository.findById(id);
}
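The behaviour of caching by key can be demonstrated with Python's `functools.lru_cache`. This is an analogy, not the Spring cache abstraction: `load_dataset_from_db` is a hypothetical stand-in for the repository call, and the call counter exists only to make cache hits visible.

```python
from functools import lru_cache

load_count = 0


def load_dataset_from_db(dataset_id: str) -> dict:
    """Hypothetical slow lookup; counts calls so caching is observable."""
    global load_count
    load_count += 1
    return {"id": dataset_id, "name": f"dataset-{dataset_id}"}


@lru_cache(maxsize=128)
def get_dataset(dataset_id: str) -> dict:
    """Cached by argument, similar to keying the cache on the id."""
    return load_dataset_from_db(dataset_id)


first = get_dataset("ds-001")
second = get_dataset("ds-001")   # served from the cache
other = get_dataset("ds-002")    # different key, loads again
```

A cached entry goes stale when the underlying dataset changes, so updates and deletes need to invalidate it; Spring provides `@CacheEvict` for that purpose, and `@Cacheable` itself only takes effect when caching is enabled with `@EnableCaching`.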
2 - Frontend Architecture
DataMate React frontend architecture design
The DataMate frontend is built on React 18 and TypeScript.
Architecture Overview
The frontend is a single-page application (SPA):
┌─────────────────────────────────────────────┐
│                   Browser                   │
└──────────────────┬──────────────────────────┘
                   │
                   ▼
┌─────────────────────────────────────────────┐
│                  React App                  │
│  ┌──────────────────────────────────────┐   │
│  │              Components              │   │
│  └──────────────────────────────────────┘   │
│  ┌──────────────────────────────────────┐   │
│  │           State Management           │   │
│  │           (Redux Toolkit)            │   │
│  └──────────────────────────────────────┘   │
│  ┌──────────────────────────────────────┐   │
│  │            Services (API)            │   │
│  └──────────────────────────────────────┘   │
│  ┌──────────────────────────────────────┐   │
│  │               Routing                │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
Tech Stack
Core Frameworks
| Technology | Version | Purpose |
|---|---|---|
| React | 18.x | UI framework |
| TypeScript | 5.x | Type safety |
| Ant Design | 5.x | UI components |
| Tailwind CSS | 3.x | Styling |
State Management
| Technology | Version | Purpose |
|---|---|---|
| Redux Toolkit | 2.x | Global state |
| React Query | 5.x | Server state |
Project Structure
frontend/
├── src/
│   ├── components/          # Common components
│   ├── pages/               # Page components
│   ├── services/            # API services
│   ├── store/               # Redux store
│   ├── hooks/               # Custom hooks
│   ├── routes/              # Routes config
│   └── main.tsx             # Entry point
Routing Design
const router = createBrowserRouter([
  { path: "/", Component: Home },
  { path: "/chat", Component: AgentPage },
  {
    path: "/data",
    Component: MainLayout,
    children: [
      {
        path: "management",
        Component: DatasetManagement
      }
    ]
  }
]);
State Management
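import { configureStore } from '@reduxjs/toolkit';

// Each key becomes a top-level field of the global state
export const store = configureStore({
  reducer: {
    dataManagement: dataManagementSlice,
    user: userSlice,
  },
});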
Slice Example
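import { createAsyncThunk } from '@reduxjs/toolkit';

// Async thunk: dispatches pending/fulfilled/rejected actions around the API call
export const fetchDatasets = createAsyncThunk(
  'dataManagement/fetchDatasets',
  async (params: GetDatasetsParams) => {
    const response = await getDatasets(params);
    return response.data;
  }
);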
Component Design
Page Component
export const DataManagement: React.FC = () => {
  const dispatch = useAppDispatch();
  const { datasets, loading } = useAppSelector(
    (state) => state.dataManagement
  );

  // Load the first page of datasets when the page mounts
  useEffect(() => {
    dispatch(fetchDatasets({ page: 0, size: 20 }));
  }, [dispatch]);

  return (
    <div className="p-6">
      <h1>Data Management</h1>
      <DataTable data={datasets} loading={loading} />
    </div>
  );
};
API Services
Axios Configuration
import axios from 'axios';

const request = axios.create({
  baseURL: import.meta.env.VITE_API_BASE_URL,
  timeout: 30000, // milliseconds
});

// Request interceptor: attach the stored token as a Bearer header
request.interceptors.request.use((config) => {
  const token = localStorage.getItem('token');
  if (token) {
    config.headers.Authorization = `Bearer ${token}`;
  }
  return config;
});
Code Splitting
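import { lazy } from 'react';

// React.lazy needs a module with a default export, rendered inside <Suspense>
const DataManagement = lazy(() =>
  import('@/pages/DataManagement/Home/DataManagement')
);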
React.memo
export const DataCard = React.memo<DataCardProps>(({ data }) => {
  return <div>{data.name}</div>;
});