API Reference

DataMate API documentation

DataMate provides complete REST APIs supporting programmatic access to all core features.

API Overview

The DataMate API follows a REST architecture and provides the following services:

  • Data Management API: Dataset and file management
  • Data Cleaning API: Data cleaning task management
  • Data Collection API: Data collection task management
  • Data Annotation API: Data annotation task management
  • Data Synthesis API: Data synthesis task management
  • Data Evaluation API: Data evaluation task management
  • Operator Market API: Operator management
  • RAG Indexer API: Knowledge base and vector retrieval
  • Pipeline Orchestration API: Pipeline orchestration management

Authentication

DataMate supports two authentication methods: JWT and API key.

JWT Authentication

GET /api/v1/data-management/datasets
Authorization: Bearer <your-jwt-token>

Get JWT Token:

POST /api/v1/auth/login
Content-Type: application/json

{
  "username": "admin",
  "password": "password"
}

Response:

{
  "token": "eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9...",
  "expiresIn": 86400
}
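The login exchange above can be sketched with the Python standard library. This is a minimal illustration, not the SDK's implementation: the helper names `build_login_request` and `parse_login_response` are hypothetical, the base URL is whatever host serves `/api/v1/auth/login`, and `expiresIn` is assumed to be a lifetime in seconds.

```python
import json
import time
import urllib.request


def build_login_request(base_url, username, password):
    """Build (but do not send) the POST /api/v1/auth/login request."""
    payload = json.dumps({"username": username, "password": password})
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/v1/auth/login",
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_login_response(body, now=None):
    """Return (token, expires_at) from the login response JSON text.

    Assumption: "expiresIn" is a lifetime in seconds.
    """
    doc = json.loads(body)
    now = time.time() if now is None else now
    return doc["token"], now + doc["expiresIn"]
```

Passing the request to `urllib.request.urlopen` would send it; keeping construction and parsing separate makes both easy to check without a running server.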

API Key Authentication

GET /api/v1/data-management/datasets
X-API-Key: <your-api-key>
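The two header styles shown above can be wrapped in one small helper. The function name `auth_headers` is hypothetical, and requiring exactly one credential is an assumption of this sketch rather than documented server behavior.

```python
def auth_headers(jwt_token=None, api_key=None):
    """Return the request headers for one of the two authentication methods."""
    if jwt_token and api_key:
        # Assumption: a request carries one credential, not both.
        raise ValueError("pass either jwt_token or api_key, not both")
    if jwt_token:
        return {"Authorization": f"Bearer {jwt_token}"}
    if api_key:
        return {"X-API-Key": api_key}
    raise ValueError("a JWT token or an API key is required")
```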

Common Response Format

Success Response

{
  "code": 200,
  "message": "success",
  "data": {
    // Response data
  }
}

Error Response

{
  "code": 400,
  "message": "Bad Request",
  "error": "Invalid parameter: datasetId",
  "timestamp": "2024-01-15T10:30:00Z",
  "path": "/api/v1/data-management/datasets"
}
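A client can treat the success and error shapes above uniformly by unwrapping the envelope in one place. The sketch below assumes every response carries a numeric `code` field and that 2xx codes indicate success; `DataMateApiError` and `unwrap` are hypothetical names, not part of the SDK.

```python
import json


class DataMateApiError(Exception):
    """Raised when the response envelope reports a non-2xx code."""

    def __init__(self, code, message, detail=None, path=None):
        super().__init__(f"{code} {message}: {detail}")
        self.code = code
        self.message = message
        self.detail = detail
        self.path = path


def unwrap(body):
    """Return the "data" payload of a success response, or raise on error."""
    doc = json.loads(body)
    code = doc.get("code")
    if isinstance(code, int) and 200 <= code < 300:
        return doc.get("data")
    raise DataMateApiError(code, doc.get("message"), doc.get("error"), doc.get("path"))
```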

Paged Response

{
  "content": [],
  "page": 0,
  "size": 20,
  "totalElements": 100,
  "totalPages": 5,
  "first": true,
  "last": false
}
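The paging fields above are enough to walk a full result set. The following sketch takes a caller-supplied `fetch_page(page, size)` function (hypothetical; it stands in for whatever performs the HTTP call and returns the decoded paged response) and stops on `last`, falling back to `totalPages` or an empty page when `last` is absent.

```python
def is_last_page(doc, page):
    """Decide whether `page` (zero-based) is the final page of the result."""
    if "last" in doc:
        return bool(doc["last"])
    if "totalPages" in doc:
        return page + 1 >= doc["totalPages"]
    # Fallback assumption: an empty page means there is nothing more to read.
    return not doc["content"]


def iter_items(fetch_page, size=20):
    """Yield every item across all pages, starting from page 0."""
    page = 0
    while True:
        doc = fetch_page(page, size)
        yield from doc["content"]
        if is_last_page(doc, page):
            break
        page += 1
```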

API Endpoints

Data Management

| Endpoint | Method | Description |
|---|---|---|
| /data-management/datasets | GET | Get dataset list |
| /data-management/datasets | POST | Create dataset |
| /data-management/datasets/{id} | GET | Get dataset details |
| /data-management/datasets/{id} | PUT | Update dataset |
| /data-management/datasets/{id} | DELETE | Delete dataset |
| /data-management/datasets/{id}/files | GET | Get file list |
| /data-management/datasets/{id}/files/upload | POST | Upload files |

Data Cleaning

| Endpoint | Method | Description |
|---|---|---|
| /data-cleaning/tasks | GET | Get cleaning task list |
| /data-cleaning/tasks | POST | Create cleaning task |
| /data-cleaning/tasks/{id} | GET | Get task details |
| /data-cleaning/tasks/{id} | PUT | Update task |
| /data-cleaning/tasks/{id} | DELETE | Delete task |
| /data-cleaning/tasks/{id}/execute | POST | Execute task |

Data Collection

| Endpoint | Method | Description |
|---|---|---|
| /data-collection/tasks | GET | Get collection task list |
| /data-collection/tasks | POST | Create collection task |
| /data-collection/tasks/{id} | GET | Get task details |
| /data-collection/tasks/{id}/execute | POST | Execute collection task |

Data Synthesis

| Endpoint | Method | Description |
|---|---|---|
| /data-synthesis/tasks | GET | Get synthesis task list |
| /data-synthesis/tasks | POST | Create synthesis task |
| /data-synthesis/templates | GET | Get instruction template list |
| /data-synthesis/templates | POST | Create instruction template |

Operator Market

| Endpoint | Method | Description |
|---|---|---|
| /operator-market/operators | GET | Get operator list |
| /operator-market/operators | POST | Publish operator |
| /operator-market/operators/{id} | GET | Get operator details |
| /operator-market/operators/{id}/install | POST | Install operator |

RAG Indexer

| Endpoint | Method | Description |
|---|---|---|
| /rag/knowledge-bases | GET | Get knowledge base list |
| /rag/knowledge-bases | POST | Create knowledge base |
| /rag/knowledge-bases/{id}/documents | POST | Upload documents |
| /rag/knowledge-bases/{id}/search | POST | Vector search |

Error Codes

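| Code | Description |
|---|---|
| 200 | Success |
| 201 | Created |
| 400 | Bad Request |
| 401 | Unauthorized |
| 403 | Forbidden |
| 404 | Not Found |
| 409 | Conflict |
| 429 | Too Many Requests |
| 500 | Internal Server Error |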

Rate Limiting

API call rate limits:

  • Default limit: 1000 requests/hour
  • Burst limit: 100 requests/minute

Exceeding the limit returns 429 Too Many Requests.

Response headers contain rate limiting information:

X-RateLimit-Limit: 1000
X-RateLimit-Remaining: 999
X-RateLimit-Reset: 1642252800
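A client can use these headers to decide how long to pause before the next call. The sketch below assumes `X-RateLimit-Reset` is a Unix timestamp in seconds (the sample value is consistent with that, but the source does not state it) and that `headers` is a plain dict keyed exactly as shown; `seconds_until_reset` is a hypothetical helper name.

```python
import time


def seconds_until_reset(headers, now=None):
    """Return how long to wait before sending the next request.

    Returns 0.0 while quota remains; otherwise the time left until the
    reset timestamp (assumed to be Unix epoch seconds).
    """
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    reset_at = int(headers["X-RateLimit-Reset"])
    now = time.time() if now is None else now
    return max(0.0, reset_at - now)
```

After a 429 response, sleeping for the returned number of seconds before retrying keeps the client within the limit.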

Version Management

API versions are specified through URL paths:

  • Current version: /api/v1/
  • Future versions: /api/v2/

1 - Data Management API

Dataset and file management API

The Data Management API lets you create, query, update, and delete datasets and the files they contain.

Basic Information

  • Base URL: http://localhost:8092/api/v1/data-management
  • Authentication: JWT / API Key
  • Content-Type: application/json

Dataset Management

Get Dataset List

GET /data-management/datasets?page=0&size=20&type=text

Query Parameters:

| Parameter | Type | Required | Description |
|---|---|---|---|
| page | integer | No | Page number, starts from 0 |
| size | integer | No | Page size, default 20 |
| type | string | No | Dataset type filter |
| tags | string | No | Tag filter, comma-separated |
| keyword | string | No | Keyword search |
| status | string | No | Status filter |

Response Example:

{
  "content": [
    {
      "id": "dataset-001",
      "name": "text_dataset",
      "description": "Text dataset",
      "type": {
        "code": "TEXT",
        "name": "Text"
      },
      "status": "ACTIVE",
      "fileCount": 1000,
      "totalSize": 1073741824,
      "createdAt": "2024-01-15T10:00:00Z"
    }
  ],
  "page": 0,
  "size": 20,
  "totalElements": 1
}
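The query parameters for the dataset list can be assembled with `urllib.parse.urlencode`, which also handles escaping. This is a sketch: `dataset_list_url` is a hypothetical helper, and it omits optional filters that are not set rather than sending empty values, which is an assumption about what the server expects.

```python
from urllib.parse import urlencode

# Base URL taken from the "Basic Information" section.
BASE_URL = "http://localhost:8092/api/v1/data-management"


def dataset_list_url(page=0, size=20, dataset_type=None, tags=None,
                     keyword=None, status=None):
    """Build the GET URL for the dataset list with optional filters."""
    params = {"page": page, "size": size}
    if dataset_type:
        params["type"] = dataset_type
    if tags:
        # The tags filter is a single comma-separated string.
        params["tags"] = ",".join(tags)
    if keyword:
        params["keyword"] = keyword
    if status:
        params["status"] = status
    return f"{BASE_URL}/datasets?{urlencode(params)}"
```

Note that `urlencode` percent-encodes the comma in `tags` as `%2C`; the server sees the same comma-separated value after decoding.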

Create Dataset

POST /data-management/datasets
Content-Type: application/json

{
  "name": "my_dataset",
  "description": "My dataset",
  "type": "TEXT",
  "tags": ["training", "nlp"]
}

Get Dataset Details

GET /data-management/datasets/{datasetId}

Update Dataset

PUT /data-management/datasets/{datasetId}
Content-Type: application/json

{
  "name": "updated_dataset",
  "description": "Updated description"
}

Delete Dataset

DELETE /data-management/datasets/{datasetId}
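The create, update, and delete calls above differ only in method, path, and body, so one request builder covers them. The sketch uses the standard library and API key authentication; `dataset_request` is a hypothetical helper, and nothing is sent until the request is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request
from urllib.parse import quote

# Base URL taken from the "Basic Information" section.
BASE_URL = "http://localhost:8092/api/v1/data-management"


def dataset_request(method, dataset_id=None, body=None, api_key="your-api-key"):
    """Build a dataset request; dataset_id is omitted for list and create."""
    url = f"{BASE_URL}/datasets"
    if dataset_id is not None:
        url += "/" + quote(dataset_id, safe="")
    headers = {"X-API-Key": api_key}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


create = dataset_request("POST", body={"name": "my_dataset", "type": "TEXT"})
update = dataset_request("PUT", "dataset-001", body={"name": "updated_dataset"})
delete = dataset_request("DELETE", "dataset-001")
```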

File Management

Get File List

GET /data-management/datasets/{datasetId}/files?page=0&size=20

Upload File

POST /data-management/datasets/{datasetId}/files/upload/chunk
Content-Type: multipart/form-data
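The source does not list the form fields this endpoint expects, so the following is only a sketch of the general technique: split a file into chunks and encode each chunk as a `multipart/form-data` body. The helper names and every field name passed to them (for example `file` or `chunkIndex`) are hypothetical placeholders, not the documented contract.

```python
import uuid


def split_chunks(data, chunk_size):
    """Split bytes into consecutive chunks of at most chunk_size bytes."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def encode_multipart(fields, file_field, filename, file_bytes, boundary=None):
    """Encode text fields plus one file part; return (body, content_type)."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    lines += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),
        b"",
    ]
    # Parts are separated by CRLF, and the body ends with a trailing CRLF.
    return b"\r\n".join(lines), f"multipart/form-data; boundary={boundary}"
```

Each chunk from `split_chunks` would be encoded separately and posted with the returned value as its `Content-Type` header.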

Download File

GET /data-management/datasets/{datasetId}/files/{fileId}/download

Delete File

DELETE /data-management/datasets/{datasetId}/files/{fileId}

Error Response

{
  "code": 400,
  "message": "Bad Request",
  "error": "Invalid parameter: datasetId",
  "timestamp": "2024-01-15T10:30:00Z",
  "path": "/api/v1/data-management/datasets"
}

SDK Usage

Python

from datamate import DataMateClient

client = DataMateClient(
    base_url="http://localhost:8080",
    api_key="your-api-key"
)

# Get datasets
datasets = client.data_management.get_datasets()

# Create dataset
dataset = client.data_management.create_dataset(
    name="my_dataset",
    type="TEXT"
)

cURL

# Get datasets
curl -X GET "http://localhost:8092/api/v1/data-management/datasets" \
  -H "Authorization: Bearer your-jwt-token"

# Create dataset
curl -X POST "http://localhost:8092/api/v1/data-management/datasets" \
  -H "Authorization: Bearer your-jwt-token" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "my_dataset",
    "type": "TEXT"
  }'