A high-performance, scalable deduplication engine written in Go, featuring variable-block chunking, intelligent caching, and a microservices architecture.
Overview
The Deduplication Engine is a cloud-native system for space-efficient data storage and backup. It is written in Go, ships as Docker containers, and splits ingest, storage, and stream handling into separate services.
Deduplication Ratio: 99.92%
Chunks/Second: 640
Processing Speed: 10 MB/s
Microservices: 4
Features
Variable-Block Chunking
Splits data with content-defined chunking and identifies each chunk by its BLAKE3 hash, so duplicate data is detected across different file types and sizes.
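As a rough illustration of content-defined chunking, the Go sketch below cuts a byte stream wherever a rolling gear hash matches a bit pattern and then fingerprints each chunk. It is not the engine's actual code: the size limits, the gear table, and the helper names (`Split`, `Shared`, `randomBytes`) are assumptions, and SHA-256 from the standard library stands in for BLAKE3 so the example runs without third-party modules.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"math/rand"
)

// Chunk records where a content-defined chunk sits in the input and its digest.
type Chunk struct {
	Offset, Length int
	Hash           string // hex SHA-256, standing in for BLAKE3
}

// Hypothetical size limits; the engine's real parameters are not documented here.
const (
	minSize  = 2 << 10  // never cut before 2 KiB
	maxSize  = 64 << 10 // always cut by 64 KiB
	maskBits = 13       // cut when the top 13 hash bits are zero
)

// gear maps each byte value to a fixed pseudo-random 64-bit number.
// The fixed seed keeps chunk boundaries reproducible across runs.
var gear = func() (t [256]uint64) {
	r := rand.New(rand.NewSource(1))
	for i := range t {
		t[i] = r.Uint64()
	}
	return t
}()

// Split cuts data where a rolling gear hash of the most recent bytes hits a
// fixed bit pattern, so boundaries follow content rather than absolute offsets.
func Split(data []byte) []Chunk {
	var chunks []Chunk
	var h uint64
	start := 0
	for i, b := range data {
		h = h<<1 + gear[b] // bytes older than 64 positions shift out of the hash
		size := i - start + 1
		atBoundary := size >= minSize && h>>(64-maskBits) == 0
		if atBoundary || size >= maxSize || i == len(data)-1 {
			sum := sha256.Sum256(data[start : i+1])
			chunks = append(chunks, Chunk{start, size, hex.EncodeToString(sum[:])})
			start, h = i+1, 0
		}
	}
	return chunks
}

// randomBytes returns n reproducible pseudo-random bytes.
func randomBytes(n int, seed int64) []byte {
	r := rand.New(rand.NewSource(seed))
	buf := make([]byte, n)
	for i := range buf {
		buf[i] = byte(r.Intn(256))
	}
	return buf
}

// Shared counts chunks of b whose digest also appears in a.
func Shared(a, b []Chunk) int {
	seen := make(map[string]bool, len(a))
	for _, c := range a {
		seen[c.Hash] = true
	}
	n := 0
	for _, c := range b {
		if seen[c.Hash] {
			n++
		}
	}
	return n
}

func main() {
	original := randomBytes(1<<20, 42)
	// Prepending bytes shifts every later offset, which would defeat fixed-size blocks.
	edited := append([]byte("header"), original...)

	a, b := Split(original), Split(edited)
	fmt.Printf("original: %d chunks, edited: %d chunks, shared: %d\n", len(a), len(b), Shared(a, b))
}
```

Because boundaries depend on nearby bytes rather than absolute offsets, the chunk streams of the original and edited inputs normally realign shortly after the edit, which is what lets the unchanged regions deduplicate.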
Intelligent Caching
An LRU cache paired with a cuckoo filter detects duplicate chunks quickly, which reduces storage overhead and improves performance.
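The LRU half of this can be sketched in a few lines of Go. This is a minimal illustration, not the engine's implementation: `ChunkCache`, `Seen`, and the capacity of 2 are hypothetical, and the cuckoo filter that would sit in front of the cache is omitted.

```go
package main

import (
	"container/list"
	"fmt"
)

// ChunkCache is a fixed-capacity LRU set of chunk digests (hypothetical type).
type ChunkCache struct {
	capacity int
	order    *list.List               // front = most recently used
	items    map[string]*list.Element // digest -> its node in order
}

// NewChunkCache returns an empty cache holding at most capacity digests.
func NewChunkCache(capacity int) *ChunkCache {
	return &ChunkCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[string]*list.Element),
	}
}

// Seen reports whether digest was already cached, and records it either way.
func (c *ChunkCache) Seen(digest string) bool {
	if el, ok := c.items[digest]; ok {
		c.order.MoveToFront(el) // refresh recency on a hit
		return true
	}
	c.items[digest] = c.order.PushFront(digest)
	if c.order.Len() > c.capacity {
		// Evict the least recently used digest.
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(string))
	}
	return false
}

// Len returns the number of digests currently cached.
func (c *ChunkCache) Len() int { return c.order.Len() }

func main() {
	cache := NewChunkCache(2)
	for _, d := range []string{"aa", "bb", "aa", "cc", "bb"} {
		fmt.Printf("%s duplicate=%v\n", d, cache.Seen(d))
	}
	// Prints false, false, true, false, false:
	// "bb" was evicted when "cc" arrived, so its second lookup misses.
}
```

The final miss on `bb` shows the trade-off of a bounded cache: a digest that falls out of the cache has to be confirmed elsewhere, and a compact probabilistic structure such as a cuckoo filter is one way to keep that check cheap.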
Microservices Architecture
Separate services handle ingest, storage, and stream handling, and communicate over gRPC.
Containerized
Full Docker support, with a docker-compose configuration for deployment and scaling.
Database Integration
CockroachDB stores metadata with ACID compliance and distributed capabilities.
Object Storage
MinIO stores chunks behind an S3-compatible API.
Architecture
Technology Stack
Language: Go 1.21+
Database: CockroachDB (PostgreSQL-compatible)
Object Storage: MinIO (S3-compatible)
Communication: gRPC with Protocol Buffers
Containerization: Docker & Docker Compose
Caching: LRU Cache with Cuckoo Filter
Chunking: Variable-block with BLAKE3 hashing
Performance Results
File Type Testing
| File Type | Size | Deduplication | Chunks | Result |
|---|---|---|---|---|
| Text (unique) | 28 B | 0% | 1 | Expected |
| Text (edited) | 46 B | 0% | 1 | Expected |
| Small binary | 32 B | 0% | 1 | Expected |
| Large binary | 5 MB | 0% | 640 | Expected |
| Compressed (.zip) | 204 B | 0% | 3 | Expected |
| Compressed (.gz) | 60 B | 0% | 1 | Expected |
| Repetitive | 10 MB | 99.92% | 1280 | Excellent |
Performance Metrics
Chunking Speed: ~10 MB/s
Deduplication Ratio: Up to 99.92% on repetitive content
Cache Hit Rate: >95% for duplicate detection
Storage Efficiency: Significant space savings on similar files
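The 99.92% figure is consistent with a simple chunk-count definition of the ratio. Assuming the ratio is (total chunks - unique chunks) / total chunks, which this README does not state explicitly, a 1280-chunk file in which only one chunk is unique gives 1279/1280, or about 99.92%. The function name `DedupRatio` and the unique-chunk counts below are illustrative assumptions.

```go
package main

import "fmt"

// DedupRatio returns the fraction of chunks that did not need to be stored.
// Assumed definition: (total - unique) / total.
func DedupRatio(totalChunks, uniqueChunks int) float64 {
	if totalChunks == 0 {
		return 0 // nothing ingested, nothing saved
	}
	return float64(totalChunks-uniqueChunks) / float64(totalChunks)
}

func main() {
	// Repetitive 10 MB file: 1280 chunks, assumed to contain 1 unique chunk.
	fmt.Printf("%.2f%%\n", DedupRatio(1280, 1)*100) // prints 99.92%

	// Large binary file: 640 chunks, assumed all unique.
	fmt.Printf("%.2f%%\n", DedupRatio(640, 640)*100) // prints 0.00%
}
```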
Quick Start
Prerequisites
Docker and Docker Compose
Go 1.21+ (for development)
Git
1. Clone the Repository
git clone https://github.com/radhakrish-venkat/dedupe-engine.git
cd dedupe-engine
2. Start the Services
docker-compose up -d
This will start:
CockroachDB: Database for metadata storage
MinIO: Object storage for chunks
Data Storage Node: gRPC server for storage operations
Ingest Node: gRPC server for backup processing
3. Test the System
# Test with a small file (test-file.txt must exist in the current directory)
docker run --rm --network dedupe-engine_dedupe-net \
  -v "$(pwd)":/data dedupe-engine-stream-handler \
  -file /data/test-file.txt -ingest-addr ingest-node:50051
4. Monitor Services
# Check service status
docker-compose ps
# View logs
docker-compose logs -f ingest-node