Bug #2459

Archive Historical MongoDB Collection Data to S3 (CSV + Compressed Format)

Added by Prashant Jain about 3 hours ago.

Status: New
Priority: Normal
Start date: 04/07/2026
Due date:
% Done: 0%
Estimated time:

Description

We need to implement a process to archive historical data from MongoDB collections. The archival process should extract old data, convert it into CSV format, compress it, and store it securely in an S3 bucket for long-term storage and cost optimization.

Scope of Work:
Data Identification
Identify historical data based on configurable criteria (e.g., records older than X months).
Ensure active/critical data is excluded.
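
A minimal sketch of how the identification criteria could be expressed, assuming documents carry a timestamp field and an "active" flag. The field names `createdAt` and `isActive`, and both function names, are hypothetical and not taken from the actual schema. The cutoff is snapped to the first day of a month so that each run archives whole months.

```python
from datetime import datetime, timezone


def archive_cutoff(now: datetime, months: int) -> datetime:
    """Return the first day (UTC) of the month that lies `months` months before `now`."""
    # Count months on a single axis, step back, then split into year and month again.
    total = now.year * 12 + (now.month - 1) - months
    year, month_index = divmod(total, 12)
    return datetime(year, month_index + 1, 1, tzinfo=timezone.utc)


def build_archive_filter(cutoff: datetime,
                         date_field: str = "createdAt",   # hypothetical field name
                         active_flag: str = "isActive"):  # hypothetical field name
    """Build a MongoDB query filter: older than the cutoff and not flagged as active."""
    return {date_field: {"$lt": cutoff}, active_flag: {"$ne": True}}
```

For example, with a run date of 7 April 2026 and a 6-month retention window, the cutoff is 1 October 2025, so only records created before that date would be selected.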
Data Extraction
Fetch data from MongoDB collections in batches to avoid performance impact.
Support multi-tenant collections (if applicable).
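
Batched extraction could look roughly like the sketch below. It works on any iterable of documents; the assumption is that a driver cursor would be passed in (with PyMongo, for instance, something like `collection.find(query).batch_size(1000)`). The function name and the optional pause between batches are illustrative choices, not an agreed design.

```python
import time
from itertools import islice
from typing import Iterable, Iterator, List


def iter_batches(documents: Iterable[dict],
                 batch_size: int = 1000,
                 pause_seconds: float = 0.0) -> Iterator[List[dict]]:
    """Yield lists of at most `batch_size` documents without loading everything at once."""
    iterator = iter(documents)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
        # Optional throttle so the archival job leaves headroom for live traffic.
        if pause_seconds > 0:
            time.sleep(pause_seconds)
```

Because only one batch is held in memory at a time, the memory footprint stays bounded regardless of collection size.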
Data Transformation
Convert extracted data into CSV format.
Ensure proper schema mapping and column headers.
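
One possible shape for the transformation step, as a sketch: nested documents are flattened into dotted column names and written with a fixed column list so every file has the same header. The flattening convention and the function names are assumptions; arrays and BSON-specific types would need their own mapping rules.

```python
import csv
from typing import Iterable, List, TextIO


def flatten(doc: dict, prefix: str = "") -> dict:
    """Flatten nested dicts into a single level, e.g. {"user": {"name": "a"}} -> {"user.name": "a"}."""
    flat = {}
    for key, value in doc.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, f"{name}."))
        else:
            flat[name] = value
    return flat


def write_csv(docs: Iterable[dict], columns: List[str], stream: TextIO) -> int:
    """Write a header plus one row per document; return the number of data rows written."""
    # Fields not listed in `columns` are ignored; missing fields become empty cells.
    writer = csv.DictWriter(stream, fieldnames=columns, extrasaction="ignore", restval="")
    writer.writeheader()
    count = 0
    for doc in docs:
        writer.writerow(flatten(doc))
        count += 1
    return count
```

Returning the row count makes it easy to compare against the number of documents selected for archival.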
Compression
Compress CSV files using one of the following formats:
.zip (preferred)
.gz (gzip, optional)
Ensure optimal file size for storage and transfer.
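
Both compression options can be covered with the Python standard library. The sketch below assumes the CSV has already been written to local disk; the function names and the choice to append the extension to the existing file name are illustrative.

```python
import gzip
import shutil
import zipfile
from pathlib import Path


def compress_zip(csv_path: Path) -> Path:
    """Create <name>.csv.zip next to the CSV, using DEFLATE compression."""
    target = csv_path.with_name(csv_path.name + ".zip")
    with zipfile.ZipFile(target, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store only the file name inside the archive, not the full local path.
        archive.write(csv_path, arcname=csv_path.name)
    return target


def compress_gzip(csv_path: Path) -> Path:
    """Create <name>.csv.gz next to the CSV, streaming the content in chunks."""
    target = csv_path.with_name(csv_path.name + ".gz")
    with open(csv_path, "rb") as source, gzip.open(target, "wb") as destination:
        shutil.copyfileobj(source, destination)
    return target
```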
Storage (AWS S3)
Upload compressed files to the designated S3 bucket.

Define folder structure:

s3://<bucket-name>/<tenant>/<collection>/<year>/<month>/
Enable versioning and lifecycle policies (if required).
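
The folder structure above maps directly onto an S3 object key. The helper below is a sketch; the zero-padded month and the function name are assumptions, since the ticket only specifies `<year>/<month>`.

```python
from datetime import date


def build_s3_key(tenant: str, collection: str, period: date, filename: str) -> str:
    """Build the object key <tenant>/<collection>/<year>/<month>/<filename>."""
    # Zero-padding the month keeps keys in chronological order when listed.
    return f"{tenant}/{collection}/{period.year:04d}/{period.month:02d}/{filename}"


# Assumption: if boto3 is used for the upload, the key would be passed along the lines of
#   boto3.client("s3").upload_file(local_path, bucket_name, key)
```

For example, `build_s3_key("acme", "orders", date(2025, 10, 1), "orders-2025-10.csv.zip")` yields `acme/orders/2025/10/orders-2025-10.csv.zip`, where the tenant, collection, and file name are made-up values.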
Post-Archival Handling
Provide an option to either delete archived records from MongoDB or mark them as archived.
Maintain audit logs for archived data.
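
An audit trail could be kept as one JSON object per line, which is easy to append to and to search. The field set and function names below are hypothetical; the real log format is still to be decided.

```python
import json
from datetime import datetime, timezone
from typing import Optional, TextIO


def make_audit_entry(tenant: str, collection: str, s3_uri: str, row_count: int,
                     action: str, archived_at: Optional[datetime] = None) -> dict:
    """Describe one archival run; `action` records what happened to the source rows."""
    archived_at = archived_at or datetime.now(timezone.utc)
    return {
        "tenant": tenant,
        "collection": collection,
        "s3_uri": s3_uri,
        "row_count": row_count,
        "action": action,  # e.g. "deleted" or "marked"
        "archived_at": archived_at.isoformat(),
    }


def write_audit_entry(stream: TextIO, entry: dict) -> None:
    """Append the entry as a single JSON line."""
    stream.write(json.dumps(entry, sort_keys=True) + "\n")
```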
Automation
Schedule the job via cron or a job scheduler (e.g., daily or weekly).
Implement a retry mechanism for failed uploads.
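
The retry requirement could be met with a small wrapper using exponential backoff. This is a sketch under assumptions: the attempt count, the delays, and the function name are placeholders, and a real implementation would likely catch only upload-related exceptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(operation: Callable[[], T], attempts: int = 3,
                 base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `operation`, retrying on failure with delays of base_delay, 2x, 4x, ..."""
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            # Out of attempts: surface the last error to the caller.
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter is injectable so the backoff can be tested without waiting.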
Acceptance Criteria:
Historical data is correctly identified and extracted.
Data is converted into valid CSV format.
Files are compressed and uploaded to S3 successfully.
Archival runs have no noticeable performance impact on live MongoDB operations.
Logs are maintained for audit and traceability.
Process is configurable (date range, collection name, tenant, etc.).
Technical Considerations:
Use streaming/batch processing to handle large datasets.
Ensure secure S3 uploads, authenticating via IAM roles or access keys.
Handle failures and partial uploads gracefully.
Validate data integrity after archival.
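
For the integrity check, one option is to compare a checksum and the row count of the generated CSV against what was extracted, before any source records are deleted or marked. The sketch below assumes local files; function names are hypothetical.

```python
import csv
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def count_csv_rows(path: str) -> int:
    """Count data rows in a CSV file, excluding the header line."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # skip the header
        return sum(1 for _ in reader)
```

The row count can be compared with the number of documents matched in MongoDB, and the digest can be stored in the audit log to detect later corruption of the archived file.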
