
SMS activation platform
August 2025
Backend Engineer
Client confidential
Automated Expiry Detection for Virtual Numbers
Built an automated expiry detection and refund system for temporary virtual numbers using Redis sorted sets when the provider offered no webhook support.
Polling reduction: ~95%
Refund accuracy: 99.9%+
Order volume: 500+/hr
Some clients prefer not to be named publicly, and we honor that. These write-ups describe the problem, our approach, and the outcome. They never include confidential details.
Written by Relev Works Engineering, Backend Engineer
At a glance
Problem
Temporary virtual numbers expired silently when unused. The provider sent no webhooks, so expired orders stayed active internally, users were not refunded automatically, and manual reconciliation became routine.
Approach
Time-ordered expiry tracking with Redis sorted sets and a background worker that processes only due entries. Provider verification on expiry, idempotent refunds with database locks and Redis idempotency keys.
Result
Expired numbers handled automatically without provider support. Manual reconciliation eliminated. Refund accuracy reached 99.9%+ with idempotency guards. Stable under 500+ concurrent orders per hour.
Write-up
Context
The SMS activation platform allowed users to purchase temporary virtual phone numbers for receiving SMS verification codes. The workflow was:
- User purchases a number for a specific service (e.g., WhatsApp, Telegram)
- System requests number from third-party provider via API
- Number is active for a fixed duration (typically 10-20 minutes)
- User receives SMS verification code on that number
- System releases the number after successful verification or expiration
The provider's API handled number provisioning and SMS forwarding but provided no event notifications for number lifecycle changes. All status checks required polling their REST API.
Problem
When a user purchased a virtual number:
- The provider issued the number successfully
- Each number had a known expiration time
- If unused, the number expired silently
- The provider did not notify our system
This caused several issues:
- Expired numbers remained active internally
- Users were not refunded automatically
- Manual reconciliation became necessary
- Periodic polling of all active orders was inefficient and expensive
The system needed a reliable way to detect expirations without introducing excessive API calls or database load.
Constraints
The solution needed to satisfy the following conditions:
- Expiration time was known at purchase
- Provider APIs could be slow or inconsistent
- Refunds must never execute more than once
- The system already processed concurrent orders
- Background processing had to be lightweight and predictable
- Could not poll the provider API for every active order continuously
- Database queries for time-based filtering would not scale efficiently
Solution
We introduced a time-ordered expiry system using Redis sorted sets instead of database-based polling.
Approach
When a number was purchased:
- The expiration timestamp was calculated based on provider-specified duration
- A small buffer (+30 seconds) was added to account for provider delays
- The order ID was inserted into a Redis sorted set using the expiry timestamp as the score
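The three purchase-time steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the production code: `schedule_expiry`, `EXPIRY_KEY`, and `BUFFER_SECONDS` are hypothetical names, and `client` is assumed to be a redis-py style client whose `zadd(name, mapping)` takes a member-to-score mapping.

```python
import time

EXPIRY_KEY = "number_expirations"  # same sorted-set name the worker reads
BUFFER_SECONDS = 30                # grace period for provider delays


def schedule_expiry(client, order_id, duration_seconds, now=None):
    """Register an order in the sorted set, scored by its buffered expiry time.

    `client` is assumed to expose redis-py's zadd(name, mapping).
    Returns the stored score (a Unix timestamp in seconds).
    """
    purchased_at = time.time() if now is None else now
    expires_at = purchased_at + duration_seconds + BUFFER_SECONDS
    # The score is the time at which the worker should look at this order.
    client.zadd(EXPIRY_KEY, {order_id: expires_at})
    return expires_at
```

Because the score is a plain timestamp, the worker can later select every due order with a single range query bounded by the current time.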
A background worker continuously monitored only the earliest expiring entries instead of scanning all active orders:

import time

# `redis` is assumed to be an already-configured redis-py client instance

# Worker polls for expired entries
def check_expirations():
    current_time = time.time()
    # Get orders expiring up to current time
    expired_orders = redis.zrangebyscore(
        'number_expirations',
        min=0,
        max=current_time,
        start=0,
        num=100  # Process in batches
    )
    for order_id in expired_orders:
        process_expiration(order_id)
        # Remove from sorted set after processing
        redis.zrem('number_expirations', order_id)

When an expiry time was reached:
- The worker verified the order status via the provider API
- If fulfilled (SMS received), the order was closed normally
- If expired and unused, a refund process was triggered
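The decision the worker makes for each due order might look like the sketch below. All names here (`ProviderStatus`, `process_expiration`'s callable parameters) are hypothetical, and the provider call is injected as a function so the logic can be read without knowing the provider's API. The `PENDING` branch is an assumption added for illustration; the write-up only describes the fulfilled and expired cases.

```python
from enum import Enum


class ProviderStatus(Enum):
    FULFILLED = "fulfilled"  # SMS was received
    EXPIRED = "expired"      # number lapsed unused
    PENDING = "pending"      # assumption: provider has not settled the order


def process_expiration(order_id, fetch_status, close_order, refund_order):
    """Handle one due order and return the action taken.

    fetch_status(order_id) -> ProviderStatus wraps the provider API call.
    close_order and refund_order apply the outcome locally.
    """
    status = fetch_status(order_id)
    if status is ProviderStatus.FULFILLED:
        close_order(order_id)
        return "closed"
    if status is ProviderStatus.EXPIRED:
        refund_order(order_id)
        return "refunded"
    # Any other status is left for a later pass rather than guessed at.
    return "retry"
```

Keeping the provider lookup behind a callable also makes it straightforward to add timeouts or retries around it, since provider responses could be slow or inconsistent.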
Key decisions
Redis sorted sets over database queries
Database queries using WHERE expiry_time <= NOW() would need either a full table scan or a dedicated index on expiry_time, and every poll would add load to the primary database. A Redis sorted set returns the entries in a score range in O(log N + M) time, where N is the size of the set and M is the number of entries returned. The worker only queries entries that need immediate action.
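Because entries are ordered by expiry time, a worker can also inspect just the head of the set to decide how long it may idle. The sketch below illustrates that idea; it is not taken from the production worker. `seconds_until_next_due` and `MAX_SLEEP_SECONDS` are hypothetical, and `client` is assumed to be a redis-py style client whose `zrange(name, start, end, withscores=True)` returns `(member, score)` pairs.

```python
import time

EXPIRY_KEY = "number_expirations"
MAX_SLEEP_SECONDS = 5.0  # hypothetical cap so newly added entries are noticed


def seconds_until_next_due(client, now=None):
    """Return how long a worker may sleep before the earliest entry is due."""
    current = time.time() if now is None else now
    # ZRANGE 0 0 WITHSCORES yields the lowest-scored member, if there is one.
    head = client.zrange(EXPIRY_KEY, 0, 0, withscores=True)
    if not head:
        return MAX_SLEEP_SECONDS
    _member, score = head[0]
    # Never return a negative sleep, and never sleep past the cap.
    return min(max(score - current, 0.0), MAX_SLEEP_SECONDS)
```

The cap matters because an order purchased while the worker sleeps may expire sooner than the entry that was at the head when the sleep began.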
Idempotent refund logic with status guards
Background workers can retry on failure. Refund operations needed protection against double execution:
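
# `db` is assumed to be a SQLAlchemy session and `cache` a redis-py client
def process_refund(order_id: str) -> None:
    # Lock the row and check its status in one query
    order = db.query(Order).filter(
        Order.id == order_id,
        Order.status == 'active'  # Only refund active orders
    ).with_for_update().first()
    if not order:
        db.rollback()  # End the transaction before returning
        return  # Already processed or doesn't exist
    # Check idempotency key
    idempotency_key = f"refund:{order_id}"
    if cache.exists(idempotency_key):
        db.rollback()  # Release the row lock
        return  # Refund already processed
    # Execute refund
    user = order.user  # Assumes an Order.user relationship; the model is not shown
    user.balance += order.amount
    order.status = 'refunded'
    db.commit()
    # Mark as processed only after the commit succeeds, so a failed
    # commit cannot leave a key that blocks the retry
    cache.set(idempotency_key, "1", ex=86400)

30-second buffer on expiry time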
Provider API responses could be delayed. Numbers might expire server-side before our system received confirmation. The buffer reduced false positives and unnecessary API calls.
Challenge
During production usage, approximately 1 in 200 orders failed due to database timeouts while processing refunds.
The issue occurred when multiple workers attempted to process the same expiring order simultaneously. Although Redis correctly removed the entry from the sorted set, race conditions existed between the Redis operation and the database transaction.
Because the worker retried failed jobs, this exposed the risk of duplicate refund attempts.
The issue was resolved by:
- Adding a database-level lock using SELECT FOR UPDATE during status checks
- Introducing an idempotency key in Redis that persisted for 24 hours
- Ensuring refund operations executed atomically within a single transaction
The combination prevented double refunds even under retry scenarios.
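The retry-safety property can be demonstrated in a few lines with an in-memory SQLite database. Note that this sketch swaps in a different technique from the one described above: SQLite has no SELECT FOR UPDATE, so it uses a conditional UPDATE (compare-and-set on the status column) to get the same "only one caller wins" guarantee. The table layout and the `refund_once` helper are hypothetical.

```python
import sqlite3


def refund_once(conn, order_id):
    """Refund an active order at most once; return True only if this call paid out."""
    with conn:  # commits on success, rolls back if an exception is raised
        # Compare-and-set: only one caller can flip active -> refunded.
        flipped = conn.execute(
            "UPDATE orders SET status = 'refunded' "
            "WHERE id = ? AND status = 'active'",
            (order_id,),
        ).rowcount
        if flipped == 0:
            return False  # already refunded, closed, or unknown order
        # Credit the balance in the same transaction as the status change.
        conn.execute(
            "UPDATE users SET balance = balance + "
            "(SELECT amount FROM orders WHERE id = ?) "
            "WHERE id = (SELECT user_id FROM orders WHERE id = ?)",
            (order_id, order_id),
        )
        return True


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL);
    CREATE TABLE orders (id TEXT PRIMARY KEY, user_id INTEGER,
                         amount INTEGER, status TEXT);
    INSERT INTO users VALUES (1, 100);
    INSERT INTO orders VALUES ('order-1', 1, 25, 'active');
""")
first = refund_once(conn, "order-1")
second = refund_once(conn, "order-1")  # simulated worker retry
balance = conn.execute("SELECT balance FROM users WHERE id = 1").fetchone()[0]
```

Here `first` is `True`, `second` is `False`, and `balance` ends at 125: the retry finds no active order and leaves the balance untouched.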
Outcome
After deployment:
- Expired numbers were handled automatically without provider support
- API polling load was reduced by approximately 95% (from continuous polling of all active orders to provider checks only when an order is due)
- Refund accuracy improved to 99.9%+ with idempotency guards
- The system remained stable under concurrent processing of 500+ orders per hour
- Manual reconciliation work was eliminated
Lessons learned
Treat external providers as unreliable by default. Time-based workflows benefit from ordered data structures. Background workers must assume retries, so design every financial operation to be idempotent.
Expiry handling
- Treat external providers as unreliable by default. Webhook support should not be assumed. Always design fallback mechanisms for critical workflows.
- Time-based workflows benefit from ordered data structures. In this system, a Redis sorted set was a cheaper way to find due orders than repeated database time-range queries.
- Background workers must always assume retries will occur. Design every operation to be idempotent. For financial operations, combine database locks with an external idempotency key.
- Buffer times prevent edge cases. Small time buffers between system state and external provider state reduce false positives and unnecessary API calls.
Technical appendix
Technical problem
No lifecycle events from the provider API. Polling every active order was expensive. Time-range database queries would not scale. Refund logic had to stay idempotent under concurrent workers and retries.
Technical approach
Time-ordered expiry tracking with Redis sorted sets and a background worker that processes only due entries. Provider verification on expiry, idempotent refunds with database locks and Redis idempotency keys.
Technical outcome
API polling load reduced by approximately 95%. Redis ZRANGEBYSCORE drives event-driven expiry processing. SELECT FOR UPDATE plus 24-hour idempotency keys prevent duplicate refunds under retry.