Posted 4 days ago

SaaS Infrastructure, Reliability & Multi-Tenant Scaling

Why This Role Exists

Rapid Alpha is executing a 2026 transition:

From: Analyst-heavy, manually scaled delivery
To: Platform-leveraged execution through EVOS

Our objective is clear:

Increase margin.
Increase client concurrency.
Increase reliability.
Without increasing operational fragility.

Today:

  • AI workloads (embedding, indexing, classification) can strain infrastructure.
  • Web and compute workloads must be cleanly isolated.
  • Client concurrency must scale predictably.
  • Infrastructure discipline must match product ambition.

You are not joining to “manage servers.”

You are joining to ensure the system holds under growth.

This role protects revenue and enables scale.

The System You Will Inherit

You step into a live, revenue-generating SaaS environment:

  • Laravel-based web application
  • Python services for AI workloads
  • PostgreSQL database
  • AWS infrastructure (EC2, RDS, S3)
  • Multi-tenant client usage
  • Growing concurrent document-processing demands

Current constraint:

AI workloads and application workloads must be cleanly separated and scaled independently.

The foundation exists.
Your responsibility is to professionalize and harden it.

What You Own (Non-Negotiable)

1️⃣ System Stability & Isolation

You will:

  • Separate web tier from worker tier
  • Implement and maintain queue-driven architecture
  • Ensure embedding and classification jobs never degrade web performance
  • Design horizontal scaling strategies
  • Implement resource limits and concurrency controls

If the system degrades under load, you own the diagnosis.

2️⃣ Multi-Tenant Scalability

As we onboard more clients, you ensure:

  • Client workloads do not interfere with one another
  • Background processing scales predictably
  • Capacity planning anticipates growth
  • Infrastructure bottlenecks are identified before failure

Scaling revenue must not increase fragility.

3️⃣ Observability & Operational Discipline

You will implement:

  • Monitoring and alerting
  • Logging and traceability
  • Performance metrics dashboards
  • Incident documentation practices
  • Environment parity (dev/staging/prod)

Engineering discipline is part of margin expansion.

4️⃣ Collaboration with AI Systems Architect

You will work closely with the Principal AI Systems Architect to ensure:

  • AI workflows are async and queue-driven
  • Concurrency is managed intentionally
  • Rate limiting and retry logic are robust
  • Infrastructure scales before demand forces it

You own the system layer.
They own the intelligence layer.

P&L Impact

This role directly impacts:

  • Client retention (no instability under growth)
  • Revenue capacity (more concurrent workloads)
  • Operational cost control
  • Founder risk exposure
  • Long-term platform valuation

Stable infrastructure is a revenue multiplier.

Fragile infrastructure is a growth limiter.

What You Must Be Able To Explain

  • How to isolate CPU-bound AI jobs from web application workloads.
  • How to design a queue-based architecture for large document processing.
  • What fails first under concurrency (DB, memory, CPU, IO) and how to mitigate it.
  • How to scale AWS infrastructure predictably.
  • How to introduce observability in a growing SaaS system.

If you cannot articulate failure modes, this role is not a fit.

Required Experience

  • 6–10+ years of backend/infrastructure engineering experience
  • Strong AWS experience (EC2, RDS, autoscaling, networking)
  • Experience implementing queue systems (SQS, Redis, RabbitMQ, etc.)
  • Experience scaling multi-tenant SaaS platforms
  • Strong understanding of distributed system constraints
  • Experience working in mixed-stack environments (PHP + Python preferred)
  • Experience improving production stability in live systems

This role is equivalent to a Senior/Staff-level infrastructure engineering position at a scaling SaaS company.

What You Do NOT Own

  • Frontend feature development
  • AI workflow logic
  • Prompt design
  • Client-specific custom builds
  • Marketing or product experimentation

You own stability, scale, and operational integrity.

What Success Looks Like (First 90 Days)

By Day 30

  • Architecture audit completed
  • Clear isolation plan defined
  • Immediate stability risks mitigated

By Day 60

  • Worker isolation implemented
  • Async job architecture stabilized
  • Monitoring and alerts operational

By Day 90

  • System stable under concurrent multi-client workloads
  • Clear capacity planning roadmap
  • Reduced infrastructure risk under growth

Compensation

  • 100% remote
  • India public holidays + company shutdown weeks
  • Full benefits
  • High ownership
  • Direct collaboration with the founder and the AI Systems Architect

Compensation reflects expected impact on platform stability and revenue scalability.

About Rapid Alpha

Rapid Alpha helps mid-sized companies align Vision, Focus, and Results (VFR) into repeatable execution systems.

EVOS is a structured execution platform designed to encode expert reasoning into scalable AI workflows.

Our 2026 objective:

Scale results delivery without scaling fragility.

This role is foundational to that objective.

How to Apply

Please send:

  1. A brief description of one production system you helped scale, including what broke and how you fixed it.
  2. A short architecture outline (max 500 words) explaining how you would isolate AI background jobs from a Laravel-based web application handling concurrent users.
  3. Your CV and expected CTC.

Applications without this information will not be reviewed.

This is a senior ownership role. We value clarity of thinking over keyword density.
