Senior Platform Architect

Posted 4 days ago

SaaS Infrastructure, Reliability & Multi-Tenant Scaling

Why This Role Exists

Rapid Alpha is executing a 2026 transition:

From: Analyst-heavy, manually scaled delivery
To: Platform-leveraged execution through EVOS

Our objective is clear:

Increase margin.
Increase client concurrency.
Increase reliability.
Without increasing operational fragility.

Today:

AI workloads (embedding, indexing, classification) can strain infrastructure.
Web and compute workloads must be cleanly isolated.
Client concurrency must scale predictably.
Infrastructure discipline must match product ambition.

You are not joining to “manage servers.”

You are joining to ensure the system holds under growth.

This role protects revenue and enables scale.

The System You Will Inherit

You step into a live, revenue-generating SaaS environment:

Laravel-based web application
Python services for AI workloads
PostgreSQL database
AWS infrastructure (EC2, RDS, S3)
Multi-tenant client usage
Growing concurrent document-processing demands

Current constraint:

AI workloads and application workloads must be cleanly separated and scaled.

The foundation exists.
Your responsibility is to professionalize and harden it.

What You Own (Non-Negotiable)

1️⃣ System Stability & Isolation

You will:

Separate web tier from worker tier
Implement and maintain queue-driven architecture
Ensure embedding and classification jobs never degrade web performance
Design horizontal scaling strategies
Implement resource limits and concurrency controls

If the system degrades under load, you own the diagnosis.
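For illustration only, here is a minimal Python sketch of the web/worker split described above. It assumes an in-process `queue.Queue` standing in for a real broker such as SQS, Redis, or RabbitMQ. `WorkerPool` and `fake_embed` are hypothetical names and are not part of EVOS. The web tier only calls `enqueue()`, which returns immediately. A fixed number of workers caps how many background jobs can run at once.

```python
import queue
import threading
import time
from typing import Callable, List, Optional


class WorkerPool:
    """Hypothetical worker tier: a fixed number of workers drain a job queue,
    so background jobs never run inside the web request path."""

    def __init__(self, num_workers: int, handler: Callable[[str], str]) -> None:
        self._jobs: "queue.Queue[Optional[str]]" = queue.Queue()
        self._handler = handler
        self._lock = threading.Lock()
        self._active = 0            # jobs executing right now
        self.peak_active = 0        # highest concurrency observed
        self.results: List[str] = []
        self._threads = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for t in self._threads:
            t.start()

    def enqueue(self, document: str) -> None:
        """Called by the web tier: only records the job and returns immediately."""
        self._jobs.put(document)

    def _run(self) -> None:
        while True:
            doc = self._jobs.get()
            if doc is None:             # sentinel: stop this worker
                self._jobs.task_done()
                return
            with self._lock:
                self._active += 1
                self.peak_active = max(self.peak_active, self._active)
            try:
                outcome = self._handler(doc)
            except Exception as exc:    # one failing job must not kill the worker
                outcome = f"error:{doc}:{exc}"
            with self._lock:
                self._active -= 1
                self.results.append(outcome)
            self._jobs.task_done()

    def shutdown(self) -> None:
        """Wait until every queued job is processed, then stop all workers."""
        self._jobs.join()
        for _ in self._threads:
            self._jobs.put(None)
        for t in self._threads:
            t.join()


def fake_embed(doc: str) -> str:
    time.sleep(0.01)                # stand-in for a slow embedding call
    return doc.upper()


if __name__ == "__main__":
    pool = WorkerPool(num_workers=2, handler=fake_embed)
    for i in range(10):
        pool.enqueue(f"doc-{i}")    # the web request would return here
    pool.shutdown()
    print(pool.peak_active, len(pool.results))  # peak never exceeds 2
```

In production the workers would run as separate processes on separate hosts, because CPython threads do not execute CPU-bound code in parallel. The worker count then becomes the concurrency control you tune against available CPU and memory.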

2️⃣ Multi-Tenant Scalability

As we onboard more clients, you ensure:

Client workloads do not interfere with one another
Background processing scales predictably
Capacity planning anticipates growth
Infrastructure bottlenecks are identified before failure

Scaling revenue must not increase fragility.
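One common way to keep client workloads from interfering is per-tenant fair queuing. The Python sketch below illustrates that generic technique under stated assumptions and does not describe how EVOS actually schedules work. Each tenant gets its own queue, and tenants are served round-robin, so a client that submits a large batch cannot starve a client that submits a few documents. `FairScheduler` and the tenant names are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Optional, Tuple


class FairScheduler:
    """Hypothetical round-robin scheduler over per-tenant queues.

    Invariant: a tenant appears in `_turns` exactly when it has pending jobs.
    """

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = {}
        self._turns: Deque[str] = deque()

    def submit(self, tenant: str, job: str) -> None:
        q = self._queues.setdefault(tenant, deque())
        if not q:                   # tenant was idle: give it a place in line
            self._turns.append(tenant)
        q.append(job)

    def next_job(self) -> Optional[Tuple[str, str]]:
        """Return (tenant, job) for the next tenant in turn, or None if idle."""
        if not self._turns:
            return None
        tenant = self._turns.popleft()
        q = self._queues[tenant]
        job = q.popleft()
        if q:                       # more work left: back of the line
            self._turns.append(tenant)
        return tenant, job

    def drain(self) -> List[Tuple[str, str]]:
        """Dispatch every pending job and return the order they were served."""
        order: List[Tuple[str, str]] = []
        item = self.next_job()
        while item is not None:
            order.append(item)
            item = self.next_job()
        return order


if __name__ == "__main__":
    s = FairScheduler()
    for i in range(5):
        s.submit("tenant-a", f"a-{i}")   # large batch from one client
    for i in range(2):
        s.submit("tenant-b", f"b-{i}")   # small batch from another
    print([job for _, job in s.drain()])
    # → ['a-0', 'b-0', 'a-1', 'b-1', 'a-2', 'a-3', 'a-4']
```

Tenant A's five jobs are interleaved with tenant B's two instead of running ahead of them. Real deployments often combine this ordering with a per-tenant cap on in-flight jobs.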

3️⃣ Observability & Operational Discipline

You will implement:

Monitoring and alerting
Logging and traceability
Performance metrics dashboards
Incident documentation practices
Environment parity (dev/staging/prod)

Engineering discipline is part of margin expansion.

4️⃣ Collaboration with AI Systems Architect

You will work closely with the Principal AI Systems Architect to ensure:

AI workflows are async and queue-driven
Concurrency is managed intentionally
Rate limiting and retry logic are robust
Infrastructure scales before demand forces it

You own the system layer.
They own the intelligence layer.
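Rate limiting and retry logic are often built from two small pieces. The sketch below is a hypothetical example, not a prescribed design: a token bucket that caps calls to an external AI API, and retries with exponential backoff and full jitter for transient failures. `TransientError`, `TokenBucket`, and `retry_with_backoff` are illustrative names. The clock and sleep functions are injectable so the behavior can be tested without waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, HTTP 429/503)."""


class TokenBucket:
    """Token-bucket limiter: `rate` calls/second on average, bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def retry_with_backoff(fn: Callable[[], T], max_attempts: int = 5,
                       base_delay: float = 0.5, max_delay: float = 30.0,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `fn`, retrying TransientError with exponential backoff and full jitter.

    Any other exception propagates immediately without a retry.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise               # give up; caller can dead-letter the job
            cap = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, cap))  # random delay in [0, cap]
    raise RuntimeError("unreachable")


if __name__ == "__main__":
    bucket = TokenBucket(rate=5, capacity=5)

    def call_embedding_api() -> str:     # hypothetical external call
        if not bucket.try_acquire():
            raise TransientError("local rate limit reached")
        return "embedding-ok"

    print(retry_with_backoff(call_embedding_api))
```

Jitter spreads retries out so that many workers recovering from the same outage do not hit the provider in lockstep. Once attempts are exhausted, the error is re-raised so the job can be routed to a dead-letter queue or trigger an alert.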

P&L Impact

This role directly impacts:

Client retention (no instability under growth)
Revenue capacity (more concurrent workloads)
Operational cost control
Founder risk exposure
Long-term platform valuation

Stable infrastructure is a revenue multiplier.

Fragile infrastructure is a growth limiter.

What You Must Be Able To Explain

How to isolate CPU-bound AI jobs from web application workloads.
How to design a queue-based architecture for large document processing.
What fails first under concurrency (DB, memory, CPU, I/O) and how to mitigate it.
How to scale AWS infrastructure predictably.
How to introduce observability in a growing SaaS system.

If you cannot articulate failure modes, this role is not a fit.

Required Experience

6–10+ years of backend/infrastructure engineering
Strong AWS experience (EC2, RDS, autoscaling, networking)
Experience implementing queue systems (SQS, Redis, RabbitMQ, etc.)
Experience scaling multi-tenant SaaS platforms
Strong understanding of distributed system constraints
Familiarity working in mixed-stack environments (PHP + Python preferred)
Experience improving production stability in live systems

This role is equivalent to a Senior/Staff-level infrastructure engineering position in a scaling SaaS company.

What You Do NOT Own

Frontend feature development
AI workflow logic
Prompt design
Client-specific custom builds
Marketing or product experimentation

You own stability, scale, and operational integrity.

What Success Looks Like (First 90 Days)

By Day 30

Architecture audit completed
Clear isolation plan defined
Immediate stability risks mitigated

By Day 60

Worker isolation implemented
Async job architecture stabilized
Monitoring and alerts operational

By Day 90

System stable under concurrent multi-client workloads
Clear capacity planning roadmap
Reduced infrastructure risk under growth

Compensation

100% remote
Indian national holidays + company shutdown weeks
Full benefits
High ownership
Direct collaboration with founder and AI Systems Architect

Compensation reflects expected impact on platform stability and revenue scalability.

About Rapid Alpha

Rapid Alpha helps mid-sized companies align Vision, Focus, and Results (VFR) into repeatable execution systems.

EVOS is a structured execution platform designed to encode expert reasoning into scalable AI workflows.

Our 2026 objective:

Scale results delivery without scaling fragility.

This role is foundational to that objective.

How to Apply

Please send:

A brief description of one production system you helped scale, including what broke and how you fixed it.
A short architecture outline (max 500 words) explaining how you would isolate AI background jobs from a Laravel-based web application handling concurrent users.
Your CV and expected CTC.

Applications without this information will not be reviewed.

This is a senior ownership role. We value clarity of thinking over keyword density.
