Site Reliability Engineer at Odixcity Consulting

Experience: Mid-level, Senior-level

Education: BA/BSc/HND

Job-type: Full Time, Remote

Location: Nigeria

Field: Tech & Development

Odixcity Consulting is a Nigerian HR consulting and procurement firm that provides business solutions to companies, entrepreneurs, and SMEs. We offer a wide range of services including recruitment, performance management, training and development, compensation and benefits, payroll and benefits administration, and procurement of goods and services.

Job Role

We are at a critical inflection point. Our low-code platform is preparing for an immediate scale-up to 3,000,000 concurrent users. We currently operate on a GKE-based architecture with 78 microservices and a MongoDB Atlas backend. We need a Lead Site Reliability Engineer who can transform our current synchronous system into a high-concurrency, asynchronous engine capable of surviving massive traffic spikes without database or compute failure.

Responsibilities

โ— Decoupled Architecture: Transition synchronous API flows to Google Cloud Pub/Sub to act as a shock absorber for a MongoDB Atlas M60+ cluster.

โ— Database Guardrails: Implement and own the “Speed Limit” for our database. You will configure Subscriber-side Flow Control in Node.js and Kubernetes HPA to ensure we never exceed 10,000 IOPS or 32k connections.

โ— Resource Isolation: Isolate heavy Puppeteer/Chrome workloads from core platform services using Cloud Run or dedicated Spot VM node pools with taints/tolerations.

โ— Observability & Alerts: Build a “Nerve Center” using Cloud Monitoring. You must track Message Age, Disk Throughput, and Connection Saturation with millisecond precision.

โ— Platform Hardening: Work with our 78 microservices to optimize their container footprints using Vertical Pod Autoscaling (VPA) and efficient bin-packing.

Technical Requirements

โ— GCP Mastery: Deep experience with GKE, Pub/Sub, and Cloud Run. You should know how to request and manage high-scale CPU quotas.

โ— Advanced Node.js: You must understand how to manage the Event Loop under heavy load and how to properly ack/nack messages in a distributed queue.

โ— MongoDB at Scale: Experience with Atlas M60/M80 tiers. You must know how to diagnose Index Resident Memory issues and manage connection pooling at the platform level.

โ— The “SRE Mindset”: You believe that a “Slow” system is better than a “Broken” one. You have experience implementing Backpressure and Circuit Breakers.

Method of Application

Meet the qualifications? Email your CV and cover letter to [email protected] using the job title as the subject.

