We’re looking for a hands-on, self-directed Senior DevOps Engineer to join our fast-paced startup. You’ll be the first line of defense for production issues, architect robust observability systems, and improve deployment and testing practices. If you thrive in startup environments, enjoy taking ownership, and are comfortable in modern JS/TS stacks, we’d love to meet you.
Implement a reliable observability stack: Leverage Grafana, CloudWatch, and OpenTelemetry within our Node.js and TypeScript codebase.
Stay on top of alerts and issues: Monitor and triage production issues, then fix or escalate them with full traceability and follow-up.
Reduce system noise: Begin reducing the frequency and volume of unexpected errors.
Improve test coverage: Ensure better code quality and proactively catch regressions.
Own DevOps workflows: Deploy, debug, and maintain infrastructure health autonomously.
Become a core team member: Handle incidents independently and support the evolution of our infra/dev culture.
Leading Indicators:
Number of alerts and incidents triaged
Trace IDs investigated and logged
Bugs found early and resolved
Tickets opened/closed efficiently
Reduced volume of unhandled or duplicate errors
Lagging Indicators:
Production uptime and stability
Percentage of issues resolved without handoff
Number of tests added
Reduction in recurring or duplicate issues
Maintain and enhance Grafana dashboards
Integrate and manage CloudWatch alarms and OpenTelemetry traces
Ensure traceability across all systems (CRM, APIs, webhooks, workflows)
Act as first responder for production issues during working hours
Troubleshoot, escalate with full context, and coordinate incident response
Improve deployment workflows and monitor resource usage
Maintain the health of critical subsystems (queues, sync jobs, memory/CPU)
Add and improve test coverage once baseline reliability is achieved
Build confidence in deployments through automated testing and regression checks
Strong experience with Node.js, TypeScript, and React
Deep knowledge of AWS, Grafana, OpenTelemetry, and CloudWatch
Prior startup experience preferred
Clear, proactive communicator with a bias toward ownership
Available 1:30 AM to 10:30 PM IST, 5 days/week, for on-call responsibilities
Bonus: Experience reviewing pull requests and deploying code regularly
Review an internal RFC for observability and implement it in phases
Refine and own Grafana dashboards; implement meaningful alerts
Ensure consistent trace ID usage throughout the codebase
Improve logging and tracing to increase debuggability
Monitor and respond to production errors daily
Investigate, fix, or escalate recurring system issues