27/10/2020
Responsibilities
● Designing, building, running and monitoring production infrastructure
● Responding to production incidents and determining how we can prevent them in
the future
● Triaging and troubleshooting complex production issues to ensure reliability and
performance
● Identifying and automating manual processes
● Continuously evolving our monitoring tools and platform
● Promoting and applying best practices for building scalable and reliable services
across engineering
● Developing and maintaining technical documentation, runbooks, and procedures
● Supporting a 24x7 online environment as part of an on-call rotation
About You
● Bachelor of Science in Computer Science or a related field is required
● 7+ years of SRE/DevOps/infrastructure experience
● 7+ years of experience deploying, operating and debugging server software on Linux
at scale
● Unwavering commitment to identifying the root cause of infrastructure issues and
resolving them
● Experience automating and running large-scale production Java/Tomcat
services in AWS (EC2, ECS, KMS, Kinesis, RDS) or other cloud providers
● Advanced experience with configuration management and orchestration tools
(Ansible, Terraform)
● Experience with the use, maintenance and configuration of monitoring, metrics and
logging infrastructure (Datadog, Sensu, New Relic, Icinga/Nagios, etc.)
● Aptitude for automation and streamlining of tasks in an SRE/Operations engineering
context (Python, Go, Bash, Ruby, etc.)
● Experience writing infrastructure as code using tools such as Chef and
Terraform
● Comfortable working with modern databases and big data platforms (SQL, etc.);
MySQL automation is a big plus