Senior/Expert Engineer, Site Reliability (SRE)

Singapore Engineering and Technology Experienced (Individual Contributor)
Job Description
  • Dive deep into development lines, learn how every application component works, and promote product scalability, stability and performance.
  • Set up, manage and maintain product, middleware and big-data applications and services.
  • Perform regular and ad-hoc server-side deployments, performance fine-tuning and troubleshooting.
  • Design and develop automations for workflows.
  • Manage capacity and resources.
  • Own full-chain stress testing to improve application performance and remove redundancy.
  • Prepare routine operation documentation.
Job Requirements
  • Bachelor’s or higher degree in Computer Science, Engineering, Information Systems or related fields.
  • Minimum 2 years of working experience in Site Reliability Engineering roles.
  • Extensive hands-on knowledge of Linux operating systems (Ubuntu, CentOS, etc.).
  • Knowledge of computer networking (TCP/IP, DNS, etc.) and operating systems.
  • Hands-on experience with at least one of the following programming languages: Bash, Python, Go.
  • Strong analytical and problem-solving skills with the ability to thrive under difficult and stressful situations.
  • Passion for the work and a strong sense of responsibility.
  • Fast learner and a good team player.
  • Detail-oriented, cautious and prudent.

The following skills are optional but preferred:

  • Experience with automation tools such as Ansible and Jenkins.
  • Experience with monitoring tools such as Prometheus, Zabbix and Grafana.
  • Experience with load balancing tools such as LVS, Nginx, OpenResty or HAProxy.
  • Experience with container technologies such as Docker and Kubernetes.
  • Experience with Kafka and Codis.