Site Reliability Engineer (L3 Application Engineer)

  • Administer the Issuing applications and support application infrastructure to ensure our platform services are optimized for performance and reliability
  • Consult with other technical team members on best practices to improve the scalability and reliability of infrastructure and application services. Discuss and agree on the most innovative solutions applicable to our cloud service environments
  • Maintain and enhance internal tools
  • Deploy and assist with maintenance of automation technologies (scripts, APIs, etc.)
  • Create, maintain, and update knowledge base articles and operational procedures
  • Proactively monitor applications and infrastructure to ensure high availability of critical functions, and react to events to restore degraded services as quickly as possible
  • Support development and implementation teams with your technical expertise during the launch or integration of new services

Requirements

  • Experience in deploying and supporting web-based applications and the ability to grasp complex dependencies between multiple systems
  • Experience with at least one orchestration or configuration management toolset, such as Ansible, Puppet, Chef, Terraform, Packer, OpenStack Heat, or CloudFormation
  • Understanding of operational concepts such as change management, on-call rotations, escalations, uptime, performance tuning, monitoring, log analysis, etc.
  • Cloud-related experience: Amazon Web Services (e.g. EC2, ELB, ALB, ECS, S3, Route53, EBS, EFS, Networking, etc.), OpenStack (Nova, Swift, Cinder, Glance, Neutron), or other cloud providers
  • Understanding of continuous delivery and experience with Jenkins, Git, etc.
  • Strong knowledge of monitoring tools (Nagios, AppDynamics, DataDog), including experience in the design and implementation of new monitoring checks
  • Strong Linux OS skills with multiple years of admin experience
  • Strong scripting knowledge (PL/SQL, Perl, Shell, Python) with several years of experience
  • Expert knowledge of Java application containers (e.g. Tomcat) and Apache web servers
  • Strong troubleshooting and analytical skills, including the ability to comprehend, review, and analyze application logs and database queries (Oracle, MySQL)
  • Excellent communication skills (verbal, written) in English