What you will be doing as Lead Site Reliability Engineer
You have a strong background as a software, infrastructure or systems engineer, and you work in close collaboration with your team to improve the overall reliability and resilience of the environment.
Our stack consists of Cassandra, Kafka and Spark. Our development and test environments run on AWS (using Terraform).
You are an expert on the command line and enthusiastic about optimizing the overall performance and availability of the environment.
You strive to automate our processes so that other teams can get work done faster and with better monitoring, for which we use the ELK stack and Prometheus.
In the event that things go wrong (we are human after all!), you perform root cause analysis and contribute to restoring service as fast as possible. More importantly, you work actively to prevent the problem from occurring again and support all our infra colleagues, even those not on site (specifically, in Poland, so you might need to do a bit of traveling).
You are eager to learn and use (new) technologies and you are on the lookout for new tools that could help us achieve our goals, but you always keep availability as the top priority.
What the Site Reliability Team does
The purpose of the Site Reliability team is to mitigate the impact of incidents as quickly as possible and to prevent them from occurring again. The team delivers the tools for other engineers to bring software to production faster, more safely and with fewer errors.
On top of that, they create the building blocks that are needed to monitor the overall performance and reliability of the system and act as the single point of contact for our infra colleagues.
The level of expertise and experience within the team is very high, which creates an energetic atmosphere. We don't forget to have some fun while developing the proposition!
What we are looking for in a Lead Site Reliability Engineer:
- Bachelor’s and/or Master’s degree in Computer Science or relevant field
- 5-7 years of experience in software engineering, at least 1 year in a lead role
- Experience working with Docker and Kubernetes in a production environment
- Knowledge of design and implementation of resilience patterns
- Expert in at least one modern scripting language
- You are the go-to person for continuous integration tools like Git, GitLab, Sonar and Nexus, and you are able to configure these tools.
- Experience working with and tuning NoSQL databases like Cassandra
- Fluent in English (verbal and written)
- Mastery of at least one programming language, preferably Java
- Solid foundation in Linux administration and troubleshooting
- Proven experience with automation and continuous delivery practices
- Knowledge of configuration management tools like Puppet or Chef is a plus
- Share the on-call rotation (every +/- 10 weeks) and be an escalation contact for incidents
- Good social and communication skills
- Experience with Agile/Scrum, or in the role of Scrum Master
What we offer:
- Flexible hours
- Ability to travel to other sites and get to know the local teams and culture
- Amazing future potential as we continue to scale and grow
- Competitive salary and benefits
- Inspiring, entrepreneurial environment with ambitious vision for the future
- Supporting, fun-loving team of highly skilled people
- A challenging job within one of the most distinctive departments of ING
It is important that you adhere to the ING values and that it is self-evident to you that your behaviour is fully aligned with these values. You are also prepared to take the Banker's Oath. For more information, please visit http://www.ing.jobs/Netherlands/Why-ING/This-is-ING-too/ING-Values.htm
If you see yourself in this role and want to be part of the team that will disrupt banking, apply now or get in touch for more information (yolt.com and email@example.com)