What you will be doing as Lead Site Reliability Engineer
You have a strong background as a software, infrastructure or systems
engineer, and you work in close collaboration with your team to improve
the overall reliability and resilience of the environment.
Our stack consists of Cassandra, Kafka and Spark. Our development and test environments run on AWS (using Terraform).
You are an expert on the command line and enthusiastic about optimizing
the overall performance and availability of the environment.
You strive to automate our processes so that other teams can get work
done faster and with better monitoring, for which we use the ELK stack
and Prometheus.
When things go wrong (we are human after all!), you perform root cause
analysis and help restore service as fast as possible. More importantly,
you work actively to prevent the problem from recurring, and you support
all our infra colleagues, including those not on site (specifically in
Poland, so you might need to do a bit of traveling).
You are eager to learn and use (new) technologies, and you are on the
lookout for new tools that could help us achieve our goals, but you
always keep availability as the top priority.
What the Site Reliability Team does
The purpose of the Site Reliability team is to mitigate the impact of
incidents as quickly as possible and to prevent them from recurring.
The team delivers the tools for other engineers to bring software to
production faster, more safely and with fewer errors.
On top of that, the team creates the building blocks needed to monitor
the overall performance and reliability of the system, and acts as the
single point of contact for our infra colleagues.
The level of expertise and experience within the team is very high,
which creates an energetic atmosphere. We don't forget to have some fun
while developing the proposition!
What we are looking for in a Lead Site Reliability Engineer:
- Bachelor’s and/or Master’s degree in Computer Science or relevant field
- 5-7 years of experience in software engineering, at least 1 year in a lead role
- Experience working with Docker and Kubernetes in a production environment
- Knowledge of design and implementation of resilience patterns
- Expert in at least one modern scripting language
- You are the go-to person for continuous integration tools such as Git, GitLab, Sonar and Nexus, and are able to configure them
- Experience working with and tuning NoSQL databases such as Cassandra
- Fluent in English (verbal and written)
- Mastery of at least one programming language, preferably Java
- Solid foundation in Linux administration and troubleshooting
- Proven experience with automation and continuous delivery practices
- Knowledge of configuration management tools like Puppet or Chef is a plus
- Willingness to share the on-call rotation (roughly every 10 weeks) and act as an escalation contact for incidents
- Good social and communication skills
- Experience with Agile/Scrum, or in the role of Scrum Master
What we offer:
- Flexible hours
- Ability to travel to other sites and get to know the local teams and culture
- Amazing future potential as we continue to scale and grow
- Competitive salary and benefits
- Inspiring, entrepreneurial environment with ambitious vision for the future
- Supporting, fun-loving team of highly skilled people
- A challenging job within one of the most distinctive departments of ING
It is important that you adhere to the ING values and that your
behaviour is clearly and fully aligned with them. You are also prepared
to take the Banker's Oath. For more information, please visit http://www.ing.jobs/Netherlands/Why-ING/This-is-ING-too/ING-Values.htm
If you see yourself in this role and want to be part of the team that
will disrupt banking, apply now or get in touch for more information (yolt.com and careers@yolt.com)