Site Reliability Engineers (SREs) at Algolia are both software and systems engineers
who ensure we can reliably serve billions of queries every day, for
users all around the world, despite data centers being unavailable and
undersea cables being cut. As we operate many services, including our
Search API, Places, DocSearch and Analytics, you’ll keep learning new
things every day and sharing what you have learned.
The platform we develop uses both virtual and bare-metal systems spanning
more than 50 data centers in 15 different regions, serving millions of users
from every corner of the globe. Since search is a critical component of
many applications, the SRE team maintains a high level of expertise in
system failures in order to prevent them and provide reliable service to
our customers.
No two problems are the same because our systems evolve constantly. We
expect you to be a resilient problem solver who isn’t afraid to think
outside the box and use your knowledge of system interactions to your
advantage. You’ll also take ownership of complete projects and see them
through.
The team is composed of engineers with different backgrounds and experience,
both in industry and academia. This diversity works in our favor,
and you should add to it by bringing your experience, your knowledge
and your point of view. Thinking differently is a plus, not a minus.
We’re transparent with each other and with other teams about both our
successes and our failures. This way we learn, we accept our weaknesses
and continuously strive to improve both personally and professionally.
RESPONSIBILITIES
- Work with other teams to identify, troubleshoot, and resolve high-impact issues
- Evaluate performance of current and future systems, both software and hardware
- Participate in the design of new systems
- Develop and maintain the automation framework used for all systems
- Participate in the on-call rotation to ensure fast response to production issues
REQUIREMENTS
- 4+ years of software engineering experience
- Knowledge of shell scripting and at least one scripting language (Python, Ruby, etc.)
- Willingness to learn Go (golang)
- Understanding of Linux systems: I/O, process scheduling, filesystems
- Understanding of computer networks: TCP/IP, DNS, load-balancing
- Full professional English proficiency
- A rigorous approach to code quality, automated testing, and other engineering best practices
- Ability to make independent decisions and take ownership of them
NICE TO HAVE
- Knowledge of Go (golang)
- Ability to use a configuration management tool like Ansible, Puppet or Chef
- Knowledge of low-level principles of computers and network components
- Experience profiling application performance in both development and production
- Knowledge of cloud platforms such as AWS, GCP, or Azure