What we expect from you:
To have done battle with Linux and emerged victorious
To be obsessed with automating stuff now so you can relax later
To like building internal tools for monitoring and configuration
Requirements
- Familiarity with any programming language for system automation tasks, e.g. advanced Bash scripting or Python.
- Ability to analyze, identify and overcome technical problems.
- Knowledge of SCM tools.
- 2+ years of hands-on experience in Linux administration.
- Familiarity with monitoring/graphing systems like Nagios, Graphite or similar tools.
- MySQL knowledge.
- Networking knowledge (Ethernet, TCP/IP stack, static routing, etc.).
- Dynamic routing protocols (RIP, OSPF, EIGRP, BGP), network high availability (CARP, VRRP, STP) and load balancing (HAProxy, LVS).
- Proactive approach to systems maintenance and problem avoidance.
- Very good organizational and communication skills.
- Self-motivated with a strong desire to learn and improve.
- Strong analytical skills and ability to collate and interpret data from various sources.
- Ability to multitask and work under pressure in an environment of continuous change.
- Ability to work independently and as part of a team.
- Fluency in written and spoken English.
- Attention to performance and availability of services.
- Ability to write monitoring and maintenance tools using scripting languages (bash expertise is a must).
- Previous use of systems configuration and bootstrapping utilities (Puppet, PXE).
Responsibilities
- Monitor network, servers and applications hosted in our data centers.
- Performance tuning, hardware upgrades, and resource optimization as required.
- Respond to alerts and create auto-remediation systems.
- Maintenance of technical documentation of services, processes and procedures used throughout normal operations.
- Apply operating system, application and site configuration changes on a regular basis using existing automated tools or creating new ones if necessary.
- Support engineering teams in enabling new features and services.
- Participate in code stabilization and code releases.
- Lead/participate in code releases supporting Tuenti's development teams.
- Analyze, suggest, and implement release process optimizations.
- Design and development of release tools.
- Respond to and rapidly resolve system failures.
- Analyze and predict bottlenecks in system performance.
- Develop tools for automating installation and maintenance tasks.
- Participation in the shifted-schedule and on-call system, to ensure 24/7 site availability.
Bonus points for experience with