What are we looking for?
A Lead Data Engineer for our data team, mainly working on pipelines that process a huge amount of data.
You'll also be our first DataOps engineer, helping to define and implement the methodology and cloud infrastructure for each project.
Our stack uses:
- Python: our lingua franca, used for several processes
- Pandas: for data analysis and our communication language between data scientists and data engineers
- Flask: to expose pipeline results as REST APIs
- Scala: we're moving some ETLs to Scala, in order to improve our type safety. We like functional Scala, but don't get crazy, please (:
- Apache Beam and Scio executed on Google Dataflow: our main tools for big ETLs.
- Kubernetes: to run containers with Python processes or Flask APIs.
What can we offer?
- A key position in one of the most important teams of the company: you'll arrive at the perfect time, able to contribute from the very beginning.
- Competitive salary + bonus.
- Positive + inclusive + respectful working environment.
- Comfortable office in downtown Madrid (Metro Alonso Martínez): fresh fruit, coffee... you know.
- Remote working when you need it (fully remote can be considered; part of our team is distributed).
- Pluralsight account to learn during your working hours.
- A budget for conferences and events.
And above all, a project where we'll help you to reach the sky 🚀
Do you fit?
We're looking for a proactive individual, passionate about the big data world, able to learn a lot and fast, and with a good track record with big data products.
Someone able to:
- Make difficult decisions, evaluate alternatives and results, and fill the gaps if the result is not good enough (failing is part of ...).
- Understand the language and needs of data scientists and business people. Always keep a product mindset and develop strategies to solve data problems in a consistent and robust way.
- Create technology from scratch, but without blindly following trends and buzzwords. You need to understand the benefits behind every technology or tool.
- Deal with our current code base (not too old, just 2 years old): understand the reasons behind every workaround, accept the tradeoffs made, evolve it iteratively, and have an opinion about when it is the right time to throw it away or keep it (starting from scratch is not always an option).
- Handle millions of rows in our databases, updated weekly, with proper monitoring.
- Operate and evolve our Kubernetes clusters (one for the live API, the other for parallel data processing) and optimize our Docker-based development environment.
What will your day-to-day look like?
- Be the Tech Lead of the Data Engineering team, mentoring and growing a team of 4-6 data engineers.
- Develop data pipelines with Apache Beam, Airflow, vanilla Python, or whatever other tool suits the job.
- Integrate quality checks into your pipelines: test input data with preconditions, output data with postconditions, and your code with unit-testing best practices.
- Monitor your pipeline runs: add alerts and notifications for when an execution fails or the input/output data doesn't meet expectations.
- Design and evolve our data lake (based on Google Storage)
- Evaluate, choose and implement our next columnar database, based on business needs: ClickHouse? BigQuery?
- Operate and evolve our Google Cloud infrastructure: Docker, Kubernetes, Stackdriver...
- Evolve the current deployment for Machine Learning models (based on sharding by country).
- Evaluate and choose tools to improve our Machine Learning experiment cycle: Michelangelo? MLflow?
- Automate deployments of code and data (we call it "data-deploy" because we need to deploy our datasets with the same confidence and control as we do with the code). Consider tools like Apache NiFi for data-deploys.
- Be a technical reference in the team. Your code will be "the way to go" for other data engineers.
- Help evolve the data acquisition infrastructure: RabbitMQ, Scala, OpenVPN, Proxmox, Linux sysadmin...
I'm all in! What should I do?
If you think you have the commitment and the experience, don't hesitate! Send your resume or LinkedIn and Git(Hub|Lab) profiles to firstname.lastname@example.org, along with a brief explanation of why you want to join us!
Come on! Don't miss the train! 🚂