You will be joining the Dataservices team, where we produce queryable datasets that enable analytics across the company.
About the Dataservices team
The team maintains a self-service data feed that enables anyone within the company to make decisions backed by data. We clean, transform, and normalize data from both internal and external sources, and we orchestrate multi-stage pipelines with a workflow manager, Apache Airflow. Data ingestion, data profiling, and ETLs using Apache Spark are part of our daily tasks. Remote work options are available.
What we offer
- Trust: you manage your own work schedule
- Join a diverse and multicultural environment
- Work with the latest open source technologies
- A comprehensive employee benefits package
- Permanent contract
- Top notch office located in the city center
What we expect from you
- Solid Python knowledge: Pandas, PySpark
- SQL mastery
- Experience developing ETLs
- A solid grasp of relational database and data warehousing concepts
- Experience with version control systems (we use Git)
- Familiarity with Unix commands and a predilection for the command line
- Strong knowledge of the HTTP protocol
- 2+ years of software development experience
- You know how to translate business requirements into feasible tasks
- You can communicate with other departments on behalf of the team, helping them gather insights from the existing data
- You spot repetitive tasks, identify the underlying problem, and automate it away
- You are obsessed with data quality and know how to detect outliers in a dataset
- You know what data governance is
- Experience with workflow managers (Apache Airflow, Luigi, etc.)
- Experience with Scala/Java
- Experience with Docker
- Familiarity with techniques and tools for crawling and transforming data
- You know how to handle performance issues when crawling at scale and common anti-scraping protections
- Excellent spoken and written English communication skills