Adikteev is an adtech startup founded in November 2012 and funded by tier 1 investors ISAI and Ventech.
We deliver performance-based campaigns for the largest brands across a
400+ publisher network, relying on proprietary predictive targeting
technology and a proprietary suite of high-performing ad units.
To support our aggressive development and triple-digit growth, we are
always looking for the most talented people to join our 40+ person team.
Adikteev is now looking for a Data Engineer to join the Data Science
team as soon as possible to build and maintain its large-scale
advertising data pipeline.
MISSIONS
Your key role is to bridge the production and data science environments in both directions, so your responsibilities cover:
- Efficiently piping fresh data from the production environment to the data science environment
- Selecting and integrating the big data tools and frameworks required
to provide the requested storage, querying and processing capabilities
- Implementing ETL and data cleaning processes between different components of the stack
- Monitoring performance and advising on any necessary infrastructure changes
- Deploying scalable predictive systems based on data scientists' algorithms
- Implementing large-scale batch and stream processing jobs
- Building high-performance APIs to deliver predictions
- Maintaining a data analysis and visualization environment for the data science team
Requirements
MINIMUM VIABLE CANDIDATE
- Enjoys being challenged and solving complex problems on a daily basis
- Is able to work in teams and collaborate with others to clarify requirements
- Enjoys working iteratively, delivering fast, failing fast and learning from their mistakes
- Shows advanced programming skills including fluency with one
software engineering language (Java, Scala, C/C++, etc.) and one
scripting language (Python, Ruby, etc.)
- Has a proficient understanding of distributed computing principles
- Is able to administer a Hadoop cluster, with all included services,
and to resolve any ongoing operational issues with the cluster
- Is able to tune big data solutions to improve performance and end-user experience
- Is proficient with distributed processing technologies (Spark/Hadoop ecosystem)
- Has experience with building stream-processing systems, using solutions such as Spark-Streaming, Storm or Samza
- Has a good knowledge of distributed querying tools such as Hive and Impala
- Has experience with operating high-performance databases populated from multiple data sources
- Has experience with relational databases such as PostgreSQL
- Has experience with NoSQL databases such as Couchbase, Cassandra and Redis
- Has knowledge of various ETL techniques and frameworks
- Has experience with Kafka or other messaging systems
BONUS POINTS