A data engineer at Snowplow Analytics Ltd works across our product and infrastructure engineering efforts.
Product engineering
Over the past four years Snowplow has grown into the industry-leading open-source event data pipeline (main repository), consisting of a dizzying array of user-facing products, SDKs and software libraries.
All of these 30+ projects are products in some sense, but Snowplow is
not a packaged SaaS product – instead, our various user constituencies
(data analysts, developers, devops) interact with the platform via SQL,
software SDKs and public APIs; being open source, the Snowplow codebase
is itself an important user-facing aspect of the product.
Current and planned projects in product engineering include:
- Migrating the Snowplow batch pipeline from Hadoop to Apache Spark (see the RFC)
- New SaaS integrations for Sauna, our decisioning and response platform
- Adding new event sources to Snowplow, including SaaS webhooks and database change-data-capture
- Porting Snowplow to new platforms such as Apache Kafka and Google Cloud Platform
- Adding schema inference support to Snowplow and Iglu, our schema registry system
- Building tooling and user interfaces for event data modeling in SQL and Apache Spark (a minimal Spark sketch follows this list)
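To give a flavour of the event data modeling work, here is a minimal Spark sketch in Scala that rolls enriched events up into per-user daily activity. The bucket paths and the aggregation itself are illustrative assumptions, not one of Snowplow's actual data models; the field names (`collector_tstamp`, `domain_userid`, `domain_sessionid`) follow the Snowplow enriched event format.

```scala
// A sketch only: rolls hypothetical enriched events up into per-user
// daily activity. Paths and the aggregation itself are illustrative.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object DailyActivityModel {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("event-data-modeling-sketch")
      .getOrCreate()

    // Hypothetical location of enriched events in Parquet
    val events = spark.read.parquet("s3://example-bucket/enriched/")

    // One row per user per day, with event and session counts
    val dailyActivity = events
      .withColumn("event_date", to_date(col("collector_tstamp")))
      .groupBy(col("domain_userid"), col("event_date"))
      .agg(
        count(lit(1)).as("event_count"),
        countDistinct(col("domain_sessionid")).as("session_count")
      )

    // Write the modeled table back out for downstream analysts
    dailyActivity.write
      .mode("overwrite")
      .parquet("s3://example-bucket/models/daily_activity/")

    spark.stop()
  }
}
```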
Infrastructure engineering
Infrastructure engineering is focused on helping Snowplow Analytics
Ltd scale to managing 100, then 1,000, then 10,000 AWS accounts as
part of the Snowplow Managed Service.
To deliver the Snowplow Managed Service we have built a proprietary
deployment, orchestration and monitoring stack, using pragmatic
technologies including Ansible, CloudFormation, bash, Golang, cron and
PagerDuty. We are also developing open source infrastructure tooling,
such as DAG runners, Hadoop jobflow runners and similar.
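As an illustration of the kind of tooling involved, the sketch below shows the core idea behind a DAG runner: topologically sort the tasks, then execute each one only after its dependencies. It is a deliberately simplified, sequential sketch, not Factotum or any other Snowplow implementation.

```scala
// A deliberately simplified DAG runner: depth-first topological sort,
// then sequential execution. Illustrative only.
object DagRunnerSketch {
  final case class Task(name: String, dependsOn: Set[String], run: () => Unit)

  /** Order tasks so every task appears after all of its dependencies. */
  def topoSort(tasks: List[Task]): List[Task] = {
    val byName  = tasks.map(t => t.name -> t).toMap
    var visited = Set.empty[String]
    var ordered = List.empty[Task] // built in reverse

    def visit(name: String, path: Set[String]): Unit = {
      if (path.contains(name)) sys.error(s"cycle detected at task '$name'")
      if (!visited.contains(name)) {
        byName(name).dependsOn.foreach(visit(_, path + name))
        visited += name
        ordered = byName(name) :: ordered
      }
    }

    tasks.foreach(t => visit(t.name, Set.empty))
    ordered.reverse
  }

  def main(args: Array[String]): Unit = {
    val tasks = List(
      Task("load",      Set("transform"), () => println("loading")),
      Task("extract",   Set.empty,        () => println("extracting")),
      Task("transform", Set("extract"),   () => println("transforming"))
    )
    // Prints: extracting, transforming, loading
    topoSort(tasks).foreach(_.run())
  }
}
```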
We are constantly iterating on and evolving our infrastructure stack - current and planned projects include:
- Porting our real-time pipeline orchestration engine to Kubernetes, then open sourcing it
- Replacing our in-house secrets manager with HashiCorp Vault (see the sketch after this list)
- Adding a UI to Factotum, our open source DAG runner
- Deploying Apache Mesos and evaluating options for running scheduled job DAGs on it (replacing our in-house distributed cron system)
- Building a framework for automatic upgrades of customers’ Snowplow pipelines
- Evaluating Nix as a replacement for much of our Ansible automation
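For a taste of the Vault migration work, here is a minimal sketch of reading a secret over Vault's HTTP API, using the documented KV version 2 read endpoint. The mount (`secret`) and secret path (`pipelines/prod`) are hypothetical examples.

```scala
// A sketch of reading a secret from Vault's KV v2 HTTP API.
// The mount ("secret") and path ("pipelines/prod") are hypothetical.
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object VaultReadSketch {
  def main(args: Array[String]): Unit = {
    val vaultAddr  = sys.env.getOrElse("VAULT_ADDR", "http://127.0.0.1:8200")
    val vaultToken = sys.env("VAULT_TOKEN") // fails fast if unset

    // KV v2 read: GET $VAULT_ADDR/v1/<mount>/data/<path>
    val request = HttpRequest.newBuilder()
      .uri(URI.create(s"$vaultAddr/v1/secret/data/pipelines/prod"))
      .header("X-Vault-Token", vaultToken)
      .GET()
      .build()

    val response = HttpClient.newHttpClient()
      .send(request, HttpResponse.BodyHandlers.ofString())

    // The secret's key/value pairs come back as JSON under data.data
    println(response.body())
  }
}
```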
Responsibilities
Responsibilities include:
- Working closely with the Snowplow co-founders, gaining deep
familiarity with our 30+ open source projects, and making contributions
back to those projects to make them easier to operate at scale
- Designing and developing our in-house Managed Service stack, using
pragmatic technologies including Ansible, CloudFormation, bash, Golang,
cron, PagerDuty, Scala, Java, Mesos, Akka and Kubernetes
- Designing and developing our open source infrastructure tooling, such as DAG runners, Hadoop jobflow runners and similar
- Working closely with Support Engineering, including spending time
regularly on the support rotation, to understand their requirements and
build tooling to automate their ongoing work
- Originating and specifying all-new open source projects on both the product and infrastructure engineering sides
- Following best practices in customer/user support, product documentation, testing and QA, and software delivery techniques
What we’re looking for
We’d love to get to know you if:
- You have strong technical skills. This role would
be a great fit for a software engineer who loves infrastructure
automation, or who wants more exposure to data engineering and
functional programming
- You communicate with clarity and precision. It’s
super-important that our data engineers do not become bottlenecks across
Snowplow’s processes and systems. Communicating your work and being
responsive to feedback is as important as your technical ability
- You have a mature attitude to InfoSec, documentation and process.
Managed Service customers trust us with their event pipelines and AWS
accounts - this is a huge responsibility and informs everything we do
Interested? Send your CV to recruitment@snowplowanalytics.com.
We do not welcome calls from recruitment consultants.