Noggin is in the business of helping individuals manage and monetize their data, which is stored on containerized Private Data Stores. Users monetize their data by (i) viewing advertisements that are matched to them, and (ii) contributing to data aggregation queries and statistical learning processes that they authorize.
We are looking for a Data Engineer with a strong background in building and maintaining data processing pipelines, and who is enthusiastic about "privacy-by-design" data mining and processing.
Classical data engineering capabilities are expected, in particular proficiency in Apache Spark and its associated tools.
Responsibilities:
- Selection and integration of "Big Data" tools and frameworks
- Performance monitoring and maintenance of data infrastructure
- Collaboration on the design and development of privacy-by-design data processing solutions
Requirements:
- Solid understanding of distributed computing principles
- Proficiency working with distributed data sources
- Experience setting up and managing Spark clusters (preferably on Amazon Web Services)
- Experience building Spark data processing pipelines
- Experience with Scala and AWS Lambda would be an advantage (we use both)
- Experience with components of the broader Spark ecosystem would be an advantage
- Experience with messaging systems (e.g., Kafka, RabbitMQ) would be an advantage
- Experience with configuration management and infrastructure-as-code tools (e.g., Salt, Terraform) would be an advantage