Sr. Data Engineer
The Data Engineering team at Yekpay is responsible for company-wide data infrastructure, pipelines, and processing, as well as ensuring that data flows smoothly and accurately throughout the organization.
We take pride in our work, and aren’t afraid to get our hands dirty. We work closely with Product, Marketing, Business Intelligence and Application Engineers to ensure that teams throughout the company have easy, realtime access to clean, normalized data, and help improve the experience of using Yekpay overall.
What you’ll do:
- Build and maintain our current data pipelines, data warehousing, ETL processes, and logging and stats infrastructure (Hadoop, Kafka, Scala, Spark, HBase, Flume, MapReduce, etc.)
- Design and implement our next-generation data infrastructure inside the Google Cloud Platform, as we migrate from physical machines in data centers. Experience with Dataflow, BigQuery, and other Google data products is a big plus.
- Build solutions that efficiently handle large volumes of data, while keeping up to date with the latest techniques for mining that data and making decisions with it
- Write ETL pipelines that allow for relevant data to be analyzed by members of the team as well as other teams
- Partner with product managers to refine and prioritize requirements, estimate and scope work, and time releases
- Work with engineers on other teams to integrate data solutions into the larger Yekpay ecosystem
- Write clean, modular and well-documented code, including automated tests wherever possible
Skills & knowledge you should possess:
- Experience building distributed systems that handle large amounts of data, with the end goal of making actionable decisions from their output.
- Mastery of Python, Java, Scala, or similar languages. Go and shell scripting are strong pluses.
- Solid experience with the Hadoop stack, Spark or other distributed computing frameworks
- Practical experience with production-critical batch and realtime data processing
- Comfortable interacting with SQL, NoSQL and raw log data, and large clustered data stores such as HBase, Vertica, Cassandra, Riak, etc.
- Strong familiarity with general DevOps/site reliability work (system debugging, monitoring, automation, maintenance, etc.)