Job Description
<h3>Description</h3>
• Own core data ingestion and transformation pipelines across streaming and batch systems
• Work on both greenfield projects and robust production systems
• Dive into technologies like Spark, Kafka, Airflow, Delta Lake, Teradata, BigQuery, and Hive
• Solve real business problems for marketing, finance, supply chain, and more
• Be part of a flexible, supportive, and growth-oriented engineering culture
• Design, develop, and maintain high-performance, high-volume ETL pipelines (batch and streaming)
• Build and optimize cloud-based data platforms on AWS and GCP
• Partner with cross-functional teams to define requirements and implement scalable solutions
• Mentor junior engineers and contribute to a collaborative, inclusive team culture
• Own data quality, lineage, monitoring, and production support for critical pipelines
• Participate in on-call rotations and agile delivery cycles

<h3>Requirements</h3>
• BS/MS in Computer Science or related field
• 5+ years in data engineering, distributed systems, or backend development
• Strong coding skills in Python, Java, or Scala
• Deep SQL knowledge and experience with both relational and NoSQL databases
• Proven expertise in big data frameworks like Spark, Hive, Kafka, and Airflow
• Experience building and supporting production-grade data platforms
• Strong communication skills and ability to thrive in a globally distributed team
• Experience with cloud data technologies (AWS or GCP, especially for large-scale processing)
• Ownership mindset and ability to operate independently with minimal supervision
• Familiarity with HBase, Redis, and CDC (Change Data Capture) tools
• Experience with Google Cloud Datastream, Dataproc, Delta Lake, or Apache Hudi
• Background in monitoring tools like Elasticsearch and Wavefront
• Experience in a dynamic, multi-time-zone engineering environment