Job Description
<h3>📋 Description</h3>
• Partner with Data Science, Product Management, Analytics, and Business teams to gather data, reporting, and analytics requirements, and build trusted, scalable data models, data extraction processes, and data applications that help answer complex questions.
• Design and implement data pipelines to extract, transform, and load (ETL) data from multiple sources into a central data warehouse.
• Design and implement real-time data processing pipelines using Apache Spark Streaming.
• Improve data quality by leveraging internal tools/frameworks to automatically detect and mitigate data quality issues.
• Develop and implement data governance procedures to ensure data security, privacy, and compliance.
• Implement new technologies to improve data processing and analysis.
• Coach and mentor junior data engineers to enhance their skills and foster a collaborative team environment.
<h3>🎯 Requirements</h3>
• A BE in Computer Science or equivalent, with 8+ years of professional experience as a Data Engineer or in a similar role.
• Experience building scalable data pipelines in Spark, using the Airflow scheduler/executor framework or similar scheduling tools.
• Experience with Databricks and its APIs.
• Experience with modern databases (Redshift, DynamoDB, MongoDB, Postgres, or similar) and data lakes.
• Proficiency in one or more programming languages such as Python or Scala, plus rock-solid SQL skills.
• Champion automated builds and deployments using CI/CD tooling such as Bitbucket and Git.
• Experience working with large-scale, high-performance data processing systems (batch and streaming). <h3>🏖️ Benefits</h3> • Health coverage
• Paid volunteer days
• Wellness resources