
Data Engineer

Overview
Skills
  • Python ꞏ 4y
  • SQL
  • NumPy
  • Spark
  • Pandas
  • CI/CD
  • AWS
  • GCP
  • Docker
  • Kubernetes
  • Pytest
  • Dask
  • dbt
  • Pandera
  • Parquet
  • Behave
  • BigQuery
  • Dagster
  • Delta Lake
  • Great Expectations
  • Hamilton
Fetcherr, an expert in deep learning, algorithms, e-commerce, and digitization, is disrupting traditional systems with its cutting-edge AI technology. At its core is the Large Market Model (LMM), an adaptable AI engine that forecasts demand and market trends with precision, empowering real-time decision-making. Specializing initially in the airline industry, Fetcherr aims to revolutionize industries with dynamic AI-driven solutions.

Fetcherr is seeking a Data Engineer to build large-scale, optimized data pipelines using cutting-edge technology and tools. We're looking for someone with advanced Python skills and a deep understanding of memory and CPU optimization in distributed environments. This is a high-impact role with responsibilities that directly influence the company's strategic decisions and data-driven initiatives.

Key Responsibilities:

  • Design and build scalable, cross-client data pipelines and transformation workflows using modern ELT tools, ensuring high performance, reusability, and cost-efficiency across diverse data products. Leverage orchestration frameworks like Dagster to manage dependencies, retries, and monitoring (see the sketch after this list).
  • Develop and operate distributed data processing systems that handle large-scale workloads efficiently, adapting to dynamic data volumes and infrastructure constraints. Apply frameworks such as Dask or Spark to unlock parallelism and optimize compute resource utilization.
  • Deliver robust, maintainable Python solutions by applying sound software engineering principles, including modular architecture, reusable components, and shared libraries. Ensure code quality and operational resilience through CI/CD best practices and containerized deployments.
  • Collaborate with data scientists, engineers, and product teams to deliver validated, analytics-ready data that aligns with business requirements. Support team-wide adoption of data modeling standards and efficient data access patterns.
  • Proactively safeguard data quality and reliability by implementing anomaly detection, validation frameworks, and statistical or ML-based techniques to forecast trends and catch regressions early. Enforce backward compatibility and data contract integrity across pipeline changes.
  • Document workflows, interfaces, and architectural decisions in a clear and structured manner to support long-term maintainability. Maintain up-to-date data contracts, system runbooks, and onboarding guides for effective cross-team collaboration.
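
As a rough illustration of the orchestration and pipeline work described above, here is a minimal, hypothetical Dagster sketch; the asset names, retry settings, and the small pandas transformations are placeholders for illustration only, not part of Fetcherr's actual stack or codebase.

    # A minimal sketch, assuming Dagster and pandas are installed.
    # Asset names and transformations are hypothetical placeholders.
    import pandas as pd
    from dagster import Definitions, RetryPolicy, asset


    @asset(retry_policy=RetryPolicy(max_retries=3, delay=30))
    def raw_bookings() -> pd.DataFrame:
        # Placeholder extract step; a real pipeline would read from a warehouse or bucket.
        return pd.DataFrame({"route": ["TLV-JFK", "TLV-LHR"], "fare": [499.0, 320.0]})


    @asset
    def cleaned_bookings(raw_bookings: pd.DataFrame) -> pd.DataFrame:
        # Downstream asset: Dagster infers the dependency from the parameter name,
        # while the RetryPolicy above handles transient failures in the upstream step.
        return raw_bookings.dropna().query("fare > 0")


    defs = Definitions(assets=[raw_bookings, cleaned_bookings])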

Requirements:

You’ll be a great fit if you have...

  • 4+ years of hands-on experience building and maintaining production-grade data pipelines at scale
  • Expertise in Python, with a strong grasp of data structures, performance optimization, and modern data processing libraries (e.g. pandas, NumPy)
  • Practical experience with distributed computing frameworks such as Dask or Spark, including performance tuning and memory management (see the sketch after this list)
  • Proficiency in SQL, with a deep understanding of query optimization, analytical functions, and cost-efficient query design
  • Experience designing and managing transformation logic using dbt, with a focus on modular development, testability, and scalable performance across large datasets
  • Strong understanding of ETL/ELT architecture, data modeling principles, and data validation
  • Familiarity with cloud platforms (e.g. GCP, AWS) and modern data storage formats (e.g. Parquet, BigQuery, Delta Lake)
  • Experience with CI/CD workflows, Docker, and orchestrating workloads in Kubernetes
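
A minimal Dask sketch tied to the distributed-computing requirement above; the Parquet path, column names, and aggregation are hypothetical and only illustrate lazy, partitioned processing.

    # A minimal sketch, assuming Dask is installed and a partitioned Parquet
    # dataset exists at the hypothetical path "data/bookings/*.parquet".
    import dask.dataframe as dd

    # Lazy read with column projection keeps the working set small per partition.
    df = dd.read_parquet("data/bookings/*.parquet", columns=["route", "fare"])

    # Partition-wise groupby/mean; no data is loaded until compute() is called.
    avg_fare = df.groupby("route")["fare"].mean()

    # compute() triggers parallel execution on the configured scheduler.
    print(avg_fare.compute())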

Nice to Have:

  • Experience with Dagster or similar workflow orchestration tools
  • Familiarity with automated testing frameworks for data workflows, such as pytest, Great Expectations, Pandera, Behave, Hamilton, dbt tests, or similar (see the sketch after this list)
  • Deep interest in performance optimization and vectorized computation, especially in Dask/pandas-based pipelines
  • Ability to design cross-client, cost-efficient solutions that prioritize scalability, modularity, and minimal resource consumption
  • Strong grounding in software architecture best practices, including adherence to SOLID, YAGNI, KISS, DRY, CoC, OOP, CoI, LOD principles and code reuse through shared libraries (a strong plus)
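
A minimal Pandera sketch tied to the testing and validation point above; the schema, column names, and checks are hypothetical and only illustrate enforcing a simple data contract.

    # A minimal sketch, assuming Pandera and pandas are installed.
    # Column names and checks are hypothetical placeholders.
    import pandas as pd
    import pandera as pa

    bookings_schema = pa.DataFrameSchema(
        {
            "route": pa.Column(str),
            "fare": pa.Column(float, pa.Check.gt(0)),  # guard against non-positive fares
        },
        strict=True,  # reject unexpected columns to keep the data contract tight
    )

    # validate() raises a SchemaError on violations, which a CI step can surface early.
    validated = bookings_schema.validate(
        pd.DataFrame({"route": ["TLV-JFK"], "fare": [499.0]})
    )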