Lead Data Engineer
About RENEWCAST
Founded in 2020, RENEWCAST is a leader in precision forecasting for wind and solar energy. We harness cutting-edge machine learning to deliver highly accurate power production forecasts. Our mission is to advance renewable energy through innovation, collaboration, and technical excellence.
Recently, RENEWCAST secured a €2 million funding round led by South Western Power Group (SWPG) and CDP Venture Capital’s Green Transition Fund, strengthening our position as a global leader in renewable energy forecasting. This investment enables us to expand our tech team, enhance our forecasting models, and scale operations across Europe and the U.S.
Backed by Beamline Accelerator, Helen Ventures, and Tech4Planet, we are assembling a world-class team of engineers, data scientists, and energy experts to redefine wind and solar power forecasting.
We’re building next-generation forecasting software for renewable energy production.
Our platform ingests and processes vast geospatial datasets—from weather models to sensor data—to deliver highly accurate, scalable, and cost-efficient predictions for wind and solar farms. Our core infrastructure is built on Azure, leveraging Databricks, Delta Lake, Kubernetes, and Postgres. We operate with a lean, execution-focused engineering culture — and we’re growing fast.
Your Role
As our Lead Data Engineer, you will take full ownership of the data flows powering our forecasting platform.
You’ll design, orchestrate, and optimize reliable pipelines with a strong focus on performance, cost efficiency, and reproducibility.
You’ll work within a cross-functional engineering team that includes a Machine Learning Tech Lead, a DevOps Engineer, an MLOps Engineer, and a Full-Stack Developer. We expect you to work autonomously, take full responsibility for the data engineering scope, and collaborate effectively to deliver production-grade systems.
What You Will Do
- Own and maintain all Databricks-based data pipelines, from ingestion to transformation to delivery
- Design and optimize workflows for performance, clarity, and cost efficiency, including compute strategy, parallelization, and dependency management
- Contribute to evolving the orchestration layer (e.g., transitioning workflows to Prefect, Dagster, or similar frameworks running on Kubernetes)
- Support CI/CD processes: build and test pipelines, manage Docker-based execution environments, and handle multi-stage deployment flows
- Develop and maintain Docker images for Databricks jobs, ensuring reproducibility and efficiency
- Translate exploratory code from meteorologists and scientists into well-structured, production-grade pipelines
- Monitor and improve data pipeline performance, stability, and cost effectiveness
- Collaborate closely with MLOps and DevOps engineers on shared infrastructure, compute setup, and deployment mechanics
What We Are Looking For
Must-Have Skills
- 5+ years of experience in data engineering or data infrastructure roles
- Solid hands-on experience with Azure Databricks, Delta Lake, and Python-based data tooling
- Strong knowledge of Docker, especially in the context of CI/CD and runtime environments
- Experience with data-focused CI/CD pipelines (e.g., GitHub Actions or similar), including testing, promotion, and reproducibility
- Familiarity with modern workflow orchestrators (e.g., Prefect, Dagster, Airflow) and DAG-based execution models
- Solid understanding of staging and production environments, and how to ship safe and testable changes across them
- Proven ability to diagnose and resolve complex issues in distributed data systems
- Clear grasp of the full lifecycle of a pipeline: testing, validation, staging, deployment, and monitoring
Bonus Points For
- Experience working with large-scale tensorial or gridded datasets (e.g., Zarr, GRIB2, NetCDF, or similar)
- Understanding of geospatial and temporal data patterns, especially in forecasting or climate applications
- Prior exposure to ML infrastructure, particularly feature extraction, batch inference, or model-serving pipelines
- Familiarity with Kubernetes for running data workflows or jobs at scale
- Comfort managing data cost-performance tradeoffs, e.g., compute provisioning, Spark tuning, or caching strategies
- Hands-on experience integrating custom Docker containers into orchestration environments (e.g., via Databricks, Kubernetes, or custom schedulers)
- Understanding of how to work alongside product and research teams to turn ad-hoc code into reproducible, maintainable components
How We Work
We value autonomy, ownership, and clear communication. You’ll be expected to drive your area forward with minimal supervision — but not in isolation. You’ll work closely with peers who handle MLOps, DevOps, front-end, and ML modeling. We make decisions together, move quickly, and hold each other accountable.
We expect you to:
- Ask clear questions and propose structured solutions
- Move quickly but responsibly: ship MVPs fast, then refine
- Own your outcomes, not just your code
- Proactively communicate when something is blocked or unclear
- Think in systems: design solutions that scale with company growth
What Success Looks Like
- You have a deep understanding of how data flows across our platform — and have made those systems easier to reason about and improve
- Our pipelines are faster, more reliable, and more cost-efficient because of your work
- Docker-based execution environments are well maintained and reproducible across deployment stages
- The orchestration layer is robust, transparent, and easier to debug or extend
- Data scientists and engineers can rely on you for fast, trustworthy implementation and collaboration
What We Offer
- Strategic ownership and architectural influence, with direct visibility to the CTO and CEO
- Competitive compensation package plus Equity Stock Option Plan (ESOP)
- Flexible hybrid remote work model, with offices in Tallinn and Rome
- Opportunities for career growth and professional development
How to Apply