Lead MLOps Engineer

About RENEWCAST

Founded in 2020, RENEWCAST builds precision wind & solar forecasting software using cutting-edge ML to deliver accurate, scalable, and cost-efficient predictions. After recent investment rounds, we're expanding our senior technical team to establish the discipline heads who will anchor our growth.
At Renewcast, we value clarity, alignment, and smooth collaboration across technical and business teams: everyone understands not just what we build, but why. We work in a single shared planning cycle, so priorities, trade-offs, and progress stay transparent. Seniors are hands-on contributors today while setting the standards that junior colleagues grow into tomorrow.

Your Role

Own the ML operations lifecycle: deployment pipelines, fast/efficient batch inference, monitoring/alerting, drift detection, retraining orchestration, and safe rollbacks. Collaborate tightly with Data Science (for packaging/validation) and DevOps (for infra/scale). Mentor and manage a small team focused on production best practices.

What You Will Do

  • Deploy & scale models: deliver fast, efficient batch inference across multiple environments according to SLA and cost needs (Databricks, Kubernetes, cloud-native Batch services).

  • Make it observable: monitor p50/p95 latency, throughput, reliability, and drift; help build alerts/dashboards; balance performance with cost.

  • Make it reliable: containerise correctly; implement rollbacks, retries, and isolation to prevent cascading failures; run safe backfills.

  • Monitor drift and orchestrate retraining: implement drift detection (Evidently/Alibi/Great Expectations) and manage retraining triggers from drift, new data, business rules, and schedules, with automated promotion gates (see the sketch after this list).

  • Be the bridge: ensure models are production-ready (packaging/validation with DS) and scalable/reliable (infra/ops with DevOps).

  • Mentor & manage: coach and grow a small team on production best practices, observability, and reliability.
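
To make the drift and retraining point concrete, here is a minimal sketch of the kind of gate this role would own. It is written against the Evidently "Report" API, and trigger_retraining() is a hypothetical hook; the check and the trigger are illustrative assumptions, not a description of our production stack.

    import pandas as pd
    from evidently.report import Report
    from evidently.metric_preset import DataDriftPreset


    def check_drift_and_maybe_retrain(reference: pd.DataFrame,
                                      current: pd.DataFrame) -> bool:
        """Compare recent inference inputs against a reference window and
        fire a retraining trigger when dataset-level drift is detected."""
        report = Report(metrics=[DataDriftPreset()])
        report.run(reference_data=reference, current_data=current)

        # DataDriftPreset's first metric summarises dataset-level drift.
        drifted = report.as_dict()["metrics"][0]["result"]["dataset_drift"]

        if drifted:
            trigger_retraining()  # hypothetical hook: submit a retraining job
        return drifted


    def trigger_retraining() -> None:
        # Placeholder for an orchestration call (Databricks Jobs, Airflow, etc.).
        print("Dataset drift detected - submitting retraining job")

In practice the same signal would also feed alerting and be recorded alongside the model version, so promotion gates can combine drift with validation metrics.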


What We Are Looking For — Must-Have
  • 7–10+ years in ML Eng/MLOps with end-to-end lifecycle ownership (registry → deploy → serve → monitor → retrain).

  • Strong batch inference experience.

  • MLflow (preferred) for model versioning, promotion, and rollback (a promotion-gate sketch follows this list).

  • Docker + CI/CD for reliable, reproducible deployments.

  • Practical drift monitoring (Evidently, Alibi, Great Expectations).

  • Proven ability to balance performance and cost in inference systems.

  • Strong collaboration with Data Science and DevOps.
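
As referenced above, here is a minimal sketch of an MLflow stage-based promotion gate with a rollback path, for illustration only: the registered model name and validation metric are hypothetical, and newer MLflow releases favour registry aliases over stages for the same workflow.

    from mlflow.tracking import MlflowClient

    client = MlflowClient()
    MODEL_NAME = "wind_power_forecaster"  # hypothetical registered model name
    METRIC = "val_rmse"                   # hypothetical metric (lower is better)

    candidate = client.get_latest_versions(MODEL_NAME, stages=["Staging"])[0]
    production = client.get_latest_versions(MODEL_NAME, stages=["Production"])

    candidate_score = client.get_run(candidate.run_id).data.metrics[METRIC]
    current_score = (client.get_run(production[0].run_id).data.metrics[METRIC]
                     if production else float("inf"))

    if candidate_score < current_score:
        # Promote and archive the old Production version; rolling back is simply
        # re-promoting the archived version if the new one misbehaves.
        client.transition_model_version_stage(
            name=MODEL_NAME,
            version=candidate.version,
            stage="Production",
            archive_existing_versions=True,
        )
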

Bonus Points
  • Online/multi-model serving (KServe, Ray Serve, Triton).

  • Ray Data or other Python-native batch frameworks.

  • Cost attribution and reporting (Databricks usage tables, Azure billing, monitoring-tool integration).

  • Streaming familiarity for future use cases.

  • Familiarity with modern product-oriented delivery methods (e.g., Shape Up).

How We Work

We keep priorities aligned across disciplines and work in clear, time-bounded cycles so teams can deliver with focus. No heroics—just transparent goals, crisp handoffs, and shared accountability.

What Success Looks Like
  • Fast, cost-aware batch inference with robust monitoring/alerting.

  • Reliable deployments with safe rollbacks, retries, and no cascading failures.

  • Clear drift signals wired to retraining/promotion gates.

  • Juniors learn quickly, becoming autonomous and contributing confidently.
     
What We Offer
  • Strategic ownership with direct visibility to leadership.

  • Competitive compensation + ESOP.

  • Flexible hybrid work; offices in Tallinn and Rome.

  • Growth into the MLOps discipline lead role.

How to Apply

Send your resume directly to recruiting@renewcast.com, or click below to apply.


 
Apply Now!