[Remote] Evaluation Engineer

Work from home Full-time role Hiring

Note: This is a remote job open to candidates in the USA. Elicit is an AI research platform that uses language models to help researchers make better decisions. The Evaluation Engineer will own the technical foundation of the auto-evaluation systems, ensuring they are fast, reliable, and user-friendly, with a focus on decision-making in pharma.

Responsibilities

  • You'll build a comprehensive system that runs fast, is easy to use, and supports quickly building new evals:
      • You'll build lightning-fast basic evals infrastructure that schedules tasks with practically no added latency, and then figure out clever ways to reduce the fundamental sources of latency (building a version of Elicit, running it on a query, and evaluating it using LMs)
      • ML engineers need evals to kick off automatically on relevant commits, with results they can see at a glance and drill into
      • Product managers need dashboards showing performance over time and what's going wrong in production
      • Your code must be well-architected so other team members and ML engineers can understand and build on it
  • We need to evaluate how well Elicit actually helps with decision-making in pharma, not just measure what's easy to measure:
      • This requires encoding real knowledge about how pharma customers make decisions (for example, choosing appropriate gold standards)
      • You'll provide appropriate statistical tests and confidence intervals so we can trust our results
  • In a typical month, expect to spend:
      • 60% working on the core eval platform
      • 15% working closely with the evals team to build and improve specific evals (e.g., an eval of our paper search within our systematic review flow)
      • 10% mentoring our evals engineering intern
      • The rest on learning how people interact with the eval system so you can make it work better for them, and understanding what our users want from Elicit so evals measure what matters

Skills

  • At least 3 years of experience as a professional software engineer, with demonstrated experience building complex backend systems (e.g., backend for a complex website, data pipelines, etc.)
  • Aptitude and interest in evaluating how Elicit helps with pharma decision-making. There's no particular experience you must have, but we'll evaluate your aptitude
  • Knowledge of statistics (e.g., for calculating power and credence intervals for evals)
  • Experience with advanced Python (asyncio/trio and parallel processing strategies)
  • Front-end experience and strong UX sensibility (you'll be building dashboards). TypeScript experience is a plus
  • Experience building developer tools (ML engineers are one of your most important clients)
  • Previous experience as a data engineer or working on AI infrastructure
  • Knowledge of pharma/biomed
  • Experience evaluating ML systems
  • Experience building language-model-based systems (helps with understanding Elicit and how to evaluate it)

Benefits

  • Flexible work environment: work from our office in Oakland or remotely with time zone overlap (between GMT and GMT-8), as long as you can travel for in-person retreats and coworking events
  • Fully covered health, dental, vision, and life insurance for you, plus generous coverage for the rest of your family
  • Flexible vacation policy, with a minimum recommendation of 20 days/year + company holidays
  • 401K with a 6% employer match
  • A new Mac + $1,000 budget to set up your workstation or home office in your first year, then $500 every year thereafter
  • $1,000 quarterly AI Experimentation & Learning budget, so you can freely experiment with new AI tools, take courses, purchase educational resources, or attend AI-focused conferences and events
  • A team administrative assistant who can help you with personal and work tasks

Company Overview

  • Elicit uses language models to help users automate research workflows. It was founded in 2023, and is headquartered in Oakland, California, USA, with a workforce of 11-50 employees. Its website is https://elicit.com.