QA Engineer, AI Products
Job Description:
- Design and execute test strategies for LLM-powered features, including prompt regression testing, output evaluation, and hallucination detection
- Build and maintain automated evaluation pipelines (eval sets, golden datasets, LLM-as-judge frameworks) to catch quality regressions in non-deterministic outputs
- Perform black-box and exploratory testing of MDCalc's AI features across web and mobile, with particular attention to clinical accuracy, safety, and edge cases
- Define quality metrics for AI outputs (accuracy, faithfulness, relevance, safety, latency, cost) and establish thresholds for release readiness
- Collaborate cross-functionally with engineers, product managers, ML/AI engineers, and clinical reviewers to define what "good" looks like for AI responses
- Investigate and triage AI failure modes, distinguishing model issues, prompt issues, retrieval issues, and integration bugs
- Participate in team discussions, offering feedback on testability, risks, prompt design, and guardrails
- Help develop QA strategies to expand future testing capacity, automation, and evaluation coverage as the AI product surface grows
Requirements:
- 5+ years of experience in software QA, with at least 1 year of hands-on testing of LLM-based or AI/ML-powered features
- Strong understanding of QA principles, test case creation/documentation, and best practices for both deterministic and non-deterministic systems
- Hands-on experience with LLM tooling and concepts: prompt engineering, RAG systems, evaluation frameworks (e.g., Promptfoo, Braintrust, LangSmith, DeepEval, Ragas, OpenAI Evals), and LLM APIs (OpenAI, Anthropic, etc.)
- Experience designing automated qualitative evaluation approaches, including LLM-as-judge, rubric-based scoring, semantic similarity checks, and golden dataset regression testing
- Proficiency with test automation tools, with a focus on Playwright
- Strong SQL skills for data validation, test data creation, and verifying data integrity across systems
- Familiarity with token usage, latency profiling, and cost monitoring as quality signals
- Eagerness to learn quickly and a positive, solutions-oriented attitude
- Clear and concise communicator, able to surface issues, blockers, and risks effectively, including when failures are ambiguous or probabilistic
- Self-motivated, proactive, and able to manage time and priorities independently
Benefits:
- Ability to make a true difference in medicine: MDCalc is the medical reference most broadly used by physicians, with over 65% of US attending doctors using it weekly
- Medical, dental, and vision coverage, with the option to extend coverage to your dependents
- Company-sponsored short-term insurance
- Fully paid 8-week parental leave after 6 months of employment
- Company-sponsored 401(k) after 3 months of employment
- Unlimited vacation for salaried roles: we trust you to take the time you need
- Bi-annual company offsites to connect, reflect, and plan together
- Monthly work-from-home stipend
- A culture of fun and motivated team members who believe in a greater mission here at MDCalc