Access to labeled data remains one of the biggest bottlenecks in deploying machine learning in real-world domains, especially in healthcare, where annotations are costly, slow, and require domain expertise. In this talk, we’ll present how we use a rule-based linguistic approach to generate high-quality supervision signals when manual labels aren’t available. We start by capturing domain knowledge in clear, interpretable rules and use them to produce pseudo-labels. These rule-based labels give us a reliable starting point for training models, and we continue to use the rules throughout the model lifecycle: re-training, flagging edge cases, and adapting to new data. Rather than treating rule-based systems and ML as competing approaches, we design them to work in tandem: rules provide structure, consistency, and interpretability, while models bring generalization and scalability. This approach helps us improve accuracy over time while reducing our dependence on costly human annotation.
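To make the idea concrete, here is a minimal sketch of rule-based pseudo-labeling with edge-case flagging. It is not the system described in the talk: the task (deciding whether a clinical sentence affirms or negates pneumonia), the regular expressions, and every name (`rule_negation`, `rule_affirmation`, `pseudo_label`) are hypothetical and chosen only for illustration. Each rule either votes for a label or abstains; a sentence gets a pseudo-label only when the rules that fire agree, and is flagged for review otherwise.

```python
import re
from typing import Callable, List, Optional, Tuple

AFFIRMED, NEGATED = "affirmed", "negated"

# Hypothetical rules: each returns a label, or None to abstain.
def rule_negation(text: str) -> Optional[str]:
    # A negation cue followed by the finding within the same sentence.
    pattern = r"\b(no|denies|without|negative for)\b[^.]*\bpneumonia\b"
    return NEGATED if re.search(pattern, text, re.I) else None

def rule_affirmation(text: str) -> Optional[str]:
    # An affirmation cue followed by the finding within the same sentence.
    pattern = r"\b(diagnosed with|positive for|consistent with)\b[^.]*\bpneumonia\b"
    return AFFIRMED if re.search(pattern, text, re.I) else None

RULES: List[Callable[[str], Optional[str]]] = [rule_negation, rule_affirmation]

def pseudo_label(text: str) -> Tuple[Optional[str], bool]:
    """Return (label, needs_review).

    A label is assigned only when all firing rules agree. If no rule
    fires, or the rules conflict, the text is flagged as an edge case.
    """
    votes = {rule(text) for rule in RULES} - {None}
    if len(votes) == 1:
        return votes.pop(), False
    return None, True

notes = [
    "Patient denies cough; no evidence of pneumonia.",
    "Chest X-ray consistent with pneumonia.",
    "Negative for influenza, positive for pneumonia.",  # rules conflict
    "Follow-up in two weeks.",                           # no rule fires
]

# Split into a pseudo-labeled training set and a review queue.
training_set = []
review_queue = []
for note in notes:
    label, needs_review = pseudo_label(note)
    if needs_review:
        review_queue.append(note)
    else:
        training_set.append((note, label))
```

In this sketch the third note is flagged because the naive negation rule matches across the comma. That is the kind of edge case a review queue surfaces, either for refining the rules or for a trained model to handle.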
Tue, Oct 28th, 15:50 - 16:20 • Room: A4+A5

Nym Health, Head of R&D