From 2D Graphs to Sub-Unit pKa: Nine Iterations to Beat MolGpKa on Both Scaffolds and Sulfonamides
A routed-ensemble pKa predictor with functional-group-specific atom-level descriptors — a v1→v9 ablation and competitive analysis.
This work describes a routed ensemble for micro-pKa prediction, developed across nine iterations (v1→v9). Each route specializes on one functional-group family using atom-level descriptors, and a router selects the appropriate model for each ionizable centre.
The v1→v9 progression is presented as a systematic ablation: what each iteration added, where it helped, and where it did not. The final model is evaluated against MolGpKa on both general scaffolds and the sulfonamide class that general models tend to miss.
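The routing idea described above can be sketched in a few lines. This is only an illustrative sketch, not the project's implementation: the `IonizableCentre` type, the family labels, the `ROUTES` table, the fallback model, and the toy linear coefficients are all assumptions made for the example, and the function name `predict_micro_pka` is hypothetical (it is not the `predict_pka` tool).

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical container for one ionizable centre; field names are assumptions.
@dataclass(frozen=True)
class IonizableCentre:
    atom_index: int
    group: str                       # functional-group family label
    descriptors: Tuple[float, ...]   # atom-level descriptors (placeholder values)

Model = Callable[[Tuple[float, ...]], float]

def make_linear_model(bias: float, weights: Tuple[float, ...]) -> Model:
    """Build a toy linear model; a stand-in for a trained per-family regressor."""
    def predict(desc: Tuple[float, ...]) -> float:
        return bias + sum(w * x for w, x in zip(weights, desc))
    return predict

# One specialised model per functional-group family (toy coefficients, not fitted).
ROUTES: Dict[str, Model] = {
    "carboxylic_acid": make_linear_model(4.0, (1.0,)),
    "sulfonamide": make_linear_model(10.0, (-2.0,)),
}

# Assumed fallback for centres whose family has no dedicated route.
FALLBACK: Model = make_linear_model(7.0, (0.0,))

def predict_micro_pka(centres: List[IonizableCentre]) -> Dict[int, float]:
    """Route each centre to its family's model and return pKa per atom index."""
    results: Dict[int, float] = {}
    for centre in centres:
        model = ROUTES.get(centre.group, FALLBACK)
        results[centre.atom_index] = model(centre.descriptors)
    return results

if __name__ == "__main__":
    centres = [
        IonizableCentre(0, "carboxylic_acid", (0.5,)),
        IonizableCentre(3, "sulfonamide", (0.25,)),
    ]
    print(predict_micro_pka(centres))
```

Keeping the router as a plain lookup table means a new functional-group family can be added, or an existing route swapped out during an ablation, without touching the other models.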
Highlights
- Routed, functional-group-specific atom-level descriptors rather than a single global model.
- Competitive with MolGpKa on scaffolds and sulfonamides, and ahead of it on the reported splits.
- Full v1→v9 ablation, including the iterations that regressed.
The model is served in production via the predict_pka tool. Geometry note: downstream excited-state work uses RDKit geometry rather than xTB-optimized structures.