
ProtoBind-Diff is a diffusion model trained jointly on protein sequences and small molecules. Where structure-based methods learn the geometry of a binding pocket, ProtoBind-Diff learns the statistical relationship between protein sequence and the chemical matter known to bind it, across the union of publicly available pharmacological data and our internal corpus.
At inference, the model conditions on a target sequence and samples novel molecules from the learned distribution. The output is a set of chemically valid candidates that respect the binding preferences implied by the target's sequence, even for targets with no structural information at all.
[figure: ProtoBind-Diff architecture or representative outputs]
The model is open source. The full description, training procedure, and benchmarks are in the preprint.
The theory behind our platform points to drug targets that most generative chemistry models cannot address. Many are intracellular. Many lack high-quality structures. Many are involved in dynamic processes where a static pocket is the wrong abstraction.
ProtoBind-Diff was built for these. Three concrete consequences:
The current preprint reports in silico benchmarks against structure-based baselines, along with ablations across target classes. ProtoBind-Diff produces chemically valid, diverse candidates and recovers known binders for held-out targets at rates comparable to or exceeding those of structure-based methods that have access to experimental crystal structures.
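To make "diverse candidates" and "recovers known binders" concrete, here is a minimal sketch of how such metrics are commonly computed. It is not the preprint's evaluation code. It assumes each molecule has already been reduced to a fingerprint, represented as a set of integer feature indices. The function names and the 0.7 similarity threshold are illustrative assumptions.

```python
from itertools import combinations


def tanimoto(a: set, b: set) -> float:
    """Tanimoto (Jaccard) similarity between two fingerprint feature sets."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def recovery_rate(generated, known_binders, threshold=0.7):
    """Fraction of known binders matched by at least one generated molecule.

    A known binder counts as recovered when some generated fingerprint has
    Tanimoto similarity >= threshold to it. The threshold is an assumption.
    """
    if not known_binders:
        return 0.0
    hits = sum(
        1
        for known in known_binders
        if any(tanimoto(gen, known) >= threshold for gen in generated)
    )
    return hits / len(known_binders)


def internal_diversity(generated):
    """Mean pairwise Tanimoto distance (1 - similarity) within the generated set."""
    pairs = list(combinations(generated, 2))
    if not pairs:
        return 0.0
    return sum(1.0 - tanimoto(a, b) for a, b in pairs) / len(pairs)


# Toy fingerprints (hypothetical data, for illustration only).
generated = [{1, 2, 3, 4}, {1, 2, 3, 5}, {7, 8, 9}]
known = [{1, 2, 3, 4}, {10, 11, 12}]

recovery = recovery_rate(generated, known)   # one of two known binders is matched
diversity = internal_diversity(generated)    # mean of distances 0.4, 1.0, 1.0
```

In practice the fingerprints would come from a cheminformatics toolkit, and the held-out split would be defined at the target level so that no binders of the evaluated targets appear in training.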
A wet-lab validation campaign is complete. Results will be released alongside the updated preprint and journal submission.
Preprint: bioRxiv 2025.06.16
Code: GitHub — open source
Collaboration: We work with pharma partners on targeted generation campaigns.