Machine Learning in Structural Biology

Workshop at the 37th Conference on Neural Information Processing Systems

December 2023


Structural biology, the study of the 3D structure or shape of proteins and other biomolecules, has been transformed by breakthroughs in machine learning. Experimentalists now routinely use machine learning models to predict structures that aid hypothesis generation and experimental design, and to accelerate the experimental process of structure determination (e.g. computer vision algorithms for cryo-electron microscopy). These models have also become a new industry standard for engineering protein therapeutics (e.g. large language models for protein design). Despite this progress, many open challenges remain, such as modeling protein dynamics, predicting the structure of other classes of biomolecules such as RNA, learning and generalizing the underlying physics that drives protein folding, and relating the structure of isolated proteins to the in vivo, context-dependent nature of their function. These challenges are diverse and interdisciplinary; they motivate new kinds of machine learning methods and require the development and maturation of standard benchmarks and datasets.

Machine Learning in Structural Biology (MLSB) seeks to bring together field experts, practitioners, and students from academia, industry research groups, and pharmaceutical companies to focus on these new challenges and opportunities. This year, MLSB aims to bridge theory and practice by addressing the outstanding computational and experimental problems at the forefront of our field. The intersection of artificial intelligence and structural biology promises to unlock new scientific discoveries and to yield powerful design tools.

MLSB will be an in-person workshop on December 15th at NeurIPS.

Please contact the organizers with any questions.

Stay updated on changes and workshop news by joining our mailing list.

Presenter Information

Congratulations to all accepted presenters! Below is information on deadlines and expectations leading up to the MLSB Workshop.


Posters

We expect all authors to prepare a poster to present at the workshop. Posters must be 24 inches wide by 36 inches tall (portrait) and will be taped to the wall; poster boards will not be provided. Print posters on lightweight paper and do not laminate them.

Additionally, a virtual copy of each poster must be uploaded to the NeurIPS poster upload portal by Thursday, December 14. Posters must be PNG files no larger than 5120 x 2880 pixels (width x height) and no more than 10 MB. Thumbnails should be 320 x 256 pixel (width x height) PNG files of no more than 5 MB. Log in using the account associated with your CMT email address. If you did not already have an account, one should have been created automatically; you can access it by resetting the password.
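The poster and thumbnail limits are easy to check before uploading. The Python sketch below is not an official NeurIPS tool: it reads the width and height from a PNG's IHDR header using only the standard library and compares them, along with the file size, against the stated limits. The function names are hypothetical, and the sketch assumes 1 MB means 1,000,000 bytes (the stricter reading).

```python
import os
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MB = 1_000_000  # assumption: decimal megabytes, the stricter reading of "10 MB"


def png_size(path):
    """Return (width, height) read from the IHDR chunk of a PNG file."""
    with open(path, "rb") as f:
        header = f.read(24)
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR".
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} is not a PNG file")
    # Width and height are big-endian unsigned 32-bit ints at bytes 16-23.
    return struct.unpack(">II", header[16:24])


def check_upload(path, max_w, max_h, max_mb, exact=False):
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    w, h = png_size(path)
    if exact and (w, h) != (max_w, max_h):
        problems.append(f"size is {w}x{h}, expected {max_w}x{max_h}")
    elif w > max_w or h > max_h:
        problems.append(f"size {w}x{h} exceeds {max_w}x{max_h}")
    if os.path.getsize(path) > max_mb * MB:
        problems.append(f"file is larger than {max_mb} MB")
    return problems


def check_poster(path):
    # Poster: at most 5120 x 2880 pixels and at most 10 MB.
    return check_upload(path, 5120, 2880, 10)


def check_thumbnail(path):
    # Thumbnail: 320 x 256 pixels and at most 5 MB.
    return check_upload(path, 320, 256, 5, exact=True)
```

For example, check_poster("poster.png") returns an empty list when a (hypothetical) poster.png meets both limits, or one message per violation otherwise.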

Paper Camera-Ready

De-anonymized camera-ready versions of workshop papers are due on Microsoft CMT by Monday, December 4. Papers must indicate that they are NeurIPS MLSB workshop papers by using the modified NeurIPS style file, compiled with the 'final' option, e.g. \usepackage[final]{neurips_mlsb_2023}.

We plan to make all submitted papers available on the workshop website. If you would prefer that your work not be shared, please email the organizers as soon as possible. Please also let us know if there is an arXiv or bioRxiv link for the paper that we should include.

Travel Award

This year we will try to cover as many workshop registrations as possible for student and academic attendees presenting oral talks or posters who need financial assistance. If you would like to be considered, please fill out the application form by Friday, November 17. If you have any questions, please don't hesitate to contact us.

Key Dates

Application for Registration Reimbursement: Friday, November 17th, 2023, at 11:59 PM Anywhere on Earth (AoE).

Camera-Ready PDF due on Microsoft CMT: Monday, December 4th, 2023.

Poster PNG due on the NeurIPS poster upload portal: Thursday, December 14th, 2023.
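The reimbursement deadline is stated in Anywhere on Earth (AoE) time, a fixed UTC-12 offset: the deadline passes only once that calendar day has ended everywhere on Earth. As an illustrative sketch (the helper name aoe_deadline is hypothetical), the standard-library Python snippet below converts an AoE deadline to UTC and to Central Standard Time (UTC-6), the zone used in the schedule.

```python
from datetime import datetime, timedelta, timezone

# AoE is a fixed UTC-12 offset: a deadline "Anywhere on Earth" has passed
# only once that calendar day has ended in every time zone.
AOE = timezone(timedelta(hours=-12), "AoE")
# Central Standard Time (UTC-6), the zone used for the workshop schedule.
CST = timezone(timedelta(hours=-6), "CST")


def aoe_deadline(year, month, day, hour=23, minute=59):
    """Return the given AoE wall-clock time as an aware datetime in UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=AOE)
    return local.astimezone(timezone.utc)


reimbursement = aoe_deadline(2023, 11, 17)
print(reimbursement.isoformat())  # 2023-11-18T11:59:00+00:00
print(reimbursement.astimezone(CST).strftime("%Y-%m-%d %H:%M %Z"))  # 2023-11-18 05:59 CST
```

Using fixed offsets rather than a time-zone database keeps the sketch dependency-free; this is safe here because AoE never observes daylight saving and the November date falls in standard time.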

Invited Speakers

Bridget Carragher

Founding Technical Director of the Chan Zuckerberg Imaging Institute.

Kyunghyun Cho

Associate Professor at NYU
Senior Director of Frontier Research at Prescient Design.

Rhiju Das

HHMI Investigator, Associate Professor of Biochemistry at Stanford University.

Polly Fordyce

Associate Professor of Genetics and Bioengineering at Stanford University.

Tanja Kortemme

Professor of Bioengineering at the University of California, San Francisco.

Gevorg Grigoryan

Co-Founder and CTO of Generate Biomedicines.
Associate Professor at Dartmouth College.

RFdiffusion Team

A diffusion model for protein design.

Schedule (CST)

08:30 Opening Remarks
08:35 Invited Speaker - Kyunghyun Cho

Health system scale language models for clinical and operational decision making

09:00 Contributed Talk

Validation of de novo designed water-soluble and membrane proteins by in silico folding and melting
Alvaro Martin · Carolin Berner · Sergey Ovchinnikov · Anastassia Vorobieva

09:15 Invited Speaker - Tanja Kortemme

Accurate and tunable de novo protein shapes for new functions

09:40 Break
10:00 Invited Speaker - Bridget Carragher

A CryoET Data Portal to Foster a Collaboration between the Machine Learning and CryoET Communities

10:25 Contributed Talk

AlphaFold Meets Flow Matching for Generating Protein Ensembles
Bowen Jing · Bonnie Berger · Tommi Jaakkola

10:40 Contributed Talk

DSMBind: an unsupervised generative modeling framework for binding energy prediction
Wengong Jin · Caroline Uhler · Nir Hacohen

10:55 Invited Speaker - Polly Fordyce

Leveraging microfluidics for high-throughput and quantitative biochemistry and biophysics

11:20 Poster Session/Lunch
12:40 Invited Speaker - Gevorg Grigoryan

Illuminating protein space with a programmable generative model

13:05 Contributed Talk

Protein generation with evolutionary diffusion: sequence is all you need
Sarah Alamdari · Nitya Thakkar · Rianne van den Berg · Alex Lu · Nicolo Fusi · Ava P Amini · Kevin Yang

13:20 Invited Speaker - Jason Yim / Brian Trippe

De novo design of protein structure and function with RFdiffusion

13:45 Break
14:00 Contributed Talk

DiffDock-Pocket: Diffusion for Pocket-Level Docking with Sidechain Flexibility
Michael Plainer · Marcella Toth · Simon Dobers · Hannes Stärk · Gabriele Corso · Céline Marquet · Regina Barzilay

14:15 Contributed Talk

PoseCheck: Generative Models for 3D Structure-based Drug Design Produce Unrealistic Poses
Charles Harris · Kieran Didi · Arian R. Jamasb · Chaitanya Joshi · Simon V Mathis · Pietro Lió · Tom Blundell

14:30 Invited Speaker - Rhiju Das

World-wide competitions and the RNA folding problem

14:55 Break
15:00 Panel Discussion
15:45 Poster Session / Happy Hour
17:00 Closing Remarks

Accepted Papers

  • ESMFold Hallucinates Native-Like Protein Sequences

    Jeliazko Jeliazkov, Diego del Alamo, Joel Karpiak


  • Conditioned Protein Structure Prediction

    Tengyu Xie, Zilin Song, Jing Huang


  • Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design

    Yannick Vogt, Mehdi Naouar, Maria Kalweit, Christoph Cornelius Miething, Justus Duyster, Roland Mertelsmann, Gabriel Kalweit, Joschka Boedecker


  • Guiding diffusion models for antibody sequence and structure co-design with developability properties

    Amelia Villegas-Morcillo, Jana M Weber, Marcel JT Reinders


  • AlphaFold Distillation for Protein Design

    Igor Melnyk, Aurelie Lozano, Payel Das, Vijil Chenthamarakshan


  • Binding Oracle: Fine-Tuning From Stability to Binding Free Energy

    Chengyue Gong, Adam Klivans, Jordan Wells, James Loy, Qiang Liu, Alex Dimakis, Daniel J Diaz


  • Scalable Multimer Structure Prediction using Diffusion Models

    Peter Pao-Huang, Bowen Jing, Bonnie Berger


  • Implicit Geometry and Interaction Embeddings Improve Few-Shot Molecular Property Prediction

    Chris Fifty, Joseph M Paggi, Ehsan Amid, Jure Leskovec, Ron Dror


  • Molecular Diffusion Models with Virtual Receptors

    Matan Halfon, Eyal Rozenberg, Ehud Rivlin, Daniel Freedman


  • CESPED: a new benchmark for supervised particle pose estimation in Cryo-EM

    Ruben Sanchez Garcia, Michael Saur, Javier Vargas, Carl Poelking, Charlotte Deane


  • Learning Scalar Fields for Molecular Docking with Fast Fourier Transforms

    Bowen Jing, Tommi Jaakkola, Bonnie Berger


  • VN-EGNN: Equivariant Graph Neural Networks with Virtual Nodes Enhance Protein Binding Site Identification

    Florian Sestak, Lisa Schneckenreiter, Sepp Hochreiter, Andreas Mayr, Guenter Klambauer


  • Enhancing Ligand Pose Sampling for Machine Learning–Based Docking

    Patricia Suriana, Ron Dror


  • Improved encoding of ensembles in PDBx/mmCIF

    Stephanie A Wankowicz, James Fraser


  • AlphaFold Meets Flow Matching for Generating Protein Ensembles

    Bowen Jing, Bonnie Berger, Tommi Jaakkola


  • The Discovery of Binding Modes Requires Rethinking Docking Generalization

    Gabriele Corso, Arthur Deng, Nicholas Polizzi, Regina Barzilay, Tommi Jaakkola


  • Conformational sampling and interpolation using language-based protein folding neural networks

    Diego del Alamo, Jeliazko Jeliazkov, Daphne Truan, Joel Karpiak


  • FLIGHTED: Inferring Fitness Landscapes from Noisy High-Throughput Experimental Data

    Vikram Sundar, Boqiang Tu, Lindsey Guan, Kevin Esvelt


  • Contrasting Sequence with Structure: Pre-training Graph Representations with PLMs

    Louis Robinson, Timothy Atkinson, Liviu Copoiu, Patrick Bordes, Thomas Pierrot, Thomas D Barrett


  • Target-Aware Variational Auto-Encoders for Ligand Generation with Multi-Modal Protein Modeling

    Khang Nhat Ngo, Truong Son Hy


  • SE(3) denoising score matching for unsupervised binding energy prediction and nanobody design

    Wengong Jin, Xun Chen, Amrita Vetticaden, Siranush Sarzikova, Raktima Raychowdhury, Caroline Uhler, Nir Hacohen


  • Fast non-autoregressive inverse folding with discrete diffusion

    John J Yang, Jason Yim, Regina Barzilay, Tommi Jaakkola


  • TopoDiff: Improving Protein Backbone Generation with Topology-aware Latent Encoding

    Yuyang Zhang, Zihui Ma, Haipeng Gong


  • Harmonic Prior Self-conditioned Flow Matching for Multi-Ligand Docking and Binding Site Design

    Hannes Stärk, Bowen Jing, Regina Barzilay, Tommi Jaakkola


  • CrysFormer: Protein Crystallography Prediction via 3d Patterson Maps and Partial Structure Attention

    Chen Dun, Tom Pan, Shikai Jin, Ria Stevens, Mitchell D Miller, George Phillips, Anastasios Kyrillidis


  • PoseCheck: Generative Models for 3D Structure-based Drug Design Produce Unrealistic Poses

    Charles Harris, Kieran Didi, Arian R Jamasb, Chaitanya K Joshi, Simon V Mathis, Pietro Lió, Tom L Blundell


  • Sampling Protein Language Models for Functional Protein Design

    Jeremie Theddy Darmawan, Yarin Gal, Pascal Notin


  • A framework for conditional diffusion modelling with applications in protein design

    Kieran Didi, Francisco Vargas, Simon V Mathis, Vincent Dutordoir, Emile Mathieu, Urszula Julia Komorowska, Pietro Lió


  • DiffRNAFold: Generating RNA Tertiary Structures with Latent Space Diffusion

    Mihir N Bafna, Vikranth Keerthipati, Subhash Kanaparthi, Ruochi Zhang


  • Pair-EGRET: Enhancing the prediction of protein-protein interaction sites through graph attention networks and protein language models

    Ramisa Alam, Sazan Mahbub, Shamsuzzoha Bayzid


  • FlexiDock: Compositional diffusion models for flexible molecular docking

    Zichen Wang, Balasubramaniam Srinivasan, Zhengyuan Shen, George Karypis, Huzefa Rangwala


  • In vitro validated antibody design against multiple therapeutic antigens using generative inverse folding

    Amir Shanehsazzadeh, Julian Alverio, George Kasun, Simon Levine-Gottreich, Jibran A Khan, Chelsea Chung, Nicolas Diaz, Breanna K Luton, Ysis Tarter, Cailen McCloskey, Katherine B Bateman, Hayley Carter, Dalton Chapman, Rebecca Consbruck, Alec Jaeger, Christa Kohnert, Gaelin Kopec-Belliveau, John M Sutton, Zheyuan Guo, Gustavo Canales, Kai Ejan, Emily Marsh, Alyssa Ruelos, Rylee Ripley, Brooke Stoddard, Rodante Caguiat, Kyra Price, Matthew Saunders, Jared Sharp, Douglas Ganini da Silva, Audrey Feltner, Jake Ripley, Megan E Bryant, Danni Castillo, Joshua Meier, Christian M Stegmann, Katherine Moran, Christine Lemke, Shaheed Abdulhaqq, Lillian R Klug, Sharrol Bachas


  • Evaluating Zero-Shot Scoring for In Vitro Antibody Binding Prediction with Experimental Validation

    Divya V Nori, Simon V Mathis, Amir P Shanehsazzadeh


  • PDB-Struct: A Comprehensive Benchmark for Structure-based Protein Design

    Chuanrui Wang, Bozitao Zhong, Zuobai Zhang, Narendra Chaudhary, Sanchit Misra, Jian Tang


  • Optimizing protein language models with Sentence Transformers

    Istvan Redl, Fabio Airoldi, Sandro Bottaro, Albert Chung, Oliver Dutton, Carlo Fisicaro, Patrik Foerch, Louie Henderson, Falk Hoffmann, Michele Invernizzi, Benjamin M J Owens, Stefano Ruschetta, Kamil Tamiola


  • DiffDock-Pocket: Diffusion for Pocket-Level Docking with Sidechain Flexibility

    Michael Plainer, Marcella Toth, Simon Dobers, Hannes Stärk, Gabriele Corso, Céline Marquet, Regina Barzilay


  • Transition Path Sampling with Boltzmann Generator-based MCMC Moves

    Michael Plainer, Hannes Stärk, Charlotte Bunne, Stephan Günnemann


  • Pre-training Sequence, Structure, and Surface Features for Comprehensive Protein Representation Learning

    Youhan Lee, Hasun Yu, Jaemyung Lee, Jaehoon Kim


  • Inpainting Protein Sequence and Structure with ProtFill

    Elizaveta Kozlova, Arthur Valentin, Daniel Nakhaee-Zadeh Gutierrez


  • Investigating Protein-DNA Binding Energetic of Mismatched DNA

    Ruben Solozabal Ochoa de Retana, Tamir Avioz, Yunxiang LI, Le Song, Martin Takac, Ariel Afek

  • AntiFold: Improved antibody structure design using inverse folding

    Alissa M Hummer, Magnus H Høie, Tobias Olsen, Morten Nielsen, Charlotte Deane


  • Improved B-cell epitope prediction using AlphaFold2 modeling and inverse folding latent representations

    Paolo Marcatili

  • Combining Structure and Sequence for Superior Fitness Prediction

    Steffanie Paul, Aaron Kollasch, Pascal Notin, Debora Marks


  • Epitope-specific antibody design using diffusion models on the latent space of ESM embeddings

    Tomer Cohen, Dina Schneidman-Duhovny


  • Protein language models learn evolutionary statistics of interacting sequence motifs

    Zhidian Zhang, Hannah K Wayment-Steele, Garyk Brixi, Matteo Dal Peraro, Dorothee Kern, Sergey Ovchinnikov

  • Using artificial sequence coevolution to predict disulfide-rich peptide structures with experimental connectivity in AlphaFold

    Gabriella Gerlach, John M Nicoludis


  • Preferential Bayesian Optimisation for Protein Design with Ranking-Based Fitness Predictors

    Alex Hawkins-Hooker, Paul Duckworth, Oliver Bent


  • FAFormer: Frame Averaging Transformer for Predicting Nucleic Acid-Protein Interactions

    Tinglin Huang, Zhenqiao Song, Rex Ying, Wengong Jin


  • LightMHC: A Light Model for pMHC Structure Prediction with Graph Neural Networks

    Antoine P Delaunay, Yunguan Fu, Nikolai Gorbushin, Robert McHardy, Bachir Djermani, Liviu Copoiu, Michael Rooney, Maren Lang, Andrey Tovchigrechko, Ugur Sahin, Karim Beguir, Nicolas Lopez Carranza


  • FrameDiPT: SE(3) Diffusion Model for Protein Structure Inpainting

    Cheng Zhang, Adam Leach, Thomas Makkink, Miguel Arbesu, Ibtissem Kadri, Daniel Luo, Liron Mizrahi, Sabrine Krichen, Maren Lang, Andrey Tovchigrechko, Nicolas Lopez Carranza, Ugur Sahin, Karim Beguir, Michael Rooney, Yunguan Fu


  • An Active Learning Framework for ML-Assisted Labeling of Cryo-EM Micrographs

    Robert Kiewisz, Tristan Bepler


  • Validation of de novo designed water-soluble and membrane proteins by in silico folding and melting

    Alvaro Martin, Carolin Berner, Sergey Ovchinnikov, Anastassia Vorobieva


  • Structure, Surface and Interface Informed Protein Language Model

    Ioan Ieremie, Mahesan Niranjan, Rob M Ewing


  • De Novo Short Linear Motif (SLiM) Discovery With AlphaFold-Multimer

    Theo Sternlieb, Abhishaike Mahajan, Davian Ho, Jeffrey Chan


  • AF2BIND: Predicting ligand-binding sites using the pair representation of AlphaFold2

    Artem Gazizov, Anna Lian, Casper Goverde, Sergey Ovchinnikov, Nicholas Polizzi


  • Protein generation with evolutionary diffusion: sequence is all you need

    Sarah Alamdari, Nitya Thakkar, Rianne van den Berg, Alex Lu, Nicolo Fusi, Ava P Amini, Kevin Yang


  • LatentDock: Protein-Protein Docking with Latent Diffusion

    Matt McPartlon, Céline Marquet, Tomas Geffner, Danny Kovtum, Alexander Goncearenco, Zachary Carpenter, Luca Naef, Michael Bronstein, Jinbo Xu


  • HiFi-NN annotates the microbial dark matter with Enzyme Commission numbers

    Gavin Ayres, Geraldene Munsamy, Michael Heinzinger, Noelia Ferruz, Kevin Yang, Philipp Lorenz


  • Towards Joint Sequence-Structure Generation of Nucleic Acid and Protein Complexes with SE(3)-Discrete Diffusion

    Alex Morehead, Jeffrey A Ruffolo, Aadyot Bhatnagar, Ali Madani


  • SO(3)-Equivariant Representation Learning in 2D Images

    Darnell Granberry, Alireza Nasiri, Jiayi Shou, Alex J Noble, Tristan Bepler


  • HelixDiff: Conditional Full-atom Design of Peptides With Diffusion Models

    Xuezhi Xie, Pedro A Valiente, Jisun Kim, Philip Kim


  • DiffMaSIF: Surface-based Protein-Protein Docking with Diffusion Models

    Freyr Sverrisson, Mehmet Akdel, Dylan Abramson, Jean Feydy, Alexander Goncearenco, Yusuf Adeshina, Daniel Kovtun, Céline Marquet, Xuejin Zhang, David Baugher, Zachary Wayne Carpenter, Luca Naef, Michael Bronstein, Bruno Correia


  • FLAb: Benchmarking deep learning methods for antibody fitness prediction

    Michael F Chungyoun, Jeffrey A Ruffolo, Jeffrey Gray


  • Parameter-Efficient Fine-Tuning of Protein Language Models Improves Prediction of Protein-Protein Interactions

    Samuel Sledzieski, Meghana Kshirsagar, Bonnie Berger, Rahul Dodhia, Juan M Lavista Ferres


  • TriFold: A New Architecture for Predicting Protein Sequences from Structural Data

    Harish K Srinivasan, Jian Zhou


  • End-to-End Sidechain Modeling in AlphaFold2: Attention May or May Not Be All That You Need

    Jonathan King, David Koes


  • Coarse-graining via reparametrization avoids force-matching and back-mapping

    Nima Dehmamy, Csaba Both, Subhro Das, Tommi Jaakkola


  • SE3Lig: SE(3)-equivariant CNNs for the reconstruction of cofactors and ligands in protein structures

    Siddharth Bhadra-Lobo, Anushriya Subedy, Sagar Khare, Guillaume Lamoureux


  • Cramming Protein Language Model Training in 24 GPU Hours

    Nathan C Frey, Taylor Joren, Aya A Ismail, Allen Goodman, Stephen Ra, Richard Bonneau, Kyunghyun Cho, Vladimir Gligorijevic


  • Preparation Of Labeled Cryo-ET Datasets For Training And Evaluation Of Machine Learning Models

    Aygul Ishemgulova, Alex Noble, Tristan Bepler, Alex De Marco


  • EMPOT: partial alignment of density maps and rigid body fitting using unbalanced Gromov-Wasserstein divergence

    Aryan Tajmir Riahi, Chenwei Zhang, James Chen, Anne Condon, Khanh Dao Duc


  • Fast protein backbone generation with SE(3) flow matching

    Jason Yim, Andrew Campbell, Andrew Foong, Sarah Lewis, Victor Satorras, Michael Gastegger, Bas Veeling, Jose Jimenez-Luna, Regina Barzilay, Tommi Jaakkola, Frank Noe


  • Frame2seq: structure-conditioned masked language modeling for protein sequence design

    Deniz Akpinaroglu, Kosuke Seki, Eleanor Zhu, Tanja Kortemme


  • Structure-Conditioned Generative Models for De Novo Ligand Generation: A Pharmacophore Assessment

    Shannon Smith, Leo Gendelev, Kangway V Chuang, Seth Harris


  • Jointly Embedding Protein Structures and Sequences through Residue Level Alignment

    Foster Birnbaum, Saachi Jain, Aleksander Madry, Amy Keating


  • Evaluating Representation Learning on the Protein Structure Universe

    Arian R Jamasb, Alex Morehead, Chaitanya K Joshi, Zuobai Zhang, Kieran Didi, Simon V Mathis, Charles Harris, Jian Tang, Jianlin Cheng, Pietro Lió, Tom Blundell


  • Enhancing Antibody Language Models with Structural Information

    Justin Barton, Jacob Galson, Jinwoo Leem


  • Amortized Pose Estimation for X-Ray Single Particle Imaging

    Jay Shenoy, Axel Levy, Frederic Poitevin, Gordon Wetzstein


  • Rethinking Performance Measures of RNA Secondary Structure Problems

    Frederic Runge, Jörg KH Franke, Daniel Fertmann, Frank Hutter


  • Structure-based and leakage-free data splits for rigorous protein function evaluation

    Charlotte Rochereau, Mohammed AlQuraishi, Arthur Valentin, Gergo Nikolenyi

  • Uncovering sequence diversity from a known protein structure

    Luca Alessandro Silva, Barthélémy Meynard, Carlo Lucibello, Christoph Feinauer


  • Exploiting language models for protein discovery with latent walk-jump sampling

    Sai Pooja Mahajan, Nathan Frey, Daniel Berenberg, Joseph Kleinhenz, Richard Bonneau, Vladimir Gligorijevic, Andrew Watkins, Saeed Saremi



Organizers

Gabriele Corso

Gina El Nesr
Stanford University

Sergey Ovchinnikov
Harvard University

Roshan Rao
Meta AI

Hannah Wayment-Steele
Brandeis University

Ellen Zhong
Princeton University