I am a co-founder and VP of Engineering at EvolutionaryScale, a research company at the intersection of artificial intelligence and biology. Our mission is to develop artificial intelligence (AI) to understand biology for the benefit of human health and society, through open, safe, and responsible research, and in partnership with the scientific community. We develop biological AI models at the frontier of scale and capabilities to help solve the hardest problems in the life sciences.
Before EvolutionaryScale, I co-led the protein team at Meta AI (FAIR) in NYC, where we developed the first transformer protein language model, ESM-1, followed by ESM-1b, ESM-2, ESMFold, and the ESM Atlas resource. Before Meta, I was at IBM Research at the T.J. Watson Research Center. I hold an M.S. in Data Science from New York University and a B.Sc./M.Sc. in Engineering Physics from Ghent University.
My past work has covered several areas of deep learning research, including representation learning and generative models for proteins and peptides, unsupervised learning, and semi-supervised learning with small amounts of labeled data. I worked on Generative Adversarial Networks (GANs) and variational autoencoders (VAEs), precursors to today's “generative AI” methods, and on multimodal learning: models that learn across data modalities such as images, text, and speech. I started my research career working on deep learning approaches to acoustic modeling in speech recognition, bringing advances from the deep learning and computer vision communities to that field.
Publications
Please see my Google Scholar profile, since this list inevitably goes stale. Below is a selection of my most relevant and recent academic papers; I provide a full list here.
- Hayes, Rao, Akin, Sofroniew, Oktay, Lin, Verkuil, Tran, Deaton, Wiggert, …, Sercu, Candido, Rives. Simulating 500 million years of evolution with a language model. bioRxiv, 2024
- Verkuil, Kabeli, Du, Wicky, Milles, Dauparas, Baker, Ovchinnikov, Sercu, Rives. Language models generalize beyond natural proteins. bioRxiv, 2022
- Lin, Akin, Rao, Hie, Zhu, Lu, Smetanin, Verkuil, Kabeli, Shmueli, et al. Evolutionary-scale prediction of atomic-level protein structure with a language model. Science, 2023
- Rives, Meier, Sercu, Goyal, Lin, Liu, Guo, Ott, Zitnick, Ma, et al. Biological structure and function emerge from scaling unsupervised learning to 250 million protein sequences. Proceedings of the National Academy of Sciences, 2021
- Payel Das, Tom Sercu, Kahini Wadhawan, Inkit Padhi, Sebastian Gehrmann, Flaviu Cipcigan, Vijil Chenthamarakshan, Hendrik Strobelt, Cicero dos Santos, Pin-Yu Chen, Yi Yan Yang, Jeremy Tan, James Hedrick, Jason Crain, Aleksandra Mojsilovic. Accelerated antimicrobial discovery via deep generative models and molecular dynamics simulations. Nature Biomedical Engineering, 2021 [arXiv]

  De novo therapeutic design is challenged by a vast chemical repertoire and multiple constraints, e.g., high broad-spectrum potency and low toxicity. We propose CLaSS (Controlled Latent attribute Space Sampling), an efficient computational method for attribute-controlled generation of molecules, which leverages guidance from classifiers trained on an informative latent space of molecules modeled using a deep generative autoencoder. We screen the generated molecules for additional key attributes by using deep learning classifiers in conjunction with novel features derived from atomistic simulations. The proposed approach is demonstrated for designing non-toxic antimicrobial peptides (AMPs) with strong broad-spectrum potency, which are emerging drug candidates for tackling antibiotic resistance. Synthesis and testing of only twenty designed sequences identified two novel and minimalist AMPs with high potency against diverse Gram-positive and Gram-negative pathogens, including one multidrug-resistant and one antibiotic-resistant K. pneumoniae, via membrane pore formation. Both antimicrobials exhibit low in vitro and in vivo toxicity and mitigate the onset of drug resistance. The proposed approach thus presents a viable path for faster and more efficient discovery of potent and selective broad-spectrum antimicrobials.
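To make the attribute-controlled sampling idea concrete, here is a minimal sketch of classifier-guided rejection sampling in an autoencoder latent space: draw latents from the prior, keep only those that the attribute classifiers score highly, and decode the survivors. The `Decoder` and `AttributeClassifier` modules, dimensions, and threshold below are hypothetical illustrations, not the paper's actual CLaSS implementation.

```python
import torch
import torch.nn as nn

LATENT_DIM = 64

class Decoder(nn.Module):
    """Maps a latent vector to per-position residue logits (toy stand-in)."""
    def __init__(self, latent_dim=LATENT_DIM, seq_len=25, vocab_size=21):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 256), nn.ReLU(),
            nn.Linear(256, seq_len * vocab_size),
        )
        self.seq_len, self.vocab_size = seq_len, vocab_size

    def forward(self, z):
        return self.net(z).view(-1, self.seq_len, self.vocab_size)

class AttributeClassifier(nn.Module):
    """Predicts P(attribute | z), e.g. antimicrobial activity or low toxicity."""
    def __init__(self, latent_dim=LATENT_DIM):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, z):
        return torch.sigmoid(self.net(z)).squeeze(-1)

def sample_with_attributes(decoder, classifiers, n_samples=1000, threshold=0.5):
    """Rejection sampling: keep latents whose predicted attribute
    probabilities all exceed the threshold, then decode them."""
    z = torch.randn(n_samples, LATENT_DIM)          # sample from the prior
    keep = torch.ones(n_samples, dtype=torch.bool)
    for clf in classifiers:
        keep &= clf(z) > threshold                  # enforce each attribute
    logits = decoder(z[keep])
    return logits.argmax(dim=-1)                    # token ids of accepted candidates

decoder = Decoder()
classifiers = [AttributeClassifier(), AttributeClassifier()]  # e.g. potency, non-toxicity
candidates = sample_with_attributes(decoder, classifiers)
print(candidates.shape)
```

In practice the classifiers would be trained on latent codes of labeled molecules and the decoder would be the trained autoencoder's decoder; this sketch only illustrates the control-by-rejection pattern.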
- Tom Sercu, Christian Puhrsch, Brian Kingsbury, Yann LeCun. Very deep multilingual convolutional neural networks for LVCSR. Proc. ICASSP, 2016 [arXiv]

  Convolutional neural networks (CNNs) are a standard component of many current state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems. However, CNNs in LVCSR have not kept pace with recent advances in other domains where deeper neural networks provide superior performance. In this paper we propose a number of architectural advances in CNNs for LVCSR. First, we introduce a very deep convolutional network architecture with up to 14 weight layers. There are multiple convolutional layers before each pooling layer, with small 3×3 kernels, inspired by the VGG ImageNet 2014 architecture. Then, we introduce multilingual CNNs with multiple untied layers. Finally, we introduce multi-scale input features aimed at exploiting more context at negligible computational cost. We evaluate the improvements first on a Babel task for low resource speech recognition, obtaining an absolute 5.77% WER improvement over the baseline PLP DNN by training our CNN on the combined data of six different languages. We then evaluate the very deep CNNs on the Hub5'00 benchmark (using the 262 hours of SWB-1 training data), achieving a word error rate of 11.8% after cross-entropy training, a 1.4% WER improvement (10.6% relative) over the best published CNN result so far.
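The architectural pattern described above (several 3×3 convolutions before each pooling layer, applied to a window of acoustic features and classifying per-frame HMM states) can be sketched as follows. The layer counts, feature dimensions, and number of output states here are arbitrary placeholders, not the 14-weight-layer configuration evaluated in the paper.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, n_convs):
    """Stack of 3x3 convolutions followed by a single 2x2 max-pool (VGG-style)."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2))
    return nn.Sequential(*layers)

class VGGAcousticModel(nn.Module):
    """Toy VGG-style acoustic model over a window of log-mel filterbank frames."""
    def __init__(self, n_mels=40, context_frames=16, n_senones=9000):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 64, n_convs=2),     # two 3x3 convs, then pool
            conv_block(64, 128, n_convs=2),
            conv_block(128, 256, n_convs=3),
        )
        feat_dim = 256 * (n_mels // 8) * (context_frames // 8)
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 2048), nn.ReLU(inplace=True),
            nn.Linear(2048, n_senones),       # per-frame HMM state (senone) posteriors
        )

    def forward(self, x):                     # x: (batch, 1, n_mels, context_frames)
        h = self.features(x).flatten(1)
        return self.classifier(h)

model = VGGAcousticModel()
dummy = torch.randn(8, 1, 40, 16)             # batch of 16-frame log-mel windows
print(model(dummy).shape)                      # -> torch.Size([8, 9000])
```

The key difference from earlier LVCSR CNNs is the depth: stacking several small-kernel convolutions between pooling layers rather than using one large-kernel layer per stage.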