I am a Research Engineer and Tech Lead Manager for the protein team at Facebook AI Research in NYC. On the protein team, we are working to make an impact on structural biology using state-of-the-art methods from artificial intelligence.
My past research has covered several areas of machine learning / deep learning, including learning representations and generative models for proteins and peptides, and unsupervised and semi-supervised learning with no or very small amounts of labeled data. I worked on Generative Adversarial Networks (GANs), specifically on finding a better distance metric between the data distribution and the generated distribution, which leads to fast and stable training. I also worked on multimodal learning: learning representations across different data modalities like images, text, and speech. I started my research career working on deep learning approaches to acoustic modeling in speech recognition, bringing advances from the deep learning and computer vision communities to speech recognition.
Before Facebook, I was at IBM Research AI, at the T.J. Watson Research Center. I hold an MS in Data Science from New York University’s Courant Institute of Mathematical Sciences and a B.Sc./M.Sc. in Engineering Physics from Ghent University.
For an always up-to-date list, see Google Scholar.
Sercu, Gehrmann, Strobelt, Das, Padhi, Dos Santos, Wadhawan, Chenthamarakshan. Interactive Visual Exploration of Latent Space (IVELS) for peptide auto-encoder model selection. ICLR workshop: Deep Generative Models for Highly Structured Data, 2019
Youssef Mroueh, Tom Sercu, Anant Raj. Sobolev Descent. AISTATS, 2019 [arXiv]
We study a simplification of GAN training: the problem of transporting particles from a source to a target distribution. Starting from the Sobolev GAN critic, part of the gradient-regularized GAN family, we show a strong relation with Optimal Transport (OT), specifically with the less popular dynamic formulation of OT, which finds a path of distributions from source to target minimizing a "kinetic energy". We introduce Sobolev descent, which constructs similar paths by following gradient flows of a critic function, either in a kernel space or parametrized by a neural network. In the kernel version, we show convergence to the target distribution in the MMD sense. We show in theory and experiments that regularization plays an important role in favoring smooth transitions between distributions, avoiding large gradients from the critic. This analysis in a simplified particle setting provides insight into paths to equilibrium in GANs.
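To make the particle-transport setting concrete, here is a minimal sketch of a kernel particle flow in NumPy. It is deliberately simplified and is not Sobolev descent itself: it uses the plain, unregularized MMD witness function with an RBF kernel (an assumption swapped in for illustration), whereas Sobolev descent adds a gradient regularization to the critic, which is exactly the ingredient the abstract credits with smooth transitions. The function names, `sigma`, `lr`, and `steps` are hypothetical choices.

```python
import numpy as np


def rbf_kernel_grad(x, y, sigma):
    """Gradient w.r.t. x of k(x, y) = exp(-||x - y||^2 / (2 sigma^2)).

    x: (n, d), y: (m, d). Returns an (n, m, d) array.
    """
    diff = x[:, None, :] - y[None, :, :]
    k = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    return -diff / sigma ** 2 * k[..., None]


def witness_grad(x, particles, target, sigma):
    """Gradient of the unregularized kernel witness (NOT the Sobolev critic):
    f(x) = mean_j k(x, target_j) - mean_i k(x, particles_i)."""
    return (rbf_kernel_grad(x, target, sigma).mean(axis=1)
            - rbf_kernel_grad(x, particles, sigma).mean(axis=1))


def particle_flow(source, target, steps=1000, lr=0.5, sigma=1.0):
    """Move particles along the witness gradient, from source toward target."""
    x = source.astype(float).copy()
    for _ in range(steps):
        # The critic is re-estimated from the current particles at every step.
        x = x + lr * witness_grad(x, x, target, sigma)
    return x


rng = np.random.default_rng(0)
src = rng.normal(0.0, 0.5, size=(200, 1))
tgt = rng.normal(2.0, 0.5, size=(200, 1))
out = particle_flow(src, tgt)
```

The attraction term pulls particles toward target samples, while the repulsion term between particles keeps them from collapsing onto a single point. Without the regularization studied in the paper, particles that lag far behind can receive only very weak gradients.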
Tom Sercu, Youssef Mroueh. Semi-Supervised Learning with IPM-based GANs: an Empirical Study. NIPS Workshop: Deep Learning: Bridging Theory and Practice, 2017 [arXiv]
We present an empirical investigation of a recent class of Generative Adversarial Networks (GANs) that use Integral Probability Metrics (IPMs), and of their performance for semi-supervised learning (SSL). IPM-based GANs like Wasserstein GAN, Fisher GAN and Sobolev GAN have desirable properties in terms of theoretical understanding, training stability, and a meaningful loss. In this work we investigate how the design of the critic (or discriminator) influences performance in semi-supervised learning. We distill three key takeaways that are important for good SSL performance: (1) the K+1 formulation, (2) avoiding batch normalization in the critic, and (3) avoiding gradient penalty constraints on the classification layer.
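As an illustration of what a "K+1 formulation" means, here is a minimal NumPy sketch of the generic K+1 classifier losses popularized by Salimans et al. (2016): the critic outputs K real-class logits plus one extra "fake" logit. This is an assumption-laden sketch, not the paper's code; the IPM-based critics studied here may combine these terms with the IPM objective differently. The function names are hypothetical.

```python
import numpy as np


def logsumexp(z, axis=1):
    """Numerically stable log(sum(exp(z))) along `axis`."""
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis)


def k_plus_one_losses(logits_lab, labels, logits_unl, logits_fake):
    """Critic-side losses for a K+1 classifier; column K is the 'fake' class.

    logits_*: arrays of shape (batch, K + 1); labels: ints in [0, K).
    """
    K = logits_lab.shape[1] - 1
    # Labeled real data: cross-entropy against the true class.
    log_p_lab = logits_lab - logsumexp(logits_lab)[:, None]
    sup = -log_p_lab[np.arange(len(labels)), labels].mean()
    # Unlabeled real data: maximize log(1 - p(fake)) = log sum_{k<K} p(k).
    unsup_real = -(logsumexp(logits_unl[:, :K]) - logsumexp(logits_unl)).mean()
    # Generated data: maximize log p(fake).
    unsup_fake = -(logits_fake[:, K] - logsumexp(logits_fake)).mean()
    return sup, unsup_real, unsup_fake


z = np.zeros((4, 3))  # K = 2 real classes + 1 fake class, uniform logits
sup, unsup_real, unsup_fake = k_plus_one_losses(z, np.array([0, 1, 0, 1]), z, z)
```

With uniform logits over three classes, each term reduces to a simple log-ratio, which makes the sketch easy to sanity-check.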
Tom Sercu, Vaibhava Goel. Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition. NIPS End-to-end Learning for Speech and Audio Processing Workshop, 2016 [arXiv]
In computer vision, pixelwise dense prediction is the task of predicting a label for each pixel in the image. Convolutional neural networks achieve good performance on this task while being computationally efficient. In this paper we carry these ideas over to the problem of assigning a sequence of labels to a set of speech frames, a task commonly known as framewise classification. We show that the dense prediction view of framewise classification offers several advantages and insights, including computational efficiency and the ability to apply batch normalization. When doing dense prediction, we pay specific attention to strided pooling in time and introduce an asymmetric dilated convolution, called time-dilated convolution, that allows for an efficient and elegant implementation of pooling in time. We show results using time-dilated convolutions in a very deep VGG-style CNN with batch normalization on the Hub5 Switchboard-2000 benchmark task. With a big n-gram language model, we achieve 7.7% WER, which is the best single-model, single-pass performance reported so far.
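The core idea behind replacing strided pooling in time with dilation can be checked in a few lines. The sketch below is a simplified 1D illustration under stated assumptions: only the time axis is modeled (the paper's convolution is asymmetric, dilating time while frequency is still pooled normally), pooling is window-2 max pooling, and the helper names are hypothetical. Pooling with stride 1 and then dilating the next convolution by 2 produces every output of the strided network, plus the in-between frames needed for dense prediction.

```python
import numpy as np


def max_pool_time(x, stride):
    """Window-2 max pooling along time with the given stride."""
    return np.maximum(x[:-1], x[1:])[::stride]


def conv_time(x, w, dilation=1):
    """'Valid' 1D cross-correlation along time: y[t] = sum_k w[k] * x[t + k*dilation]."""
    span = dilation * (len(w) - 1)
    n_out = len(x) - span
    return sum(w[k] * x[k * dilation: k * dilation + n_out] for k in range(len(w)))


def strided_path(x, w):
    # Classic CNN: pool with stride 2, then an ordinary convolution.
    return conv_time(max_pool_time(x, stride=2), w)


def dense_path(x, w):
    # Dense prediction: pool with stride 1, then dilate the next conv by 2.
    return conv_time(max_pool_time(x, stride=1), w, dilation=2)


x = np.random.default_rng(0).standard_normal(20)
w = np.array([0.5, -1.0, 2.0])
```

Taking every second frame of `dense_path(x, w)` recovers `strided_path(x, w)` exactly, so no frame-level information is lost while the output keeps the full time resolution.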
Tom Sercu, Christian Puhrsch, Brian Kingsbury, Yann LeCun. Very deep multilingual convolutional neural networks for LVCSR. Proc ICASSP, 2015 [arXiv]
Convolutional neural networks (CNNs) are a standard component of many current state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems. However, CNNs in LVCSR have not kept pace with recent advances in other domains where deeper neural networks provide superior performance. In this paper we propose a number of architectural advances in CNNs for LVCSR. First, we introduce a very deep convolutional network architecture with up to 14 weight layers. There are multiple convolutional layers before each pooling layer, with small 3x3 kernels, inspired by the VGG ImageNet 2014 architecture. Then, we introduce multilingual CNNs with multiple untied layers. Finally, we introduce multi-scale input features aimed at exploiting more context at negligible computational cost. We evaluate the improvements first on a Babel task for low-resource speech recognition, obtaining an absolute 5.77% WER improvement over the baseline PLP DNN by training our CNN on the combined data of six different languages. We then evaluate the very deep CNNs on the Hub5'00 benchmark (using the 262 hours of SWB-1 training data), achieving a word error rate of 11.8% after cross-entropy training, a 1.4% WER improvement (10.6% relative) over the best published CNN result so far.
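The reason for stacking several small 3x3 kernels before each pooling layer, as in VGG, can be seen from standard receptive-field arithmetic along one axis. The helper below is a hypothetical sketch, not code from the paper: each layer is a `(kernel_size, stride)` pair, and the receptive field grows by `(kernel_size - 1)` times the product of all earlier strides.

```python
def receptive_field(layers):
    """Receptive field (in input frames or bins) of a stack of conv/pool layers.

    layers: list of (kernel_size, stride) pairs along a single axis.
    """
    rf, jump = 1, 1  # jump = distance in input units between adjacent outputs
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
    return rf


two_3x3 = receptive_field([(3, 1), (3, 1)])
one_5x5 = receptive_field([(5, 1)])
block = receptive_field([(3, 1), (3, 1), (2, 2), (3, 1)])  # conv, conv, pool, conv
```

Two stacked 3x3 layers see the same 5x5 region as a single 5x5 kernel, but with 18 instead of 25 weights per input/output channel pair (ignoring biases) and an extra nonlinearity in between.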
I started training Judo / Brazilian Jiu-Jitsu about a year ago. Different styles of sport martial arts (combat sports) generalize better or worse to real self-defense situations, with an interesting and counterintuitive relation to the constraints on the sport: more constrained grappling sports like BJJ can generalize better. I will call this the Bias-Trainability tradeoff, in analogy to the Bias-Variance tradeoff in supervised machine learning. This tradeoff can be relevant to Reinforcement Learning (RL): athletes are RL agents whose optimization is so good that it is probably not the bottleneck. So we can look at martial arts to learn how changing the rules, environment, or reward influences the optimal policy.
Another post mostly for my own memory: how I set up my website with Jekyll, GitHub Pages, a custom domain, and automatic publication listing.
I just installed the i3 window manager on Ubuntu. This post serves as a record of what I did and learned.
Generative Adversarial Networks – hands-on tutorial in PyTorch. NYC AI & ML meetup. [Slides]
GitHub repository with notebook. This talk is a hands-on live-coding tutorial. We will implement a Generative Adversarial Network (GAN) that learns to generate small images. We assume only a superficial familiarity with deep learning and a basic notion of PyTorch. The tutorial is as self-contained as possible, so that it can serve as an introduction to PyTorch and to GANs at the same time.
Guest Lecture: Deep Learning. Intro to Data Science, Fall 2018 @ CCNY. [Slides]
Guest Lecture: intro to Deep Learning. The class consisted of a 1-hour lecture and a 1-hour lab for undergraduate Computer Science students at CCNY (a college of CUNY).
- Lecture: slides preface, slides, NN figure
- Lab materials: https://github.com/grantmlong/itds2018/tree/master/lecture-13
Machine Learning: successes, promises and limits. AI Academy Seminars (Howest and Voka) at Kortrijk, Belgium. [Slides]
Seminar at the AI Academy, organized by Howest and Voka in Flanders, Belgium. This seminar presented my view of AI / machine learning / deep learning to a non-technical audience of business leaders in Kortrijk. It’s high-level and accessible to a wide audience.
From my time at NYU
While at NYU, I worked as a Teaching Assistant for two courses and created the following lab materials: