I am a Research Engineer at Facebook AI Research in NYC. Before that I was at IBM Research AI, working at the T.J. Watson Research Center. I completed the MS in Data Science at New York University’s Courant Institute of Mathematical Sciences in May 2015, and obtained a B.Sc. (2011) and M.Sc. (2013) in Engineering Physics from Ghent University.
My research interests include unsupervised and semi-supervised learning with either no or very small amounts of labeled data, multimodal learning (i.e. learning representations across different data modalities like images, text, and speech), and learning generative models of structured data. I also worked on deep learning approaches to acoustic modeling in speech recognition, bringing advances from the deep learning and computer vision communities to speech recognition. Most recently I worked on Generative Adversarial Networks (GANs), specifically on finding a better distance metric between the data distribution and the generated distribution, which leads to fast and stable training.
For an always-up-to-date list, see Google Scholar.
Tom Sercu, Youssef Mroueh. Semi-Supervised Learning with IPM-based GANs: an Empirical Study. NIPS Workshop: Deep Learning: Bridging Theory and Practice, 2017 [arXiv]
We present an empirical investigation of a recent class of Generative Adversarial Networks (GANs) using Integral Probability Metrics (IPM) and their performance for semi-supervised learning. IPM-based GANs like Wasserstein GAN, Fisher GAN and Sobolev GAN have desirable properties in terms of theoretical understanding, training stability, and a meaningful loss. In this work we investigate how the design of the critic (or discriminator) influences the performance in semi-supervised learning. We distill three key take-aways which are important for good SSL performance: (1) the K+1 formulation, (2) avoiding batch normalization in the critic and (3) avoiding gradient penalty constraints on the classification layer.
Youssef Mroueh, Tom Sercu. Fisher GAN. NIPS, 2017 [arXiv]
Generative Adversarial Networks (GANs) are powerful models for learning complex distributions. Stable training of GANs has been addressed in many recent works which explore different metrics between distributions. In this paper we introduce Fisher GAN which fits within the Integral Probability Metrics (IPM) framework for training GANs. Fisher GAN defines a critic with a data dependent constraint on its second order moments. We show in this paper that Fisher GAN allows for stable and time efficient training that does not compromise the capacity of the critic, and does not need data independent constraints such as weight clipping. We analyze our Fisher IPM theoretically and provide an algorithm based on Augmented Lagrangian for Fisher GAN. We validate our claims on both image sample generation and semi-supervised classification using Fisher GAN.
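As a rough illustration of the critic objective described in this abstract, the sketch below evaluates an augmented-Lagrangian objective on a batch of critic outputs: the difference of means between real and generated samples, with a constraint that pushes the averaged second-order moment toward 1. The function name, argument names, and the example values are hypothetical, and this is a minimal sketch of the idea rather than the paper's reference implementation.

```python
from statistics import fmean


def fisher_critic_objective(f_real, f_fake, lam, rho):
    """Augmented-Lagrangian critic objective on one batch (sketch).

    f_real, f_fake: critic outputs on real and generated samples.
    lam: Lagrange multiplier; rho: quadratic penalty weight.
    All names here are illustrative assumptions.
    """
    # Mean difference between critic outputs on real and generated data.
    mean_diff = fmean(f_real) - fmean(f_fake)
    # Data-dependent second-order moment, averaged over both distributions.
    second_moment = (0.5 * fmean([v * v for v in f_real])
                     + 0.5 * fmean([v * v for v in f_fake]))
    # The constraint asks the second-order moment to equal 1.
    violation = 1.0 - second_moment
    objective = mean_diff + lam * violation - 0.5 * rho * violation ** 2
    return objective, mean_diff, second_moment


# When the constraint is satisfied, the objective reduces to the mean difference.
obj, diff, moment = fisher_critic_objective([1.0, 1.0], [-1.0, -1.0], lam=0.5, rho=1.0)
```

Because the constraint depends only on the critic's outputs on data, no weight clipping is involved in this sketch.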
Tom Sercu, Vaibhava Goel. Dense Prediction on Sequences with Time-Dilated Convolutions for Speech Recognition. NIPS End-to-end Learning for Speech and Audio Processing Workshop, 2016 [arXiv]
In computer vision, pixelwise dense prediction is the task of predicting a label for each pixel in the image. Convolutional neural networks achieve good performance on this task, while being computationally efficient. In this paper we carry these ideas over to the problem of assigning a sequence of labels to a set of speech frames, a task commonly known as framewise classification. We show that the dense prediction view of framewise classification offers several advantages and insights, including computational efficiency and the ability to apply batch normalization. When doing dense prediction we pay specific attention to strided pooling in time and introduce an asymmetric dilated convolution, called time-dilated convolution, that allows for efficient and elegant implementation of pooling in time. We show results using time-dilated convolutions in a very deep VGG-style CNN with batch normalization on the Hub5 Switchboard-2000 benchmark task. With a big n-gram language model, we achieve 7.7% WER, which is the best single-model, single-pass performance reported so far.
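To make the idea of dilation along the time axis concrete, here is a minimal pure-Python sketch of a 1-D dilated convolution over a sequence of frames. It only shows how a filter can read frames spaced `dilation` steps apart, so that every frame still gets an output; it is not the layer from the paper, and the function name and toy inputs are assumptions.

```python
def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1-D cross-correlation along time with a dilated filter (sketch).

    x: list of per-frame values; w: filter taps.
    Output t combines x[t], x[t + dilation], x[t + 2 * dilation], ...
    """
    # Number of extra frames the dilated filter reaches beyond frame t.
    span = (len(w) - 1) * dilation
    return [
        sum(w[k] * x[t + k * dilation] for k in range(len(w)))
        for t in range(len(x) - span)
    ]


frames = [1, 2, 3, 4, 5, 6]
# With dilation=2 the two taps see frames two steps apart.
out = dilated_conv1d(frames, [1, -1], dilation=2)
```

With `dilation=1` this reduces to an ordinary convolution over neighboring frames; larger dilations widen the temporal context without skipping any output frame.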
Tom Sercu, Christian Puhrsch, Brian Kingsbury, Yann LeCun. Very deep multilingual convolutional neural networks for LVCSR. Proc ICASSP, 2015 [arXiv]
Convolutional neural networks (CNNs) are a standard component of many current state-of-the-art Large Vocabulary Continuous Speech Recognition (LVCSR) systems. However, CNNs in LVCSR have not kept pace with recent advances in other domains where deeper neural networks provide superior performance. In this paper we propose a number of architectural advances in CNNs for LVCSR. First, we introduce a very deep convolutional network architecture with up to 14 weight layers. There are multiple convolutional layers before each pooling layer, with small 3x3 kernels, inspired by the VGG Imagenet 2014 architecture. Then, we introduce multilingual CNNs with multiple untied layers. Finally, we introduce multi-scale input features aimed at exploiting more context at negligible computational cost. We evaluate the improvements first on a Babel task for low resource speech recognition, obtaining an absolute 5.77% WER improvement over the baseline PLP DNN by training our CNN on the combined data of six different languages. We then evaluate the very deep CNNs on the Hub5'00 benchmark (using the 262 hours of SWB-1 training data) achieving a word error rate of 11.8% after cross-entropy training, a 1.4% WER improvement (10.6% relative) over the best published CNN result so far.
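The relation between the absolute and relative WER improvements quoted in this abstract can be checked with a few lines of arithmetic. The helper name below is hypothetical; the numbers are the ones stated in the abstract.

```python
def relative_improvement(new_wer, absolute_gain):
    """Relative improvement given the new WER and the absolute gain (both in %)."""
    # The previous best result is the new WER plus the absolute gain.
    baseline = new_wer + absolute_gain
    return absolute_gain / baseline


# 11.8% WER with a 1.4% absolute gain implies a 13.2% baseline.
rel = relative_improvement(11.8, 1.4)
rel_percent = round(100 * rel, 1)
```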
I started training Judo / Brazilian Jiu-Jitsu about a year ago. Different styles of sport martial arts (combat sports) generalize better or worse to real self-defense situations, with an interesting and counterintuitive relation to the constraints on the sport: more constrained grappling sports like BJJ can generalize better. I will call this the Bias-Trainability tradeoff, in analogy to the Bias-Variance tradeoff in supervised machine learning. This tradeoff can be relevant to Reinforcement Learning (RL): athletes are RL agents whose optimization is so good it’s probably not the bottleneck. So we can look at martial arts to learn how changing the rules/environment/reward influences the optimal policy.
Again a post mostly for my own memory: how I set up my website with Jekyll, GitHub Pages, a custom domain, and automatic publication listing.
I just installed the i3 window manager on Ubuntu. This post serves as a record of what I did and learned.
Generative Adversarial Networks – hands-on tutorial in PyTorch. NYC AI & ML meetup. [Slides]
GitHub repository with notebook. This talk is a hands-on live coding tutorial. We will implement a Generative Adversarial Network (GAN) to learn to generate small images. We will assume only a superficial familiarity with deep learning and a notion of PyTorch. This tutorial is as self-contained as possible. The goal is for this tutorial to serve as an introduction to both PyTorch and GANs.
Guest Lecture: Deep Learning. Intro to Data Science, Fall 2018 @ CCNY. [Slides]
Guest Lecture: Intro to Deep Learning. The class was a 1-hour lecture and 1-hour lab for undergrad students in Computer Science at CCNY (a college of CUNY).
- Lecture slides: preface, slides, NN figure
- Lab materials: https://github.com/grantmlong/itds2018/tree/master/lecture-13
Machine Learning: successes, promises and limits. AI Academy Seminars (Howest and Voka) at Kortrijk, Belgium. [Slides]
Seminar at AI Academy, organized by Howest and Voka, Flanders, Belgium. This seminar presented my view of AI / machine learning / deep learning for a non-technical audience of business leaders in Kortrijk, Belgium. It’s high-level and accessible to a wide audience.
From my time at NYU
While at NYU, I worked as a Teaching Assistant for two courses, making the following lab material: