publications
Conference and workshop papers listed in reverse chronological order. * denotes equal contribution.
2024
- [ICML] Out of the Ordinary: Robust Regression by Spectral Adaptation. Benjamin Eyre, Elliot Creager, David Madras, Vardan Papyan, and Richard Zemel. In Proceedings of the 41st International Conference on Machine Learning, Jul 2024.
Designing deep neural network classifiers that perform robustly on distributions differing from the available training data is an active area of machine learning research. However, out-of-distribution generalization for regression, the analogous problem for modeling continuous targets, remains relatively unexplored. To tackle this problem, we return to first principles and analyze how the closed-form solution for Ordinary Least Squares (OLS) regression is sensitive to covariate shift. We characterize the out-of-distribution risk of the OLS model in terms of the eigenspectrum decomposition of the source and target data. We then use this insight to propose a method for adapting the weights of the last layer of a pre-trained neural regression model to perform better on input data originating from a different distribution. We demonstrate how this lightweight spectral adaptation procedure can improve out-of-distribution performance for synthetic and real-world datasets.
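The covariate-shift sensitivity that the abstract analyzes is easy to reproduce numerically. Below is a minimal numpy sketch (an illustration of the failure mode only; it does not implement the paper's spectral adaptation method): a direction that is barely excited in the source data dominates the target data, and the target risk of the closed-form OLS fit inflates accordingly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 5, 2000
w_true = rng.normal(size=d)

# Source covariates: one direction has very low variance during training.
scales_src = np.array([1.0, 1.0, 1.0, 1.0, 0.01])
X_src = rng.normal(size=(n, d)) * scales_src
y_src = X_src @ w_true + 0.1 * rng.normal(size=n)

# Closed-form OLS fit on the source distribution: w = (X^T X)^{-1} X^T y.
w_ols = np.linalg.solve(X_src.T @ X_src, X_src.T @ y_src)

# Target covariates: the previously low-variance direction now dominates.
scales_tgt = np.array([1.0, 1.0, 1.0, 1.0, 5.0])
X_tgt = rng.normal(size=(n, d)) * scales_tgt
y_tgt = X_tgt @ w_true + 0.1 * rng.normal(size=n)

# Risk inflates along eigendirections poorly covered at training time.
print("source covariance eigenvalues:", np.linalg.eigvalsh(X_src.T @ X_src / n))
print("target MSE:", np.mean((X_tgt @ w_ols - y_tgt) ** 2))
```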
- [ICML] Remembering to Be Fair: Non-Markovian Fairness in Sequential Decision Making. Parand A. Alamdari, Toryn Q. Klassen, Elliot Creager, and Sheila A. McIlraith. In Proceedings of the 41st International Conference on Machine Learning, Jul 2024.
Fair decision making has largely been studied with respect to a single decision. Here we investigate the notion of fairness in the context of sequential decision making where multiple stakeholders can be affected by the outcomes of decisions. We observe that fairness often depends on the history of the sequential decision-making process, and in this sense it is inherently non-Markovian. We further observe that fairness often needs to be assessed at time points *within* the process, not just at the end of the process. To advance our understanding of this class of fairness problems, we explore the notion of non-Markovian fairness in the context of sequential decision making. We identify properties of non-Markovian fairness, including notions of long-term, anytime, periodic, and bounded fairness. We explore the interplay between non-Markovian fairness and memory and how memory can support construction of fair policies. Finally, we introduce the FairQCM algorithm, which can automatically augment its training data to improve sample efficiency in the synthesis of fair policies via reinforcement learning.
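To make the history-dependence concrete, here is a toy sketch of our own construction (not the paper's FairQCM algorithm): a fair allocation policy must remember past allocations, so no memoryless policy over the instantaneous state can implement it.

```python
# Toy non-Markovian fair policy: the decision depends on the history of past
# allocations, summarized here by running counts (the policy's "memory").
counts = {"alice": 0, "bob": 0, "carol": 0}

def fair_allocate(counts):
    """Allocate the next resource to the stakeholder served least so far."""
    return min(counts, key=counts.get)

for t in range(9):
    counts[fair_allocate(counts)] += 1
print(counts)  # balanced at every intermediate time point: anytime fairness
```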
2023
- [Thesis] Robust Machine Learning by Transforming and Augmenting Imperfect Training Data. Elliot Creager. Jul 2023.
Machine Learning (ML) is an expressive framework for turning data into computer programs. Across many problem domains – both in industry and policy settings – the types of computer programs needed for accurate prediction or optimal control are difficult to write by hand. On the other hand, collecting instances of desired system behavior may be relatively more feasible. This makes ML broadly appealing, but also induces data sensitivities that often manifest as unexpected failure modes during deployment. In this sense, the training data available tend to be imperfect for the task at hand. This thesis explores several data sensitivities of modern machine learning and how to address them. We begin by discussing how to prevent ML from codifying prior human discrimination measured in the training data, where we take a fair representation learning approach. We then discuss the problem of learning from data containing spurious features, which provide predictive fidelity during training but are unreliable upon deployment. Here we observe that insofar as standard training methods tend to learn such features, this propensity can be leveraged to search for partitions of training data that expose this inconsistency, ultimately promoting learning algorithms invariant to spurious features. Finally, we turn our attention to reinforcement learning from data with insufficient coverage over all possible states and actions. To address the coverage issue, we discuss how causal priors can be used to model the single-step dynamics of the setting where data are collected. This enables a new type of data augmentation where observed trajectories are stitched together to produce new but plausible counterfactual trajectories.
- [ICCV] SURFSUP: Learning Fluid Simulation for Novel Surfaces. Arjun Mani, Ishaan Preetam Chandratreya, Elliot Creager, Carl Vondrick, and Richard Zemel. In International Conference on Computer Vision, Oct 2023.
Modeling the mechanics of fluid in complex scenes is vital to applications in design, graphics, and robotics. Learning-based methods provide fast and differentiable fluid simulators; however, most prior work is unable to accurately model how fluids interact with genuinely novel surfaces not seen during training. We introduce SURFSUP, a framework that represents objects implicitly using signed distance functions (SDFs), rather than an explicit representation of meshes or particles. This continuous representation of geometry enables more accurate simulation of fluid-object interactions over long time periods while simultaneously making computation more efficient. Moreover, SURFSUP trained on simple shape primitives generalizes considerably out-of-distribution, even to complex real-world scenes and objects. Finally, we show we can invert our model to design simple objects to manipulate fluid flow.
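As a concrete illustration of the implicit geometry at the core of the approach, the sketch below defines an analytic signed distance function and recovers surface normals from its gradient by finite differences; the learned simulator itself is not reproduced here.

```python
import numpy as np

def sdf_sphere(p, center, radius):
    """Signed distance from points p (N, 3) to a sphere: negative inside."""
    return np.linalg.norm(p - center, axis=-1) - radius

def sdf_normal(sdf, p, eps=1e-4):
    """Finite-difference surface normal: the (normalized) SDF gradient."""
    grad = np.stack([
        (sdf(p + eps * np.eye(3)[i]) - sdf(p - eps * np.eye(3)[i])) / (2 * eps)
        for i in range(3)
    ], axis=-1)
    return grad / np.linalg.norm(grad, axis=-1, keepdims=True)

sphere = lambda p: sdf_sphere(p, np.zeros(3), 1.0)
pts = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
print(sphere(pts))            # [-1.  1.]: inside vs. outside the surface
print(sdf_normal(sphere, pts[1:]))  # ~[1, 0, 0], pointing away from the center
```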
2022
- [NeurIPS] MoCoDA: Model-based Counterfactual Data Augmentation. Silviu Pitis, Elliot Creager, Ajay Mandlekar, and Animesh Garg. In Advances in Neural Information Processing Systems, Nov 2022.
The number of states in a dynamic process is exponential in the number of objects, making reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to the real world, they will need to react to and reason about unseen combinations of objects. We argue that the ability to recognize and use local factorization in transition dynamics is a key element in unlocking the power of multi-object reasoning. To this end, we show that (1) known local structure in the environment transitions is sufficient for an exponential reduction in the sample complexity of training a dynamics model, and (2) a locally factored dynamics model provably generalizes out-of-distribution to unseen states and actions. Knowing the local structure also allows us to predict which unseen states and actions this dynamics model will generalize to. We propose to leverage these observations in a novel Model-based Counterfactual Data Augmentation (MoCoDA) framework. MoCoDA applies a learned locally factored dynamics model to an augmented distribution of states and actions to generate counterfactual transitions for RL. MoCoDA works with a broader set of local structures than prior work and allows for direct control over the augmented training distribution. We show that MoCoDA enables RL agents to learn policies that generalize to unseen states and actions. We use MoCoDA to train an offline RL agent to solve an out-of-distribution robotics manipulation task on which standard offline RL algorithms fail.
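A stylized version of the idea, with an assumed two-object toy domain and the true local mechanism standing in for a learned factored dynamics model: the augmented state-action distribution recombines per-object data, and the factored model labels the resulting unseen joint configurations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy domain: two objects that never interact, so the dynamics factor per
# object. Observed data covers only a narrow joint region (s1 ~ s2).
s1 = rng.uniform(0, 1, 500)
s2 = s1 + rng.normal(0, 0.01, 500)        # states seen only near the diagonal
S = np.stack([s1, s2], axis=1)
A = rng.uniform(-0.1, 0.1, (500, 2))

def factored_model(S, A):
    """Stand-in for a learned locally factored model: each object's next
    state depends only on its own (state, action) component."""
    return S + A

# Augmented distribution: pair object 1's data with a permutation of object
# 2's, yielding joint (s, a) pairs far off the diagonal, i.e. never observed
# together. The factored model generates their counterfactual next states.
perm = rng.permutation(len(S))
S_aug = np.stack([S[:, 0], S[perm, 1]], axis=1)
A_aug = np.stack([A[:, 0], A[perm, 1]], axis=1)
S_next_aug = factored_model(S_aug, A_aug)  # counterfactual transitions for RL
```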
- [ICML Workshops] Towards Environment-Invariant Representation Learning for Robust Task Transfer. Benjamin Eyre, Richard Zemel, and Elliot Creager. In ICML Workshop on Spurious Correlation, Invariance, and Stability, Jul 2022.
To train a classification model that is robust to distribution shifts upon deployment, auxiliary labels indicating the various “environments” of data collection can be leveraged to mitigate reliance on environment-specific features. In this paper we attempt to determine where in the network the environment invariance property can be located for such a model, with the hopes of adapting a single pre-trained invariant model for use in multiple tasks. We discuss how to evaluate whether a model has formed an environment-invariant internal representation—as opposed to an invariant final classifier function—and propose an objective that encourages learning such a representation. We also extend color-biased digit recognition to a transfer setting where the target task requires an invariant model, but lacks the environment labels needed to train an invariant model from scratch, thus motivating the transfer of an invariant representation trained on a source task with environment labels.
2021
- [ICML] Environment Inference for Invariant Learning. Elliot Creager, Joern-Henrik Jacobsen, and Richard Zemel. In Proceedings of the 38th International Conference on Machine Learning, Jul 2021.
Learning models that gracefully handle distribution shifts is central to research on domain generalization, robust optimization, and fairness. A promising formulation is domain-invariant learning, which identifies the key issue of learning which features are domain-specific versus domain-invariant. An important assumption in this area is that the training examples are partitioned into “domains” or “environments”. Our focus is on the more common setting where such partitions are not provided. We propose EIIL, a general framework for domain-invariant learning that incorporates Environment Inference to directly infer partitions that are maximally informative for downstream Invariant Learning. We show that EIIL outperforms invariant learning methods on the CMNIST benchmark without using environment labels, and significantly outperforms ERM on worst-group performance in the Waterbirds dataset. Finally, we establish connections between EIIL and algorithmic fairness, which enables EIIL to improve accuracy and calibration in a fair prediction problem.
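A compact PyTorch sketch in the spirit of EIIL's inference step (simplified, with a binary-classification setup and variable names of our choosing): given the logits of a fixed reference classifier, optimize soft per-example environment assignments to maximize an IRM-style invariance penalty.

```python
import torch
import torch.nn.functional as F

def weighted_irm_penalty(logits, y, w_env):
    """IRM penalty of a fixed classifier on a soft-weighted 'environment'."""
    scale = torch.tensor(1.0, requires_grad=True)
    losses = F.binary_cross_entropy_with_logits(logits * scale, y, reduction="none")
    loss = (w_env * losses).sum() / w_env.sum()
    grad = torch.autograd.grad(loss, scale, create_graph=True)[0]
    return grad ** 2

def infer_environments(logits, y, steps=500, lr=0.05):
    """Find a soft partition that maximizes the invariance penalty of a
    fixed reference classifier (a sketch of the environment-inference step)."""
    q = torch.randn(len(y), requires_grad=True)  # logit of P(env = 1 | example)
    opt = torch.optim.Adam([q], lr=lr)
    for _ in range(steps):
        p = torch.sigmoid(q)
        penalty = weighted_irm_penalty(logits, y, p) \
                + weighted_irm_penalty(logits, y, 1 - p)
        opt.zero_grad()
        (-penalty).backward()  # ascend the penalty
        opt.step()
    return torch.sigmoid(q).detach()  # soft environment assignments

# Usage with toy stand-ins for a reference model's (detached) outputs:
logits = torch.randn(256)
y = torch.randint(0, 2, (256,)).float()
envs = infer_environments(logits, y)
```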
- [ICML (Oral)] On Disentangled Representations Learned from Correlated Data. Frederik Träuble, Elliot Creager, Niki Kilbertus, Francesco Locatello, Andrea Dittadi, Anirudh Goyal, Bernhard Schölkopf, and Stefan Bauer. In Proceedings of the 38th International Conference on Machine Learning, Jul 2021.
The focus of disentanglement approaches has been on identifying independent factors of variation in data. However, the causal variables underlying real-world observations are often not statistically independent. In this work, we bridge the gap to real-world scenarios by analyzing the behavior of the most prominent disentanglement approaches on correlated data in a large-scale empirical study (including 4260 models). We show and quantify that systematically induced correlations in the dataset are being learned and reflected in the latent representations, which has implications for downstream applications of disentanglement such as fairness. We also demonstrate how to resolve these latent correlations, either using weak supervision during training or by post-hoc correcting a pre-trained model with a small number of labels.
- [ICML Workshops] Measuring User Recourse in a Dynamic Recommender System. Dilys Dickson and Elliot Creager. In ICML Workshop on Algorithmic Recourse, Jul 2021.
From online searches to suggested videos in social media, recommendation systems are heavily relied upon to mediate access to digital information. Concerns have been raised about these systems over their potential for feedback loops that can create unintended consequences such as echo chambers, filter bubbles, and polarization in the digital space. In this paper, we measure the effect of prolonged exposure to recommendation on the availability of diverse suggested content to the user. We use the definition of reachability (or user recourse) from Dean et al. (2020b): the proportion of unseen items that could be recommended to the user in the future, which can be approximated using knowledge of the embedding space geometry for linear recommenders. Whereas previous work assumed a static recommender, we study the case where the recommender can change over time, either by training for longer given a fixed dataset, or dynamically updating its training online through interactions with users. We find that dynamic changes to the recommender system do indeed affect the recourse available to users.
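The sketch below gives a crude Monte Carlo proxy for reachability in a linear recommender, of our own construction (not the geometric computation of Dean et al.): an item counts as reachable if some probed user embedding makes it the top-scoring item.

```python
import numpy as np

rng = np.random.default_rng(0)
n_items, d = 200, 8
V = rng.normal(size=(n_items, d))  # item embeddings of a linear recommender

# An item is "reachable" if some feasible user embedding makes it the
# argmax-scoring item. As a rough proxy, probe random user directions and
# record which items ever win the argmax.
probes = rng.normal(size=(20000, d))
winners = np.unique(np.argmax(probes @ V.T, axis=1))
print(f"reachable items (proxy): {len(winners)}/{n_items}")
```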
- [ICML Workshops] Online Algorithmic Recourse by Collective Action. Elliot Creager and Richard Zemel. In ICML Workshop on Algorithmic Recourse, Jul 2021.
Research on algorithmic recourse typically considers how an individual can reasonably change an unfavorable automated decision when interacting with a fixed decision-making system. This paper focuses instead on the online setting, where system parameters are updated dynamically according to interactions with data subjects. Beyond the typical individual-level recourse, the online setting opens up new ways for groups to shape system decisions by leveraging the parameter update rule. We show empirically that recourse can be improved when users coordinate by jointly computing their feature perturbations, underscoring the importance of collective action in mitigating adverse automated decisions.
2020
- [NeurIPS] Counterfactual Data Augmentation using Locally Factored Dynamics. Silviu Pitis, Elliot Creager, and Animesh Garg. In Advances in Neural Information Processing Systems, Dec 2020.
Many dynamic processes, including common scenarios in robotic control and reinforcement learning (RL), involve a set of interacting subprocesses. Though the subprocesses are not independent, their interactions are often sparse, and the dynamics at any given time step can often be decomposed into locally independent causal mechanisms. Such local causal structures can be leveraged to improve the sample efficiency of sequence prediction and off-policy reinforcement learning. We formalize this by introducing local causal models (LCMs), which are induced from a global causal model by conditioning on a subset of the state space. We propose an approach to inferring these structures given an object-oriented state representation, as well as a novel algorithm for Counterfactual Data Augmentation (CoDA). CoDA uses local structures and an experience replay to generate counterfactual experiences that are causally valid in the global model. We find that CoDA significantly improves the performance of RL agents in locally factored tasks, including the batch-constrained and goal-conditioned settings. Code available at https://github.com/spitis/mrl.
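The core counterfactual swap is simple to state in code. A minimal sketch, assuming a known local mask over two non-interacting components (a learned LCM would determine when such swaps are valid):

```python
import numpy as np

def coda_swap(t1, t2, mask):
    """CoDA-style swap sketch: given two transitions whose state (and next
    state) split into locally independent components, exchange the masked
    component to form two new, causally valid transitions."""
    (s1, s1n), (s2, s2n) = t1, t2
    new1 = (np.where(mask, s2, s1), np.where(mask, s2n, s1n))
    new2 = (np.where(mask, s1, s2), np.where(mask, s1n, s2n))
    return new1, new2

# Two objects moving independently: state = [x_obj1, x_obj2].
t1 = (np.array([0.0, 5.0]), np.array([0.1, 5.2]))
t2 = (np.array([9.0, 1.0]), np.array([9.3, 0.9]))
mask = np.array([False, True])  # swap only object 2's component
print(coda_swap(t1, t2, mask))  # two never-observed but valid transitions
```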
- [NeurIPS Workshops] Fairness and Robustness in Invariant Learning: A Case Study in Toxicity Classification. Robert Adragna, Elliot Creager, David Madras, and Richard Zemel. In NeurIPS 2020 Workshop on Fairness Through the Lens of Causality, Dec 2020.
Robustness is of central importance in machine learning and has given rise to the fields of domain generalization and invariant learning, which are concerned with improving performance on a test distribution distinct from but related to the training distribution. In light of recent work suggesting an intimate connection between fairness and robustness, we investigate whether algorithms from robust ML can be used to improve the fairness of classifiers that are trained on biased data and tested on unbiased data. We apply Invariant Risk Minimization (IRM), a domain generalization algorithm that employs a causal discovery inspired method to find robust predictors, to the task of fairly predicting the toxicity of internet comments. We show that IRM achieves better out-of-distribution accuracy and fairness than Empirical Risk Minimization (ERM) methods, and analyze both the difficulties that arise when applying IRM in practice and the conditions under which IRM will likely be effective in this scenario. We hope that this work will inspire further studies of how robust machine learning methods relate to algorithmic fairness.
- [ICML] Causal Modeling for Fairness in Dynamical Systems. Elliot Creager, David Madras, Toniann Pitassi, and Richard Zemel. In Proceedings of the 37th International Conference on Machine Learning, Jul 2020.
In many application areas (lending, education, and online recommenders, for example), fairness and equity concerns emerge when a machine learning system interacts with a dynamically changing environment to produce both immediate and long-term effects for individuals and demographic groups. We discuss causal directed acyclic graphs (DAGs) as a unifying framework for the recent literature on fairness in such dynamical systems. We show that this formulation affords several new directions of inquiry to the modeler, where sound causal assumptions can be expressed and manipulated. We emphasize the importance of computing interventional quantities in the dynamical fairness setting, and show how causal assumptions enable simulation (when environment dynamics are known) and estimation by adjustment (when dynamics are unknown) of intervention on short- and long-term outcomes, at both the group and individual levels.
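A toy numerical illustration of estimation by adjustment, with a hypothetical one-step mechanism of our own (not a model from the paper): the sensitive attribute confounds both the decision and the outcome, so the observational contrast overstates the causal effect, and backdoor adjustment recovers it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy DAG: group A confounds both the decision T (e.g., loan approval)
# and the outcome Y. The true causal effect of T on Y is 0.3 by construction.
A = rng.binomial(1, 0.5, n)
T = rng.binomial(1, 0.3 + 0.4 * A)
Y = rng.binomial(1, 0.2 + 0.3 * T + 0.2 * A)

# Naive observational contrast vs. E[Y | do(T=1)] - E[Y | do(T=0)].
obs = Y[T == 1].mean() - Y[T == 0].mean()
# Backdoor adjustment over A recovers the interventional quantity.
adj = sum((Y[(T == 1) & (A == a)].mean() - Y[(T == 0) & (A == a)].mean())
          * (A == a).mean() for a in (0, 1))
print(f"observational: {obs:.3f}  adjusted (causal): {adj:.3f}")
```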
- [ICML] Optimizing Long-term Social Welfare in Recommender Systems: A Constrained Matching Approach. Martin Mladenov, Elliot Creager, Omer Ben-Porat, Kevin Swersky, Richard Zemel, and Craig Boutilier. In Proceedings of the 37th International Conference on Machine Learning, Jul 2020.
Most recommender systems (RS) research assumes that a user’s utility can be maximized independently of the utility of the other agents (e.g., other users, content providers). In realistic settings, this is often not true: the dynamics of an RS ecosystem couple the long-term utility of all agents. In this work, we explore settings in which content providers cannot remain viable unless they receive a certain level of user engagement. We formulate this problem as one of equilibrium selection in the induced dynamical system, and show that it can be solved as an optimal constrained matching problem. Our model ensures the system reaches an equilibrium with maximal social welfare supported by a sufficiently diverse set of viable providers. We demonstrate that even in a simple, stylized dynamical RS model, the standard myopic approach to recommendation (always matching a user to the best provider) performs poorly. We develop several scalable techniques to solve the matching problem, and also draw connections to various notions of user regret and fairness, arguing that these outcomes are fairer in a utilitarian sense.
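The sketch below solves a tiny fractional matching LP of this flavor (our simplification, not the paper's full formulation or its scalable solvers): maximize total affinity subject to every provider receiving a minimum viable load, then contrast with the myopic best-provider policy.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n_users, n_prov = 12, 3
Aff = rng.uniform(size=(n_users, n_prov))  # user-provider affinities

# Variables x[i, j] in [0, 1], raveled user-major. Maximize total affinity
# s.t. each user is fully assigned and each provider gets >= min_load mass.
min_load = 3
c = -Aff.ravel()  # linprog minimizes, so negate
A_eq = np.zeros((n_users, n_users * n_prov))
for i in range(n_users):
    A_eq[i, i * n_prov:(i + 1) * n_prov] = 1   # each user assigned once
b_eq = np.ones(n_users)
A_ub = np.zeros((n_prov, n_users * n_prov))
for j in range(n_prov):
    A_ub[j, j::n_prov] = -1                    # -sum_i x[i, j] <= -min_load
b_ub = -min_load * np.ones(n_prov)

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
x = res.x.reshape(n_users, n_prov)
print("welfare:", -res.fun)
print("provider loads:", x.sum(axis=0))  # all >= min_load: providers stay viable

# Myopic policy: each user goes to their best provider, which can starve
# some providers below viability.
print("myopic loads:", np.bincount(Aff.argmax(axis=1), minlength=n_prov))
```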
2019
- [ICML] Flexibly Fair Representation Learning by Disentanglement. Elliot Creager, David Madras, Joern-Henrik Jacobsen, Marissa Weis, Kevin Swersky, Toniann Pitassi, and Richard Zemel. In Proceedings of the 36th International Conference on Machine Learning, Jun 2019.
We consider the problem of learning representations that achieve group and subgroup fairness with respect to multiple sensitive attributes. Taking inspiration from the disentangled representation learning literature, we propose an algorithm for learning compact representations of datasets that are useful for reconstruction and prediction, but are also flexibly fair, meaning they can be easily modified at test time to achieve subgroup demographic parity with respect to multiple sensitive attributes and their conjunctions. We show empirically that the resulting encoder—which does not require the sensitive attributes for inference—allows for the adaptation of a single representation to a variety of fair classification tasks with new target labels and subgroup definitions.
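The "flexibly fair" test-time modification can be sketched in a few lines, assuming a trained encoder whose latent dimensions align with the sensitive attributes (the encoder and dimension indices here are hypothetical):

```python
import numpy as np

def make_fair(z, sensitive_dims, prior_sampler=np.random.standard_normal):
    """Test-time scrub of a disentangled code: replace the latent dimensions
    aligned with chosen sensitive attributes with draws from the prior,
    leaving the rest of the representation untouched."""
    z_fair = z.copy()
    z_fair[:, sensitive_dims] = prior_sampler((len(z), len(sensitive_dims)))
    return z_fair

z = np.random.standard_normal((4, 10))        # codes from a trained encoder
z_fair = make_fair(z, sensitive_dims=[0, 3])  # scrub, e.g., two attributes
```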
- [ICLR] Explaining Image Classifiers by Counterfactual Generation. Chun-Hao Chang, Elliot Creager, Anna Goldenberg, and David Duvenaud. In International Conference on Learning Representations, May 2019.
When an image classifier makes a prediction, which parts of the image are relevant and why? We can rephrase this question to ask: which parts of the image, if they were not seen by the classifier, would most change its decision? Producing an answer requires marginalizing over images that could have been seen but weren’t. We can sample plausible image in-fills by conditioning a generative model on the rest of the image. We then optimize to find the image regions that most change the classifier’s decision after in-fill. Our approach contrasts with ad-hoc in-filling approaches, such as blurring or injecting noise, which generate inputs far from the data distribution, and ignore informative relationships between different parts of the image. Our method produces more compact and relevant saliency maps, with fewer artifacts compared to previous methods.
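A minimal PyTorch sketch of the optimization loop (with purely illustrative stand-ins for the classifier and the generative in-filler; the paper's actual models are not reproduced): find a small mask whose region, once replaced by in-filled content, most lowers the classifier's score for the target class.

```python
import torch

def saliency_by_infill(x, y, classifier, infill, steps=100, lam=0.05):
    """Optimize a relaxed mask over pixels; `classifier` and `infill` are
    assumed callables standing in for a trained network and a conditional
    generative in-filler."""
    m_logit = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.Adam([m_logit], lr=0.05)
    for _ in range(steps):
        m = torch.sigmoid(m_logit)          # mask in [0, 1]
        x_cf = (1 - m) * x + m * infill(x)  # composite with in-filled pixels
        score = classifier(x_cf)[:, y].mean()
        loss = score + lam * m.mean()       # drop the class score, keep mask small
        opt.zero_grad(); loss.backward(); opt.step()
    return torch.sigmoid(m_logit).detach()

# Toy stand-ins: a fixed random linear "classifier" and a channel-mean "in-fill".
x = torch.rand(1, 3, 32, 32)
W = torch.randn(3 * 32 * 32, 10) * 1e-2
classifier = lambda im: im.flatten(1) @ W
infill = lambda im: im.mean(dim=(2, 3), keepdim=True).expand_as(im)
mask = saliency_by_infill(x, 0, classifier, infill)
```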
- [ACM FAccT] Fairness Through Causal Awareness: Learning Latent-Variable Models for Biased Data. David Madras, Elliot Creager, Toniann Pitassi, and Richard Zemel. In ACM Conference on Fairness, Accountability, and Transparency, Jan 2019.
How do we learn from biased data? Historical datasets often reflect historical prejudices; sensitive or protected attributes may affect the observed treatments and outcomes. Classification algorithms tasked with predicting outcomes accurately from these datasets tend to replicate these biases. We advocate a causal modeling approach to learning from biased data, exploring the relationship between fair classification and intervention. We propose a causal model in which the sensitive attribute confounds both the treatment and the outcome. Building on prior work in deep learning and generative modeling, we describe how to learn the parameters of this causal model from observational data alone, even in the presence of unobserved confounders. We show experimentally that fairness-aware causal modeling provides better estimates of the causal effects between the sensitive attribute, the treatment, and the outcome. We further present evidence that estimating these causal effects can help learn policies that are both more accurate and fair, when presented with a historically biased dataset.
2018
- [ICML] Learning Adversarially Fair and Transferable Representations. David Madras*, Elliot Creager*, Toniann Pitassi, and Richard Zemel. In Proceedings of the 35th International Conference on Machine Learning, Jul 2018.
In this paper, we advocate for representation learning as the key to mitigating unfair prediction outcomes downstream. Motivated by a scenario where learned representations are used by third parties with unknown objectives, we propose and explore adversarial representation learning as a natural method of ensuring those parties act fairly. We connect group fairness (demographic parity, equalized odds, and equal opportunity) to different adversarial objectives. Through worst-case theoretical guarantees and experimental validation, we show that the choice of this objective is crucial to fair prediction. Furthermore, we present the first in-depth experimental demonstration of fair transfer learning and demonstrate empirically that our learned representations admit fair predictions on new tasks while maintaining utility, an essential goal of fair representation learning.
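A short PyTorch sketch of the adversarial representation-learning setup, using a DANN-style gradient-reversal simplification rather than the paper's exact alternating min-max objective or its group-fairness-specific adversarial losses:

```python
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; flips the gradient on the backward pass."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, g):
        return -g

enc = nn.Sequential(nn.Linear(20, 32), nn.ReLU(), nn.Linear(32, 8))
clf = nn.Linear(8, 1)  # predicts the task label from the representation z
adv = nn.Linear(8, 1)  # tries to recover the sensitive attribute from z

params = [*enc.parameters(), *clf.parameters(), *adv.parameters()]
opt = torch.optim.Adam(params, lr=1e-3)
bce = nn.BCEWithLogitsLoss()

x = torch.randn(64, 20)                    # toy batch of features
y = torch.randint(0, 2, (64, 1)).float()   # task labels
s = torch.randint(0, 2, (64, 1)).float()   # sensitive attribute

for _ in range(200):
    z = enc(x)
    # The adversary minimizes its loss on z; the reversal layer makes the
    # encoder *maximize* it, pushing s-information out of the representation.
    loss = bce(clf(z), y) + bce(adv(GradReverse.apply(z)), s)
    opt.zero_grad(); loss.backward(); opt.step()
```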
- [ICLR Workshops] Gradient-Based Optimization of Neural Network Architecture. Will Grathwohl*, Elliot Creager*, Seyed Kamyar Seyed Ghasemipour*, and Richard Zemel. In ICLR (Workshop Track), Apr 2018.
Neural networks can learn relevant features from data, but their predictive accuracy and propensity to overfit are sensitive to the values of the discrete hyperparameters that specify the network architecture (number of hidden layers, number of units per layer, etc.). Previous work optimized these hyperparameters via grid search, random search, and black box optimization techniques such as Bayesian optimization. Bolstered by recent advances in gradient-based optimization of discrete stochastic objectives, we instead propose to directly model a distribution over possible architectures and use variational optimization to jointly optimize the network architecture and weights in one training pass. We discuss an implementation of this approach that estimates gradients via the Concrete relaxation, and show that it finds compact and accurate architectures for convolutional neural networks applied to the CIFAR10 and CIFAR100 datasets.
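The flavor of the technique, in a hypothetical toy setup (a relaxed categorical choice over layer widths, trained jointly with the weights; not the paper's exact model or its convolutional experiments):

```python
import torch
import torch.nn.functional as F

widths = [16, 32, 64]
arch_logits = torch.zeros(len(widths), requires_grad=True)  # architecture dist.
W = torch.nn.Linear(10, max(widths))
head = torch.nn.Linear(max(widths), 1)
opt = torch.optim.Adam([arch_logits, *W.parameters(), *head.parameters()], lr=1e-2)

x, y = torch.randn(128, 10), torch.randn(128, 1)
for _ in range(300):
    # Differentiable sample of a one-hot width choice (Concrete/Gumbel-softmax).
    probs = F.gumbel_softmax(arch_logits, tau=0.5)
    # Soft mask: units beyond the sampled width are (softly) zeroed out.
    mask = sum(p * (torch.arange(max(widths)) < w).float()
               for p, w in zip(probs, widths))
    h = torch.relu(W(x)) * mask
    loss = F.mse_loss(head(h), y)
    opt.zero_grad(); loss.backward(); opt.step()

print("width distribution:", torch.softmax(arch_logits, dim=0).detach())
```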
2016
- [ISMIR] Nonnegative Tensor Factorization with Frequency Modulation Cues for Blind Audio Source Separation. Elliot Creager, Noah D. Stein, Roland Badeau, and Philippe Depalle. In 17th International Society for Music Information Retrieval Conference, Aug 2016.
We present Vibrato Nonnegative Tensor Factorization, an algorithm for single-channel unsupervised audio source separation with an application to separating instrumental or vocal sources with nonstationary pitch from music recordings. Our approach extends Nonnegative Matrix Factorization for audio modeling by including local estimates of frequency modulation as cues in the separation. This permits the modeling and unsupervised separation of vibrato or glissando musical sources, which is not possible with the basic matrix factorization formulation. The algorithm factorizes a sparse nonnegative tensor comprising the audio spectrogram and local frequency-slope-to-frequency ratios, which are estimated at each time-frequency bin using the Distributed Derivative Method. The use of local frequency modulations as separation cues is motivated by the principle of common fate partial grouping from Auditory Scene Analysis, which hypothesizes that each latent source in a mixture is characterized perceptually by coherent frequency and amplitude modulations shared by its component partials. We derive multiplicative factor updates by Minorization-Maximization, which guarantees convergence to a local optimum by iteration. We then compare our method to the baseline on two separation tasks: one considers synthetic vibrato notes, while the other considers vibrato string instrument recordings.
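For reference, here is the baseline multiplicative-update NMF (Lee and Seung) that the vibrato tensor factorization extends, sketched in the plain matrix (spectrogram-only) case with a random stand-in spectrogram; the tensor model and frequency-slope cues are not reproduced here.

```python
import numpy as np

def nmf(V, k, iters=200, eps=1e-9):
    """Multiplicative-update NMF: V (freq x time magnitudes) ~= W @ H.
    The updates never leave the nonnegative orthant, and each one is a
    majorization-minimization step, so the objective decreases monotonically."""
    rng = np.random.default_rng(0)
    n_freq, n_time = V.shape
    W = rng.uniform(size=(n_freq, k))   # spectral templates
    H = rng.uniform(size=(k, n_time))   # temporal activations
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

V = np.abs(np.random.default_rng(1).normal(size=(257, 100)))  # stand-in spectrogram
W, H = nmf(V, k=8)
print(np.linalg.norm(V - W @ H) / np.linalg.norm(V))  # relative reconstruction error
```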
2015
- [Thesis] Musical Source Separation by Coherent Frequency Modulation Cues. Elliot Creager. Aug 2015.
This thesis explores the extraction of vibrato sounds from monaural excerpts of polyphonic music using the coherent frequency modulation (CFM) of component partials as a grouping cue. Nonnegative Matrix Factorization (NMF) (Lee and Seung 1999) is currently a popular tool for musical source separation (Wang and Plumbley 2005), since it can provide a low-rank approximate factorization of the magnitude spectrogram of the analyzed sound, where the factors can be interpreted as the spectral templates and temporal activations of the notes contributing to the recording. However, NMF implicitly models each source as having a fixed spectral template and is thus ill-suited to the analysis of vibrato sounds, which are characterized by slowly varying frequency and amplitude modulations.