Deep learning has recently made tremendous progress in a wide range of applications, from realistic image generation and impressive restoration systems to language models that can hold human-like conversations. While this progress is very exciting, the widespread use of deep neural network models requires caution: guided by Google's AI Principles, we strive to develop AI technologies responsibly by understanding and mitigating potential risks, such as the propagation and amplification of unfair biases, and by safeguarding user privacy.
Completely erasing the influence of data requested for deletion is difficult because, beyond simply removing it from the databases where it is stored, it also requires erasing that data's influence on other artifacts, such as trained machine learning models. Moreover, recent studies [1, 2] have shown that in some cases it is possible to infer with high accuracy whether an example was used to train a machine learning model using membership inference attacks (MIAs). This raises privacy concerns, because it implies that even if an individual's data is deleted from a database, it may still be possible to infer whether that individual's data was used to train a model.
Motivated by the above, machine unlearning is a subfield of machine learning that aims to remove the influence of a specific subset of training examples (the "forget set") from a trained model. An ideal unlearning algorithm removes the influence of those examples while preserving other desirable properties, such as accuracy on the rest of the training set and generalization to held-out examples. A straightforward way to produce the unlearned model is to retrain the model on an adjusted training set that excludes the samples in the forget set. However, this is not always a viable option, as retraining deep models can be computationally expensive. An ideal unlearning algorithm would instead use the already-trained model as a starting point and efficiently make adjustments to remove the influence of the requested data.
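To make the setup concrete, here is a minimal PyTorch sketch of the retraining-from-scratch baseline described above. The function name, toy model, and hyperparameters are illustrative assumptions for exposition, not part of any official starter kit:

```python
# Illustrative sketch only: the "retrain from scratch" baseline, the gold
# standard that practical unlearning algorithms try to approximate.
import torch
from torch import nn
from torch.utils.data import DataLoader, Subset

def retrain_from_scratch(dataset, forget_indices, epochs=5):
    """Exact unlearning: train a fresh model on the retain set only."""
    forget = set(forget_indices)
    retain = [i for i in range(len(dataset)) if i not in forget]
    loader = DataLoader(Subset(dataset, retain), batch_size=64, shuffle=True)

    # Toy classifier over 32x32 RGB images (placeholder model).
    model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32 * 3, 10))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model  # by construction, carries no influence from the forget set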
Today we are excited to announce that we have teamed up with a broad group of academic and industry researchers to organize the first Machine Unlearning Challenge. The competition considers a realistic scenario in which, after training, a certain subset of the training images must be forgotten to protect the privacy or rights of the individuals concerned. The competition will be hosted on Kaggle, and submissions will be automatically scored in terms of both forgetting quality and model utility. We hope that this competition will deepen the community's understanding of machine unlearning and encourage the development of effective, efficient, and ethical unlearning algorithms.
Machine unlearning applications
Machine unlearning has applications beyond protecting user privacy. For instance, unlearning can be used to erase inaccurate or outdated information from trained models (e.g., due to labeling errors or changes in the environment) or to remove harmful, manipulated, or outlier data.
Machine unlearning is related to other areas of machine learning such as differential privacy, life-long learning, and fairness. Differential privacy aims to guarantee that no particular training example has too large an influence on the trained model; this is a stronger goal than that of unlearning, which only requires erasing the influence of the designated forget set. Life-long learning research aims to design models that can learn continuously while maintaining previously acquired skills. As work on unlearning progresses, it may also open up additional avenues for improving fairness in models, by correcting unfair biases or disparate treatment of different groups (e.g., demographics, age groups, etc.).
Anatomy of unlearning. An unlearning algorithm takes as input a pre-trained model and one or more samples from the training set to unlearn (the "forget set"). From the model, forget set, and retain set, the unlearning algorithm produces an updated model. An ideal unlearning algorithm produces a model that is indistinguishable from a model trained without the forget set.
Challenges of machine unlearning
The unlearning problem is complex and multifaceted because it involves several conflicting objectives: forgetting the requested data, maintaining the model's utility (e.g., accuracy on retained and held-out examples), and efficiency. Because of this, existing unlearning algorithms make different trade-offs. For example, full retraining achieves successful forgetting without damaging model utility, but with poor efficiency, while adding noise to the weights achieves forgetting at the expense of utility.
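As a concrete illustration of the low-utility end of this trade-off, here is a hedged sketch of the noise-adding heuristic; the function name and noise scale are assumptions for exposition:

```python
import torch

def noisy_unlearn(model, sigma=0.05):
    """Crude heuristic: perturb all weights with Gaussian noise. Large
    enough noise drowns out the forget set's influence, but it degrades
    accuracy on the retain set just as indiscriminately."""
    with torch.no_grad():
        for p in model.parameters():
            p.add_(sigma * torch.randn_like(p))
    return model
```

Sweeping sigma traces out the forgetting-versus-utility trade-off: sigma = 0 keeps full utility but forgets nothing, while large sigma forgets everything, including what we wanted to keep.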
Furthermore, the evaluation of forgetting algorithms in the literature has so far been highly inconsistent. While some works report the classification accuracy on the samples to unlearn, others report the distance to the fully retrained model, and yet others use the error rate of membership inference attacks as a metric of forgetting quality [4, 5, 6].
We believe this inconsistency of evaluation metrics and the lack of a standardized protocol are a serious impediment to progress in the field: we are unable to make direct comparisons between different unlearning methods in the literature. This leaves us with a myopic view of the relative merits and drawbacks of different approaches, as well as of the open challenges and opportunities for developing improved algorithms. To address the issue of inconsistent evaluation and to advance the state of the art in machine unlearning, we teamed up with a broad group of academic and industry researchers to organize the first Machine Unlearning Challenge.
Announcing the first Machine Unlearning Challenge
We are pleased to announce the first Machine Unlearning Challenge, which will be held as part of the NeurIPS 2023 Competition Track. The goal of the competition is twofold. First, by unifying and standardizing the evaluation metrics for unlearning, we hope to identify the strengths and weaknesses of different algorithms through apples-to-apples comparisons. Second, by opening this competition to everyone, we hope to foster novel solutions and shed light on open challenges and opportunities.
The competition will be hosted on Kaggle and will run between mid-July 2023 and mid-September 2023. As part of the competition, today we are announcing the availability of the starter kit. The starter kit provides a foundation for participants to build and test their unlearning models on a toy dataset.
The competition considers a realistic scenario in which an age predictor has been trained on face images, and, after training, a certain subset of the training images must be forgotten to protect the privacy or rights of the individuals concerned. For this, we will make available as part of the starter kit a dataset of synthetic faces (samples shown below), and we will also use several real-face datasets for the evaluation of submissions. Participants are asked to submit code that takes as input the trained predictor and the forget and retain sets, and outputs the weights of a predictor that has unlearned the designated forget set. We will evaluate submissions based on both the strength of the forgetting algorithm and model utility. We will also enforce a hard cut-off that rejects unlearning algorithms that run slower than a fraction of the time it takes to retrain. A valuable outcome of this competition will be to characterize the trade-offs of different unlearning algorithms.
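As a rough illustration of what such a submission might look like, here is a hedged sketch of an unlearning function that simply fine-tunes the trained predictor on the retain set for a short time. The signature and hyperparameters are our assumptions; the actual required interface is defined by the starter kit:

```python
import torch
from torch import nn

def unlearn(net, retain_loader, forget_loader, epochs=1, lr=1e-3):
    """Naive baseline: briefly fine-tune on the retain set only, hoping the
    forget set's influence fades. Note that forget_loader is unused here;
    stronger methods would exploit it (e.g., gradient ascent on forget data)."""
    opt = torch.optim.SGD(net.parameters(), lr=lr, momentum=0.9)
    loss_fn = nn.CrossEntropyLoss()
    net.train()
    for _ in range(epochs):
        for x, y in retain_loader:
            opt.zero_grad()
            loss_fn(net(x), y).backward()
            opt.step()
    return net  # updated weights form the submission's unlearned predictor
```

A baseline like this easily satisfies the compute cut-off, but its forgetting quality is typically weak, which is exactly the kind of trade-off the competition is designed to surface.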
Sample images from the Face Synthetics dataset with age annotations. The competition considers a scenario in which an age predictor has been trained on face images like the above, and, after training, must forget a certain subset of the training images.
For evaluating forgetting, we will use tools inspired by MIAs, such as LiRA. MIAs were first developed in the privacy and security literature, and their goal is to infer which examples were part of the training set. Intuitively, if unlearning is successful, the unlearned model contains no traces of the forgotten examples, causing MIAs to fail: the attacker would be unable to conclude that the forget set was, in fact, part of the original training set. In addition, we will also use statistical tests to quantify how different the distribution of unlearned models (produced by a particular submitted unlearning algorithm) is compared to the distribution of models retrained from scratch. For an ideal unlearning algorithm, these two will be indistinguishable.
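To make the evaluation idea concrete, below is a toy membership-inference check in the spirit of (but far simpler than) LiRA: it compares the unlearned model's per-example losses on the forget set against its losses on genuinely unseen test examples. The function name and the use of AUC are our assumptions for exposition:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def loss_based_mia(forget_losses, test_losses):
    """Returns an attack AUC. Values near 0.5 mean the attacker cannot tell
    forget-set examples from unseen ones, the desired outcome of successful
    unlearning; values near 1.0 indicate leftover memorization."""
    scores = -np.concatenate([forget_losses, test_losses])  # low loss => member
    labels = np.concatenate([np.ones(len(forget_losses)),   # 1 = "was trained on"
                             np.zeros(len(test_losses))])
    return roc_auc_score(labels, scores)
```

The actual competition evaluation is stronger than this single-threshold view: LiRA-style attacks calibrate per example, and the statistical tests mentioned above compare whole distributions of models rather than one model's losses.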
Conclusion
Machine unlearning is a powerful tool that has the potential to address several open problems in machine learning. As research in this area continues, we hope to see new methods that are more efficient, effective, and responsible. We are thrilled to have the opportunity to spark interest in this field via this competition, and we look forward to sharing our insights and findings with the community.
Acknowledgments
The authors of this post are now part of Google DeepMind. We are writing this blog post on behalf of the unlearning competition organizing team: Eleni Triantafillou*, Fabian Pedregosa* (*equal contribution), Meghdad Kurmanji, Kairan Zhao, Gintare Karolina Dziugaite, Peter Triantafillou, Ioannis Mitliagkas, Vincent Dumoulin, Lisheng Sun-Hosoya, Peter Kairouz, Julio C. S. Jacques Junior, Jun Wan, Sergio Escalera and Isabelle Guyon.