Meta Learning for Few-Shot One-Class Classification
Abstract
1. Introduction
- We show how to learn a feature representation for one-class classification (Section 2) by defining an estimator for the classification loss of such algorithms (Section 2.1). We also describe how to efficiently backpropagate through this objective when the chosen algorithm is SVDD, so that the feature representation can be parametrized by a deep neural network (Section 2.2). Training the model efficiently is what makes it viable in the few-shot setting.
- We simplify Meta SVDD by changing how the center of its hypersphere is computed. Instead of solving a quadratic optimization problem to weight each example in the center's average, we drop the weighting and compute the center as an unweighted average (Section 3). The resulting One-Class Prototypical Networks are simpler, have lower computational complexity, and exhibit more stable training dynamics than Meta SVDD.
- Finally, we detail how our method conceptually addresses the limitations of previous work (Section 4), and we show that it has promising empirical performance by adapting two few-shot classification datasets to the one-class setting and obtaining results comparable to the state of the art of the many-shot setting (Section 5). Our results indicate that learning the feature representation may compensate for the simplicity of replacing SVDD with feature averaging, and that our approach is a viable way to replace data from the target class with labeled data from related tasks. Code to reproduce our experiments and methods is publicly available.
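As a concrete illustration of the second contribution, the sketch below shows the One-Class Prototypical Networks scoring rule: an unweighted mean of the embedded support examples as the prototype, and squared distance to it as the anomaly score. The identity embedding and the toy data are our stand-ins, not the paper's; the actual method uses a deep network trained with the meta-learning objective.

```python
import numpy as np

def one_class_score(embed, support, query):
    # Prototype: unweighted mean of the embedded support examples
    # (the simplification of Meta SVDD described above).
    center = embed(support).mean(axis=0)
    # Anomaly score: squared Euclidean distance to the prototype.
    return np.sum((embed(query) - center) ** 2, axis=-1)

# Toy episode: identity embedding stands in for the learned network.
rng = np.random.default_rng(1)
support = rng.normal(0.0, 1.0, size=(5, 4))   # few in-class examples
inlier = rng.normal(0.0, 1.0, size=4)         # drawn from the class
outlier = rng.normal(5.0, 1.0, size=4)        # far from the class

s_in = one_class_score(lambda x: x, support, inlier)
s_out = one_class_score(lambda x: x, support, outlier)
```

A query far from the class distribution receives a larger score than one drawn from it, which is the decision signal thresholded at test time.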
2. Meta SVDD
2.1. Meta-Learning One-Class Classification
- Inner loop: use the current feature representation to transform the inputs, and use SVDD to learn a one-class classification boundary for the resulting features.
- Outer loop: learn the feature representation from the classification loss obtained with SVDD.
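The inner/outer split above can be sketched as a single episode objective. This is a hedged illustration, not the paper's exact estimator: the SVDD inner solve is replaced by the unweighted-mean center of Section 3, the embedding is a stand-in, and the logistic loss on margin-shifted squared distances is an assumed form of the outer classification loss.

```python
import numpy as np

def episode_loss(embed, support, queries, labels, margin=10.0):
    # Inner loop: transform the support set with the current feature
    # representation and fit the one-class model. Here the SVDD solve
    # is replaced by an unweighted mean center (Section 3).
    center = embed(support).mean(axis=0)
    # Score queries by squared distance to the center.
    d2 = np.sum((embed(queries) - center) ** 2, axis=1)
    # Outer loop (assumed loss form): logistic loss on the
    # margin-shifted distance, so in-class queries (label 1) should
    # fall inside the margin and out-of-class queries (label 0) outside.
    p_in = 1.0 / (1.0 + np.exp(d2 - margin))
    eps = 1e-9
    return -np.mean(labels * np.log(p_in + eps)
                    + (1.0 - labels) * np.log(1.0 - p_in + eps))

# Toy episode with an identity embedding standing in for the network.
rng = np.random.default_rng(0)
support = rng.normal(0.0, 1.0, size=(5, 4))              # in-class only
queries = np.vstack([rng.normal(0.0, 1.0, size=(3, 4)),  # in-class
                     rng.normal(6.0, 1.0, size=(3, 4))])  # out-of-class
labels = np.array([1, 1, 1, 0, 0, 0], dtype=float)
loss = episode_loss(lambda x: x, support, queries, labels)
```

In the actual method, this episode loss would be backpropagated through the inner computation to update the parameters of the embedding network.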
2.2. Gradient-Based Optimization
3. One-Class Prototypical Networks
4. Related Work
4.1. One-Class Classification
4.2. Few-Shot Learning
4.3. Few-Shot, One-Class Classification
5. Experiments
5.1. Evaluation Protocol
5.1.1. Datasets
5.1.2. Metrics and Comparison
5.1.3. Second Experiment
5.2. Setup
5.2.1. Network Architecture
5.2.2. Optimization and Hyperparameters
5.2.3. Baselines
5.3. Results
5.3.1. First Experiment
5.3.2. Second Experiment
5.3.3. Other Experiments
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Schölkopf, B.; Platt, J.C.; Shawe-Taylor, J.; Smola, A.J.; Williamson, R.C. Estimating the Support of a High-Dimensional Distribution. Neural Comput. 2001, 13, 1443–1471.
- Ruff, L.; Vandermeulen, R.; Goernitz, N.; Deecke, L.; Siddiqui, S.A.; Binder, A.; Müller, E.; Kloft, M. Deep One-Class Classification. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Dy, J., Krause, A., Eds.; PMLR: Stockholm, Sweden, 2018; Volume 80, pp. 4393–4402.
- Seeböck, P.; Waldstein, S.M.; Klimscha, S.; Gerendas, B.S.; Donner, R.; Schlegl, T.; Schmidt-Erfurth, U.; Langs, G. Identifying and Categorizing Anomalies in Retinal Imaging Data. arXiv 2016, arXiv:1612.00686.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.C.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems 27: Annual Conference on Neural Information Processing Systems 2014, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680.
- Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Schmidt-Erfurth, U.; Langs, G. Unsupervised Anomaly Detection with Generative Adversarial Networks to Guide Marker Discovery. In Proceedings of the Information Processing in Medical Imaging—25th International Conference (IPMI 2017), Boone, NC, USA, 25–30 June 2017; pp. 146–157.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875.
- Mescheder, L.; Geiger, A.; Nowozin, S. Which Training Methods for GANs do actually Converge? In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; Dy, J., Krause, A., Eds.; PMLR: Stockholm, Sweden, 2018; Volume 80, pp. 3481–3490.
- Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the 34th International Conference on Machine Learning (ICML’17), Sydney, Australia, 6–11 August 2017; Volume 70, pp. 1126–1135.
- Snell, J.; Swersky, K.; Zemel, R.S. Prototypical Networks for Few-shot Learning. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017; pp. 4077–4087.
- Tax, D.M.J.; Duin, R.P.W. Support Vector Data Description. Mach. Learn. 2004, 54, 45–66.
- Erfani, S.M.; Rajasegarar, S.; Karunasekera, S.; Leckie, C. High-dimensional and large-scale anomaly detection using a linear one-class SVM with deep learning. Pattern Recognit. 2016, 58, 121–134.
- Raghu, A.; Raghu, M.; Bengio, S.; Vinyals, O. Rapid Learning or Feature Reuse? Towards Understanding the Effectiveness of MAML. arXiv 2019, arXiv:1909.09157.
- Lee, K.; Maji, S.; Ravichandran, A.; Soatto, S. Meta-Learning with Differentiable Convex Optimization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019.
- Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016; Available online: http://www.deeplearningbook.org (accessed on 3 March 2021).
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. In Advances in Neural Information Processing Systems 32; Curran Associates, Inc.: Nice, France, 2019; pp. 8024–8035.
- Elzinga, D.J.; Hearn, D.W. The minimum covering sphere problem. Manag. Sci. 1972, 19, 96–104.
- Amos, B.; Kolter, J.Z. OptNet: Differentiable Optimization as a Layer in Neural Networks. In Proceedings of the 34th International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Precup, D., Teh, Y.W., Eds.; International Convention Center: Sydney, Australia, 2017; Volume 70, pp. 136–145.
- Perera, P.; Oza, P.; Patel, V.M. One-Class Classification: A Survey. arXiv 2021, arXiv:2101.03064.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the Dimensionality of Data with Neural Networks. Science 2006, 313, 504–507.
- Oza, P.; Patel, V.M. One-Class Convolutional Neural Network. IEEE Signal Process. Lett. 2019, 26, 277–281.
- LeCun, Y.; Boser, B.E.; Denker, J.S.; Henderson, D.; Howard, R.E.; Hubbard, W.E.; Jackel, L.D. Handwritten digit recognition with a back-propagation network. In Proceedings of the Advances in Neural Information Processing Systems, Denver, CO, USA, 26–29 November 1990; pp. 396–404.
- Deng, J.; Dong, W.; Socher, R.; Li, L.; Li, K.; Li, F. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
- Oladosu, A.; Xu, T.; Ekfeldt, P.; Kelly, B.A.; Cranmer, M.; Ho, S.; Price-Whelan, A.M.; Contardo, G. Meta-Learning for Anomaly Classification with Set Equivariant Networks: Application in the Milky Way. arXiv 2020, arXiv:2007.04459.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Technical Report. 2009. Available online: http://www.cs.toronto.edu/~kriz/learning-features-2009-TR.pdf (accessed on 21 April 2021).
- Bertinetto, L.; Henriques, J.F.; Torr, P.H.S.; Vedaldi, A. Meta-learning with differentiable closed-form solvers. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019.
- Lake, B.M.; Salakhutdinov, R.; Tenenbaum, J.B. Human-level concept learning through probabilistic program induction. Science 2015, 350, 1332–1338.
- Vinyals, O.; Blundell, C.; Lillicrap, T.; Kavukcuoglu, K.; Wierstra, D. Matching Networks for One Shot Learning. In Proceedings of the Advances in Neural Information Processing Systems 29: Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; pp. 3630–3638.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, 6–11 July 2015; pp. 448–456.
- Jarrett, K.; Kavukcuoglu, K.; Ranzato, M.; LeCun, Y. What is the best multi-stage architecture for object recognition? In Proceedings of the IEEE 12th International Conference on Computer Vision (ICCV 2009), Kyoto, Japan, 27 September–4 October 2009; pp. 2146–2153.
- Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V.; et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res. 2011, 12, 2825–2830.
- Oliphant, T. NumPy: A guide to NumPy; Trelgol Publishing: Spanish Fork, UT, USA, 2006.
- Deleu, T.; Würfl, T.; Samiei, M.; Cohen, J.P.; Bengio, Y. Torchmeta: A Meta-Learning Library for PyTorch. 2019. Available online: https://github.com/tristandeleu/pytorch-meta (accessed on 3 March 2021).
- Robbins, H.; Monro, S. A Stochastic Approximation Method. Ann. Math. Stat. 1951, 22, 400–407.
- Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations, Conference Track Proceedings (ICLR 2015), San Diego, CA, USA, 7–9 May 2015.
| | Dataset | DCAE | Deep SVDD | Dataset | Deep SVDD | One-Class Protonet | Meta SVDD |
|---|---|---|---|---|---|---|---|
| Min. | MNIST | 78.2 ± 2.7 | 88.5 ± 0.9 | Omniglot | – | 89.0 ± 0.2 | 88.6 ± 0.4 |
| Med. | MNIST | 86.7 ± 0.9 | 94.6 ± 0.9 | Omniglot | – | 99.5 ± 0.0 | 99.5 ± 0.0 |
| Max. | MNIST | 98.3 ± 0.6 | 99.7 ± 0.1 | Omniglot | – | 100.0 ± 0.0 | 100.0 ± 0.0 |
| Min. | CIFAR-10 | 51.2 ± 5.2 | 50.8 ± 0.8 | CIFAR-FS | 47.9 ± 4.9 | 60.2 ± 3.4 | 59.0 ± 5.7 |
| Med. | CIFAR-10 | 58.6 ± 2.9 | 65.7 ± 2.5 | CIFAR-FS | 64.0 ± 5.0 | 72.7 ± 3.0 | 71.0 ± 4.0 |
| Max. | CIFAR-10 | 76.8 ± 1.4 | 75.9 ± 1.2 | CIFAR-FS | 92.4 ± 2.3 | 90.1 ± 2.3 | 92.5 ± 1.7 |
Dataset | PCA + SVM | One-Class Protonet | Meta SVDD |
---|---|---|---|
Omniglot | 50.64 ± 0.10% | 94.68 ± 0.17% | 94.33 ± 0.19% |
CIFAR-FS | 54.77 ± 0.31% | 67.67 ± 0.39% | 64.95 ± 0.37% |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Dahia, G.; Pamplona Segundo, M. Meta Learning for Few-Shot One-Class Classification. AI 2021, 2, 195–208. https://doi.org/10.3390/ai2020012