Learning and Generalisation for High-dimensional Data

Abstract

Modern data-driven Artificial Intelligence models are built on large datasets that have only recently become available to practitioners. Significant effort has gone into gathering data and information, and the volume of these data assets keeps growing, bringing us into the era of Big Data. In many relevant problems, however, we face one particular class of Big Data: datasets in which each record has many attributes, while the number of separate records is small or the records lack annotation. We refer to these datasets as high-dimensional low-sample-size data. They arise in many significant fields, such as medical image analysis (for example, asthma detection and treatment), financial data analysis, and bioinformatics. These are just a few examples of settings where the data has more attributes than observations. Note that the volume of unlabeled data in these areas may in fact be large. However, for reasons beyond the control of AI practitioners (e.g. privacy, data protection laws, costs of human assessment, intellectual property), annotated data may not be fully available to them. This kind of data poses many challenges for machine learning algorithms, with over-fitting and high variance among the major problems. These are just two of the many facets that together constitute the grand challenge of learning and generalisation in high dimensions.

Copyright owned by the Saudi Digital Library (SDL) © 2025