The curse of dimensionality refers to various phenomena that arise when analyzing and organizing data in high-dimensional spaces and that do not occur in low-dimensional settings such as the three-dimensional physical space of everyday life.
Dimensionally cursed phenomena occur in domains such as sampling, numerical analysis, data mining, machine learning and databases. The common theme of these problems is that when the dimensionality increases, the volume of the space increases so quickly that the available data become sparse. This sparsity is a problem for any method that requires statistical significance.
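A quick way to see how fast volume grows is to compare the unit hypercube [0, 1]^d with the ball inscribed in it. The sketch below is plain Python using only the standard library; the function name `inscribed_ball_fraction` is our own. It relies on the standard formula for the volume of a d-dimensional ball of radius r, V = π^(d/2) r^d / Γ(d/2 + 1), with r = 0.5.

```python
import math


def inscribed_ball_fraction(d: int) -> float:
    """Fraction of the unit hypercube [0, 1]^d occupied by its inscribed ball (radius 0.5)."""
    r = 0.5
    # Volume of a d-dimensional ball: pi^(d/2) * r^d / Gamma(d/2 + 1).
    # The unit cube has volume 1, so this volume is also the fraction.
    return math.pi ** (d / 2) * r ** d / math.gamma(d / 2 + 1)


for d in (1, 2, 3, 5, 10, 20):
    print(f"d={d:2d}  fraction of cube inside ball = {inscribed_ball_fraction(d):.6f}")
```

The fraction is 1 for d = 1, about 0.785 for d = 2 and about 0.524 for d = 3, but it falls below 0.003 by d = 10. Almost all of the cube's volume ends up in its corners, so a fixed number of points covers the space ever more thinly.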
In order to obtain a statistically sound and reliable result, the amount of data needed to support the result often grows exponentially with the dimensionality. Also, organizing and searching data often relies on detecting areas where objects form groups with similar properties; in high-dimensional data, however, all objects appear to be sparse and dissimilar in many ways, which prevents common data organization strategies from being efficient.
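The exponential growth can be made concrete with a simple counting model, which is an assumption for illustration rather than how any particular method works: split each axis into a fixed number of bins and require at least a given number of samples in every cell of the resulting grid. The helper `samples_for_grid` is hypothetical.

```python
def samples_for_grid(bins_per_axis: int, d: int, per_cell: int = 1) -> int:
    """Samples needed so every cell of a regular d-dimensional grid holds `per_cell` points.

    Hypothetical counting model: a grid with `bins_per_axis` intervals along each
    of d axes has bins_per_axis ** d cells, and each cell must be populated.
    """
    return per_cell * bins_per_axis ** d


for d in (1, 2, 3, 5, 10):
    print(f"d={d:2d}  samples needed = {samples_for_grid(10, d):,}")
```

With 10 bins per axis this model needs 10 samples in one dimension, 1,000 in three and 10,000,000,000 in ten. Methods that do not use grids escape the exact numbers but not the exponential trend.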
There are two things to consider regarding the curse of dimensionality. On one hand, machine learning excels at analyzing data with many dimensions. Humans are not good at finding patterns that may be spread across so many dimensions, especially if those dimensions are interrelated in counterintuitive ways. On the other hand, as we add more dimensions, we also increase the processing power needed to analyze the data, and we increase the amount of training data required to build meaningful models.
Hughes Phenomenon
The Hughes phenomenon shows that as the number of features increases, a classifier's performance increases as well, until we reach the optimal number of features. Adding more features while the size of the training set stays the same will then degrade the classifier's performance.
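The Hughes phenomenon (also called the peaking phenomenon) can be reproduced with a small simulation. The setup below is a hypothetical illustration, not taken from this text: two Gaussian classes with unit variance per feature and class means +μ and −μ, where μ_i = 1/√i, so each extra feature adds less class signal than the previous one but the same amount of noise. A nearest-centroid classifier is trained on a fixed, small training set; the function name and parameters are our own.

```python
import random


def peaking_accuracy(d, n_train_per_class=5, n_test_per_class=100, seed=0):
    """Test accuracy of a nearest-centroid classifier that uses the first d features.

    Hypothetical setup: class +1 has mean +mu, class -1 has mean -mu,
    mu[i] = 1 / sqrt(i + 1), unit-variance Gaussian noise on every feature.
    """
    rng = random.Random(seed)
    mu = [(i + 1) ** -0.5 for i in range(d)]

    def sample(sign):
        # One observation from class `sign` (+1 or -1).
        return [sign * m + rng.gauss(0.0, 1.0) for m in mu]

    # Estimate each class centroid from a small training set whose size does not grow with d.
    centroids = {}
    for sign in (1, -1):
        train = [sample(sign) for _ in range(n_train_per_class)]
        centroids[sign] = [sum(col) / n_train_per_class for col in zip(*train)]

    def predict(x):
        # Assign x to the class whose estimated centroid is closer (squared Euclidean distance).
        dist = {s: sum((a - b) ** 2 for a, b in zip(x, c)) for s, c in centroids.items()}
        return min(dist, key=dist.get)

    correct = 0
    for sign in (1, -1):
        for _ in range(n_test_per_class):
            correct += predict(sample(sign)) == sign
    return correct / (2 * n_test_per_class)


for d in (1, 5, 20, 100, 500, 2000):
    print(f"d={d:4d}  accuracy={peaking_accuracy(d):.3f}")
```

In runs like this the accuracy typically rises over the first few dozen features and then drifts back toward chance (0.5) as d keeps growing, because the centroid estimates from five samples per class accumulate noise in every added dimension. The exact numbers depend on the seed.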
Curse of dimensionality in distance functions
An increase in the number of dimensions of a dataset means there are more entries in the feature vector that represents each observation in the corresponding Euclidean space. In other words, as the number of features grows for a given number of observations, the feature space becomes increasingly sparse, that is, less dense or emptier. Conversely, because of the lower data density, more observations are required to keep the average distance between data points the same.
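The growing sparsity can be measured directly. This sketch, an assumed setup using only the standard library, places a fixed number of points uniformly in [0, 1]^d and reports the average distance from each point to its nearest neighbor; `mean_nn_distance` is a hypothetical helper name.

```python
import math
import random


def mean_nn_distance(n_points, d, seed=0):
    """Average distance from each point to its nearest neighbor, for n_points uniform in [0, 1]^d."""
    rng = random.Random(seed)
    pts = [[rng.random() for _ in range(d)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(pts):
        # Brute-force search over all other points: O(n^2 * d), fine for a small demo.
        total += min(math.dist(p, q) for j, q in enumerate(pts) if j != i)
    return total / n_points


for d in (1, 2, 5, 10, 50, 100):
    print(f"d={d:3d}  mean nearest-neighbor distance = {mean_nn_distance(200, d):.3f}")
```

With the number of points held at 200, the nearest neighbor moves steadily farther away as d grows; keeping that distance fixed would require the number of points to grow exponentially with d.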
When the distance between observations grows, supervised machine learning becomes more difficult, because predictions for new samples are less likely to be based on learning from similar training samples.
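A related effect, often called distance concentration, is that the nearest and the farthest training points end up at almost the same distance from a query, so "similar" stops meaning much. The sketch below, under the assumption of uniformly distributed data and with a hypothetical function name, computes the relative contrast (max distance − min distance) / min distance from one random query point.

```python
import math
import random


def relative_contrast(n_points, d, seed=0):
    """(max_dist - min_dist) / min_dist from a random query to n_points uniform in [0, 1]^d."""
    rng = random.Random(seed)
    query = [rng.random() for _ in range(d)]
    dists = [math.dist(query, [rng.random() for _ in range(d)]) for _ in range(n_points)]
    return (max(dists) - min(dists)) / min(dists)


for d in (2, 10, 100, 1000):
    print(f"d={d:4d}  relative contrast = {relative_contrast(500, d):.3f}")
```

In two dimensions the farthest point is many times farther than the nearest one; in a thousand dimensions the contrast shrinks to a small fraction, so a nearest-neighbor prediction draws on points that are barely closer than any others.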
Overfitting and Underfitting
Under the curse of dimensionality, even the closest neighbor can be too far away in a high-dimensional space to give a good estimate. Overfitting occurs when a model starts to memorize aspects of the training set and in turn loses the ability to generalize. Regularization is one way to avoid overfitting, and we can sometimes use feature selection and dimensionality reduction techniques to help us avoid the curse of dimensionality.
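As one concrete form of regularization, here is a minimal sketch of ridge (L2-regularized) linear regression in plain Python. The data are hypothetical: 8 observations, 20 features, and only the first feature actually drives the target. With fewer observations than features, the ordinary least-squares normal equations XᵀX w = Xᵀy have no unique solution because XᵀX is singular; ridge solves (XᵀX + λI) w = Xᵀy instead, which is always solvable for λ > 0 and shrinks the weights as λ grows. The helpers `solve` and `ridge_fit` are our own.

```python
import random


def solve(A, b):
    """Solve A x = b for square A by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix [A | b]
    for col in range(n):
        # Swap in the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X, y, lam):
    """Ridge weights: solve (X^T X + lam * I) w = X^T y."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (lam if i == j else 0.0) for j in range(p)]
           for i in range(p)]
    Xty = [sum(row[i] * t for row, t in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


rng = random.Random(0)
n, p = 8, 20  # fewer observations than features
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [row[0] + rng.gauss(0, 0.1) for row in X]  # hypothetical target: only feature 0 matters
for lam in (0.1, 1.0, 10.0):
    w = ridge_fit(X, y, lam)
    print(f"lambda={lam:5.1f}  ||w|| = {sum(v * v for v in w) ** 0.5:.3f}")
```

Larger λ gives smaller weights, trading a little bias for a model that is less able to memorize the 8 training points. In practice λ is usually chosen by cross-validation.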
Example: if our training data are not good enough, we risk producing a model that is very good at predicting the target class on the training dataset but fails miserably when faced with new data. That is, our model does not have generalization power.
One way to avoid overfitting is to prefer simple methods: the hypothesis with the fewest assumptions should be selected.
Keeping our model simple helps us avoid overfitting, but if we make it too simple, we risk underfitting. Underfitting arises when our model has such low representational power that it cannot model the data even if we had all the training data we wanted. A model underfits when it fails to capture the pattern in the data; it suffers from high bias.
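k-nearest-neighbor regression gives a compact illustration of both failure modes, with k acting as the complexity knob. With k = 1 every training point is its own nearest neighbor, so the model memorizes the training set (zero training error, overfitting). With k equal to the training-set size it predicts the same global mean everywhere (underfitting, high bias). The data (a noisy sine curve) and the helper names are hypothetical.

```python
import math
import random


def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x (1-D inputs)."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k


def mse(xs, ys, train_x, train_y, k):
    """Mean squared error of k-NN predictions on the points (xs, ys)."""
    return sum((knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)


def make_data(n):
    # Hypothetical data: y = sin(2*pi*x) plus Gaussian noise with standard deviation 0.3.
    xs = [rng.random() for _ in range(n)]
    return xs, [math.sin(2 * math.pi * x) + rng.gauss(0, 0.3) for x in xs]


train_x, train_y = make_data(30)
test_x, test_y = make_data(200)
for k in (1, 5, 30):
    print(f"k={k:2d}  train MSE={mse(train_x, train_y, train_x, train_y, k):.3f}  "
          f"test MSE={mse(test_x, test_y, train_x, train_y, k):.3f}")
```

An intermediate k usually gives the lowest test error; k = 1 looks perfect on the training data but not on new data, while k = 30 is poor on both. The exact figures depend on the seed.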
Hence, in order to avoid the curse of dimensionality, more data is needed.