An algorithm that reduces the number of variables in a dataset by discarding the directions that carry the least variance.
PCA, like any dimensionality reduction algorithm, helps reduce storage space, improve model performance, and make the data easier to visualize.
Technically, it is a statistical procedure that identifies a set of uncorrelated variables, called principal components, and sorts them by the variance they capture, so that the first one lies along the direction in which the data is most spread out. So, if we want to reduce the data to k dimensions, we keep the first k principal components. But the lower k is, the greater the information loss.
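
To make the trade-off concrete, here is a minimal sketch, assuming NumPy and a small synthetic dataset. The function name `pca` and the toy data are illustrative choices, not part of any particular library. It computes the principal components with a singular value decomposition, keeps the first k, and reports the share of the total variance that survives.

```python
import numpy as np


def pca(X, k):
    """Project X (n_samples x n_features) onto its first k principal components.

    Returns the projected data, the k component directions, and the
    fraction of total variance captured by each kept component.
    """
    # Center each feature so the components describe variance around the mean.
    X_centered = X - X.mean(axis=0)

    # SVD of the centered data: the rows of Vt are the principal directions,
    # sorted by decreasing singular value, hence by decreasing variance.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    components = Vt[:k]

    # Squared singular values are proportional to the variance per component.
    explained = (S ** 2) / np.sum(S ** 2)

    return X_centered @ components.T, components, explained[:k]


# Hypothetical toy data: 200 points in 3 dimensions, where the third
# feature is almost a copy of the first, so one direction is nearly redundant.
rng = np.random.default_rng(0)
t = rng.normal(size=(200, 1))
X = np.hstack([
    t,
    0.5 * rng.normal(size=(200, 1)),
    t + 0.05 * rng.normal(size=(200, 1)),
])

for k in (1, 2, 3):
    Z, components, ratio = pca(X, k)
    print(f"k={k}: projected shape {Z.shape}, variance kept {ratio.sum():.3f}")
```

The printed share of variance kept grows with k and reaches 1 only when every component is retained, which is the information loss described above. In practice a library implementation such as scikit-learn's `PCA` would usually be preferred over a hand-rolled version.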