Why is Max pooling CNN

Convolutional Neural Networks do a great job in dealing with high dimensional data. Restricting the number of weights only to kernels weights makes learning easier due to invariance properties of images or sound. But if you look carefully at what's going on you may notice that the after first convolutional layer the dimension of your data might severely increase if you don't do the tricks like pooling.

Max pooling decreases the dimension of your data simply by taking only the maximum input from a fixed region of your convolutional layer. Sum pooling works in a similiar manner - by taking the sum of inputs instead of it's maximum.

The conceptual difference between these approaches lies in the sort of invariance which they are able to catch. Max pooling is sensitive to existence of some pattern in pooled region. Sum pooling (which is proportional to Mean pooling) measures the mean value of existence of a pattern in a given region.

UPDATE:

The subregions for Sum pooling / Mean pooling are set exactly the same as for Max pooling but instead of using max function you use sum / mean. You can read about here in the paragraph about pooling.