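t-SNE addresses the crowding problem in two ways: it uses a symmetrized version of the SNE cost function (a KL divergence between joint rather than conditional probabilities), and it replaces the Gaussian distribution with the Student's t-distribution in the low-dimensional space. The Student's t-distribution is a continuous, heavy-tailed distribution best known from statistics, where it arises when estimating the mean of a normally distributed population from a small sample with unknown standard deviation, as in Student's t-test. t-SNE uses the version with one degree of freedom, which is the same as the Cauchy distribution.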
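For reference, the quantities involved can be written out as below. The notation follows the original t-SNE paper and is assumed here rather than taken from the text above: x_i are the high-dimensional points, y_i their low-dimensional counterparts, n the number of points, and sigma_i a per-point Gaussian bandwidth.

```latex
% High-dimensional conditional similarity (Gaussian kernel)
p_{j|i} = \frac{\exp\left(-\lVert x_i - x_j \rVert^2 / 2\sigma_i^2\right)}
               {\sum_{k \neq i} \exp\left(-\lVert x_i - x_k \rVert^2 / 2\sigma_i^2\right)}

% Symmetrized joint similarity
p_{ij} = \frac{p_{j|i} + p_{i|j}}{2n}

% Low-dimensional joint similarity (Student's t kernel, one degree of freedom)
q_{ij} = \frac{\left(1 + \lVert y_i - y_j \rVert^2\right)^{-1}}
              {\sum_{k \neq l} \left(1 + \lVert y_k - y_l \rVert^2\right)^{-1}}

% Cost function: KL divergence between the two joint distributions
C = \mathrm{KL}(P \,\Vert\, Q) = \sum_{i \neq j} p_{ij} \log \frac{p_{ij}}{q_{ij}}
```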
The symmetrized cost function treats every pair of points symmetrically, so the similarity of point i to point j equals that of j to i, while the heavy tail of the Student's t-distribution in the low-dimensional space counteracts the crowding problem. In the high-dimensional space, the Gaussian distribution is still used to convert distances into probabilities. Because the tails of the t-distribution decay much more slowly than those of the Gaussian, a moderate distance in the high-dimensional space can be matched by a larger distance in the low-dimensional space, which leaves room for nearby points to stay close together. This combination of different distributions in the respective spaces allows the faithful representation of datapoints separated by small and moderate distances.
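The following is a minimal NumPy sketch of the two similarity computations and the KL cost, meant only to illustrate the idea. It makes a simplifying assumption: a single fixed Gaussian bandwidth `sigma` is used for all points, whereas t-SNE itself tunes one bandwidth per point from a perplexity setting. The function names are hypothetical and do not come from any library.

```python
import numpy as np

def pairwise_sq_dists(A):
    """Squared Euclidean distances between all rows of A."""
    sq = np.sum(A ** 2, axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * (A @ A.T)
    np.fill_diagonal(D, 0.0)
    return np.maximum(D, 0.0)  # guard against tiny negative round-off

def high_dim_affinities(X, sigma=1.0):
    """Symmetrized Gaussian joint probabilities p_ij.

    Assumption: one shared bandwidth `sigma` (real t-SNE uses a
    per-point bandwidth found via perplexity).
    """
    n = X.shape[0]
    W = np.exp(-pairwise_sq_dists(X) / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)                   # a point is not its own neighbor
    cond = W / W.sum(axis=1, keepdims=True)    # row i holds p_{j|i}
    return (cond + cond.T) / (2.0 * n)         # symmetrize; entries sum to 1

def low_dim_affinities(Y):
    """Student's t (one degree of freedom) joint probabilities q_ij."""
    W = 1.0 / (1.0 + pairwise_sq_dists(Y))     # heavy-tailed kernel
    np.fill_diagonal(W, 0.0)
    return W / W.sum()                         # normalize over all pairs

def kl_divergence(P, Q, eps=1e-12):
    """KL(P || Q) over entries where P is positive."""
    mask = P > 0
    return float(np.sum(P[mask] * np.log(P[mask] / np.maximum(Q[mask], eps))))
```

To see the heavy tail directly, compare the unnormalized kernels at the same distance d: the Gaussian kernel `np.exp(-d**2 / 2)` falls off far faster than the Student's t kernel `1 / (1 + d**2)`, so the t kernel still assigns noticeable similarity to points that the Gaussian treats as essentially unrelated.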