In this section, we will jump into the packtml code base and see how we can implement KNN from scratch. We'll start by revisiting the classic algorithm we covered in the last section, and then we'll look at the actual Python code, which has some implementation changes.
Recall the archetypal KNN algorithm. An efficient implementation pre-computes the distances and stores them in a special heap. Of course, as with most things in computer science, there's the clever way and then there's the easy-to-read way. We're going to do things a bit differently in an effort to maximize readability, but it's the same fundamental algorithm.
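To make the classic algorithm concrete before we open the real source, here is a minimal sketch of it using only the Python standard library. This is not the packtml implementation; the function name knn_predict, its parameters, and the toy data are all made up for illustration. It computes the Euclidean distance from the query to every training point, uses a heap to pull out the k closest, and takes a majority vote over their labels.

```python
import heapq
import math
from collections import Counter


def knn_predict(X_train, y_train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    Hypothetical helper for illustration only; not part of packtml.
    """
    # Lazily pair each training label with its Euclidean distance to the query
    distances = (
        (math.dist(row, query), label)
        for row, label in zip(X_train, y_train)
    )
    # heapq.nsmallest returns the k pairs with the smallest distances
    nearest = heapq.nsmallest(k, distances, key=lambda pair: pair[0])
    # Count the labels of those k neighbors and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data (made up): two well-separated groups of 2-D points
X_train = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
           (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
y_train = ["a", "a", "a", "b", "b", "b"]

prediction = knn_predict(X_train, y_train, (1.1, 1.0), k=3)
```

Here the heap only does the selection step, which keeps the sketch short. A version that has to answer many queries would typically vectorize the distance computation instead.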
We've got two files we want to look at. The first is the source code in the packtml Python package. Second, we're going to look at an example of KNN applied to the iris dataset. Let's go ahead and jump over to PyCharm, where there are two files open. Inside of the clustering submodule, we have the knn.py file...