http://scikit-learn.org/stable/tutorial/index.html
The scikit-learn documentation includes a series of tutorials on data mining. They range from basic introductions that use toy datasets to comprehensive tutorials on techniques used in recent research.
The tutorials here are very comprehensive and will take quite a while to get through, but they are well worth the effort.
http://ipython.org/ipython-doc/1/interactive/public_server.html
The IPython Notebook is a powerful tool. It can be extended in many ways, and one of those is to create a server to run your Notebooks, separately from your main computer. This is very useful if you use a low-power main computer, such as a small laptop, but have more powerful computers at your disposal. In addition, you can set up nodes to perform parallelized computations.
More datasets are available at:
http://archive.ics.uci.edu/ml/
There are many datasets available on the Internet, from a number of different sources. These include academic, commercial, and government datasets. A collection of well-labelled datasets is available at the UCI Machine Learning Repository, which is one of the best places to find datasets for testing your algorithms.
Try out the OneR algorithm with some of these different datasets.
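As a starting point for that experiment, the following is a minimal sketch of OneR in plain Python. It is not necessarily the implementation used elsewhere in this text: the function names (train_feature, train_oner, predict) and the toy data are hypothetical and exist only for illustration. It assumes every feature is already categorical, so continuous columns in a downloaded dataset would need to be discretized first (for example, by thresholding on the column mean).

```python
from collections import Counter, defaultdict


def train_feature(X, y, feature):
    """Build a value -> most-frequent-class rule for one feature.

    Returns the rule and the number of training samples it misclassifies.
    """
    counts = defaultdict(Counter)
    for sample, label in zip(X, y):
        counts[sample[feature]][label] += 1

    predictors = {}
    errors = 0
    for value, class_counts in counts.items():
        best_class, best_count = class_counts.most_common(1)[0]
        predictors[value] = best_class
        # Every sample with this value that is not in the majority class is an error.
        errors += sum(class_counts.values()) - best_count
    return predictors, errors


def train_oner(X, y):
    """Pick the single feature whose rule makes the fewest training errors."""
    n_features = len(X[0])
    results = [train_feature(X, y, f) for f in range(n_features)]
    best_feature = min(range(n_features), key=lambda f: results[f][1])
    return best_feature, results[best_feature][0]


def predict(model, X, default=None):
    """Apply the one rule; feature values unseen in training map to `default`."""
    feature, predictors = model
    return [predictors.get(sample[feature], default) for sample in X]


# Hypothetical toy data: two categorical features, two classes.
X_toy = [[0, 1], [0, 0], [1, 1], [1, 0], [1, 1], [0, 0]]
y_toy = ["a", "a", "b", "b", "b", "a"]

model = train_oner(X_toy, y_toy)
print("Chosen feature:", model[0])
print("Predictions:", predict(model, [[1, 0], [0, 1]]))
```

To try a different dataset, load its rows into X as lists of categorical values and its class labels into y, then hold out part of the data so that accuracy is measured on samples the rule was not trained on.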