If you are developing your machine learning application on Windows using Eclipse (as a Maven project, of course), you will probably run into a problem, since Spark expects a Hadoop runtime environment on Windows as well.
More specifically, suppose you are running a Spark project written in Java whose main class is JavaNaiveBayes_ML.java; you will then hit an IOException saying:
16/10/04 11:59:52 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
The reason is that Hadoop is developed primarily for Linux. If you develop your Spark applications on the Windows platform, a bridge is required to provide the Hadoop runtime environment that Spark depends on, and winutils.exe plays that role.
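The `null` in the error message (`null\bin\winutils.exe`) is telling: Hadoop builds the path to winutils.exe from the `hadoop.home.dir` system property (falling back to the `HADOOP_HOME` environment variable), and when neither is set the prefix comes out as `null`. A minimal sketch of setting that property from Java, assuming a hypothetical install directory `C:\hadoop` that contains `bin\winutils.exe`:

```java
public class HadoopHomeFix {

    // Point Hadoop at a local directory containing bin\winutils.exe.
    // "C:\\hadoop" below is a hypothetical location; adjust it to wherever
    // you placed winutils.exe on your machine.
    static String configureHadoopHome(String installDir) {
        System.setProperty("hadoop.home.dir", installDir);
        return System.getProperty("hadoop.home.dir");
    }

    public static void main(String[] args) {
        // This must run BEFORE creating the SparkContext/SparkSession,
        // so that Hadoop's shell utilities can resolve winutils.exe.
        System.out.println(configureHadoopHome("C:\\hadoop"));
    }
}
```

Setting `HADOOP_HOME` as a Windows environment variable achieves the same thing without touching the code; the in-code property is just convenient while experimenting in Eclipse.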
So how do we get rid of this problem? The solution is...