We will use the diabetes dataset that was constructed in the last chapter. Some of the other decision tree examples also need the stop and frisk dataset, which you can obtain from the following URL: http://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page.
Select the 2015 CSV zip archive, download it, and extract the files to the project directory, e.g., C:/PracticalPredictiveAnalytics/Data, and name the file
Databricks provides a simple user interface that lets you upload a file to the Databricks HDFS filesystem. Alternatively, you can load the file to Amazon Web Services (AWS) and read it directly through the Databricks API.
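Whichever route you choose, the CSV first has to be extracted from the downloaded archive. The following is a minimal Python sketch of that local step, not part of the original workflow. The function name `extract_csvs` and the archive name `stop-and-frisk-2015.zip` are hypothetical; only the project directory comes from the text, so substitute the name of the file you actually downloaded.

```python
import zipfile
from pathlib import Path


def extract_csvs(archive_path, target_dir):
    """Extract every .csv member of a zip archive into target_dir.

    Returns the paths of the extracted files, sorted by name.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(archive_path) as archive:
        for member in archive.namelist():
            # Skip documentation or other non-CSV members of the archive.
            if member.lower().endswith(".csv"):
                extracted.append(Path(archive.extract(member, path=target)))
    return sorted(extracted)


if __name__ == "__main__":
    # Project directory from the text; the archive name is an assumption.
    data_dir = Path("C:/PracticalPredictiveAnalytics/Data")
    archive = data_dir / "stop-and-frisk-2015.zip"
    if archive.exists():
        for csv_path in extract_csvs(archive, data_dir):
            print(csv_path, csv_path.stat().st_size, "bytes")
    else:
        print("Archive not found:", archive)
```

Printing the size of each extracted file is a quick check that the download completed before you upload the CSV to Databricks.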
- Switch to the Databricks application, select Tables, and then Data Import. Note that in some versions of Databricks this is embedded under the Data menu: select "Tables", and then click the +.
You may be prompted to create a new...