We will use the diabetes dataset that was constructed in the last chapter. Some of the other decision tree examples also need the stop and frisk dataset, which you can obtain from the following URL: http://www1.nyc.gov/site/nypd/stats/reports-analysis/stopfrisk.page.
Select the 2015 CSV zip archive, download it, and extract the files to the project's data directory, e.g., C:/PracticalPredictiveAnalytics/Data. Name the extracted file 2015_sqf_csv.
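If you prefer to script the extraction step rather than do it by hand, the following is a minimal sketch in Python. It assumes the archive has already been downloaded and that it contains at least one CSV file. The function name `extract_csv` and the archive path in the usage note are hypothetical, chosen only for illustration; the target name 2015_sqf_csv and the data directory follow the text above.

```python
import shutil
import zipfile
from pathlib import Path


def extract_csv(zip_path, data_dir, target_name="2015_sqf_csv"):
    """Copy the first CSV member of zip_path into data_dir as target_name.

    extract_csv is a hypothetical helper, not part of any library.
    """
    data_dir = Path(data_dir)
    # Create the data directory if it does not exist yet
    data_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        # Keep only the members that look like CSV files
        csv_members = [m for m in archive.namelist() if m.lower().endswith(".csv")]
        if not csv_members:
            raise ValueError(f"no CSV file found in {zip_path}")

        target = data_dir / target_name
        # Stream the member to disk under the new name
        with archive.open(csv_members[0]) as src, open(target, "wb") as dst:
            shutil.copyfileobj(src, dst)

    return target
```

You could then call it as `extract_csv("C:/Downloads/sqf-2015-csv.zip", "C:/PracticalPredictiveAnalytics/Data")`, where the archive path is an assumed download location that you should replace with your own.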
Databricks provides a simple user interface that allows you to load a file to the Databricks HDFS filesystem. Alternatively, you can load the file to Amazon Web Services (AWS) and read it directly through the Databricks API.
- Switch to the Databricks application, select Tables, and then Data Import. Note that in some of the versions of Databricks this is embedded under the Data menu: Select "Tables", and then click the +. You may be prompted to create a new...