Python has gradually established itself as a de facto tool for data science. It has a command-line interface and decent visualization via matplotlib and ggplot, which is based on R's ggplot2. Recently, Wes McKinney, the creator of Pandas, the time-series data-analysis package, joined Cloudera to pave the way for Python in big data.
Python is usually part of the default operating system installation. Spark requires version 2.7.0+.
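To confirm that an existing interpreter satisfies the 2.7.0+ requirement, a quick check along the following lines may help. This is a minimal sketch: the helper name meets_minimum is hypothetical, and it tests only the lower bound stated above, not whether a particular Spark release supports a given newer Python version.

```python
import sys

# Minimum version stated in the text; adjust if your Spark release differs.
MIN_VERSION = (2, 7, 0)


def meets_minimum(version_info=None, minimum=MIN_VERSION):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only (major, minor, micro) so release-level fields are ignored.
    return tuple(version_info[:3]) >= tuple(minimum)


if __name__ == "__main__":
    print("Python %d.%d.%d" % tuple(sys.version_info[:3]))
    print("meets minimum" if meets_minimum() else "older than 2.7.0")
```

The script is written to run unchanged under both Python 2.7 and Python 3, so it can be executed with whichever interpreter is first on the PATH.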
If you don't have Python on Mac OS, I recommend installing the Homebrew package manager from http://brew.sh and then using it to install Python:
[akozlov@Alexanders-MacBook-Pro spark(master)]$ ruby -e "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install)"
==> This script will install:
/usr/local/bin/brew
/usr/local/Library/...
/usr/local/share/man/man1/brew.1
…
[akozlov@Alexanders-MacBook-Pro spark(master)]$ brew install python
…
Otherwise, on a Unix-like system, Python can be compiled from the source distribution:
$ export PYTHON_VERSION=2.7.11
$ wget...