Learning Data Mining with Python

Index

A

  • access keys
    • about / Downloading data from a social network
  • accuracy
    • improving, dictionary used / Improving accuracy using a dictionary
  • activation function
    • about / Artificial neural networks
  • Adult dataset
    • URL / Representing reality in models
  • Advertisements dataset
    • URL / Feature creation
  • affinity analysis
    • example / A simple affinity analysis example
    • defining / What is affinity analysis?
    • product recommendations / Product recommendations
    • dataset, loading with NumPy / Loading the dataset with NumPy
    • ranking of rules, implementing / Implementing a simple ranking of rules
    • ranking, to find best rules / Ranking to find the best rules
    • about / Affinity analysis
    • algorithms / Algorithms for affinity analysis
    • parameters, selecting / Choosing parameters
  • Amazon S3 console
    • URL / Training on Amazon's EMR infrastructure
  • API endpoint
    • URL / Using a Web API to get data
  • application
    • defining / Application
    • word counts, extracting / Extracting word counts
    • dictionaries, converting to matrix / Converting dictionaries to a matrix
    • Naive Bayes classifier, training / Training the Naive Bayes classifier
    • about / Application, Application
    • data, obtaining / Getting the data, Getting the data
    • neural network, creating / Creating the neural network
    • neural network, training with training dataset / Putting it all together
    • Naive Bayes algorithm / Naive Bayes prediction
  • apps, Twitter account
    • URL / Downloading data from a social network
  • Apriori algorithm / The Apriori algorithm
  • Apriori implementation
    • about / The Apriori implementation
    • Apriori algorithm / The Apriori algorithm
    • defining / Implementation
  • arbitrary websites
    • text, extracting from / Extracting text from arbitrary websites, Putting it all together
    • stories, finding / Finding the stories in arbitrary websites
    • data mining, using / Putting it all together
    • nodes, ignoring / Putting it all together
    • HTML file, parsing / Putting it all together
  • Artificial Neural Networks
    • about / Artificial neural networks
  • association rules
    • extracting / Extracting association rules
    • evaluating / Evaluation
  • authorship analysis
    • defining / Attributing documents to authors
    • applications / Applications and use cases
    • use cases / Applications and use cases
    • about / Applications and use cases
    • authorship attribution / Attributing authorship
    • data, obtaining / Getting the data
  • authorship analysis, problems
    • authorship profiling / Attributing documents to authors
    • authorship verification / Attributing documents to authors
    • authorship clustering / Attributing documents to authors
  • authorship attribution / Attributing authorship
  • AWS CLI
    • installing / Training on Amazon's EMR infrastructure
  • AWS console
    • URL / Running our code on a GPU

B

  • back propagation (backprop) algorithm / Back propagation
  • bagging
    • about / Random forests
  • BatchIterator instance
    • creating / Creating the neural network
  • Bayes' theorem / Bayes' theorem
    • about / Bayes' theorem
    • equation / Bayes' theorem
  • bias
    • about / How do ensembles work?
  • big data
    • about / Big data
    • use cases / Application scenario and goals
  • Bleeding Edge code
    • installing / Scalability with the nearest neighbor
    • URL / Scalability with the nearest neighbor
  • blog posts
    • extracting / Extracting the blog posts
  • blogs dataset
    • about / Blogs dataset

C

  • CAPTCHA
    • creating / Drawing basic CAPTCHAs
  • CAPTCHAs
    • references / Better (worse?) CAPTCHAs
    • defining / Better (worse?) CAPTCHAs
  • CART (Classification and Regression Trees)
    • about / Decision trees
  • character n-grams
    • about / Character n-grams
    • extracting / Extracting character n-grams
  • CIFAR-10
    • about / Application scenario and goals
    • URL / Application scenario and goals
  • class
    • about / A simple classification example
  • classification
    • example / A simple classification example
    • about / What is classification?
    • examples / What is classification?
    • dataset, loading / Loading and preparing the dataset
    • dataset, preparing / Loading and preparing the dataset
    • OneR algorithm, implementing / Implementing the OneR algorithm
    • algorithm, testing / Testing the algorithm
  • classifiers
    • comparing / Comparing classifiers
  • closed problem
    • about / Attributing authorship
  • cluster evaluation
    • URL / Evaluating the results
  • clustering
    • about / Grouping news articles
  • coassociation matrix
    • defining / Evidence accumulation
  • complex algorithms
    • references / More complex algorithms
  • complex features
    • references / More complex features
  • confidence
    • about / Implementing a simple ranking of rules
    • computing / Implementing a simple ranking of rules
  • connected components
    • about / Connected components
  • Cosine distance
    • about / Distance metrics
  • Coursera
    • about / More resources
    • references / More resources
  • Coval font, Open Font Library
    • URL / Drawing basic CAPTCHAs
  • CPU
    • defining / When to use GPUs for computation
  • cross-validation framework
    • defining / Running the algorithm
  • CSV (Comma Separated Values)
    • about / Collecting the data

D

  • data, blogging
    • URL / Getting the data
  • data, Corpus
    • URL / Getting the data
  • DataFrame
    • about / Using pandas to load the dataset
  • data mining
    • defining / Introducing data mining
  • dataset
    • loading / Loading the dataset, Loading the dataset, An introduction to Lasagne
    • data, collecting / Collecting the data
    • URL / Collecting the data
    • loading, pandas used / Using pandas to load the dataset
    • cleaning up / Cleaning up the dataset
    • new features, extracting / Extracting new features
    • classifying, with existing model / Classifying with an existing model
    • follower information, obtaining from Twitter / Getting follower information from Twitter
    • network, building / Building the network
    • graph, creating / Creating a graph
    • Similarity graph, creating / Creating a similarity graph
    • creating / Creating the dataset
    • CAPTCHAs, drawing / Drawing basic CAPTCHAs
    • image, splitting into individual letters / Splitting the image into individual letters
    • training dataset, creating / Creating a training dataset
    • training dataset, adjusting to methodology / Adjusting our training dataset to our methodology
  • datasets
    • about / Introducing data mining
    • samples / Introducing data mining
    • features / Introducing data mining
    • example / Introducing data mining
    • URL / Obtaining the dataset, Extending the IPython Notebook
    • references / New datasets
  • decision tree implementation
    • min_samples_split / Parameters in decision trees
    • min_samples_leaf / Parameters in decision trees
  • decision trees
    • about / Decision trees
    • parameters / Parameters in decision trees
    • Gini impurity / Parameters in decision trees
    • Information gain / Parameters in decision trees
    • using / Using decision trees
  • dictionary
    • used, for improving accuracy / Improving accuracy using a dictionary
    • ranking mechanisms, for words / Ranking mechanisms for words
    • improved prediction function, testing / Putting it all together
  • DictVectorizer class
    • about / Converting dictionaries to a matrix
  • disambiguation
    • about / Disambiguation
    • data, downloading from social network / Downloading data from a social network
    • dataset, loading / Loading and classifying the dataset
    • dataset, classifying / Loading and classifying the dataset
    • replicable dataset, creating from Twitter / Creating a replicable dataset from Twitter
  • discretization
    • about / Common feature patterns
  • discretization algorithm
    • defining / Loading and preparing the dataset
  • documents
    • attributing, to authors / Attributing documents to authors

E

  • EC2 service console
    • URL / Running our code on a GPU
  • Eclat algorithm
    • about / Algorithms for affinity analysis
    • URL / The Eclat algorithm
    • implementing / The Eclat algorithm
  • Elastic MapReduce (EMR)
    • about / Training on Amazon's EMR infrastructure
  • Enron dataset
    • using / Using the Enron dataset
    • accessing / Accessing the Enron dataset
    • URL / Accessing the Enron dataset
    • dataset loader, creating / Creating a dataset loader
    • existing parameter space, using / Putting it all together
    • classifier, using / Putting it all together
    • evaluation / Evaluation
  • ensembles
    • clustering / Clustering ensembles
    • evidence accumulation / Evidence accumulation
    • working / How it works
    • implementing / Implementation
  • environment
    • setting up / Setting up the environment
  • epochs
    • about / Back propagation
  • Euclidean distance
    • about / Distance metrics
  • evaluation, of clustering algorithms
    • references / Evaluation
  • Evidence Accumulation Clustering (EAC)
    • about / Evidence accumulation
    • defining / Evidence accumulation
  • Excel, pandas
    • URL / More on pandas

F

  • f1-score
    • about / Evaluation using the F1-score
    • computing / Evaluation using the F1-score
    • using / Evaluation using the F1-score
  • feature-based normalization
    • about / Standard preprocessing
  • feature creation
    • about / Feature creation
    • Principal Component Analysis (PCA) / Principal Component Analysis
  • feature extraction
    • about / Feature extraction
    • reality, representing in models / Representing reality in models
    • common feature patterns / Common feature patterns
    • good features, creating / Creating good features
  • features, dataset
    • URL / More complex pipelines
  • feature selection
    • about / Feature selection
    • best individual features, selecting / Selecting the best individual features
  • feed-forward neural network
    • about / An introduction to neural networks
  • filename, data
    • Blogger ID / Getting the data
    • Gender / Getting the data
    • Age / Getting the data
    • Industry / Getting the data
    • Star Sign / Getting the data
  • FP-growth algorithm
    • about / Algorithms for affinity analysis
  • frequent itemsets
    • about / Algorithms for affinity analysis
  • functions, transformer
    • fit() / The transformer API
    • transform() / The transformer API
  • function words
    • about / Function words
    • counting / Counting function words
    • classifying with / Classifying with function words

G

  • GPU
    • using, for computation / When to use GPUs for computation
    • benefits / When to use GPUs for computation
    • avenues, defining / When to use GPUs for computation
    • code, running on / Running our code on a GPU
  • GPU optimization
    • about / GPU optimization
  • graph
    • creating / Creating a graph
  • gzip
    • about / Accessing the Enron dataset

H

  • Hadoop
    • about / Hadoop MapReduce
    • Distributed File System (HDFS) / Hadoop MapReduce
    • YARN / Hadoop MapReduce
    • Pig / Hadoop MapReduce
    • Hive / Hadoop MapReduce
    • HBase / Hadoop MapReduce
    • courses / Courses on Hadoop
  • Hadoop MapReduce
    • about / Hadoop MapReduce
  • hash function
    • about / Finding the stories in arbitrary websites
  • hidden layer
    • about / An introduction to neural networks
    • creating / An introduction to Lasagne
  • hierarchical clustering
    • about / Evidence accumulation

I

  • image
    • extracting / Application scenario and goals
  • image datasets
    • URL / Mahotas
  • input layer
    • about / An introduction to neural networks
  • installation instructions, scikit-learn
    • URL / Installing scikit-learn
  • instructions, AWS CLI
    • URL / Training on Amazon's EMR infrastructure
  • intra-cluster distance
    • about / Optimizing criteria
  • Ionosphere
    • about / Loading the dataset
    • URL / Loading the dataset
  • Ionosphere Nearest Neighbor
    • about / Loading the dataset
  • IPython
    • installing / Installing IPython
    • URL / Installing IPython
  • IPython Notebook
    • creating / Downloading data from a social network
    • URL / Extending the IPython Notebook
    • using / Using Python and the IPython Notebook
  • Iris Setosa / Loading and preparing the dataset
  • Iris Versicolour / Loading and preparing the dataset
  • Iris Virginica / Loading and preparing the dataset

J

  • Jaccard Similarity
    • about / Creating a similarity graph
  • jQuery library
    • about / Loading and classifying the dataset
  • JSON
    • about / Loading and classifying the dataset
    • and dataset, comparing / Loading and classifying the dataset

K

  • k-means algorithm
    • about / The k-means algorithm
    • assignment phase / The k-means algorithm
    • updating phase / The k-means algorithm
  • Kaggle
    • URL / More resources
    • about / More resources
  • karma
    • about / Reddit as a data source
  • Keras
    • URL / Keras and Pylearn2
  • kernel
    • about / Loading and classifying the dataset
  • kernel parameter
    • about / Kernels
  • kernels / Kernels

L

  • Lasagne
    • about / An introduction to Lasagne
    • URL / An introduction to Lasagne
  • Levenshtein edit distance
    • about / Ranking mechanisms for words
    • computing / Ranking mechanisms for words
  • Locality-Sensitive Hashing (LSH)
    • about / Scalability with the nearest neighbor
  • local n-grams
    • references / Local n-grams
    • about / Local n-grams
  • local optima
    • about / Back propagation
  • log probabilities
    • using / Putting it all together

M

  • machine-learning workflow
    • training / Testing the algorithm
    • testing / Testing the algorithm
  • Mahotas
    • about / Mahotas
    • references / Mahotas
  • Manhattan distance
    • about / Distance metrics
  • MapReduce
    • about / MapReduce
    • defining / Intuition
    • WordCount example / A word count example
    • Hadoop MapReduce / Hadoop MapReduce
  • matplotlib
    • URL / scikit-learn estimators
  • MD5 algorithm
    • using / Finding the stories in arbitrary websites
  • metadata
    • about / Disambiguation
  • MiniBatchKMeans
    • about / Implementation
  • Minimum Spanning Tree (MST)
    • about / Evidence accumulation
    • computing / Evidence accumulation
  • movie recommendation problem
    • about / The movie recommendation problem
    • dataset, obtaining / Obtaining the dataset
    • loading, with pandas / Loading with pandas
    • sparse data formats / Sparse data formats
  • mrjob
    • URL / Training on Amazon's EMR infrastructure
  • mrjob package / The mrjob package
  • multiple SVMs
    • creating / Classifying with SVMs

N

  • n-gram
    • about / Character n-grams
  • n-grams
    • about / N-grams
    • disadvantages / N-grams
    • advantages / N-grams
  • Naive Bayes
    • about / Naive Bayes
    • Bayes' theorem / Bayes' theorem
    • algorithm / Naive Bayes algorithm
    • working / How it works
  • Naive Bayes algorithm
    • mrjob package / The mrjob package
    • blog posts, extracting / Extracting the blog posts
    • Naive Bayes model, training / Training Naive Bayes
    • classifier, running / Putting it all together
    • Amazon's EMR infrastructure, training / Training on Amazon's EMR infrastructure
  • Naive Bayes model
    • training / Training Naive Bayes
  • NaN (Not a Number)
    • about / Feature creation
  • National Basketball Association (NBA)
    • about / Loading the dataset
    • URL / Collecting the data
  • Natural Language Toolkit (NLTK)
    • about / Bag-of-words
  • nearest neighbor
    • about / scikit-learn estimators
  • nearest neighbor algorithm
    • URL / Scalability with the nearest neighbor
  • Nearest neighbors
    • about / Nearest neighbors
  • network
    • building / Building the network
  • networks
    • defining / Deeper networks
  • NetworkX
    • URL / Creating a similarity graph, NetworkX
    • defining / NetworkX
  • NetworkX package
    • about / Creating a graph
  • neural network
    • training / Training and classifying
    • classifying / Training and classifying
    • back propagation (backprop) algorithm / Back propagation
    • words, predicting / Predicting words
  • neural network layers, Lasagne
    • network-in-network layers / An introduction to Lasagne
    • dropout layers / An introduction to Lasagne
    • noise layers / An introduction to Lasagne
  • neural networks
    • about / An introduction to neural networks, scikit-learn estimators, Artificial neural networks, Deep neural networks
    • training / Deep neural networks
    • defining / Intuition
    • implementing / Implementation
    • Theano, defining / An introduction to Theano
    • Lasagne, defining / An introduction to Lasagne
    • implementing, with nolearn / Implementing neural networks with nolearn
    • URL / More resources
  • neurons
    • about / Artificial neural networks
  • news articles
    • obtaining / Obtaining news articles
    • web API used, for obtaining data / Using a Web API to get data
    • Reddit, as data source / Reddit as a data source
    • data, obtaining / Getting the data
    • clustering / Grouping news articles
    • k-means algorithm / The k-means algorithm
    • results, evaluating / Evaluating the results
    • topic information, extracting from clusters / Extracting topic information from clusters
    • clustering algorithms, using as transformers / Using clustering algorithms as transformers
  • NLTK
    • references / Natural language processing and part-of-speech tagging
  • NLTK installation instructions
    • URL / Application
  • noise
    • adding / Adding noise
  • nolearn package
    • neural networks, implementing with / Implementing neural networks with nolearn
  • nonprogrammers, Python resources for
    • URL / Installing Python
  • n_neighbors
    • about / Setting parameters

O

  • object classification
    • about / Object classification
  • one-versus-all classifier
    • creating / Classifying with SVMs
  • OneR
    • about / Implementing the OneR algorithm
  • online learning
    • about / Online learning
    • defining / An introduction to online learning
    • implementing / Implementation
  • ordinal
    • about / Common feature patterns
  • output layer
    • about / An introduction to neural networks
  • overfitting
    • about / Testing the algorithm

P

  • pagination
    • about / Getting follower information from Twitter
  • pandas
    • URL / Collecting the data, More on pandas
    • references / More on pandas
  • pandas (Python Data Analysis)
    • about / Collecting the data
  • pandas.read_csv function
    • about / Cleaning up the dataset
  • pandas documentation
    • URL / Engineering new features
  • parameters, ensemble process
    • n_estimators / Parameters in Random forests
    • oob_score / Parameters in Random forests
    • n_jobs / Parameters in Random forests
  • petal length / Loading and preparing the dataset
  • petal width / Loading and preparing the dataset
  • pip
    • about / Installing Python, Creating a graph
  • pipeline
    • creating / Putting it all together, Application
    • NLTKBOW transformer / Putting it all together
    • DictVectorizer transformer / Putting it all together
    • BernoulliNB classifier / Putting it all together
  • pipelines
    • about / Pipelines
    • URL / More complex pipelines
  • precision
    • about / Evaluation using the F1-score
  • preprocessing, using pipelines
    • about / Preprocessing using pipelines
    • features / Preprocessing using pipelines
    • features, of animal / Preprocessing using pipelines
    • example / An example
    • standard preprocessing / Standard preprocessing
    • workflow, creating / Putting it all together
  • pricing alerts
    • URL / Training on Amazon's EMR infrastructure
  • Principal Component Analysis (PCA)
    • about / Principal Component Analysis
  • prior belief
    • about / Bayes' theorem
  • probabilistic graphical models
    • URL / More resources
  • probabilities
    • computing / Putting it all together
  • programmers, Python resources for
    • URL / Installing Python
  • Project Gutenberg
    • URL / Getting the data
  • Pydoop
    • about / Pydoop
    • URL / Pydoop
  • Pylearn2
    • about / Keras and Pylearn2
    • URL / Keras and Pylearn2
  • Python
    • using / Using Python and the IPython Notebook
    • installing / Installing Python
    • URL / Installing Python
    • defining / Disambiguation
  • Python 3.4
    • about / Installing Python

Q

  • quotequail package
    • about / Creating a dataset loader

R

  • RandomForestClassifier
    • about / Parameters in Random forests
  • random forests
    • about / scikit-learn estimators
    • defining / Random forests
    • ensembles, working / How do ensembles work?
    • parameters / Parameters in Random forests
    • applying / Applying Random forests
    • new features, engineering / Engineering new features
  • README
    • about / Extracting association rules
  • real-time clusterings
    • about / Real-time clusterings
  • reasons, feature selection
    • complexity, reducing / Feature selection
    • noise, reducing / Feature selection
    • readable models, creating / Feature selection
  • recall
    • about / Evaluation using the F1-score
  • recommendation engine
    • building / Recommendation engine
    • URL / Recommendation engine
  • Reddit
    • about / Obtaining news articles, Using a Web API to get data, Reddit as a data source
    • references / Using a Web API to get data
    • URL / Reddit as a data source
  • regularization
    • URL / Principal Component Analysis
  • reinforcement learning
    • URL / Reinforcement learning
  • RESTful interface (Representational State Transfer)
    • about / Using a Web API to get data
  • rules
    • support / Implementing a simple ranking of rules
    • confidence / Implementing a simple ranking of rules
    • finding / Ranking to find the best rules

S

  • sample size
    • increasing / Increasing the sample size
  • scikit-learn
    • installing / Installing scikit-learn
    • URL / Installing scikit-learn
  • scikit-learn estimators
    • defining / scikit-learn estimators
    • fit() / scikit-learn estimators
    • predict() / scikit-learn estimators
    • Nearest neighbors / Nearest neighbors
    • distance metrics / Distance metrics
    • dataset, loading / Loading the dataset
    • standard workflow, defining / Moving towards a standard workflow
    • fit() function / Moving towards a standard workflow
    • predict() function / Moving towards a standard workflow
    • algorithm, running / Running the algorithm
    • parameters, setting / Setting parameters
  • scikit-learn package
    • references / Evaluation
  • Scikit-learn tutorials
    • URL / Scikit-learn tutorials
  • self-posts
    • about / Reddit as a data source
  • sepal length / Loading and preparing the dataset
  • sepal width / Loading and preparing the dataset
  • shapes, adding to CAPTCHAs
    • URL / Better (worse?) CAPTCHAs
  • Silhouette Coefficient
    • about / Optimizing criteria
    • computing / Optimizing criteria
    • parameters / Optimizing criteria
  • Similarity graph
    • creating / Creating a similarity graph
  • SNAP
    • URL / NetworkX
  • softmax nonlinearity
    • about / An introduction to Lasagne
  • Spam detection
    • references / Spam detection
  • spam filter
    • about / Evaluation using the F1-score
  • sparse matrix
    • about / Distance metrics
  • sparse matrix format
    • about / Sparse data formats
  • sports outcome prediction
    • about / Sports outcome prediction
    • features / Sports outcome prediction
  • stacking
    • about / Putting it all together
  • StackOverflow question
    • URL / More on pandas
  • standings
    • loading / Putting it all together
  • standings data
    • obtaining / Putting it all together
    • URL / Putting it all together
  • Stratified K Fold
    • about / Running the algorithm
  • style sheets
    • about / Extracting text from arbitrary websites
  • stylometry
    • about / Attributing documents to authors
  • subgraphs
    • finding / Finding subgraphs
    • connected components / Connected components
    • criteria, optimizing / Optimizing criteria
  • subreddits
    • about / Obtaining news articles, Reddit as a data source
  • support / Implementing a simple ranking of rules
  • support vector machines (SVM)
    • about / scikit-learn estimators
  • SVMs
    • about / Support vector machines
    • URL / Support vector machines
    • classifying with / Classifying with SVMs
    • kernels / Kernels
  • system
    • building, for taking image as input / Application scenario and goals

T

  • temporal analysis
    • about / Temporal analysis
  • text
    • about / Disambiguation
    • extracting, from arbitrary websites / Extracting text from arbitrary websites
  • text transformers
    • defining / Text transformers
    • word, counting in dataset / Bag-of-words
    • bag-of-words model / Bag-of-words
    • n-grams / N-grams
    • features / Other features
  • tf-idf
    • about / Bag-of-words
  • Theano
    • about / An introduction to Theano
    • using / An introduction to Theano
    • URL / Running our code on a GPU
  • Torch
    • URL / Keras and Pylearn2
  • train_feature_value() function
    • about / Implementing the OneR algorithm
  • transformer
    • creating / Creating your own transformer
    • API / The transformer API
    • implementing / Implementation details
    • unit testing / Unit testing
  • tutorial, Google
    • URL / Courses on Hadoop
  • tutorial, Yahoo
    • URL / Courses on Hadoop
  • tweet
    • about / Disambiguation
  • tweets
    • loading / Putting it all together
    • F1-score, used for evaluation / Evaluation using the F1-score
    • features, obtaining from models / Getting useful features from models
  • Twitter
    • follower information, obtaining from / Getting follower information from Twitter
  • Twitter account
    • URL / Downloading data from a social network
  • Twitter documentation
    • URL / Downloading data from a social network

U

  • UCI Machine Learning data repository
    • URL / Loading the dataset
  • univariate feature
    • about / Selecting the best individual features
  • unstructured format
    • about / Disambiguation
  • use cases, computer vision
    • about / Use cases

V

  • V's, big data
    • volume / Big data
    • velocity / Big data
    • variety / Big data
    • veracity / Big data
  • variance
    • about / How do ensembles work?, Principal Component Analysis
  • virtualenv
    • URL / Setting up the environment, Scalability with the nearest neighbor
  • vocabulary
    • about / Counting function words
  • Vowpal Wabbit
    • about / Vowpal Wabbit
    • URL / Vowpal Wabbit

W

  • web-based API, considerations
    • authorization methods / Using a Web API to get data
    • rate limiting / Using a Web API to get data
    • API endpoints / Using a Web API to get data
  • weight
    • about / An introduction to neural networks
  • weighted edge
    • about / Creating a similarity graph

Z

  • 7-Zip
    • URL / Accessing the Enron dataset