Practical Data Analysis

Practical Data Analysis - Second Edition

By : Hector Cuesta, Dr. Sampath Kumar

Overview of this book

Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you’ll get hands-on experience turning data into insights using machine learning techniques. We will perform data-driven processing for several types of data, such as text, images, social network graphs, documents, and time series, showing you how to implement large-scale data processing with MongoDB and Apache Spark.
The algorithm


We use the function list_words() to get a list of unique words with more than three characters in lower case:

def list_words(text): 
    words = [] 
    words_tmp = text.lower().split() 
    for w in words_tmp: 
        if w not in words and len(w) > 3: 
            words.append(w) 
    return words 
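
A quick way to see what list_words() returns is to call it on a short sentence. The sketch below repeats the function so that it runs on its own; the sample sentences are made up for illustration. Because split() only breaks on whitespace, punctuation stays attached to the word it follows, which may or may not be what a given data set needs.

```python
def list_words(text):
    # Same logic as the listing above, repeated so this example is self-contained.
    words = []
    words_tmp = text.lower().split()
    for w in words_tmp:
        if w not in words and len(w) > 3:
            words.append(w)
    return words


# Illustrative input (not from the book).
sample = "The quick brown fox jumps over the lazy brown dog"
print(list_words(sample))
# ['quick', 'brown', 'jumps', 'over', 'lazy']

# Punctuation is kept, so "offer," and "offer!" count as different words.
print(list_words("Free offer, FREE offer!"))
# ['free', 'offer,', 'offer!']
```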

Tip

For a more advanced term-document matrix, we can use the Python textmining package from: https://pypi.python.org/pypi/textmining/1.0
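
To make the idea of a term-document matrix concrete, here is a minimal sketch in plain Python. It does not use the textmining package or its API; the helper name term_document_matrix and the sample documents are assumptions made for illustration.

```python
from collections import Counter


def term_document_matrix(docs):
    """Hypothetical helper: return (sorted vocabulary, one row of counts per document)."""
    # Count the whitespace-separated, lower-cased terms of each document.
    counts = [Counter(doc.lower().split()) for doc in docs]
    # The vocabulary is the sorted union of all terms seen in any document.
    vocabulary = sorted(set().union(*counts))
    # Each row holds the count of every vocabulary term in one document.
    rows = [[c[term] for term in vocabulary] for c in counts]
    return vocabulary, rows


docs = ["data beats opinion", "data about data"]
vocabulary, rows = term_document_matrix(docs)
print(vocabulary)  # ['about', 'beats', 'data', 'opinion']
print(rows)        # [[0, 1, 1, 1], [1, 0, 2, 0]]
```

Each column corresponds to one term and each row to one document, so rows[1][2] is the number of times "data" appears in the second document.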

The training() function creates the variables that store the data needed for classification. The c_words variable is a dictionary that maps each unique word to its number of occurrences (frequency) in each category. The c_categories variable is a dictionary that maps each category to its number of texts. Finally, c_texts and c_total_words store the total counts of texts and words, respectively:

def training(texts): 
    c_words = {}
    c_categories = {}
    c_texts = 0
    c_total_words = 0
    # add the classes to the categories
    for t in texts...
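
The counters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the book's listing: the helper name count_training_data is hypothetical, texts is assumed to be a list of (text, category) pairs, and c_total_words is assumed to count distinct words.

```python
def list_words(text):
    # Unique lower-case words longer than three characters, in first-seen order.
    words = []
    for w in text.lower().split():
        if w not in words and len(w) > 3:
            words.append(w)
    return words


def count_training_data(texts):
    """Hypothetical helper; `texts` is assumed to be a list of (text, category) pairs."""
    c_words = {}        # word -> {category: number of texts containing the word}
    c_categories = {}   # category -> number of texts
    c_texts = 0         # total number of texts
    c_total_words = 0   # number of distinct words (an assumption)
    for text, category in texts:
        c_texts += 1
        c_categories[category] = c_categories.get(category, 0) + 1
        for w in list_words(text):
            if w not in c_words:
                c_words[w] = {}
                c_total_words += 1
            c_words[w][category] = c_words[w].get(category, 0) + 1
    return c_words, c_categories, c_texts, c_total_words


# Made-up training data for illustration.
data = [
    ("cheap pills online", "spam"),
    ("meeting agenda attached", "ham"),
    ("cheap meeting rooms", "ham"),
]
c_words, c_categories, c_texts, c_total_words = count_training_data(data)
print(c_categories)      # {'spam': 1, 'ham': 2}
print(c_words["cheap"])  # {'spam': 1, 'ham': 1}
print(c_texts, c_total_words)  # 3 7
```

Because list_words() returns each word once per text, the per-category numbers here count texts that contain the word rather than raw occurrences.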