#### Overview of this book

Data Analysis with Stata

## Introducing data analytics

We analyze data every day for various reasons. Predicting events and forecasting key indicators, such as an organization's revenue, is fast becoming a major requirement in industry. Various techniques and tools can be used to analyze data. This book covers the following techniques, using Stata as the tool:

• Stata programming and data management: Before predicting anything, we need to manage and clean the data so that insights can be derived from it. Programming helps us create new variables and transform the data so that finding patterns in historical data or predicting the outcome of a given event becomes much easier.

• Data visualization: After data preparation, we need to visualize the data for the following reasons:

• To view what patterns in the data look like

• To check whether there are any outliers in the data

• To understand the data better

• To draw preliminary insights from the data

• Important statistical tests in Stata: After data visualization, you can form various hypotheses about the data based on what you observed. These hypotheses then need to be tested on the datasets to check whether the results are statistically significant and whether the hypotheses can be relied on in future situations as well.

• Linear regression in Stata: Once hypothesis testing is done, there is usually a business need to predict one of the variables, such as the revenue of a financial organization under specific conditions. Predictions of continuous variables, such as revenue, the default amount on a credit card, or the number of items sold in a given store, can be made through linear regression. Linear regression is one of the most basic and widely used prediction methods. We will go into the details of linear regression in a later chapter.

• Logistic regression in Stata: When you need to predict the outcome of a particular event along with its probability, logistic regression is a widely used and well-established method. Predicting which team will win a football or cricket match, or whether a customer will default on a loan payment, can be done using the probabilities given by logistic regression.

• Survey analysis in Stata: Understanding customer sentiment and consumer experience is one of the biggest requirements of the retail industry. The research industry also needs data about people's opinions in order to assess the effect of a certain event or the sentiments of the affected people. Both can be achieved by conducting surveys and analyzing the resulting datasets. Survey analysis can involve various subtechniques, such as factor analysis, principal component analysis, panel data analysis, and so on.

• Time series analysis in Stata: When you try to forecast a time-dependent variable with reasonable cyclic behavior or seasonality, time series analysis comes in handy. There are many time series techniques, but we will talk about Autoregressive Integrated Moving Average (ARIMA) models and the Box-Jenkins methodology used to build them. Forecasting the amount of rainfall based on the rainfall of the past 5 years is a classic time series analysis problem.

• Survival analysis in Stata: These days, many customers leave their telecom plans, healthcare plans, and so on, and join competitors. When you need to develop a churn or attrition model to check who will leave, survival analysis is a well-suited approach.
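To make the difference between the two regression techniques listed above more concrete, their model forms can be sketched as follows. This uses standard textbook notation rather than notation taken from this book: $y$ is the outcome, $x_1, \dots, x_k$ are the predictor variables, $\beta_0, \dots, \beta_k$ are the coefficients to be estimated, and $\varepsilon$ is the error term.

$$
y = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k + \varepsilon
$$

$$
\Pr(y = 1 \mid x_1, \dots, x_k) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
$$

The first equation is linear regression, which predicts the value of a continuous outcome directly. The second is logistic regression, which passes the same linear combination of predictors through the logistic function so that the result always lies between 0 and 1 and can be read as the probability of the event.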