Statistics for Data Analytics [Week 1-4]

Correlation and Regression Analysis [Week 3]

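Correlation measures how strongly two variables move together; regression models that relationship so that one variable can be predicted from another. Typical questions: Is there a relationship between marketing spend and sales revenue? Can we predict one variable based on another?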

Concepts to Master:

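  • Correlation Coefficient (r): Measures the strength and direction of the relationship between two variables, on a scale from -1 (perfect negative) to +1 (perfect positive). A value near 0 means little or no association of the kind being measured.

    • Pearson Correlation: Measures the strength and direction of the linear relationship between two continuous variables.
    • Spearman Rank Correlation: Measures the strength and direction of the monotonic association between two variables. It is computed on the ranks of the values rather than the raw values, so it suits ordinal data and relationships that consistently rise or fall without being linear.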

    Real-life Example:
    You might want to know if there’s a correlation between the amount of money spent on Facebook ads and the number of website visits. A high positive correlation would indicate that website visits tend to increase as ad spending increases. Correlation alone does not show that the ad spending causes the extra visits.
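
    Code Sketch:
    The two coefficients above can be computed by hand in a few lines of Python. This is a minimal sketch using only the standard library. The ad-spend and visit figures are made-up illustrative values, and the helper names (pearson, average_ranks, spearman) are hypothetical. In practice, scipy.stats.pearsonr and scipy.stats.spearmanr are commonly used instead.

```python
import math

def pearson(x, y):
    """Pearson r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman rho: Pearson r applied to the ranks of x and y."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical data: weekly ad spend (dollars) and website visits.
ad_spend = [100, 200, 300, 400, 500]
visits = [110, 150, 260, 480, 900]

r_pearson = pearson(ad_spend, visits)
r_spearman = spearman(ad_spend, visits)
print(f"Pearson r:    {r_pearson:.3f}")
print(f"Spearman rho: {r_spearman:.3f}")
```

    In this made-up data, visits always rise with spend but not in a straight line, so the Spearman coefficient is 1 while the Pearson coefficient is somewhat lower (about 0.93).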

  • Linear Regression: A technique that models the relationship between a dependent variable (outcome) and one or more independent variables (predictors) as a linear function. With a single predictor it is called simple linear regression.

    Real-life Example:
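    A retail company might use linear regression to predict sales based on the amount spent on advertising. The fitted regression equation has the form sales = a + b × advertising spend, where a is the intercept and b is the expected change in sales for each additional unit of advertising spend. Plugging in a planned spend gives a predicted sales figure.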
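
    Code Sketch:
    To make the regression equation concrete, here is a minimal least-squares sketch in plain Python. The advertising and sales figures are invented for illustration, and fit_line is a hypothetical helper name. It assumes one predictor and an ordinary least-squares fit.

```python
def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data, both in thousands of dollars.
advertising = [10, 20, 30, 40, 50]
sales = [25, 45, 62, 85, 103]

intercept, slope = fit_line(advertising, sales)
print(f"sales = {intercept:.2f} + {slope:.2f} * advertising")

# Predict sales for a planned advertising spend of 60 (thousand dollars).
predicted = intercept + slope * 60
print(f"Predicted sales at spend 60: {predicted:.1f}")
```

    Predictions far outside the range of the observed spend (here 10 to 50) should be treated with caution, since the straight-line pattern may not hold there.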

  • Multiple Regression: Extends simple linear regression by using two or more independent variables to predict a single outcome.

    Real-life Example:
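    You could use multiple regression to predict how much a customer is likely to spend based on several factors: age, income, location, and previous purchase history. (If the outcome is a yes/no purchase decision rather than an amount, logistic regression is the usual choice.)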
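
    Code Sketch:
    The same least-squares idea extends to several predictors by solving the normal equations (X'X)b = X'y. The sketch below does this in plain Python for a continuous outcome (amount spent) with two predictors. The customer data is made up and was generated exactly from spend = 50 + 2 × age + 3 × income so that the recovered coefficients are easy to check. The names solve and fit_multiple are hypothetical. In practice a library routine such as numpy.linalg.lstsq would normally be used.

```python
def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        known = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - known) / m[r][r]
    return x

def fit_multiple(rows, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# Hypothetical customers: (age in years, income in thousands of dollars).
customers = [(25, 40), (32, 55), (41, 48), (50, 90), (58, 70), (36, 62)]
# Made-up annual spend, generated as 50 + 2*age + 3*income (no noise).
spend = [220, 279, 276, 420, 376, 308]

coef = fit_multiple(customers, spend)
print("intercept, age, income coefficients:", [round(c, 3) for c in coef])

# Predicted spend for a 45-year-old with an income of 80 (thousand dollars).
new_customer = (45, 80)
prediction = coef[0] + sum(c * v for c, v in zip(coef[1:], new_customer))
print(f"Predicted spend: {prediction:.1f}")
```

    Each coefficient is read as the expected change in the outcome for a one-unit change in that predictor while the other predictors are held fixed. Categorical factors such as location would first need to be encoded as numeric indicator variables, which this sketch leaves out.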