Simple Inferential Statistics
“Inferential statistics” is a term used to describe the use of information regarding a sample of subjects to make:
(1) Assumptions about the population at large and/or
(2) Predictions about what might happen in the future
What’s your Batting Average?
You can calculate the mean (or average) batting average of a known sample of ball players by adding up their total hits for last season and dividing by the number of the players. The mean of the players is therefore a known variable. To determine a mean of a population of those players for next season requires the data scientist to make assumptions (because their number of hits is not yet known).
The goal of inferential statistics is to do just that:
“To take what is known and make assumptions or an inference about what is not known”.
The specific procedures used to make inferences about an unknown population or unknown score can vary (depending on the type of data used and the purpose of making the inference).
The five most basic inferential procedures include:
- Factor Analysis
- Regression Analysis, and
The purpose of a T-test is to determine if a difference exists between the averages of two groups, using the means, standard deviations, and number of subjects for each group.
A factor analysis is used when an attempt is being made to break down a large data pond into different subgroups or factors looking at each question within a group of questions to determine how these questions accumulate together.
When a correlation is used you can determine the strength and direction of a relationship between two or more variables. Got example, if it is determined that a connection between a midterm test and a final exam was +.95, we could say that these two tests are strongly and directly related to each other. (In other words, a student who scored high on one would likely score high on the other).
When your data pond is much larger and the correlation less than perfect, making a prediction requires the use of the statistical regression, which is basically a formula used to determine where a score falls on a straight line.
“Meta-analysis” refers to the combining of numerous studies into one larger study. When this technique is used, each study becomes one subject in the new “meta study”. For instance, the combination of 12 studies on years in the league and batting averages would result in a Meta study with 12 subjects. “Meta-analysis” combines many studies together to determine if the results of all of them, when taken as a whole, are significant.
Hugo Cabret: It’s like a puzzle. When you put it together, something’s going to happen.