
Simple Inferential Statistics


“Inferential statistics” is a term used to describe the use of information regarding a sample of subjects to make:

(1) Assumptions about the population at large and/or

(2) Predictions about what might happen in the future

 

What’s your Batting Average?

You can calculate the mean (or average) batting average of a known sample of ball players by adding up their batting averages for last season and dividing by the number of players. The mean for those players is therefore a known value. Determining the mean for a population of those players for next season requires the data scientist to make assumptions, because their number of hits is not yet known.
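As a rough illustration of the difference between the known sample mean and an inference about the unknown population mean, the sketch below computes both. The batting averages are made-up values, and the interval uses a normal approximation (z = 1.96), which is an assumption of this example and is only a rough guide for a sample this small.

```python
import math
import statistics

# Hypothetical batting averages for a sample of players (illustrative values only).
sample_averages = [0.250, 0.300, 0.275, 0.310, 0.265]


def mean_with_interval(values, z=1.96):
    """Return the sample mean and an approximate 95% interval for the population mean.

    Assumes a normal approximation; z = 1.96 is the usual two-sided 95% value.
    """
    mean = statistics.mean(values)
    # Standard error: sample standard deviation divided by sqrt(sample size).
    std_err = statistics.stdev(values) / math.sqrt(len(values))
    return mean, (mean - z * std_err, mean + z * std_err)


sample_mean, (low, high) = mean_with_interval(sample_averages)
print(f"sample mean = {sample_mean:.3f}")
print(f"approximate 95% interval = ({low:.3f}, {high:.3f})")
```

The sample mean is a plain calculation on known data; the interval is the inferential part, a statement about where the unknown population mean plausibly lies.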


The goal of inferential statistics is to do just that: to take what is known and make assumptions or an inference about what is not known.


The specific procedures used to make inferences about an unknown population or unknown score vary, depending on the type of data used and the purpose of the inference.

Basic Procedures

The five most basic inferential procedures include:

  1. T-test
  2. ANOVA
  3. Factor Analysis
  4. Regression Analysis
  5. Meta-Analysis

T-Test

The purpose of a T-test is to determine whether the difference between the averages of two groups is statistically significant, using the means, standard deviations, and number of subjects for each group.
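A minimal sketch of that calculation, using only the three summary values per group. It assumes Welch's form of the T-test (which does not require equal variances); other variants exist. The group figures are hypothetical. The sketch stops at the t statistic and degrees of freedom; turning them into a p-value needs a t-distribution table or a statistics library.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Return Welch's t statistic and approximate degrees of freedom
    from each group's mean, standard deviation, and size."""
    # Squared standard error of each group's mean.
    v1 = sd1 ** 2 / n1
    v2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical groups: 25 players each, same spread, different mean batting average.
t_stat, dof = welch_t(0.280, 0.030, 25, 0.260, 0.030, 25)
print(f"t = {t_stat:.3f}, degrees of freedom = {dof:.1f}")
```

A larger absolute t value (for a given number of degrees of freedom) means the observed difference is less likely to be due to chance alone.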

Factor Analysis

A factor analysis is used to break a large data pond down into subgroups, or factors. It looks at each question within a group of questions to determine how the questions cluster together.

Regression Analysis

A correlation lets you determine the strength and direction of a relationship between two or more variables. For example, if the correlation between a midterm test and a final exam were found to be +.95, we could say that the two tests are strongly and directly related to each other. (In other words, a student who scored high on one would likely score high on the other.)
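To make the midterm and final example concrete, here is a small sketch that computes a Pearson correlation coefficient by hand. The scores are invented for illustration, and Pearson's r is assumed as the correlation measure since the text does not name one.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical scores for four students.
midterm = [60, 70, 80, 90]
final = [65, 72, 85, 88]

r = pearson_r(midterm, final)
print(f"r = {r:+.3f}")
```

The sign of r gives the direction of the relationship and its magnitude (between 0 and 1) gives the strength.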

When your data pond is much larger and the correlation is less than perfect, making a prediction requires statistical regression, which is basically a formula for the straight line that best fits the data and for where a predicted score falls on that line.
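The straight-line idea can be sketched with ordinary least squares, the simplest form of regression; it is assumed here because the text mentions only a straight line. The scores are hypothetical.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / ss_x
    # The fitted line always passes through the point of means.
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical midterm and final exam scores for four students.
midterm = [60, 70, 80, 90]
final = [65, 72, 85, 88]

slope, intercept = fit_line(midterm, final)
# Predict the final exam score for a student who scored 85 on the midterm.
predicted = intercept + slope * 85
print(f"final = {intercept:.2f} + {slope:.2f} * midterm")
print(f"predicted final for a midterm of 85: {predicted:.1f}")
```

The prediction is simply the point on the fitted line above the new midterm score; the weaker the correlation, the more individual scores will scatter around that line.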

Meta-Analysis

“Meta-analysis” refers to the combining of numerous studies into one larger study. When this technique is used, each study becomes one subject in the new “meta study”. For instance, combining 12 studies on years in the league and batting averages would result in a meta study with 12 subjects. Meta-analysis pools the studies to determine whether their results, taken as a whole, are significant.
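One common way to combine studies is a fixed-effect, inverse-variance weighted average, sketched below. This particular method is an assumption of the example, not something the text specifies, and the effect sizes and standard errors are invented.

```python
import math


def pooled_effect(effects, std_errors):
    """Return the fixed-effect pooled estimate and its standard error.

    Each study is weighted by the inverse of its variance, so more
    precise studies count for more.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.10, 0.20, 0.30]
std_errors = [0.05, 0.10, 0.10]

pooled, pooled_se = pooled_effect(effects, std_errors)
z = pooled / pooled_se
print(f"pooled effect = {pooled:.3f}, standard error = {pooled_se:.4f}, z = {z:.2f}")
```

Under a normal approximation, an absolute z above about 1.96 would usually be read as significant at the 5% level for the combined result.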

Play Ball!

Hugo Cabret: It’s like a puzzle. When you put it together, something’s going to happen.

 

About the Author

Mr. Miller is an IBM certified and accomplished Senior Project Leader and Application/System Architect-Developer with over 30 years of extensive applications and system design and development experience. His current role is National FPM Practice Leader. His experience includes BI, Web architecture & design, systems analysis, GUI design and testing, Database modeling and systems analysis, design, and development of Client/Server, Web and Mainframe applications and systems utilizing: Applix TM1 (including TM1 rules, TI, TM1Web and Planning Manager), dynaSight - ArcPlan, ASP, DHTML, XML, IIS, MS Visual Basic and VBA, Visual Studio, PERL, Websuite, MS SQL Server, ORACLE, SYBASE SQL Server, etc. His responsibilities have included all aspects of Windows and SQL solution development and design including: analysis; GUI (and Web site) design; data modeling; table, screen/form and script development; SQL (and remote stored procedures and triggers) development and testing; test preparation and management and training of programming staff. Other experience includes development of ETL infrastructure such as data transfer automation between mainframe (DB2, Lawson, Great Plains, etc.) systems and client/server SQL server and Web based applications and integration of enterprise applications and data sources. In addition, Mr. Miller has acted as Internet Applications Development Manager responsible for the design, development, QA and delivery of multiple Web Sites including online trading applications, warehouse process control and scheduling systems and administrative and control applications. Mr. Miller also was responsible for the design, development and administration of a Web based financial reporting system for a 450 million dollar organization, reporting directly to the CFO and his executive team. Mr. Miller has also been responsible for managing and directing multiple resources in various management roles including project and team leader, lead developer and applications development director.

Specialties include: Cognos/TM1 Design and Development, Cognos Planning, IBM SPSS and Modeler, OLAP, Visual Basic, SQL Server, Forecasting and Planning; International Application Development, Business Intelligence, Project Development.

Certifications: IBM Certified Developer - Cognos TM1 (perfect score, 100% on exam); IBM Certified Business Analyst - Cognos TM1.
