Sample size determination is the act of choosing the number of observations or replicates to include in a statistical sample. It is one of the most important features of any empirical study whose aim is to make inferences about a population from a sample. A 95% confidence level means that, across repeated samples, the interval built around your sample result would miss the true population value only about 5% of the time. For a proportion estimated at that confidence level, the margin of error (the half-width of the confidence interval) is at most roughly 1/√N, where N is the number of participants in the sample.
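The 1/√N rule of thumb can be checked against the usual normal-approximation formula for the margin of error of a proportion. The snippet below is a minimal sketch, not a definitive implementation: the function names are hypothetical, and it assumes a simple random sample and the worst case p = 0.5.

```python
from math import sqrt
from statistics import NormalDist


def margin_of_error(n, confidence=0.95, p=0.5):
    """Normal-approximation margin of error for a proportion.

    Assumes a simple random sample; p = 0.5 is the worst case
    (it gives the widest interval).
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p * (1 - p) / n)


def rule_of_thumb(n):
    """The 1/sqrt(N) shortcut quoted in the text."""
    return 1 / sqrt(n)


for n in (100, 400, 1000):
    print(n, round(margin_of_error(n), 4), round(rule_of_thumb(n), 4))
```

At 95% confidence and p = 0.5 the formula reduces to about 0.98/√N, which is why the shortcut is a slightly conservative approximation.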
Listed below are some of the reasons to run sample size calculations before data collection:
1. The granting agency or committee needs it
This is a good reason, but it is far from the only one. It helps to keep the other reasons in mind when you are tempted to skip this step or are frustrated with the calculations you are required to make. The easiest way out is to base the sample size on what is customary in the field ("we use 20 subjects per condition"), or to copy the number of subjects from a similar study ("they used 100, so we will too"). Occasionally one can get away with doing that.
Apart from funding, there are several other reasons, each with its own weight, to do sample size estimates. The calculation itself is not really time consuming, so it is worth doing; most of the time goes into working out and writing the data analysis plan on which the calculations are based.
2. Studies with low power are bad sources for basing your sample size on
It is often observed that the power of a typical published study is just under 50%. If so, such a study had only about a 50% chance of finding significant results given its sample size, effect size, and statistical test. Moreover, these are the studies that did find significant results, which is why they were published; there are also the studies that never got published because they did not have adequate power.
So it would be unwise to reuse the same sample size when building on such a study, as doing so leaves you with only about a 50% chance of replicating it with significant results. Doing your own power calculation and deriving the sample size from it is far safer.
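As an illustration of doing your own power calculation, here is a minimal sketch for comparing two group means. It relies on several assumptions that are not in the original article: a two-sided test, equal group sizes, a standardized effect size (Cohen's d), and the normal approximation to the t-test, which slightly understates the exact requirement. The function names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate subjects needed per group for a two-sample comparison."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = _STD_NORMAL.inv_cdf(power)
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def approx_power(n, effect_size, alpha=0.05):
    """Approximate power with n subjects per group.

    Ignores the negligible chance of a significant result
    in the wrong direction.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * sqrt(n / 2) - z_alpha)


# A "customary" 20 subjects per condition, medium effect (d = 0.5):
print(round(approx_power(20, 0.5), 2))
# Subjects per group needed for 80% power at the same effect size:
print(n_per_group(0.5))  # 63
```

Under these assumptions, 20 subjects per condition gives power of roughly 0.35 for a medium effect, while 80% power needs about three times as many.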
3. Not only how many participants you need, but how many you don’t need
No sensible analyst wants to spend more resources, money and energy than necessary collecting more data than is required. This matters even more when the study increases the risk or inconvenience to human or animal participants: you don't want to expose more participants than necessary to that risk.
4. Sample size calculations tell you when you are close, but not quite there
Sometimes your sample size calculations indicate that you are close, but on closer inspection you realize that you will not quite have enough subjects. Here you can adjust the study to increase its power in several ways. One option is to change the way you measure the variables to add precision, or to switch to a design that gives you a little more power.
You can also include controls that account for some of the random error. You will be happy to increase the power without increasing the sample size.
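To see why more precise measurement helps, consider how the required sample size responds when the noise in the outcome shrinks. The sketch below reuses the normal-approximation formula for a two-group comparison; the raw difference of 5 units and the standard deviations of 10 and 8 are made-up illustrative numbers, and the function name is hypothetical.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(raw_difference, sd, alpha=0.05, power=0.80):
    """Approximate subjects per group (two-sided test, normal approximation)."""
    z = NormalDist()
    effect_size = raw_difference / sd  # standardized effect (Cohen's d)
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2 * (z_total / effect_size) ** 2)


# Same expected difference, noisier vs. more precise measurement.
print(n_per_group(raw_difference=5, sd=10))  # 63
print(n_per_group(raw_difference=5, sd=8))   # 41
```

In this hypothetical case, cutting the standard deviation by a fifth reduces the requirement by about a third, with no change to the effect being studied.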
5. Sample size calculations reveal an impossible analysis
It may be last on this list, but it is not the least. Doing these calculations keeps you from wasting years and hundreds of dollars in grants or tuition pursuing an impossible analysis. If sample size calculations indicate that you need a thousand subjects to find significant results, while time, money or ethical constraints limit you to 50, it is better not to do that study. It may be painful, but it is better to feel that pain now than after a few years of work.
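One way to make this feasibility check concrete is to turn the calculation around and ask what is the smallest effect a fixed number of subjects could reliably detect. The sketch below assumes a two-group comparison with a two-sided test, 80% power, the normal approximation, and that the limit applies per group; none of these specifics come from the article, and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def min_detectable_effect(n_per_group, alpha=0.05, power=0.80):
    """Smallest standardized effect (Cohen's d) detectable with n per group."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_total * sqrt(2 / n_per_group)


# Effect size implied by "you need a thousand subjects":
print(round(min_detectable_effect(1000), 3))
# Effect size you can realistically detect when limited to 50:
print(round(min_detectable_effect(50), 3))
```

Under these assumptions, 50 subjects per group can only detect an effect more than four times larger than the one that calls for a thousand, so the study as planned would be very unlikely to succeed.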
Statisticians and market research firms make sure they have a plan to succeed before data collection begins. "Never begin data collection without calculating the necessary sample size": maybe it was they who said this, and not some wise man.
About the Author: Chirag Shivalker loves working with Hi-Tech, a company that has been thriving in the industry for more than two decades. Over that time it has helped enterprise-level clients globally in managing their data and providing analytics, ultimately empowering them to make insightful business decisions, take bold action, and execute quickly. He regularly writes about the importance of data management for data analytics and the changing landscape of the business process management industry.