Using simulation to understand properties of random variables
A central topic in statistics is determining the properties of different random variables (e.g. what are their expectations and variances?). Additionally, we often obtain results for the large-sample behavior of an estimator, but how good is that approximation in a particular finite sample? Often these properties are difficult or cumbersome to analyze theoretically.
A very common tool for obtaining numerical results in lieu of analytical ones is Monte Carlo simulation. The principle behind it is simple: the law of large numbers, which tells us that the average of many i.i.d. draws converges to the expectation.
We can approximate the properties of some random variable \(X\) – typically expectations of functions of \(X\) (which include the mean, the variance, etc.) – by taking a large number of i.i.d. draws from its distribution and then computing the corresponding sample quantity from those draws. For example, if we wanted to know the expectation of some random variable, we could take a large number of independent draws from it, store those draws, and compute their average. As the number of draws grows arbitrarily large (our only limitations are computing resources and time), this sample average converges to the true expected value, provided that expectation exists.
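The recipe above can be sketched in a few lines of Python. This is only an illustrative sketch: the helper names `monte_carlo_moments` and `draw_exponential`, the seed, and the choice of an Exponential distribution with rate 2 (true mean 0.5, true variance 0.25) are assumptions made for the example, not part of the original tutorial.

```python
import random

def monte_carlo_moments(draw, n_draws, seed=0):
    """Approximate E[X] and Var(X) from n_draws i.i.d. draws of X.

    `draw` is any function that takes a random.Random instance and
    returns a single draw of X (a hypothetical interface for this sketch).
    """
    rng = random.Random(seed)
    draws = [draw(rng) for _ in range(n_draws)]
    mean_hat = sum(draws) / n_draws
    # Sample variance with the usual n - 1 denominator
    var_hat = sum((x - mean_hat) ** 2 for x in draws) / (n_draws - 1)
    return mean_hat, var_hat

def draw_exponential(rng):
    # Assumed example distribution: Exponential with rate 2,
    # so E[X] = 1/2 and Var(X) = 1/4
    return rng.expovariate(2.0)

# The estimates should settle near 0.5 and 0.25 as the number of draws grows
for n in (100, 10_000, 100_000):
    mean_hat, var_hat = monte_carlo_moments(draw_exponential, n)
    print(f"n = {n:>7,}: mean approx {mean_hat:.4f}, variance approx {var_hat:.4f}")
```

Because the true values are known here, the printed estimates show directly how the approximation error shrinks as the number of draws increases.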
In practice, you see simulations in statistical methods papers all the time, often to illustrate certain properties of an estimator where the intuition may not be clear just from the analytical result or to get some sense of properties that are difficult to derive. For example, in papers where much of the theory relies on asymptotic approximations, we may use simulations to get a sense of how good the approximation is in small samples or to compare the performance of different estimators across fixed sample sizes.
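As one possible illustration of checking an asymptotic approximation in finite samples, the sketch below estimates how often a nominal 95% normal-approximation confidence interval for a mean actually covers the truth when the data are skewed. The function name `ci_coverage`, the Exponential data-generating process, the sample sizes, and the number of simulated datasets are all assumptions chosen for the example.

```python
import math
import random

def ci_coverage(n, n_sims=4000, true_mean=1.0, seed=0):
    """Share of simulated datasets whose nominal 95% interval covers true_mean.

    Each dataset holds n i.i.d. Exponential draws with mean true_mean, and the
    interval is the usual large-sample one: xbar +/- 1.96 * s / sqrt(n).
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_sims):
        sample = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
        xbar = sum(sample) / n
        s2 = sum((x - xbar) ** 2 for x in sample) / (n - 1)
        half_width = 1.96 * math.sqrt(s2 / n)
        if xbar - half_width <= true_mean <= xbar + half_width:
            covered += 1
    return covered / n_sims

# Compare the simulated coverage with the nominal 0.95 at several sample sizes
for n in (10, 50, 500):
    print(f"n = {n:>3}: simulated coverage approx {ci_coverage(n):.3f}")
```

If the asymptotic approximation were exact, every line would print a value close to 0.95; a gap at small \(n\) that closes as \(n\) grows is the kind of pattern such simulations are meant to reveal.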
You will also find simulations useful as a way of checking analytical results. It’s easy for a proof to go wrong, or to be unsure of some of its steps, so a simulation can help you see what the correct answer should be, at least for a particular set of parameter values. Simulations can also be a great way of generating intuition when analytical results are hard to come by.
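A small sketch of this kind of check, under assumptions chosen purely for illustration: suppose we have derived that \(E[\max(U_1, U_2)] = 2/3\) for two independent Uniform(0, 1) variables and want to confirm it numerically. The function name `simulate_expected_max` and the number of draws are hypothetical. Reporting the Monte Carlo standard error alongside the estimate helps judge whether any discrepancy is just simulation noise.

```python
import math
import random

def simulate_expected_max(n_draws=200_000, seed=0):
    """Monte Carlo estimate of E[max(U1, U2)] and its standard error."""
    rng = random.Random(seed)
    # Each draw is the maximum of two independent Uniform(0, 1) variables
    draws = [max(rng.random(), rng.random()) for _ in range(n_draws)]
    estimate = sum(draws) / n_draws
    var_hat = sum((d - estimate) ** 2 for d in draws) / (n_draws - 1)
    std_error = math.sqrt(var_hat / n_draws)
    return estimate, std_error

analytical = 2 / 3  # the result we believe we derived by hand
estimate, std_error = simulate_expected_max()
print(f"analytical: {analytical:.4f}")
print(f"simulated:  {estimate:.4f} (Monte Carlo s.e. {std_error:.4f})")
```

A simulated value many standard errors away from the analytical one would suggest revisiting the derivation (or the simulation code); agreement does not prove the derivation, but it is reassuring for the parameter values tried.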
Acknowledgments
Special thanks to Zikai Li for developing an earlier version of this tutorial for PLSC 30600 at the University of Chicago.