Last modified: 28 March 2017

#### Abstract

The maximum simulated likelihood (MSL) estimator is commonly used by researchers working with discrete choice models. It allows for the estimation of a wide variety of models, including models that directly account for unobserved preference heterogeneity, such as the Mixed Multinomial Logit (MXL) or Hybrid Choice models. These models require simulation-based approximation of multidimensional integrals, which can lead to identification problems, spurious convergence, numerical computation issues, and simulation bias. In this paper we focus on the last of these issues: we investigate to what extent the type and number of draws can influence the results. Specifically, we are interested in how the simulation bias depends on (i) the type of draws used and (ii) the number of draws, and how these results change as (iii) the number of choice tasks per respondent and (iv) the number of respondents vary. To this end, we analyze 27 artificial datasets generated with a Mixed Logit data generating process in which all 5 parameters are random and follow a normal distribution. The datasets differ with respect to the number of individuals (400, 800 or 1’200), the number of choice tasks (4, 8 or 12) and the design of the Discrete Choice Experiment (an orthogonal optimal in the differences (OOD) design, a D-efficient design optimized for the multinomial logit (MNL) model, and a D-efficient design optimized for the MXL model). For each dataset we estimate MXL models with 4 different types of draws: (i) pseudo-random numbers (pseudo Monte Carlo, PMC), (ii) modified Latin hypercube sampling (MLHS), (iii) the randomized scrambled Halton sequence (RSH) and (iv) the randomized scrambled Sobol sequence (SOB), each using 100, 200, 500, 1’000, 2’000, 5’000 or 10’000 draws, and, for the Sobol sequence, also 20’000, 50’000 and 100’000 draws. Because all types of draws are randomized, each specification is estimated 100 times; in total, we estimated 78’300 models. 
As a result, our large-scale, comprehensive simulation study provides informative comparisons and avoids several drawbacks of earlier studies, such as a low maximum number of draws and a small number of repetitions.
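The families of draws compared above can be illustrated with a short, dependency-free Python sketch that generates pseudo-random (PMC), modified Latin hypercube (MLHS) and randomized Halton uniforms and maps them to standard normal draws for the random parameters. It is an illustration under stated assumptions, not the study's code: the Halton points are randomized with a simple Cranley-Patterson random shift rather than the scrambling used in the study, and Sobol points are omitted because generating them requires direction numbers (SciPy's `scipy.stats.qmc.Sobol(d, scramble=True)` is one existing implementation). All function names are hypothetical.

```python
import random
from statistics import NormalDist

# One Halton base per dimension; this sketch supports up to 10 dimensions.
PRIMES = (2, 3, 5, 7, 11, 13, 17, 19, 23, 29)


def radical_inverse(index, base):
    """Van der Corput radical inverse of a non-negative integer in the given base."""
    result, weight = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * weight
        weight /= base
    return result


def pmc_draws(n, dim, rng):
    """Pseudo Monte Carlo: independent pseudo-random uniforms."""
    return [[rng.random() for _ in range(dim)] for _ in range(n)]


def mlhs_draws(n, dim, rng):
    """Modified Latin hypercube sampling: n equally spaced points per dimension,
    moved by one common random offset in [0, 1/n) and then shuffled."""
    columns = []
    for _ in range(dim):
        offset = rng.random() / n
        column = [k / n + offset for k in range(n)]
        rng.shuffle(column)
        columns.append(column)
    return [list(row) for row in zip(*columns)]


def shifted_halton_draws(n, dim, rng, skip=10):
    """Halton points randomized with a Cranley-Patterson random shift
    (an assumption: simpler than the scrambling used in the study)."""
    shifts = [rng.random() for _ in range(dim)]
    return [[(radical_inverse(i + skip, PRIMES[j]) + shifts[j]) % 1.0
             for j in range(dim)]
            for i in range(n)]


def to_standard_normal(uniforms, eps=1e-12):
    """Map uniform draws to N(0, 1) via the inverse CDF, clipping away 0 and 1."""
    inv = NormalDist().inv_cdf
    return [[inv(min(max(u, eps), 1.0 - eps)) for u in row] for row in uniforms]


rng = random.Random(2017)
z = to_standard_normal(mlhs_draws(1000, 5, rng))  # 1'000 draws for 5 random parameters
```

The same `to_standard_normal` step applies to every draw type, so swapping generators changes only how evenly the unit hypercube is covered.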

In our analysis we focus mostly on 3 characteristics of the estimated models: log-likelihood values, estimated coefficients and their standard errors. Figure 1 presents an example of how the log-likelihood varies for a given dataset, type and number of draws. This variation can be substantial for a low number of draws and may easily change inference based on, e.g., the likelihood ratio (LR) test. We use similar results to compare the performance of different types of draws and to suggest the minimum number of draws needed for LR-test-based inference to have sufficiently high precision (i.e., a sufficiently low probability that the result is due to simulation error).
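To see why the log-likelihood depends on the draws, the following sketch simulates a small panel dataset from a mixed logit data generating process with normally distributed coefficients and evaluates the simulated log-likelihood, the sum over respondents of ln[(1/R) Σ_r Π_t P_t(β_r)], several times with fresh pseudo-random draws. Assumptions, all hypothetical: 2 random parameters, 3 alternatives, 20 respondents with 4 tasks each, and pseudo-random draws only. It also evaluates the log-likelihood at the true parameter values instead of re-maximizing it for every draw set, so it only approximates the kind of variation shown in Figure 1.

```python
import math
import random
import statistics


def choice_probs(x, beta):
    """Multinomial logit probabilities for one choice task (rows of x = alternatives)."""
    v = [sum(xk * bk for xk, bk in zip(row, beta)) for row in x]
    v_max = max(v)  # subtract the max for numerical stability
    expv = [math.exp(vj - v_max) for vj in v]
    total = sum(expv)
    return [e / total for e in expv]


def draw_beta(b, s, rng):
    """One draw of individual coefficients: beta = b + s * z with z ~ N(0, 1)."""
    return [bk + sk * rng.gauss(0.0, 1.0) for bk, sk in zip(b, s)]


def simulate_data(n_resp, n_tasks, n_alts, b, s, rng):
    """Panel data from a mixed logit DGP: one beta per respondent, kept across tasks."""
    data = []
    for _ in range(n_resp):
        beta = draw_beta(b, s, rng)
        person = []
        for _ in range(n_tasks):
            x = [[rng.gauss(0.0, 1.0) for _ in b] for _ in range(n_alts)]
            y = rng.choices(range(n_alts), weights=choice_probs(x, beta))[0]
            person.append((x, y))
        data.append(person)
    return data


def simulated_loglik(data, b, s, n_draws, rng):
    """Average the probability of each respondent's whole choice sequence over
    n_draws coefficient draws, take logs, and sum over respondents."""
    ll = 0.0
    for person in data:
        total = 0.0
        for _ in range(n_draws):
            beta = draw_beta(b, s, rng)
            seq_prob = 1.0
            for x, y in person:
                seq_prob *= choice_probs(x, beta)[y]
            total += seq_prob
        ll += math.log(total / n_draws)
    return ll


b_true, s_true = [1.0, -0.5], [1.0, 0.5]  # hypothetical means and standard deviations
data = simulate_data(20, 4, 3, b_true, s_true, random.Random(1))

sd_by_draws = {}
for n_draws in (20, 400):
    lls = [simulated_loglik(data, b_true, s_true, n_draws, random.Random(seed))
           for seed in range(10)]  # 10 independent draw sets
    sd_by_draws[n_draws] = statistics.stdev(lls)
print(sd_by_draws)
```

With more draws the spread across draw sets shrinks (roughly like 1/√R for pseudo-random draws); quasi-Monte Carlo draws aim to shrink it faster.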

**Figure 1.** Standard deviations of log-likelihood values for different numbers and types of draws for the artificial dataset with 1’200 respondents and 12 choice tasks.

In Figure 2 we illustrate how conclusions about a coefficient's significance can be misleading when too few draws are used in the simulation. With 100 draws, fewer than 50% of the estimates were significant, whereas the same coefficient was significant in 100% of cases with 2’000 Sobol draws. For more than half of the sets of 100 draws, the lack of significance was therefore spurious. These examples show that using a low number of draws, even from quasi-Monte Carlo sequences, can lead to misleading inference from the choice model.
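The share of significant estimates can be computed from repeated estimations as the fraction of replications whose z-statistic exceeds the two-sided critical value. A minimal sketch, assuming each replication yields a coefficient estimate and its standard error; the function name and the numbers in the usage line are hypothetical.

```python
from statistics import NormalDist


def share_significant(estimates, std_errors, alpha=0.05):
    """Fraction of replications whose two-sided z-test rejects a zero coefficient."""
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # about 1.96 for alpha = 0.05
    hits = sum(abs(est / se) > z_crit for est, se in zip(estimates, std_errors))
    return hits / len(estimates)


# Hypothetical estimates of one standard-deviation parameter from four replications
print(share_significant([0.30, 0.25, 0.10, 0.28], [0.12] * 4))  # 0.75
```

Taking the absolute value of the z-statistic matters for standard-deviation parameters, whose estimated sign is arbitrary.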

**Figure 2.** Percentage of significant coefficients (standard deviation of one random parameter) for different numbers and types of draws for the artificial dataset with 1’200 respondents and 4 choice tasks (significance at the 5% level).

Throughout the paper we use two one-sided convolution tests (TOSC) to compare the performance of different types and numbers of draws. We use this measure because it has an intuitive interpretation: it expresses the expected differences between given values (e.g., log-likelihood values) in probability terms. It allows us to make statements like “*if one uses 1’000 draws instead of 10’000 draws, the difference in log-likelihood values will be no larger than with 95% probability*”. Analogous statements can be made for parameter estimates and z-statistics.
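A minimal sketch of the convolution idea behind such statements, assuming two samples of a statistic (e.g., log-likelihood values from repeated estimations with 1’000 and with 10’000 draws). It forms all pairwise differences between the two empirical distributions, reports the bound that covers a given share of their absolute values, and runs two one-sided checks against a tolerance `delta`. This is an illustrative reading of the TOSC approach, not necessarily the exact procedure used in the paper; the function names are hypothetical.

```python
import math


def pairwise_differences(a, b):
    """Empirical convolution: every difference a_i - b_j between the two samples."""
    return [x - y for x in a for y in b]


def difference_bound(a, b, prob=0.95):
    """Smallest observed d such that at least `prob` of the |a_i - b_j| are <= d."""
    diffs = sorted(abs(d) for d in pairwise_differences(a, b))
    idx = max(math.ceil(prob * len(diffs)) - 1, 0)
    return diffs[idx]


def equivalent_within(a, b, delta, alpha=0.05):
    """Two one-sided checks on the convolution: the shares of differences at or
    above +delta and at or below -delta must both be smaller than alpha."""
    diffs = pairwise_differences(a, b)
    n = len(diffs)
    share_above = sum(d >= delta for d in diffs) / n
    share_below = sum(d <= -delta for d in diffs) / n
    return share_above < alpha and share_below < alpha
```

With `a` and `b` holding the 100 log-likelihood values obtained with each number of draws, `difference_bound(a, b, 0.95)` gives the value that would fill a statement of the form quoted above. Using all m × n pairs rather than random resampling keeps the result deterministic for given samples.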

The main findings of our analysis are as follows. First, we find that scrambled Sobol sequences perform best, closely followed by the scrambled Halton sequence. Contrary to some earlier findings in the literature, modified Latin hypercube sampling draws provide significantly worse results. Not surprisingly, we also find that the design optimized for MXL outperforms the MNL-optimized and OOD designs in the majority of cases; consequently, designs optimized for MNL require a higher number of draws to obtain the same precision as designs optimized for MXL. Lastly, we provide recommendations for the number of draws researchers should use, which depends on dataset characteristics and the required precision. For example, we find that a relatively low number of Sobol draws (100-500) is sufficient to properly recover the means of random parameters. This is no longer true for their standard deviations: in small datasets, even 1’000 Sobol draws may not be sufficient. In general, our findings imply that researchers using discrete choice models that rely on the maximum simulated likelihood estimator should use more draws than is currently considered state of practice.