I am analysing data that I obtained from a 2AFC experiment in which participants discriminate the rigidities of two objects. Due to experimental limitations, I had to pre-specify the stimulus values instead of using an adaptive procedure.
I have my experimental results in a matrix containing the number of correct responses per stimulus value for each participant. I combine the data for all participants per stimulus value, and fit the psychometric curve to the aggregated data using PAL_PFML_Fit in order to calculate the just-noticeable-difference (JND). I then want to calculate the standard deviation of this JND using either PAL_PFML_BootstrapParametric or PAL_PFML_BootstrapNonParametric. Of course, I need to know whether my data are normally distributed in order to choose whether I use a parametric or nonparametric procedure.
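In case it is useful, my fitting step looks roughly like the sketch below (the stimulus values, counts, and search-grid ranges here are placeholders, and @PAL_Logistic is just one possible choice of PF):

% Aggregated data: counts summed over all participants per stimulus value
StimLevels = [0.2 0.4 0.6 0.8 1.0];   % placeholder rigidity differences
NumPos     = [55 62 74 85 93];        % correct responses, summed over participants
OutOfNum   = [100 100 100 100 100];   % total trials per stimulus value

PF         = @PAL_Logistic;           % example PF; could also be @PAL_CumulativeNormal
paramsFree = [1 1 0 0];               % [threshold slope guess lapse]: 1 = free, 0 = fixed

% Search grid for the maximum-likelihood fit (ranges are placeholders)
searchGrid.alpha  = 0:0.01:1.2;
searchGrid.beta   = logspace(0, 2, 101);
searchGrid.gamma  = 0.5;              % guess rate for 2AFC
searchGrid.lambda = 0.02;

[paramsValues, LL, exitflag] = PAL_PFML_Fit(StimLevels, NumPos, OutOfNum, ...
    searchGrid, paramsFree, PF);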
I am familiar with how to check for normal distributions—I usually use Python to run tests (e.g. D’Agostino & Pearson’s test and the Shapiro-Wilk test in the SciPy package) and plot histograms. However, for this kind of data I'm unsure exactly which distribution I should be testing for normality. I don't have a JND for each participant (as I did not use an adaptive procedure and I'm using aggregated data to obtain a single JND for the entire dataset), otherwise I would test that these JND values follow a normal distribution.
Is there another distribution that I could plot and test for normality? And how should I choose between the parametric and nonparametric bootstrap procedures for estimating the standard deviation of the JND?
Thanks in advance for any advice and please let me know if I have missed any information.
How can I test that my data are normally distributed for a 2AFC experiment?
rhoslynroebuck

Nick Prins
Site Admin
Re: How can I test that my data are normally distributed for a 2AFC experiment?
The assumption to test here is not normality of the data. The difference between the parametric bootstrap and the non-parametric bootstrap is that the parametric bootstrap assumes that the fitted psychometric function (PF) is accurate. The non-parametric bootstrap does not make that assumption. ‘Parametric’ in this particular situation means that you summarized how probability correct varies with stimulus intensity by way of the PF. That allows you to describe the relationship between intensity and proportion correct using the parameters (threshold, slope, etc.) of the PF. The non-parametric alternative is not to make assumptions about how probability correct varies with intensity, but instead to allow each stimulus intensity to have its own estimated probability correct that is independent of the probability correct at other stimulus intensities. This is called the saturated model. The saturated model does not assume a specific form of the PF (logistic, cumulative normal, or whatever) or even that probability correct increases with increasing stimulus intensity. Of course, the saturated model is not very useful. It doesn’t allow you to estimate a JND or anything. In order to do that you must fit a PF (i.e., use a parametric model).
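For concreteness, the two bootstrap routines are called in much the same way. The sketch below follows the pattern of PAL_PFML_Demo (see that demo for a complete, worked example and the exact argument order); StimLevels, NumPos, OutOfNum, paramsValues, paramsFree, searchGrid, and PF are the same variables used for the original fit:

B = 400;   % number of bootstrap simulations

% Parametric bootstrap: simulates new data sets from the fitted PF itself
[SD_par, paramsSimPar] = PAL_PFML_BootstrapParametric(StimLevels, OutOfNum, ...
    paramsValues, paramsFree, B, PF, 'searchGrid', searchGrid);

% Non-parametric bootstrap: resamples from the observed proportions correct
% at each stimulus level (i.e., the saturated model)
[SD_nonpar, paramsSimNonPar] = PAL_PFML_BootstrapNonParametric(StimLevels, NumPos, ...
    OutOfNum, [], paramsFree, B, PF, 'searchGrid', searchGrid);

% SD_par(1) and SD_nonpar(1) are the bootstrap standard errors of the threshold,
% SD_par(2) and SD_nonpar(2) those of the slope.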
So the assumption to test here in order to justify a parametric bootstrap is not whether your data are normal. That would be the appropriate test in a situation where the parametric model assumes that the data are normally distributed. Of course, very many standard tests do make that assumption, and for those, testing normality is the appropriate way to justify the parametric test. But when fitting a PF, the assumption the parametric model makes is that the relationship between probability correct and stimulus intensity is accurately described by the fitted PF. So that’s the assumption that needs to be tested to justify the parametric bootstrap here. This assumption is tested using a goodness-of-fit test, and PAL_PFML_GoodnessOfFit will perform that test.
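In code, that test is a single call (PAL_PFML_GoodnessOfFit is also demonstrated in PAL_PFML_Demo; B here is the number of Monte Carlo simulations used to build the deviance distribution):

B = 1000;   % number of simulations for the goodness-of-fit test

[Dev, pDev] = PAL_PFML_GoodnessOfFit(StimLevels, NumPos, OutOfNum, ...
    paramsValues, paramsFree, B, PF, 'searchGrid', searchGrid);

% Dev:  deviance of the fitted PF relative to the saturated model
% pDev: proportion of simulated deviances that exceed Dev; a small value
%       (e.g., pDev < 0.05) indicates a poor fit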
Now, if the goodness-of-fit test indicates that your PF does not fit the data very well, that means that your model is no good and there is really not much point in running either a parametric or a non-parametric bootstrap.
I think the strategy here is to test whether your fitted model fits the data well using a goodness-of-fit test. If your model passes the goodness-of-fit test you can perform either a parametric or a non-parametric bootstrap. Opinions might differ on this, but I would go with the parametric bootstrap. If your model does not pass the goodness-of-fit test, you can tinker with your model until it does. E.g., if you freed your lapse rate in the original fit, try fixing it (contrary to very popular belief, freeing the lapse rate does not necessarily make for a better model or a better goodness-of-fit). Keep in mind that there is a bit of an issue with such post-hoc tinkering with models until you find one that fits well. Our position (which may differ from the opinion of others) is that it’s okay to tinker all you want as long as you understand the danger and, critically, report all the tinkering you did instead of pretending that the model you ended up with was the first and only one you tried.
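To make the lapse-rate suggestion concrete: in the maximum-likelihood routines the free/fixed status of each parameter is controlled by the paramsFree vector, and the value of a fixed parameter is taken from the search grid you pass to PAL_PFML_Fit, so fixing the lapse rate is a small change (the 0.02 below is only an illustrative value):

% Free lapse rate: [threshold slope guess lapse], 1 = free, 0 = fixed
paramsFree = [1 1 0 1];

% Fixed lapse rate: set the last entry to 0 and supply the fixed value
paramsFree        = [1 1 0 0];
searchGrid.lambda = 0.02;   % illustrative fixed lapse rate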
You may have a hard time getting a good fit. The practice of combining data from multiple participants before fitting a single PF to the combined data is a bit questionable. Let’s say that probability correct as a function of intensity for an individual participant is well described by a cumulative normal PF. That is a reasonable assumption that can be theoretically defended. However, that does not mean that the combined data will also follow a cumulative normal PF. They almost assuredly would not. It also makes it difficult to interpret your parameters. For example, the slope of the PF fitted to the aggregated data would be shallower than the slopes of the individual participants (unless all participants have the exact same threshold).
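A quick illustration of the slope point, with purely made-up numbers: two hypothetical observers share the same slope but differ in threshold, and the curve through their pooled proportions correct is shallower than either individual curve (and is no longer itself a cumulative normal):

x  = linspace(-3, 3, 201);          % stimulus axis (arbitrary units)
PF = @PAL_CumulativeNormal;         % assumed individual-observer PF

p1 = PF([-0.75 2 0.5 0], x);        % observer 1: threshold -0.75, slope 2, 2AFC
p2 = PF([ 0.75 2 0.5 0], x);        % observer 2: threshold  0.75, slope 2, 2AFC
pPooled = (p1 + p2)/2;              % pooled proportion correct (equal trial counts)

plot(x, p1, 'b', x, p2, 'r', x, pPooled, 'k', 'LineWidth', 2);
legend('observer 1', 'observer 2', 'pooled', 'Location', 'NorthWest');
% The pooled curve rises more gradually than either individual curve, so a PF
% fitted to pooled data will underestimate the slope of the individual observers.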
I assume you decided to combine data before fitting a PF (as opposed to fitting PFs to data from individual participants) because you couldn’t get nice fits for individual participants(?). If so, you could try a Bayesian hierarchical fit (using PAL_PFHB_fitModel). That allows you to fit all participants in one go (but without combining their data before fitting). Through the hierarchical structure, information about likely threshold and slope values is shared between participants. This happens even without using an informative prior.
Nick Prins, Administrator
rhoslynroebuck
Re: How can I test that my data are normally distributed for a 2AFC experiment?
Thank you so much for the detailed explanation!
I will test the goodness-of-fit to see if it's reasonable, and then continue with the parametric bootstrap.
Regarding the approach I have taken to fitting: yes, I aggregated the data because I couldn't get good fits for individual participants (I have a very low number of repeats per stimulus value). I have tried using PAL_PFHB_fitModel, and it gave good fits for some of my data but not for others. I think this is due to a sampling problem, so I will increase the number of repeats in my upcoming study.
Thank you again for the help.