Guided Priors

How does Palamedes set the default priors for parameters?

Here is the short, take home version:

Palamedes centers prior for location ('threshold') parameter(s) on the stimulus intensities used during testing.
Palamedes centers prior for log₁₀(slope) parameter(s) under the assumption that the stimulus intensities used during testing cover much of the dynamic range of the psychometric function (PF).
Priors for the guess rate and lapse rate are not dependent on stimulus placement.
Default priors are very wide (i.e., not very informative). If you have 'Too much model, too little data' (see Prins, 2019), your likelihood function is probably a mess (see our page on maximum likelihood fitting) and you will likely have to narrow the priors in order for the analysis to converge.
Importantly, when multiple conditions and/or subjects are combined in single analysis, identical priors will be applied across all conditions/subjects even if different stimulus placements were used. This ensures that any differences between posteriors are the result of differences in responses/behavior, not differences in priors.
Remember that you can always provide your own priors, overriding those that Palamedes came up with. We won't be offended.

Here is the long version:

The default prior distributions that are used by Palamedes for the location ('threshold') parameter and slope parameter are based on the stimulus placement that was used. This strategy is based on the assumption that stimulus intensities that are used in the experiment reflect the prior beliefs of the experimenter regarding the values of the location and slope parameters or resulted from the use of an adaptive method. Specifically, it is assumed that the intensities used are, at least approximately, centered on the expected psychometric function (PF) and are spaced such as to cover at least the bulk of the dynamic range of the expected PF or may extend somewhat beyond it. Since the default parameters of the prior distributions will be chosen in order to lead to very wide (i.e. little-informative) prior distributions, it is not critical that these assumptions are closely met.

The specific process to find values for the prior parameters for a model for a single PF is demonstrated in Figure 1. Consider the results of a 2-alternative forced-choice experiment shown in Figure 1a. The experiment used the method of constant stimuli with 8 stimulus levels x and 50 trials at each intensity. The location, slope, and lapse rate parameters are free parameters. The default prior for both the location and log₁₀(slope) parameter is a normal distribution. In order to generate values for the parameters of the prior distributions (i.e., their mean and standard deviation), Palamedes first finds the mean and the standard deviation of the stimulus intensities that were used in the experiment. Before the mean and sd are calculated, the intensities are trimmed such as to exclude the lowest and highest stimulus intensities that were used in the experiment (i.e., the stimulus intensities shown in red in Figure 1a). The remaining intensities are shown in blue and are denoted x*. When trials are not evenly distributed across the trimmed intensities (e.g., when an adaptive method was used to collect responses), these values are weighted by the number of trials used at each intensity. Palamedes disregards the lowest and highest stimulus intensities, because it is good practice to place the lowest and/or highest stimulus intensity at a stimulus value far outside of the dynamic range of the expected PF where it has reached (near) asymptotic levels (see Prins, 2012; also see palamedestoolbox.org/parameterredundancy.htm).

Figure 1. See text for details.

The mean and standard deviation of the trimmed intensities are given in Figure 1a as M(x*) and SD(x*), respectively. The mean of the prior distribution for the location ('threshold') parameter is set to the mean of the trimmed stimulus intensities M(x*) used in the experiment (here: 5.500). By default, the standard deviation of the prior distribution for the location threshold is set to four times the standard deviation of the stimulus intensities used (here: 4 x 1.708 = 6.832). The resulting prior distribution for this experiment is shown in Figure 1b by the thick orange line. Note that this prior is near uniform within the entire stimulus range used in the experiment (i.e., it is a relatively non-informative prior). We call the multiplicative factor on the sd of the prior the prior width factor for the location parameter and it may be changed from the default value (4) by the user. The light orange lines in Figure 1b show priors using location prior width factors of .5, 1, and 2. The blue distribution in Figure 1b shows the posterior distribution for the location parameter when the default value of 4 was used for the location prior width factor.

The mean of the prior distribution for the log₁₀(slope) parameter is chosen such that the middle 68% of the range of F(x; α, β) (see equation 1) corresponds to a width equal to twice the standard deviation of the (trimmed) stimulus intensities [SD(x*)] used in the experiment (see Figure 1c).

Equation 1: Standard equation of psychometric function

The curve shown in black in Figure 1c corresponds to F(x; α, β) with alpha and beta parameters corresponding to the means of the prior distributions for the location and log₁₀(slope) parameters. Note that for the normal distribution, the mean, median, and mode are equal. In other words, this curve is, based on the placement of stimuli, considered to be the most likely model for F(x; α, β). The standard deviation for the prior on the log₁₀(slope) parameter is by default set to 1.5. We refer to the standard deviation of this prior as the prior width factor for the slope parameter. Remember that, since the slope parameter is the log₁₀ transform of the 'raw' slope parameter (β in equation 1), each unit difference in the log₁₀(slope) parameter corresponds to an order of magnitude difference in the 'raw' slope and the 'width' of the PF. Thus, one standard deviation of the prior distribution away from the mean corresponds to a function that is 10^1.5 = 31.6 times as wide (or narrow) relative to the PF that is at the mean and mode of the prior. The thick orange curves in Figure 1c show PFs with log₁₀(slope) values that are 1.5 units away from the prior's mean value. From this, it is clear that the default prior on log₁₀(slope) also is not very informative. The user can change the informativeness of the prior by choosing a different value for the prior width factor for the slope parameter. The thin orange curves in Figure 1c show PFs with log₁₀(slope) values that are 0.3, 0.6, and 1 removed from the curve shown in black. These would correspond to PFs that are 10^0.3 (≈ 2), 10^0.6 (≈ 4), and 10¹ (= 10) times as steep/narrow or shallow/wide as the black curve. Figure 1d shows the corresponding prior distributions. The blue curve shows the posterior distribution for the log₁₀(slope) parameter based on the analysis. The orange curve in Figure 1a shows a PF with α, β, and λ values that correspond to the modes in the respective prior distributions.

When the analysis includes multiple conditions and/or subjects:

In the case of an analysis in which multiple conditions and/or subjects are included, the values for the priors are based on the intensities used in all condition/subject combinations. Importantly, the same prior will then be applied to all conditions and subjects in the analysis. This is important in order to ensure that any obtained differences in posterior distributions obtained cannot be attributed to differences in priors that were used, but instead can only be attributed to performance differences between conditions and/or subjects. Figure 2a show the results of a simulated experiment with 3 subjects (one in each row), each testing in 2 conditions (one in each column). Data collection was controlled by the adaptive psi-marginal method with the lapse rate marginalized.

Figure 2. See text for details.

In order to determine the values for the priors' parameters, for each subject/condition combination the mean and sd of the stimulus intensities is computed (again, intensities are trimmed by excluding the lowest and highest intensities used). These values are given in the plots in Figure 2a. The mean of the prior for the location parameters is then set to the mean of the means of the stimulus intensities used in each condition/subject combination. In this example:

where M_s,c(x*) is the mean of the trimmed stimulus intensities for subject s, condition c and S and C are the total number of subjects and conditions, respectively, in the analysis.

The standard deviation of the prior for the location parameter is set to the default location prior width factor (i.e., 4) times half the distance between the highest obtained mean plus the highest obtained standard deviation and the lowest obtained mean minus the highest obtained standard deviation. In this example:

The resulting default prior distribution is shown by the thick orange curve in the left panel of Figure 2b. The light orange curves show the prior distributions for location prior width factors of .5, 1, and 2. In case there is only a single subject, this prior will be applied to the location parameters for each condition of that single subject. In case there are multiple subjects included in the analysis (as is the case in the example here), the prior will be applied to the hyper parameter for the location mean ('amu') for each condition. The blue curves show the posterior distributions for the two hyper parameters (one for each condition) for the location mean that resulted when the default prior was used. It is clear here again that the default prior is not very informative.

In the case of multiple subjects, we also need a prior for the hyper parameter for the location standard deviation ('asigma'). The default form of the prior for this parameter is the uniform distribution. The lower bound for the uniform distribution will be set to 0, while the upper bound will be set to the same value that is used for the standard deviation of the prior for the location parameter (here: 19.124). The uniform distribution shown in thick orange lines in the left panel of Figure 2c corresponds to the default prior for the standard deviation of the hyper parameter for the standard deviation of the location parameters. The lighter orange curves show the priors when the location prior width factor is set to .5, 1, or 2.

The prior for the log₁₀(slope) parameter is chosen as in the case for a single PF except that now it is based on the average standard deviation of the trimmed stimulus intensities. Here, the average standard deviation equals:

Thus the mean for the prior on log₁₀(slope) values is set to 1.718. The standard deviation of the prior for log₁₀(slope) values is again set to the default prior width factor of 1.5. When only one subject is included in analysis these values are applied to the priors for the log₁₀(slope) values in all conditions. When more than one subject is included in the analysis, the mean and standard deviation values are applied to the priors for the hyper parameters corresponding to the mean log₁₀(slope) value ('bmu') across subjects for each condition.

In the case of multiple subjects, we also need a prior for the hyper parameter for the log₁₀(slope) standard deviation ('bsigma'). The default form of the prior for this parameter is the uniform distribution. The lower bound for the uniform distribution will be set to 0, while the upper bound will be set to the default value for the slope's prior width factor. The uniform distribution shown in thick orange lines in the right panel of Figure 2c corresponds to the default prior for the standard deviation of the hyper parameter for the standard deviation of the log₁₀(slope) parameter. The lighter orange curves show the priors when the prior width factor is set to .5, 1, or 2.

In case a custom model matrix (see www.palamedestoolbox.org/modelmatrix.html) is supplied, the priors for the parameters that the model matrix defines (which we refer to as reparameters) are determined as above but should be appropriately scaled as explained below. As an example, let's say that in the analysis above the location and log₁₀(slope) parameters were reparameterized using the following model matrix:

The first row defines a reparameter that corresponds to the sum across conditions, the second defines a parameter that corresponds to the difference between conditions. Above we argued that based on stimulus placement, a value of 5.250 would be a relatively likely value for each of the two location hyper parameters (one for each condition) and thus we center the priors for each condition on that value. However, if the above model matrix is used, the first reparameter corresponds to the sum of the location parameters in the two conditions. The result is comparable to a rescaling of the stimulus intensities by a factor equal to the sum of the coefficients. Of course, when we rescale a dataset by a multiplicative factor, the mean of the dataset scales by the same multiplicative factor. For example, it is easy to see that if a value of 5.250 is a likely value for the location in each condition, a value of 2 x 5.250 = 10.500 would be a likely value for the sum of the location parameters in the two conditions. Generally, the value for the mean of a prior for a reparameter should be scaled by the sum of the coefficients that define the reparameterization. Note that when we apply the same rule to the second reparameter, the prior for it will be centered on 0. Effectively, this favors the value of 0 for any difference in the values for the parameter across conditions (i.e., it favors the hypothesis that there is no effect of the variable that is manipulated between conditions).

The standard deviation of prior distributions, on the other hand, should be scaled by the sum of the absolute values of the coefficients defining the reparameters. For this example, the standard deviation for the first as well as the second reparameter defined by matrix M above are scaled by a factor of 2.

References

Prins, N. (2019). Too much model, too little data: How a maximum-likelihood fit of a psychometric function may fail and how to detect and avoid this. Attention, Perception, & Psychophysics. https://doi.org/10.3758/s13414-019-01706-7

Prins, N. (2012). The psychometric function: The lapse rate revisited. Journal of Vision, 12, 25. https://doi.org/10.1167/12.6.25