To Pay or Not to Pay: Measuring Risk Preferences in Lab and Field

Measuring risk preferences in the field is critical for policy, but it can be costly. For instance, the commonly used measure of Holt and Laury (2002) relies on a dozen lottery choices and payments, which makes it time-consuming and expensive. We propose a short version of the Holt and Laury (2002) task (HL) which, in the lab (Spain), produces the same results as the long HL. Using the short HL in the field (Honduras and Nigeria), we observe that paying or not paying for the measurement of risk preferences produces the same findings, using a faster and cheaper measure.


Introduction
Measuring risk preferences is fundamental for modeling and predicting how people make choices. Risk preferences matter for human capital investment and career decisions (Weiss, 1972), technology adoption (Liu, 2013), financial portfolios (Schubert et al., 1999), insurance (Friedman, 1974) and migration (Katz and Stark, 1986); as a result, there is growing interest among social scientists in measuring these preferences in observational and experimental studies.
While there is a common practice of using monetary incentives in lab experiments when eliciting risk preferences, in the field, the use of such incentives is less common.
In particular, when data collection involves thousands of observations as in national representative household surveys and impact evaluation data, researchers tend to use hypothetical measures of risk preferences.
Standard non-hypothetical measures of risk preferences use incentive compatible devices to elicit individual preferences for risk. This approach entails three important implications: i) subjects make real choices (hence, risk preferences are not self-reported as in hypothetical measures), ii) choices have real monetary consequences, and iii) subjects' final earnings depend on their choices and nature. Therefore, the distribution of payoffs is unequal across participants given the chances of getting high or low payoffs after choosing between lotteries.
This inequality in payoffs may also create a sense of unfairness among participants even when they are fully aware of what they can get and the chances of getting it when making choices. Knowing what they can get and in which circumstances is true for different devices such as those suggested by Holt and Laury (2002); Gneezy and Potters (1997); Eckel and Grossman (2008) and Crosetto and Filippin (2013).
As a consequence, moral concerns emerge in the lab and field when using incentive compatible mechanisms. If payments are real and unequal, experiments using lotteries generate ex-post inequality among participants. This inequality is inevitable since any alteration of the randomness of the lottery would imply deception to the participants.
Hence, paying experimental subjects real money when using lotteries generates inequality. This issue raises further concerns in the field, where unequal payments can create tensions among neighbours, among inhabitants of the same community, or even between experimenter and field participants. Because of these potential risks, it is unclear whether using lotteries is acceptable for research purposes alone.
Paying subjects with real money (the gold standard in lab experimental economics) instead of using hypothetical choices, as in observational studies and many impact evaluations, is therefore an issue. Two reasons have been put forward for why incentives are necessary: i) apparently, when payoffs are hypothetical and subjects do not risk their own money, they are more likely to be risk loving; and ii) in the absence of monetary incentives, subjects do not put enough effort into the task and consequently make random choices 1.
Our study tests to what extent the use of hypothetical payoffs has an impact on the measurement of risk preferences. This study has two important features. First, we ran a series of lab experiments in Spain (April, 2019). A reduced version of the Holt and Laury (2002) task with only 5 choices (hereafter HL5) was introduced to reduce the number of choices and time that participants are exposed to. The original HL considers 10 choices.
We ran a trial in the lab to test whether the reduced HL5 differs from the original HL in terms of risk aversion and inconsistency. We also ran a series of experiments where participants were randomly assigned to three treatment arms with probability 1/3. Each treatment differs from the other in payment schemes: real payment, paying 1 out of 10 (BRIS) 2 and no payment at all (hypothetical). We show that HL5 does not differ from HL regardless of the payment scheme.
Second, we bring our reduced form of the Holt-Laury test (HL5) to the field. We ran two field experiments, in Kano (Nigeria, April 2019) and Copán (Honduras, May 2019). We test whether paying participants or not produces different profiles of risk preferences. As in the lab experiment, all participants were randomly assigned to a real payment, a payment with probability 1 out of 10, or no payment at all. To compare risk preferences across treatment arms, we contrast three dimensions of the measure: number of inconsistent subjects, number of safe options (under consistency) and response time.
We find that hypothetical and probabilistic payments have no impact on consistency, number of safe choices or response time. This means that when field experimentalists elicit risk preferences using hypothetical measures, they can trust that these are consistent across payment schemes (i.e., real, probabilistic and hypothetical) and that this can be done faster (and therefore cheaper) by using a short version of the Holt-Laury approach. However, paying only a fraction of the sample (probabilistic payment) introduces noise in the elicitation of risk preference in the lab: subjects become more risk loving when only a fraction of them is paid than when all of them are paid. A possible explanation is the idea of a "hot hand": participants may assume that if they were lucky in the first lottery, they will also be lucky in the second one (Miller and Sanjurjo, 2018).
While the question about the effect of incentives on risk elicitation is not new in the lab, the evidence is somewhat mixed, and evidence about their effect in the field is scarce. Wiseman and Levin (1996) show that subjects make the same risky decisions under real and hypothetical consequences. Along the same lines, Kühberger et al. (2002) show that hypothetical choices match real choices for small as well as for large payoffs. Conversely, Holt and Laury (2002, 2005) find that increasing the size of real payoffs leads to more risk averse behavior than hypothetical payments. Etchart-Vincent and l'Haridon (2011) and Barreda-Tarrazona et al. (2011) find that real and hypothetical choices significantly differ in the gain domain.
Our study provides robust evidence on the elicitation of risk preferences from two

Dimensions of Risk Preferences
In Studies 1, 2 and 3, we compare three dimensions of the elicitation of risk preferences under the three payment schemes. The first one is consistency, that is, whether subjects make inconsistent choices. A typical example of inconsistency is multiple switching, from A to B and then from B to A. Indeed, any switch from B to A is an example of inconsistency. The second dimension is the number of safe choices (A instead of B) under consistency. Finally, we compute response time, that is, the duration of the task.
This last dimension was only measured in Studies 2 and 3. Here we summarise our dimensions:
C_i Consistency: whether the subject makes consistent choices or not.
RA_i Risk aversion: number of safe choices (A instead of B) made by the subject, under consistency.
RT_i Time: number of seconds spent by the subject to complete the task.
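The first two dimensions can be made concrete with a minimal Python sketch. It assumes, purely for illustration, that a subject's answers are stored as a string of 'A' and 'B' characters, one per row of the price list in the order presented, and that a subject is consistent when no B is ever followed by an A. The function names and the encoding are hypothetical and not part of the study materials.

```python
def is_consistent(choices: str) -> bool:
    """C_i: True when the subject never switches back from B to A."""
    # Any 'B' immediately followed by 'A' marks a switch back to the safe lottery.
    return "BA" not in choices


def safe_choices(choices: str) -> int:
    """RA_i: number of rows in which the safe lottery A was chosen."""
    return choices.count("A")


# Hypothetical answer sheets for a 5-row list (HL5).
for sheet in ("AABBB", "ABABB", "BBBBB"):
    print(sheet, is_consistent(sheet), safe_choices(sheet))
```

Under this rule, a subject who always chooses A or always chooses B counts as consistent; whether the study treated such cases differently is not stated here.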

Experimental Protocol
Reduced vs. Long HL
Using a sample of participants in Spain (Study 1), we consider two experimental arms, each selected with probability 1/2. Subjects are exposed to either the reduced HL (5-item HL, HL5) or the long version (10-item HL, HL10).

Payment Schemes
Studies 1, 2 and 3 consider three experimental arms, each with probability 1/3. Subjects are exposed to payment schemes under the first two arms and to no payment scheme under the third one. The difference across treatments is the probability of being paid, p:
T_R: p = 1; T_B: p = 1/10; T_H: p = 0.
Under T_R, all subjects are certain of receiving a payment, whereas under T_H subjects are certain of receiving no payment. For each study, the entire sample was randomly assigned to one out of three treatment arms: T_R refers to real payments, T_B to paying 1 out of 10 (BRIS) and T_H to hypothetical payments. Subjects were fully aware of their payment scheme before the elicitation of their preferences.
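As an illustration of the three arms, the following Python sketch draws an arm for each subject and then draws whether that subject is actually paid. It is a simplified sketch under stated assumptions: simple independent randomisation with an arbitrary seed, and hypothetical function names. The procedure actually used to build the assignment lists is not described at this level of detail.

```python
import random

# Probability of being paid under each arm, as defined in the text.
PAY_PROB = {"T_R": 1.0, "T_B": 0.1, "T_H": 0.0}


def assign_arm(rng: random.Random) -> str:
    """Assign a subject to one of the three arms with probability 1/3 each."""
    return rng.choice(list(PAY_PROB))


def is_paid(arm: str, rng: random.Random) -> bool:
    """Draw whether a subject in the given arm receives a real payment."""
    return rng.random() < PAY_PROB[arm]


rng = random.Random(2019)  # arbitrary seed, for reproducibility only
arms = [assign_arm(rng) for _ in range(9)]
print(arms)
print([is_paid(arm, rng) for arm in arms])
```

Independent draws of this kind do not guarantee equally sized arms, which is in line with the slightly unequal arm sizes reported for each study.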

Use of enumerators
Both field experiments (Studies 2 and 3) were conducted by enumerators who were trained by the authors. Using enumerators implied that subjects did not self-manage the instructions; instead, these were read and explained by the enumerator. Enumerators used computer-assisted personal interviewing (CAPI) questionnaires in Nigeria and paper-based questionnaires in Honduras. In both cases, enumerators received a list of households they had to visit, including the type of questionnaire they had to apply (i.e., the T_R, T_B or T_H questionnaire). The authors conducted the random allocation of treatments prior to the visit to the communities, and the enumerators did not have any influence on this allocation. To make sure the enumerators were applying the corresponding questionnaire to each household, a field coordinator monitored on the ground the correct use of the lists created by the researchers. Prior to running both experiments, the authors piloted the risk preference questionnaire with around 20 subjects to ensure the translations into Hausa (Nigeria) and Spanish (Honduras) were appropriate to the context. All questionnaires and instructions were originally written in English.
For both field experiments, enumerators conducted all face-to-face interviews in the households of the participants. Only one experimental subject was interviewed per household.
In sharp contrast, Study 1 did not use enumerators for data collection of risk preferences. Subjects self-managed the instructions in a lab.

Study 1: The lab experiment
This lab experiment was run in the School of Economics and Business at the University of Sevilla, the largest public university in Andalusia (Southern Spain). The experiment used paper-based questionnaires and was conducted in the last week of April 2019.
In the majority of lab experiments, participants are university students who are self-selected into the experiment and hold high levels of education, see Exadaktylos et al. (2013). The main advantage of lab experiments is the absolute control of the conditions faced by the subject: a) participants cannot interact with one another, unless interaction is required by the experiment; and b) there are no external distractions. For these reasons, a lab experiment is the cleanest approach to test our main research questions: i) do monetary incentives make a difference in the elicitation of risk preferences? and ii) can we use a simplified version of the HL measure and still be as informative as the original 10-item HL?
The HL5 used in the lab is identical to the multiple price list (MPL) shown in subsection 2.1, with the monetary values adjusted to the Spanish context (amounts were adjusted to be meaningful to Spanish participants and expressed in euros). Thus, for Study 1, lottery A offers 5 euros with probability q and 4 euros with probability (1 − q); lottery B offers 10 and 0.1 euros with probabilities q and (1 − q), respectively.
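To illustrate how these payoffs separate risk attitudes, the sketch below computes the expected value of each lottery and the probability q at which a risk-neutral subject would be indifferent between A and B. The payoffs are those stated above; the probabilities in the loop are illustrative values only and are not the rows of the price list used in the study, and the function names are hypothetical.

```python
def expected_value(high: float, low: float, q: float) -> float:
    """Expected payoff of a lottery paying `high` with probability q, else `low`."""
    return q * high + (1 - q) * low


def indifference_q(a_high, a_low, b_high, b_low):
    """Probability q at which lotteries A and B have equal expected value."""
    return (a_low - b_low) / ((b_high - b_low) - (a_high - a_low))


# Study 1 payoffs in euros, as given in the text.
A = (5.0, 4.0)
B = (10.0, 0.1)

q_star = indifference_q(*A, *B)
print(f"risk-neutral indifference at q = {q_star:.3f}")
for q in (0.2, 0.5, 0.8):  # illustrative probabilities, not the rows of the study's list
    print(q, expected_value(*A, q), expected_value(*B, q))
```

With these payoffs the indifference point is q = 3.9/8.9, roughly 0.44: below it lottery A has the higher expected value, above it lottery B does. A subject who keeps choosing A beyond that point is therefore revealing risk aversion, which is why a larger number of safe choices is read as higher risk aversion.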
The rest of the section is organised as follows. First, we show sample sizes and balance tests for all treatments considered in Study 1. Then we provide details on the ethical approval and pre-registration for this experiment. Next, we present results addressing our two main research questions and finally, we discuss the implications of this study for field experiments.

Sample size and balance
The entire sample consisted of n = 178 Spanish subjects 4. Every subject was randomly assigned to one out of three payment schemes (T_R: 60, T_B: 57, T_H: 62). Table 1 shows the balance between sub-samples: there are no differences across treatments regarding age, gender or cognitive reflection test (CRT) scores. Regarding courses (freshmen, sophomore, etc.), participants in the BRIS treatment are more likely (p < 0.1) to belong to year 1 (freshmen), but after applying a Bonferroni correction to the p-values the coefficient is no longer significant.
Note: H refers to hypothetical, R to real payment and B to probabilistic payment with a 1 out of 10 chance of winning (BRIS). Inference was made using robust standard errors. *** p < 0.01, ** p < 0.05, * p < 0.1.
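The Bonferroni adjustment mentioned above can be sketched in a few lines of Python. The p-values below are hypothetical and are not taken from Table 1; the number of tests treated as one family is likewise an assumption, since the text does not state it.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical p-values for four balance covariates (not taken from Table 1).
raw = [0.08, 0.45, 0.62, 0.91]
adjusted = bonferroni(raw)
print(adjusted)
print([p < 0.1 for p in adjusted])
```

In this illustration a raw p-value just below 0.1 stops being significant at the 10% level once it is multiplied by the number of comparisons, which is the pattern described for the year 1 (freshmen) difference.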

Ethics committee and pre-registration
The study was approved by the Ethics Committee of Loyola Andalucía University. All participants signed an informed consent prior to starting the experiment.
The lab experiment was pre-registered on April 10, 2019 and made public on April 25, 2019 5.
Finally, columns 4 and 5 provide the regression coefficients of each treatment for the two outcome variables: consistency and risk aversion responses. For C_i, we estimate a linear probability model (LPM) and for RA_i a negative binomial regression model. All regressions control for age, gender, CRT and session fixed effects.

Results: The impact of payment schemes
Column 5 in Table 2 shows that risk preferences measured under real payments do not differ from those under hypothetical payment. This is true for the number of consistent individuals (C_i) and the level of risk aversion (RA_i).
However, column 4 shows that paying 1 out of 10 (T_B) has an impact. While there is no difference in the number of consistent individuals, the number of safe choices is lower under T_B. Subjects who are paid with p = 1/10 are less risk averse, that is, they are more risk loving. The coefficient is significant at the 1% level. Since T_B involves two lotteries (being selected for payment, and the HL itself), a possible explanation is that subjects consider both lotteries to be correlated 6. As a robustness check, in the Appendix we show that these results hold under different specifications 7.
6 Under the idea of a "hot hand", participants may assume that if they were lucky in the first lottery, they will also be lucky in the second one (Miller and Sanjurjo, 2018).
From the findings of Study 1, we conclude:
Result 1: Compared to hypothetical payments in the lab: paying all the subjects has no impact on consistency or number of safe choices.
Result 2: Compared to hypothetical payments in the lab: paying 1/10 generates no differences in terms of consistency but decreases the level of risk aversion.

Testing the reduced Holt-Laury
As noted before, our field set-up uses a reduced version of the HL test. Here, we test whether this trimmed version has any impact on individual choices (consistency and risk aversion). The second lab experiment was conducted with 40% of the original pool of participants and took place in the same place as the previous experiment on payment schemes, see footnote 4. In this study, subjects were randomly assigned to one out of two experimental arms (by treatments, T_R: 60, T_H: 59). Observe that we do not have T_B in this sample. Table 3 addresses two main questions: i) whether paying or not has an impact on risk choices in the HL10 and ii) whether short (HL5) and long (HL10) lottery tasks generate similar risk preference profiles 8.
7 Table A1 shows the results for C_i. Column 1 reports the results with no controls: β_TB =
Columns 4 and 5 address whether HL5 produces different risk preference profiles from HL10 or not 9. Column 4 (β_5R|10R) compares HL5 and HL10 under real payments. The regression uses the same controls and fixed effects noted before. No significant differences are found.
Column 5 (β_5H|10H) compares HL5 and HL10 under hypothetical incentives. The regression also uses the same controls and fixed effects. We do not find significant differences between HL5 and HL10 when using hypothetical payments.
From this second part of Study 1, we conclude:
Result 3: Compared to the HL10 in the lab: the HL5 produces similar outcomes in terms of consistency and number of safe choices regardless of the payment scheme.
8 Hereafter we use HL10 to refer to the 10-item HL to emphasise the number of choices under the original Holt-Laury measure.
9 We use the samples of HL5_R and HL5_H from the previous study.

Discussion
Study 1 analyses risky decision-making in the lab. We address two main questions: i) whether the payment scheme impacts the measurement of risk preferences and ii) whether the instrument (the reduced HL5 task) affects such measurement.
Result 1 shows that paying all the subjects or none has no significant effect on consistency or number of safe choices. This is critical for field data collection, since samples (and therefore costs) are substantial when household visits are required.
Result 2, on the contrary, shows that BRIS payments may have an impact on risk taking (although they do not alter consistency).
Result 3 shows that the HL5 we used in the field has no impact on the measurement of risk preferences, neither in consistency nor in number of safe choices. This is also relevant for field studies, where time constraints are frequently present.
Overall, Study 1 served two purposes: to test whether incentives are important (apparently they are not) and to test whether our "fast" instrument is informative. In the following two sections, we test Results 1 and 2 in the field (Nigeria and Honduras) using the short version HL5.

Study 2: Field in Nigeria
This study was conducted in Kano (Nigeria) in April 2019. We ran the field experiment in three different villages in Kano: Dorayi, Ja'en and Gidan Maharba where 360 households were randomly selected. Our sample in each village was selected according to the eligibility criterion of having at least one child between 6 and 9 years old.
The experiment was composed of four modules: social norms (coordination games), subjective expectations, time and risk preferences. The HL5 was the last module we elicited. Random assignment to the T_R, T_B or T_H treatment remained the same throughout the entire experiment. Response time (RT_i) was collected using CAPI for all four modules.
To elicit risk preferences, we used the same HL5 as in the lab experiment. We adapted the currency to Nairas (see the example in subsection 2.1). On average, subjects earned 126 Nairas in the HL5 section 10.
The rest of the section is organised as follows: First we show the sample size and balance across treatments. Then, we describe the ethical approval and pre-registration.
Last, we show results and discuss the implications of our main findings for field experiments.
10 We planned payments in order to cover the one-day average wage for the entire experiment. This daily average wage was equal to 1080 Nairas (3 US$). We paid (on average) 350N in coordination, 405N in time discount and 126N in risk.

Sample size and balance
Sample size is n = 360 (by treatments, T_R: 120, T_B: 124, T_H: 116).
In this sample, 25% have no education, 8% have completed primary education and 32% have completed secondary education. Around 40% of the sample is female and the average age is 39 years. This sample is quite different from the sample used in the lab, but reflects common characteristics of the populations studied in Development Economics, who are subject to interventions aiming at increasing financial and human capital investment. Table 4 shows the balance across treatments. We observe significant differences in age, where subjects in the real and BRIS treatments (T_R and T_B) are 2.7 years older than those in the hypothetical treatment (T_H). Another significant difference appears in the number of children aged 6-9, where those in the BRIS treatment (T_B) have a higher number than those in T_H. However, these differences are no longer significant after adjusting p-values using Bonferroni corrections.

Results
The first line of Table 5 focuses on consistency (C_i), where n_c and % reflect the number and percentage of consistent subjects, respectively. The second line focuses on the number of safe choices or risk aversion (RA_i) and the last one on response time (RT_i).
Note: Columns 4 and 5 reflect the regression coefficients for each outcome variable (rows) in the BRIS and Real treatments (columns). We use the treatment with hypothetical money as a reference group. The time variable is expressed in seconds. Robust p-values are presented in parentheses. Significance levels: *p < 0.10, **p < 0.05 and ***p < 0.01.
As shown in Table 5, the fraction of consistent individuals, the mean number of safe choices and the response time are nearly the same across treatments, which suggests that there are no significant differences.
To test whether there are statistical differences between treatments, we run different regression models: a linear probability model for C_i, a negative binomial regression for RA_i and an OLS regression for RT_i. All models control for enumerator effects, age, gender, education level and income. The last two columns of Table 5 show the results of these regressions: column 4 focuses on the BRIS treatment and column 5 on the real case.
The reference group is the hypothetical payment in both cases.
For C_i (first row), we do not find any significant difference between T_B and T_H: the estimated impact of paying 1 out of 10 (vs hypothetical payment) is not significant (p = 0.835).
Similarly, the estimated impact of paying all subjects (real) is not significant (p = 0.388). For RA_i (second row), no significant differences are found between treatments either.
The estimated coefficients of T_B and T_R are not significant (p = 0.459 and p = 0.867).
These results are also robust to different model specifications (see Table B2 in the Appendix).
Finally, RT_i (last row) shows some differences across treatments: while there is no effect for T_B (p > 0.1), we find that the coefficient of T_R is positive but only weakly significant (p = 0.082).
Indeed, subjects making incentivized choices (receiving a payment with probability one) are 15 seconds slower than those making hypothetical choices. However, these results are not robust to different model specifications (see Table B3 in the Appendix).
In this study, we find that the payment scheme has no impact on consistency or the number of safe choices. For the latter, there is also no difference in distributions between payment schemes. However, we find a weakly significant effect of real payments on response time.
Here we summarise our main findings for Study 2:
Result 4: Compared to hypothetical payments in the field: paying all subjects has no impact on consistency or number of safe choices.
Result 5: Compared to hypothetical payments in the field: paying 1 out of 10 subjects has no impact on consistency, number of safe choices or response time.
Note that the finding that subjects are (weakly) faster in hypothetical choices is not included in Result 4 since it is not robust across specifications.

Discussion
The results of Study 2 show that eliciting risk preferences with monetary incentives in the field provides the same results as eliciting them hypothetically.

Study 3: Field in Honduras
The main purpose of this experiment is to test whether Results 4 and 5 replicate in a different location (Banerjee et al., 2017; Peters et al., 2018; Al-Ubaydli et al., 2017).
We ran a replication of the Nigerian experiment in Santa Rosa de Copán (Honduras). Households were randomly selected based on the eligibility criterion of having a child between 6 and 9 years old, following the criterion used for the field experiment in Nigeria.
As in Nigeria, the entire experiment consisted of four tasks (coordination, expectations, risk and time preferences). The main difference is that the risk preference task took place in the third position, not in the last one as in Study 2. Subjects' assignment to T_R, T_B or T_H remained the same for the entire experiment. RT_i was recorded by the enumerator for the entire block of risk preferences.
To elicit risk preferences, we used the same reduced HL task (HL5).
The rest of the section is organised as follows: First we show the sample size and balance. Then, we give details on the Ethics committee and the pre-registration. Last, we show results and discuss their implications for field experiments.

Sample size and balance
The total sample consists of 360 subjects. Every subject was randomly assigned to 1 out of 3 arms, resulting in the following distribution: T_R: 109, T_B: 126, T_H: 125. Table 6 shows the balance across treatments. We observe significant differences in age: subjects allocated to the real payment (T_R) were 3.3 years younger than those allocated to the hypothetical payment (T_H). However, this difference is not economically important (9% of the average age).
Table C1).

Results
Finally, our payment schemes have no impact on risk aversion and response time.
It should be noted that the coefficients of both treatments for RT_i are positive. This suggests that paying all subjects or 1 out of 10 increases the response time in comparison to hypothetical payments; however, the difference is not economically relevant (between 11 and 14 seconds).
As before, Figure 3 shows
Note: Columns 4 and 5 reflect the regression coefficients for each outcome variable (rows) in the hypothetical and BRIS treatment (columns). We use the treatment with real money as a reference group, that is, the coefficients in columns 4 and 5 estimate the difference between real and BRIS treatment with respect to the hypothetical treatment. The time variable is expressed in seconds. Robust p-values are presented in parentheses. Significance levels: *p < 0.10, **p < 0.05 and ***p < 0.01.
Our main findings show that the payment scheme has no impact on consistency, number of safe options or response time. We also find that there are no differences in the distributions of the number of safe choices among payment schemes.
Here we summarise our main findings for Study 3:
Result 6: Result 4 is replicated in Honduras: Compared to hypothetical money, the use of real incentives has no impact on consistency or risk taking.
Result 7: Result 5 is replicated in Honduras: Compared to hypothetical money, paying 1 out of 10 subjects has no impact on consistency, risk taking or response time.
Observe that, compared to Nigeria, in Honduras the effect on response time (RT_i) is no longer significant.
However, Table C3 shows that the treatment variables are significant for only one specification: when controlling for school fixed effects and clustering standard errors at the enumerator level.

Discussion
Our findings from Study 3 reinforce our conclusion from Study 2. The elicitation of risk preferences with monetary incentives in the field provides the same results as in the hypothetical scheme.
Our findings in the field are important for two reasons: First, existing hypothetical measures gathered in the field are indeed informative. Second, future measurements of risk preferences in the field do not need to use monetary incentives to get accurate proxies of risk preferences.

Equivalence Tests
In this section we test whether our estimates are different across treatments comparing estimates within a range instead of with respect to a point estimate.
A possible alternative is to explore whether or not the observed effect is large enough to be deemed worthwhile. This procedure is called equivalence testing (Lakens, 2017; Wellek, 2010). It consists in testing whether the observed effect γ falls within or outside an equivalence interval defined by two exogenous bounds: the lower (−γ_L) and the upper (γ_U). To test for equivalence, a two one-sided tests (TOST) approach is applied. Two composite null hypotheses are tested: H_01: γ ≤ −γ_L and H_02: γ ≥ γ_U. When both null hypotheses are rejected, we can conclude that −γ_L < γ < γ_U, that is, the observed effect falls within the equivalence bounds and is close enough to zero to be practically equivalent (Lakens, 2017).
Our exogenous bounds are based on Holt and Laury (2005), who found a difference of 1 in the average number of safe choices between real and hypothetical incentives. Hence, our equivalence level is equal to 1, and we define the null hypotheses of the TOST as H_01: γ ≤ −1 and H_02: γ ≥ 1. To check the robustness of our results, we also use two additional equivalence levels: 0.50 and 0.75.
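The logic of the TOST can be sketched as follows. This is a minimal illustration rather than the estimation code behind Table 8: it assumes symmetric bounds, a large-sample normal approximation for the test statistics and a 5% significance level, none of which is specified in the text, and the estimate and standard error in the example are hypothetical. The labels E, RD, TD and U are the four outcomes the paper defines when combining the equivalence test with the usual significance test.

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal, used as a large-sample approximation


def tost_p(estimate: float, se: float, bound: float) -> float:
    """p-value of the TOST for equivalence within (-bound, +bound)."""
    p_lower = 1 - _Z.cdf((estimate + bound) / se)  # H01: gamma <= -bound
    p_upper = _Z.cdf((estimate - bound) / se)      # H02: gamma >= +bound
    return max(p_lower, p_upper)


def nhst_p(estimate: float, se: float) -> float:
    """Two-sided p-value for H0: gamma = 0."""
    return 2 * (1 - _Z.cdf(abs(estimate) / se))


def classify(estimate: float, se: float, bound: float, alpha: float = 0.05) -> str:
    """Combine NHST and TOST into the labels E, RD, TD or U."""
    different = nhst_p(estimate, se) < alpha
    equivalent = tost_p(estimate, se, bound) < alpha
    if equivalent:
        return "TD" if different else "E"
    return "RD" if different else "U"


# Hypothetical estimate and standard error, checked at the three equivalence levels.
for bound in (1.0, 0.75, 0.50):
    print(bound, round(tost_p(0.10, 0.25, bound), 4), classify(0.10, 0.25, bound))
```

Taking the larger of the two one-sided p-values means equivalence is declared only when both null hypotheses are rejected, and narrowing the bound from 1 to 0.50 makes that harder for the same estimate.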
For a thorough analysis of our main findings, we use both the null hypothesis significance test (NHST) and the equivalence test (ET) 14. Table 8 provides the summary of results from the equivalence test. Panel A shows that in the lab, paying 1 out of 10 yields relevant differences with respect to hypothetical decisions, while using real payments yields equivalent results. In both field studies (panels B and C), equivalence tests suggest that both treatments (real and BRIS) yield results equivalent to the hypothetical payment scheme. All in all, from the equivalence test we can summarise the following findings:
Result 8: Hypothetical and real measures yield equivalent results in both the lab and field.
Result 9: BRIS and hypothetical payment mechanisms yield equivalent results only in the field.

Conclusion
This paper presents a systematic study of the impact of monetary incentives on the elicitation of risk preferences using Holt and Laury (2002). We cover one lab and two field experiments with different subject pools: the first with university students, the second with parents living in rural areas and the third with parents living in urban areas.
14 According to these tests, there are four possible outcomes in the analysis: i) the observed effect can be statistically equivalent (−γ_L < γ < γ_U) and not statistically different from zero (Equivalence or E); ii) statistically different from zero but not statistically equivalent (Relevant Difference or RD); iii) statistically different from zero and statistically equivalent (Trivial Difference or TD); and iv) neither statistically different from zero nor statistically equivalent (Undetermined or U).
Our study answers a simple question: Can we elicit risk preferences in the field without using monetary incentives and without generating inequality among participants?
We show in this study that this is possible. Lab experimentalists, who have widely used monetary incentives when eliciting such measures, can confidently trust the measures that their field counterparts collect, in most cases, using hypothetical payments; and field experimentalists can now trust that these measures are consistent across payment schemes (i.e., real, probabilistic and hypothetical) and that this can be done faster (and therefore cheaper) by using a short version of the Holt-Laury approach.
Our concrete findings are summarised here:
• The proposed short version (HL5, 5 choices) of the Holt-Laury approach generates the same results as the original long version (HL10, 10 choices).
• Using the short version (HL5) in the lab and field produces the same risk preference profiles (in terms of consistency and safe choices) regardless of the payment scheme (hypothetical or real payment).
• When comparing the HL5 measures, hypothetical (no payment) and real payment schemes generate the same consistency and risk levels, that is, they yield equivalent measures.
• When comparing the HL5 measures, the hypothetical (no payment) and BRIS schemes generate the same consistency and risk levels in the field, while in the lab the measurement of risk preferences displays higher levels of risk loving attitudes under the probabilistic scheme.
Academic scholars concerned about the inequality generated among participants (when using lotteries) or between participants and non-participants (when offering monetary incentives to some, but not all) may want to consider our reduced version of the hypothetical HL to measure risk preferences. This approach is faster (and cheaper) and does not create any asymmetries among participants. Field experimentalists who not only aim at reducing data collection costs, but also at minimising feelings of unfairness among experimental participants, should consider hypothetical payment schemes to minimise potential frictions between participants and experimenter, or among winners and losers.

Study 1: the lab experiment
In this section, we present the results from different specifications in the lab. Table A1 provides the regression results for consistency using different specifications. In all the specifications, the treatment coefficients are not statistically significant. This suggests that the different payment schemes have no impact on consistency, a result that is robust across specifications. However, for risk aversion (see Table A2) we find that the BRIS treatment has an impact: people in this treatment tend to make fewer safe choices than those in the real incentives group.
For time response (see Table B3) we find that T_H has an impact on response time: people in this treatment tend to be 15 seconds faster than those in the real incentives group.
However, this effect is not robust to different specifications, especially when we remove the enumerator fixed effects from the regression.

Study 3: the field experiment in Honduras
In this section, we present the results from different specifications in the Honduras experiment. Table C1 provides the regression results for consistency using different specifications. Columns 3 and 4 show that the coefficient of T_B is positive and significant (p = 0.086 and p = 0.034, respectively) when we control for school fixed effects. The estimated coefficient of T_H is not significant in any of the specifications.
For time response (see Table C3) we find that the estimated coefficient of T_H is negative but not significant. However, in the last column (the specification with school fixed effects and standard errors clustered at the enumerator level) this coefficient is significant at the 10% level. The estimated coefficient of T_B is not statistically significant in any of the specifications.