Before analysing the data, we need to decide on a functional form for the relationship between variables. The relationship we primarily care about is that between the CD-\(\kappa\) ratio and the frequency of ‘Top’ amongst the population. The simplest specification is a Linear Probability Model (LPM). Whilst the LPM is the easiest to interpret, its predictions can exceed the bounds of [0, 1] (at some point they must; the real concern is if they do so within the range of the data). The LPM may also omit useful information such as non-linear effects: we might expect the relationship to be S-shaped, for example, whereas the LPM is constrained to a straight line. The LPM is specified as:
\[
FREQ\_TOP = \beta_0 + \beta_1 \cdot CD\_RATIO
\tag{1}\]
The intercept \(\beta_0\) has a more complicated explanation than it may seem: it is the predicted frequency of Top when the CD Ratio is zero. However, there are multiple ways the CD Ratio could be zero: either of the numerators \((\eta_2, \beta)\) could equal 0; or, less plausibly, one of the denominators \((\eta_1, \kappa)\) could be infinitely large. The latter cases can be ignored, as no game can realistically be constructed using infinitely large decomposed parts. One could argue that if \(\eta_2=0\) the resulting game is no longer strictly an Anti-coordination game. If we accept this definition, we must then conclude that the intercept shows the frequency of Top when there are no behavioural incentives \((\beta=0)\).
The slope \(\beta_1\) is simpler to understand: it is the rate at which changes to the CD ratio increase the frequency of Top. The estimation results below show the OLS estimates for Equation 1. From these we can infer that for every 0.1 increase in the CD-\(\kappa\) Ratio the frequency of Top increases by 3.4 percentage points, starting from a base of 59.6%. These estimates are significant at the 0.1% level, and the model has an \(R^2=37.9\%\).
Notes:
[1] Standard errors are heteroscedasticity-robust (HC3).
Linear regressions with bounded dependent variables inherently have heteroskedastic errors, so robust standard errors are necessary. A robust Breusch–Pagan test (Breusch and Pagan 1979; Koenker 1981) yields an F statistic of 5.8229.
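For reference, a minimal sketch of this fit and the studentised Breusch–Pagan test, assuming statsmodels is available and that the cleaned game-level data sit in `cd-experiment-all-cleaned.csv` with the columns `Mean Top` and `CD-kappa Ratio` used by the plotting code later in this section:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Game-level data: one row per game, with the observed frequency of Top.
game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]                          # frequency of Top
X = sm.add_constant(game_data["CD-kappa Ratio"])   # intercept + CD-kappa Ratio

# LPM estimated by OLS with heteroscedasticity-robust (HC3) standard errors.
result_ols = sm.OLS(y, X).fit(cov_type="HC3")
print(result_ols.summary())

# Robust (Koenker) Breusch-Pagan test on the OLS residuals.
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(result_ols.resid, X, robust=True)
print(f"Robust Breusch-Pagan F statistic: {f_stat:.4f} (p = {f_p:.4f})")
```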
A common way of introducing curvature while retaining simplicity is to use logs. Specifically, if we take the natural log of the regressor on the RHS of Equation 1, we obtain:
\[
FREQ\_TOP = \beta_0 + \beta_1 \ln(CD\_RATIO)
\tag{2}\]
This equation has a similar interpretation to Equation 1, but on a relative rather than absolute scale. The intercept \(\beta_0\) now gives the predicted frequency of Top when the CD Ratio equals one (where \(\ln(CD\_RATIO)=0\)), rather than when the behavioural incentive is zero. The gradient of Equation 2 is decreasing in the CD Ratio and is equal to \(\frac{\beta_1}{CD\_RATIO}\).
Notes:
[1] Standard errors are heteroscedasticity-robust (HC3).
Whilst it introduces non-linearity, the log model is monotonically increasing and unbounded, so its predictions will eventually exceed 1. The predicted values lie within the [0, 1] bounds for the data at hand, but this is not guaranteed, so one can instead use models such as Beta regression or Logistic regression, which are bounded to (0, 1) by design. Although this excludes 0 and 1, which are possible (if extremely unlikely) outcomes, these are more realistic models of the relationship between the CD Ratio and the frequency of Top. The cost is that interpreting the results is less straightforward. The Beta regression uses the beta PDF given by:
\[
f(y;\mu,\phi)=\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)}y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}\quad 0\lt y\lt 1
\tag{3}\]
with \(0\lt\mu\lt 1\) and \(\phi\gt 0\). Therefore, \(y\sim \mathcal{B} (\mu, \phi)\), \(E(y)=\mu\) and \(Var(y)=\frac{\mu(1-\mu)}{1+\phi}\). Given this, using a random sample (indexed at \(i\)) the beta regression, in this case, is defined as: \[
g(\mu_i) = \beta_0 + \beta_1 * CD\_RATIO_i = \gamma_i
\tag{4}\]
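As a quick numerical check of this \((\mu, \phi)\) parameterisation, it can be mapped onto the usual shape parameters \(a=\mu\phi\) and \(b=(1-\mu)\phi\); a minimal sketch, assuming scipy is available and using purely illustrative values:

```python
from scipy.stats import beta

mu, phi = 0.6, 20.0                   # illustrative values, not estimates
a, b = mu * phi, (1 - mu) * phi       # scipy's shape parameters

dist = beta(a, b)
print(dist.mean())                    # 0.6, i.e. mu
print(dist.var())                     # ~0.01143
print(mu * (1 - mu) / (1 + phi))      # same value, from the variance formula above
```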
The link function \(g(\cdot):(0, 1)\mapsto\mathbb{R}\) used here is that of a logit model:
\[
g(\mu)=\log(\frac{\mu}{1-\mu})
\tag{5}\]
The model is naturally heteroscedastic, reflecting the fact that the errors vary far less as we approach the bounds of \(y\).
The results of the above beta regression, estimated by maximum likelihood, are presented below. Predictably, the coefficients are positive and significant, and the scale parameter is relatively large, suggesting a small variance in the frequency of Top.
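A sketch of how such a model can be fitted, assuming statsmodels 0.13 or later (which provides `BetaModel` in `statsmodels.othermod.betareg`) and the same file and column names as before:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.othermod.betareg import BetaModel

game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]
X = sm.add_constant(game_data["CD-kappa Ratio"])

# Beta regression by maximum likelihood; the default mean link is the logit
# of Equation 5, and the precision (phi) is estimated alongside the betas.
result_beta = BetaModel(y, X).fit()
print(result_beta.summary())

mu_hat = result_beta.predict(X)       # fitted means on the (0, 1) scale
```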
From a visual reading of these four estimated functions, the differences in their predictions are not practically significant. It therefore follows that the most useful estimation method is the one with the most transparent interpretation: OLS. The OLS regression above estimates that a 0.1 increase in the CD-\(\kappa\) Ratio increases the frequency of Top by 2.96 percentage points. Note that the range of the CD-\(\kappa\) Ratio used in the experiments is between 0 and 1. The OLS regression also suggests that with a CD-\(\kappa\) Ratio of zero, Top would be played 62.15% of the time. This gives some idea of how much Top is played with zero behavioural component, but, as discussed previously (in the theory paper), there are multiple scenarios in which the CD-\(\kappa\) Ratio could be zero. This is a drawback of the method, and the reason why only strictly positive values were used in the experiment.
The table below shows the OLS regression output for a ‘decomposed’ CD-\(\kappa\) Ratio, regressing the frequency of Top on the individual components of the games, with a constant term. Note that this is not exactly the same specification as the previous OLS regression: because a constant is needed, one cannot simply take logs of both sides to linearise that model, so a different specification was employed.
Notes:
[1] Standard errors are heteroscedasticity-robust (HC3).
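A sketch of this decomposed specification, assuming the component columns are named `Eta1`, `Eta2`, `Beta` and `Kappa` in the same file:

```python
import pandas as pd
import statsmodels.api as sm

game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]
X = sm.add_constant(game_data[["Eta1", "Eta2", "Beta", "Kappa"]])

# Frequency of Top on the individual game components, with a constant,
# again using HC3-robust standard errors.
result_decomposed = sm.OLS(y, X).fit(cov_type="HC3")
print(result_decomposed.summary())
```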
1.2 Discrete Outcomes
The outcome data are necessarily fractional, which means that the size of the set of possible outcomes depends on the number of participants used to collect the data. Even at the population level, this set is not infinitely large. The experiments conducted for this paper were carried out with 50 participants, which translates to 51 possible frequencies of Top (0, 1/50, 2/50, … , 50/50). It is therefore more appropriate to use methods designed for discrete, specifically fractional, dependent variables. Note, however, that for a sufficiently large sample size the continuous beta regression can still be used, and offers more flexibility (Cribari-Neto and Zeileis 2010, p. 5). Such a model is a parameterised version of the quasi-binomial model from McCullagh and Nelder (1989). The binomial distribution describes the number of successes in a sample, in this case the number of times Top is played. This implies that the proportion of Top \((y)\) in a single game \((i)\) follows a binomial distribution scaled by the sample size \((M)\):
\[
y_i\sim B(M, \pi)/M
\]
Here the probability of getting exactly \(k\) choices of Top in a sample of \(M\), given an individual probability of Top \(\pi\), is given by the PMF:
\[
f(k/M, M, \pi) = \Pr(k;M, \pi)=\binom{M}{k}\pi^{k}(1-\pi)^{M-k}
\tag{6}\]
for \(k=0, 1, 2, ..., M\), where
\[
\binom{M}{k}=\frac{M!}{k!(M-k)!}
\]
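As a small worked check of this PMF, assuming scipy is available and picking arbitrary values of \(M\), \(k\) and \(\pi\):

```python
from math import comb
from scipy.stats import binom

M, k, pi = 50, 30, 0.6

# Probability of exactly k Tops out of M, i.e. of observing the proportion k/M.
manual = comb(M, k) * pi**k * (1 - pi)**(M - k)
print(manual)                   # ~0.115
print(binom.pmf(k, M, pi))      # same value via scipy
```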
In cases where the sample size is too small to justify a beta regression, one can use a binomial generalised linear model (GLM). Given the number of observations here, the beta regression is defensible, but for completeness let us take a closer look at discrete models.
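A sketch of that GLM alternative, assuming the proportion of Top is used as the response with an (assumed) group size \(M\) supplied as variance weights, which is how statsmodels handles binomial responses given as proportions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]                          # proportion of Top per game
X = sm.add_constant(game_data["CD-kappa Ratio"])
M = 31                                             # participants per game (assumed)

# Binomial GLM with the default logit link; var_weights carries the group sizes.
glm_binom = sm.GLM(y, X, family=sm.families.Binomial(),
                   var_weights=np.repeat(M, len(y)))
print(glm_binom.fit().summary())
```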
The proportion of Top is measured on an interval scale: the categories are not arbitrary; it makes no sense to merge adjacent categories; and each category has an attached cardinal score, which can be used as a measure of the difference between categories. All of these hold because each category is a fraction of the number of participants (0, 1/50, 2/50, … , 50/50). We can go further: since there is a true zero (no one plays Top), we in fact have a (discrete) ratio scale.
So which regression models can one use with a binomially distributed endogenous variable on a discrete ratio scale? Let us start with the specification of the frequency of Top \(y\) in game \(i\), dependent on the exogenous variable, the CD Ratio \(x\):
\[
\Pr(y_i=k/M\mid X=x_i)=\binom{M}{k}\pi_i^{k}(1-\pi_i)^{M-k}
\tag{7}\]
We can express the probability of Top \(\pi_i\) as a function of the exogenous variable via the response function \(h(\cdot)\), the inverse of the link function \(g(\cdot)\):
\[
\pi_i=h(x_i)
\tag{8}\]
The link function can take a number of forms, but we will use the logistic link function, as we did with the beta regression:
\[
g(\pi_i|x_i)=\ln(\frac{\pi_i}{1-\pi_i})
\]
The logistic link has simpler theoretical properties and a clearer interpretation than the probit \((g(\pi)=\Phi^{-1}(\pi))\), complementary log-log \((g(\pi)=\log(-\log(1-\pi)))\), and log-log \((g(\pi)=-\log(-\log(\pi)))\) links. Using the logistic link function, the log odds are:
\[
g(\pi_i\mid x_i)=\ln\Big(\frac{\pi_i}{1-\pi_i}\Big)=\gamma_0+\gamma_1x_i
\tag{9}\]
This can alternatively be written as the probability of Top in game \(i\):
\[
\pi_i=h(x_i)=\frac{\exp(\gamma_0+\gamma_1x_i)}{1+\exp(\gamma_0+\gamma_1x_i)}=\frac{1}{1+\exp(-(\gamma_0+\gamma_1x_i))}
\tag{10}\]
The interpretation of \(\gamma_1\) is given by the derivative of \(\pi_i\) with respect to \(x_i\):
\[
\frac{\partial\pi_i}{\partial x_i}=\pi_i(1-\pi_i)\gamma_1
\tag{11}\]
This effect is largest when the probability is near 0.5 and shrinks towards the bounds. This reflects the more realistic non-linear relationship, but highlights the slight cost in interpretability compared to an LPM.
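To make this concrete, the marginal effect \(\pi(1-\pi)\gamma_1\) from Equation 11 can be evaluated at a few probabilities for an arbitrary, purely illustrative slope:

```python
gamma_1 = 1.5                                  # illustrative slope, not an estimate

for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(pi, pi * (1 - pi) * gamma_1)
# peaks at 0.375 when pi = 0.5 and falls to 0.135 at pi = 0.1 or 0.9
```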
Note that one must assume that participants are homogeneous (however unlikely) and independent (more plausible). We do not know the values of the true \(\pi_{ij}\)’s, so we can regard them as independent random variables with mean \(\bar{\pi}_i\). Each observation of either Top or Bottom is then a Bernoulli trial with mean \(\bar{\pi}_i\), i.e. \(y_{ij}\sim B(1, \bar{\pi}_i)\), which means that the sum \(y_i=y_{i1}+...+y_{iM}\) is distributed as \(y_i\sim B(M, \bar{\pi}_i)\). This also implies that the proportion of Top \(\bar{y}_i=\frac{y_i}{M}\) follows the same distribution scaled to be a proportion, \(\bar{y}_i\sim B(M, \bar{\pi}_i)/M\). It can be shown that as \(M\rightarrow\infty\), the sum of Top \(y_i\), for any fixed \(\pi\), tends towards a Normal distribution (McCullagh and Nelder 1989, ch. 4).
Does normality follow for the proportion of Top \(\bar{y}_i\)? Not directly: the distribution degenerates around the mean with zero variance. The binomial distribution has a mean of \(M\pi\) and a variance of \(M\pi(1-\pi)\). If we divide by the number of participants/trials, the mean and variance become \(\pi\) and \(\frac{\pi(1-\pi)}{M}\) respectively. Therefore, as \(M\rightarrow\infty\), the variance of the proportional binomial distribution tends to zero.
Another way to assess this is via Bernoulli trials. As each observation \(j\) in game \(i\) can be modelled as the result of a Bernoulli trial with mean \(\bar{\pi}_i\), the expected proportion of Top in game \(i\) from the sample will be the sample mean:
\[
\bar{y}_i = \frac{1}{M}\sum_{j=1}^M y_{ij}
\]
As \(y_{ij}\sim B(1, \bar{\pi}_i)\), it follows that \(\bar{y}_i\sim \frac{1}{M}\sum_{j=1}^M B(1, \bar{\pi}_i)\): the proportion of Top is the mean of Bernoulli draws (equivalently, of Binomial draws with \(M=1\)), each with mean \(\bar{\pi}_i\). The mean of this averaged distribution is the same \((\bar{\pi}_i)\), but the variance depends on \(M\), as written above \(\big(\frac{\bar{\pi}_i(1-\bar{\pi}_i)}{M}\big)\). By the Central Limit Theorem, this also implies that for a large enough sample size the proportion of Top \(\bar{y}_i\) will follow a Normal distribution:
\[
\sqrt{M}(\bar{y}_i-\bar{\pi}_i) \xrightarrow{a} \mathcal{N}(0, \bar{\pi}_i(1-\bar{\pi}_i))
\]
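A quick simulation, assuming only numpy, checks the mean and the \(\bar{\pi}_i(1-\bar{\pi}_i)/M\) variance of the scaled binomial that underlies this limit:

```python
import numpy as np

rng = np.random.default_rng(0)
pi_bar = 0.6                                   # illustrative probability of Top

for M in (31, 100, 1000):
    props = rng.binomial(M, pi_bar, size=100_000) / M   # simulated proportions
    print(M, props.mean(), props.var(), pi_bar * (1 - pi_bar) / M)
# the empirical variance tracks pi*(1-pi)/M and shrinks as M grows
```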
Note that both the mean and variance depend on the explanatory variables, so each game’s proportion of Top follows a different distribution with a different mean and variance. The closer the mean is to zero or one, the smaller the variance, so heteroskedasticity across games should be expected.
Using the logistic link function in Equation 10, we can rewrite the sample variance as:
\[
Var(\bar{y}_i\mid x_i) = \frac{1}{M}\frac{\exp(\gamma_0+\gamma_1x_i)}{(1+\exp(\gamma_0+\gamma_1x_i))^2} = \frac{1}{M}\frac{\exp(-(\gamma_0+\gamma_1x_i))}{(1+\exp(-(\gamma_0+\gamma_1x_i)))^2} = \frac{1}{M}Var(y_i\mid x_i)
\]
The takeaway from this variance equation is twofold: it has inherent heteroskedasticity; and the variance is falling in \(M\). The heteroskedasticity is shown in Figure 2 below. The logistic function has been scaled by a factor of 100 so that both lines fit on the same graph (the plotting library does not yet support secondary axes). One can see that as the logistic function approaches its upper bound of 0.01 (1 if unscaled), the variance of \(\bar{y}\) falls, both in theory and in the data1. It is also clear that the variance is smaller with more observations.
Code
```{ojs}
{
  const logistic_func = function (x, gamma_0, gamma_1) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (100*(1 + Math.exp(gamma_0 + gamma_1*x)))
    );
  }
  const logisticvar = function (x, gamma_0, gamma_1, m) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (m * (1 + Math.exp(gamma_0 + gamma_1*x))**2)
    );
  }
  let logistic_data = []
  for (let x=0; x<=1; x=x+0.001) {
    logistic_data.push({"Number of Participants": 31, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 31)});
    logistic_data.push({"Number of Participants": 50, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 50)});
    logistic_data.push({"Number of Participants": 100, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 100)});
  }
  let var_top_bar = []
  for (let i=0; i<50; i++) {
    var_top_bar.push({"Number of Participants": 31, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/31});
    var_top_bar.push({"Number of Participants": 50, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/50});
    var_top_bar.push({"Number of Participants": 100, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/100});
  }
  const logistic_var_figure = Plot.plot({
    color: {
      legend: true,
      domain: ["scaled logistic function", "logistic variance of ȳ|x", "Var of Mean Top"],
      range: ["black", "red", "blue"]
    },
    x: {
      label: 'x',
      line: false,
      domain: [0, 1],
    },
    y: {
      label: "y",
      labelAnchor: 'center',
      line: false,
      domain: [0, 0.01],
    },
    grid: true,
    width: width,
    marks: [
      Plot.frame(),
      Plot.line(logistic_data, {x: 'x', y: 'y', stroke: 'black', fx: 'Number of Participants'}),
      Plot.line(logistic_data, {x: 'x', y: 'y_var', stroke: 'red', fx: 'Number of Participants'}),
      Plot.dot(var_top_bar, {x: 'x', y: 'y', fill: 'blue', fx: 'Number of Participants'}),
    ],
  })
  return logistic_var_figure
}
```
Figure 2: Variance of y from a logistic link function
Neither of these results is particularly novel, but both are important. The beta regression model naturally accounts for this heteroskedasticity, but Figure 2 highlights that the magnitude of the heteroskedasticity falls with more participants, as shown by the flattening of the logistic variance of \(\bar{y}_i\). A flatter relationship does not mean it is less significant; in fact, with more observations it is likely to become more statistically significant. However, the practical significance, the relative size of the correlation between the variance of \(\bar{y}_i\) and \(x_i\), becomes smaller. The OLS estimator of the gradient between \(Var(\bar{y}_i)\) and \(x_i\) is:
\[
\hat{\beta}_1 = \rho_{x,\,Var(\bar{y})}\frac{s_{Var(\bar{y})}}{s_x}
\]
Given that the correlation coefficient between \(Var(\bar{y}_i)\) and \(x_i\) is \(-0.5912\) and the standard deviation of \(x\) is 0.2309, the standard deviation of \(Var(\bar{y})=\frac{1}{M}Var(y)\) can be written as:
\[
s_{Var(\bar{y})} = \frac{1}{M}\,s_{Var(y)}
\]
Therefore, using the current data, and assuming the estimates do not change with more data, we can construct a rough estimate of the OLS gradient between \(Var(\bar{y}_i)\) and \(x_i\) as a function of the number of participants \(M\):
\[
\hat{\beta}_1(M) = \frac{-0.5912}{0.2309}\cdot\frac{s_{Var(y)}}{M} \approx \frac{-2.56\,s_{Var(y)}}{M}
\]
The figure below shows the decisions of each participant over the course of the experimental session, along with the time taken to make each of the 50 choices. This covers only the Anti-Coordination games and omits the other games. “Cumulative-Mean of Choices” shows how the mean of each participant’s choices changes over time (i.e. the mean of “Top” from period 1 to the round set by the slider). Dragging the slider to the right increases the range of rounds over which the mean is calculated. “Cumulative-Mean Times” is much the same, but for the time participants took to make their decisions. Note that the decision page had a 20-second timer, requiring participants to spend at least 20 seconds on the page. One can see a general downward trend in times, suggesting that participants get quicker at making choices as the experiment progresses.
Using NVivo, one can analyse the descriptions the participants provided at the end of the experiment. From their responses one can create codes for different decision-making processes and intentions. The most frequent codes are shown in the figure below.
Figure 11: Frequency of Codes Amongst Participants’ Decision Descriptions
These codes are not mutually exclusive; they merely indicate the number of participants who expressed each code in their description. For example, any participant who said that they thought about their opponent’s actions has been coded as ‘Strategic Decision Making’, as this is evidence of at least Level 1 rationality. Descriptions of each code can be found in the appendix. The most frequent code, at 87.1% of participants, is ‘Self-Regarding’, indicating an attempt to increase their own payoff. Presumably all participants were self-regarding, but not all expressed this in their answer. It should also be noted that ‘Self-Regarding’ is not exclusive: 35.5% of participants stated an intention to be ‘Other-Regarding’, in that they also cared about their opponent’s payoff. There were also sizeable frequencies of ‘Risk-Aversion’ (29%) and ‘Max-Min’ (minimum-payoff maximisation, or avoidance of small payoffs) at 22.6%.
These percentages only represent how often each code occurs; it is also important to dissect how the codes overlap. This is represented in the diagram below.
Figure 12: Matrix of Codes Amongst Participants’ Decision Descriptions
The above figure should be read horizontally. It shows the overlap between different codes. The right-hand side shows the total number of occurrences for each code, and the numbers in each cell along a row show how often each corresponding code occurs alongside that row’s code (either in absolute terms, or relative to the code’s set size).
Below, an UpSet plot (Lex et al. 2014) shows the intersections of all of the codes in greater detail.
The UpSet plot allows us to interpret the intersections of our 9 codes where a Venn diagram could not. The most common intersections unsurprisingly occur between the most populated groups, and tend to involve only 2 or 3 codes. The most common intersection is that between the 2 largest sets, “Self-Regarding” and “Strategic Decision Making”, followed by the intersection of “Other-Regarding”, “Self-Regarding” and “Strategic Decision Making”, and then “Strategic Decision Making” on its own. Due to the currently limited data, many intersections occur only once or twice.
References
Breusch, T. S., and A. R. Pagan. 1979. “A Simple Test for Heteroscedasticity and Random Coefficient Variation.” Econometrica 47 (5): 1287–94. http://www.jstor.org/stable/1911963.
Cribari-Neto, Francisco, and Achim Zeileis. 2010. “Beta Regression in R.” Journal of Statistical Software 34 (2): 1–24. https://doi.org/10.18637/jss.v034.i02.
Koenker, Roger. 1981. “A Note on Studentizing a Test for Heteroscedasticity.” Journal of Econometrics 17 (1): 107–12. https://doi.org/10.1016/0304-4076(81)90062-2.
Lex, Alexander, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, and Hanspeter Pfister. 2014. “UpSet: Visualization of Intersecting Sets.” IEEE Transactions on Visualization and Computer Graphics 20 (12): 1983–92. https://doi.org/10.1109/TVCG.2014.2346248.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. London: Chapman & Hall / CRC.
Seabold, Skipper, and Josef Perktold. 2010. “Statsmodels: Econometric and Statistical Modeling with Python.” In 9th Python in Science Conference.
Footnotes
The data for 50 and 100 participants do not exist; the existing data for 31 participants have been scaled down as if they were the same data, i.e. divided by 50 and 100 respectively instead of by 31.↩︎
Source Code
---title: "CD-Ratio-Experiment-Analysis"author: "Alex Ralphs"toc: truenumber-sections: truebibliography: references.biblink-citations: truelink-bibliography: truecrossref: eq-prefix: ""highlight-style: pygmentsexecute: echo: trueformat: html: code-tools: true code-fold: true html-math-method: mathjax smooth-scroll: true page-layout: full embed-resources: true # pdf: # geometry: # - top=30mm # - left=20mm # docx: defaultjupyter: python3editor: render-on-save: trueeditor_options: chunk_output_type: inline---<head><!-- Remote stylesheet --><link rel="stylesheet" href="cd-ratio-experiment-analysis-styles.css"></head><body>```{ojs}cd_data = FileAttachment("cd-experiment-all-cleaned.csv").csv({typed:true})cd_data_in_order = FileAttachment("cd-experiment-all-cleaned-in-order.csv").csv({typed:true})code_list_raw = FileAttachment("cd-experiment-all-by-participant-cleaned-code-list.xlsx").xlsx()code_list=code_list_raw.sheet("Sheet1", {headers: true})code_matrix_raw = FileAttachment("cd-experiment-all-by-participant-cleaned-code-matrix.xlsx").xlsx()code_matrix = code_matrix_raw.sheet("Sheet3", {headers: true})code_totals = code_matrix_raw.sheet("Sheet4", {headers: true})code_upset_data = code_matrix_raw.sheet("Sheet5", {headers: true})code_upset_data_cross_prod = code_matrix_raw.sheet("Sheet6", {headers: true})code_upset_data_per_cat = code_matrix_raw.sheet("Sheet7", {headers: true})import {Plot} from "@observablehq/plot"import {colorContrast} from "@observablehq/plot-colorcontrast-custom-transform"```## Quantitative Analysis### Continuous OutcomesBefore we start to analyse using data, we need to decide what functional form the relationship between variables should be. The relationship we primarily care about is that between the CD-$\kappa$ ratio and the frequency of 'Top' amongst the population. The simplest relationship is a Linear Probability Model (LPM). Whilst the LPM is the easiest to interpret, it could possibly exceed the bounds of [0, 1] (at some point it must, but specifically if it exceed these bounds within the data set this would be bad). The LPM may also omit useful information such as any non-linear effects: we may expect the relationship to be S-shaped, for example, and the LPM is constrained to being a straight line. The LPM is be specified as:$$FREQ\_TOP = \beta_0 + \beta_1 * CD\_RATIO$$ {#eq-linear-function}The intercept $\beta_0$ has a complicated explanation, even though it may seem simple: the predicted frequency of top when the CD Ratio is zero. THis is true, however there are multiple ways the CD Ratio could be zero: any of the numerators $(\eta_2, \beta)$ could equal 0; or, less likely, one of the denominators $(\eta_1, \kappa)$ could be infinitely large. The later cases can be ignored as no game can realistically be constructed using infinitely large decomposed parts. One could argue that if $\eta_2=0$ then resulting came is not longer <i>strictly</i> an Anti-coordination game. If we accept this definition as fact, we must then conclude that the intercept shows the frequency of top when there is no behavioural incentives $(\beta=0)$. The slope $\beta_1$ is simpler to understand: it is the rate at which changes to the CD ratio increase the frequency of Top. The estimation result below shows the OLS estimates for @eq-linear-function. From this we can infer that for every 0.1 increase in the CD-$\kappa$ Ratio the frequency of Top will increase by 3.4 percentage points, starting from a base of 59.6%. These estimate are significant at a level of 0.1%, and has an $R^2=37.9%$. 
```{python}import pandas as pdimport numpy as npimport statsmodels.api as smfrom statsmodels.stats.stattools import jarque_berafrom statsmodels.stats.diagnostic import het_breuschpaganimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']var_top = game_data['Var Top']cd_ratio_kappa = game_data['CD-kappa Ratio']cd_ratio_kappa = sm.add_constant(cd_ratio_kappa)ols_cd_ratio_kappa = sm.OLS(freq_top, cd_ratio_kappa)result_ols = ols_cd_ratio_kappa.fit(cov_type='HC3')ols_jb = jarque_bera(result_ols.resid)ols_het_bp = het_breuschpagan(result_ols.resid, cd_ratio_kappa, robust=True)std_var_top = np.std(var_top, ddof=1)var_top_bar = var_top / result_ols.nobsstd_cd = np.std(game_data['CD-kappa Ratio'], ddof=1)corr_var_top_cd = np.corrcoef(var_top, game_data['CD-kappa Ratio'])[0][1]ols_var_top_cd = corr_var_top_cd * std_var_top / std_cdif np.sign(ols_var_top_cd) ==-1: ols_var_top_cd_sign =u"\u2212"else: ols_var_top_cd_sign =""ols_output_cd =open("regression_output_ols.html").read().format( title="OLS Regression Results", Model_SS="%.6f"% result_ols.ess, Model_df=int(result_ols.df_model), Model_MS="%.6f"% result_ols.mse_model, Residual_SS="%.6f"% result_ols.ssr, Residual_df=int(result_ols.df_resid), Residual_MS="%.6f"% result_ols.mse_resid, Total_SS="%.6f"% result_ols.centered_tss, Total_df=int(result_ols.nobs-1), Total_MS="%.6f"% result_ols.mse_total, Number_of_obs=int(result_ols.nobs), F="%.4f"% result_ols.fvalue, F_p="%.4f"% result_ols.f_pvalue, R_squared="%.4f"% result_ols.rsquared, Adj_R_squared="%.4f"% result_ols.rsquared_adj, Root_MSE="%.4f"% np.sqrt(result_ols.mse_resid), Constant_coef="%.6f"% result_ols.params.const, Constant_std_err="%.6f"% result_ols.bse.const, Constant_t="%.4f"% result_ols.tvalues.const, Constant_p="%.4f"% result_ols.pvalues.const, Constant_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0]["const"], Constant_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].const, X="CD-κ Ratio", beta_coef="%.6f"% result_ols.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_ols.bse["CD-kappa Ratio"], beta_t="%.4f"% result_ols.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_ols.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], cov_type = result_ols.cov_type,)display(HTML(ols_output_cd))# print(result_ols.summary())predicted_y_ols = []for i inrange(int(result_ols.nobs)): predicted_y_ols.append(result_ols.params.const+result_ols.params["CD-kappa Ratio"]*cd_ratio_kappa["CD-kappa Ratio"][i])cd_kappa_ratio_ols_params_dict =dict( Constant_coef=[result_ols.params.const], Constant_std_err=[result_ols.bse.const], Constant_t=[result_ols.tvalues.const], Constant_p=[result_ols.pvalues.const], Constant_ci_low=[result_ols.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_ols.conf_int(alpha=0.05)[1].const], beta_coef=[result_ols.params["CD-kappa Ratio"]], beta_std_err=[result_ols.bse["CD-kappa Ratio"]], beta_t=[result_ols.tvalues["CD-kappa Ratio"]], beta_p=[result_ols.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_ols.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_ols.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], Root_MSE="%.4f"% np.sqrt(result_ols.mse_resid), jarque_bera_stat="%.4f"% ols_jb[0], jarque_bera_p="%.4f"% ols_jb[1], breuschpagan_lm_stat="%.4f"% ols_het_bp[0], breuschpagan_lm_p="%.4f"% ols_het_bp[1], breuschpagan_f_stat="%.4f"% 
ols_het_bp[2], breuschpagan_f_p="%.4f"% ols_het_bp[3])cd_kappa_ratio_ols_params = pd.DataFrame(cd_kappa_ratio_ols_params_dict)cd_kappa_ratio_ols_data_dict =dict( residuals = result_ols.resid, predicted_freq_top = predicted_y_ols, CD_kappa_Ratio = cd_ratio_kappa["CD-kappa Ratio"], freq_top_data = freq_top )cd_kappa_ratio_ols_data = pd.DataFrame(cd_kappa_ratio_ols_data_dict)ojs_define(cd_ols_data = cd_kappa_ratio_ols_data)ojs_define(cd_ols_params = cd_kappa_ratio_ols_params)```Linear regressions with bounded dependant variables inherently have heteroskedastic errors, therefore robust standard errors are necessary. A robust Bruesch-Pagan Test @Breusch-Pagan-1979 @Koenker-1981 yields a f statistic of `{python} cd_kappa_ratio_ols_params_dict['breuschpagan_f_stat']`Typically, a common method of introducing curvature with simplicity is by using logs. Specifically if we take the natural log of the RHS of @eq-linear-function, we yield:$$FREQ\_TOP = \beta_0 + \beta_1 \ln(CD\_RATIO)$$ {#eq-ln-function}This equation has similar intuition as @eq-linear-function, but uses a relative scale as opposed to absolute. The intercept is $\beta_0$, and has the same intuition: the frequency of Top with zero behavioural incentives. The gradient of @eq-ln-function is decreasing and is equal to $\frac{\beta_1}{CD\_RATIO}$. ```{python}import pandas as pdimport numpy as npimport mathimport statsmodels.api as smfrom statsmodels.sandbox.regression.gmm import GMMfrom statsmodels.stats.stattools import jarque_beraimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']cd_ratio_kappa = game_data['CD-kappa Ratio']# class GMMLog(GMM):# def __init__(self, *args, **kwds):# # set appropriate counts for moment conditions and parameters# # TODO: clean up signature# kwds.setdefault('k_moms', 2)# kwds.setdefault('k_params', 2)# super(GMMLog, self).__init__(*args, **kwds)# def momcond(self, params):# p0, p1 = params# endog = self.endog# exog = self.exog.squeeze()# inst = self.instrument# # error1 = np.exp(endog) - p0 - p1 * exog# # error2 = exog * (np.exp(endog) - p0 - p1 * exog)# error1 = endog - np.log(p0 + p1 * exog)# error2 = exog * (endog - np.log(p0 + p1 * exog))# g = np.column_stack((error1, error2))# return g# ln_cd_ratio_kappa = GMMLog(freq_top, cd_ratio_kappa, cd_ratio_kappa)# beta0 = np.array([2, 0.1])# result_ln = ln_cd_ratio_kappa.fit(beta0, maxiter=2, optim_method='nm', wargs=dict(centered=False))cd_ratio_kappa_ln = np.log(cd_ratio_kappa)cd_ratio_kappa_ln = sm.add_constant(cd_ratio_kappa_ln)ln_cd_ratio_kappa = sm.OLS(freq_top, cd_ratio_kappa_ln)result_ln = ln_cd_ratio_kappa.fit(cov_type='HC3')ln_jb = jarque_bera(result_ln.resid)ln_het_bp = het_breuschpagan(result_ln.resid, cd_ratio_kappa_ln, robust=True)ln_output_cd =open("regression_output_ols.html").read().format( title="Ln-OLS Regression Results", Model_SS="%.6f"% result_ln.ess, Model_df=int(result_ln.df_model), Model_MS="%.6f"% result_ln.mse_model, Residual_SS="%.6f"% result_ln.ssr, Residual_df=int(result_ln.df_resid), Residual_MS="%.6f"% result_ln.mse_resid, Total_SS="%.6f"% result_ln.centered_tss, Total_df=int(result_ln.nobs-1), Total_MS="%.6f"% result_ln.mse_total, Number_of_obs=int(result_ln.nobs), F="%.4f"% result_ln.fvalue, F_p="%.4f"% result_ln.f_pvalue, R_squared="%.4f"% result_ln.rsquared, Adj_R_squared="%.4f"% result_ln.rsquared_adj, Root_MSE="%.4f"% np.sqrt(result_ln.mse_resid), Constant_coef="%.6f"% result_ln.params.const, 
Constant_std_err="%.6f"% result_ln.bse.const, Constant_t="%.4f"% result_ln.tvalues.const, Constant_p="%.4f"% result_ln.pvalues.const, Constant_ci_low="%.6f"% result_ln.conf_int(alpha=0.05)[0]["const"], Constant_ci_high="%.6f"% result_ln.conf_int(alpha=0.05)[1].const, X="ln(CD-κ Ratio)", beta_coef="%.6f"% result_ln.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_ln.bse["CD-kappa Ratio"], beta_t="%.4f"% result_ln.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_ln.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_ln.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_ln.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], cov_type = result_ln.cov_type,)display(HTML(ln_output_cd))predicted_y_ln = []for i inrange(int(result_ln.nobs)): predicted_y_ln.append(result_ln.params.const+result_ln.params["CD-kappa Ratio"]*np.log(cd_ratio_kappa[i]))cd_kappa_ratio_ln_params_dict =dict( Constant_coef=[result_ln.params.const], Constant_std_err=[result_ln.bse.const], Constant_t=[result_ln.tvalues.const], Constant_p=[result_ln.pvalues.const], Constant_ci_low=[result_ln.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_ln.conf_int(alpha=0.05)[1].const], beta_coef=[result_ln.params["CD-kappa Ratio"]], beta_std_err=[result_ln.bse["CD-kappa Ratio"]], beta_t=[result_ln.tvalues["CD-kappa Ratio"]], beta_p=[result_ln.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_ln.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_ln.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], Root_MSE="%.4f"% np.sqrt(result_ln.mse_resid), jarque_bera_stat="%.4f"% ln_jb[0], jarque_bera_p="%.4f"% ln_jb[1], breuschpagan_lm_stat="%.4f"% ln_het_bp[0], breuschpagan_lm_p="%.4f"% ln_het_bp[1], breuschpagan_f_stat="%.4f"% ln_het_bp[2], breuschpagan_f_p="%.4f"% ln_het_bp[3])cd_kappa_ratio_ln_params = pd.DataFrame(cd_kappa_ratio_ln_params_dict)cd_kappa_ratio_ln_data_dict =dict( residuals = result_ln.resid, predicted_freq_top = predicted_y_ln, CD_kappa_Ratio = cd_ratio_kappa, freq_top_data = freq_top )cd_kappa_ratio_ln_data = pd.DataFrame(cd_kappa_ratio_ln_data_dict)ojs_define(cd_ln_data = cd_kappa_ratio_ln_data)ojs_define(cd_ln_params = cd_kappa_ratio_ln_params)```Whilst including non-linearity, the log model is monotonically increasing and unbounded, therefore will be larger than 1 eventually. The predicted results lie within the [0, 1] bounds within the data, but this is not guaranteed, therefore one can use models such as Beta regression or Logistical Regression which are bound by (0, 1) by design. Despite 0 and 1 not being within the set of possible outcomes (both of which are extremely low probability outcomes), these are more realistic models of the relationship between the CD Ratio and the frequency of Top. However, interpreting the results is far less clear. The Beta regression uses the beta PDF given by:$$f(y;\mu,\phi)=\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)}y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}\quad 0\lt y\lt 1$$ {#eq-beta-pdf}with $0\lt\mu\lt 1$ and $\phi\gt 0$. Therefore, $y\sim \mathcal{B} (\mu, \phi)$, $E(y)=\mu$ and $Var(y)=\frac{\mu(1-\mu)}{1+\phi}$. 
Given this, using a random sample (indexed at $i$) the beta regression, in this case, is defined as:$$g(\mu_i) = \beta_0 + \beta_1 * CD\_RATIO_i = \gamma_i$$ {#eq-beta-regression}The link function $g(\cdot):(0, 1)\mapsto\mathbb{R}$ used here is that of a logit model:$$g(\mu)=\log(\frac{\mu}{1-\mu})$$ {#eq-beta-link-function-logit}The model is naturally heteroscedastic, taking into account the fact the errors will vary far less as we close in on the bounds of $y$. The results of the above beta regression, found using maximum likelihood, are presented below. Predictably, the coefficients are positive, significant, and the scale parameter is relatively large, suggesting a smaller variance in the frequency of Top. ```{python}import pandas as pdimport numpy as npimport statsmodels.api as smfrom statsmodels.stats.stattools import jarque_berafrom statsmodels.othermod.betareg import BetaModelimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']cd_ratio_kappa = game_data['CD-kappa Ratio']cd_ratio_kappa = sm.add_constant(cd_ratio_kappa)beta_cd_ratio_kappa = BetaModel(freq_top, cd_ratio_kappa)result_beta = beta_cd_ratio_kappa.fit()beta_jb = jarque_bera(result_beta.resid)beta_output_cd =open("regression_output_beta.html").read().format( link_function='g(u) = log(u/(1-u))', slink_function='g(u) = log(u)', log_likelihood="%.6f"% result_beta.llf, Number_of_obs=int(result_beta.nobs), LR_chi2 ="%.4f"% result_beta.llr, LR_chi2_p ="%.4f"% result_beta.llr_pvalue, Constant_coef="%.6f"% result_beta.params.const, Constant_std_err="%.6f"% result_beta.bse.const, Constant_z="%.4f"% result_beta.tvalues.const, Constant_p="%.4f"% result_beta.pvalues.const, Constant_ci_low="%.6f"% result_beta.conf_int(alpha=0.05)[0].const, Constant_ci_high="%.6f"% result_beta.conf_int(alpha=0.05)[1].const, X="CD-κ Ratio", beta_coef="%.6f"% result_beta.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_beta.bse["CD-kappa Ratio"], beta_z="%.4f"% result_beta.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_beta.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_beta.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_beta.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], scale_coef="%.6f"% result_beta.params.precision, scale_std_err="%.6f"% result_beta.bse.precision, scale_z="%.4f"% result_beta.tvalues.precision, scale_p="%.4f"% result_beta.pvalues.precision, scale_ci_low="%.6f"% result_beta.conf_int(alpha=0.05)[0].precision, scale_ci_high="%.6f"% result_beta.conf_int(alpha=0.05)[1].precision, cov_type = result_beta.cov_type,)display(HTML(beta_output_cd))# print(result_beta.summary())predicted_y_beta = []for i inrange(int(result_beta.nobs)): predicted_y_beta.append(freq_top[i]-result_beta.resid[i])cd_kappa_ratio_beta_params_dict =dict( Constant_coef=[result_beta.params.const], Constant_std_err=[result_beta.bse.const], Constant_t=[result_beta.tvalues.const], Constant_p=[result_beta.pvalues.const], Constant_ci_low=[result_beta.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_beta.conf_int(alpha=0.05)[1].const], beta_coef=[result_beta.params["CD-kappa Ratio"]], beta_std_err=[result_beta.bse["CD-kappa Ratio"]], beta_t=[result_beta.tvalues["CD-kappa Ratio"]], beta_p=[result_beta.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_beta.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_beta.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], precision_coef=[result_beta.params["precision"]], 
precision_std_err=[result_beta.bse["precision"]], precision_t=[result_beta.tvalues["precision"]], precision_p=[result_beta.pvalues["precision"]], precision_ci_low=[result_beta.conf_int(alpha=0.05)[0]["precision"]], precision_ci_high=[result_beta.conf_int(alpha=0.05)[1]["precision"]], resid_se=np.std(result_beta.resid), jarque_bera_stat="%.4f"% beta_jb[0], jarque_bera_p="%.4f"% beta_jb[1])cd_kappa_ratio_beta_params = pd.DataFrame(cd_kappa_ratio_beta_params_dict)cd_kappa_ratio_beta_data_dict =dict( residuals = result_beta.resid, predicted_freq_top = predicted_y_beta, CD_kappa_Ratio = cd_ratio_kappa["CD-kappa Ratio"], freq_top_data = freq_top )cd_kappa_ratio_beta_data = pd.DataFrame(cd_kappa_ratio_beta_data_dict)ojs_define(cd_beta_data = cd_kappa_ratio_beta_data)ojs_define(cd_beta_params = cd_kappa_ratio_beta_params)``````{python}import pandas as pdimport numpy as npimport statsmodels.api as smfrom statsmodels.stats.stattools import jarque_beraimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']cd_ratio_kappa = game_data['CD-kappa Ratio']cd_ratio_kappa = sm.add_constant(cd_ratio_kappa)logit_cd_ratio_kappa = sm.Logit(freq_top, cd_ratio_kappa)result_logit = logit_cd_ratio_kappa.fit(disp=False, method='newton', cov_type='HC0')result_logit_wald = result_logit.wald_test_terms(scalar=True)logit_jb = jarque_bera(result_logit.resid_response)logit_output_cd =open("regression_output_logit.html").read().format( log_pseudolikelihood="%.6f"% result_logit.llf, Number_of_obs=int(result_logit.nobs), Wald_chi2 ="%.6f"% result_logit_wald.statistic[1], chi2_p ="%.6f"% result_logit_wald.pvalues[1], Pseudo_R_squared="%.4f"% result_logit.prsquared, Constant_coef="%.6f"% result_logit.params.const, Constant_std_err="%.6f"% result_logit.bse.const, Constant_z="%.4f"% result_logit.tvalues.const, Constant_p="%.4f"% result_logit.pvalues.const, Constant_ci_low="%.6f"% result_logit.conf_int(alpha=0.05)[0].const, Constant_ci_high="%.6f"% result_logit.conf_int(alpha=0.05)[1].const, X="CD-κ Ratio", beta_coef="%.6f"% result_logit.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_logit.bse["CD-kappa Ratio"], beta_z="%.4f"% result_logit.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_logit.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_logit.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_logit.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], cov_type = result_logit.cov_type,)display(HTML(logit_output_cd))# print(result_logit.summary())predicted_y_logit = []for i inrange(int(result_logit.nobs)): predicted_y_logit.append(freq_top[i]-result_logit.resid_response[i])cd_kappa_ratio_logit_params_dict =dict( Constant_coef=[result_logit.params.const], Constant_std_err=[result_logit.bse.const], Constant_t=[result_logit.tvalues.const], Constant_p=[result_logit.pvalues.const], Constant_ci_low=[result_logit.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_logit.conf_int(alpha=0.05)[1].const], beta_coef=[result_logit.params["CD-kappa Ratio"]], beta_std_err=[result_logit.bse["CD-kappa Ratio"]], beta_t=[result_logit.tvalues["CD-kappa Ratio"]], beta_p=[result_logit.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_logit.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_logit.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], resid_se=np.std(result_logit.resid_response), jarque_bera_stat="%.4f"% logit_jb[0], jarque_bera_p="%.4f"% logit_jb[1])cd_kappa_ratio_logit_params = 
pd.DataFrame(cd_kappa_ratio_logit_params_dict)cd_kappa_ratio_logit_data_dict =dict( residuals = result_logit.resid_response, predicted_freq_top = predicted_y_logit, CD_kappa_Ratio = cd_ratio_kappa["CD-kappa Ratio"], freq_top_data = freq_top )cd_kappa_ratio_logit_data = pd.DataFrame(cd_kappa_ratio_logit_data_dict)ojs_define(cd_logit_data = cd_kappa_ratio_logit_data)ojs_define(cd_logit_params = cd_kappa_ratio_logit_params)```@fig-cd-data-cont-estimations shows the fitted estimation results for the CD Ratio Models. ::: {#fig-cd-data-cont-estimations}::: {.panel-tabset .ojs-track-layout}## CD Ratio Prediction```{ojs}Plot.plot({ color: { legend: true, domain: ['OLS', 'Beta', 'Logit', 'Ln'], range: ['blueviolet', 'limegreen', 'blue', 'orange'] }, x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, marks: [ Plot.dot(transpose(cd_ols_data), {x: "CD_kappa_Ratio", y: "freq_top_data", fill: 'black', tip: true}), Plot.line(transpose(cd_ols_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'blueviolet', sort: "CD_kappa_Ratio"}), Plot.line(transpose(cd_beta_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'limegreen', sort: "CD_kappa_Ratio"}), Plot.line(transpose(cd_logit_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'blue', sort: "CD_kappa_Ratio"}), Plot.line(transpose(cd_ln_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'orange', sort: "CD_kappa_Ratio"}), Plot.text([[0.8, 0.95]], {text: [`Y_ols = ${cd_ols_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_ols_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} X`], fill: 'blueviolet'}), Plot.text([[0.8, 0.8]], {text: [`Y_beta = B(g(${cd_beta_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_beta_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} X), ${cd_beta_params['precision_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})})`], fill: 'limegreen'}), Plot.text([[0.8, 0.75]], {text: [`Y_logit = 1/(1 + exp(-(${cd_logit_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_logit_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} X)))`], fill: 'blue'}), Plot.text([[0.75, 0.9]], {text: [`Y_ln = ${cd_ln_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_ln_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} ln(X)`], fill: 'orange'}) ], width: width,})```## Residuals```{ojs}Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, grid: true, marks: [ Plot.dot(transpose(cd_ols_data), {x: 'CD_kappa_Ratio', y: 'residuals', fill: 'blueviolet', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['OLS Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ols_params['jarque_bera_stat']} (p = ${cd_ols_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, title: 'Beta Residuals', grid: true, marks: [ Plot.dot(transpose(cd_beta_data), {x: 
'CD_kappa_Ratio', y: 'residuals', fill: 'limegreen', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['Beta Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_beta_params['jarque_bera_stat']} (p = ${cd_beta_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, title: 'Logit Residuals', grid: true, marks: [ Plot.dot(transpose(cd_logit_data), {x: 'CD_kappa_Ratio', y: 'residuals', fill: 'blue', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['Logit Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_logit_params['jarque_bera_stat']} (p = ${cd_logit_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, title: 'Ln Residuals', grid: true, marks: [ Plot.dot(transpose(cd_ln_data), {x: 'CD_kappa_Ratio', y: 'residuals', fill: 'orange', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['Ln Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ln_params['jarque_bera_stat']} (p = ${cd_ln_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})```## Residual histogram```{ojs}{ const normalpdf = function (x, sigma) { return ( (1 / (sigma * Math.sqrt(2 * Math.PI))) * Math.exp((-0.5 * Math.pow(x, 2)) / (2 * Math.pow(sigma, 2))) / 10 ); } let sigma_ols = cd_ols_params['Root_MSE'] let sigma_beta = cd_beta_params['resid_se'] let sigma_logit = cd_logit_params['resid_se'] let sigma_ln = cd_ln_params['resid_se'] let data_ols_normalpdf = [] let data_beta_normalpdf = [] let data_logit_normalpdf = [] let data_ln_normalpdf = [] for (let x=-0.3; x<=0.3;x=x+0.01) { data_ols_normalpdf.push({"x": x, "y": normalpdf(x, sigma_ols)}); data_beta_normalpdf.push({"x": x, "y": normalpdf(x, sigma_beta)}); data_logit_normalpdf.push({"x": x, "y": normalpdf(x, sigma_logit)}); data_ln_normalpdf.push({"x": x, "y": normalpdf(x, sigma_ln)}); } const ols_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_ols_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'blueviolet', tip: true})), Plot.text(['OLS Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ols_params['jarque_bera_stat']} (p = ${cd_ols_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_ols_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) const beta_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_beta_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'limegreen', tip: true})), Plot.text(['Beta Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_beta_params['jarque_bera_stat']} (p = 
${cd_beta_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_beta_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) const logit_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_logit_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'blue', tip: true})), Plot.text(['Logit Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_logit_params['jarque_bera_stat']} (p = ${cd_logit_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_logit_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) const ln_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_ln_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'orange', tip: true})), Plot.text(['Ln Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ln_params['jarque_bera_stat']} (p = ${cd_ln_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_ln_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) return html`${ols_resid_hist}${beta_resid_hist}${logit_resid_hist}${ln_resid_hist}`}```:::Prediction Estimations [@seabold2010statsmodels]:::From a visual reading of these four estimated functions, the difference in prediction is not practically significant. Therefore, it follows that the estimation method most useful is that with the easiest to understand intuition: the OLS. The above OLS regression estimates that a 0.1 increase in the CD-$\kappa$ Ratio will increase the frequency of Top by `{python} "%.2f" % (10*result_ols.params["CD-kappa Ratio"])` percentage points. Note that the range of the CD-$\kappa$ Ratio used in the experiments is between 0 and 1. The OLS regression also suggests that with a CD-$\kappa$ Ratio of zero, Top is likely to be played `{python} "%.2f" % (100*result_ols.params.const)`% of the time. This may give us some idea of how much Top is played with zero Behavioural component, but as discussed previously (in theory paper), there are multiple different scenarios where the CD-$\kappa$ Ratio could be zero. This is a downfall of using this method, and the reason why only strictly positive values were used in the experiment. The below table shows the OLS regression output for a 'decomposed' CD-$\kappa$ Ratio, regressing the frequency of Top on the individual components of the games with a constant term. Note that this is not the exact same specification as the previous OLS regression. Due to the need for a constant one cannot just take logs of both sides in order to linearize that model, thus a different model was employed. 
```{python}import pandas as pdimport numpy as npimport statsmodels.api as smimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']eta_1 = game_data['Eta1']eta_2 = game_data['Eta2']beta = game_data['Beta']kappa = game_data['Kappa']data = pd.DataFrame(list(zip(eta_1, eta_2, beta, kappa)), columns=['Eta1', 'Eta2', 'Beta', 'Kappa'])data = sm.add_constant(data)ols_reg = sm.OLS(freq_top, data)result_ols = ols_reg.fit(cov_type='HC3')ols_output_cd =open("regression_output_ols_full.html").read().format( Model_SS="%.6f"% result_ols.ess, Model_df=int(result_ols.df_model), Model_MS="%.6f"% result_ols.mse_model, Residual_SS="%.6f"% result_ols.ssr, Residual_df=int(result_ols.df_resid), Residual_MS="%.6f"% result_ols.mse_resid, Total_SS="%.6f"% result_ols.centered_tss, Total_df=int(result_ols.nobs-1), Total_MS="%.6f"% result_ols.mse_total, Number_of_obs=int(result_ols.nobs), F="%.4f"% result_ols.fvalue, F_p="%.4f"% result_ols.f_pvalue, R_squared="%.4f"% result_ols.rsquared, Adj_R_squared="%.4f"% result_ols.rsquared_adj, Root_MSE="%.4f"% np.sqrt(result_ols.mse_resid), Constant_coef="%.6f"% result_ols.params.const, Constant_std_err="%.6f"% result_ols.bse.const, Constant_t="%.4f"% result_ols.tvalues.const, Constant_p="%.4f"% result_ols.pvalues.const, Constant_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0]["const"], Constant_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].const, X_1="Eta 1", beta_1_coef="%.6f"% result_ols.params.Eta1, beta_1_std_err="%.6f"% result_ols.bse.Eta1, beta_1_t="%.4f"% result_ols.tvalues.Eta1, beta_1_p="%.4f"% result_ols.pvalues.Eta1, beta_1_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Eta1, beta_1_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Eta1, X_2="Eta 2", beta_2_coef="%.6f"% result_ols.params.Eta2, beta_2_std_err="%.6f"% result_ols.bse.Eta2, beta_2_t="%.4f"% result_ols.tvalues.Eta2, beta_2_p="%.4f"% result_ols.pvalues.Eta2, beta_2_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Eta2, beta_2_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Eta2, X_3="Beta", beta_3_coef="%.6f"% result_ols.params.Beta, beta_3_std_err="%.6f"% result_ols.bse.Beta, beta_3_t="%.4f"% result_ols.tvalues.Beta, beta_3_p="%.4f"% result_ols.pvalues.Beta, beta_3_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Beta, beta_3_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Beta, X_4="Kappa", beta_4_coef="%.6f"% result_ols.params.Kappa, beta_4_std_err="%.6f"% result_ols.bse.Kappa, beta_4_t="%.4f"% result_ols.tvalues.Kappa, beta_4_p="%.4f"% result_ols.pvalues.Kappa, beta_4_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Kappa, beta_4_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Kappa, cov_type = result_ols.cov_type,)display(HTML(ols_output_cd))# print(result.summary())```### Discrete OutcomesDue to the nature of the outcome data, it is necessarily fractional, which means that the size of the set of possible outcomes depends on the number of participants used to collect the data. Even if we expand to a population level, this is still not infinitely large. The experiments conducted for this paper were carried out with `{python} int(result_ols.nobs)` participants, which translates to a possible `{python} int(result_ols.nobs+1)` different frequencies of Top (0, 1/`{python} int(result_ols.nobs)`, 2/`{python} int(result_ols.nobs)`, ... , `{python} int(result_ols.nobs)`/`{python} int(result_ols.nobs)`). 
Therefore, it is more appropriate to use methods designed for discrete, specifically fractional, dependant variables. Note, however, that for sufficiently large sample size, the continuous beta regression can still be used, and offers more flexibility [@Cribari-Neto_Zeileis [p. 5]]. Such a model is a parameterised version of the quasi-binomial model from [@McCullagh-Nelder]. The binomial distribution shows the number of successes in a sample population, in this case the number of time Top is played. This implies that the proportion of Top $(y)$ in a single game $(i)$ follows an augmented binomial distribution which has been scaled by dividing by the sample size $(M)$: $$y_i\sim B(M, \pi)/M$$Where the probability of getting exactly $k$ choices of Top in a sample of $M$, given a individual probability of Top, is given by the PDF:$$f(k/M, M, \pi) = \Pr(k;M, \pi)=\binom{M}{k}\pi^{k}(1-\pi)^{1-k}$$ {#eq-binomial-pdf-proportion}for $k=0, 1, 2, ..., M$, where$$\binom{M}{k}=\frac{M!}{k!(M-k)!}$$In cases where the sample size is too small to justify a beta regression, one can use a binomial generalised linear model (GLM). Due to the number of observations here, we can pretty safely use a beta regression, but for completeness let us take a deeper look into discrete models. The scale that the proportion of Top follows is that of an interval scale: the categories are not arbitrary; it makes no sense to join adjacent categories; and each category has an attached cardinal number score, which can be used as a measure of difference between categories. These are all true as each category is a fraction of the number of participants (0, 1/`{python} int(result_ols.nobs)`, 2/`{python} int(result_ols.nobs)`, ... , `{python} int(result_ols.nobs)`/`{python} int(result_ols.nobs)`). We can take it even further, as we have true zero (no-one plays Top), therefore meaning we have a (discrete) ratio scale. So what regression models can one use with a binomially distributed endogenous variable on a discrete ratio scale? Let us start with the specification of the frequency of Top $y$ in game $i$, dependant of the exogenous variable of the CD-Ratio $x$:$$\Pr(y_i=k/M|X=x_i)=\binom{M}{k}\pi_i^{k}(1-\pi_i)^{1-k}$$ {#eq-conditional-binomial-pdf-proportion}We can express the probability of Top $\pi_i$ as a function of the exogenous variables, using link function $g(\cdot)$:$$\pi_i=h(x_i)$$ {#eq-probability-of-success-function}The link function can take a number of forms, but we will use the logistic link function, as we did with the beta regression:$$g(\pi_i|x_i)=\ln(\frac{\pi_i}{1-\pi_i})$$The logistic function has simpler theoreticaal properties and intuition over using Probit $(g(\pi)=\Phi^{-1}(\pi))$, a complementary log-log $(g(\pi)=\log(-\log(1-\pi)))$, and the log-log $(g(\pi)=-\log(-\log(\pi)))$ link functions. It follows that using the logistic link function that the log odds are:$$g(\pi_i|x_i)=\ln(\frac{\pi_i}{1-\pi_i})=\gamma_0+\gamma_1x_i$$ {#eq-parameterisation-of-logistic-link}This can alternatively be written as the probability of Top in game $i$:$$\pi_i=h(x_i)=\frac{\exp(\gamma_0+\gamma_1x_i)}{1+\exp(\gamma_0+\gamma_1x_i)}=\frac{1}{1+\exp(-(\gamma_0+\gamma_1x_i))}$$ {#eq-prob-top-wth-logistic-link}The interpretations of $\gamma_1$ is given by the derivatives of $\pi_i$ with respect to $x_i$:$$\frac{\partial\pi_i}{\partial x_i}=\pi_i(1-\pi_i)\gamma_1$$ {#eq-logistic-relationship-between-x-and-prob-top}This effect is largest when the probability is near 0.5, and gets smaller nearer to the bounds. 
This diminishing marginal effect reflects the more realistic non-linear relationship, but it highlights a slight cost in intuition compared to the LPM. Note that one must assume that participants are homogeneous (however unlikely) and independent (more defensible). In practice we do not know the true individual probabilities $\pi_{ij}$, so we can regard them as independent random variables with mean $\bar{\pi}_i$. Each observation of either Top or Bottom is then a Bernoulli trial with mean $\bar{\pi}_i$, i.e. $y_{ij}\sim B(1, \bar{\pi}_i)$, which means that the sum $y_i=y_{i1}+...+y_{iM}$ is distributed as $y_i\sim B(M, \bar{\pi}_i)$. This also implies that the proportion of Top $\bar{y}_i=\frac{y_i}{M}$ follows the same distribution scaled to a proportion, $\bar{y}_i\sim B(M, \bar{\pi}_i)/M$. It can be shown that, as $M\rightarrow\infty$, the sum of Top $y_i$ for any fixed $\pi$ tends towards a Normal distribution; this is all from chapter 4 of @McCullagh-Nelder.

Does normality follow for the proportion of Top $\bar{y}_i$? Taken naively, the limiting distribution degenerates to a point mass at the mean with zero variance: the binomial distribution has mean $M\pi$ and variance $M\pi(1-\pi)$, so dividing by the number of participants (trials) gives mean $\pi$ and variance $\frac{\pi(1-\pi)}{M}$, and as $M\rightarrow\infty$ the variance of the proportion tends to zero.

Another way to assess this is through the Bernoulli trials themselves. As each observation $j$ in game $i$ can be modelled as the result of a Bernoulli trial with mean $\bar{\pi}_i$, the expected proportion of Top in game $i$ from the sample is the sample mean:

$$\bar{y}_i = \frac{1}{M}\sum_{j=1}^M y_{ij}$$

As $y_{ij}\sim B(1, \bar{\pi}_i)$, it follows that $\bar{y}_i\sim \frac{1}{M}\sum_{j=1}^M B(1, \bar{\pi}_i)$: the proportion of Top is the mean of Bernoulli draws (equivalently, of Binomial draws with $M=1$), each with mean $\bar{\pi}_i$. The mean of this averaged distribution is unchanged at $\bar{\pi}_i$, but its variance depends on $M$, as written above $\big(\frac{\bar{\pi}_i(1-\bar{\pi}_i)}{M}\big)$. By the Central Limit Theorem, for a large enough sample size the proportion of Top $\bar{y}_i$ is therefore approximately Normally distributed:

$$\sqrt{M}(\bar{y}_i-\bar{\pi}_i) \xrightarrow{a} \mathcal{N}(0, \bar{\pi}_i(1-\bar{\pi}_i))$$

Note that both the mean and the variance depend on the explanatory variables, so each game's proportion of Top follows a different distribution with a different mean and variance. The closer the mean is to zero or one, the smaller the variance, so heteroskedasticity across games should be expected. Using the logistic link function @eq-prob-top-wth-logistic-link, we can rewrite the sample variance as:

$$Var(\bar{y}_i|x_i) = \frac{1}{M}\frac{\exp(\gamma_0+\gamma_1x_i)}{(1+\exp(\gamma_0+\gamma_1x_i))^2} = \frac{1}{M}\frac{\exp(-(\gamma_0+\gamma_1x_i))}{(1+\exp(-(\gamma_0+\gamma_1x_i)))^2} = \frac{1}{M^2}Var(y_i|x_i)$$

The takeaway from this variance equation is twofold: the model has inherent heteroskedasticity, and the variance is falling in $M$. The heteroskedasticity is shown in @fig-logistic-function-and-variance-of-y below. The logistic function has been scaled down by a factor of 100 so that both lines fit on the same graph (Observable Plot does not yet support secondary axes).
One can see that as the logistic function approaches its upper bound of 0.01 (1 if unscaled) the variance of $\bar{y}$ falls, shown both in theory and in the data[^1]. It is also clear that the variance is smaller with more observations.

[^1]: The data for 50 and 100 participants do not exist; the points shown are the 31-participant data rescaled as if they did, i.e. divided by 50 and 100 respectively instead of by 31.

::: {#fig-logistic-function-and-variance-of-y}

```{ojs}
{
  const logistic_func = function (x, gamma_0, gamma_1) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (100*(1 + Math.exp(gamma_0 + gamma_1*x)))
    );
  }
  const logisticvar = function (x, gamma_0, gamma_1, m) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (m * (1 + Math.exp(gamma_0 + gamma_1*x))**2)
    );
  }

  let logistic_data = []
  for (let x = 0; x <= 1; x = x + 0.001) {
    logistic_data.push({"Number of Participants": 31, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 31)});
    logistic_data.push({"Number of Participants": 50, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 50)});
    logistic_data.push({"Number of Participants": 100, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 100)});
  }

  let var_top_bar = []
  for (let i = 0; i < 50; i++) {
    var_top_bar.push({"Number of Participants": 31, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/31});
    var_top_bar.push({"Number of Participants": 50, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/50});
    var_top_bar.push({"Number of Participants": 100, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/100});
  }

  const logistic_var_figure = Plot.plot({
    color: {
      legend: true,
      domain: ["scaled logistic function", "logistic variance of ȳ|x", "Var of Mean Top"],
      range: ["black", "red", "blue"]
    },
    x: {
      label: 'x',
      line: false,
      domain: [0, 1],
      // ticks: 4,
    },
    y: {
      label: "y",
      labelAnchor: 'center',
      line: false,
      domain: [0, 0.01],
    },
    grid: true,
    width: width,
    marks: [
      Plot.frame(),
      Plot.line(logistic_data, {x: 'x', y: 'y', stroke: 'black', fx: 'Number of Participants'}),
      Plot.line(logistic_data, {x: 'x', y: 'y_var', stroke: 'red', fx: 'Number of Participants'}),
      Plot.dot(var_top_bar, {x: 'x', y: 'y', fill: 'blue', fx: 'Number of Participants'}),
    ],
  })
  return logistic_var_figure
}
```

Variance of y from a logistic link function
:::

Neither of these results is particularly novel, but both are important. The beta regression model naturally accounts for this heteroskedasticity, but @fig-logistic-function-and-variance-of-y highlights that the magnitude of the heteroskedasticity falls with more participants, visible in the flattening of the logistic variance of $\bar{y}_i$. Making this relationship flatter does not make it less statistically significant; in fact, with more observations it is likely to become more statistically significant. However, its practical significance, the relative size of the correlation between the variance of $\bar{y}_i$ and $x_i$, becomes smaller.
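The same point can be checked numerically. The sketch below is illustrative only: it reuses the arbitrary values $\gamma_0=0.5$ and $\gamma_1=2$ from the figure, simulates game-level proportions $\bar{y}_i \sim B(M, \pi_i)/M$, and compares their sample variance with the theoretical $\pi_i(1-\pi_i)/M$ for each number of participants.

```{python}
import numpy as np

rng = np.random.default_rng(0)
gamma_0, gamma_1 = 0.5, 2.0          # illustrative values, matching the figure above
x = np.linspace(0, 1, 50)            # a grid of CD ratios
pi = 1 / (1 + np.exp(-(gamma_0 + gamma_1 * x)))

for M in (31, 50, 100):
    # Simulate many sessions of M choices per game and take the proportion of Top.
    draws = rng.binomial(M, pi, size=(10_000, len(x))) / M
    sim_var = draws.var(axis=0)
    theory_var = pi * (1 - pi) / M
    print("M=%3d  max |simulated - theoretical| variance: %.5f" % (M, np.abs(sim_var - theory_var).max()))
```

Both the level of the variance and its variation across $x$ shrink as $M$ grows, which is exactly the flattening visible in @fig-logistic-function-and-variance-of-y.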
The OLS estimator of the gradient between $Var(\bar{y}_i)$ and $x_i$ is:$$\hat{\gamma}_{Var(\bar{y}), x} = \frac{\hat{Cov}(Var(\bar{y}), x)}{\hat{Var}(x)}=\hat{\rho}(Var(\bar{y}), x)\frac{\hat{\sigma}_{Var(\bar{y})}}{\hat{\sigma}_x}$$Given that the correlation coefficient between $Var(\bar{y}_i)$ and $x_i$ is `{python} "%.4f" % corr_var_top_cd`, the standard deviation of x is `{python} "%.4f" % std_cd` and the standard deviation of $Var(\bar{y})=\frac{1}{M}Var(y)$ can be written as:$$\hat{\sigma}_{Var(\bar{y})}=\frac{1}{M}\hat{\sigma}_{Var(y)}=\frac{{`{python} round(std_var_top, 4)`}}{M}$$Therefore, using the current data, and assuming the estimates do not change with more data, we can construct a rough estimate of the OLS estimator of the gradient between $Var(\bar{y}_i)$ and $x_i$ for the number of participants M:$$\hat{\gamma}_{Var(\bar{y}), x} = {`{python} round(corr_var_top_cd, 4)`}\frac{{`{python} round(std_var_top, 4)`}}{{`{python} round(std_cd, 4)`}}M^{-1} \approx {`{python} ols_var_top_cd_sign`}\frac{{`{python} abs(round(ols_var_top_cd, 4))`}}{M}$$The question now is what is the gradient at which heteroskedasticity can be deemed practically insignificant. The figure below show many graphs using the CD model and CRRA.::: {.panel-tabset .ojs-track-layout}## CD Ratio```{ojs}//| label: fig-cd_ratioPlot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, marks: [ Plot.dot(cd_data, {x: "CD-kappa Ratio", y: "Mean Top", fill: 'red'}), Plot.linearRegressionY(cd_data, {x: "CD-kappa Ratio", y: "Mean Top", stroke: 'blueviolet'}) ], width: width,})```## CD Prediction```{ojs}//| label: fig-cd_predictionPlot.plot({ color: { legend: true, domain: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], }, x: { label: 'CD Ratio Prediction (Kappa)', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", fill: 'Strat Eq Set'}), Plot.linearRegressionY(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", stroke: 'blueviolet'}), Plot.link([0, 1], {x1: 0, y1: 0, x2: 1, y2: 1}), ]})// legend names = ['NE=0.67', 'NE=0.5', 'NE=0.8', 'NE=0.6', 'NE=0.7', 'NE=0.9']```## CRRA Prediction```{ojs}//| label: fig-crra_predictionPlot.plot({ color: { legend: true, domain: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], }, x: { label: 'CRRA Prediction', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "CRRA Pred", y: "Mean Top", fill: 'Strat Eq Set'}), Plot.linearRegressionY(cd_data, {x: "CRRA Pred", y: "Mean Top", stroke: 'blueviolet'}), Plot.link([0, 1], {x1: 0, y1: 0, x2: 1, y2: 1}), ]})```## CD Vs CRRA```{ojs}//| label: fig-cd_vs_crraPlot.plot({ color: { legend: true, domain: ['CD Prediction (Kappa)', 'CRRA Prediction'], range: ['blue', 'limegreen'] }, x: { label: 'Prediction', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. 
of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", fill: 'blue'}), Plot.linearRegressionY(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", stroke: 'blue'}), Plot.dot(cd_data, {x: "CRRA Pred", y: "Mean Top", fill: 'limegreen'}), Plot.linearRegressionY(cd_data, {x: "CRRA Pred", y: "Mean Top", stroke: 'limegreen'}), Plot.link([0, 1], {x1: 0, y1: 0, x2: 1, y2: 1}), ]})```## Strat Equivalent Groups```{ojs}//| label: fig-strat_eq_groupsPlot.plot({ color: { legend: true, domain: ['NE', 'CD Ratio Prediction', 'CD Ratio Prediction (Kappa)', 'CRRA', 'Mean of Top'], range:['black', 'orange', 'blue', 'limegreen', 'red'], }, x: { label: 'Games', line: false, domain: [1, 5], ticks: 4, }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: false, domain: [0, 1], }, grid: true, width: width, fx: { label: "Strategically Equivalent Set" }, facet: {data: cd_data, x: 'Strat Eq Set'}, marks: [ Plot.frame(), Plot.line(cd_data, {x: 'ID in Set', y: 'NE', stroke: 'black'}), Plot.line(cd_data, {x: 'ID in Set', y: 'CD Ratio Pred', stroke: 'orange'}), Plot.line(cd_data, {x: 'ID in Set', y: 'CD-kappa Ratio Pred', stroke: 'blue'}), Plot.line(cd_data, {x: 'ID in Set', y: 'CRRA Pred', stroke: 'limegreen'}), Plot.line(cd_data, {x: 'ID in Set', y: 'Mean Top', stroke: 'red'}), ]})```## Decision Times By Strat Eq Set```{ojs}//| label: fig-decision_times_strat_eqlower_time = []upper_time = []{ for (var i = 0; i < 50; i++) { lower_time[i] = cd_data[i]['Mean Time'] - cd_data[i]['Std Time'] upper_time[i] = cd_data[i]['Mean Time'] + cd_data[i]['Std Time'] } return html``}Plot.plot({ color: { legend: true, domain: ['Mean Time'], range:['red'], }, x: { label: 'Games', line: false, domain: [1, 5], ticks: 4, }, y: { label: 'Seconds', labelAnchor: 'center', line: false, domain: [0, 60], }, grid: true, width: width, fx: { label: "Strategically Equivalent Set" }, facet: {data: cd_data, x: 'Strat Eq Set'}, marks: [ Plot.frame(), Plot.dot(cd_data, {x: 'ID in Set', y: 'Mean Time', fill: 'red'}), Plot.ruleX(cd_data, {x: 'ID in Set', y1: lower_time, y2: upper_time, strokeWidth: 2}) ]})```## Decision Times Correlation to Data```{ojs}//| label: fig-decision_times_corrPlot.plot({ color: { legend: true, domain: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], }, x: { label: 'Mean Time', labelAnchor: 'center', line: true, domain: [20, 35], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "Mean Time", y: "Mean Top", fill: 'Strat Eq Set'}), Plot.linearRegressionY(cd_data, {x: "Mean Time", y: "Mean Top", stroke: 'blueviolet'}) ]})```## Data```{ojs}Inputs.table(cd_data)```:::The figure below shows the decisions of each participants over the course of the experimental session, and also the times it took to make each of the 50 choices. This is only the Anti-Coordination games, and omits the other games. "Cumulative-Mean of Choices" shows how the mean of each participants' choices changes over time (i.e. the mean of "Top" from period 1 to the Round set by the slider). Dragging the slider to the right will increase the range of rounds over the mean is calculated. "Cumulative-Mean Times" is much the same, but for the times the participants took to make their decisions. Note that the decision page had a 20 second timer restricting participants to spend at least 20 seconds on the page. 
One can see a general downward trend in times, suggesting that participants do get quicker at making choices as the experiment progresses. ::: {#fig-cd_data_mean_coop}::: {.panel-tabset .ojs-track-layout}## Cumulative-Mean of Choices```{ojs}cd_data_in_order_filter = cd_data_in_order.slice(0, Round_Number)strokes = [ "#c73c12", "#7f3b91", "#3cb38e", "#3ca6b8", "#164a93", ]Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Freq. of Top', domain: [0,1] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_1_rolling_mean_choice: wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_2_rolling_mean_choice: r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_3_rolling_mean_choice: 4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_4_rolling_mean_choice: 0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_5_rolling_mean_choice: 8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_6_rolling_mean_choice: nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_7_rolling_mean_choice: xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_8_rolling_mean_choice: zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_9_rolling_mean_choice: j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_10_rolling_mean_choice: vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_11_rolling_mean_choice: 0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_12_rolling_mean_choice: nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_13_rolling_mean_choice: jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_14_rolling_mean_choice: np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_15_rolling_mean_choice: bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_16_rolling_mean_choice: mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_17_rolling_mean_choice: cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_18_rolling_mean_choice: wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_19_rolling_mean_choice: e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_20_rolling_mean_choice: hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_21_rolling_mean_choice: ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_22_rolling_mean_choice: d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_23_rolling_mean_choice: fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_24_rolling_mean_choice: uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_25_rolling_mean_choice: mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: 
"participant_26_rolling_mean_choice: rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_27_rolling_mean_choice: 1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_28_rolling_mean_choice: 7tco6o97", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_29_rolling_mean_choice: phdlm5a3", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_30_rolling_mean_choice: t4q22to2", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_31_rolling_mean_choice: jayomb7t", stroke: strokes[0]}), ], grid: true, width: width, height: 450,})```## Cumulative-Mean Times```{ojs}Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Seconds', domain: [0,110] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_1_rolling_mean_time: wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_2_rolling_mean_time: r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_3_rolling_mean_time: 4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_4_rolling_mean_time: 0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_5_rolling_mean_time: 8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_6_rolling_mean_time: nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_7_rolling_mean_time: xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_8_rolling_mean_time: zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_9_rolling_mean_time: j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_10_rolling_mean_time: vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_11_rolling_mean_time: 0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_12_rolling_mean_time: nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_13_rolling_mean_time: jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_14_rolling_mean_time: np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_15_rolling_mean_time: bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_16_rolling_mean_time: mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_17_rolling_mean_time: cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_18_rolling_mean_time: wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_19_rolling_mean_time: e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_20_rolling_mean_time: hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_21_rolling_mean_time: ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_22_rolling_mean_time: d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_23_rolling_mean_time: 
fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_24_rolling_mean_time: uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_25_rolling_mean_time: mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_26_rolling_mean_time: rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_27_rolling_mean_time: 1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_28_rolling_mean_time: 7tco6o97", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_29_rolling_mean_time: phdlm5a3", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_30_rolling_mean_time: t4q22to2", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_31_rolling_mean_time: jayomb7t", stroke: strokes[0]}), Plot.ruleY([20], {stroke: "black"}), ], grid: true, width: width, height: 450,})```## Choices```{ojs}Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Freq. of Top', domain: [0,1] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_1_choices: wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_2_choices: r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_3_choices: 4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_4_choices: 0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_5_choices: 8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_6_choices: nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_7_choices: xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_8_choices: zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_9_choices: j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_10_choices: vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_11_choices: 0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_12_choices: nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_13_choices: jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_14_choices: np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_15_choices: bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_16_choices: mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_17_choices: cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_18_choices: wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_19_choices: e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_20_choices: hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_21_choices: ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_22_choices: 
d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_23_choices: fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_24_choices: uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_25_choices: mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_26_choices: rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_27_choices: 1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_28_choices: 7tco6o97", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_29_choices: phdlm5a3", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_30_choices: t4q22to2", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_31_choices: jayomb7t", stroke: strokes[0]}), ], grid: true, width: width, height: 450,})```## Times```{ojs}Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Seconds', domain: [0,200] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "7tco6o97", stroke: 
strokes[2]}),
    Plot.line(cd_data_in_order_filter, {x: "Round", y: "phdlm5a3", stroke: strokes[3]}),
    Plot.line(cd_data_in_order_filter, {x: "Round", y: "t4q22to2", stroke: strokes[4]}),
    Plot.line(cd_data_in_order_filter, {x: "Round", y: "jayomb7t", stroke: strokes[0]}),
    Plot.ruleY([20], {stroke: "black"}),
  ],
  grid: true,
  width: width,
  height: 450,
})
```

:::

```{ojs}
viewof Round_Number = Inputs.range([1, 50], {
  label: "Round",
  value: 1,
  step: 1,
})
```

Time series of data
:::

## Qualitative Analysis

Using NVivo, one can analyse the descriptions the participants provided at the end of the experiment. From their responses one can create codes for different decision-making processes and intentions. The most frequent codes are shown in the figure below.

::: {#fig-code_frequency}

```{ojs}
Plot.plot({
  x: {
    label: "Code",
    labelAnchor: 'center',
    line: true,
  },
  y: {
    label: 'Frequency',
    labelAnchor: 'center',
    line: true,
    domain: [0, 1]
  },
  grid: true,
  width: width,
  marks: [
    Plot.barY(code_list, {x: "Name", y: "Frequency", tip: true, sort: {x: "y", reverse: true}}),
  ],
  color: {scheme: "purples"}
})
```

Frequency of Codes Amongst Participants' Decision Descriptions
:::

These codes are not mutually exclusive; they simply indicate the number of participants who expressed that code in their description. For example, any participant who said that they thought about their opponent's actions has been coded as 'Strategic Decision Making', on the basis that this is evidence of at least Level 1 rationality. Descriptions of each code can be found in the appendix. The most frequent code, expressed by 87.1% of participants, is 'Self-Regarding', indicating an attempt to increase their own payoff. Presumably all participants were self-regarding, but not all expressed this in their answers. 'Self-Regarding' is also not exclusive: 35.5% of participants stated an intention to be 'Other-Regarding', in that they also cared about their opponent's payoff. There were also sizeable frequencies of 'Risk-Aversion' (29%) and 'Max-Min' (maximisation of the minimum payoff, or avoidance of small payoffs; 22.6%). These percentages only show how often each code occurs; it is also important to dissect how the codes overlap. This is represented in the diagram below.
::: {#fig-code_correlation_matrix}

```{ojs}
viewof abs_rel_cellplt = Inputs.radio(["Absolute", "Relative"], {label: "View", value: "Absolute"})

{
  if (abs_rel_cellplt == "Relative") {
    var fill = "freq";
    var title = d => `|${d.Code} ∩ ${d.Vs}| / |${d.Code}| = ${d3.format(".3")(d.freq)}`
  } else if (abs_rel_cellplt == "Absolute") {
    var fill = "value";
    var title = d => `|${d.Code} ∩ ${d.Vs}| = ${d.value}`
  }

  const cellplt = Plot.plot({
    marks: [
      Plot.cell(code_matrix, { x: "Vs", y: "Code", fill: fill, tip: true, title: title }),
      Plot.text(code_matrix, colorContrast({ x: "Vs", y: "Code", fill: fill, text: fill, pointerEvents: "none" })),
    ],
    x: { tickRotate: 45, label: "Code" },
    y: { label: "Code" },
    color: { scheme: "purples"},
    grid: true,
    width: 800,
    height: 516,
    marginLeft: 125,
    marginRight: 55,
    marginBottom: 125,
  })

  const celltotalplt = Plot.plot({
    marks: [
      Plot.cell(code_totals, { x: "Code", y: "Vs", fill: "total", tip: true, title: d => `${d.Vs}: ${d.total}` }),
      Plot.text(code_totals, colorContrast({ x: "Code", y: "Vs", fill: "total", text: "total", pointerEvents: "none" })),
    ],
    x: { label: "" },
    y: { axis: "right", label: "" },
    color: { scheme: "greens"},
    grid: true,
    width: 62,
    height: 516,
    marginLeft: -68,
    marginRight: 68,
    marginBottom: 125,
  })

  return html`<div class="container" style="display: flex;"><div>${cellplt}</div><div>${celltotalplt}</div></div>`
}
```

Matrix of Codes Amongst Participants' Decision Descriptions
:::

The above figure should be read horizontally: the right-hand column shows the total number of occurrences of each code, and the numbers in the cells along a row show how often each column's code occurs alongside that row's code (either in absolute terms, or relative to the row code's set size). A brief sketch of how such a co-occurrence matrix can be computed follows, and the UpSet plot after it shows the intersections of all of the codes in greater detail.
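The sketch below is purely illustrative: `code_flags` is a hypothetical DataFrame with one row per participant and one 0/1 column per code, not an object defined elsewhere in this paper. Given such a table, the absolute matrix is a single matrix product and the relative matrix divides each row by that code's set size.

```{python}
import pandas as pd

# Hypothetical input: one row per participant, one 0/1 indicator column per code.
code_flags = pd.DataFrame({
    'Self-Regarding':            [1, 1, 1, 0, 1],
    'Other-Regarding':           [0, 1, 0, 0, 1],
    'Strategic Decision Making': [1, 1, 0, 1, 0],
})

# |Code ∩ Vs|: absolute co-occurrence counts (the diagonal holds each code's set size).
absolute = code_flags.T @ code_flags

# |Code ∩ Vs| / |Code|: each row divided by that row's set size.
set_sizes = pd.Series({code: absolute.loc[code, code] for code in absolute.index})
relative = absolute.div(set_sizes, axis=0)

print(absolute)
print(relative.round(3))
```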
::: {#fig-code_upset-plot}

```{ojs}
{
  let Codes = ["Change", "Inequality Aversion", "Instinctive", "Joint Max", "Max Expected Payoff", "Max-Min", "Other-Regarding", "Risk Aversion", "Risk Positive", "Self-Regarding", "Strategic Decision Making"]
  let evenCodes = ["Change", "Instinctive", "Max Expected Payoff", "Other-Regarding", "Risk Positive", "Strategic Decision Making"]
  let fullWidth = 820
  let lowerBarChartWidth = 254
  let lowerSetChartHeight = 328 // 288

  const backgroundStripeChart = Plot.cell(
    code_upset_data_cross_prod.filter((d) => evenCodes.indexOf(d["Code"]) > -1),
    { x: d => d.id, y: d => d["Code"], fill: '#efefef', stroke: '#efefef' }
  )
  const backgroundDotChart = Plot.dot(
    code_upset_data_cross_prod,
    { x: d => d.id, y: d => d["Code"], fill: '#dedede', r: 3 }
  )
  const dotChart = Plot.dot(
    code_upset_data_per_cat,
    { x: d => d.id, y: d => d["Code"], fill: "#3f007d", r: 3 }
  )
  const lineChart = Plot.line(
    code_upset_data_per_cat,
    { x: d => d.id, y: d => d["Code"], z: d => d.id, strokeWidth: 2, stroke: "#3f007d" }
  )

  const upperUpsetBarPlot = Plot.plot({
    marks: [
      Plot.ruleY([0], {fill: "black"}),
      Plot.barY(code_upset_data, {x: "id", y: "total", tip: true, title: d => `${d.Codes}: ${d.total}`, sort: {x: "y", reverse: true}, fill: "#3f007d"}),
      Plot.text(code_upset_data, {x: "id", y: "total", text: d => d.total, textAnchor: 'end', dy: -7}),
    ],
    x: { axis: null, padding: 0.2, round: false },
    y: { tickPadding: 2, line: true },
    width: fullWidth - lowerBarChartWidth,
    marginLeft: 20,
    style: { background: '#ffffff00' }
  })

  const lowerUpsetBarChart = Plot.plot({
    marks: [
      Plot.barX(code_totals, {y: "Vs", x: "total", fill: "#3f007d"}),
      Plot.text(code_totals, {y: "Vs", x: "total", text: d => d.total, textAnchor: 'end', dx: -4}),
    ],
    x: { axis: 'top', line: true, reverse: true },
    y: { axis: 'right', tickSize: 0, tickPadding: 7, insetTop: 2, round: false, padding: 0.2, align: 0, label: "" },
    width: lowerBarChartWidth,
    height: lowerSetChartHeight + 22,
    marginRight: 122,
    style: { background: '#ffffff00' }
  })

  const lowerUpsetChart = Plot.plot({
    marks: [
      backgroundStripeChart,
      backgroundDotChart,
      dotChart,
      lineChart
    ],
    x: { axis: null, padding: 0, round: false, insetRight: 1, insetLeft: 1 },
    y: { axis: null, tickSize: 0 },
    width: fullWidth - lowerBarChartWidth - 20,
    height: lowerSetChartHeight,
    marginTop: 28,
    style: { background: '#ffffff00' }
  })

  return html`<div style="overflow-x: auto">
    <div style="display: flex; flex-direction: column; align-items: flex-end; width: ${fullWidth}px">
      <div>
        ${[upperUpsetBarPlot]}
      </div>
      <div style="display: flex; position: relative; top: -50px; align-items: flex-start">
        ${[lowerUpsetBarChart, lowerUpsetChart]}
      </div>
    </div>
  </div>`
}
```

Code UpSet Plot [@2014_infovis_upset]
:::

The UpSet plot allows us to interpret the intersections of our codes where a Venn diagram would not. The most common intersections unsurprisingly occur between the most populated groups, and they tend to be intersections of only two or three codes. The most common intersection is that between the two largest sets, "Self-Regarding" and "Strategic Decision Making", followed by the intersection of "Other-Regarding", "Self-Regarding" and "Strategic Decision Making", and then "Strategic Decision Making" on its own. Due to the currently limited data, many intersections occur only once or twice.