Before analysing the data, we need to decide on a functional form for the relationship between variables. The relationship we primarily care about is that between the CD-\(\kappa\) ratio and the frequency of ‘Top’ amongst the population. The simplest specification is a Linear Probability Model (LPM). Whilst the LPM is the easiest to interpret, its predictions can exceed the bounds of [0, 1] (at some point they must; the real concern is if they do so within the range of the data). The LPM may also omit useful information such as non-linear effects: we might expect the relationship to be S-shaped, for example, whereas the LPM is constrained to a straight line. The LPM is specified as:
\[
FREQ\_TOP = \beta_0 + \beta_1 \cdot CD\_RATIO
\tag{1}\]
The intercept \(\beta_0\) has a more complicated explanation than it may seem: it is the predicted frequency of Top when the CD Ratio is zero. However, there are multiple ways the CD Ratio could be zero: either of the numerators \((\eta_2, \beta)\) could equal 0; or, less plausibly, one of the denominators \((\eta_1, \kappa)\) could be infinitely large. The latter cases can be ignored, as no game can realistically be constructed using infinitely large decomposed parts. One could argue that if \(\eta_2=0\) the resulting game is no longer strictly an Anti-coordination game. If we accept this definition, we must then conclude that the intercept shows the frequency of Top when there are no behavioural incentives \((\beta=0)\).
The slope \(\beta_1\) is simpler to understand: it is the rate at which changes to the CD ratio increase the frequency of Top. The estimation results below show the OLS estimates for Equation 1. From these we can infer that for every 0.1 increase in the CD-\(\kappa\) Ratio the frequency of Top increases by 3.4 percentage points, starting from a base of 59.6%. These estimates are significant at the 0.1% level, and the model has an \(R^2=37.9\%\).
Notes:
[1] Standard errors are heteroscedasticity-robust (HC3).
Linear regressions with bounded dependent variables inherently have heteroskedastic errors, so robust standard errors are necessary. A robust Breusch–Pagan test (Breusch and Pagan 1979; Koenker 1981) yields an F statistic of 5.8229.
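For reference, a minimal sketch of this fit and the studentised Breusch–Pagan test, assuming statsmodels is available and that the cleaned game-level data sit in `cd-experiment-all-cleaned.csv` with the columns `Mean Top` and `CD-kappa Ratio` used by the plotting code later in this section:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Game-level data: one row per game, with the observed frequency of Top.
game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]                          # frequency of Top
X = sm.add_constant(game_data["CD-kappa Ratio"])   # intercept + CD-kappa Ratio

# LPM estimated by OLS with heteroscedasticity-robust (HC3) standard errors.
result_ols = sm.OLS(y, X).fit(cov_type="HC3")
print(result_ols.summary())

# Robust (Koenker) Breusch-Pagan test on the OLS residuals.
lm_stat, lm_p, f_stat, f_p = het_breuschpagan(result_ols.resid, X, robust=True)
print(f"Robust Breusch-Pagan F statistic: {f_stat:.4f} (p = {f_p:.4f})")
```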
A common way of introducing curvature while retaining simplicity is to use logs. Specifically, if we take the natural log of the regressor on the RHS of Equation 1, we obtain:
\[
FREQ\_TOP = \beta_0 + \beta_1 \ln(CD\_RATIO)
\tag{2}\]
This equation has a similar interpretation to Equation 1, but on a relative rather than absolute scale. The intercept \(\beta_0\) now gives the predicted frequency of Top when the CD Ratio equals one (where \(\ln(CD\_RATIO)=0\)), rather than when the behavioural incentive is zero. The gradient of Equation 2 is decreasing in the CD Ratio and is equal to \(\frac{\beta_1}{CD\_RATIO}\).
Notes:
[1] Standard errors are heteroscedasticity-robust (HC3).
Whilst it introduces non-linearity, the log model is monotonically increasing and unbounded, so its predictions will eventually exceed 1. The predicted values lie within the [0, 1] bounds for the data at hand, but this is not guaranteed, so one can instead use models such as Beta regression or Logistic regression, which are bounded to (0, 1) by design. Although this excludes 0 and 1, which are possible (if extremely unlikely) outcomes, these are more realistic models of the relationship between the CD Ratio and the frequency of Top. The cost is that interpreting the results is less straightforward. The Beta regression uses the beta PDF given by:
\[
f(y;\mu,\phi)=\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)}y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}\quad 0\lt y\lt 1
\tag{3}\]
with \(0\lt\mu\lt 1\) and \(\phi\gt 0\). Therefore, \(y\sim \mathcal{B} (\mu, \phi)\), \(E(y)=\mu\) and \(Var(y)=\frac{\mu(1-\mu)}{1+\phi}\). Given this, using a random sample (indexed at \(i\)) the beta regression, in this case, is defined as: \[
g(\mu_i) = \beta_0 + \beta_1 * CD\_RATIO_i = \gamma_i
\tag{4}\]
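As a quick numerical check of this \((\mu, \phi)\) parameterisation, it can be mapped onto the usual shape parameters \(a=\mu\phi\) and \(b=(1-\mu)\phi\); a minimal sketch, assuming scipy is available and using purely illustrative values:

```python
from scipy.stats import beta

mu, phi = 0.6, 20.0                   # illustrative values, not estimates
a, b = mu * phi, (1 - mu) * phi       # scipy's shape parameters

dist = beta(a, b)
print(dist.mean())                    # 0.6, i.e. mu
print(dist.var())                     # ~0.01143
print(mu * (1 - mu) / (1 + phi))      # same value, from the variance formula above
```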
The link function \(g(\cdot):(0, 1)\mapsto\mathbb{R}\) used here is that of a logit model:
\[
g(\mu)=\log(\frac{\mu}{1-\mu})
\tag{5}\]
The model is naturally heteroscedastic, reflecting the fact that the errors vary far less as we approach the bounds of \(y\).
The results of the above beta regression, estimated by maximum likelihood, are presented below. Predictably, the coefficients are positive and significant, and the scale parameter is relatively large, suggesting a small variance in the frequency of Top.
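A sketch of how such a model can be fitted, assuming statsmodels 0.13 or later (which provides `BetaModel` in `statsmodels.othermod.betareg`) and the same file and column names as before:

```python
import pandas as pd
import statsmodels.api as sm
from statsmodels.othermod.betareg import BetaModel

game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]
X = sm.add_constant(game_data["CD-kappa Ratio"])

# Beta regression by maximum likelihood; the default mean link is the logit
# of Equation 5, and the precision (phi) is estimated alongside the betas.
result_beta = BetaModel(y, X).fit()
print(result_beta.summary())

mu_hat = result_beta.predict(X)       # fitted means on the (0, 1) scale
```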
From a visual reading of these four estimated functions, the differences in their predictions are not practically significant. It therefore follows that the most useful estimation method is the one with the most transparent interpretation: OLS. The OLS regression above estimates that a 0.1 increase in the CD-\(\kappa\) Ratio increases the frequency of Top by 2.96 percentage points. Note that the range of the CD-\(\kappa\) Ratio used in the experiments is between 0 and 1. The OLS regression also suggests that with a CD-\(\kappa\) Ratio of zero, Top would be played 62.15% of the time. This gives some idea of how much Top is played with zero behavioural component, but, as discussed previously (in the theory paper), there are multiple scenarios in which the CD-\(\kappa\) Ratio could be zero. This is a drawback of the method, and the reason why only strictly positive values were used in the experiment.
The table below shows the OLS regression output for a ‘decomposed’ CD-\(\kappa\) Ratio, regressing the frequency of Top on the individual components of the games, with a constant term. Note that this is not exactly the same specification as the previous OLS regression: because a constant is needed, one cannot simply take logs of both sides to linearise that model, so a different specification was employed.
Notes:
[1] Standard errors are heteroscedasticity-robust (HC3).
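A sketch of this decomposed specification, assuming the component columns are named `Eta1`, `Eta2`, `Beta` and `Kappa` in the same file:

```python
import pandas as pd
import statsmodels.api as sm

game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]
X = sm.add_constant(game_data[["Eta1", "Eta2", "Beta", "Kappa"]])

# Frequency of Top on the individual game components, with a constant,
# again using HC3-robust standard errors.
result_decomposed = sm.OLS(y, X).fit(cov_type="HC3")
print(result_decomposed.summary())
```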
1.2 Discrete Outcomes
The outcome data are necessarily fractional, which means that the size of the set of possible outcomes depends on the number of participants used to collect the data. Even at the population level, this set is not infinitely large. The experiments conducted for this paper were carried out with 50 participants, which translates to 51 possible frequencies of Top (0, 1/50, 2/50, … , 50/50). It is therefore more appropriate to use methods designed for discrete, specifically fractional, dependent variables. Note, however, that for a sufficiently large sample size the continuous beta regression can still be used, and offers more flexibility (Cribari-Neto and Zeileis 2010, p. 5). Such a model is a parameterised version of the quasi-binomial model from McCullagh and Nelder (1989). The binomial distribution describes the number of successes in a sample, in this case the number of times Top is played. This implies that the proportion of Top \((y)\) in a single game \((i)\) follows a binomial distribution scaled by the sample size \((M)\):
\[
y_i\sim B(M, \pi)/M
\]
Here the probability of getting exactly \(k\) choices of Top in a sample of \(M\), given an individual probability of Top \(\pi\), is given by the PMF:
\[
f(k/M, M, \pi) = \Pr(k;M, \pi)=\binom{M}{k}\pi^{k}(1-\pi)^{M-k}
\tag{6}\]
for \(k=0, 1, 2, ..., M\), where
\[
\binom{M}{k}=\frac{M!}{k!(M-k)!}
\]
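As a small worked check of this PMF, assuming scipy is available and picking arbitrary values of \(M\), \(k\) and \(\pi\):

```python
from math import comb
from scipy.stats import binom

M, k, pi = 50, 30, 0.6

# Probability of exactly k Tops out of M, i.e. of observing the proportion k/M.
manual = comb(M, k) * pi**k * (1 - pi)**(M - k)
print(manual)                   # ~0.115
print(binom.pmf(k, M, pi))      # same value via scipy
```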
In cases where the sample size is too small to justify a beta regression, one can use a binomial generalised linear model (GLM). Given the number of observations here, the beta regression is defensible, but for completeness let us take a closer look at discrete models.
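A sketch of that GLM alternative, assuming the proportion of Top is used as the response with an (assumed) group size \(M\) supplied as variance weights, which is how statsmodels handles binomial responses given as proportions:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

game_data = pd.read_csv("cd-experiment-all-cleaned.csv")
y = game_data["Mean Top"]                          # proportion of Top per game
X = sm.add_constant(game_data["CD-kappa Ratio"])
M = 31                                             # participants per game (assumed)

# Binomial GLM with the default logit link; var_weights carries the group sizes.
glm_binom = sm.GLM(y, X, family=sm.families.Binomial(),
                   var_weights=np.repeat(M, len(y)))
print(glm_binom.fit().summary())
```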
The proportion of Top is measured on an interval scale: the categories are not arbitrary; it makes no sense to merge adjacent categories; and each category has an attached cardinal score, which can be used as a measure of the difference between categories. All of these hold because each category is a fraction of the number of participants (0, 1/50, 2/50, … , 50/50). We can go further: since there is a true zero (no one plays Top), we in fact have a (discrete) ratio scale.
So which regression models can one use with a binomially distributed endogenous variable on a discrete ratio scale? Let us start with the specification of the frequency of Top \(y\) in game \(i\), dependent on the exogenous variable, the CD Ratio \(x\):
\[
\Pr(y_i=k/M\mid X=x_i)=\binom{M}{k}\pi_i^{k}(1-\pi_i)^{M-k}
\tag{7}\]
We can express the probability of Top \(\pi_i\) as a function of the exogenous variable via the response function \(h(\cdot)\), the inverse of the link function \(g(\cdot)\):
\[
\pi_i=h(x_i)
\tag{8}\]
The link function can take a number of forms, but we will use the logistic link function, as we did with the beta regression:
\[
g(\pi_i|x_i)=\ln(\frac{\pi_i}{1-\pi_i})
\]
The logistic link has simpler theoretical properties and a clearer interpretation than the probit \((g(\pi)=\Phi^{-1}(\pi))\), complementary log-log \((g(\pi)=\log(-\log(1-\pi)))\), and log-log \((g(\pi)=-\log(-\log(\pi)))\) links. Using the logistic link function, the log odds are:
\[
g(\pi_i\mid x_i)=\ln\Big(\frac{\pi_i}{1-\pi_i}\Big)=\gamma_0+\gamma_1x_i
\tag{9}\]
This can alternatively be written as the probability of Top in game \(i\):
\[
\pi_i=h(x_i)=\frac{\exp(\gamma_0+\gamma_1x_i)}{1+\exp(\gamma_0+\gamma_1x_i)}=\frac{1}{1+\exp(-(\gamma_0+\gamma_1x_i))}
\tag{10}\]
The interpretation of \(\gamma_1\) is given by the derivative of \(\pi_i\) with respect to \(x_i\):
\[
\frac{\partial\pi_i}{\partial x_i}=\pi_i(1-\pi_i)\gamma_1
\tag{11}\]
This effect is largest when the probability is near 0.5 and shrinks towards the bounds. This reflects the more realistic non-linear relationship, but highlights the slight cost in interpretability compared to an LPM.
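To make this concrete, the marginal effect \(\pi(1-\pi)\gamma_1\) from Equation 11 can be evaluated at a few probabilities for an arbitrary, purely illustrative slope:

```python
gamma_1 = 1.5                                  # illustrative slope, not an estimate

for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(pi, pi * (1 - pi) * gamma_1)
# peaks at 0.375 when pi = 0.5 and falls to 0.135 at pi = 0.1 or 0.9
```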
Note that one must assume that participants are homogeneous (however unlikely) and independent (more plausible). We do not know the values of the true \(\pi_{ij}\)’s, so we can regard them as independent random variables with mean \(\bar{\pi}_i\). Each observation of either Top or Bottom is then a Bernoulli trial with mean \(\bar{\pi}_i\), i.e. \(y_{ij}\sim B(1, \bar{\pi}_i)\), which means that the sum \(y_i=y_{i1}+...+y_{iM}\) is distributed as \(y_i\sim B(M, \bar{\pi}_i)\). This also implies that the proportion of Top \(\bar{y}_i=\frac{y_i}{M}\) follows the same distribution scaled to be a proportion, \(\bar{y}_i\sim B(M, \bar{\pi}_i)/M\). It can be shown that as \(M\rightarrow\infty\), the sum of Top \(y_i\), for any fixed \(\pi\), tends towards a Normal distribution (McCullagh and Nelder 1989, ch. 4).
Does normality follow for the proportion of Top \(\bar{y}_i\)? Not directly: the distribution degenerates around the mean with zero variance. The binomial distribution has a mean of \(M\pi\) and a variance of \(M\pi(1-\pi)\). If we divide by the number of participants/trials, the mean and variance become \(\pi\) and \(\frac{\pi(1-\pi)}{M}\) respectively. Therefore, as \(M\rightarrow\infty\), the variance of the proportional binomial distribution tends to zero.
Another way to assess this is via Bernoulli trials. As each observation \(j\) in game \(i\) can be modelled as the result of a Bernoulli trial with mean \(\bar{\pi}_i\), the expected proportion of Top in game \(i\) from the sample will be the sample mean:
\[
\bar{y}_i = \frac{1}{M}\sum_{j=1}^M y_{ij}
\]
As \(y_{ij}\sim B(1, \bar{\pi}_i)\), it follows that \(\bar{y}_i\sim \frac{1}{M}\sum_{j=1}^M B(1, \bar{\pi}_i)\): the proportion of Top is the mean of Bernoulli draws (equivalently, of Binomial draws with \(M=1\)), each with mean \(\bar{\pi}_i\). The mean of this averaged distribution is the same \((\bar{\pi}_i)\), but the variance depends on \(M\), as written above \(\big(\frac{\bar{\pi}_i(1-\bar{\pi}_i)}{M}\big)\). By the Central Limit Theorem, this also implies that for a large enough sample size the proportion of Top \(\bar{y}_i\) will follow a Normal distribution:
\[
\sqrt{M}(\bar{y}_i-\bar{\pi}_i) \xrightarrow{a} \mathcal{N}(0, \bar{\pi}_i(1-\bar{\pi}_i))
\]
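A quick simulation, assuming only numpy, checks the mean and the \(\bar{\pi}_i(1-\bar{\pi}_i)/M\) variance of the scaled binomial that underlies this limit:

```python
import numpy as np

rng = np.random.default_rng(0)
pi_bar = 0.6                                   # illustrative probability of Top

for M in (31, 100, 1000):
    props = rng.binomial(M, pi_bar, size=100_000) / M   # simulated proportions
    print(M, props.mean(), props.var(), pi_bar * (1 - pi_bar) / M)
# the empirical variance tracks pi*(1-pi)/M and shrinks as M grows
```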
Note that both the mean and variance depend on the explanatory variables, so each game’s proportion of Top follows a different distribution with a different mean and variance. The closer the mean is to zero or one, the smaller the variance, so heteroskedasticity across games should be expected.
Using the logistic link function in Equation 10, we can rewrite the sample variance as:
\[
Var(\bar{y}_i\mid x_i) = \frac{1}{M}\frac{\exp(\gamma_0+\gamma_1x_i)}{(1+\exp(\gamma_0+\gamma_1x_i))^2} = \frac{1}{M}\frac{\exp(-(\gamma_0+\gamma_1x_i))}{(1+\exp(-(\gamma_0+\gamma_1x_i)))^2} = \frac{1}{M}Var(y_i\mid x_i)
\]
The takeaway from this variance equation is twofold: it has inherent heteroskedasticity; and the variance is falling in \(M\). The heteroskedasticity is shown in Figure 2 below. The logistic function has been scaled by a factor of 100 so that both lines fit on the same graph (the plotting library does not yet support secondary axes). One can see that as the logistic function approaches its upper bound of 0.01 (1 if unscaled), the variance of \(\bar{y}\) falls, both in theory and in the data1. It is also clear that the variance is smaller with more observations.
Code
```{ojs}
{
  const logistic_func = function (x, gamma_0, gamma_1) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (100*(1 + Math.exp(gamma_0 + gamma_1*x)))
    );
  }
  const logisticvar = function (x, gamma_0, gamma_1, m) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (m * (1 + Math.exp(gamma_0 + gamma_1*x))**2)
    );
  }
  let logistic_data = []
  for (let x=0; x<=1; x=x+0.001) {
    logistic_data.push({"Number of Participants": 31, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 31)});
    logistic_data.push({"Number of Participants": 50, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 50)});
    logistic_data.push({"Number of Participants": 100, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 100)});
  }
  let var_top_bar = []
  for (let i=0; i<50; i++) {
    var_top_bar.push({"Number of Participants": 31, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/31});
    var_top_bar.push({"Number of Participants": 50, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/50});
    var_top_bar.push({"Number of Participants": 100, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/100});
  }
  const logistic_var_figure = Plot.plot({
    color: {
      legend: true,
      domain: ["scaled logistic function", "logistic variance of ȳ|x", "Var of Mean Top"],
      range: ["black", "red", "blue"]
    },
    x: {
      label: 'x',
      line: false,
      domain: [0, 1],
    },
    y: {
      label: "y",
      labelAnchor: 'center',
      line: false,
      domain: [0, 0.01],
    },
    grid: true,
    width: width,
    marks: [
      Plot.frame(),
      Plot.line(logistic_data, {x: 'x', y: 'y', stroke: 'black', fx: 'Number of Participants'}),
      Plot.line(logistic_data, {x: 'x', y: 'y_var', stroke: 'red', fx: 'Number of Participants'}),
      Plot.dot(var_top_bar, {x: 'x', y: 'y', fill: 'blue', fx: 'Number of Participants'}),
    ],
  })
  return logistic_var_figure
}
```
Figure 2: Variance of y from a logistic link function
Neither of these results is particularly novel, but both are important. The beta regression model naturally accounts for this heteroskedasticity, but Figure 2 highlights that the magnitude of the heteroskedasticity falls with more participants, as shown by the flattening of the logistic variance of \(\bar{y}_i\). A flatter relationship does not mean it is less significant; in fact, with more observations it is likely to become more statistically significant. However, the practical significance, the relative size of the correlation between the variance of \(\bar{y}_i\) and \(x_i\), becomes smaller. The OLS estimator of the gradient between \(Var(\bar{y}_i)\) and \(x_i\) is:
\[
\hat{\beta}_1 = \rho_{x,\,Var(\bar{y})}\frac{s_{Var(\bar{y})}}{s_x}
\]
Given that the correlation coefficient between \(Var(\bar{y}_i)\) and \(x_i\) is \(-0.5912\) and the standard deviation of \(x\) is 0.2309, the standard deviation of \(Var(\bar{y})=\frac{1}{M}Var(y)\) can be written as:
\[
s_{Var(\bar{y})} = \frac{1}{M}\,s_{Var(y)}
\]
Therefore, using the current data, and assuming the estimates do not change with more data, we can construct a rough estimate of the OLS gradient between \(Var(\bar{y}_i)\) and \(x_i\) as a function of the number of participants \(M\):
\[
\hat{\beta}_1(M) = \frac{-0.5912}{0.2309}\cdot\frac{s_{Var(y)}}{M} \approx \frac{-2.56\,s_{Var(y)}}{M}
\]
The figure below shows the decisions of each participant over the course of the experimental session, along with the time taken to make each of the 50 choices. This covers only the Anti-Coordination games and omits the other games. “Cumulative-Mean of Choices” shows how the mean of each participant’s choices changes over time (i.e. the mean of “Top” from period 1 to the round set by the slider). Dragging the slider to the right increases the range of rounds over which the mean is calculated. “Cumulative-Mean Times” is much the same, but for the time participants took to make their decisions. Note that the decision page had a 20-second timer, requiring participants to spend at least 20 seconds on the page. One can see a general downward trend in times, suggesting that participants get quicker at making choices as the experiment progresses.
Using NVivo, one can analyse the descriptions the participants provided at the end of the experiment. From their responses one can create codes for different decision-making processes and intentions. The most frequent codes are shown in the figure below.
Figure 11: Frequency of Codes Amongst Participants’ Decision Descriptions
These codes are not mutually exclusive; they merely indicate the number of participants who expressed each code in their description. For example, any participant who said that they thought about their opponent’s actions has been coded as ‘Strategic Decision Making’, as this is evidence of at least Level 1 rationality. Descriptions of each code can be found in the appendix. The most frequent code, at 87.1% of participants, is ‘Self-Regarding’, indicating an attempt to increase their own payoff. Presumably all participants were self-regarding, but not all expressed this in their answer. It should also be noted that ‘Self-Regarding’ is not exclusive: 35.5% of participants stated an intention to be ‘Other-Regarding’, in that they also cared about their opponent’s payoff. There were also sizeable frequencies of ‘Risk-Aversion’ (29%) and ‘Max-Min’ (minimum-payoff maximisation, or avoidance of small payoffs) at 22.6%.
These percentages only represent how often each code occurs; it is also important to dissect how the codes overlap. This is represented in the diagram below.
Figure 12: Matrix of Codes Amongst Participants’ Decision Descriptions
The above figure should be read horizontally. It shows the overlap between different codes. The right-hand side shows the total number of occurrences for each code, and the numbers in each cell along a row show how often each corresponding code occurs alongside that row’s code (either in absolute terms, or relative to the code’s set size).
Below, an UpSet plot (Lex et al. 2014) shows the intersections of all of the codes in greater detail.
The UpSet plot allows us to interpret the intersections of our 9 codes where a Venn diagram could not. The most common intersections unsurprisingly occur between the most populated groups, and tend to involve only 2 or 3 codes. The most common intersection is that between the 2 largest sets, “Self-Regarding” and “Strategic Decision Making”, followed by the intersection of “Other-Regarding”, “Self-Regarding” and “Strategic Decision Making”, and then “Strategic Decision Making” on its own. Due to the currently limited data, many intersections occur only once or twice.
References
Breusch, T. S., and A. R. Pagan. 1979. “A Simple Test for Heteroscedasticity and Random Coefficient Variation.” Econometrica 47 (5): 1287–94. http://www.jstor.org/stable/1911963.
Cribari-Neto, Francisco, and Achim Zeileis. 2010. “Beta Regression in R.” Journal of Statistical Software 34 (2): 1–24. https://doi.org/10.18637/jss.v034.i02.
Koenker, Roger. 1981. “A Note on Studentizing a Test for Heteroscedasticity.” Journal of Econometrics 17 (1): 107–12. https://doi.org/10.1016/0304-4076(81)90062-2.
Lex, Alexander, Nils Gehlenborg, Hendrik Strobelt, Romain Vuillemot, and Hanspeter Pfister. 2014. “UpSet: Visualization of Intersecting Sets.” IEEE Transactions on Visualization and Computer Graphics 20 (12): 1983–92. https://doi.org/10.1109/TVCG.2014.2346248.
McCullagh, P., and J. A. Nelder. 1989. Generalized Linear Models. London: Chapman & Hall / CRC.
Seabold, Skipper, and Josef Perktold. 2010. “Statsmodels: Econometric and Statistical Modeling with Python.” In 9th Python in Science Conference.
Footnotes
The data for 50 and 100 participants do not exist; the existing data for 31 participants have been scaled down as if they were the same data, i.e. divided by 50 and 100 respectively instead of by 31.↩︎
Source Code
---title: "CD-Ratio-Experiment-Analysis"author: "Alex Ralphs"toc: truenumber-sections: truebibliography: references.biblink-citations: truelink-bibliography: truecrossref: eq-prefix: ""highlight-style: pygmentsexecute: echo: trueformat: html: code-tools: true code-fold: true html-math-method: mathjax smooth-scroll: true page-layout: full embed-resources: true # pdf: # geometry: # - top=30mm # - left=20mm # docx: defaultjupyter: python3editor: render-on-save: trueeditor_options: chunk_output_type: inline---<head><!-- Remote stylesheet --><link rel="stylesheet" href="cd-ratio-experiment-analysis-styles.css"></head><body>```{ojs}cd_data = FileAttachment("cd-experiment-all-cleaned.csv").csv({typed:true})cd_data_in_order = FileAttachment("cd-experiment-all-cleaned-in-order.csv").csv({typed:true})code_list_raw = FileAttachment("cd-experiment-all-by-participant-cleaned-code-list.xlsx").xlsx()code_list=code_list_raw.sheet("Sheet1", {headers: true})code_matrix_raw = FileAttachment("cd-experiment-all-by-participant-cleaned-code-matrix.xlsx").xlsx()code_matrix = code_matrix_raw.sheet("Sheet3", {headers: true})code_totals = code_matrix_raw.sheet("Sheet4", {headers: true})code_upset_data = code_matrix_raw.sheet("Sheet5", {headers: true})code_upset_data_cross_prod = code_matrix_raw.sheet("Sheet6", {headers: true})code_upset_data_per_cat = code_matrix_raw.sheet("Sheet7", {headers: true})import {Plot} from "@observablehq/plot"import {colorContrast} from "@observablehq/plot-colorcontrast-custom-transform"```## Quantitative Analysis### Continuous OutcomesBefore we start to analyse using data, we need to decide what functional form the relationship between variables should be. The relationship we primarily care about is that between the CD-$\kappa$ ratio and the frequency of 'Top' amongst the population. The simplest relationship is a Linear Probability Model (LPM). Whilst the LPM is the easiest to interpret, it could possibly exceed the bounds of [0, 1] (at some point it must, but specifically if it exceed these bounds within the data set this would be bad). The LPM may also omit useful information such as any non-linear effects: we may expect the relationship to be S-shaped, for example, and the LPM is constrained to being a straight line. The LPM is be specified as:$$FREQ\_TOP = \beta_0 + \beta_1 * CD\_RATIO$$ {#eq-linear-function}The intercept $\beta_0$ has a complicated explanation, even though it may seem simple: the predicted frequency of top when the CD Ratio is zero. THis is true, however there are multiple ways the CD Ratio could be zero: any of the numerators $(\eta_2, \beta)$ could equal 0; or, less likely, one of the denominators $(\eta_1, \kappa)$ could be infinitely large. The later cases can be ignored as no game can realistically be constructed using infinitely large decomposed parts. One could argue that if $\eta_2=0$ then resulting came is not longer <i>strictly</i> an Anti-coordination game. If we accept this definition as fact, we must then conclude that the intercept shows the frequency of top when there is no behavioural incentives $(\beta=0)$. The slope $\beta_1$ is simpler to understand: it is the rate at which changes to the CD ratio increase the frequency of Top. The estimation result below shows the OLS estimates for @eq-linear-function. From this we can infer that for every 0.1 increase in the CD-$\kappa$ Ratio the frequency of Top will increase by 3.4 percentage points, starting from a base of 59.6%. These estimate are significant at a level of 0.1%, and has an $R^2=37.9%$. 
```{python}import pandas as pdimport numpy as npimport statsmodels.api as smfrom statsmodels.stats.stattools import jarque_berafrom statsmodels.stats.diagnostic import het_breuschpaganimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']var_top = game_data['Var Top']cd_ratio_kappa = game_data['CD-kappa Ratio']cd_ratio_kappa = sm.add_constant(cd_ratio_kappa)ols_cd_ratio_kappa = sm.OLS(freq_top, cd_ratio_kappa)result_ols = ols_cd_ratio_kappa.fit(cov_type='HC3')ols_jb = jarque_bera(result_ols.resid)ols_het_bp = het_breuschpagan(result_ols.resid, cd_ratio_kappa, robust=True)std_var_top = np.std(var_top, ddof=1)var_top_bar = var_top / result_ols.nobsstd_cd = np.std(game_data['CD-kappa Ratio'], ddof=1)corr_var_top_cd = np.corrcoef(var_top, game_data['CD-kappa Ratio'])[0][1]ols_var_top_cd = corr_var_top_cd * std_var_top / std_cdif np.sign(ols_var_top_cd) ==-1: ols_var_top_cd_sign =u"\u2212"else: ols_var_top_cd_sign =""ols_output_cd =open("regression_output_ols.html").read().format( title="OLS Regression Results", Model_SS="%.6f"% result_ols.ess, Model_df=int(result_ols.df_model), Model_MS="%.6f"% result_ols.mse_model, Residual_SS="%.6f"% result_ols.ssr, Residual_df=int(result_ols.df_resid), Residual_MS="%.6f"% result_ols.mse_resid, Total_SS="%.6f"% result_ols.centered_tss, Total_df=int(result_ols.nobs-1), Total_MS="%.6f"% result_ols.mse_total, Number_of_obs=int(result_ols.nobs), F="%.4f"% result_ols.fvalue, F_p="%.4f"% result_ols.f_pvalue, R_squared="%.4f"% result_ols.rsquared, Adj_R_squared="%.4f"% result_ols.rsquared_adj, Root_MSE="%.4f"% np.sqrt(result_ols.mse_resid), Constant_coef="%.6f"% result_ols.params.const, Constant_std_err="%.6f"% result_ols.bse.const, Constant_t="%.4f"% result_ols.tvalues.const, Constant_p="%.4f"% result_ols.pvalues.const, Constant_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0]["const"], Constant_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].const, X="CD-κ Ratio", beta_coef="%.6f"% result_ols.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_ols.bse["CD-kappa Ratio"], beta_t="%.4f"% result_ols.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_ols.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], cov_type = result_ols.cov_type,)display(HTML(ols_output_cd))# print(result_ols.summary())predicted_y_ols = []for i inrange(int(result_ols.nobs)): predicted_y_ols.append(result_ols.params.const+result_ols.params["CD-kappa Ratio"]*cd_ratio_kappa["CD-kappa Ratio"][i])cd_kappa_ratio_ols_params_dict =dict( Constant_coef=[result_ols.params.const], Constant_std_err=[result_ols.bse.const], Constant_t=[result_ols.tvalues.const], Constant_p=[result_ols.pvalues.const], Constant_ci_low=[result_ols.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_ols.conf_int(alpha=0.05)[1].const], beta_coef=[result_ols.params["CD-kappa Ratio"]], beta_std_err=[result_ols.bse["CD-kappa Ratio"]], beta_t=[result_ols.tvalues["CD-kappa Ratio"]], beta_p=[result_ols.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_ols.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_ols.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], Root_MSE="%.4f"% np.sqrt(result_ols.mse_resid), jarque_bera_stat="%.4f"% ols_jb[0], jarque_bera_p="%.4f"% ols_jb[1], breuschpagan_lm_stat="%.4f"% ols_het_bp[0], breuschpagan_lm_p="%.4f"% ols_het_bp[1], breuschpagan_f_stat="%.4f"% 
ols_het_bp[2], breuschpagan_f_p="%.4f"% ols_het_bp[3])cd_kappa_ratio_ols_params = pd.DataFrame(cd_kappa_ratio_ols_params_dict)cd_kappa_ratio_ols_data_dict =dict( residuals = result_ols.resid, predicted_freq_top = predicted_y_ols, CD_kappa_Ratio = cd_ratio_kappa["CD-kappa Ratio"], freq_top_data = freq_top )cd_kappa_ratio_ols_data = pd.DataFrame(cd_kappa_ratio_ols_data_dict)ojs_define(cd_ols_data = cd_kappa_ratio_ols_data)ojs_define(cd_ols_params = cd_kappa_ratio_ols_params)```Linear regressions with bounded dependant variables inherently have heteroskedastic errors, therefore robust standard errors are necessary. A robust Bruesch-Pagan Test @Breusch-Pagan-1979 @Koenker-1981 yields a f statistic of `{python} cd_kappa_ratio_ols_params_dict['breuschpagan_f_stat']`Typically, a common method of introducing curvature with simplicity is by using logs. Specifically if we take the natural log of the RHS of @eq-linear-function, we yield:$$FREQ\_TOP = \beta_0 + \beta_1 \ln(CD\_RATIO)$$ {#eq-ln-function}This equation has similar intuition as @eq-linear-function, but uses a relative scale as opposed to absolute. The intercept is $\beta_0$, and has the same intuition: the frequency of Top with zero behavioural incentives. The gradient of @eq-ln-function is decreasing and is equal to $\frac{\beta_1}{CD\_RATIO}$. ```{python}import pandas as pdimport numpy as npimport mathimport statsmodels.api as smfrom statsmodels.sandbox.regression.gmm import GMMfrom statsmodels.stats.stattools import jarque_beraimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']cd_ratio_kappa = game_data['CD-kappa Ratio']# class GMMLog(GMM):# def __init__(self, *args, **kwds):# # set appropriate counts for moment conditions and parameters# # TODO: clean up signature# kwds.setdefault('k_moms', 2)# kwds.setdefault('k_params', 2)# super(GMMLog, self).__init__(*args, **kwds)# def momcond(self, params):# p0, p1 = params# endog = self.endog# exog = self.exog.squeeze()# inst = self.instrument# # error1 = np.exp(endog) - p0 - p1 * exog# # error2 = exog * (np.exp(endog) - p0 - p1 * exog)# error1 = endog - np.log(p0 + p1 * exog)# error2 = exog * (endog - np.log(p0 + p1 * exog))# g = np.column_stack((error1, error2))# return g# ln_cd_ratio_kappa = GMMLog(freq_top, cd_ratio_kappa, cd_ratio_kappa)# beta0 = np.array([2, 0.1])# result_ln = ln_cd_ratio_kappa.fit(beta0, maxiter=2, optim_method='nm', wargs=dict(centered=False))cd_ratio_kappa_ln = np.log(cd_ratio_kappa)cd_ratio_kappa_ln = sm.add_constant(cd_ratio_kappa_ln)ln_cd_ratio_kappa = sm.OLS(freq_top, cd_ratio_kappa_ln)result_ln = ln_cd_ratio_kappa.fit(cov_type='HC3')ln_jb = jarque_bera(result_ln.resid)ln_het_bp = het_breuschpagan(result_ln.resid, cd_ratio_kappa_ln, robust=True)ln_output_cd =open("regression_output_ols.html").read().format( title="Ln-OLS Regression Results", Model_SS="%.6f"% result_ln.ess, Model_df=int(result_ln.df_model), Model_MS="%.6f"% result_ln.mse_model, Residual_SS="%.6f"% result_ln.ssr, Residual_df=int(result_ln.df_resid), Residual_MS="%.6f"% result_ln.mse_resid, Total_SS="%.6f"% result_ln.centered_tss, Total_df=int(result_ln.nobs-1), Total_MS="%.6f"% result_ln.mse_total, Number_of_obs=int(result_ln.nobs), F="%.4f"% result_ln.fvalue, F_p="%.4f"% result_ln.f_pvalue, R_squared="%.4f"% result_ln.rsquared, Adj_R_squared="%.4f"% result_ln.rsquared_adj, Root_MSE="%.4f"% np.sqrt(result_ln.mse_resid), Constant_coef="%.6f"% result_ln.params.const, 
Constant_std_err="%.6f"% result_ln.bse.const, Constant_t="%.4f"% result_ln.tvalues.const, Constant_p="%.4f"% result_ln.pvalues.const, Constant_ci_low="%.6f"% result_ln.conf_int(alpha=0.05)[0]["const"], Constant_ci_high="%.6f"% result_ln.conf_int(alpha=0.05)[1].const, X="ln(CD-κ Ratio)", beta_coef="%.6f"% result_ln.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_ln.bse["CD-kappa Ratio"], beta_t="%.4f"% result_ln.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_ln.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_ln.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_ln.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], cov_type = result_ln.cov_type,)display(HTML(ln_output_cd))predicted_y_ln = []for i inrange(int(result_ln.nobs)): predicted_y_ln.append(result_ln.params.const+result_ln.params["CD-kappa Ratio"]*np.log(cd_ratio_kappa[i]))cd_kappa_ratio_ln_params_dict =dict( Constant_coef=[result_ln.params.const], Constant_std_err=[result_ln.bse.const], Constant_t=[result_ln.tvalues.const], Constant_p=[result_ln.pvalues.const], Constant_ci_low=[result_ln.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_ln.conf_int(alpha=0.05)[1].const], beta_coef=[result_ln.params["CD-kappa Ratio"]], beta_std_err=[result_ln.bse["CD-kappa Ratio"]], beta_t=[result_ln.tvalues["CD-kappa Ratio"]], beta_p=[result_ln.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_ln.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_ln.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], Root_MSE="%.4f"% np.sqrt(result_ln.mse_resid), jarque_bera_stat="%.4f"% ln_jb[0], jarque_bera_p="%.4f"% ln_jb[1], breuschpagan_lm_stat="%.4f"% ln_het_bp[0], breuschpagan_lm_p="%.4f"% ln_het_bp[1], breuschpagan_f_stat="%.4f"% ln_het_bp[2], breuschpagan_f_p="%.4f"% ln_het_bp[3])cd_kappa_ratio_ln_params = pd.DataFrame(cd_kappa_ratio_ln_params_dict)cd_kappa_ratio_ln_data_dict =dict( residuals = result_ln.resid, predicted_freq_top = predicted_y_ln, CD_kappa_Ratio = cd_ratio_kappa, freq_top_data = freq_top )cd_kappa_ratio_ln_data = pd.DataFrame(cd_kappa_ratio_ln_data_dict)ojs_define(cd_ln_data = cd_kappa_ratio_ln_data)ojs_define(cd_ln_params = cd_kappa_ratio_ln_params)```Whilst including non-linearity, the log model is monotonically increasing and unbounded, therefore will be larger than 1 eventually. The predicted results lie within the [0, 1] bounds within the data, but this is not guaranteed, therefore one can use models such as Beta regression or Logistical Regression which are bound by (0, 1) by design. Despite 0 and 1 not being within the set of possible outcomes (both of which are extremely low probability outcomes), these are more realistic models of the relationship between the CD Ratio and the frequency of Top. However, interpreting the results is far less clear. The Beta regression uses the beta PDF given by:$$f(y;\mu,\phi)=\frac{\Gamma(\phi)}{\Gamma(\mu\phi)\Gamma((1-\mu)\phi)}y^{\mu\phi-1}(1-y)^{(1-\mu)\phi-1}\quad 0\lt y\lt 1$$ {#eq-beta-pdf}with $0\lt\mu\lt 1$ and $\phi\gt 0$. Therefore, $y\sim \mathcal{B} (\mu, \phi)$, $E(y)=\mu$ and $Var(y)=\frac{\mu(1-\mu)}{1+\phi}$. 
Given this, using a random sample (indexed at $i$) the beta regression, in this case, is defined as:$$g(\mu_i) = \beta_0 + \beta_1 * CD\_RATIO_i = \gamma_i$$ {#eq-beta-regression}The link function $g(\cdot):(0, 1)\mapsto\mathbb{R}$ used here is that of a logit model:$$g(\mu)=\log(\frac{\mu}{1-\mu})$$ {#eq-beta-link-function-logit}The model is naturally heteroscedastic, taking into account the fact the errors will vary far less as we close in on the bounds of $y$. The results of the above beta regression, found using maximum likelihood, are presented below. Predictably, the coefficients are positive, significant, and the scale parameter is relatively large, suggesting a smaller variance in the frequency of Top. ```{python}import pandas as pdimport numpy as npimport statsmodels.api as smfrom statsmodels.stats.stattools import jarque_berafrom statsmodels.othermod.betareg import BetaModelimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']cd_ratio_kappa = game_data['CD-kappa Ratio']cd_ratio_kappa = sm.add_constant(cd_ratio_kappa)beta_cd_ratio_kappa = BetaModel(freq_top, cd_ratio_kappa)result_beta = beta_cd_ratio_kappa.fit()beta_jb = jarque_bera(result_beta.resid)beta_output_cd =open("regression_output_beta.html").read().format( link_function='g(u) = log(u/(1-u))', slink_function='g(u) = log(u)', log_likelihood="%.6f"% result_beta.llf, Number_of_obs=int(result_beta.nobs), LR_chi2 ="%.4f"% result_beta.llr, LR_chi2_p ="%.4f"% result_beta.llr_pvalue, Constant_coef="%.6f"% result_beta.params.const, Constant_std_err="%.6f"% result_beta.bse.const, Constant_z="%.4f"% result_beta.tvalues.const, Constant_p="%.4f"% result_beta.pvalues.const, Constant_ci_low="%.6f"% result_beta.conf_int(alpha=0.05)[0].const, Constant_ci_high="%.6f"% result_beta.conf_int(alpha=0.05)[1].const, X="CD-κ Ratio", beta_coef="%.6f"% result_beta.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_beta.bse["CD-kappa Ratio"], beta_z="%.4f"% result_beta.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_beta.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_beta.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_beta.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], scale_coef="%.6f"% result_beta.params.precision, scale_std_err="%.6f"% result_beta.bse.precision, scale_z="%.4f"% result_beta.tvalues.precision, scale_p="%.4f"% result_beta.pvalues.precision, scale_ci_low="%.6f"% result_beta.conf_int(alpha=0.05)[0].precision, scale_ci_high="%.6f"% result_beta.conf_int(alpha=0.05)[1].precision, cov_type = result_beta.cov_type,)display(HTML(beta_output_cd))# print(result_beta.summary())predicted_y_beta = []for i inrange(int(result_beta.nobs)): predicted_y_beta.append(freq_top[i]-result_beta.resid[i])cd_kappa_ratio_beta_params_dict =dict( Constant_coef=[result_beta.params.const], Constant_std_err=[result_beta.bse.const], Constant_t=[result_beta.tvalues.const], Constant_p=[result_beta.pvalues.const], Constant_ci_low=[result_beta.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_beta.conf_int(alpha=0.05)[1].const], beta_coef=[result_beta.params["CD-kappa Ratio"]], beta_std_err=[result_beta.bse["CD-kappa Ratio"]], beta_t=[result_beta.tvalues["CD-kappa Ratio"]], beta_p=[result_beta.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_beta.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_beta.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], precision_coef=[result_beta.params["precision"]], 
precision_std_err=[result_beta.bse["precision"]], precision_t=[result_beta.tvalues["precision"]], precision_p=[result_beta.pvalues["precision"]], precision_ci_low=[result_beta.conf_int(alpha=0.05)[0]["precision"]], precision_ci_high=[result_beta.conf_int(alpha=0.05)[1]["precision"]], resid_se=np.std(result_beta.resid), jarque_bera_stat="%.4f"% beta_jb[0], jarque_bera_p="%.4f"% beta_jb[1])cd_kappa_ratio_beta_params = pd.DataFrame(cd_kappa_ratio_beta_params_dict)cd_kappa_ratio_beta_data_dict =dict( residuals = result_beta.resid, predicted_freq_top = predicted_y_beta, CD_kappa_Ratio = cd_ratio_kappa["CD-kappa Ratio"], freq_top_data = freq_top )cd_kappa_ratio_beta_data = pd.DataFrame(cd_kappa_ratio_beta_data_dict)ojs_define(cd_beta_data = cd_kappa_ratio_beta_data)ojs_define(cd_beta_params = cd_kappa_ratio_beta_params)``````{python}import pandas as pdimport numpy as npimport statsmodels.api as smfrom statsmodels.stats.stattools import jarque_beraimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']cd_ratio_kappa = game_data['CD-kappa Ratio']cd_ratio_kappa = sm.add_constant(cd_ratio_kappa)logit_cd_ratio_kappa = sm.Logit(freq_top, cd_ratio_kappa)result_logit = logit_cd_ratio_kappa.fit(disp=False, method='newton', cov_type='HC0')result_logit_wald = result_logit.wald_test_terms(scalar=True)logit_jb = jarque_bera(result_logit.resid_response)logit_output_cd =open("regression_output_logit.html").read().format( log_pseudolikelihood="%.6f"% result_logit.llf, Number_of_obs=int(result_logit.nobs), Wald_chi2 ="%.6f"% result_logit_wald.statistic[1], chi2_p ="%.6f"% result_logit_wald.pvalues[1], Pseudo_R_squared="%.4f"% result_logit.prsquared, Constant_coef="%.6f"% result_logit.params.const, Constant_std_err="%.6f"% result_logit.bse.const, Constant_z="%.4f"% result_logit.tvalues.const, Constant_p="%.4f"% result_logit.pvalues.const, Constant_ci_low="%.6f"% result_logit.conf_int(alpha=0.05)[0].const, Constant_ci_high="%.6f"% result_logit.conf_int(alpha=0.05)[1].const, X="CD-κ Ratio", beta_coef="%.6f"% result_logit.params["CD-kappa Ratio"], beta_std_err="%.6f"% result_logit.bse["CD-kappa Ratio"], beta_z="%.4f"% result_logit.tvalues["CD-kappa Ratio"], beta_p="%.4f"% result_logit.pvalues["CD-kappa Ratio"], beta_ci_low="%.6f"% result_logit.conf_int(alpha=0.05)[0]["CD-kappa Ratio"], beta_ci_high="%.6f"% result_logit.conf_int(alpha=0.05)[1]["CD-kappa Ratio"], cov_type = result_logit.cov_type,)display(HTML(logit_output_cd))# print(result_logit.summary())predicted_y_logit = []for i inrange(int(result_logit.nobs)): predicted_y_logit.append(freq_top[i]-result_logit.resid_response[i])cd_kappa_ratio_logit_params_dict =dict( Constant_coef=[result_logit.params.const], Constant_std_err=[result_logit.bse.const], Constant_t=[result_logit.tvalues.const], Constant_p=[result_logit.pvalues.const], Constant_ci_low=[result_logit.conf_int(alpha=0.05)[0]["const"]], Constant_ci_high=[result_logit.conf_int(alpha=0.05)[1].const], beta_coef=[result_logit.params["CD-kappa Ratio"]], beta_std_err=[result_logit.bse["CD-kappa Ratio"]], beta_t=[result_logit.tvalues["CD-kappa Ratio"]], beta_p=[result_logit.pvalues["CD-kappa Ratio"]], beta_ci_low=[result_logit.conf_int(alpha=0.05)[0]["CD-kappa Ratio"]], beta_ci_high=[result_logit.conf_int(alpha=0.05)[1]["CD-kappa Ratio"]], resid_se=np.std(result_logit.resid_response), jarque_bera_stat="%.4f"% logit_jb[0], jarque_bera_p="%.4f"% logit_jb[1])cd_kappa_ratio_logit_params = 
pd.DataFrame(cd_kappa_ratio_logit_params_dict)cd_kappa_ratio_logit_data_dict =dict( residuals = result_logit.resid_response, predicted_freq_top = predicted_y_logit, CD_kappa_Ratio = cd_ratio_kappa["CD-kappa Ratio"], freq_top_data = freq_top )cd_kappa_ratio_logit_data = pd.DataFrame(cd_kappa_ratio_logit_data_dict)ojs_define(cd_logit_data = cd_kappa_ratio_logit_data)ojs_define(cd_logit_params = cd_kappa_ratio_logit_params)```@fig-cd-data-cont-estimations shows the fitted estimation results for the CD Ratio Models. ::: {#fig-cd-data-cont-estimations}::: {.panel-tabset .ojs-track-layout}## CD Ratio Prediction```{ojs}Plot.plot({ color: { legend: true, domain: ['OLS', 'Beta', 'Logit', 'Ln'], range: ['blueviolet', 'limegreen', 'blue', 'orange'] }, x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, marks: [ Plot.dot(transpose(cd_ols_data), {x: "CD_kappa_Ratio", y: "freq_top_data", fill: 'black', tip: true}), Plot.line(transpose(cd_ols_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'blueviolet', sort: "CD_kappa_Ratio"}), Plot.line(transpose(cd_beta_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'limegreen', sort: "CD_kappa_Ratio"}), Plot.line(transpose(cd_logit_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'blue', sort: "CD_kappa_Ratio"}), Plot.line(transpose(cd_ln_data), {x: "CD_kappa_Ratio", y: "predicted_freq_top", stroke: 'orange', sort: "CD_kappa_Ratio"}), Plot.text([[0.8, 0.95]], {text: [`Y_ols = ${cd_ols_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_ols_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} X`], fill: 'blueviolet'}), Plot.text([[0.8, 0.8]], {text: [`Y_beta = B(g(${cd_beta_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_beta_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} X), ${cd_beta_params['precision_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})})`], fill: 'limegreen'}), Plot.text([[0.8, 0.75]], {text: [`Y_logit = 1/(1 + exp(-(${cd_logit_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_logit_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} X)))`], fill: 'blue'}), Plot.text([[0.75, 0.9]], {text: [`Y_ln = ${cd_ln_params['Constant_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} + ${cd_ln_params['beta_coef'].toLocaleString(undefined, {minimumFractionDigits: 4})} ln(X)`], fill: 'orange'}) ], width: width,})```## Residuals```{ojs}Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, grid: true, marks: [ Plot.dot(transpose(cd_ols_data), {x: 'CD_kappa_Ratio', y: 'residuals', fill: 'blueviolet', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['OLS Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ols_params['jarque_bera_stat']} (p = ${cd_ols_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, title: 'Beta Residuals', grid: true, marks: [ Plot.dot(transpose(cd_beta_data), {x: 
'CD_kappa_Ratio', y: 'residuals', fill: 'limegreen', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['Beta Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_beta_params['jarque_bera_stat']} (p = ${cd_beta_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, title: 'Logit Residuals', grid: true, marks: [ Plot.dot(transpose(cd_logit_data), {x: 'CD_kappa_Ratio', y: 'residuals', fill: 'blue', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['Logit Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_logit_params['jarque_bera_stat']} (p = ${cd_logit_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})Plot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: false, domain: [0, 1], }, y: { label: 'Residual', labelAnchor: 'center', line: true, domain: [-0.3, 0.3], }, title: 'Ln Residuals', grid: true, marks: [ Plot.dot(transpose(cd_ln_data), {x: 'CD_kappa_Ratio', y: 'residuals', fill: 'orange', tip: true}), Plot.ruleY([0], {stroke: 'black'}), Plot.text(['Ln Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ln_params['jarque_bera_stat']} (p = ${cd_ln_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}) ], width: width, height: 200})```## Residual histogram```{ojs}{ const normalpdf = function (x, sigma) { return ( (1 / (sigma * Math.sqrt(2 * Math.PI))) * Math.exp((-0.5 * Math.pow(x, 2)) / (2 * Math.pow(sigma, 2))) / 10 ); } let sigma_ols = cd_ols_params['Root_MSE'] let sigma_beta = cd_beta_params['resid_se'] let sigma_logit = cd_logit_params['resid_se'] let sigma_ln = cd_ln_params['resid_se'] let data_ols_normalpdf = [] let data_beta_normalpdf = [] let data_logit_normalpdf = [] let data_ln_normalpdf = [] for (let x=-0.3; x<=0.3;x=x+0.01) { data_ols_normalpdf.push({"x": x, "y": normalpdf(x, sigma_ols)}); data_beta_normalpdf.push({"x": x, "y": normalpdf(x, sigma_beta)}); data_logit_normalpdf.push({"x": x, "y": normalpdf(x, sigma_logit)}); data_ln_normalpdf.push({"x": x, "y": normalpdf(x, sigma_ln)}); } const ols_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_ols_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'blueviolet', tip: true})), Plot.text(['OLS Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ols_params['jarque_bera_stat']} (p = ${cd_ols_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_ols_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) const beta_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_beta_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'limegreen', tip: true})), Plot.text(['Beta Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_beta_params['jarque_bera_stat']} (p = 
${cd_beta_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_beta_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) const logit_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_logit_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'blue', tip: true})), Plot.text(['Logit Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_logit_params['jarque_bera_stat']} (p = ${cd_logit_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_logit_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) const ln_resid_hist = Plot.plot({ y: { label: 'Proportion', labelAnchor: 'center', grid: true, line: true }, x: { label: 'Residual', labelAnchor: 'center', line: true }, marks: [ Plot.rectY(transpose(cd_ln_data), Plot.binX({y: "proportion"}, {x: 'residuals', fill: 'orange', tip: true})), Plot.text(['Ln Residuals'], {frameAnchor: 'top', dy: -15, fontSize: 16, text: (d) => d}), Plot.text([`Jarque Bera Stat: ${cd_ln_params['jarque_bera_stat']} (p = ${cd_ln_params['jarque_bera_p']})`], {frameAnchor: 'top', dy: 5, dx: 300}), Plot.line(data_ln_normalpdf, {x: 'x', y: 'y', stroke: 'black'}) ], width: width, height: 300 }) return html`${ols_resid_hist}${beta_resid_hist}${logit_resid_hist}${ln_resid_hist}`}```:::Prediction Estimations [@seabold2010statsmodels]:::From a visual reading of these four estimated functions, the difference in prediction is not practically significant. Therefore, it follows that the estimation method most useful is that with the easiest to understand intuition: the OLS. The above OLS regression estimates that a 0.1 increase in the CD-$\kappa$ Ratio will increase the frequency of Top by `{python} "%.2f" % (10*result_ols.params["CD-kappa Ratio"])` percentage points. Note that the range of the CD-$\kappa$ Ratio used in the experiments is between 0 and 1. The OLS regression also suggests that with a CD-$\kappa$ Ratio of zero, Top is likely to be played `{python} "%.2f" % (100*result_ols.params.const)`% of the time. This may give us some idea of how much Top is played with zero Behavioural component, but as discussed previously (in theory paper), there are multiple different scenarios where the CD-$\kappa$ Ratio could be zero. This is a downfall of using this method, and the reason why only strictly positive values were used in the experiment. The below table shows the OLS regression output for a 'decomposed' CD-$\kappa$ Ratio, regressing the frequency of Top on the individual components of the games with a constant term. Note that this is not the exact same specification as the previous OLS regression. Due to the need for a constant one cannot just take logs of both sides in order to linearize that model, thus a different model was employed. 
```{python}import pandas as pdimport numpy as npimport statsmodels.api as smimport statsmodels.formula.api as smffrom IPython.display import display, HTMLgame_data = pd.read_csv('cd-experiment-all-cleaned.csv')freq_top = game_data['Mean Top']eta_1 = game_data['Eta1']eta_2 = game_data['Eta2']beta = game_data['Beta']kappa = game_data['Kappa']data = pd.DataFrame(list(zip(eta_1, eta_2, beta, kappa)), columns=['Eta1', 'Eta2', 'Beta', 'Kappa'])data = sm.add_constant(data)ols_reg = sm.OLS(freq_top, data)result_ols = ols_reg.fit(cov_type='HC3')ols_output_cd =open("regression_output_ols_full.html").read().format( Model_SS="%.6f"% result_ols.ess, Model_df=int(result_ols.df_model), Model_MS="%.6f"% result_ols.mse_model, Residual_SS="%.6f"% result_ols.ssr, Residual_df=int(result_ols.df_resid), Residual_MS="%.6f"% result_ols.mse_resid, Total_SS="%.6f"% result_ols.centered_tss, Total_df=int(result_ols.nobs-1), Total_MS="%.6f"% result_ols.mse_total, Number_of_obs=int(result_ols.nobs), F="%.4f"% result_ols.fvalue, F_p="%.4f"% result_ols.f_pvalue, R_squared="%.4f"% result_ols.rsquared, Adj_R_squared="%.4f"% result_ols.rsquared_adj, Root_MSE="%.4f"% np.sqrt(result_ols.mse_resid), Constant_coef="%.6f"% result_ols.params.const, Constant_std_err="%.6f"% result_ols.bse.const, Constant_t="%.4f"% result_ols.tvalues.const, Constant_p="%.4f"% result_ols.pvalues.const, Constant_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0]["const"], Constant_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].const, X_1="Eta 1", beta_1_coef="%.6f"% result_ols.params.Eta1, beta_1_std_err="%.6f"% result_ols.bse.Eta1, beta_1_t="%.4f"% result_ols.tvalues.Eta1, beta_1_p="%.4f"% result_ols.pvalues.Eta1, beta_1_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Eta1, beta_1_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Eta1, X_2="Eta 2", beta_2_coef="%.6f"% result_ols.params.Eta2, beta_2_std_err="%.6f"% result_ols.bse.Eta2, beta_2_t="%.4f"% result_ols.tvalues.Eta2, beta_2_p="%.4f"% result_ols.pvalues.Eta2, beta_2_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Eta2, beta_2_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Eta2, X_3="Beta", beta_3_coef="%.6f"% result_ols.params.Beta, beta_3_std_err="%.6f"% result_ols.bse.Beta, beta_3_t="%.4f"% result_ols.tvalues.Beta, beta_3_p="%.4f"% result_ols.pvalues.Beta, beta_3_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Beta, beta_3_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Beta, X_4="Kappa", beta_4_coef="%.6f"% result_ols.params.Kappa, beta_4_std_err="%.6f"% result_ols.bse.Kappa, beta_4_t="%.4f"% result_ols.tvalues.Kappa, beta_4_p="%.4f"% result_ols.pvalues.Kappa, beta_4_ci_low="%.6f"% result_ols.conf_int(alpha=0.05)[0].Kappa, beta_4_ci_high="%.6f"% result_ols.conf_int(alpha=0.05)[1].Kappa, cov_type = result_ols.cov_type,)display(HTML(ols_output_cd))# print(result.summary())```### Discrete OutcomesDue to the nature of the outcome data, it is necessarily fractional, which means that the size of the set of possible outcomes depends on the number of participants used to collect the data. Even if we expand to a population level, this is still not infinitely large. The experiments conducted for this paper were carried out with `{python} int(result_ols.nobs)` participants, which translates to a possible `{python} int(result_ols.nobs+1)` different frequencies of Top (0, 1/`{python} int(result_ols.nobs)`, 2/`{python} int(result_ols.nobs)`, ... , `{python} int(result_ols.nobs)`/`{python} int(result_ols.nobs)`). 
Therefore, it is more appropriate to use methods designed for discrete, specifically fractional, dependant variables. Note, however, that for sufficiently large sample size, the continuous beta regression can still be used, and offers more flexibility [@Cribari-Neto_Zeileis [p. 5]]. Such a model is a parameterised version of the quasi-binomial model from [@McCullagh-Nelder]. The binomial distribution shows the number of successes in a sample population, in this case the number of time Top is played. This implies that the proportion of Top $(y)$ in a single game $(i)$ follows an augmented binomial distribution which has been scaled by dividing by the sample size $(M)$: $$y_i\sim B(M, \pi)/M$$Where the probability of getting exactly $k$ choices of Top in a sample of $M$, given a individual probability of Top, is given by the PDF:$$f(k/M, M, \pi) = \Pr(k;M, \pi)=\binom{M}{k}\pi^{k}(1-\pi)^{1-k}$$ {#eq-binomial-pdf-proportion}for $k=0, 1, 2, ..., M$, where$$\binom{M}{k}=\frac{M!}{k!(M-k)!}$$In cases where the sample size is too small to justify a beta regression, one can use a binomial generalised linear model (GLM). Due to the number of observations here, we can pretty safely use a beta regression, but for completeness let us take a deeper look into discrete models. The scale that the proportion of Top follows is that of an interval scale: the categories are not arbitrary; it makes no sense to join adjacent categories; and each category has an attached cardinal number score, which can be used as a measure of difference between categories. These are all true as each category is a fraction of the number of participants (0, 1/`{python} int(result_ols.nobs)`, 2/`{python} int(result_ols.nobs)`, ... , `{python} int(result_ols.nobs)`/`{python} int(result_ols.nobs)`). We can take it even further, as we have true zero (no-one plays Top), therefore meaning we have a (discrete) ratio scale. So what regression models can one use with a binomially distributed endogenous variable on a discrete ratio scale? Let us start with the specification of the frequency of Top $y$ in game $i$, dependant of the exogenous variable of the CD-Ratio $x$:$$\Pr(y_i=k/M|X=x_i)=\binom{M}{k}\pi_i^{k}(1-\pi_i)^{1-k}$$ {#eq-conditional-binomial-pdf-proportion}We can express the probability of Top $\pi_i$ as a function of the exogenous variables, using link function $g(\cdot)$:$$\pi_i=h(x_i)$$ {#eq-probability-of-success-function}The link function can take a number of forms, but we will use the logistic link function, as we did with the beta regression:$$g(\pi_i|x_i)=\ln(\frac{\pi_i}{1-\pi_i})$$The logistic function has simpler theoreticaal properties and intuition over using Probit $(g(\pi)=\Phi^{-1}(\pi))$, a complementary log-log $(g(\pi)=\log(-\log(1-\pi)))$, and the log-log $(g(\pi)=-\log(-\log(\pi)))$ link functions. It follows that using the logistic link function that the log odds are:$$g(\pi_i|x_i)=\ln(\frac{\pi_i}{1-\pi_i})=\gamma_0+\gamma_1x_i$$ {#eq-parameterisation-of-logistic-link}This can alternatively be written as the probability of Top in game $i$:$$\pi_i=h(x_i)=\frac{\exp(\gamma_0+\gamma_1x_i)}{1+\exp(\gamma_0+\gamma_1x_i)}=\frac{1}{1+\exp(-(\gamma_0+\gamma_1x_i))}$$ {#eq-prob-top-wth-logistic-link}The interpretations of $\gamma_1$ is given by the derivatives of $\pi_i$ with respect to $x_i$:$$\frac{\partial\pi_i}{\partial x_i}=\pi_i(1-\pi_i)\gamma_1$$ {#eq-logistic-relationship-between-x-and-prob-top}This effect is largest when the probability is near 0.5, and gets smaller nearer to the bounds. 
This diminishing marginal effect reflects the more realistic non-linear relationship, but it highlights a slight cost in intuition compared to the LPM. Note that one must assume that participants are homogeneous (however unlikely) and independent (more defensible). In practice we do not know the true individual probabilities $\pi_{ij}$, so we can regard them as independent random variables with mean $\bar{\pi}_i$. Each observation of either Top or Bottom is then a Bernoulli trial with mean $\bar{\pi}_i$, i.e. $y_{ij}\sim B(1, \bar{\pi}_i)$, which means that the sum $y_i=y_{i1}+...+y_{iM}$ is distributed as $y_i\sim B(M, \bar{\pi}_i)$. This also implies that the proportion of Top $\bar{y}_i=\frac{y_i}{M}$ follows the same distribution scaled to a proportion, $\bar{y}_i\sim B(M, \bar{\pi}_i)/M$. It can be shown that, as $M\rightarrow\infty$, the sum of Top $y_i$ for any fixed $\pi$ tends towards a Normal distribution; this is all from chapter 4 of @McCullagh-Nelder.

Does normality follow for the proportion of Top $\bar{y}_i$? Taken naively, the limiting distribution degenerates to a point mass at the mean with zero variance: the binomial distribution has mean $M\pi$ and variance $M\pi(1-\pi)$, so dividing by the number of participants (trials) gives mean $\pi$ and variance $\frac{\pi(1-\pi)}{M}$, and as $M\rightarrow\infty$ the variance of the proportion tends to zero.

Another way to assess this is through the Bernoulli trials themselves. As each observation $j$ in game $i$ can be modelled as the result of a Bernoulli trial with mean $\bar{\pi}_i$, the expected proportion of Top in game $i$ from the sample is the sample mean:

$$\bar{y}_i = \frac{1}{M}\sum_{j=1}^M y_{ij}$$

As $y_{ij}\sim B(1, \bar{\pi}_i)$, it follows that $\bar{y}_i\sim \frac{1}{M}\sum_{j=1}^M B(1, \bar{\pi}_i)$: the proportion of Top is the mean of Bernoulli draws (equivalently, of Binomial draws with $M=1$), each with mean $\bar{\pi}_i$. The mean of this averaged distribution is unchanged at $\bar{\pi}_i$, but its variance depends on $M$, as written above $\big(\frac{\bar{\pi}_i(1-\bar{\pi}_i)}{M}\big)$. By the Central Limit Theorem, for a large enough sample size the proportion of Top $\bar{y}_i$ is therefore approximately Normally distributed:

$$\sqrt{M}(\bar{y}_i-\bar{\pi}_i) \xrightarrow{a} \mathcal{N}(0, \bar{\pi}_i(1-\bar{\pi}_i))$$

Note that both the mean and the variance depend on the explanatory variables, so each game's proportion of Top follows a different distribution with a different mean and variance. The closer the mean is to zero or one, the smaller the variance, so heteroskedasticity across games should be expected. Using the logistic link function @eq-prob-top-wth-logistic-link, we can rewrite the sample variance as:

$$Var(\bar{y}_i|x_i) = \frac{1}{M}\frac{\exp(\gamma_0+\gamma_1x_i)}{(1+\exp(\gamma_0+\gamma_1x_i))^2} = \frac{1}{M}\frac{\exp(-(\gamma_0+\gamma_1x_i))}{(1+\exp(-(\gamma_0+\gamma_1x_i)))^2} = \frac{1}{M^2}Var(y_i|x_i)$$

The takeaway from this variance equation is twofold: the model has inherent heteroskedasticity, and the variance is falling in $M$. The heteroskedasticity is shown in @fig-logistic-function-and-variance-of-y below. The logistic function has been scaled down by a factor of 100 so that both lines fit on the same graph (Observable Plot does not yet support secondary axes).
One can see that as the logistic function approaches its upper bound of 0.01 (1 if unscaled) the variance of $\bar{y}$ falls, shown both in theory and in the data[^1]. It is also clear that the variance is smaller with more observations.

[^1]: The data for 50 and 100 participants do not exist; the points shown are the 31-participant data rescaled as if they did, i.e. divided by 50 and 100 respectively instead of by 31.

::: {#fig-logistic-function-and-variance-of-y}

```{ojs}
{
  const logistic_func = function (x, gamma_0, gamma_1) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (100*(1 + Math.exp(gamma_0 + gamma_1*x)))
    );
  }
  const logisticvar = function (x, gamma_0, gamma_1, m) {
    return (
      Math.exp(gamma_0 + gamma_1*x) / (m * (1 + Math.exp(gamma_0 + gamma_1*x))**2)
    );
  }

  let logistic_data = []
  for (let x = 0; x <= 1; x = x + 0.001) {
    logistic_data.push({"Number of Participants": 31, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 31)});
    logistic_data.push({"Number of Participants": 50, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 50)});
    logistic_data.push({"Number of Participants": 100, "x": x, "y": logistic_func(x, 0.5, 2), "y_var": logisticvar(x, 0.5, 2, 100)});
  }

  let var_top_bar = []
  for (let i = 0; i < 50; i++) {
    var_top_bar.push({"Number of Participants": 31, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/31});
    var_top_bar.push({"Number of Participants": 50, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/50});
    var_top_bar.push({"Number of Participants": 100, 'x': cd_data[i]['CD-kappa Ratio'], 'y': cd_data[i]['Var Top']/100});
  }

  const logistic_var_figure = Plot.plot({
    color: {
      legend: true,
      domain: ["scaled logistic function", "logistic variance of ȳ|x", "Var of Mean Top"],
      range: ["black", "red", "blue"]
    },
    x: {
      label: 'x',
      line: false,
      domain: [0, 1],
      // ticks: 4,
    },
    y: {
      label: "y",
      labelAnchor: 'center',
      line: false,
      domain: [0, 0.01],
    },
    grid: true,
    width: width,
    marks: [
      Plot.frame(),
      Plot.line(logistic_data, {x: 'x', y: 'y', stroke: 'black', fx: 'Number of Participants'}),
      Plot.line(logistic_data, {x: 'x', y: 'y_var', stroke: 'red', fx: 'Number of Participants'}),
      Plot.dot(var_top_bar, {x: 'x', y: 'y', fill: 'blue', fx: 'Number of Participants'}),
    ],
  })
  return logistic_var_figure
}
```

Variance of y from a logistic link function
:::

Neither of these results is particularly novel, but both are important. The beta regression model naturally accounts for this heteroskedasticity, but @fig-logistic-function-and-variance-of-y highlights that the magnitude of the heteroskedasticity falls with more participants, visible in the flattening of the logistic variance of $\bar{y}_i$. Making this relationship flatter does not make it less statistically significant; in fact, with more observations it is likely to become more statistically significant. However, its practical significance, the relative size of the correlation between the variance of $\bar{y}_i$ and $x_i$, becomes smaller.
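The same point can be checked numerically. The sketch below is illustrative only: it reuses the arbitrary values $\gamma_0=0.5$ and $\gamma_1=2$ from the figure, simulates game-level proportions $\bar{y}_i \sim B(M, \pi_i)/M$, and compares their sample variance with the theoretical $\pi_i(1-\pi_i)/M$ for each number of participants.

```{python}
import numpy as np

rng = np.random.default_rng(0)
gamma_0, gamma_1 = 0.5, 2.0          # illustrative values, matching the figure above
x = np.linspace(0, 1, 50)            # a grid of CD ratios
pi = 1 / (1 + np.exp(-(gamma_0 + gamma_1 * x)))

for M in (31, 50, 100):
    # Simulate many sessions of M choices per game and take the proportion of Top.
    draws = rng.binomial(M, pi, size=(10_000, len(x))) / M
    sim_var = draws.var(axis=0)
    theory_var = pi * (1 - pi) / M
    print("M=%3d  max |simulated - theoretical| variance: %.5f" % (M, np.abs(sim_var - theory_var).max()))
```

Both the level of the variance and its variation across $x$ shrink as $M$ grows, which is exactly the flattening visible in @fig-logistic-function-and-variance-of-y.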
The OLS estimator of the gradient between $Var(\bar{y}_i)$ and $x_i$ is:$$\hat{\gamma}_{Var(\bar{y}), x} = \frac{\hat{Cov}(Var(\bar{y}), x)}{\hat{Var}(x)}=\hat{\rho}(Var(\bar{y}), x)\frac{\hat{\sigma}_{Var(\bar{y})}}{\hat{\sigma}_x}$$Given that the correlation coefficient between $Var(\bar{y}_i)$ and $x_i$ is `{python} "%.4f" % corr_var_top_cd`, the standard deviation of x is `{python} "%.4f" % std_cd` and the standard deviation of $Var(\bar{y})=\frac{1}{M}Var(y)$ can be written as:$$\hat{\sigma}_{Var(\bar{y})}=\frac{1}{M}\hat{\sigma}_{Var(y)}=\frac{{`{python} round(std_var_top, 4)`}}{M}$$Therefore, using the current data, and assuming the estimates do not change with more data, we can construct a rough estimate of the OLS estimator of the gradient between $Var(\bar{y}_i)$ and $x_i$ for the number of participants M:$$\hat{\gamma}_{Var(\bar{y}), x} = {`{python} round(corr_var_top_cd, 4)`}\frac{{`{python} round(std_var_top, 4)`}}{{`{python} round(std_cd, 4)`}}M^{-1} \approx {`{python} ols_var_top_cd_sign`}\frac{{`{python} abs(round(ols_var_top_cd, 4))`}}{M}$$The question now is what is the gradient at which heteroskedasticity can be deemed practically insignificant. The figure below show many graphs using the CD model and CRRA.::: {.panel-tabset .ojs-track-layout}## CD Ratio```{ojs}//| label: fig-cd_ratioPlot.plot({ x: { label: 'CD Ratio (Kappa)', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, marks: [ Plot.dot(cd_data, {x: "CD-kappa Ratio", y: "Mean Top", fill: 'red'}), Plot.linearRegressionY(cd_data, {x: "CD-kappa Ratio", y: "Mean Top", stroke: 'blueviolet'}) ], width: width,})```## CD Prediction```{ojs}//| label: fig-cd_predictionPlot.plot({ color: { legend: true, domain: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], }, x: { label: 'CD Ratio Prediction (Kappa)', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", fill: 'Strat Eq Set'}), Plot.linearRegressionY(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", stroke: 'blueviolet'}), Plot.link([0, 1], {x1: 0, y1: 0, x2: 1, y2: 1}), ]})// legend names = ['NE=0.67', 'NE=0.5', 'NE=0.8', 'NE=0.6', 'NE=0.7', 'NE=0.9']```## CRRA Prediction```{ojs}//| label: fig-crra_predictionPlot.plot({ color: { legend: true, domain: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], }, x: { label: 'CRRA Prediction', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "CRRA Pred", y: "Mean Top", fill: 'Strat Eq Set'}), Plot.linearRegressionY(cd_data, {x: "CRRA Pred", y: "Mean Top", stroke: 'blueviolet'}), Plot.link([0, 1], {x1: 0, y1: 0, x2: 1, y2: 1}), ]})```## CD Vs CRRA```{ojs}//| label: fig-cd_vs_crraPlot.plot({ color: { legend: true, domain: ['CD Prediction (Kappa)', 'CRRA Prediction'], range: ['blue', 'limegreen'] }, x: { label: 'Prediction', labelAnchor: 'center', line: true, domain: [0, 1], }, y: { label: 'Freq. 
of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", fill: 'blue'}), Plot.linearRegressionY(cd_data, {x: "CD-kappa Ratio Pred", y: "Mean Top", stroke: 'blue'}), Plot.dot(cd_data, {x: "CRRA Pred", y: "Mean Top", fill: 'limegreen'}), Plot.linearRegressionY(cd_data, {x: "CRRA Pred", y: "Mean Top", stroke: 'limegreen'}), Plot.link([0, 1], {x1: 0, y1: 0, x2: 1, y2: 1}), ]})```## Strat Equivalent Groups```{ojs}//| label: fig-strat_eq_groupsPlot.plot({ color: { legend: true, domain: ['NE', 'CD Ratio Prediction', 'CD Ratio Prediction (Kappa)', 'CRRA', 'Mean of Top'], range:['black', 'orange', 'blue', 'limegreen', 'red'], }, x: { label: 'Games', line: false, domain: [1, 5], ticks: 4, }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: false, domain: [0, 1], }, grid: true, width: width, fx: { label: "Strategically Equivalent Set" }, facet: {data: cd_data, x: 'Strat Eq Set'}, marks: [ Plot.frame(), Plot.line(cd_data, {x: 'ID in Set', y: 'NE', stroke: 'black'}), Plot.line(cd_data, {x: 'ID in Set', y: 'CD Ratio Pred', stroke: 'orange'}), Plot.line(cd_data, {x: 'ID in Set', y: 'CD-kappa Ratio Pred', stroke: 'blue'}), Plot.line(cd_data, {x: 'ID in Set', y: 'CRRA Pred', stroke: 'limegreen'}), Plot.line(cd_data, {x: 'ID in Set', y: 'Mean Top', stroke: 'red'}), ]})```## Decision Times By Strat Eq Set```{ojs}//| label: fig-decision_times_strat_eqlower_time = []upper_time = []{ for (var i = 0; i < 50; i++) { lower_time[i] = cd_data[i]['Mean Time'] - cd_data[i]['Std Time'] upper_time[i] = cd_data[i]['Mean Time'] + cd_data[i]['Std Time'] } return html``}Plot.plot({ color: { legend: true, domain: ['Mean Time'], range:['red'], }, x: { label: 'Games', line: false, domain: [1, 5], ticks: 4, }, y: { label: 'Seconds', labelAnchor: 'center', line: false, domain: [0, 60], }, grid: true, width: width, fx: { label: "Strategically Equivalent Set" }, facet: {data: cd_data, x: 'Strat Eq Set'}, marks: [ Plot.frame(), Plot.dot(cd_data, {x: 'ID in Set', y: 'Mean Time', fill: 'red'}), Plot.ruleX(cd_data, {x: 'ID in Set', y1: lower_time, y2: upper_time, strokeWidth: 2}) ]})```## Decision Times Correlation to Data```{ojs}//| label: fig-decision_times_corrPlot.plot({ color: { legend: true, domain: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10], }, x: { label: 'Mean Time', labelAnchor: 'center', line: true, domain: [20, 35], }, y: { label: 'Freq. of Top', labelAnchor: 'center', line: true, domain: [0, 1], }, grid: true, width: width, marks: [ Plot.dot(cd_data, {x: "Mean Time", y: "Mean Top", fill: 'Strat Eq Set'}), Plot.linearRegressionY(cd_data, {x: "Mean Time", y: "Mean Top", stroke: 'blueviolet'}) ]})```## Data```{ojs}Inputs.table(cd_data)```:::The figure below shows the decisions of each participants over the course of the experimental session, and also the times it took to make each of the 50 choices. This is only the Anti-Coordination games, and omits the other games. "Cumulative-Mean of Choices" shows how the mean of each participants' choices changes over time (i.e. the mean of "Top" from period 1 to the Round set by the slider). Dragging the slider to the right will increase the range of rounds over the mean is calculated. "Cumulative-Mean Times" is much the same, but for the times the participants took to make their decisions. Note that the decision page had a 20 second timer restricting participants to spend at least 20 seconds on the page. 
One can see a general downward trend in times, suggesting that participants do get quicker at making choices as the experiment progresses. ::: {#fig-cd_data_mean_coop}::: {.panel-tabset .ojs-track-layout}## Cumulative-Mean of Choices```{ojs}cd_data_in_order_filter = cd_data_in_order.slice(0, Round_Number)strokes = [ "#c73c12", "#7f3b91", "#3cb38e", "#3ca6b8", "#164a93", ]Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Freq. of Top', domain: [0,1] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_1_rolling_mean_choice: wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_2_rolling_mean_choice: r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_3_rolling_mean_choice: 4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_4_rolling_mean_choice: 0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_5_rolling_mean_choice: 8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_6_rolling_mean_choice: nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_7_rolling_mean_choice: xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_8_rolling_mean_choice: zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_9_rolling_mean_choice: j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_10_rolling_mean_choice: vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_11_rolling_mean_choice: 0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_12_rolling_mean_choice: nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_13_rolling_mean_choice: jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_14_rolling_mean_choice: np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_15_rolling_mean_choice: bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_16_rolling_mean_choice: mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_17_rolling_mean_choice: cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_18_rolling_mean_choice: wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_19_rolling_mean_choice: e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_20_rolling_mean_choice: hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_21_rolling_mean_choice: ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_22_rolling_mean_choice: d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_23_rolling_mean_choice: fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_24_rolling_mean_choice: uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_25_rolling_mean_choice: mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: 
"participant_26_rolling_mean_choice: rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_27_rolling_mean_choice: 1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_28_rolling_mean_choice: 7tco6o97", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_29_rolling_mean_choice: phdlm5a3", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_30_rolling_mean_choice: t4q22to2", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_31_rolling_mean_choice: jayomb7t", stroke: strokes[0]}), ], grid: true, width: width, height: 450,})```## Cumulative-Mean Times```{ojs}Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Seconds', domain: [0,110] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_1_rolling_mean_time: wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_2_rolling_mean_time: r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_3_rolling_mean_time: 4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_4_rolling_mean_time: 0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_5_rolling_mean_time: 8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_6_rolling_mean_time: nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_7_rolling_mean_time: xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_8_rolling_mean_time: zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_9_rolling_mean_time: j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_10_rolling_mean_time: vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_11_rolling_mean_time: 0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_12_rolling_mean_time: nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_13_rolling_mean_time: jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_14_rolling_mean_time: np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_15_rolling_mean_time: bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_16_rolling_mean_time: mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_17_rolling_mean_time: cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_18_rolling_mean_time: wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_19_rolling_mean_time: e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_20_rolling_mean_time: hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_21_rolling_mean_time: ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_22_rolling_mean_time: d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_23_rolling_mean_time: 
fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_24_rolling_mean_time: uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_25_rolling_mean_time: mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_26_rolling_mean_time: rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_27_rolling_mean_time: 1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_28_rolling_mean_time: 7tco6o97", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_29_rolling_mean_time: phdlm5a3", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_30_rolling_mean_time: t4q22to2", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_31_rolling_mean_time: jayomb7t", stroke: strokes[0]}), Plot.ruleY([20], {stroke: "black"}), ], grid: true, width: width, height: 450,})```## Choices```{ojs}Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Freq. of Top', domain: [0,1] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_1_choices: wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_2_choices: r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_3_choices: 4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_4_choices: 0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_5_choices: 8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_6_choices: nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_7_choices: xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_8_choices: zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_9_choices: j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_10_choices: vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_11_choices: 0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_12_choices: nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_13_choices: jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_14_choices: np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_15_choices: bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_16_choices: mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_17_choices: cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_18_choices: wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_19_choices: e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_20_choices: hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_21_choices: ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_22_choices: 
d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_23_choices: fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_24_choices: uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_25_choices: mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_26_choices: rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_27_choices: 1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_28_choices: 7tco6o97", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_29_choices: phdlm5a3", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_30_choices: t4q22to2", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "participant_31_choices: jayomb7t", stroke: strokes[0]}), ], grid: true, width: width, height: 450,})```## Times```{ojs}Plot.plot({ x: { label: 'Round', domain: [0, 50] }, y: { label: 'Seconds', domain: [0,200] }, marks: [ Plot.line(cd_data_in_order_filter, {x: "Round", y: "wie2lsge", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "r3mxhovg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "4tu66uu6", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "0biy1qxx", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "8lux4hb3", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "nos1ed85", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "xb7bmbep", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "zabwtpfh", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "j02ftbwj", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "vmtlvcw6", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "0ih5xn19", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "nxcgwlcg", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "jldxqcry", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "np9ne33s", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "bj7kao3e", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "mune4bur", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "cbwvf4zj", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "wn5be09d", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "e1uv9wut", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "hdrcg7z1", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "ffsfod6h", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "d6t61z0s", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "fd9h7qlm", stroke: strokes[2]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "uy1t7iem", stroke: strokes[3]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "mg7kner7", stroke: strokes[4]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "rqu56kec", stroke: strokes[0]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "1gkqfhob", stroke: strokes[1]}), Plot.line(cd_data_in_order_filter, {x: "Round", y: "7tco6o97", stroke: 
strokes[2]}),
    Plot.line(cd_data_in_order_filter, {x: "Round", y: "phdlm5a3", stroke: strokes[3]}),
    Plot.line(cd_data_in_order_filter, {x: "Round", y: "t4q22to2", stroke: strokes[4]}),
    Plot.line(cd_data_in_order_filter, {x: "Round", y: "jayomb7t", stroke: strokes[0]}),
    Plot.ruleY([20], {stroke: "black"}),
  ],
  grid: true,
  width: width,
  height: 450,
})
```

:::

```{ojs}
viewof Round_Number = Inputs.range([1, 50], {
  label: "Round",
  value: 1,
  step: 1,
})
```

Time series of data
:::

## Qualitative Analysis

Using NVivo, one can analyse the descriptions the participants provided at the end of the experiment. From their responses one can create codes for different decision-making processes and intentions. The most frequent codes are shown in the figure below.

::: {#fig-code_frequency}

```{ojs}
Plot.plot({
  x: {
    label: "Code",
    labelAnchor: 'center',
    line: true,
  },
  y: {
    label: 'Frequency',
    labelAnchor: 'center',
    line: true,
    domain: [0, 1]
  },
  grid: true,
  width: width,
  marks: [
    Plot.barY(code_list, {x: "Name", y: "Frequency", tip: true, sort: {x: "y", reverse: true}}),
  ],
  color: {scheme: "purples"}
})
```

Frequency of Codes Amongst Participants' Decision Descriptions
:::

These codes are not mutually exclusive; they simply indicate the number of participants who expressed that code in their description. For example, any participant who said that they thought about their opponent's actions has been coded as 'Strategic Decision Making', on the basis that this is evidence of at least Level 1 rationality. Descriptions of each code can be found in the appendix. The most frequent code, expressed by 87.1% of participants, is 'Self-Regarding', indicating an attempt to increase their own payoff. Presumably all participants were self-regarding, but not all expressed this in their answers. 'Self-Regarding' is also not exclusive: 35.5% of participants stated an intention to be 'Other-Regarding', in that they also cared about their opponent's payoff. There were also sizeable frequencies of 'Risk-Aversion' (29%) and 'Max-Min' (maximisation of the minimum payoff, or avoidance of small payoffs; 22.6%). These percentages only show how often each code occurs; it is also important to dissect how the codes overlap. This is represented in the diagram below.
::: {#fig-code_correlation_matrix}

```{ojs}
viewof abs_rel_cellplt = Inputs.radio(["Absolute", "Relative"], {label: "View", value: "Absolute"})

{
  if (abs_rel_cellplt == "Relative") {
    var fill = "freq";
    var title = d => `|${d.Code} ∩ ${d.Vs}| / |${d.Code}| = ${d3.format(".3")(d.freq)}`
  } else if (abs_rel_cellplt == "Absolute") {
    var fill = "value";
    var title = d => `|${d.Code} ∩ ${d.Vs}| = ${d.value}`
  }

  const cellplt = Plot.plot({
    marks: [
      Plot.cell(code_matrix, { x: "Vs", y: "Code", fill: fill, tip: true, title: title }),
      Plot.text(code_matrix, colorContrast({ x: "Vs", y: "Code", fill: fill, text: fill, pointerEvents: "none" })),
    ],
    x: { tickRotate: 45, label: "Code" },
    y: { label: "Code" },
    color: { scheme: "purples"},
    grid: true,
    width: 800,
    height: 516,
    marginLeft: 125,
    marginRight: 55,
    marginBottom: 125,
  })

  const celltotalplt = Plot.plot({
    marks: [
      Plot.cell(code_totals, { x: "Code", y: "Vs", fill: "total", tip: true, title: d => `${d.Vs}: ${d.total}` }),
      Plot.text(code_totals, colorContrast({ x: "Code", y: "Vs", fill: "total", text: "total", pointerEvents: "none" })),
    ],
    x: { label: "" },
    y: { axis: "right", label: "" },
    color: { scheme: "greens"},
    grid: true,
    width: 62,
    height: 516,
    marginLeft: -68,
    marginRight: 68,
    marginBottom: 125,
  })

  return html`<div class="container" style="display: flex;"><div>${cellplt}</div><div>${celltotalplt}</div></div>`
}
```

Matrix of Codes Amongst Participants' Decision Descriptions
:::

The above figure should be read horizontally: the right-hand column shows the total number of occurrences of each code, and the numbers in the cells along a row show how often each column's code occurs alongside that row's code (either in absolute terms, or relative to the row code's set size). A brief sketch of how such a co-occurrence matrix can be computed follows, and the UpSet plot after it shows the intersections of all of the codes in greater detail.
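The sketch below is purely illustrative: `code_flags` is a hypothetical DataFrame with one row per participant and one 0/1 column per code, not an object defined elsewhere in this paper. Given such a table, the absolute matrix is a single matrix product and the relative matrix divides each row by that code's set size.

```{python}
import pandas as pd

# Hypothetical input: one row per participant, one 0/1 indicator column per code.
code_flags = pd.DataFrame({
    'Self-Regarding':            [1, 1, 1, 0, 1],
    'Other-Regarding':           [0, 1, 0, 0, 1],
    'Strategic Decision Making': [1, 1, 0, 1, 0],
})

# |Code ∩ Vs|: absolute co-occurrence counts (the diagonal holds each code's set size).
absolute = code_flags.T @ code_flags

# |Code ∩ Vs| / |Code|: each row divided by that row's set size.
set_sizes = pd.Series({code: absolute.loc[code, code] for code in absolute.index})
relative = absolute.div(set_sizes, axis=0)

print(absolute)
print(relative.round(3))
```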
::: {#fig-code_upset-plot}

```{ojs}
{
  let Codes = ["Change", "Inequality Aversion", "Instinctive", "Joint Max", "Max Expected Payoff", "Max-Min", "Other-Regarding", "Risk Aversion", "Risk Positive", "Self-Regarding", "Strategic Decision Making"]
  let evenCodes = ["Change", "Instinctive", "Max Expected Payoff", "Other-Regarding", "Risk Positive", "Strategic Decision Making"]
  let fullWidth = 820
  let lowerBarChartWidth = 254
  let lowerSetChartHeight = 328 // 288

  const backgroundStripeChart = Plot.cell(
    code_upset_data_cross_prod.filter((d) => evenCodes.indexOf(d["Code"]) > -1),
    { x: d => d.id, y: d => d["Code"], fill: '#efefef', stroke: '#efefef' }
  )
  const backgroundDotChart = Plot.dot(
    code_upset_data_cross_prod,
    { x: d => d.id, y: d => d["Code"], fill: '#dedede', r: 3 }
  )
  const dotChart = Plot.dot(
    code_upset_data_per_cat,
    { x: d => d.id, y: d => d["Code"], fill: "#3f007d", r: 3 }
  )
  const lineChart = Plot.line(
    code_upset_data_per_cat,
    { x: d => d.id, y: d => d["Code"], z: d => d.id, strokeWidth: 2, stroke: "#3f007d" }
  )

  const upperUpsetBarPlot = Plot.plot({
    marks: [
      Plot.ruleY([0], {fill: "black"}),
      Plot.barY(code_upset_data, {x: "id", y: "total", tip: true, title: d => `${d.Codes}: ${d.total}`, sort: {x: "y", reverse: true}, fill: "#3f007d"}),
      Plot.text(code_upset_data, {x: "id", y: "total", text: d => d.total, textAnchor: 'end', dy: -7}),
    ],
    x: { axis: null, padding: 0.2, round: false },
    y: { tickPadding: 2, line: true },
    width: fullWidth - lowerBarChartWidth,
    marginLeft: 20,
    style: { background: '#ffffff00' }
  })

  const lowerUpsetBarChart = Plot.plot({
    marks: [
      Plot.barX(code_totals, {y: "Vs", x: "total", fill: "#3f007d"}),
      Plot.text(code_totals, {y: "Vs", x: "total", text: d => d.total, textAnchor: 'end', dx: -4}),
    ],
    x: { axis: 'top', line: true, reverse: true },
    y: { axis: 'right', tickSize: 0, tickPadding: 7, insetTop: 2, round: false, padding: 0.2, align: 0, label: "" },
    width: lowerBarChartWidth,
    height: lowerSetChartHeight + 22,
    marginRight: 122,
    style: { background: '#ffffff00' }
  })

  const lowerUpsetChart = Plot.plot({
    marks: [
      backgroundStripeChart,
      backgroundDotChart,
      dotChart,
      lineChart
    ],
    x: { axis: null, padding: 0, round: false, insetRight: 1, insetLeft: 1 },
    y: { axis: null, tickSize: 0 },
    width: fullWidth - lowerBarChartWidth - 20,
    height: lowerSetChartHeight,
    marginTop: 28,
    style: { background: '#ffffff00' }
  })

  return html`<div style="overflow-x: auto">
    <div style="display: flex; flex-direction: column; align-items: flex-end; width: ${fullWidth}px">
      <div>
        ${[upperUpsetBarPlot]}
      </div>
      <div style="display: flex; position: relative; top: -50px; align-items: flex-start">
        ${[lowerUpsetBarChart, lowerUpsetChart]}
      </div>
    </div>
  </div>`
}
```

Code UpSet Plot [@2014_infovis_upset]
:::

The UpSet plot allows us to interpret the intersections of our codes where a Venn diagram would not. The most common intersections unsurprisingly occur between the most populated groups, and they tend to be intersections of only two or three codes. The most common intersection is that between the two largest sets, "Self-Regarding" and "Strategic Decision Making", followed by the intersection of "Other-Regarding", "Self-Regarding" and "Strategic Decision Making", and then "Strategic Decision Making" on its own. Due to the currently limited data, many intersections occur only once or twice.