Thursday, 29 April 2021

GLM single trial weighting for EEG

Weighting single trials in EEG for better precision 

It has been many years since I first tried to implement single trial weighted least squares in the LIMO tools. The core idea is to have weights that capture trial dynamics - that part was developed more than 7 years ago. Armed with this algorithm, we can just plug it into a GLM and that's it: more precision in the parameter estimates (i.e. they reflect the truth better) and possibly better group level results. The preprint is finally out! Let me tell you why I think this is a useful approach, and what made it so hard to implement.

Single Trial Weighting

Before looking at how we compute this, let's think of a standard weighting scheme: we look at how data points are distributed to obtain weights, say using the inverse of the variance, and we compute a weighted mean, or weighted parameter estimates in a GLM. The idea is simple: the farther a point is from the bulk, the less weight it gets.
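As a toy illustration of that principle (this is my own sketch, not the LIMO code; the softening constant 0.1 is arbitrary), deviation-based weights pull the estimate back towards the bulk:

```matlab
% Illustrative sketch: deviation-based weights in the spirit of
% inverse-variance weighting - farther from the bulk, less weight.
trials = [4.2 4.8 5.1 5.3 4.9 5.0 5.2 4.7 5.4 4.6 15]'; % one outlying trial
dev    = abs(trials - median(trials));   % distance to the bulk
w      = 1 ./ (dev + 0.1).^2;            % the farther away, the less weight
w      = w ./ sum(w);                    % normalize weights to sum to 1
wavg   = sum(w .* trials);               % weighted mean, close to 5
avg    = mean(trials);                   % ordinary mean, pulled up by the outlier
```

The weighted mean sits much closer to the bulk (around 5) than the ordinary mean, which the single outlying trial drags towards 5.8.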

What happens if you apply such naive weighting to time courses? Let's look at a trial slightly out-of-phase (figure 1). At each time point, we look at deviations and apply weights. While the result is a better approximation of the truth, time points are treated as an alternation of 'good' then 'bad' data. Physiologically, this is weird - neurons don't compute things independently at each time point; this is clearly an auto-regressive process, and we just cannot ignore that.

Figure 1. Example of a trial out-of-phase (green) compared to the bulk (dark grey). If weighting is computed independently at each time point, we observe an alternation of high and low weights - which would indicate that neural activity is sometimes different and sometimes the same as for other trials.
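The flickering can be reproduced in a few lines (a made-up demo, with arbitrary noise levels and an arbitrary softening constant, not the LIMO implementation): simulate in-phase trials plus one phase-shifted trial, and weight the shifted trial point by point.

```matlab
% Why point-by-point weighting 'flickers' for an out-of-phase trial
t      = linspace(0, 2*pi, 100);
trials = repmat(sin(t), 19, 1) + 0.1*randn(19, 100);  % 19 in-phase trials
odd    = sin(t + pi/4);                               % 1 out-of-phase trial
data   = [trials; odd];
dev    = abs(data - median(data));     % deviation to the bulk, per time point
w      = 1 ./ (dev(end,:) + 0.01);     % naive weights for the odd trial
% where the shifted sine crosses the bulk, the odd trial looks 'good'
% (high weight); in between it looks 'bad' (low weight)
```

The weight vector `w` oscillates by a large factor along the trial, exactly the implausible 'good/bad' alternation described in the caption.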

Our approach assigns weights based on the whole trial temporal profile, and how it differs from the bulk. For that we use the Principal Component Projection method developed by Filzmoser et al. (2008). The results are means or parameter estimates that approximate the truth just as well as the naive approach, but with a scheme that is biologically meaningful. If the result is the same, who cares? Well, it looks the same, but isn't - check the preprint: the type of noise, for instance, has a different impact, and this matters.
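To make the contrast with the naive scheme concrete, here is a hedged sketch of the idea (NOT the actual Filzmoser et al. PCP algorithm, just a simple distance-based stand-in): derive ONE weight per trial from its whole temporal profile, so an out-of-phase trial is down-weighted everywhere rather than time point by time point.

```matlab
% One weight per trial, from the whole temporal profile
t    = linspace(0, 2*pi, 100);
data = [repmat(sin(t), 19, 1) + 0.1*randn(19, 100); sin(t + pi/4)];
d    = sqrt(sum((data - median(data)).^2, 2));  % one distance per trial
w    = 1 ./ (1 + d.^2);                         % decreasing function of distance
w    = w ./ sum(w);                             % normalize
% the out-of-phase trial (last row) gets the smallest weight overall
```

The shifted trial is far from the median profile along the whole epoch, so it receives the single smallest weight, applied uniformly across time.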

Just use weights in a GLM, it's easy they said ...

You might think that once you have weights you just plug them into your GLM and compute Weighted Least Squares. This is what I did, and it worked - parameter estimates were less influenced by outlying trials. So why 7 years? Because while estimation was better, I just couldn't get the inference to work. You might argue that 1st level/subject inference isn't that interesting, so who cares. I care. It isn't satisfying to have half-baked solutions. It took 7 years because life takes over and you get busy with other things when your ideas keep failing; but I kept at it, occasionally testing something different.

The difficulty for inference is that LIMO tools, like other software, use a massive univariate approach, i.e. they consider each data point independently, then apply a multiple comparisons correction based on clustering (cluster mass and tfce) - but the weighting does not treat data points independently! That led to crazy false positive rates, until I found the right way to resample the data. It turns out you cannot center the data to build the null; you just break the link between X and Y. You should also reuse the weights rather than compute new ones on each resample ... and of course there are a few other tricks, like adjusting the mean squared error: since the hyperplane no longer overfits, it must be below the ordinary least squares one - just to make things more interesting.
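The resampling recipe above can be sketched in a toy, single-channel version (my own illustration, not LIMO's implementation; the design, effect sizes and fake weights are invented): keep the design X and the observed weights W fixed, and bootstrap trials of Y against them, which breaks the X-Y link and gives the null.

```matlab
% Toy bootstrap-under-the-null for weighted least squares
n = 40;
X = [kron([1;0],ones(n/2,1)) kron([0;1],ones(n/2,1))]; % 2-condition design
Y = X*[2;1] + randn(n,1);              % fake single-channel trial data
W = diag(rand(n,1)*0.5 + 0.5);         % pretend these are the trial weights
beta_obs = (X'*W*X) \ (X'*W*Y);        % observed WLS estimates

Nboot = 500; beta_null = zeros(2, Nboot);
for b = 1:Nboot
    idx = randi(n, n, 1);              % resample trials with replacement
    Yb  = Y(idx);                      % Y resampled; X and W REUSED as observed
    beta_null(:,b) = (X'*W*X) \ (X'*W*Yb); % X-Y link broken -> no-effect null
end
```

Because the link between X and Y is broken while the data themselves are untouched, the null distribution of the condition difference is centred on zero even though a real effect is present in `beta_obs`.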

And so it works

Having solved that subject-level inference issue, we can move on, pool those parameter estimates, and compute group level statistics. At first glance, great: more statistical power, more effects detected (figure 2) - thanks to more precision, which can reduce between-subject variance.

Figure 2. Group results from the Wakeman and Henson (2015) face experiment using 1st level OLS parameter estimates (top) vs. 1st level WLS parameter estimates (bottom), illustrating the discrepancy of more statistical power (left) despite smaller effect sizes (right).

Digging deeper, you'll see it's more complicated, because overall effects are weaker (check the scales of those maps). This is not totally surprising if you consider that you got rid of the most influential trials (although I agree this could have gone the other direction). Weaker effects but more statistical power? Well, the relative size of real effects compared to space/time areas where nothing happens is larger, i.e. the distributions are right skewed. Since resampling under the null gives very similar cluster masses and tfce maxima across methods, and right-skewed maps cluster more, we end up with more statistical power at the group level when using subject-level weighted least squares than standard ordinary least squares.

I can see other applications beyond the GLM for EEG: basically we have weights for profiles (temporal/spectral) acting as variance stabilizers, and there must be a few applications out there that can make use of that.

As always, nothing is simple in research - but isn't that why we love it?

Tuesday, 5 January 2021

MEEG single trial metrics

Measuring the quality of MEEG single trials

What's the goal?

In a typical MEEG data acquisition session, many trials for a given experimental condition are collected. We know there are variations from trial to trial; some of those variations are measurement errors, others are physiological in nature and can sometimes be relevant, underlying behavioural variability. There is, therefore, some interest in coming up with metrics for each trial.

Time-series metrics

What metrics can be of interest for time-series, or spectra for that matter? Importantly, I want those metrics to be independent of specific features in the data. That is, we could compute various ratios and variations around/between/among features (e.g. peaks), but this requires defining features. So what's left:

- temporal SNR is a classic: the standard deviation over time, telling us how much variation we have within a trial

- total power gives another picture: how much energy a given trial generated; it is related to tSNR

- autocorrelation indicates the smoothness of the data

There is one more that does depend on a reference. Sometimes we want to know how much a trial differs from the 'others'; but this 'other' cannot be the mean, because the mean is computed using the trial of interest. If one provides a reference mean, then we can compute the amplitude deviation:

- amplitude variation, that's the standard deviation from a reference trial
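The four metrics above are simple to compute; here is a minimal sketch (variable names and the lag-1 smoothness index are my own choices, not a LIMO API), with each trial as one row:

```matlab
% Per-trial metrics for a trials-by-time matrix
trials = randn(30, 200);              % 30 trials x 200 time points (fake data)
ref    = mean(trials);                % a reference mean (e.g. from other trials)
tSNR   = std(trials, 0, 2);           % temporal SNR: std over time, per trial
power  = sum(trials.^2, 2);           % total power per trial
ac1    = zeros(30, 1);                % lag-1 autocorrelation as smoothness index
for k = 1:30
    c      = corrcoef(trials(k,1:end-1), trials(k,2:end));
    ac1(k) = c(1,2);
end
ampdev = std(trials - ref, 0, 2);     % amplitude deviation from the reference
```

Each metric yields one value per trial, so trials can then be compared or weighted on any of them.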

What does that look like?

Here is an example from 18 subjects; I want to compare the trials underlying the black and red mean ERPs (from Wakeman and Henson, 2015). FYI, those were defined as being different in multivariate space, but here I want to know what this means back in the standard time domain.

Figure 1. Means of 'black' and 'red' trials

Temporal SNR

Comparing the red and black trials subject-wise (bootstrapped 20% trimmed mean differences) shows that 12/18 subjects plotted above (subjects 2, 3, 4, 5, 6, 7, 8, 11, 13, 14, 16 and 17) have more variable red trials.

Taking the mean values across all black vs. red trials, the group comparison for figure 1 above shows a significant difference. If the mean is taken not just across all trials but also across all channels, the effect is smaller and not significant.

Figure 2. tSNR for channels showing 'maximum differences' (see above) and averages across all channels 

Total Power

Comparing the red and black trials subject-wise (bootstrapped 20% trimmed mean differences) shows that those same 12/18 subjects have an effect, i.e. since red trials are more variable, they also carry more power.

Taking again the mean values across all black vs. red trials, the group comparison for figure 1 above shows a significant difference. If the mean is taken not just across all trials but also across all channels, the effect is smaller and not significant.

Figure 3. Power for channels showing 'maximum differences' (see above) and averages across all channels 


Autocorrelation

Comparing the red and black trials subject-wise (bootstrapped 20% trimmed mean differences) shows that 12/18 subjects have an effect, i.e. red trials are less autocorrelated and thus less smooth - 9 subjects are the same as before (3, 4, 5, 7, 8, 11, 13, 16 and 17) and 3 uniquely show this effect (9, 12, 18).

As before, the group comparison of mean values across all black vs. red trials in figure 1 above shows a significant difference. If the mean is taken not just across all trials but also across all channels, the effect is smaller and not significant.

Figure 4. Autocorrelation for channels showing 'maximum differences' (see above) and averages across all channels

This result is interesting because, looking at the mean time courses in figure 1, one could expect a high-frequency peak for red trials - but that doesn't seem to be the case (figure 5), because most often those oscillations are not at a given frequency. Yet smoothness is captured here, showing differences.

Figure 5. Mean Power spectrum of trials used to generate mean time course in figure 1.

Amplitude variations

Since those red and black averages/trials come from a channel-wise multivariate 'distance', it is maybe unsurprising that the mean amplitude variations of red trials compared to the mean of black trials are significant for every subject at the group level, both for the max channels and averaged across all channels. This last result is interesting to me, because the channel-wise distance seems to capture trials with different dynamics (tSNR/smoothness), leading to full-scalp differences in amplitudes; and apparently it's ok to use averages for ERPs ... maybe not, but that's another story.

Figure 6. Amplitude variations for channels showing 'maximum differences' (see above) and averages across all channels.


tSNR and power capture the same information, albeit with different interpretations, and autocorrelation gives us an idea of smoothness, which is somewhat the inverse of tSNR but goes beyond it - maybe the best metric to capture how noisy the data are. Finally, amplitude variation confirms differences of any trial to a reference, but I'm not sure it's that useful - I'd happily take comments on all that.

Friday, 7 June 2019

Bootstrapping and multiple comparisons correction in EEG

A Matlab tutorial for bootstrapping with an introduction to multiple comparisons correction in EEG

A literate script to demonstrate how the bootstrap works and how we can use it for multiple comparisons correction in EEG. Some of the code relies on LIMO EEG. A large part was inspired by G. Rousselet's tutorial on the bootstrap (in R). The code below (and data) can be found here.

The bootstrap is a well-established early computer-age inferential method (Efron, 1979; Efron & Hastie, 2016; Efron & Tibshirani, 1994). The bootstrap is based on the idea that using only the data at hand can sometimes give better results than making unwarranted assumptions about the populations we’re trying to estimate. The core mechanism of the bootstrap is sampling with replacement from the data, which is a form of data-driven simulation.

The bootstrap idea

The idea is to make inferences using the data at hand only, avoiding assumptions about the underlying distribution the observations come from, by sampling with replacement from the data.

% imagine we do an experiment, and have 60 observations
clear variables
N = 60; obs = randn(N,1);

% We can now compute the mean
avg_obs = mean(obs);

The theory tells us that, for normally distributed data, the standardized sample mean follows a Student t-distribution with N-1 degrees of freedom. A confidence interval (CI) for the mean is an interval that covers the population mean with an error rate alpha, here 5% for a 95% CI (Morey et al., 2016).

alpha_value = 5/100;
Tcritic     = tinv(alpha_value/2,N-1); % divide by two 2.5% each side of the distribution
se          = std(obs)/sqrt(N); % standard error
CI          = [avg_obs-abs(Tcritic)*se avg_obs+abs(Tcritic)*se]; % same as [~,~,CI]=ttest(obs)

Let's illustrate what it means

figure('units','normalized','outerposition',[0 0 1 1])
subplot(1,2,1); scatter([0.5:1/(N-1):1.5],obs,30,'filled');
box on; grid on; axis tight; xlabel('observations'); ylabel('observed values')
hold on; rectangle('Position',[0.75 CI(1) 0.5 diff(CI)],'Curvature',[0.4 0.4],'LineWidth',2,...
    'FaceColor',[0.35 0.35 0.35],'EdgeColor',[0.35 0.35 0.35]);
title(sprintf('Observed data and 95%% CI of the mean [%g %g]',CI(1),CI(2)))

sample = [-5:.01:5]+avg_obs;
theoretical_distribution = tpdf(sample,N-1);
subplot(1,2,2); plot(sample,theoretical_distribution,'LineWidth',3) % right-hand panel
box on; grid on; axis tight; xlabel('mean values'); ylabel('relative frequency')
title(sprintf('Student t(%g,%g) distribution',avg_obs,N-1))
hold on; [~,position] = min(abs(theoretical_distribution-alpha_value));
text(-4.3,0.06,['T critic = ' num2str(Tcritic)]);
text(2.2,0.06,['T critic = ' num2str(abs(Tcritic))]);

We can compute the same thing without assuming a Student t-distribution, by simply computing the distribution of sample means as if we were collecting data from many experiments - i.e. by bootstrapping.

Nboot     = 1000;
resamples = randi(N,N,Nboot); % pick at random out of the N observations
avg_boot  = sort(mean(obs(resamples))); % the Nboot means from sampling
CI_boot   = [avg_boot(alpha_value/2*Nboot) avg_boot(Nboot-alpha_value/2*Nboot)];

% Let's compare
figure('units','normalized','outerposition',[0 0 1 1])
subplot(1,2,1); plot(sample+mean(avg_boot),theoretical_distribution,'LineWidth',3)
box on; grid on; axis tight; xlabel('mean values'); ylabel('relative frequency')
hold on; plot(repmat(sample(position)+mean(avg_boot),1,6),[0:.01:alpha_value],'r','LineWidth',2);
title(sprintf('Student t(%g,%g) distribution',avg_obs,N-1))
subplot(1,2,2); h = histogram(avg_boot, 100);
title('density of mean values'); xlabel('mean values'); ylabel('frequency')
hold on; bar(CI(1),5,h.BinWidth,'r');
axis tight; grid on; box on

Let's have a quick look at different distributions - and since this is so simple, let's get both the mean and the median confidence intervals.

figure('units','normalized','outerposition',[0 0 1 1]);
resample = randi(100,100,1000);
for d=1:3
    if d == 1
        % normal data as before - does this really exist?
        data = sort(randn(100,1))+80;
    elseif d == 2
        % ex-Gaussian (normal + exponential) - classic reaction times
        No = normrnd((200+randn(1)*50),25*abs(randn(1)),[100,1]);
        Ex = exprnd(75,[100,1]) + (150+randn(1)*50);
        data = (No+Ex)/2;
    else
        % uniform
        unif = sort(rand(200,1)).*150;
        data = unif(end-99:end);
        data = data.*100/max(data); % pretend max is 100
    end
    boot = sort(mean(data(resample)));
    boot2 = sort(median(data(resample)));
    subplot(1,3,d); histogram(data,50); [~,~,tCI]=ttest(data);
    title(sprintf('mean %g, 95%% CI [%g %g] \n boot CI [%g %g] \n median %g boot CI [%g %g]',...
        mean(data), tCI(1), tCI(2), boot(25), boot(975), median(data), boot2(25), boot2(975)))
end


Bootstrap does not bring robustness

In statistics, the term robust means resistant to outliers. The mean is the least robust location estimator, with a breakdown point of 0%, while the median is the most robust, with a breakdown point of 50% (i.e. it doesn't change with up to 50% of outliers).
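A two-line check of those breakdown points (a toy example of mine, before the constrained bootstrap below):

```matlab
% Breakdown points in action: corrupt 10% of the data
A = 1:10;              % clean data: mean = median = 5.5
B = A; B(10) = 1000;   % replace one observation with an outlier
% the mean explodes (104.5) while the median does not move (5.5)
[mean(A) mean(B); median(A) median(B)]
```

With a single corrupted value out of ten, the mean is already useless, while the median is untouched; it would take corrupting half the sample to move the median arbitrarily.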

A = [1:9 50]; Nboot = 1000;
% rather than completely random, the resampling can be constrained, here to
% at least 4 unique values, which ensures some spread in each bootstrap sample
bootsamples = zeros(10,Nboot);
index = 1;
while index<=Nboot
    tmp = randi(10,10,1);
    if length(unique(tmp))>4
        bootsamples(:,index) = tmp;
        index = index+1;
    end
end

means = sort(mean(A(bootsamples)));
medians = sort(median(A(bootsamples)));
low = floor(alpha_value/2*Nboot); high = Nboot-low;
figure('units','normalized','outerposition',[0 0 1 1])
subplot(1,2,1); histogram(means, 100); axis tight; grid on; box on
title(['Bootstrapped mean values ' num2str(mean(A)) ' 95% CI [' ...
    num2str(means(low)) ' ' num2str(means(high)) ']'])
subplot(1,2,2); histogram(medians, 100); axis tight; grid on; box on
title(['Bootstrapped median values ' num2str(median(A)) ' 95% CI [' ...
    num2str(medians(low)) ' ' num2str(medians(high)) ']'])

Bootstrap is not assumption free

When we sample with replacement, we sample from the observed data only.

A = [1 1 1 2 2 2 3 3 3 6 9 12 15 15 15 16 16 16 17 17 17];
resamples = randi(length(A),length(A),Nboot); % pick at random out of the N observations

We could, however, imagine that other values exist in the population and were simply not observed in the current sample! Using such priors is called the Bayesian bootstrap (Rubin, 1981).

n = length(A);
Bayes_resamples = NaN(size(resamples));
for boot=1:Nboot % bootstrap loop
    theta = exprnd(1,[n,1]);
    weights = theta ./ repmat(sum(theta),n,1);
    Bayes_resamples(:,boot) = datasample(A,n,'Replace',true,'Weights',weights);
end

% let's check CI
[~,~,tCI] = ttest(A); % from observed data, assume normality
tmp       = sort(mean(A(resamples))); % build your distribution of means
bootCI    = [tmp(low) tmp(high)]; % take 2.5% 97.5% of this distribution
tmp       = sort(mean(A(Bayes_resamples))); % distribution of Bayesian bootstrap means
ci        = 1:low; % candidate lower bounds
ciWidth   = tmp(ci+high) - tmp(ci); % width of all intervals covering the target mass
[~,index] = find(ciWidth == min(ciWidth),1); % densest interval
HDI       = [tmp(index) tmp(index+high)]; % highest density interval

% let's see how different this is
figure('units','normalized','outerposition',[0 0 1 1])
subplot(1,3,1); histogram(A);
title(['observed data: 95% CI =[' num2str(tCI(1)) ' ' num2str(tCI(2)) ']'])
subplot(1,3,2); all = A(resamples); histogram(all(:));
title(['bootstrapped data: 95% CI =[' num2str(bootCI(1)) ' ' num2str(bootCI(2)) ']'])
subplot(1,3,3); all = A(Bayes_resamples); histogram(all(:));
title(['Bayes bootstrapped data: 95% HDI =[' num2str(HDI(1)) ' ' num2str(HDI(2)) ']'])

Application to ERP

electrode = 60;

% set the proper time in msec
load('cond1-2_single_subjects_Mean'); % loading data from LIMO central tendency outputs
timevect    =;  
% compute Mean of two conditions
[est,HDI]   = limo_central_estimator(squeeze(,:,1,:)),'Mean',95/100);
[est2,HDI2] = limo_central_estimator(squeeze(,:,2,:)),'Mean',95/100);

% plot
figure('units','normalized','outerposition',[0 0 1 1]); subplot(2,1,1)
plot(timevect,est); hold on; plot(timevect,est2); grid on
fillhandle = patch([timevect fliplr(timevect)], [HDI(1,:),fliplr(HDI(2,:))], [0 0 1]);
set(fillhandle,'EdgeColor',[0 0 1],'FaceAlpha',0.2,'EdgeAlpha',0.8);%set edge color
fillhandle = patch([timevect fliplr(timevect)], [HDI2(1,:),fliplr(HDI2(2,:))], [1 0 0]);
set(fillhandle,'EdgeColor',[1 0 0],'FaceAlpha',0.2,'EdgeAlpha',0.8);%set edge color
title('Means and 95% HDI')

% same using trimmed means
[est,HDI]   = limo_central_estimator(squeeze(,:,1,:)),'Trimmed mean',95/100);
[est2,HDI2] = limo_central_estimator(squeeze(,:,2,:)),'Trimmed mean',95/100);

subplot(2,1,2); plot(timevect,est); hold on; plot(timevect,est2); grid on
fillhandle = patch([timevect fliplr(timevect)], [HDI(1,:),fliplr(HDI(2,:))], [0 0 1]);
set(fillhandle,'EdgeColor',[0 0 1],'FaceAlpha',0.2,'EdgeAlpha',0.8);%set edge color
fillhandle = patch([timevect fliplr(timevect)], [HDI2(1,:),fliplr(HDI2(2,:))], [1 0 0]);
set(fillhandle,'EdgeColor',[1 0 0],'FaceAlpha',0.2,'EdgeAlpha',0.8);%set edge color
title('20% Trimmed Means and 95% HDI')

The beauty of using the HDI is that 1 - it is a direct probability statement about the mean (not an interval that merely covers the population mean at rate 1-alpha over repeated experiments) and 2 - since this is Bayesian, you can accept the null (rather than only fail to reject it).

Estimating H0 using bootstrap

In the same way as we computed the distribution of mean values, we can do it for the null - for instance, a one-sample t-test testing whether the trimmed mean of the data is 0.

% let's get the trimmed mean again with 95% CI
data = squeeze(,:,1,:));
[t,tmdata,trimci,se,p,tcrit,df]=limo_trimci(data, 20/100, 5/100, 0);

% step 1: make null data (the trimmed mean = 0 --> remove TM from each subject)
null = data - repmat(tmdata,[1 1 size(data,3)]);

figure('units','normalized','outerposition',[0 0 1 1]);
subplot(1,2,1); plot(timevect,squeeze(data(electrode,:,:)));
grid on; hold on; plot(timevect,squeeze(tmdata(electrode,:)),'k','LineWidth',2);
title('observed data and trimmed mean');
subplot(1,2,2); plot(timevect,squeeze(null(electrode,:,:)));
grid on; hold on; [~,tmnull]=limo_trimci(null, 20/100, 5/100, 0);
plot(timevect,squeeze(tmnull(electrode,:)),'k','LineWidth',2);
title('null data and trimmed mean');

% step 2: bootstrap as usual
TM_null = NaN(Nboot,length(timevect)); % trimmed means under the null (percentile bootstrap)
T_null  = NaN(Nboot,length(timevect)); % t values under the null (percentile-t bootstrap)
boot_table = limo_create_boot_table(data,Nboot);
parfor b=1:Nboot
    [tnull,tmnull] = limo_trimci(null(:,:,boot_table{electrode}(:,b)), 20/100, 5/100, 0);
    TM_null(b,:)   = tmnull(electrode,:);
    T_null(b,:)    = tnull(electrode,:);
end

% step 3: get p-values
% p-values are the prob under the null to observe a value equal to or bigger
% than the one observed; we thus compute how many times the observed data are
% bigger than the null
pboot  = mean(squeeze(tmdata(electrode,:,:))>TM_null);
pboot  = min([pboot' 1-pboot'],[],2)';
ptboot = mean(squeeze(t(electrode,:,:))>T_null);
ptboot = min([ptboot' 1-ptboot'],[],2)';

% let's make a figure
figure('units','normalized','outerposition',[0 0 1 1]);
plot(timevect,squeeze(tmdata(electrode,:))); hold on; grid on
fillhandle = patch([timevect fliplr(timevect)], ...
    [squeeze(trimci(electrode,:,1)),fliplr(squeeze(trimci(electrode,:,2)))], [0 0 1]);
set(fillhandle,'EdgeColor',[0 0 1],'FaceAlpha',0.2,'EdgeAlpha',0.8); % set edge color
plot(timevect,(squeeze(p(electrode,:))<0.05)-11,'rx');  % sig points rise into view, others are clipped
plot(timevect,(pboot<0.05)-10.7,'go');                  % percentile bootstrap
plot(timevect,(ptboot<0.05)-10.4,'k*');                 % percentile-t bootstrap
axis([timevect(1) timevect(end) -10 6])
title('Trimmed mean with sig values based on t distrib (red) or percentile boot (green) or percentile t-boot (black)')

Multiple comparisons correction

The issue we are facing now is that we have done 501 t-tests (one per time point), so we can expect ~25 false positives; we just don't know which ones.

% let's check if that's true
h = ttest(randn(18,length(timevect)));
fprintf('found %g false positives out of %g expected \n',sum(h),5/100*length(timevect))
found 20 false positives out of 25.05 expected

In EEG we talk about family-wise error (FWE) correction because tests are correlated with each other. The FWER = 1 - (1 - alpha)^n, so for 501 tests we reach 100%. Since the FWER is the probability to make at least one error, let's compute the distribution of maximum t value -- if the biggest effect is controlled then smaller effects are controlled too. 

for simulation = 1:100
    h(simulation,:) = ttest(randn(18,length(timevect)));
end
E = mean(h); % on average, how often a false positive occurs per variable

figure('units','normalized','outerposition',[0 0 1 1]);
subplot(1,2,1); plot(E.*100,'LineWidth',3); grid on; box;
title(['average type 1 per variable: ' num2str(mean(E))])
subplot(1,2,2); plot(cumsum(E).*100,'LineWidth',3); grid on; box;
for n=1:length(timevect); FWER(n) = 1 - (1 - 0.05)^n; end
hold on;  plot(FWER.*100,'LineWidth',3); axis([0 100 1 110])
title('FWER cumulative vs predicted');

Let's see what happens to our ERP using maximum statistics.

Tmax_null = sort(max(T_null,[],2)); % this time we record maximum t-values
high      = round((1-alpha_value).*Nboot);
threshold = Tmax_null(high); % that's the value we need to go over for
                             % the largest effect anywhere to be significant
figure; histogram(Tmax_null); hold on; grid on
title(['Distribution of maxima 95th percentile = ' num2str(threshold)])

figure('units','normalized','outerposition',[0 0 1 1]);
plot(timevect,squeeze(tmdata(electrode,:))); hold on; grid on
fillhandle = patch([timevect fliplr(timevect)], ...
    [squeeze(trimci(electrode,:,1)),fliplr(squeeze(trimci(electrode,:,2)))], [0 0 1]);
set(fillhandle,'EdgeColor',[0 0 1],'FaceAlpha',0.2,'EdgeAlpha',0.8); % set edge color
plot(timevect,(ptboot<0.05)-10.7,'rx');                        % uncorrected percentile t-boot
plot(timevect,(squeeze(t(electrode,:))>threshold)-10.4,'k*');  % corrected with max statistics
axis([timevect(1) timevect(end) -10 6])
title('Trimmed mean with sig values based on percentile t-boot (red) with MCC (black)')

So far, we have considered each time point as independent and did not even consider space (all electrodes). Dedicated methods such as cluster-mass and tfce have been developed to do just that; see Pernet et al. (2015) for an overview of these more powerful approaches. In short, instead of looking at the maximum over all data points, electrodes and time points are clustered and we look at the maximum of cluster statistics under the null.
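The cluster-mass idea can be sketched in 1-D (a toy version of mine, not the LIMO implementation, with made-up t-values): threshold the t-values, sum t within each contiguous suprathreshold run, and compare those masses to the null distribution of the MAXIMUM cluster mass.

```matlab
% Toy 1-D cluster-mass computation over a time course
tvals  = [0.5 2.3 2.8 3.1 0.4 0.2 2.2 0.1]; % fake t-values over time
thresh = 2;                                 % cluster-forming threshold
above  = tvals > thresh;
masses = []; current = 0;
for i = 1:numel(above)                      % walk the time axis
    if above(i)
        current = current + tvals(i);       % accumulate mass within a cluster
    elseif current > 0
        masses(end+1) = current;            % close the cluster
        current = 0;
    end
end
if current > 0, masses(end+1) = current; end
% masses = [8.2 2.2]: two clusters; each observed mass would then be
% compared to the bootstrap distribution of the maximum cluster mass
```

Because only the maximum cluster mass is recorded under the null, the correction controls the family-wise error while exploiting the fact that real effects are extended in time (and space).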

Wednesday, 30 January 2019

Assessing anatomical brain parcellation


Anatomical brain parcellation is used in three scenarios: (1) to identify a brain region in which to measure some level of activity, for instance changes in BOLD or tracer concentration; (2) to determine the anatomical properties of brain regions, for instance volume or thickness; (3) to use parcels to compute connectivity values, for instance rsfMRI correlations or the average number of streamlines in DTI. Given these applications, making sure the right regions and borders are obtained is essential. Dr Shadia Mikhael (who graduated in September) looked into this under my supervision, and she did well :-)

Where are we? (as of 2019)

Part 1: protocols

In Mikhael et al. (2017) we reviewed in detail how anatomical parcels are made, that is, how borders are defined and on the basis of which atlases/templates. Surprisingly, the main contenders, FreeSurfer, BrainSuite and BrainVISA, all have some serious issues.

There is little detail on the reference populations used: sex ratio, ethnicity, age, education, handedness, etc. In general, software relies on small numbers of participants, sometimes averaging across ages and ethnicities. Perhaps related to this point, accounting for anatomical variability is not always explicit. In all software, the gyral parcels, or gyral volumes, are regions bound by the grey matter, or pial, layer on the exterior and the WM layer on the interior (plus sulcus-derived regions in BrainVISA). The issue, however, is that some regions are defined relative to each other. Practically, this means that after landmarks are defined and some regions found, you can find the medial border of region A defined by region B and then the lateral border of region B defined by region A - so where exactly is the border? This causes big problems when accounting for anatomical variability such as branching, interruptions and even double gyri.

sub-01 from
Despite a clear double cingulate, software packages can fail to capture it all, with a knock-on effect on neighbouring parcels (let's hope such variations are not more frequent in some patient populations, or all our analyses are wrong).

Part 2: software assessment

To assess whether those differences in protocol matter, we created: i) a protocol (for 4 regions) that identifies borders from landmarks only and explicitly accounts for variability; ii) a dataset of manually segmented regions using this protocol; iii) software (Mikhael and Gray, 2018) to compute volume, thickness and surface area from binary images; and iv) some basic comparisons against FreeSurfer, since some of its measures were validated against histology (Mikhael et al., 2019).

Some interesting insights came out of this:
- creating a protocol that leads to an unambiguous definition of a region is hard. There are many variations across individuals, and being able to define how to always find each border requires describing multiple cases.
- manual segmentation is tricky. Even with a detailed protocol, some borders can be inconsistent and inter-rater agreement is still necessary.
- there are many assumptions to make when deriving morphometric values, and the issue is that those assumptions are, for the most part, not documented in the software we use. Take for instance thickness. Should you measure the shortest distance from every voxel and average those, or take the orthogonal projection from one side of the gyrus to the other (and vice versa) and average those? We chose, like FreeSurfer, the second option. Then comes the problem of folds: when you have a curve and work with voxels, the orthogonal projection can shoot right between voxels on the other side ... that's ok, let's remove that projection. How many projections should we compute (step size), and which ones should we remove? This is just a simple example, but I think not knowing all those decisions in the software can create problems, especially if one wants to compare results from 2 different studies that used different software (even assuming regions match roughly).

Part 3: What to do with variability?

The final part of this work was to compare outputs from the different software packages, using our parcels as a reference frame. Since we know the packages have different borders and so on, we expect differences. This is like comparing the GDP of France in Euros with the GDP of the UK in Sterling - well, yes, it's different. But if we compute each package's metrics relative to our manually derived values, they are in the same units, just like comparing the GDP of France and the UK in US dollars.

In Mikhael and Pernet (2019) we did just that and found: i) as expected from the protocol analysis, anatomical variability is not well accounted for - even double cingulate gyri are missed! ii) volume and surface area are the most affected by this problem, since parcel sizes change (while thickness, even if measured from the wrong region, would need dramatic changes to create big differences on average); iii) multi-atlas registration is the most accurate.

Violin plots show ROI cortical volume in cm3 computed by Masks2Metrics (M2M), FreeSurfer (FS-DK, FS-DKT), BrainSuite (BS), and BrainGyrusMapping (BGM) (middle lines represent the medians, boxes the 95% Bayesian confidence intervals, and the density curves the random average shifted histograms). Line plots show the relative difference from each package (FS, BS, BGM) to the ground-truth estimates (M2M) for each subject (each line is a subject). Double CG occurrences were observed for subjects 1, 5, 6, and 8 in the left hemisphere, and subjects 6 and 10 in the right hemisphere. BrainSuite failed for subjects 4 and 6.


For the time being, if you have a good multi-atlas tool for parcellation, it is likely the most efficient option. The issue is that those tools, for now, will only give you regions from which to compute volumes. To address goals 1 and 3 listed in the rationale, this is my go-to option. If you want to derive morphometrics, then manual editing is required - yes, even for those 600 cases you have to process.


Monday, 16 July 2018

Encouraging diversity in science

Should we use quotas?

After reading this blog post on women's participation in computer science, I was left puzzled by the argument that equal opportunity is all we need. Fortunately, Twitter was there to the rescue, with colleagues pointing me in the right direction. I must mention the post by Lorena A. Barba, which is way more detailed than what I'm summarizing here. To get to the point, I simply try to put down the two approaches, with arguments and counter-arguments. I focus on differences between men and women, but I guess this applies equally to any underrepresented social group.

Equality vs Equity

The idea of 'equality' is that we can increase diversity using incentives and by removal of artificial barriers. This is the reasoning behind promoting equal opportunity for all.

The idea of 'equity' is that we can increase diversity by changing structures that lead to differences in outcomes. This is the reasoning behind promoting equal outcomes for all.

Equality arguments

There are many factors that influence outcomes, and focusing on one marginal effect cannot account for the whole observed effect. Concretely, it means that, for example, many factors explain why men and women have different salaries, and gender per se is just one of them. Yes, this is likely true, but if we were to quantify this (which I haven't researched), I'm pretty sure gender itself would account for a huge amount of the variance, so the multifactorial argument is not good enough.

Another argument is that men and women differ for biological and cultural reasons, and that many workplaces are biased toward men's values and interests (for historical and cultural reasons). An observed difference in outcome in our modern societies, say, e.g., the percentage of women in a given position, is therefore a reflection of women's free choice, since we offer equal opportunities in hiring.

Equity arguments

There is no such thing as free choice. Choice in adulthood is mostly driven by social expectations and cultural pressures, translated into parental guidance and education. This means that the choice made by women not to enter STEM is not a reflection of their 'natural tendencies' (even if these truly existed) but a reflection of an enforced stereotype that they are not good at it. Such stereotypes appear from age 3 to 5, and the bias against STEM has been observed from age 6. Of course, stereotypes are useful for developing and interacting socially, but negative traits in stereotypes are not useful - in fact, they have been shown to impede cognitive performance.

Equal opportunity assumes that, faced with two candidates of equal merit, each candidate has the same chance. While in theory this is nice, it discards all the evidence on implicit bias, which tells us that chances are not equal. It also assumes that we have objective ways to measure merit, which is simply not the case. This implies that to fix social imbalance we need quotas, because equal opportunity does not exist (by the way, quotas have also been linked to increased performance, and do not 'lower the bar' since you remove the worst-performing men for the best-performing women).


In theory, equal opportunity is what we want. In practice, because of stereotypes in our society that drive choices (to enter a career, or to select a candidate), and because of too-often poor measures of merit, equity (quotas) is what we need - until negative traits in stereotypes disappear.

Thursday, 21 June 2018

#OHBM2018 from Home

Learning from a conference at distance with Twitter

I'm a regular at OHBM, but I was in the USA and Canada in May, and I'm going to France in July, so this month of June I stayed home, missing my all-time favourite conference.


At least I have Twitter (@CyrilRPernet), and many of my friends and colleagues use it too, so I can benefit from them attending and telling the world what they found interesting.

Cool talks/posters and papers I had (more or less) time to check because I wasn't there

So what was hot on my Twitter feed?

rs-fMRI and connectomics

Thomas Yeo started posting, before the educational, about an improved parcellation scheme for rs-fMRI: Spatial Topography of Individual-Specific Cortical Networks Predicts Human Cognition, Personality, and Emotion, which comes with code on GitHub.

Michael Breakspear was self-promoting a paper from the dynamic functional connectivity workshop: using the structural connectome to simulate dynamic electrophysiology - super cool gif; see the preprint: Metastable brain waves.

Daniel Margulies' keynote on rs-fMRI gradients got a lot of attention too; here are two papers of interest: Situating the default-mode network along a principal gradient of macroscale cortical organization, and Large-Scale Gradients in Human Cortical Organization.

Martijn van den Heuvel's keynote on connectomics also inspired many - I found his latest paper really inspiring: Multiscale examination of cytoarchitectonic similarity and human brain connectivity.

Tools, tools and tools

J. Lopez received the NeuroImage paper of the year award for 'Reconstructing anatomy from electro-physiological data'.

Andrew Doyle (sorry Pamela*) released a full tutorial on deep learning for the educational that is simply amazing.

Got to mention the poster from Matteo Visconti on getting your data into BIDS straight off the scanner.

Elizabeth DuPre got some interest in the hack room with multi-echo denoising and her tedana tool - thanks to Ross Markello for pointing to 'the' right paper: Separating slow BOLD from non-BOLD baseline drifts using multi-echo fMRI.

Out of this world

I've been checking up on the paper behind the Talairach lecture from K Friston: The Markov blankets of life: autonomy, active inference and the free energy principle

Special mention

Alex Bowring, Camille Maumet, Thomas Nichols paper on 'Exploring the Impact of Analysis Software on Task fMRI Results' got plenty of hits

* my mistake - I thought it was Pamela Douglas (speaker one) who made this one ... updated on the 23rd

Sunday, 3 June 2018

Field maps anyone?

Physics and signal processing of field maps for dummies

In this post, I try to make sense of why we acquire field maps, what they are, and how we use them (and yes, I assume you know a little bit about how an MRI scanner works).

EPI images

Standard fMRI relies on Echo Planar Imaging (EPI), you know, this single pulse that gives you an entire slice of brain. Now, you also know MRI relies on gradients: while B0 is the magnetic field that orients the protons' spins, the observed resonance frequency also depends on additional small gradients. So typically, we use a pulse to excite protons in a slice and apply gradients that change linearly in space: Gy to create a linear change in phase from anterior to posterior as protons relax, and Gx to create a linear change in frequency from left to right; then the magic of (roughly) an inverse 2D Fourier transform takes place and we have a picture of that slice.
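As a toy illustration of that last step (pure NumPy, nothing like a real reconstruction pipeline - scanners fill k-space line by line with all sorts of trajectories), image formation boils down to an inverse 2D Fourier transform of the acquired k-space matrix:

```python
import numpy as np

# A crude 'slice' of brain: a rectangular blob of signal.
slice_true = np.zeros((64, 64))
slice_true[20:44, 24:40] = 1.0

# What the coils effectively measure is (roughly) the 2D Fourier
# transform of the slice: one k-space line per phase-encoding step.
kspace = np.fft.fft2(slice_true)

# Reconstruction: inverse 2D Fourier transform back to image space.
slice_rec = np.fft.ifft2(kspace).real

print(np.allclose(slice_rec, slice_true))  # → True
```

In the scanner, any extra, unwanted field offset adds phase to those k-space samples, which is exactly why inhomogeneities end up as shifted (distorted) pixels after the transform.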

We also know that B0 is not homogeneous and that the subject's magnetic susceptibility varies spatially, that is, the signal is not homogeneous across the brain/head. As the phase-encoding gradient is several orders of magnitude weaker than the frequency-encoding gradient, the inhomogeneous magnetic field manifests itself mainly as a geometric distortion of the 2D slice image along the anterior-posterior direction (of course, on your scanner you can ask to swap the frequency/phase encoding to get distortion in the left/right direction instead, but is this wise?).


If all slices were affected by the same distortion, there would be no problem, but i) the amount of deformation differs from slice to slice (not the same magnetic susceptibility); ii) it differs from acquisition to acquisition (how stable is the scanner over time?); iii) it differs because people move.

Motion causes i) susceptibility distortion, that is, field inhomogeneity causes different distortions of the images, and ii) susceptibility dropout, that is, field inhomogeneity causes signal loss due to through-plane dephasing.

The solution

Acquire an image of the deformation (along the phase-encoding direction) to estimate a nonlinear, unidirectional 2D (un)warping function (leaving the mathematical solutions proposed to the experts). The image of the deformation is typically a difference between images acquired with two different echo times or phase-encoding directions.

Depending on your scanner, you can get (i) a phase-difference image along with the magnitude image, (ii) the two phase images and the corresponding two magnitude images, (iii) a field map showing the inhomogeneity, or (iv) acquisitions with multiple phase-encoding directions. So it is pretty important to know what you get from your scanner before starting to process (undistort) the data. You can however already see here what we want: the field map (option iii), which we can derive from the phase and magnitude images (option i), itself computed as a difference (options ii and iv).
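As a minimal sketch of going from option (i) to option (iii) - assuming you already have an unwrapped phase-difference image in radians and know the two echo times (the function name and TE values below are my own illustrative choices, not from any particular package): the phase accumulated between the two echoes is 2π·ΔB0·ΔTE, so dividing by 2π·ΔTE gives the field offset in Hz.

```python
import numpy as np

def fieldmap_hz(phase_diff_rad, te1_s, te2_s):
    """Convert an (unwrapped) phase-difference image to a field map in Hz.

    Phase accumulated between the two echoes is
    dphi = 2*pi * dB0_Hz * (TE2 - TE1), hence dB0 = dphi / (2*pi*dTE).
    """
    return phase_diff_rad / (2 * np.pi * (te2_s - te1_s))

# e.g. with a classic dual-echo acquisition, TE1 = 4.92 ms, TE2 = 7.38 ms
phase_diff = np.array([[np.pi, -np.pi / 2]])   # fake 1x2 'image', radians
print(fieldmap_hz(phase_diff, 4.92e-3, 7.38e-3))  # ≈ [[203.3, -101.6]] Hz
```

The distortion itself, measured in voxels along the phase-encoding axis, is then roughly the field offset in Hz multiplied by the total readout time - which is why the timing parameters discussed below matter so much.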

Demonstrated effects on normalization and statistics

The effect of using a field map is undistortion of the volumes, which results in better voxel/time-series alignment. Better voxel alignment means more accurate data and thus 'better' statistics. Undistorted volumes also mean better normalization to T1 space and, across subjects, better normalization means better statistics.

Better realignment

Andersson et al. (2001) showed that unwarping leads to smaller residual variance in your time series, which consequently should improve the localization of effects (see e.g. Chambers et al. 2015).

Better normalization

Calhoun et al. (2017) showed that affine transformation of the EPI data to an EPI template, followed by nonlinear registration to that template, reduces variability across subjects compared to the usual route: affine transformation of the EPI data to the subject's anatomical image, followed by nonlinear registration of the anatomical data to an anatomical template, with the resulting transformation then applied to the EPI data. This was, however, no longer the case once distortion correction was used (see figure 10 in the paper).

Better detection of resting state networks

Togo et al.'s (2017) results suggest that resting-state networks (in particular the DMN) show more robust functional connectivity with correction, and that the basal ganglia RSN showed some decreases in functional connectivity primarily in white matter, indicating imperfect registration/normalization without correction.

The work 

Depending on the sequence used you need to know:

- effective echo spacing = 1/(BandwidthPerPixelPhaseEncode * image dimension in phase direction)

- total readout time 
--> for FSL, this is the time from the centre of the first echo to the centre of the last echo = (number of echoes - 1) * echo spacing for plain standard EPI, so for a dual echo that will be the echo spacing. Most of the time, we use parallel imaging and other fancy acquisition tricks, so it's better to use = (image dimension in phase direction - 1) * EffectiveEchoSpacing;
--> for SPM, this is from the beginning of the first echo to the end of the last echo = 1/(BandwidthPerPixelPhaseEncode)

Read from the json file: like all good neuroimagers, we use BIDS, and thus images come with json files. If you use dcm2niix, the json for the EPI already contains all that you want: for SPM, 1/BandwidthPerPixelPhaseEncode, and for FSL, TotalReadoutTime.
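The formulas above can be sketched as a small helper reading a BIDS sidecar - a sketch only, assuming the sidecar contains BandwidthPerPixelPhaseEncode (dcm2niix usually also writes EffectiveEchoSpacing and TotalReadoutTime directly, in which case just read those fields instead):

```python
import json

def readout_parameters(json_path, n_pe):
    """Derive distortion-correction timing parameters from a BIDS sidecar.

    n_pe: image dimension along the phase-encoding direction (e.g. 64).
    Returns (effective echo spacing, FSL total readout, SPM total readout),
    all in seconds.
    """
    with open(json_path) as f:
        meta = json.load(f)

    bw = meta["BandwidthPerPixelPhaseEncode"]          # Hz per pixel
    eff_echo_spacing = 1.0 / (bw * n_pe)               # seconds
    total_readout_fsl = (n_pe - 1) * eff_echo_spacing  # FSL convention
    total_readout_spm = 1.0 / bw                       # SPM convention
    return eff_echo_spacing, total_readout_fsl, total_readout_spm
```

For example, with BandwidthPerPixelPhaseEncode = 20 Hz/pixel and a 64-pixel phase-encoding dimension, this gives an effective echo spacing of 1/(20*64) ≈ 0.78 ms, an FSL total readout time of 63 times that, and an SPM value of 1/20 = 50 ms.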

Now refer the manuals to work out what needs to be done ... 

[updated for json info]
