Linear Regression Example

You can perform linear regression analysis using Igor's curve fitting operations. The StatsLinearRegression operation provides additional statistical information and tests for single and multiple regressions.

1. Simple linear regression.

Start by creating a wave with a known slope and additive Gaussian noise.

Make/O/N=100 data1=x+gnoise(4)

The simple linear regression analysis is obtained by:

StatsLinearRegression /T=1/Q  data1

The results appear in the Linear Regression table (shown transposed):

N       100
a       -0.912472
b       1.01654
xBar    49.5
yBar    49.4064
sumx2   83325
sumy2   87529
sumxy   84703.4
Syx     3.81255
F       5923.74
Fc      3.93811
r2      0.983726
Sb      0.0132077
t
tc      1.98447
L1      0.990332
L2      1.04275

When no slope is specified, the null hypothesis is that the slope is zero; the t-statistic is not relevant and is left blank. Here F=5923.74 is far greater than the critical value Fc=3.93811, so the regression results clearly reject this hypothesis. The regression line is given by:

y=-0.912472+1.01654*x.
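For readers who want to check how the table entries relate to one another outside Igor, the following is a minimal Python sketch. It assumes the standard least-squares definitions (sums of squares taken about the means, Syx computed with N-2 degrees of freedom); the function name simple_linear_regression and its arguments are hypothetical and are not part of Igor. Because the noise is random, the numbers will not match the table above.

```python
import math
import random


def simple_linear_regression(y, x0=0.0, dx=1.0):
    """Least-squares line for y sampled at x = x0 + i*dx (like wave scaling)."""
    n = len(y)
    x = [x0 + i * dx for i in range(n)]
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of squares and cross products about the means.
    sumx2 = sum((xi - x_bar) ** 2 for xi in x)
    sumy2 = sum((yi - y_bar) ** 2 for yi in y)
    sumxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sumxy / sumx2                 # slope
    a = y_bar - b * x_bar             # intercept
    reg_ss = b * sumxy                # regression sum of squares
    res_ms = (sumy2 - reg_ss) / (n - 2)   # residual mean square
    syx = math.sqrt(res_ms)           # standard error of estimate
    return {
        "N": n, "a": a, "b": b, "xBar": x_bar, "yBar": y_bar,
        "sumx2": sumx2, "sumy2": sumy2, "sumxy": sumxy,
        "Syx": syx, "F": reg_ss / res_ms, "r2": reg_ss / sumy2,
        "Sb": syx / math.sqrt(sumx2),
    }


# Rough analogue of: Make/O/N=100 data1=x+gnoise(4)
random.seed(1)
data = [i + random.gauss(0.0, 4.0) for i in range(100)]
stats = simple_linear_regression(data)
for name, value in stats.items():
    print(name, value)
```

With 100 points at x=0,1,...,99 the sketch gives xBar=49.5 and sumx2=83325, the same values that appear in the table because they depend only on the X values.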

To plot the regression and the data curves execute the commands:

Duplicate/O data1,regression1
regression1=-0.912472+1.01654*x
Display/K=1 data1,regression1
ModifyGraph lsize(regression1)=2,rgb(regression1)=(0,0,0)

[Graph of data1 and regression1]

Here the red trace corresponds to our input data and the black trace is the regression line.

You can create confidence interval waves and prediction interval waves using

StatsLinearRegression /T=1/Q/BCIW/BPIW  data1

You can now add the two bands to the graph:

AppendToGraph data1_CH,data1_CL,data1_PH,data1_PL
ModifyGraph rgb(data1_CH)=(0,65535,0),rgb(data1_CL)=(0,65535,0)
ModifyGraph rgb(data1_PH)=(0,0,65535),rgb(data1_PL)=(0,0,65535)

[Graph of data1 and regression1 with the confidence and prediction bands]

Here the green traces correspond to the confidence interval while the blue traces correspond to the prediction interval (default single prediction).
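The difference between the two bands can be illustrated with a short Python sketch. It assumes the usual textbook half-widths, tc*Syx*sqrt(1/N+(x-xBar)^2/sumx2) for the confidence interval and the same expression with an extra 1 under the square root for a single prediction; whether Igor computes its band waves in exactly this way is an assumption here. The function name interval_half_widths is hypothetical, and the constants are copied from the Linear Regression table for data1.

```python
import math


def interval_half_widths(x, n, x_bar, sumx2, syx, tc):
    """Return (confidence, single-prediction) half-widths at position x."""
    leverage = 1.0 / n + (x - x_bar) ** 2 / sumx2
    ci = tc * syx * math.sqrt(leverage)          # band for the fitted mean
    pi = tc * syx * math.sqrt(1.0 + leverage)    # band for one new observation
    return ci, pi


# Values copied from the Linear Regression table for data1.
N, X_BAR, SUMX2, SYX, TC = 100, 49.5, 83325.0, 3.81255, 1.98447
A, B = -0.912472, 1.01654

for x in (0.0, 49.5, 99.0):
    ci, pi = interval_half_widths(x, N, X_BAR, SUMX2, SYX, TC)
    fit = A + B * x
    print(f"x={x:5.1f}  fit={fit:8.3f}  CI=+/-{ci:.3f}  PI=+/-{pi:.3f}")
```

Under these assumptions the confidence band is narrowest at xBar and widens toward the ends of the data, while the prediction band is always wider than the confidence band.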

2. Zero slope hypothesis

To see the results of the zero-slope hypothesis test on a different data set, generate data that have zero slope and random noise:

Make/O/N=100 data2=10+gnoise(4)

[Graph of data2]

A graph of data2 is shown above. To run the test, execute the command:

StatsLinearRegression /T=1/Q  data2

N       100
a       9.44344
b       0.00449769
xBar    49.5
yBar    9.66608
sumx2   83325
sumy2   1505.25
sumxy   374.77
Syx     3.91695
F       0.109865
Fc      3.93811
r2      0.00111981
Sb      0.0135694
t
tc      1.98447
L1      -0.0224303
L2      0.0314257

As expected, F<Fc, so the zero-slope hypothesis cannot be rejected.
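The decision can be reproduced from the tabulated values with a few lines of Python. This sketch assumes that F is the regression sum of squares (b*sumxy) divided by the residual mean square (Syx^2), which agrees with the table to the printed precision; the variable names are illustrative only.

```python
# Values copied from the Linear Regression table for data2.
b = 0.00449769
sumxy = 374.77
syx = 3.91695
fc = 3.93811

# F = regression sum of squares / residual mean square.
f_stat = (b * sumxy) / syx ** 2
reject_zero_slope = f_stat > fc

print(f"F = {f_stat:.4f}, Fc = {fc}")
print("zero-slope hypothesis rejected:", reject_zero_slope)
```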

3. Specific slope hypothesis

Use the /B flag to test the hypothesis that the slope b equals a specific value beta0. Using the wave data1, which was generated with a unit slope, we have:

StatsLinearRegression /T=1/Q/B=1.0  data1

N       100
a       -0.912472
b       1.01654
xBar    49.5
yBar    49.4064
sumx2   83325
sumy2   87529
sumxy   84703.4
Syx     3.81255
F       5923.74
Fc      3.93811
r2      0.983726
Sb      0.0132077
t       1.25247
tc      1.98447
L1      0.990332
L2      1.04275

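Here t=1.25247 is less than tc=1.98447, so the hypothesis of a unit slope is accepted. Because L1 and L2 are the confidence limits of the slope (b-tc*Sb and b+tc*Sb), any value of beta0 outside the range [L1,L2] is rejected.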
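The relation between the t-statistic and the limits L1 and L2 can be checked with a short Python sketch. It assumes t=|b-beta0|/Sb and limits of b-tc*Sb and b+tc*Sb, which reproduce the tabulated values up to rounding of the printed inputs; the function names slope_t_test and slope_limits are hypothetical.

```python
def slope_t_test(b, sb, beta0, tc):
    """Return (t, rejected) for the hypothesis slope == beta0."""
    t = abs(b - beta0) / sb
    return t, t > tc


def slope_limits(b, sb, tc):
    """Confidence limits of the slope: (L1, L2)."""
    return b - tc * sb, b + tc * sb


# Values copied from the Linear Regression table for data1.
B, SB, TC = 1.01654, 0.0132077, 1.98447

l1, l2 = slope_limits(B, SB, TC)
print(f"L1 = {l1:.5f}, L2 = {l2:.5f}")

for beta0 in (1.0, 0.95):
    t, rejected = slope_t_test(B, SB, beta0, TC)
    inside = l1 <= beta0 <= l2
    print(f"beta0={beta0}: t={t:.4f}, rejected={rejected}, inside [L1,L2]={inside}")
```

A hypothesized slope is rejected exactly when it falls outside [L1,L2], because t>tc is the same condition as |b-beta0|>tc*Sb.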

4. Linear regression for more than one wave.

Make/O/N=100 data3=4+x+gnoise(4)
Make/O/N=100 data4=5+x+gnoise(5)

You can run the linear regression test on multiple samples using the command:

StatsLinearRegression /T=1/Q data1,data3,data4

The results are displayed in the Linear Regression table and in the Linear Regression MC table.

Linear Regression table:

        data1       data3       data4
N       100         100         100
a       -0.912472   4.10709     4.37794
b       1.01654     0.993232    0.993211
xBar    49.5        49.5        49.5
yBar    49.4064     53.272      53.5419
sumx2   83325       83325       83325
sumy2   87529       83725.1     85298.4
sumxy   84703.4     82761       82759.3
Syx     3.81255     3.94368     5.62508
F       5923.74     5285.34     2597.77
Fc      3.93811     3.93811     3.93811
r2      0.983726    0.981796    0.963647
Sb      0.0132077   0.013662    0.0194868
t
tc      1.98447     1.98447     1.98447
L1      0.990332    0.96612     0.95454
L2      1.04275     1.02034     1.03188

Linear Regression MC table:

Ac                          249975
Bc                          250224
Cc                          256552
SSp                         6049.51
SSc                         6079.72
SSt                         7150.35
DFp                         294
DFc                         296
DFt                         298
Slopes F                    0.734116
Slopes Fc                   3.02647
CoincidentalRegression F    13.375
CoincidentalRegression Fc   2.40235
Elevations F                26.0627
Elevations Fc               3.02626

In this case the slopes test gives F<Fc, so the hypothesis of equal slopes is accepted. The elevations test gives F>Fc, so the hypothesis of equal elevations is rejected, as is the possibility of coincidental regression.
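The three F statistics can be recomputed from the sums of squares in the Linear Regression MC table. The Python sketch below assumes the textbook formulas for comparing k regression lines (pooled, common and total residual sums of squares); they reproduce the tabulated values to within rounding of the printed inputs, but the function name comparison_f_stats and the exact formulas are assumptions rather than a statement of Igor's implementation.

```python
def comparison_f_stats(ss_p, ss_c, ss_t, df_p, df_c, k):
    """F statistics for equal slopes, equal elevations, coincident lines."""
    pooled_ms = ss_p / df_p
    slopes = ((ss_c - ss_p) / (k - 1)) / pooled_ms
    elevations = ((ss_t - ss_c) / (k - 1)) / (ss_c / df_c)
    coincident = ((ss_t - ss_p) / (2 * (k - 1))) / pooled_ms
    return slopes, elevations, coincident


# Values copied from the Linear Regression MC table (data1, data3, data4).
f_slopes, f_elev, f_coinc = comparison_f_stats(
    ss_p=6049.51, ss_c=6079.72, ss_t=7150.35, df_p=294, df_c=296, k=3)

# Critical values copied from the same table.
for name, f, fc in (("Slopes", f_slopes, 3.02647),
                    ("Elevations", f_elev, 3.02626),
                    ("CoincidentalRegression", f_coinc, 2.40235)):
    verdict = "rejected" if f > fc else "accepted"
    print(f"{name}: F={f:.4f}, Fc={fc}, hypothesis {verdict}")
```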

You can also test waves of unequal lengths as in the following example:

Make/O/N=150 data5=x+gnoise(4)
Make/O/N=100 data6=x+gnoise(5)
StatsLinearRegression /T=1/Q data1,data5,data6

Ac                          447888
Bc                          449403
Cc                          457496
SSp                         6549.12
SSc                         6571.72
SSt                         6614.3
DFp                         344
DFc                         346
DFt                         348
Slopes F                    0.593541
Slopes Fc                   3.02197
CoincidentalRegression F    0.855888
CoincidentalRegression Fc   2.3979
Elevations F                1.12087
Elevations Fc               3.02182
In this case both the slopes and the elevations are determined to be the same. The single test for coincidental regression would also indicate that the three waves have coincident regressions.

5. Dunnett's multi-comparison test for elevations.

You can perform Dunnett's multi-comparison test on the elevations of several waves. In this example we define the first input wave as the control sample (/DET=0):

StatsLinearRegression /T=1/Q/DET=0 data1,data3,data4,data5,data6

The operation computes the linear regression and the general multi-comparison as described above. In addition it displays Dunnett's MC Elevations table for tests of each input wave against the control wave:

Pair     SE         q          qp        Conclusion
1_vs_0   0.641955   6.02176    2.16539   0
2_vs_0   0.641955   6.44208    2.16539   0
3_vs_0   0.615424   0.127039   2.16539   1
4_vs_0   0.641955   1.23037    2.16539   1

The table indicates that the equal elevation hypothesis is rejected for the pairs data3-data1 and data4-data1. The hypothesis is accepted for the other combinations.
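To make the mapping between pair labels and wave names explicit, here is a small Python sketch. It assumes that the indices in a label such as 1_vs_0 follow the order of the waves on the command line, and that a conclusion of 1 means q is below the critical value qp (hypothesis accepted); both assumptions are consistent with the table and the statement above. The variable names are illustrative.

```python
# Input waves in command-line order; index 0 is the control (/DET=0).
waves = ["data1", "data3", "data4", "data5", "data6"]

# (index, control index, q, qp) copied from Dunnett's MC Elevations table.
rows = [
    (1, 0, 6.02176, 2.16539),
    (2, 0, 6.44208, 2.16539),
    (3, 0, 0.127039, 2.16539),
    (4, 0, 1.23037, 2.16539),
]

conclusions = []
for i, ctrl, q, qp in rows:
    accepted = 1 if q < qp else 0     # 1: equal elevations accepted
    conclusions.append(accepted)
    verdict = "accepted" if accepted else "rejected"
    print(f"{waves[i]} vs {waves[ctrl]}: q={q}, qp={qp}, equal elevations {verdict}")
```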

6. The Tukey test for multiple regressions

The test depends on the F tests for the slopes. If the slopes are determined to be the same then the Tukey test compares elevations. Otherwise, the Tukey test performs a multi-comparison of slopes.

StatsLinearRegression /T=1/Q/TUK data1,data3,data4,data5,data6

Since the slopes were determined to be the same, the Tukey test was performed on the elevations and the results are displayed in the Tukey MC Elevations table:

Pair     SE        q          qc        Conclusion
4_vs_0   0.45393   1.7400     3.87073   1
4_vs_1   0.45393   6.77604    3.87073   0
4_vs_2   0.45393   7.37047    3.87073   0
4_vs_3   0.43517   1.63536    3.87073   1
3_vs_0   0.43517   0.17966    3.87073   1
3_vs_1   0.43517   8.7035     3.87073   0
3_vs_2   0.43517   9.32356    3.87073   0
2_vs_0   0.45393   9.11048    3.87073   0
2_vs_1   0.45393   0.594429   3.87073   1
1_vs_0   0.45393   8.51605    3.87073   0

As expected, input waves 1 and 2 (data3 and data4) match each other but do not match the elevations of the other waves.

7. Regression analysis with multiple Y values for each X value

When the data have multiple Y values for each X value, store the input as a 2D wave where each column represents one set of Y values. In this example the wave dataMYV consists of 30 rows and 6 columns, so there are at most 6 Y values for each X. The wave contains 6 NaNs, which represent missing values or padding used when the number of Y values is not the same for all X values. The X values are implied by the row scaling of the wave, with start=1 and delta=3.

StatsLinearRegression /T=1/Q/MYVW={*,dataMYV}

The results are displayed in the "Linear Regression" table (shown transposed):

N                 174
a                 -0.564705
b                 2.01551
xBar              44.2586
yBar              88.639
sumx2             117111
sumy2             485408
sumxy             236039
amongGroupsSS     477052
amongGroupsDF     29
withinGroupsSS    8355.47
withinGroupsDF    144
devLinearitySS    1312.81
devLinearityDF    28
F                 0.808045
Fc                1.55463
F2                8463.47
F2c               3.89609
r2                0.980082
Syx               7.4974

The F value is smaller than the critical Fc which implies that the regression of the data is indeed linear. The value of F2 is much greater than the critical value F2c which implies that the hypothesis of slope b=0 has to be rejected.
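Both statistics can be recomputed from the other table entries. The Python sketch below assumes that F is the ratio of the deviations-from-linearity mean square to the within-groups mean square, and that F2 is the regression sum of squares (b*sumxy) divided by the residual mean square with N-2 degrees of freedom. These assumptions reproduce the tabulated values up to rounding of the printed inputs; the function names linearity_f and slope_f are hypothetical.

```python
def linearity_f(dev_ss, dev_df, within_ss, within_df):
    """Lack-of-fit F: deviations-from-linearity MS over within-groups MS."""
    return (dev_ss / dev_df) / (within_ss / within_df)


def slope_f(b, sumxy, sumy2, n):
    """F for the zero-slope hypothesis: regression SS over residual MS."""
    reg_ss = b * sumxy
    res_ms = (sumy2 - reg_ss) / (n - 2)
    return reg_ss / res_ms


# Values copied from the Linear Regression table for dataMYV.
f_lin = linearity_f(dev_ss=1312.81, dev_df=28, within_ss=8355.47, within_df=144)
f_slope = slope_f(b=2.01551, sumxy=236039.0, sumy2=485408.0, n=174)

print(f"F  = {f_lin:.4f}  (Fc  = 1.55463)")
print(f"F2 = {f_slope:.1f}  (F2c = 3.89609)")
```

The recomputed F2 differs slightly from the table because the inputs are printed with only six significant digits and the residual sum of squares is a small difference of two large numbers.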