PCA: Plot of the scores and the loadings
Nasser
Is it possible to get such a plot directly with the PCA operation? http://www.youtube.com/watch?v=26YhtSJi1qc&feature=related#t=2m56s
I ask because I tried the PCA experiment from the tutorial, and I got a different kind of plot.
Thank you in advance.
P.S.: I'm a beginner in multivariate analysis.
The PCA operation performs the analysis; it does not produce any graphs. The PCA Demo experiment takes you through the steps of creating data from a known number of components, mixing it with noise and then performing the PCA in an attempt to recover the original principal components. After you obtain the principal components, you can compute the "loading" as a projection on the selected basis. In this context, a projection is a dot product of any input vector with any principal axis. Projections are NOT computed in the demo and are NOT part of the PCA operation.
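If you do decide to compute the projections yourself, they amount to a single matrix multiplication. Here is a minimal sketch, not part of the PCA operation itself; the wave names dataMatrix and M_Projections are hypothetical, and it assumes one observation per row of the data and one principal axis per column of the eigenvector output (M_C, as discussed later in this thread):

Function ComputeProjections()
	Wave dataMatrix		// raw data, one observation per row (assumed layout)
	Wave M_C			// eigenvectors from the PCA operation, one axis per column
	// Each element of M_Projections is the dot product of one observation
	// (a row of dataMatrix) with one principal axis (a column of M_C).
	MatrixOp/O M_Projections = dataMatrix x M_C
End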
If you plan to use PCA, I recommend the first few chapters of the Malinowski book (see the reference in the help). It is a great book which explains all the steps in the analysis in detail. It might also be helpful (though perhaps not necessary) to have a working knowledge of vector spaces and linear algebra (especially SVD).
A.G.
WaveMetrics, Inc.
October 21, 2011 at 09:55 am - Permalink
Thank you for the reply. I will have a look at the theory, but actually I'd like to view trends as soon as possible. And I don't think it would be a good idea to develop a procedure that calculates projections of scores and loadings if I don't know anything about the theory.
I thought that PCA was the "easiest" way to view correlations between many variables. Am I right?
Thank you
October 25, 2011 at 02:21 am - Permalink
I agree. For the same reason, I would not expect one to understand the built-in operation or its results without knowing the theory.
I don't think that PCA should be used as a tool to observe correlations between many variables. I'd recommend that you start with a Scatter Plot Matrix representation instead. See the Scatter Plot Matrix Demo experiment for more information.
A.G.
WaveMetrics, Inc.
October 25, 2011 at 09:56 am - Permalink
In order to finally understand the analysis clearly, I started with simple data with only 2 or 3 variables. PCA, as run by Igor's PCA command, produces the eigenvectors and eigenvalues (the principal components and their variances), but the final step, computing the loadings as you call it, is the part that makes the process make sense. That's how it was for me, at least. The eigenvector matrix (I think it's M_C from PCA in Igor) is essentially a rotation matrix: a matrix that, when multiplied against another matrix, rotates that matrix in n-space.

Picture a drawing of a square. Now consider that the square is a 2D dataset where the points in the dataset are the 4 corners of the square. You can create a 2D rotation matrix that, when multiplied against this data matrix, rotates that square by some angle. A rotation matrix is essentially what PCA calculates for you, with the condition that the axes are orthogonal and each contains as much variance as possible, in ranked order (the first has the most, the second has the next most, etc.).
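As a minimal sketch of that square example (the wave and function names here are made up for illustration):

Function RotateSquareDemo()
	Variable angle = Pi/6						// rotate by 30 degrees
	// Four corners of the unit square: 4 points (rows) x 2 variables (columns)
	Make/O corners = {{0,1,1,0},{0,0,1,1}}
	// 2D rotation matrix, transposed so it acts on row-vector points
	Make/O/N=(2,2) rotM
	rotM[0][0] = cos(angle);	rotM[0][1] = sin(angle)
	rotM[1][0] = -sin(angle);	rotM[1][1] = cos(angle)
	// Multiplying the data matrix by the rotation matrix rotates the square
	MatrixOp/O rotated = corners x rotM
End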
Multiplying the raw data matrix by the M_C matrix created by Igor's PCA command actually performs the "rotation" and "dimension reduction" of the original data, reorienting it in multidimensional space according to the principles of PCA. What you get back from this calculation is a matrix with the same number of samples as your raw data but a reduced number of variables (assuming you use some arbitrary selection of the number of significant components, and assuming your data CAN be reduced). Finally, in order to make sense of the rotated data, you can compute the cross-correlation of the rotated data with the raw data, which reveals, through correlation coefficients, how the original variables relate to the PCA variables and, in fact, how they relate to each other. This is how PCA shows you the relationships between variables in a multidimensional dataset and reduces the dimension of the data. So there are in fact two additional steps, past the output of Igor's PCA command, that you need to run before you get the complete picture.
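Here is a minimal sketch of those two steps. It assumes the raw data sit in a wave called rawData with one sample per row (that name and the others, apart from M_C, are hypothetical), that Igor's PCA command has already produced M_C, and that nSig is your arbitrary choice of significant components:

Function ReduceAndCorrelate(nSig)
	Variable nSig				// number of significant components to keep
	Wave rawData				// raw data matrix, one sample per row
	Wave M_C					// eigenvector matrix from Igor's PCA command
	Variable nPts = DimSize(rawData, 0)
	Variable nVars = DimSize(rawData, 1)
	Variable i, j, mA, mB

	// Step 1: keep the first nSig eigenvectors and multiply. This rotates
	// the data and reduces it to nSig variables per sample (the scores).
	Duplicate/O/R=[][0, nSig-1] M_C, reducedBasis
	MatrixOp/O M_Scores = rawData x reducedBasis

	// Step 2: correlate each original variable (column of rawData) with
	// each PCA variable (column of M_Scores). M_Corr can be shown as a heatmap.
	Make/O/N=(nVars, nSig) M_Corr
	Make/FREE/N=(nPts) colA, colB, tmp
	for(i = 0; i < nVars; i += 1)
		colA = rawData[p][i]
		mA = mean(colA)
		for(j = 0; j < nSig; j += 1)
			colB = M_Scores[p][j]
			mB = mean(colB)
			tmp = (colA[p] - mA) * (colB[p] - mB)
			// Pearson correlation coefficient of the two columns
			M_Corr[i][j] = (sum(tmp)/(nPts-1)) / sqrt(Variance(colA)*Variance(colB))
		endfor
	endfor
End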
PCA can be quite powerful, but its interpretation is far from simple. I've got a pretty good grasp of it now, and I understand why I'd want to use it. However, the researchers I work with still glaze over as I walk them through the results of a PCA. To follow the results you have to hold visual models in your mind while simultaneously looking at 2 or 3 different graphs/heatmaps. It's certainly not straightforward... and not for the faint of heart!
November 3, 2011 at 10:47 pm - Permalink
I've just read your post. Thank you for your answer; however, I haven't really understood this:
"To follow the results you have to hold visual models in your mind while simultaneously looking at 2 or 3 different graphs/heatmaps".
You were talking about two additional steps; have you written such a package for Igor in order to get the final biplot?
Thank you in advance.
Best regards
December 21, 2011 at 03:26 am - Permalink