K-means clustering on a data matrix

I have a data matrix with 551 rows and 35 columns. Rows are time points and columns are elements. I would like to cluster the variables that show similar concentrations, e.g. to group the elements whose concentrations are high on specific days. Please help!

Clustering_test.pxp (206.48 KB)

Can you be more specific about what your actual goal is? Are you interested in days where the elemental concentrations are generally high, or days where they are similar (these are not necessarily the same thing)? I'm not sure you really want clustering.

If you just wanted to know the days with highest concentrations you could simply look at the sum of all elements:

MatrixOP/O W_SumRows = sumrows(pm10)
display W_Sumrows vs ts
ShowInfo

Then drag the cursor onto the spikes to identify the date/time. One problem is that the maximum concentrations differ by 4-5 orders of magnitude between elements:

WaveStats/pcst PM10 // requires Igor 7 or higher
Edit M_WaveStats.ld

Is that of concern?

Another way of simply visualising concentrations is to display your data as an image, e.g.:

NewImage PM10
ModifyImage PM10 ctab= {*,*,Spectrum,0}
ShowInfo

 

The large range of magnitudes can be handled by using a log scale.
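As a rough sketch of the log-scale idea, one option is to take the base-10 logarithm of a copy of the matrix and display that instead. PM10_log is a hypothetical wave name introduced here, and the sketch assumes all concentrations are positive (zeros or negative values would turn into -Inf or NaN):

```
// Sketch only: PM10_log is a hypothetical name; assumes PM10 holds positive values.
Duplicate/O PM10, PM10_log          // work on a copy so the original data stay untouched
PM10_log = log(PM10[p][q])          // log() in Igor is base 10; p = row index, q = column index
NewImage PM10_log
ModifyImage PM10_log ctab= {*,*,Spectrum,0}
```

With this, elements whose concentrations differ by several orders of magnitude share one color scale without the largest ones dominating the image.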

If you decide on some threshold value (representing high concentration), you could simply threshold the image (using ImageThreshold or otherwise) to obtain the high-concentration cluster membership as a function of time.

If you mean something else by clustering please explain.

 

A.G.

In reply to ChrLie

Hi ChrLie,

First I want to cluster the whole dataset. I assume that clustering would let me see groups of elements falling into different classes. Then I want to apply some conditions. My criterion is to cluster elements based on their concentration values on an hourly basis, e.g. if Pb, Se and Cu have their highest peaks at 02:00, they should fall into one group, while the low values of these elements, along with other elements, could fall into another class, and so on.

I do not want to find the days with higher concentrations, but to cluster elements based on high spikes and low concentrations at particular hours.

 

Thanks a lot!

In reply to Igor

An image plot would not help me much here. I want to see the elemental concentrations on an hourly basis for each day. For that I would like to cluster high- and low-concentration elements into separate classes as a function of the hour for each day, e.g. on one day some elements are very high at 02:00, while on another day they are very low at 02:00 but high at 06:00, and so on.

FWIW, I think the issue here is not clustering (K-Means or otherwise).  The issue appears to be that of representation and not so much how the various elements are distributed between clusters.  For a data set with 551 time points and 35 elements you could create a wave containing 35 threshold values (one for each element) and then apply these thresholds for all time points.  A simple image of the threshold will show you the "clustering" of elements above threshold.

In reply to Igor

I tried that. The command ImageThreshold/T=2000 PM10 worked.

But I am not able to use a wave that contains 35 threshold values. Executing the following command gives an incompatible dimensions error:

ImageThreshold/W=Pmth PM10

where Pmth is the threshold wave and PM10 is the data matrix. Pmth is a 1 x 35 wave.

Could you please help me with the code, step by step?

Please see the documentation for the /W flag: it requires pairs of values, and your wave apparently has an odd number of values. It also needs to be a 1D wave.
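If the goal is simply one threshold per element, as suggested earlier in the thread, a plain wave assignment may be enough and avoids /W altogether. The following is a minimal sketch, not a tested solution for the attached experiment. It assumes PM10 is the 551 x 35 matrix and Pmth is the 1 x 35 threshold wave described above. PM10_mask and W_nHigh are hypothetical names introduced here.

```
// Sketch only: PM10_mask and W_nHigh are hypothetical names.
Duplicate/O PM10, PM10_mask             // same dimensions as the data matrix
// p = row (time point), q = column (element). A comparison evaluates to 1 or 0.
// Pmth is assumed to be 1 x 35; for a true 1D wave with 35 points use Pmth[q] instead.
PM10_mask = PM10[p][q] > Pmth[0][q]
NewImage PM10_mask                      // 1 = element above its own threshold at that time
MatrixOP/O W_nHigh = sumRows(PM10_mask) // number of elements above threshold per time point
```

Each row of PM10_mask then shows which elements are "high" together at a given time point, which is the grouping the image of thresholds was meant to reveal.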
