K means clustering on a data matrix

Can you be more specific about what your actual goal is? Are you interested in days where the elemental concentrations are generally high, or in days where they are similar (these need not be the same thing)? I'm not sure you really want clustering.

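If you just want to know the days with the highest concentrations, you could simply look at the sum of all elements:

MatrixOP/O W_SumRows = sumRows(PM10) // one sum over all elements per time point
Display W_SumRows vs ts // ts is the time wave
ShowInfo // show the cursor info panel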

Then drag the cursor onto the spikes to identify date/time. One problem is that you have 4-5 orders of magnitude difference in maximum concentrations:

WaveStats/pcst PM10 // requires Igor 7 or higher
Edit M_WaveStats.ld

Is that of concern?

Another way of simply visualising concentrations is to display your data as an image, e.g.:

NewImage PM10
ModifyImage PM10 ctab= {*,*,Spectrum,0}
ShowInfo

April 8, 2019 at 01:16 am

Igor

The range of magnitude variation can be handled using log scale.
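As a minimal sketch of the log-scale idea, assuming PM10 is the data matrix from the posts above and contains only positive values (zeros would become -inf and negative values NaN), one could log-transform a copy and display that instead. The output name PM10_log is hypothetical.

```igor
// Sketch only: PM10_log is a hypothetical output name.
MatrixOP/O PM10_log = log(PM10) // base-10 log of every concentration
NewImage PM10_log // image of the log-scaled data
ModifyImage PM10_log ctab= {*,*,Spectrum,0}
```

With a difference of 4-5 orders of magnitude, the color table then spans roughly 4-5 units instead of being dominated by the largest values.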

If you decide on a threshold value that represents high concentration, you could simply threshold the image (using ImageThreshold or otherwise) to obtain the membership of the "high" cluster as a function of time.

If you mean something else by clustering please explain.

A.G.

April 8, 2019 at 10:59 am

Vinni

Hi ChrLie,

First I want to cluster the whole dataset. I assume that clustering would let me see which groups of elements fall into which classes. Then I want to apply some conditions. My criterion is to cluster elements based on their hourly concentration values, e.g. if Pb, Se and Cu all have their highest peak at 02:00, they should end up in one group, while the low values of these elements, along with other elements, could form another class, and so on.

I do not want to find the days with higher concentrations; I want to cluster elements based on high spikes and low concentrations at particular hours.

Thanks a lot!

April 17, 2019 at 02:45 am

Vinni

An image plot would not help me much here. I want to see the elemental concentrations on an hourly basis for each day. For that I would like to put high- and low-concentration elements into separate classes as a function of the hour for each day, e.g. on one day some elements are very high at 02:00, while on another day they are very low at 02:00 but high at 06:00, and so on.

April 17, 2019 at 02:58 am

Igor

FWIW, I think the issue here is not clustering (K-Means or otherwise). The issue appears to be that of representation and not so much how the various elements are distributed between clusters. For a data set with 551 time points and 35 elements you could create a wave containing 35 threshold values (one for each element) and then apply these thresholds for all time points. A simple image of the threshold will show you the "clustering" of elements above threshold.

April 18, 2019 at 04:44 pm

Vinni

I tried that. The command ImageThreshold/T=2000 PM10 worked.

But I am not able to use a wave that contains 35 threshold values. I get an "incompatible dimensions" error when executing the command:

ImageThreshold/W=Pmth PM10

where Pmth is the threshold wave and PM10 is the data matrix. The Pmth wave has dimensions 1 x 35.

Could you please help me code this step by step?

April 23, 2019 at 03:14 am

Igor

Please note the documentation for the flag /W; you need to have pairs of values. Your wave apparently has an odd number of values. Also, it needs to be a 1D wave.

April 23, 2019 at 07:50 am
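For reference, the per-element thresholding discussed in this thread can also be sketched with a plain wave assignment instead of ImageThreshold. This is only an illustration under stated assumptions: PM10 has time points in rows and elements in columns (551 x 35), and Pmth is a 1D wave with 35 points, one threshold per element. The output name PM10_above is hypothetical.

```igor
// Sketch only: assumes PM10 is 551 x 35 (rows = time points, columns = elements)
// and Pmth is a 1D wave with one threshold per element.
Duplicate/O PM10, PM10_above // hypothetical output wave, same size as PM10
PM10_above = PM10[p][q] > Pmth[q] // 1 where the value exceeds that element's threshold, else 0
NewImage PM10_above // above-threshold membership per element and time point
```

If Pmth is stored as a 1 x 35 matrix, as described above, the right-hand side would index it as Pmth[0][q] instead of Pmth[q].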
