Creating a matrix of small graphs from several datasets with different attributes

Scatter Plot Matrix provides a GUI and procedure code to build a matrix of scatter plots showing all possible XY plots of all pairs from the selected list of data sets. The GUI provides quite a bit of control over how the XY plots are generated.

John Weeks
WaveMetrics, Inc.
support@wavemetrics.com

December 20, 2011 at 10:32 am - Permalink

hrodstein

I have written the start of some custom code that might be of use to you. I don't know whether it is better than the Scatter Plot Matrix package for your situation.

I started by loading your data into a numeric 2D wave using this command:

LoadWave/J/M/E=1/K=1/V={" "," $",0,0} ""

/K=1 tells LoadWave to treat all columns as numeric. As a result, all of the non-numeric columns are loaded as blanks.

Then I wrote this function and executed it:

Function PlotArray(w2D)
    Wave w2D                // 2-D wave containing data
    
    Display /W=(5,45,805,545)                       // Make master graph
    String masterGraphName = WinName(0, 1)
    
    Variable numPointsPerPlot = 6
    Variable numPointsTotal = DimSize(w2D,0) - 1    // -1 because of header row
    Variable firstDataRow = 1                       // Skip header
    
    Variable numPlotColumns = 4
    Variable numPlotRows = trunc((numPointsTotal / numPointsPerPlot) / numPlotColumns)
    
    Variable columnLabelHeight = .05        // Space allocated for column labels
    Variable rowLabelHeight = .05           // Space allocated for row labels
    
    Variable grout = 0.02
    Variable plotWidth = (1 - grout*(numPlotColumns-1) - rowLabelHeight) / numPlotColumns
    Variable plotHeight = (1 - grout*(numPlotRows-1) - columnLabelHeight) / numPlotRows
 
    #if 0   // For debugging only
        Printf "Plot width = %.03g, Plot height = %.03g\r", plotWidth, plotHeight
    #endif
    
    Variable frameStyle = 1         // Single-width frame
 
    // Coordinates for subwindows
    Variable left, top, right, bottom
 
    // Create plots
    Variable plotRow
    for (plotRow=0; plotRow<numPlotRows; plotRow+=1)
        Variable plotColumn
        for (plotColumn=0; plotColumn<numPlotColumns; plotColumn+=1)
            left = plotColumn * (plotWidth + grout)
            right = left + plotWidth
            top = columnLabelHeight + plotRow * (plotHeight + grout)
            bottom = top + plotHeight
            #if 0   // For debugging only
                if (plotRow == 0)
                    Printf "Row %d: L=%.3g, T=%.3g, R=%.3g, B=%.3g\r", plotRow, left, top, right, bottom
                endif
            #endif
            
            Variable xColumn = 0            // Column containing X values
            Variable yColumn = 9            // Column containing Y values
            
            Variable startDataRow = firstDataRow + plotColumn*numPointsPerPlot + plotRow*numPointsPerPlot*numPlotColumns
            Variable endDataRow = startDataRow + numPointsPerPlot -1
            
            Display /HOST=$masterGraphName /W=(left, top, right, bottom) w2D[startDataRow,endDataRow][yColumn] vs w2D[startDataRow,endDataRow][xColumn]
            ModifyGraph frameStyle=frameStyle
            ModifyGraph mode=4,marker=19
            
            SetActiveSubwindow $masterGraphName
        endfor
    endfor
End

The function creates a master graph with a subwindow for each of the plots.

I have attached the experiment which contains the code I wrote and the resulting graph.

I left room for the labels but did not write the code to create them. You would have to reload the data file as text (/K=2) to get the text to use as labels.

Attachments Plot Array.pxp (69.11 KB)

December 20, 2011 at 11:03 am - Permalink

arzensekd

Thank you both for your answers.

@hrodstein this is exactly what I want. My data is in this arrangement, rather than in a form suitable for Scatter Plot Matrix, because some instruments give me data this way. I want to look quickly at the resulting graphs and see whether there are any possible outliers due to measurement error. The next step will be to quickly erase the outliers from the resulting graphs.
Can you help me write code that would let me simply select (with a cursor) the outlying points on the resulting graphs, erase them from the wave, and recreate the original input data in the same form in which it was loaded into the wave, but now without the rows of the erased outlying points?

Thanks, this will help a lot due to short time schedule!

Best,
Dejan

December 21, 2011 at 12:14 am - Permalink

JimProuty

arzensekd wrote:
...I want to look quickly at the resulting graphs and see whether there are any possible outliers due to measurement error. The next step will be to quickly erase the outliers from the resulting graphs.
Can you help me write code that would let me simply select (with a cursor) the outlying points on the resulting graphs, erase them from the wave, and recreate the original input data in the same form in which it was loaded into the wave, but now without the rows of the erased outlying points?

This entry in the WM Procedures Index help file may be useful:

#include <Delete Marquee Points>

Contains a set of routines that can be used to delete data from an XY pair of waves. You graph your data, drag out a marquee around some of your data and then choose a new command from the usual popup menu. A dialog is then presented that allows you to delete data values inside or outside the marquee.

--Jim Prouty
Software Engineer, WaveMetrics, Inc.

December 21, 2011 at 11:00 am - Permalink

hrodstein

Quote:
This entry in the WM Procedures Index help file may be useful:

#include <Delete Marquee Points>

This won't work in this case because the X and Y are sections of a 2D wave, something not supported by Delete Marquee Points.

Furthermore, I don't know if cursors will work here. Neither pcsr nor CsrInfo identifies the point within the matrix; they identify only the point in the trace.

It seems as though you would have to use TraceInfo in combination with pcsr to identify the point within the matrix. That seems like a lot of work but a better way does not jump out at me.
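
A minimal sketch of that idea follows. It is not the code from the attached experiment: ZapCursorPoint and its graphName parameter are hypothetical names, and the sketch assumes that the YRANGE value reported by TraceInfo has the form [startRow,endRow][column], as I would expect for traces created by the Display command in PlotArray. Check the actual TraceInfo output for your graph before relying on it.

```igor
// Hypothetical helper: sets the matrix element under cursor A to NaN.
// graphName is the host path of the subwindow, e.g. "Graph0#G0" (assumed name).
Function ZapCursorPoint(graphName)
    String graphName

    String info = CsrInfo(A, graphName)
    if (strlen(info) == 0)
        return -1                       // Cursor A is not on this graph
    endif

    String traceName = StringByKey("TNAME", info)
    Wave/Z w = CsrWaveRef(A, graphName)
    if (!WaveExists(w))
        return -1
    endif

    // ASSUMPTION: YRANGE looks like "[1,6][9]" for a trace showing w2D[1,6][9]
    String yRange = StringByKey("YRANGE", TraceInfo(graphName, traceName, 0))
    Variable startRow, endRow, col
    sscanf yRange, "[%d,%d][%d]", startRow, endRow, col
    if (V_flag != 3)
        return -1                       // YRANGE was not in the assumed form
    endif

    // pcsr gives the point number within the trace; offset it into the matrix
    Variable row = startRow + pcsr(A, graphName)
    w[row][col] = NaN                   // Blank the outlier
    return row
End
```

With cursor A placed on a point in one subwindow, you would call it as ZapCursorPoint("Graph0#G0"), substituting the real subwindow path. Only the Y value is blanked; the X value and the rest of the row are left untouched.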

I have attached the experiment I used to investigate this issue in case someone wants to play with it.

Attachments Identify Point in Matrix.pxp (58.57 KB)

December 21, 2011 at 02:11 pm - Permalink

arzensekd

hrodstein wrote:

I have attached the experiment I used to investigate this issue in case someone wants to play with it.

Thanks for attaching this experiment. I will play with it. I've also noticed that you posted the snippet "Identify Point in Matrix Subrange". Would it also be helpful for me?
I have one suggestion, or maybe just a question about how to do something. In your previous post you attached custom code for my data file. Is it possible to identify each part of the 2D wave by an attribute rather than by the number of rows (points) in each subset of the 2D wave? Your custom code works great for 2D waves where every subset has the same, fixed number of points. But sometimes the subsets do not all have the same number of points (rows), and it is easier to distinguish them by marking them with an attribute value (e.g., the pH value for each subset).
This approach is well known in the statistical programming language R with the ggplot2 package, where one can specify the x and y columns by column name or number and identify each subset with another column (I don't know how familiar you are with R).

Thanks for all answers!

Best regards,
Dejan

December 26, 2011 at 09:20 am - Permalink

hrodstein

Quote:
I've also noticed that you posted the snippet "Identify Point in Matrix Subrange". Would it also be helpful for me?

Yes. The GetMatrixTraceIndices function allows you to identify an element of a 2D wave by putting a cursor on it. This would allow you to put a cursor on a point of a trace and then zap it (set it to NaN, also known as blank) as a way of removing outliers.

Quote:
I have one suggestion, or maybe just a question about how to do something. In your previous post you attached custom code for my data file. Is it possible to identify each part of the 2D wave by an attribute rather than by the number of rows (points) in each subset of the 2D wave?

It is possible but it would take some additional programming and I have to get on to other things.

I would do this as follows:

1. Load your data into a text matrix wave.

2. Use column 18 (excipient) to determine where each subrange starts and ends by scanning it for None, A, B, C.

3. Replace the lines in my procedure that set the startDataRow and endDataRow variables with values determined in step 2.

This would require an intermediate level of Igor programming skill.
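
The scanning in the second step could be sketched roughly as follows. This is only an illustration under stated assumptions: FindSubranges and the output wave W_subranges are names I made up, and the code assumes that all rows sharing an attribute value are contiguous in the file. It treats any change in the attribute as the start of a new subrange rather than looking for None, A, B, C specifically.

```igor
// Hypothetical helper, not part of the attached experiment.
// Scans one column of a text matrix and records where each run of
// identical attribute values starts and ends.
Function FindSubranges(tw, col, firstDataRow)
    Wave/T tw               // Text matrix wave, e.g. loaded with /K=2
    Variable col            // Column holding the attribute
    Variable firstDataRow   // 1 to skip a header row

    Variable numRows = DimSize(tw, 0)
    if (numRows <= firstDataRow)
        return 0            // No data rows
    endif

    // Column 0 = start row, column 1 = end row; trimmed to size below
    Make/O/N=(numRows, 2) W_subranges

    Variable numRanges = 0
    Variable startRow = firstDataRow
    Variable row, isBoundary
    for (row = firstDataRow + 1; row <= numRows; row += 1)
        // A subrange ends at the end of the data or where the attribute changes
        isBoundary = (row == numRows)
        if (!isBoundary)
            isBoundary = CmpStr(tw[row][col], tw[startRow][col]) != 0
        endif
        if (isBoundary)
            W_subranges[numRanges][0] = startRow
            W_subranges[numRanges][1] = row - 1
            numRanges += 1
            startRow = row
        endif
    endfor

    Redimension/N=(numRanges, 2) W_subranges
    return numRanges        // Number of subranges found
End
```

In PlotArray you would then loop over the rows of W_subranges and take startDataRow from column 0 and endDataRow from column 1, instead of computing them from numPointsPerPlot. Note that CmpStr is case-insensitive by default, so "none" and "None" count as the same attribute.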

Quote:
Your custom code works great for 2D waves where every subset has the same, fixed number of points. But sometimes the subsets do not all have the same number of points (rows), and it is easier to distinguish them by marking them with an attribute value (e.g., the pH value for each subset).

I'm not clear on your use of "subset". I'm not sure if you are referring to the collection of points for a particular subplot or the collection of points for a row of subplots.

Are there always four excipients (none, A, B, and C) or can that vary?

For a given row of subplots, is the number of data points per plot constant or can that vary?

Can the number of data points vary from one row of subplots to the next?

If you answer these questions and post one or two more files that illustrate the variability, I may be able to provide better guidance.

Quote:
This approach is well known in the statistical programming language R with the ggplot2 package, where one can specify the x and y columns by column name or number and identify each subset with another column (I don't know how familiar you are with R).

I'm very close to perfectly ignorant about R but it sounds like it is easier to do this kind of thing in R than in Igor.

December 26, 2011 at 02:34 pm - Permalink

arzensekd

hrodstein wrote:

I'm not clear on your use of "subset". I'm not sure if you are referring to the collection of points for a particular subplot or the collection of points for a row of subplots.

Are there always four excipients (none, A, B, and C) or can that vary?

For a given row of subplots, is the number of data points per plot constant or can that vary?

Can the number of data points vary from one row of subplots to the next?

If you answer these questions and post one or two more files that illustrate the variability, I may be able to provide better guidance.

It has been a long time since these posts, but I had problems producing sample data.
I have attached a sample file in which the attributes are numeric values in the columns pH, I, and kappa_nm_1. From these columns I want to arrange the plots by, for example, two attributes (e.g., pH vs I). Within each plot, the x values come from the avgConc column with errors in sdConc, and the y values come from avgD, also with errors. As you can see, the number of rows differs from one combination of pH and I to the next.

The number of distinct attribute values is not always the same (currently there are eight distinct pH values and six distinct values of I, but this can vary).

I also want to plot a fitted linear function in each subplot. Which command do I need to use in the procedure file, and how can I then write the fitting results to a separate wave?
Thanks for all answers,
Dejan

Attachments DLS_final_beforeclean.txt (129.58 KB)

January 21, 2012 at 11:08 am - Permalink