I think this is a case where you need to write a bit of code. FWIW, here is a suggested approach, which may not be trivial to implement.
I suppose the basic method would be to iterate with FPClustering until all your classes have one element. After every iteration you store the W_FPClusterIndex results and then apply FPClustering again to each of the resulting classes, until you end up with 1 element per class. The tricky part is how to map the classes resulting from each run into global classes. One approach may be to keep a double-precision (DP) wave that holds a global cluster index, say DPClusterIndex. Since we need to store the changes in class membership over M dendrogram levels, this would be a 2D wave with the same number of rows as W_FPClusterIndex and M columns. After the first run you store the integer results of W_FPClusterIndex in the first column of DPClusterIndex. Now you need to figure out how to keep the different DP indices from overlapping. Since we expect a maximum of M levels in the dendrogram, and supposing class i in the first step had Ni members, compute dNi=1/((M+1)*(Ni+1)). This differential "index" is added to DPClusterIndex in each of the subsequent iterations (iteration>0), so that DPClusterIndex[][iteration]=DPClusterIndex[p][iteration-1]+W_FPClusterIndex[p]*iteration*p*dNi (here p is Igor's row index).
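Since Python came up in this thread, here is a minimal sketch of just the index bookkeeping described above, not of FPClustering itself. It reads the update formula literally, and it assumes that each row uses the dNi of the first-level class it belongs to (the post does not spell that out). The function names and the sample cluster indices are made up for illustration, and nothing here checks that the resulting indices never overlap.

```python
from collections import Counter

def differential_steps(first_level, max_levels):
    """dNi = 1/((M+1)*(Ni+1)) for each first-level class i with Ni members."""
    sizes = Counter(first_level)
    return {cls: 1.0 / ((max_levels + 1) * (n + 1)) for cls, n in sizes.items()}

def next_column(prev_col, local_index, iteration, first_level, steps):
    """One column of the global index, read literally from the post:
    new[p] = prev[p] + local_index[p] * iteration * p * dNi,
    where p is the row number.
    Assumption: dNi is looked up from the row's first-level class."""
    return [
        prev_col[p] + local_index[p] * iteration * p * steps[first_level[p]]
        for p in range(len(prev_col))
    ]

# Made-up stand-in for W_FPClusterIndex after the first run (5 rows, 2 classes).
first_level = [0, 0, 1, 1, 1]
M = 3  # assumed maximum number of dendrogram levels
steps = differential_steps(first_level, M)

# Made-up local indices from re-clustering in iteration 1.
local = [0, 1, 0, 1, 1]
col0 = [float(v) for v in first_level]
col1 = next_column(col0, local, 1, first_level, steps)
```

Rows that keep local index 0 stay at their previous value, while the others move away from it by a fraction that depends on the row number, which is why the post suggests tuning the p*dNi factor.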
Note that the last expression hinges on the assumption that you run the full input in each iteration. To do so, it is best to use as input a copy of the full data in which all rows belonging to classes that are NOT involved in the given run are set to some constant. In fact, to facilitate this I'd add one such row to the original data, just to make sure that there is an established "null" class that will never be confused with real data. Also note that the factor p*dNi can be optimized (for nicer intervals) by taking into account the exact number of elements that belong to each class.
When you run out of iterations you end up with a 2D wave whose rows describe class memberships. If you now plot all rows you should get a dendrogram.
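The masking step might look roughly like this in Python. This is only a sketch: the value of NULL_ROW, the helper names and the sample data are assumptions, and the constant would have to be chosen so that it lies far from any real data.

```python
# Hypothetical constant row; pick values that real data can never take.
NULL_ROW = [-1.0e6, -1.0e6]

def with_null_row(data, null_row=NULL_ROW):
    """Return a copy of the data with one extra 'null' row appended,
    so that a null class exists in every run."""
    return [list(row) for row in data] + [list(null_row)]

def masked_copy(data, membership, active_class, null_row=NULL_ROW):
    """Return a copy of the full input in which every row that does NOT
    belong to active_class is replaced by the null constant."""
    return [
        list(row) if cls == active_class else list(null_row)
        for row, cls in zip(data, membership)
    ]

# Made-up 2-column data and first-level class membership.
data = [[0.1, 0.2], [0.15, 0.25], [5.0, 5.1], [5.2, 4.9]]
membership = [0, 0, 1, 1]

# Input for re-clustering class 1 only; the row count stays the same in
# every run (plus the one extra null row), so row numbers keep their meaning.
run_input = with_null_row(masked_copy(data, membership, active_class=1))
```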
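For the final plotting step, a rough Python sketch could turn each row of the 2D index table into one trace. The table below is made up, the helper names are hypothetical, and matplotlib is assumed to be installed only if plot_traces is actually called.

```python
def row_traces(index_table):
    """index_table[p][level] holds the global index of row p at each level.
    Returns one (levels, values) pair per row, ready for plotting."""
    levels = list(range(len(index_table[0])))
    return [(levels, list(row)) for row in index_table]

def fully_split(index_table):
    """True if every row has its own index in the last column,
    i.e. each class has been reduced to one element."""
    last = [row[-1] for row in index_table]
    return len(set(last)) == len(last)

def plot_traces(traces):
    """Draw one line per row; imported lazily so the rest runs without matplotlib."""
    import matplotlib.pyplot as plt
    for levels, values in traces:
        plt.plot(levels, values)
    plt.xlabel("dendrogram level")
    plt.ylabel("global cluster index")
    plt.show()

# Made-up table: 4 rows, 3 levels.
table = [
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.1],
    [1.0, 1.2, 1.2],
    [1.0, 1.3, 1.3],
]
traces = row_traces(table)
```

Rows that share a value at a given level overlap there, and the points where the lines separate correspond to the splits.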
If you would like to follow this approach or if you have any questions, feel free to contact me at support@wavemetrics.com
A.G.
WaveMetrics, Inc.
November 14, 2011 at 11:26 am - Permalink
Thank you very much for the help.
I'll try to make it happen and if I succeed I'll post the code.
At the moment it appears easier to me to have my colleague plot this in Python.
Greets
Boris
November 17, 2011 at 12:38 pm - Permalink