Replace duplicates with average values-- remove data points and then insert?
mwpro
I have time series measurements with occasional duplicate timestamps. There are about 49000 data points, including about 2000 pairs of duplicates.
I can find where the duplicates are, but I don't know how to average each pair of duplicate measurements, remove one of the two rows, and replace the remaining row with the averaged value. I tried deleting points, but the point numbers change after each removal, which makes it hard to identify the next pair of duplicates. Is there a good solution to this problem other than doing arithmetic on the point number, such as adjusting it by -2 in the loop after each removal? Can I avoid deleting points altogether and use another approach?
Thank you!
Are the timestamps monotonic (I would guess so but better be sure)? Are there triplets?
If the time stamps are monotonic, I'd probably crawl through the data from the last set to the first and check whether the 'next' (actually previous) one has the same time stamp. If so, average them, store the result in the high-index set, and delete the low-index one. Repeat until you reach index 0. Caution with multiplets here...
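A minimal sketch of this backward crawl, assuming the X values are monotonic and that duplicates are exactly equal. The function name and the runCount variable are illustrative and not from the thread. Because the loop runs from the end toward index 0, deleting a point never shifts the index of any point still to be visited. The running count is one way to handle multiplets: plain pairwise averaging would give the points of a triplet unequal weights.

```igor
// Hypothetical helper: merges consecutive points with equal X into their mean Y.
// Assumes xWave is monotonic and both waves have the same number of points.
Function AverageDuplicatesBackward(xWave, yWave)
	Wave xWave, yWave

	Variable runCount = 1	// number of original points already merged into point i
	Variable i
	for(i = numpnts(xWave) - 1; i > 0; i -= 1)
		if (xWave[i-1] == xWave[i])
			// Fold point i-1 into point i as a weighted running mean
			yWave[i] = (yWave[i] * runCount + yWave[i-1]) / (runCount + 1)
			runCount += 1
			// Remove the low-index point; the merged point moves down to i-1,
			// which is exactly where the loop looks next
			DeletePoints i-1, 1, xWave, yWave
		else
			runCount = 1	// a new run of X values starts at point i-1
		endif
	endfor

	return numpnts(xWave)	// number of points remaining
End
```

Each DeletePoints call shifts all higher points down, so this is likely slower than a single-pass copy for large inputs, but with a few thousand duplicates in roughly 49000 points it should still be practical.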
HJ
April 29, 2018 at 01:45 pm - Permalink
April 29, 2018 at 03:50 pm - Permalink
FindDuplicates, with the resultant wave + source wave blended through MatrixOP using implicit indexing or in-line logical testing?
--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAH
April 30, 2018 at 12:47 pm - Permalink
April 30, 2018 at 01:01 pm - Permalink
Another approach, which may or may not be faster for a given set of input data, is to loop through the entire input dataset and copy the required data from a pair of input waves to a pair of output waves. Here is a function that I wrote to do this. It does not use DeletePoints but instead calls Redimension once. I have tested it somewhat but don't claim it to be foolproof. It also should work for more than two consecutive identical X values but I have not tested that.
// NOTE: the function name is a placeholder.
// Averages the Y values of consecutive points that share the same X value,
// overwriting xWave and yWave in place. Returns the number of output points.
Function AverageDuplicateXValues(xWave, yWave)
	Wave xWave, yWave

	Variable numPointsIn = numpnts(xWave)
	if (numPointsIn == 0)
		return 0	// Nothing to do for empty waves
	endif

	// Read from copies so that the originals can be overwritten in place
	Duplicate/FREE xWave, xWaveCopy
	Duplicate/FREE yWave, yWaveCopy

	Variable numPointsOut = 0
	Variable previousX = xWaveCopy[0]
	Variable currentX, currentY
	Variable numPointsWithThisX = 1
	Variable sumOfYValuesWithThisX = yWaveCopy[0]
	Variable i
	for(i=1; i<numPointsIn; i+=1)
		currentX = xWaveCopy[i]
		currentY = yWaveCopy[i]
		if (currentX != previousX)
			// X changed: write out the average for the run that just ended
			xWave[numPointsOut] = previousX
			yWave[numPointsOut] = sumOfYValuesWithThisX / numPointsWithThisX
			numPointsOut += 1
			numPointsWithThisX = 0
			sumOfYValuesWithThisX = 0
		endif
		numPointsWithThisX += 1
		sumOfYValuesWithThisX += currentY
		previousX = currentX
	endfor

	// Write out the final run, which the loop accumulates but never stores.
	// Using previousX here also covers a single-point input, where the loop never runs.
	xWave[numPointsOut] = previousX
	yWave[numPointsOut] = sumOfYValuesWithThisX / numPointsWithThisX
	numPointsOut += 1

	Redimension/N=(numPointsOut) xWave, yWave

	// For debugging only
#if 0
	Printf "Number of input points=%d, number of output points=%d\r", numPointsIn, numPointsOut
#endif

	return numPointsOut
End
I am attaching the experiment that I used to test this function. The experiment includes another function for timing how long the function takes on a particular XY pair.
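The attached timing function is not reproduced here. As a rough sketch of what such a test could look like, the following builds synthetic data of about the size mentioned in the question and times one call using Igor's microsecond timer. Everything in it is an assumption for illustration: the names TimeAverageDuplicates, testX and testY, the synthetic data pattern, and AverageDuplicateXValues, which is used as a placeholder name for the function above.

```igor
// Hypothetical timing harness; not the function from the attached experiment.
Function TimeAverageDuplicates()
	// Synthetic data: 49000 points where every 25th point repeats the previous X value,
	// giving 1960 duplicate pairs with monotonic X
	Make/O/D/N=49000 testX, testY
	testX = p - (mod(p, 25) == 24)
	testY = gnoise(1)

	Variable timerRefNum = StartMSTimer
	if (timerRefNum == -1)
		Abort "All microsecond timers are in use"
	endif

	// Placeholder name for the averaging function being timed
	Variable numPointsOut = AverageDuplicateXValues(testX, testY)

	Variable microSeconds = StopMSTimer(timerRefNum)
	Printf "Output points=%d, elapsed time=%g ms\r", numPointsOut, microSeconds / 1000
End
```

Because the averaging function overwrites its input waves, the test data is rebuilt on every call, so repeated runs start from the same input.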
April 30, 2018 at 05:50 pm - Permalink
April 30, 2018 at 05:53 pm - Permalink