Need help on structuring programming problem...
PeterR
I am trying to write a program that generates synthetic data for testing purposes.
Generally, I have a certain value X, which I want to emulate with a system that has only a predefined set of possible values (call them A, B, C and D).
So A, B, C or D can be chosen to emulate X, or to come as close to X as possible within the restrictions of the system.
To create the synthetic data, I now want to write a small macro that "decides" on the outcome of a simulated experiment using probabilities extracted from this data.
For example, I have the following data:
Value   counts/1000
System 1:
X       12.03
System 2:
A       11.91
B       11.91
C       10.54
D       15.23
I now need to turn this input into probabilities, like this:
- A is very close to X, so it should get the highest probability of being chosen to stand for X in the new system
- B is exactly as close to X as A, so A and B should have the same probability of being chosen as the best-suited replacement for X
- C and D deviate from X (to different extents), so this should be taken into account when calculating their probabilities relative to A and B
- probabilities of A,B,C,D should sum to 1
How would you implement this? I'm not asking for the exact code, just the general idea how to solve this problem.
Really would appreciate your input here, thanks a lot in advance for any help!
Regards,
Peter
* Take fc_i = abs(T - S_i)/T as the metric of "closeness", where T is the target and S_i is the signal
* Eliminate duplicate values of fc_i
* Generate the equation sum ac_i * fc_i = 1, which in matrix notation becomes Ac * Fc = I
* Solve for the coefficient terms in Ac
For example, the first Fc term in your set is fc_A = (12.03 - 11.91)/12.03
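As a rough illustration of the first step only (the closeness metric, not the matrix solve), the relative deviation fc_i can be computed for the example data as sketched below. This is Python rather than Igor code, and the names `target`, `signals` and `relative_deviation` are made up for the illustration.

```python
def relative_deviation(target, signal):
    """fc_i = abs(T - S_i) / T, the closeness metric suggested above."""
    return abs(target - signal) / target

target = 12.03  # X from the example
signals = {"A": 11.91, "B": 11.91, "C": 10.54, "D": 15.23}

# Smaller fc means closer to the target.
fc = {name: relative_deviation(target, value) for name, value in signals.items()}

for name, value in fc.items():
    print(f"fc_{name} = {value:.4f}")
```

A and B come out identical, and C and D get progressively larger values. The metric measures distance, so it still has to be inverted in some way before it can serve as a probability.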
--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
October 6, 2014 at 10:16 am - Permalink
Thanks a lot for the quick reply :)
Sounds promising, will try to implement this.
But would elimination of duplicates not result in neglecting A (or B) because they have the same closeness?
Is there a command in IGOR to solve for the coefficients? (I am not very fond of matrix algebra ^^)
What would the matrix look like for the example?
Another big question mark for me at the moment: assuming I have calculated the probabilities correctly, how do I decide which of the replacements is chosen depending on their probability?
I know there has to be a randomization process somewhere in the code, but at the moment I cannot figure out exactly how it should be done.
Regards,
Peter
October 6, 2014 at 10:31 am - Permalink
// rn is a uniform random number between 0 and 1
if (rn < 0.4)
    (choose A)
elseif (rn < 0.8)
    (choose B)
elseif (rn < 0.95)
    (choose C)
else
    (choose D)
endif
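The thresholds 0.4, 0.8 and 0.95 in the chain above are running totals of the individual probabilities, which here would be 0.4, 0.4, 0.15 and 0.05 (inferred from the thresholds, not stated in the post). A minimal Python sketch of that relationship, with hypothetical names `probs`, `thresholds` and `choose`:

```python
from itertools import accumulate

# Individual probabilities implied by the thresholds in the if-chain (assumption).
probs = {"A": 0.4, "B": 0.4, "C": 0.15, "D": 0.05}

# Running totals give the upper edge of each outcome's slice of [0, 1).
thresholds = list(accumulate(probs.values()))

def choose(rn):
    """Return the first outcome whose cumulative threshold exceeds rn."""
    for name, edge in zip(probs, thresholds):
        if rn < edge:
            return name
    return list(probs)[-1]  # guard against rounding when rn is very close to 1

print([round(t, 2) for t in thresholds])
print(choose(0.10), choose(0.50), choose(0.90), choose(0.97))
```

Each outcome owns a slice of the 0 to 1 interval whose width equals its probability, so a uniform random number lands in that slice with exactly that probability.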
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
October 6, 2014 at 04:07 pm - Permalink
Using your proposed structure, I have meanwhile managed to calculate the "closeness" as proposed by jjweimer.
For X=12.03 they are (same example as above):
A = 73.39 %
B = 87.61 %
C = 99.00 %
D = 99.00 %
The appropriate probabilities I calculated so far are:
A = 0.204445
B = 0.244038
C = 0.275758
D = 0.275758
The question I have now is: how do I deal with values that have exactly the same probability? The probabilities do sum to 1, but they are so close to each other that I have the feeling that choosing a random number between 0 and 1 would not make the correct decision...
Any ideas?
October 7, 2014 at 03:07 am - Permalink
Notice that in John's example the if...elseif... chain contains cumulative probabilities. Hence a random number between 0 and 1 can be used to provide a sensible bias in the decision.
Methinks something is not right here - you wanted A and B to have the highest probabilities.
I would suggest something like the following:
1. Calculate the absolute difference of each value from the target X:
D_A = abs (A - X) , and similarly for B, C & D
2. Sum these:
Sum = D_A + D_B + D_C + D_D
3. Calculate a probability based on these differences:
P_A = (1 - D_A / Sum) / (N - 1) , and similarly for B, C & D
where N = number of values (4 in this case)
4. Construct a decision function along the lines of:
if (rn < P_A)
    (choose A)
elseif (rn < P_A + P_B)
    (choose B)
elseif (rn < P_A + P_B + P_C)
    (choose C)
else
    (choose D)
endif
For the values you provided, the probabilities are (approximately):
P_A = 0.325
P_B = 0.325
P_C = 0.233
P_D = 0.117
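Steps 1 to 3 can be checked with a short Python sketch (not Igor code). The function name `closeness_probabilities` is made up here, and the equal-weight fallback for the case where every value equals the target is an added assumption, since the formula divides by zero there.

```python
def closeness_probabilities(target, values):
    """P_i = (1 - D_i / Sum) / (N - 1), with D_i = abs(value_i - target)."""
    diffs = {name: abs(v - target) for name, v in values.items()}
    total = sum(diffs.values())
    n = len(values)
    if n < 2 or total == 0:
        # Assumption: fall back to equal weights where the formula is undefined.
        return {name: 1.0 / n for name in values}
    return {name: (1 - d / total) / (n - 1) for name, d in diffs.items()}

probs = closeness_probabilities(
    12.03, {"A": 11.91, "B": 11.91, "C": 10.54, "D": 15.23}
)
for name, p in probs.items():
    print(f"P_{name} = {p:.3f}")
```

Rounded to three decimals, this should reproduce the values quoted above. The probabilities sum to 1 by construction, because the N terms (1 - D_i / Sum) add up to N - 1.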
HTH,
Kurt
October 7, 2014 at 03:43 am - Permalink
But I still wonder whether A and B are really treated equally in this decision-making if-construct...?
October 7, 2014 at 04:52 am - Permalink
Hi Peter,
Perhaps thinking of it like this will help:
The random number rn lies between 0 and 1 and has a uniform distribution. This means that, for example, the probability of 0.2 <= rn < 0.3 is the same as the probability of 0.5 <= rn < 0.6, which is the same as the probability of 0.9 <= rn < 1.0, and so on.
The if...elseif... construct is basically saying
if 0.0 <= rn < 0.325 then do 'A'
if 0.325 <= rn < 0.650 then do 'B'
( and so on for C and D).
In other words, the 'range' of values of rn that will give rise to 'A' is the same width as the 'range' of values that will give rise to 'B'. Since rn is equally likely to take any value within the 0 to 1 range, the outcomes 'A' and 'B' must have the same probability.
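One way to convince yourself is to run the if-chain many times and count the outcomes. The sketch below is Python rather than Igor, uses the rounded probabilities from the earlier post (0.325, 0.325, 0.233, 0.117), and picks an arbitrary seed and draw count purely for repeatability.

```python
import random
from collections import Counter

def draw(rn):
    """Cumulative if-chain with P_A = P_B = 0.325, P_C = 0.233, P_D = 0.117."""
    if rn < 0.325:
        return "A"
    elif rn < 0.650:
        return "B"
    elif rn < 0.883:
        return "C"
    return "D"

rng = random.Random(12345)  # fixed seed (arbitrary) so the run is repeatable
n_draws = 100_000
counts = Counter(draw(rng.random()) for _ in range(n_draws))

# Observed frequencies should sit close to the intended probabilities.
for name in "ABCD":
    print(name, counts[name] / n_draws)
```

The observed frequencies of A and B should agree to within sampling noise, even though A is tested first in the chain.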
HTH,
Kurt
October 7, 2014 at 05:09 am - Permalink
I meanwhile calculated the correct probabilities, put them in a wave and sorted them with Sort.
String Substitution
Variable j
for(j=0;j<(numpnts(Pool));j+=1)
    if (random < Probabilities_Sorted [j])
        Substitution = Pool_Sorted [j]
        break
    elseif(random < (Probabilities_Sorted [j] + Probabilities_Sorted [j+1]))
        Substitution = Pool_Sorted [j+1]
        break
    elseif(random < (Probabilities_Sorted [j] + Probabilities_Sorted [j+1] + Probabilities_Sorted [j+2]))
        Substitution = Pool_Sorted [j+2]
        break
    else
        Substitution = Pool_Sorted [j+3]
        break    // leave the loop here as well, otherwise later passes index past the end of the waves
    endif
endfor
The "pool" of possible values is made up of 4 values in this case.
In order to redesign the construct to achieve applicability for any size of pool, I think a Do-Loop must be applied...
Thanks for all your help, was a pleasure!
October 7, 2014 at 09:23 am - Permalink
for(j=0;j<(numpnts(Pool));j+=1)
    // compare against the running total of the probabilities up to and including point j
    if (random < sum(Probabilities_Sorted,0,j))
        Substitution = Pool_Sorted [j]
        break
    endif
endfor
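The same idea for a pool of any size can be sketched in Python (not Igor) by building the running totals once and searching them. The names `pick`, `pool` and `probabilities` are hypothetical, and the clamp on the index is an added safeguard for rounding rather than part of the loop above.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def pick(pool, probabilities, rn):
    """Pick from pool using cumulative probabilities; works for any pool size."""
    cumulative = list(accumulate(probabilities))
    # Index of the first running total that is strictly greater than rn.
    index = bisect_right(cumulative, rn)
    # Clamp in case rounding leaves the final total slightly below 1.
    return pool[min(index, len(pool) - 1)]

pool = ["A", "B", "C", "D"]
probabilities = [0.325, 0.325, 0.233, 0.117]
print(pick(pool, probabilities, random.random()))
```

The selection does not depend on the order of the entries, as long as the pool and the probabilities are kept in the same order, so sorting is not needed for correctness.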
Thanks again for all the help!
Best regards,
Peter
October 7, 2014 at 10:25 am - Permalink
How can I circumvent this unwanted anomaly?
October 13, 2014 at 10:27 pm - Permalink
I may be missing something here, but I can't see the code for where you have calculated the probabilities?
I have re-checked the algorithm I presented previously and changing the target to X=11.91 (i.e. the same as A and B) I get the following probabilities:
P_A = 0.3333
P_B = 0.3333
P_C = 0.2360
P_D = 0.0974
The question of whether this method for calculating the probabilities is appropriate for your needs is one I cannot answer.
HTH,
Kurt
October 13, 2014 at 11:53 pm - Permalink