Goodness of fit and residuals
Hi all,
Just a general question.....When we attempt to fit a spectrum with various Gaussian bands, we can get an idea about the goodness of our fitting through observation of the residuals given on the top in Igor. The general idea is that the residuals should be more or less randomly scattered around 0 and i think that they should show a constant variance throughout the whole fitting range. Assuming that i would like to compare two different fitting efforts, would it make sense to claim that if fitting 1 develops non constant variance (funnel shape) after certain point or if it keeps constant variance but with a higher magnitude than fitting 2, then fitting 1 is definitely worse than 2? Is there anything else that should be also considered? There is the option of comparing the corresponding chi square values but i think that this is not necessarily informative if i am interested in determining the fitting quality ONLY FOR A PART of the whole fitting range. I hope my question was not very confusing.
Konstantinos
I don't think there is an easy answer to what constitutes a good fit.
From a mathematical point of view the fit improves for every degree of freedom you add, but it is also important that the model behind the fit is physically meaningful. Too many free parameters usually mean each parameter becomes physically less meaningful because an error in one can be compensated for by errors in other parameters.
Scientifically the simplest model that gives a good description of your observed data is usually the "best" model. More complex models will always give better residuals, but that doesn't necessarily make them better. You have to make sure the models you compare are of similar complexity.
I don't think residuals of spectra would have constant variance, because the noise often scales as the square root of the signal intensity. Hence at the position of a strong peak you will see more noise and a higher scatter of the residuals. Keeping it simple and using the mean absolute deviation in the region of interest may be the best option.
April 20, 2021 at 11:59 pm - Permalink
In reply to I don't think there is an… by olelytken
I totally agree with you.....I was basically referring to two fitting efforts including the same number of bands and consequently same degree of complexity.....I am attaching an image of the residuals of two different fittings where everything is the same (same number of bands) apart from changing the position of one band from 230 (red residuals) to 250 cm-1 (black residuals), respectively. You can see that around 200 cm-1 there is a change in the scattering of the residuals with the black ones being more spread out.
I also agree with your point that it would be sensible to say that at the top of a strong peak one would expect heteroscedasticity of residuals......I am not exactly sure what you mean by just using the mean absolute deviation though....maybe in this example!
April 21, 2021 at 01:10 am - Permalink
What I mean is something like this:
Wave AbsResidualWave
AbsResidualWave=Abs(ResidualWave)
Variable MeanAbsResidual=Mean(AbsResidualWave, x1, x2)
or you could square ^2 the residual wave instead of taking the absolute Abs() value. It guess depends on how you want to weigh outliers
April 21, 2021 at 03:18 am - Permalink
If those are residuals, you have more problems than just heteroscedasticity, you also have correlated errors. You can see that from the oscillations in the residuals. Unfortunately, Igor doesn't have a good way to deal with that- it requires taking a covariance matrix as input instead of just a weighting wave.
April 21, 2021 at 11:08 am - Permalink
In reply to If those are residuals, you… by johnweeks
Hi John,
i also noticed the oscillations at this limited range (they do not exist in the higher frequencies) but i was not sure about the problem of correlation errors. Could this be due to the fact that maybe additional bands are required for fitting (meaning that maybe the fitting is not the best)? I am not sure what could be the root of this....In addition, can you please elaborate a bit more on the use of a covariance matrix instead of a weighting wave? I am also trying to look it up and get my head around.
April 21, 2021 at 11:36 am - Permalink
Can you explain the approach that you are taking to do the fitting. How many peaks? What are you holding constant from the typical set of inputs (position, half-width, area, height) and what are you allowing to vary? Why (from a first principles stand) are you taking this approach? What aspect of the physical system leads you to believe that your spectrum must be fit in the way that you are fitting?
April 21, 2021 at 03:08 pm - Permalink
In reply to Can you explain the approach… by jjweimer
Essentially i am fitting the Raman spectrum of 30PbO-70B2O3 glass.....I have used the minimum number of bands that are also more or less present in the crystalline counterpart which is often used as a guide since there are similar structural units present (as already shown in literature). If i try to perform the fitting with even one band less the outcome is obviously much worse. Bands that are related to the same species are constrained to have the same width (kept constant). I have tried to vary the positions within a logical range that are to be expected from the corresponding crystalline spectra (with some deviations of course due to the disorder nature) and the heights are the parameters that i more or less let free. Including more bands will be difficult for me to assign since i will not be sure where they would correspond (maybe there are but at the moment i would not know what they mean). After trying multiple fittings (with the fixed numbers of bands decided) i could not yield better residuals than the ones seen in RED in my picture. In principle, there are multiple bands that overlap a lot due to their large width as a result of the glassy state and there would be various combinations that one could come up with.
April 21, 2021 at 03:27 pm - Permalink
You are keeping the band widths constant when fitting spectra from a glassy material with states from a crystalline material. My experience and reading has taught me that order->disorder transformations cause line broadening across all manner of spectroscopy. The reasoning is to account for changes in the local chemistry. The same applies in diffraction -- peak broadening accounts for changes in the local structure as the material moves from crystalline to glassy ordering.
Do you have literature reference or first-principles reasoning that gives you assurance that your approach to constrain the peak widths to the values for crystalline states is truly realistic even for glassy materials? Otherwise, I have to believe that, when you allow peak half-widths to be unconstrained even as you keep peak positions fixed, you will end with a better fit and thereby a clearer way to compare your materials.
April 22, 2021 at 06:39 am - Permalink
In reply to You are keeping the band… by jjweimer
Ok probably i did not explain this correctly.....I am not keeping the same width values for the glass bands as the ones in crystalline....I only used the crystalline reference spectra to get an idea of the numbers of bands and their assignment that i should use since structural units are similar between the two states (not the same but close). Obviously my band widths are far more wide than what you would fit in the crystalline counterpart, otherwise there would be no fitting actually. When i said that i keep the widths constant, i meant between successive glass spectra (e.g. 30%PbO, 40% PbO e.t.c). My assumption is that a specific structural unit that is present in two successive glass spectra should more or less have the same width. I am making this assumption in order to put some constraints otherwise my fitting will never converge due to the large number of highly overlapping bands (up to 19 bands within a frequency range of 4-2000 cm-1). Does it make more sense now? Sorry for the confusion
April 22, 2021 at 06:51 am - Permalink
Yes. This is clearer. I have to wonder whether you may be wrong to keep the widths constant as a function of composition. You betray even to yourself this doubt when you say "My assumption is ... should MORE OR LESS have the same width". What constituents "more or less" is actually not being allowed during your fitting process because more or less is ZERO variation at that point.
April 22, 2021 at 08:23 am - Permalink
In reply to Yes. This is clearer. I have… by jjweimer
Keeping the width constant for nearby compositions is basically an assumption based on the fact that the nature of the specific structural unit does not change much. One could argue that the width is not strictly the same but it should not change drastically because we are dealing with the same species. Due to the severe overlapping of the bands even if the width changes a bit the fitting outcome will be practically the same. I have tried various combinations by varying the width a bit among successive spectra and it was almost the same as keeping it constant (some of the bands that may appear more isolated and well defined, fixing the width is not necessary because it is easy for the software how to fit it). On the other hand, the position appears more sensitive to the neighbouring environment and it is more possible to change and thus i chose not to keep it fixed. However, there is some basis on what to expect on the positions as well. If i let all parameters free (having a certain numbers of bands) Igor either does not converge or does the fitting in an arbitrary way every time that makes no sense in terms of physical meaning. There are no other constraints that i can think of to assist the fitting of these complex spectra.
April 22, 2021 at 08:45 am - Permalink
It seems that you have a system where changes in the distribution of chemistry change both the position and the width of any given component peak. The level of complexity may require that you must constrain the bounds of the peak position and half-width to stay physically relevant rather than fixing certain sets of them as constants. Alternatively, rather than peak fitting, perhaps you would gain better insights by using a different chemometrics approach to analyze your system.
In any case, going back to your original question and tying to Igor Pro ... You will be able to get most anything you need from Igor Pro in order to keep your model within the bounds of your physical system. You may have to work systematically at getting there. This means, you may have to do an unknown amount of trial and error on your assumptions input to the model until what you get back from the model consistently makes sense for the physical systems.
April 22, 2021 at 09:34 am - Permalink
Are you making any correction for excitation frequency and temperature? At these small Raman shifts that could significantly affect the fitting.
April 22, 2021 at 09:54 am - Permalink
I also tried to constrain the bounds of both parameters for several of my bands......However, in many cases the software will converge either parameter to the extremes (either minimum or maximum values of the bounds) which brings me back to the beginning with values that are ''suspicious''. I agree with the trial and error attempt and this is exactly what i have been doing with all the compositions that i have. I have done various attempts with different values for positions and widths. The problem is with bands that are low in intensity but overlap a lot with neighbouring bands that are also weak. When i started my fitting i used bounds instead of fixing values, however for several bands Igor would return the upper or lower values, which was not useful. Despite the fact that i keep some values fixed, this is done with respect to the physics of the system and what makes more sense based on previous published work. This is the story with peak fitting, although a different chemometric approach could be interesting.
PS: I am not still sure how i can use the covariance matrix as input to deal with correlated errors as John Weeks suggested.
April 22, 2021 at 09:57 am - Permalink
In reply to Are you making any… by tony
Hi Tony,
You are definitely right.....I have also tried to see the corrected spectra, but i chose to come back to the raw spectra because using frequency and temperature corrected spectra kill the intensity of the intense boson band which is found below 100 cm-1 and it is interesting for me to check it to determine my intermediate range order structure.
April 22, 2021 at 10:00 am - Permalink
It sounds like you have a wide number of options to explore. Fun stuff with doing spectroscopy of materials!
April 22, 2021 at 10:07 am - Permalink
Konstantinos sent a question about the covariance matrix to support. It seemed like it might be somewhat useful to share my answer.
The instrument with a gear in it could be, for instance, your powder XRD goniometer head. I seem to have lost track of what sort of spectra you are measuring, so I don't know if my gear example is relevant...
April 22, 2021 at 03:11 pm - Permalink
Often residuals like that result from fitting the wrong lineshape. Have you tried Voigt, rather than Gaussian?
The images show a synthetic example. All peaks have a Lorentzian component, but in the first image are fit as Gaussian. In addition, what looks like a single peak on the left is actually the sum of two peaks. After fitting with a more appropriate lineshape (second image) the residuals give an indication that an extra peak is present, but unfit peaks and wrong lineshape can both have a similar effect on the residuals, so it's tricky for real world fitting.
April 23, 2021 at 01:39 am - Permalink
In reply to Often residuals like that… by tony
Hi Tony,
I get your point....Actually the boson band under 100 cm-1 is a huge asymmetric component which is always fitted by a lognormal.....This is stated a lot in literature and this is what i also did.....In glasses, all the other component bands are mainly Gaussian due to intermolecular interactions but i also fitted a couple of bands with Voigt profiles since either Gaussian or Lorentzian were quite insufficient. There is probably something different to these peaks that are probably related to more localized modes (not interacting much to the rest of the network). I have basically tried several things in general. There are a couple of references giving an additional component band almost at 20-30 cm-1 which is related to some sort of LO-TO splitting.....However, i am not sure about this and i decided to leave this out.....obviously it makes the fitting better due to the addition of one component. The fitting can probably become better in terms of mathematics but then i will not be able to explain more bands or other parameter changes, based on the current knowledge of the system's underlying physics.
PS: Very nice examples that you gave....However, for most of the bands in glasses if you try to fit Voigt, you will see that the Lorentzian contribution is almost negligible which shows that the band is mainly Gaussian. In addition, fitting such complex spectra with multiple bands severely overlapping with Voigt profiles can be challenging since there are more degrees of freedom to consider during the fitting. Having less and more well defined bands is probably easier and i have done with crystalline materials in the past. I prefer to keep things more simple if possible (despite the fact that Voigt profiles are the correct profiles describing vibrational spectra in terms of physical meaning).
April 23, 2021 at 01:56 am - Permalink
In reply to Hi Tony, I get your… by kchatzipanagis
Tony,
Here i attach the residuals for the full spectrum fitting and you can also see the actual spectrum below.....It does not look bad just by looking at the whole thing.....The residuals shown in red in my previous example are just a small portion of this image at the left side (small frequencies). Your examples are clearly showing inadequate fitting though. PS: The model (shown in blue) is the result of 19 bands so you can see that it is quite complex.
April 23, 2021 at 04:50 am - Permalink
Another note from the "expert" on curve fitting, but pretty clueless about Raman spectroscopy...
If you try to fit a Voigt profile to a peak that is nearly pure Gaussian or Lorentzian, you invite numerical instability. That arises both from the approximation used to compute the Voigt profile, which doesn't like a shape factor that is very near zero or very large, and from the fitting. As the shape factor becomes quite extreme the derivatives with respect to the shape factor become small, and possibly numerically zero. A zero is what is meant by the infamous "singular matrix error".
April 23, 2021 at 09:45 am - Permalink
In reply to Another note from the … by johnweeks
I agree with you John and i have also seen it in the past when i tried to fit Gaussian like or Lorentzian like peaks with Voigt just for the fun.....It can become quite unstable in terms of converging. I also remember that we discussed a similar issue a few years ago in another topic posted in this forum....Maybe it was myself who brought this discussion in the past. In general, Voigt is a more complex profile so i guess there is no reason to introduce more complexity when there is no need.
April 23, 2021 at 10:40 am - Permalink
If you use a pseudo-Voigt shape instead you will have no instability problems with almost-Gaussian or almost-Lorentzian peak shapes, but honestly to me your fit looks pretty damn good as it is.
April 23, 2021 at 11:31 pm - Permalink