Laggy performance with large data sets
tkessler
Slow loading in itself would not be much of a problem, but after loading, many operations, such as appending a wave to a graph, opening the Curve Fit panel, or performing the fit itself, are each accompanied by a pause of a few seconds.
This issue is not caused by the system I'm using (17-inch MacBook Pro 2011, with 16GB RAM, 2.5GHz quad-core i7, 512GB SSD with ~100GB free, running OS X 10.8.4), since other applications run very fast even while Igor is crunching its numbers on these experiment files. Igor is also using only about 500MB of RAM, which is significantly less than what I'd expect it could use, given that 32-bit applications have a ~4GB limit.
Is there some way to increase Igor's memory allocation if this is a contributing factor, or otherwise make loading and using large data sets faster?
I suspect the main issue is that you have many waves in one data folder. Before you do anything else, I recommend that you close the Data Browser window. The DB is going to spend time looking at your data, and with 12k waves that may take a while. Next, the presence of a large number of waves also affects the execution of any command that applies to one or more waves, since those waves have to be looked up by name. To get around this you might consider concatenating your individual waves into a 2D matrix where each wave is a single column. These are still easy to display and could reduce the overall number of waves by a factor of ~600 if you are displaying that many waves per graph.
I hope this helps,
A.G.
WaveMetrics, Inc.
July 22, 2013 at 12:35 pm - Permalink
Anything else that causes data in the graph to change will cause the graph to redraw. You could try closing the graph and saving the recreation macro while you work on the analysis.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
July 22, 2013 at 02:19 pm - Permalink
If I cut out 4000 of the waves so there are only 8000, the responsiveness greatly increases, even though the waves are still displayed 600 to a graph. This suggests the bottleneck is in how the waves are stored more than anything else, and that there's some threshold after which Igor slows down. For now I will try organizing the waves into subfolders (data folders), and while the 2D wave approach might help, in the long term it would mean redoing close to 5 years and thousands of lines of Igor programming (ugh!). However, I'll give it a shot, at least for the routines I'm using for this one project.
July 26, 2013 at 03:04 pm - Permalink
I don't think there's much you can do to make loading of experiments you've already saved any faster. But maybe it's possible for you to load each experiment and reorganize the waves in that experiment into multiple data folders.
In any code you write, make sure you're using functions and not macros as much as possible.
July 27, 2013 at 06:27 am - Permalink
Since there is a big rewrite going on for IP7, I wonder if this is the time to jump to using a C++ container? I think a map container offers O(log N) lookup, which would be faster than the O(N) of a linked list. There is also unordered_map.
July 27, 2013 at 09:53 pm - Permalink
And a hash map has constant-time lookup.
It's on the list on the whiteboard in my office. But it is relatively low priority compared to getting a working application out to beta. This topic comes up from time to time- I wonder how many people are affected by lengthy look-up when there are many waves.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
July 29, 2013 at 08:59 am - Permalink
A map container in C++ would store a wave keyed by the wave name. The map container is in the STL and is available in most compilers/platforms, whereas hash_map is not. Perhaps I got the big O wrong; unordered_map may also be a contender.
From stackoverflow (treat with caution):
"Some of the key differences are in the complexity requirements.
A map requires O(log(N)) time for inserts and finds.
An unordered_map requires an 'average' time of O(1) for inserts and finds but is allowed to have a worst case time of O(N).
So, usually, unordered_map will be faster, but depending on the keys and the hash function you store, can become much worse."
July 29, 2013 at 03:46 pm - Permalink
std::map does a binary search in a balanced tree, which results in O(log N). A hash map gets constant time by using the hash as a direct index into an array. The possibility of O(N) comes from hash collisions: when more than one item has the same hash, those items are stored in a list for that bucket, and the lookup falls back to walking that list. But if your hash is reasonably good, the buckets will be very small.
We also have to be careful, in implementing a fast wave lookup, that we don't slow down wave *creation*. Some containers feature fast lookup but slow insertion.
John Weeks
WaveMetrics, Inc.
support@wavemetrics.com
July 29, 2013 at 04:57 pm - Permalink
unordered_map (aka hash map) in namespace std::tr1 was introduced with C++98 TR1, which e.g. VS2008 already supports, and I'd guess recent Xcode versions do as well.
July 31, 2013 at 10:04 am - Permalink