GetFileFolderInfo: check only for a few properties
Hi all,
I am developing a program that lets the user load the last saved file in a path. In our environment, users usually save files at arbitrary locations in folder trees, so it is pointless to track where the previous file was saved.
I use GetFileFolderInfo in a modified version of WM routine PrintFoldersAndFiles(pathName, extension, recurse, level) to check for file creation dates and find the newest file.
The problem is the speed of the operation. I currently have ~1500 files in my folder tree and the operation takes 10 seconds! The folder trees usually have more files, so the operation will be even slower.
In the same folder structure I called a python script from Igor pro:
"python3 -c \\\"from pathlib import Path;from os.path import getmtime;print(max(list(Path('%s').rglob('*%s')),key=getmtime))"
and the operation took 0.2 secs!
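For readers who want to see what that one-liner does, here is a minimal expanded sketch of the same idea. The function name newest_file is hypothetical, and the sketch compares modification times (getmtime), as the one-liner does, rather than the creation dates used on the Igor side.

```python
from os.path import getmtime
from pathlib import Path
from typing import Optional


def newest_file(root: str, extension: str) -> Optional[Path]:
    """Return the most recently modified file under root whose name ends with extension."""
    # rglob walks the whole folder tree below root; keep regular files only.
    candidates = [p for p in Path(root).rglob("*" + extension) if p.is_file()]
    # default=None avoids a ValueError when nothing matches.
    return max(candidates, key=getmtime, default=None)
```

Calling newest_file with the root folder and ".dat" would return the newest matching path, or None if the tree contains no such file.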
I believe that the bottleneck is the amount of info GetFileFolderInfo checks for. Am I wrong?
Is it possible to add a bit switch that makes GetFileFolderInfo look up only specific properties (e.g. the modification date)?
Thanks
You haven't given us the code you are using to call GetFileFolderInfo, so it's hard to say whether changes to your code could make doing this in Igor faster. But I doubt you could do this particular thing in Igor faster than in Python because Igor does not give you access to a lower level file/path object like Python does. With the exception of commands that work on open files and take refNum parameters, file access in Igor is based on the name or full path of the file, and looking up that file using the OS's file API commands is relatively slow.
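To illustrate what a lower-level directory object looks like on the Python side, here is a minimal sketch using os.scandir. The function name newest_in_tree is hypothetical, and this says nothing about how Igor works internally. According to the Python documentation, on Windows the entries returned by os.scandir usually carry the stat data from the directory listing itself, so entry.stat() often needs no extra system call, while on Unix it still costs one call per file.

```python
import os
from typing import Optional, Tuple


def newest_in_tree(root: str, extension: str) -> Tuple[Optional[str], float]:
    """Walk root with os.scandir and return (path, mtime) of the newest matching file."""
    best_path, best_mtime = None, float("-inf")
    stack = [root]  # folders still to visit (iterative instead of recursive)
    while stack:
        with os.scandir(stack.pop()) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)
                elif entry.name.endswith(extension) and entry.is_file():
                    # The timestamp comes from the directory entry object,
                    # not from a second lookup of the full path by name.
                    mtime = entry.stat().st_mtime
                    if mtime > best_mtime:
                        best_path, best_mtime = entry.path, mtime
    return best_path, best_mtime
```

If no file matches, the sketch returns (None, -inf), so the caller can test the first element before using the result.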
My suggestions to get the best performance possible are as follows:
1. To the extent possible, use the GetFileFolderInfo /P flag and provide the name of an Igor symbolic path and a file name instead of providing a full or partial path via fileOrFolderNameStr. That way Igor only needs to look up a file name within a path for which it already has an internal descriptor.
2. I assume you are using IndexedFile to get a list of file names in a directory. Make sure you are passing -1 as the index parameter so you get a list of all files at once instead of getting one file name at a time. That should be much faster.
If you want to send us the code you're using to iterate through the files and get the timestamp, we may be able to offer advice about how to improve performance. Feel free to send that to support directly if you wish.
May 24, 2023 at 07:24 am - Permalink
Hi aclight,
Thanks for your advice/reply.
Here is the code I am using now (MXP_GetNewestCreatedFileInPathTree is the function I want to optimise):
/// Load the last file found in the directory tree with root at pathName
string latestfile = ""
variable latestctime = 0
string filepathStr = MXP_GetNewestCreatedFileInPathTree("pMXP_LoadFilesBeamtimeIgorPath", extension, latestfile, latestctime, 1, 0)
WAVE wRef = MXP_WAVELoadSingleDATFile(filepathStr, "")
MXP_DisplayImage(wRef)
print "File loaded: ", filepathStr
return 0
End
Function/S MXP_GetNewestCreatedFileInPathTree(string pathName,
string extension, string &latestfile, variable &latestctime,
variable recurse, variable level)
// MXP_GetNewestCreatedFileInPathTree is a modified WM code of
// PrintFoldersAndFiles(pathName, extension, recurse, level)
// It recursively finds all files in a folder and subfolders looking for
// the creation date of each file, catching the newest one.
// pathName is the name of an Igor symbolic path that you created
// using NewPath or the Misc->New Path menu item.
// extension is a file name extension like ".txt" or "????" for all files.
// recurse is 1 to recurse or 0 to list just the top-level folder.
// level is the recursion level - pass 0 when calling MXP_GetNewestCreatedFileInPathTree.
// latestfile and latestctime are called by reference as the recursive function call would
// reset pass-by-value arguments. We could alternatively use SVAR and NVAR.
/// DO NOT CALL THE FUNCTION DIRECTLY.
PathInfo $pathName
string path = S_path
if(!V_flag) // If path not defined
print "pMXP_LoadFilesBeamtimeIgorPath is not set!"
path = MXP_SetOrResetBeamtimeRootFolder()
endif
// Reset or make the string variable
variable folderIndex, fileIndex
// Add files
fileIndex = 0
do
string fileName
fileName = IndexedFile($pathName, fileIndex, extension)
if (strlen(fileName) == 0)
break
endif
GetFileFolderInfo/Z/Q (path + fileName)
if(V_creationDate > latestctime)
latestfile = (path + fileName)
latestctime = V_creationDate
endif
fileIndex += 1
while(1)
if (recurse) // Do we want to go into subfolder?
folderIndex = 0
do
path = IndexedDir($pathName, folderIndex, 1)
if (strlen(path) == 0)
break // No more folders
endif
string subFolderPathName = "tempPrintFoldersPath_" + num2istr(level+1)
// Now we get the path to the new parent folder
string subFolderPath
subFolderPath = path
NewPath/Q/O $subFolderPathName, subFolderPath
MXP_GetNewestCreatedFileInPathTree(subFolderPathName, extension, latestfile, latestctime, recurse, level+1)
KillPath/Z $subFolderPathName
folderIndex += 1
while(1)
endif
return latestfile
End
I will try to apply your recommendations and see how much I can improve. I will report back.
All the best.
May 24, 2023 at 07:42 am - Permalink
Hi aclight,
I changed the first do-while loop to:
string fileNames = IndexedFile($pathName, -1, extension)
do
string fileName
filename = StringFromList(fileIndex, fileNames)
if (strlen(fileName) == 0)
break
endif
GetFileFolderInfo/P=$pathName/Z/Q fileName
if(V_creationDate > latestctime)
latestfile = (path + fileName)
latestctime = V_creationDate
endif
fileIndex += 1
while(1)
and tested in another folder structure:
1. Previous code: 6.44 sec
2. New code (this post): 4.97 sec
3. Call Python 0.14 sec
I got a 20% improvement.
If I try to do the same change in the second do-while loop things get slightly slower.
So, I guess that's the best I can do?
Cheers,
eg
May 24, 2023 at 08:21 am - Permalink
I'm surprised that change didn't have a bigger effect on performance.
Can you reproduce the problem in a single folder with a lot of files (not doing a recursive search)? If so, please call IndexedFile($pathName, -1, "????") and send me the output of that (either here or through support). You can save it in an Igor experiment or in a text file, whatever is easier for you. Then please also tell me the value of the extension string when you are running this code.
I will use the list of all files to create a test directory that contains the same file names as on your system, and then use your code to determine where the bottlenecks are.
If you can't do that, you could try using Igor's function profiling procedure. To use this, add the following include statement to the main procedure window then compile procedures.
#include <FunctionProfiling>
Then select the Windows->Procedures->FunctionProfiling.ipf menu item and read the comments for instructions.
I primarily want you to confirm that most of the time is spent in GetFileFolderInfo rather than in IndexedFile.
One other important question--are the files on a local drive or a network/shared drive? That could make a big difference in the performance.
Please also provide your OS and version and the version of Igor you are using.
Another idea--if you're using IP9, try adding the /UTC flag with GetFileFolderInfo. That will avoid converting the time from UTC to the local time, which might speed things up a little.
May 24, 2023 at 08:39 am - Permalink
So, I produced a folder with roughly the same number of files as in the folders. The command now takes 2.5 sec to execute.
I use:
/// Load the last file found in the directory tree with root at pathName
string latestfile = ""
variable latestctime = 0
variable timerRefNum = StartMSTimer // start the timer that StopMSTimer reads below
string filepathStr = MXP_GetNewestCreatedFileInPathTree("pMXP_LoadFilesBeamtimeIgorPath", "????", latestfile, latestctime, 1, 0)
variable microSeconds = StopMSTimer(timerRefNum)
print "Time elapsed: ", microSeconds/1e6, " sec"
It's a lot. The extension string I use is ".dat".
I attach a .txt with the filenames.
I use Igor9 and adding /UTC has no effect.
Thanks
May 24, 2023 at 09:29 am - Permalink
FWIW, on my machine (with a fast NVMe drive), running your code with all of your file names takes about 0.6 seconds.
In any case, I used a profiler to see where time is spent and it looks like a substantial amount of time is spent asking the OS for the information for the S_creator output variable. I'll report this internally and see if there's something we can do to avoid this. Thanks for providing the info needed to reproduce the problem.
May 24, 2023 at 11:38 am - Permalink
I looked over the current code for GetFileFolderInfo and I don't see a way for you to call the operation in a way that skips the code that is particularly slow. Adding your suggested feature is probably something that would need to wait for IP10.
If you can rely on Python being present then that might be the way to go.
A possible alternative would be to do something like this:
start = StopMSTimer(-2); ExecuteScriptText/B "cmd /c dir /O-D /TC /B /R c:\Windows\system32\*.dll"; print (Stopmstimer(-2)-start)/1e6, ItemsInList(S_value, "\r")
In this case that gives you a directory listing of all .dll files in the given directory, sorted by creation date (/TC), with the most recently created file first. So you would just need the first line of the output of this command for every directory. Then you would need to call GetFileFolderInfo on those files to get the actual creation timestamp to determine which one was created most recently.
This may or may not be significantly faster than calling GetFileFolderInfo on every file in every folder. On my Windows machine, the line above takes 0.31 seconds to execute and finds 3638 files. As a comparison, the test based on your code and file names took about 0.6 seconds for around 1600 files.
Someone with more command line experience might be able to put together a command that would give you just the most recent file in a directory.
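The two-step idea (take the first line of each directory's sorted listing, then compare timestamps only for those few candidates) could be sketched in Python roughly as follows. Both helper names are hypothetical. The default timestamp function is os.path.getmtime, which is an assumption made for portability; a creation-time lookup could be passed in instead.

```python
import os
from typing import Callable, Iterable, Optional


def first_listed(listing: str) -> Optional[str]:
    """Return the first non-empty line of a newest-first directory listing."""
    # splitlines() handles "\r", "\n" and "\r\n" separators alike.
    for line in listing.splitlines():
        if line.strip():
            return line.strip()
    return None


def newest_candidate(paths: Iterable[str],
                     timestamp: Callable[[str], float] = os.path.getmtime) -> Optional[str]:
    """Return the path with the largest timestamp, or None if there are no paths."""
    return max(paths, key=timestamp, default=None)
```

Note that a bare listing contains file names only, so each first line would need to be joined with its directory before being passed to newest_candidate.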
May 24, 2023 at 03:40 pm - Permalink
FWIW, the command string below will sort in reverse order on macOS, based on this StackExchange report. Run it in the folder of interest (i.e. first execute cd with the folder path of interest).
https://apple.stackexchange.com/questions/86307/can-i-list-files-ordere…
May 26, 2023 at 01:42 pm - Permalink