Checking for same sizes on images before loading?
jjweimer
I would like to confirm that the size of all image files in a folder is exactly the same *before I load the image files*. For example, a flag should be raised when I choose to open all image files in a folder that holds ten images at 2 MB each and one image at 1.5 MB. The intent is to prevent loading a folder into a stack when the images in it are mismatched in size.
Is there an efficient way to do this, even when it might involve breaking out to the shell or DOS level with ExecuteScriptText?
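If breaking out of Igor is acceptable, the comparison only needs the directory listing, not the file contents. The following is a minimal Python sketch of that idea, assuming a Python interpreter is available on the machine. The function names `group_files_by_size` and `sizes_match` and the default extension filter are hypothetical; they are not part of Igor or of any approach posted in this thread.

```python
import os
from collections import defaultdict


def group_files_by_size(folder, extensions=(".tif", ".tiff")):
    """Map each file size in bytes to the sorted names of the files with that size."""
    groups = defaultdict(list)
    with os.scandir(folder) as entries:
        for entry in entries:
            # Only regular files with a matching extension are considered.
            if entry.is_file() and entry.name.lower().endswith(extensions):
                groups[entry.stat().st_size].append(entry.name)
    return {size: sorted(names) for size, names in groups.items()}


def sizes_match(folder, extensions=(".tif", ".tiff")):
    """Return True only when every matching file has the same size."""
    return len(group_files_by_size(folder, extensions)) == 1
```

An empty folder yields no size groups, so `sizes_match` reports False in that case. When the sizes differ, the dictionary from `group_files_by_size` shows which files are the odd ones out.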
How about using Open and FStatus? It was fast enough (<0.1 seconds) for ~80 files.
Function Check_File_Sizes(String strFolder_Path) //Placeholder name; pass "" to pick the folder in a dialog
	//Get the path in case it's empty
	IF(strlen(strFolder_Path)==0)
		NewPath/Q/O/M="Select folder" pTemp_Path //Gets the path to a FOLDER (not a file)
		PathInfo pTemp_path
		IF(strlen(S_Path)==0)
			Return -1
		ELSE
			strFolder_Path=S_Path
		ENDIF
		KillPath/Z pTemp_path
	ENDIF
	Variable vStart=StartMSTimer
	//Get the files in the folder
	NewPath/O/Q/Z pFolder_Path, strFolder_Path
	String strFile_Names_List=IndexedFile(pFolder_Path, -1, "????")
	Variable vNum_Files=ItemsInList(strFile_Names_List)
	IF(vNum_Files>=2)
		Make/O/T/N=(vNum_Files) File_Names=StringFromList(p, strFile_Names_List)
		Sort/A File_Names, File_Names
		Make/O/L/U/N=(vNum_Files) File_Sizes=0
		//Get the file sizes
		Close/A
		Variable iFileDex, vRefNum
		FOR(iFileDex=0;iFileDex<vNum_Files;iFileDex+=1)
			Open/R/Z/P=pFolder_Path vRefNum as File_Names[iFileDex]
			IF(V_Flag==0)
				FStatus vRefNum
				File_Sizes[iFileDex]=V_logEOF
				Close vRefNum //Only close a file that was actually opened
			ENDIF
		ENDFOR
		Close/A
		Variable vStop=StopMSTimer(vStart)
		Print vStop/1e6
		//See if there are any files with a different size
		FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes
		IF(numpnts(Unique_File_Sizes)==1)
			Print "File sizes all match!"
		ELSE //For each unique size, make a wave with that size and the name
			Variable iSizeDex
			FOR(iSizeDex=0;iSizeDex<numpnts(Unique_File_Sizes);iSizeDex+=1)
				Extract/O File_Sizes, $"Index_"+num2istr(iSizeDex)+"_Sizes", File_Sizes==Unique_File_Sizes[iSizeDex] //Probably could just stuff the size into the wave note of the names wave
				Extract/O/T File_Names, $"Index_"+num2istr(iSizeDex)+"_Names", File_Sizes==Unique_File_Sizes[iSizeDex]
			ENDFOR
		ENDIF
	ELSE
		Print "Fewer than two files in the folder."
	ENDIF
	KillPath/Z pFolder_Path
END
Edit: Changed the file size wave from a double to a long unsigned integer. I don't think the size will ever be negative or a non-integer.
August 9, 2023 at 10:08 pm - Permalink
Instead of Open you could also try GetFileFolderInfo:
GetFileFolderInfo/Q/Z/P=pFolder_Path File_Names[iFileDex]
File_Sizes[iFileDex]=V_logEOF
August 10, 2023 at 02:44 am - Permalink
The help for GetFileFolderInfo says that V_logEOF is the number of bytes in the data fork, while V_logEOF from FStatus is the total number of bytes in the file, which had always made me think that those values would be different. However, when I checked several file types (.h5, .png), the size was the same from both methods. Maybe one of the WaveMetrics folks can chime in.
However, using GetFileFolderInfo in the loop is several times slower than using Open (~0.12 s versus ~0.04 s for 80 files).
August 10, 2023 at 07:29 am - Permalink
I also didn't understand that part, but it is probably fine for finding very different file sizes. It does make sense that GetFileFolderInfo is slower, since it grabs more info. Better to use Open then.
August 10, 2023 at 09:20 am - Permalink
They are both the number of bytes in the data fork.
The FStatus documentation would be more precise if it said "The number of bytes in the opened fork" which is always the data fork.
The Open operation has always opened the data fork only.
Apple dropped support for resource forks a long time ago so the distinction between data fork and resource fork is moot at this point.
August 10, 2023 at 11:46 am - Permalink
Thanks @KZarzana. I will use the approach you've suggested.
August 10, 2023 at 03:50 pm - Permalink
In case anyone needs it, here is a version that checks three things. In my applications, the number of files must be four or more; otherwise, the stack becomes an RGB image. I do not allow mixtures of file types to create a stack. Finally, I check for the same file size.
// return 1 if valid, 0 if invalid
// assumes the symbolic path imgPath already points at the image folder
Static Function f_IsValidateforStack(string fList, variable sizecheck)
	variable nf, nt, vRefNum, ic
	string tlist, plist, jlist
	string theFile, fName
	// check number of files (stacks must be 4+ images)
	nt = ItemsInList(flist)
	if (nt < 4)
		return 0
	endif
	// check file names (no stacks from combinations of image types)
	tlist = ListMatch(fList,"*.tif")
	tlist += ListMatch(fList,"*.tiff")
	nf = ItemsInList(tlist,";")
	nt = nf != 0 ? 1 : 0
	plist = ListMatch(fList,"*.png")
	nf = ItemsInList(plist,";")
	nt = nf != 0 ? nt + 1 : nt
	jlist = ListMatch(fList,"*.jpg")
	jlist += ListMatch(fList,"*.jpeg")
	nf = ItemsInList(jlist,";")
	nt = nf != 0 ? nt + 1 : nt
	if (nt > 1)
		return 0
	endif
	// check file sizes (images must be same sizes)
	if (sizecheck)
		nt = ItemsInList(flist)
		Make/D/N=(nt)/FREE File_Sizes = NaN
		for (ic=0;ic<nt;ic+=1)
			theFile = StringFromList(ic,fList)
			fName = ParseFilePath(0,theFile,":",1,0)
			Open/R/Z/P=imgPath vRefNum as fName
			if (v_flag==0)
				FStatus vRefNum
				File_Sizes[ic]=V_logEOF
				Close vRefNum // only close a file that was actually opened
			endif
		endfor
		Close/A
		FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes
		if (numpnts(Unique_File_Sizes) != 1)
			return 0
		endif
	endif
	return 1
end
August 11, 2023 at 08:19 am - Permalink
Jeff- I see you allow tiff, png and jpg. If no compression is applied, then the number of bytes in the file will be the same as the number of bytes in the ultimate image. But if any compression is done, then different images may wind up with different file sizes. Especially with jpg, the file size will depend on the quality setting and the amount of high-frequency features in the image. In tiff and png images, I would imagine that large patches of zeroes would compress almost to nothing.
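The effect of compression on file size can be illustrated with a short Python sketch. It applies zlib (Deflate), the compression scheme used by PNG and one of the schemes TIFF allows, to two raw pixel buffers of identical dimensions. The 256 x 256 size and the variable names are arbitrary choices for illustration only.

```python
import os
import zlib

WIDTH, HEIGHT = 256, 256  # hypothetical 8-bit grayscale image dimensions

# Two "images" with identical dimensions and identical uncompressed size.
flat = bytes(WIDTH * HEIGHT)        # every pixel is zero
noisy = os.urandom(WIDTH * HEIGHT)  # random, high-frequency content

flat_size = len(zlib.compress(flat))
noisy_size = len(zlib.compress(noisy))

print("uncompressed bytes:", len(flat), len(noisy))
print("compressed bytes:  ", flat_size, noisy_size)
```

The uncompressed sizes are equal, while the compressed size of the all-zero buffer is a tiny fraction of the compressed size of the noisy one. A byte-count comparison of compressed files therefore says little about whether the pixel dimensions match.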
August 11, 2023 at 09:42 am - Permalink
Thanks for the heads up, John. I've restricted stack creation to TIFF images only. As for missing true size differences in compressed TIFFs, only one case will fail in my revised approach: the compressed files on the drive are all **exactly** the same size, yet at least one of the (minimum four) loaded images has a different uncompressed size from all of the others. I'll leave this as an edge case for someone with greater motivation to tackle.
// return 1 if valid, 0 if invalid
// assumes the symbolic path imgPath already points at the image folder
Static Function f_IsValidateforStack(string fList, variable sizecheck)
	variable nt, vRefNum, ic
	string tlist, theFile, fName
	// check number of files (stacks must be 4+ images)
	nt = ItemsInList(flist)
	if (nt < 4)
		return 0
	endif
	// check file names (stacks only allowed from tiff)
	tlist = ListMatch(fList,"*.png")
	tlist += ListMatch(fList,"*.jpg")
	tlist += ListMatch(fList,"*.jpeg")
	nt = ItemsInList(tlist,";")
	if (nt > 0)
		return 0
	endif
	// check file sizes (images must be same sizes)
	if (sizecheck)
		nt = ItemsInList(flist)
		Make/D/N=(nt)/FREE File_Sizes = NaN
		for (ic=0;ic<nt;ic+=1)
			theFile = StringFromList(ic,fList)
			fName = ParseFilePath(0,theFile,":",1,0)
			Open/R/Z/P=imgPath vRefNum as fName
			if (v_flag==0)
				FStatus vRefNum
				File_Sizes[ic]=V_logEOF
				Close vRefNum // only close a file that was actually opened
			endif
		endfor
		Close/A
		FindDuplicates/FREE/RN=Unique_File_Sizes File_Sizes
		if (numpnts(Unique_File_Sizes) != 1)
			return 0
		endif
	endif
	return 1
end
August 11, 2023 at 03:53 pm - Permalink
If I read this correctly, the only passing cases will be sets of TIFFs with zero compression, the false-positive edge case that you mention, or a set of TIFFs where the compression fortuitously gives the same file size. Why not check the file header (actually, the Image File Directory/Directories) for the image width(s) and height(s)? That's what you're really trying to check, right?
August 14, 2023 at 02:32 am - Permalink
function TIFF_PrintDimensions(string strPath) // placeholder name; pass "" to pick the file in a dialog
	variable refNum
	if (strlen(strPath))
		Open/R refNum as strPath
	else
		string fileFilters = "TIFF Files (*.tif,*.tiff):.tif,.tiff;"
		Open/R/F=fileFilters refNum
		Print s_filename
	endif
	if (strlen(s_filename) == 0)
		return 0
	endif
	string strByteOrder = "00"
	int nextDirectory, theAnswer, byteOrder, numEntries, width, height
	int i, j
	int imax = 256 // maximum number of images to look for in one file
	int iTag, iType, iCount, iValue, iJunk
	// the first two bytes give the byte order: "II" little-endian, "MM" big-endian
	FBinRead refNum, strByteOrder
	strswitch (strByteOrder)
		case "II" :
			byteOrder = 3
			break
		case "MM" :
			byteOrder = 2
			break
	endswitch
	FBinRead/U/F=2/B=(byteOrder) refNum, theAnswer // should be 42
	if (theAnswer != 42)
		Close refNum
		DoAlert 0, "could not read file"
		return 0
	endif
	// read the Image File Directory/Directories
	for (i=0;i<imax;i++)
		// offset of the next IFD; zero marks the end of the chain
		FBinRead/U/F=3/B=(byteOrder) refNum, nextDirectory
		if (!nextDirectory)
			break
		endif
		FSetPos refNum, nextDirectory
		FBinRead/U/F=2/B=(byteOrder) refNum, numEntries
		width = 0
		height = 0
		// loop through Image File Directory
		for (j=0;j<numEntries;j++)
			FBinRead/U/F=2/B=(byteOrder) refNum, iTag
			FBinRead/U/F=2/B=(byteOrder) refNum, iType
			FBinRead/U/F=3/B=(byteOrder) refNum, iCount
			// for the values we're chasing, iType is 3 or 4 (two or four byte integer)
			// the value should always be found in the IFD, no need to interpret a pointer
			if (iType == 3)
				FBinRead/U/F=2/B=(byteOrder) refNum, iValue
				FBinRead/U/F=2/B=(byteOrder) refNum, iJunk
			else
				FBinRead/U/F=3/B=(byteOrder) refNum, iValue
			endif
			if (iTag == 256)
				width = iValue
			elseif (iTag == 257)
				height = iValue
			endif
		endfor
		Print "width", width, "height", height
	endfor
	Close refNum
end
August 15, 2023 at 04:33 am - Permalink
if (theAnswer != 42)
...
LOL, I wonder if this is some deliberate joke by the creators of TIFF.
August 15, 2023 at 05:20 am - Permalink
Thanks Tony. I had thought that I eventually might do a read-only ImageLoad operation to capture the TAGs. Your approach may be less cumbersome.
August 15, 2023 at 10:04 am - Permalink
The TIFF format is described here
@chozo, the documentation describes it as "an arbitrary but carefully chosen number"
@jjweimer, I edited the snippet to properly handle the case where height and width are encoded as 2-byte integers. I doubt that there are actually any files where this is the case.
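For comparison, here is the same SHORT-versus-LONG handling as a Python sketch that works on the raw bytes of a classic (non-BigTIFF) file. The function name `tiff_dimensions` and the `max_images` cap are hypothetical. Like the Igor snippet, it assumes the width and height values are stored directly in the IFD entry rather than behind an offset.

```python
import struct


def tiff_dimensions(data, max_images=256):
    """Return [(width, height), ...], one pair per IFD found in the TIFF bytes."""
    # "II" means little-endian, "MM" means big-endian.
    order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if order is None:
        raise ValueError("missing TIFF byte-order mark")
    magic, offset = struct.unpack(order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("not a classic TIFF file")

    dims = []
    # An offset of zero marks the end of the IFD chain.
    while offset and len(dims) < max_images:
        (num_entries,) = struct.unpack_from(order + "H", data, offset)
        width = height = 0
        for k in range(num_entries):
            pos = offset + 2 + 12 * k  # each IFD entry is 12 bytes long
            tag, field_type, _count = struct.unpack_from(order + "HHI", data, pos)
            # A SHORT (type 3) sits in the first 2 bytes of the 4-byte value
            # field; anything else is read here as a 4-byte LONG.
            fmt = "H" if field_type == 3 else "I"
            (value,) = struct.unpack_from(order + fmt, data, pos + 8)
            if tag == 256:    # ImageWidth
                width = value
            elif tag == 257:  # ImageLength (height)
                height = value
        dims.append((width, height))
        # The offset of the next IFD follows the last entry.
        (offset,) = struct.unpack_from(order + "I", data, offset + 2 + 12 * num_entries)
    return dims
```

Reading only the first two bytes of the value field for a SHORT works for either byte order, because TIFF stores values shorter than four bytes left-justified in that field.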
August 16, 2023 at 04:19 am - Permalink