Parsing file system listings using SplitString and PCRE
andyfaff
For example, given the following:
dr-xr-xr-x 8 0 0 4096 Aug 2 00:00 .
drwxr-xr-x 14 plaus_scientists 626688 Aug 2 00:00 ..
dr-xr-xr-x 3 0 0 45056 Jan 6 2011 028
dr-xr-xr-x 3 0 0 4096 Feb 21 03:54 029
dr-xr-xr-x 4 0 0 4096 Apr 1 08:00 030
dr-xr-xr-x 4 0 0 4096 May 11 01:16 031
dr-xr-xr-x 4 0 0 53248 Jun 16 00:00 032
dr-xr-xr-x 4 0 0 57344 Aug 2 00:00 033
-rwxrwx--- 1 plaus staff 353 Nov 28 2010 FIZscan8210.itx
drwxr-x--- 2 plaus staff 81920 Mar 5 2010 0001000
drwxr-x--- 2 plaus staff 81920 Jun 28 2010 0002000
drwxr-x--- 2 plaus staff 81920 Jun 24 2010 0000000
I need to be able to parse out all the directory names (lines starting with d), bearing in mind that there may be spaces in the directory names. I then need to be able to do the same to get filenames.
cheers,
Andrew.
^([\-ld])(([\-r][\-w][\-xs]){3})\s+(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(((\w{3})\s+(\d{2})\s+(\d{1,2}):(\d{2}))|((\w{3})\s+(\d{1,2})\s+(\d{4})))\s+(.+)$
Arrgh, are there any regex experts who could assemble this into something that splitstring could use?
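One way to check a pattern of this kind before porting it to SplitString is to try it outside Igor. The sketch below is Python rather than Igor code, and the group names (kind, perms, links and so on) are hypothetical labels chosen for illustration. It assumes the nine-field layout seen in most of the sample lines (permissions, link count, owner, group, size, month, day, time or year, name) and lets the final group take everything after the date, so names containing spaces survive. Note that Python spells named groups (?P<name>...) where PCRE uses (?<name>...).

```python
import re

# Hypothetical group names. Assumed layout:
# perms, link count, owner, group, size, month, day, time-or-year, name.
LISTING_RE = re.compile(
    r"^\s*(?P<kind>[-ld])(?P<perms>[-rwxsStT]{9})\s+"
    r"(?P<links>\d+)\s+(?P<owner>\S+)\s+(?P<group>\S+)\s+(?P<size>\d+)\s+"
    r"(?P<month>[A-Za-z]{3})\s+(?P<day>\d{1,2})\s+"
    r"(?P<when>\d{1,2}:\d{2}|\d{4})\s+"
    r"(?P<name>.+)$"
)


def parse_line(line):
    """Return the fields of one listing line as a dict, or None if it does not match."""
    m = LISTING_RE.match(line)
    return m.groupdict() if m else None


if __name__ == "__main__":
    fields = parse_line("drwxr-x--- 2 plaus staff 81920 Mar 5 2010 my data 0001000")
    print(fields["kind"], repr(fields["name"]))
```

When carrying such a pattern over to SplitString, the Igor code later in this thread suggests the shape: plain capture groups, each backslash doubled inside the Igor string literal, and one output string per group.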
August 8, 2011 at 11:57 pm
Otherwise, I would wonder whether scanning backward from the end of each input line, and capturing the text until reaching either a YYYY or an HH:MM field, might be a useful way to pull out the directory or file name.
The one-line RegEx approach is beyond me ... that is one that I too would like to see.
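A minimal sketch of that backward-looking idea, written in Python rather than Igor, might look like the following. The function names are hypothetical. It finds the last stand-alone time (H:MM or HH:MM) or four-digit year on the line and returns whatever follows it. One caveat: a name that itself contains a stand-alone token such as 2010 would be cut short, so this is only safe when names are known not to look like that.

```python
import re

# A time (H:MM or HH:MM) or a four-digit year standing alone between whitespace.
STAMP_RE = re.compile(r"(?<=\s)(?:\d{1,2}:\d{2}|\d{4})(?=\s)")


def name_after_last_stamp(line):
    """Return the text after the last time/year token, or None if there is none."""
    line = line.rstrip()  # drop the newline so a name like "2011" is not taken for a year
    stamps = list(STAMP_RE.finditer(line))
    if not stamps:
        return None
    return line[stamps[-1].end():].lstrip()


def is_directory(line):
    """Directory lines start with 'd' in the permissions field."""
    return line.lstrip().startswith("d")
```

A four-digit size such as 4096 also matches the year pattern, which is why the last match on the line is used rather than the first.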
--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
August 9, 2011 at 05:08 am
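Function/S parseListingLine(directoryline, isdirectory)	// placeholder name; the original function header is missing from the post
	string directoryline
	variable &isdirectory	// set to 1 if the line describes a directory, 0 otherwise
	//directoryline = " dr-xr-xr-x 2 1 1 45056 Jan 6 20:11 q02ee8dddds"
	// every (\\s)+ separator is also a capture group, so 17 output strings are needed
	string a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q
	//perm
	string regex="([dwrx-]+)"
	//link count (one or more digits, so that counts such as 14 match)
	regex+="(\\s)+"
	regex+="([[:digit:]]+)"
	//owner (a number or a name, possibly containing underscores)
	regex+="(\\s)+"
	regex+="(\\S+)"
	//group
	regex+="(\\s)+"
	regex+="(\\S+)"
	//size
	regex+="(\\s)+"
	regex+="([[:digit:]]+)"
	//month
	regex+="(\\s)+"
	regex+="([[:alpha:]]{3})"
	//day
	regex+="(\\s)+"
	regex+="([[:digit:]]{1,2})"
	//time or year 00:00 or 2010
	regex+="(\\s)+"
	regex+="([[:digit:]:]{4,5})"
	//filename/directory, which may contain spaces
	regex+="(\\s)+"
	regex+="(.+)$"
	splitstring/E=regex directoryline, a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q
	a = replacestring(" ", a, "")
	// reset the flag so a non-directory line cannot inherit a stale value
	isdirectory = 0
	if(stringmatch(a[0], "d"))
		isdirectory = 1
	endif
	return q
End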
August 9, 2011 at 06:02 am
I'm using easyHttp to build a directory structure of an SFTP server, then retrieve certain files. To start, one lists the directory contents (as shown in the first posting) using something like:
easyHttp/pass="user:pass" "sftp://path.to.server.com/directory/tree/", stuff
Then you parse the directory contents (using the regex I just posted), working out what's a file and what's a directory. Then you build a tree structure of directories/files. Finally you retrieve the files with:
At the moment easyHttp is the only way of doing SFTP (amongst other things), as FetchURL and FTPDownload do not have that capability.
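The parse-then-build-a-tree workflow described above can be sketched roughly as follows. This is Python rather than Igor code, and it is only an illustration under stated assumptions: fetch_listing is a hypothetical caller-supplied function standing in for the easyHttp listing call, the function names and the dictionary layout of the tree are made up for the example, and symbolic links are simply ignored.

```python
import re

# Captures the type character and the name from an ls -l style line.
ENTRY_RE = re.compile(
    r"^\s*([-ld])\S{9}\s+\d+\s+\S+\s+\S+\s+\d+\s+"
    r"[A-Za-z]{3}\s+\d{1,2}\s+(?:\d{1,2}:\d{2}|\d{4})\s+(.+)$"
)


def split_listing(listing):
    """Return (directories, files) named in an ls -l style listing."""
    directories, files = [], []
    for line in listing.splitlines():
        m = ENTRY_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed layout
        kind, name = m.group(1), m.group(2)
        if name in (".", ".."):
            continue  # self and parent entries are not part of the tree
        if kind == "d":
            directories.append(name)
        elif kind == "-":
            files.append(name)
    return directories, files


def build_tree(path, fetch_listing):
    """Recursively map a directory path to {"files": [...], "dirs": {name: subtree}}.

    fetch_listing (hypothetical) takes a directory path ending in "/" and
    returns the raw listing text for it.
    """
    directories, files = split_listing(fetch_listing(path))
    return {
        "files": files,
        "dirs": {d: build_tree(path + d + "/", fetch_listing) for d in directories},
    }
```

Passing the fetch step in as a function keeps the parsing testable with canned listings, without touching a server.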
August 9, 2011 at 06:12 am
It doesn't use regular expressions and is C code, but it's not all that complicated and you could probably convert it to Igor code without too much difficulty.
August 9, 2011 at 08:10 am