Parsing file system listings using SplitString and PCRE

Does anyone have code that can parse directory listings?

For example, given the following:

dr-xr-xr-x 8 0 0 4096 Aug 2 00:00 .
drwxr-xr-x 14 plaus_scientists 626688 Aug 2 00:00 ..
dr-xr-xr-x 3 0 0 45056 Jan 6 2011 028
dr-xr-xr-x 3 0 0 4096 Feb 21 03:54 029
dr-xr-xr-x 4 0 0 4096 Apr 1 08:00 030
dr-xr-xr-x 4 0 0 4096 May 11 01:16 031
dr-xr-xr-x 4 0 0 53248 Jun 16 00:00 032
dr-xr-xr-x 4 0 0 57344 Aug 2 00:00 033
-rwxrwx--- 1 plaus staff 353 Nov 28 2010 FIZscan8210.itx
drwxr-x--- 2 plaus staff 81920 Mar 5 2010 0001000
drwxr-x--- 2 plaus staff 81920 Jun 28 2010 0002000
drwxr-x--- 2 plaus staff 81920 Jun 24 2010 0000000


I need to be able to parse out all the directory names (lines starting with d), bearing in mind that there may be spaces in the directory names. I then need to be able to do the same to get filenames.
cheers,
Andrew.
From reading around, the (C#) regex for this would be something along the lines of:

^(?[\-ld])(?([\-r][\-w][\-xs]){3})\s+(?\d+)\s+(?\w+)\s+(?\w+)\s+(?\d+)\s+(?((?\w{3})\s+(?\d{2})\s+(?\d{1,2}):(?\d{2}))|((?\w{3})\s+(?\d{1,2})\s+(?\d{4})))\s+(?.+)$

Arrgh, are there any regex experts who could assemble this into something that splitstring could use?
Are you perhaps working at the shell level? A quick search on "parse directory listings unix" might help you put together a shell command that does the work via ExecuteScriptText, for example ... http://www.unix.com/shell-programming-scripting/74630-parsing-directory…

Otherwise, I would wonder if an approach that looked backward on a given input line and captured the text until finding either a YYYY or an HH:MM field might be useful to pull out the directory or file name.
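
A minimal sketch of that idea in Igor, for what it is worth. It is untested against a real server, the function name NameFromListingLine is made up for illustration, and it assumes English three-letter month abbreviations. Rather than scanning backward, it lets the regex find the first month, day and time-or-year group and keeps whatever follows:

```igor
// Hypothetical helper (not from any library): returns the file or directory
// name from one UNIX-style listing line, or "" if no date field is found.
Function/S NameFromListingLine(line)
	String line

	String name = ""
	// month abbreviation, day of month, then either HH:MM or YYYY
	String regex = "\\s(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
	regex += "\\s+[[:digit:]]{1,2}"
	regex += "\\s+(?:[[:digit:]]{1,2}:[[:digit:]]{2}|[[:digit:]]{4})"
	// everything after the date field is the name, spaces included
	regex += "\\s+(.+)$"

	SplitString/E=regex line, name
	if (V_flag != 1)
		return ""	// the line did not contain a recognisable date field
	endif
	return name
End
```

Because only the last group is a capturing one, a single output string is enough. It would be fooled only if the fields ahead of the date happened to look like a month, a day and a year themselves.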

The one-line RegEx approach is beyond me ... that is one that I too would like to see.

--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
Oh my gosh, this seems to work:
Function/s test(directoryline, isdirectory)
string directoryline
variable &isdirectory

//directoryline = " dr-xr-xr-x 2 1 1  45056 Jan 6 20:11 q02ee8dddds"
string a, b, c,d,e,f,g,h,i, j,k,l,m,n,o,p,q
//perm
string regex="([dwrx-]+)"
//number of links (\S+ rather than a single digit, so that "14" matches)
regex+="(\\s)+"
regex+="(\\S+)"
//owner (\S+ also accepts names such as plaus_scientists)
regex+="(\\s)+"
regex+="(\\S+)"
//group
regex+="(\\s)+"
regex+="(\\S+)"

//size
regex+="(\\s)+"
regex+="([[:digit:]]+)"

//month
regex+="(\\s)+"
regex+="([[:alpha:]]{3})"

//day
regex+="(\\s)+"
regex+="([[:digit:]]{1,2})"

//time or year 00:00 or 2010
regex+="(\\s)+"
regex+="([[:digit:]:]{4,5})"

//filename/directory
regex+="(\\s)+"
regex+="(.+)$"
splitstring/E=regex directoryline, a, b, c,d,e,f,g,h,i,j,k,l,m,n,o,p,q

a = replacestring(" ", a, "")
//set the flag both ways, so a value left over from an earlier call cannot leak through
isdirectory = stringmatch(a[0], "d")
return q
End
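
For completeness, a hedged sketch of how test() might be driven over a whole listing. SortListing is a hypothetical name, the output lists are semicolon-separated (so it assumes no name contains a semicolon), and it is untested:

```igor
// Hypothetical wrapper around test(): sorts one listing into two
// semicolon-separated lists, one of directories and one of files.
Function SortListing(listing, dirs, files)
	String listing
	String &dirs
	String &files

	Variable i, n, isdir
	String name

	dirs = ""
	files = ""
	listing = ReplaceString("\r", listing, "")	// tolerate CRLF line endings
	n = ItemsInList(listing, "\n")
	for (i = 0; i < n; i += 1)
		isdir = 0	// reset before every call
		name = test(StringFromList(i, listing, "\n"), isdir)
		// skip lines that did not parse, and the "." and ".." entries
		if (strlen(name) == 0 || CmpStr(name, ".") == 0 || CmpStr(name, "..") == 0)
			continue
		endif
		if (isdir)
			dirs += name + ";"
		else
			files += name + ";"
		endif
	endfor
End
```

Calling it with the string returned by the server and two empty strings should leave the directory names in the first and the file names in the second.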
jjweimer wrote:
Are you perhaps working at the shell level?


I'm using easyHttp to build a directory structure of an sftp server, then retrieve certain files. To start with, one lists the directory contents (as shown in the first posting) using something like:
string stuff = ""
easyHttp/pass="user:pass" "sftp://path.to.server.com/directory/tree/", stuff


Then you parse the directory contents (using the regex I just posted), working out what's a file and what's a directory. Then you build a tree structure of directories/files. Finally you retrieve the files with:
easyHttp/pass="user:pass"/FILE="foobar:andrew:Desktop:myfile.jpg" "sftp://path.to.server.com/directory/tree/myfile.jpg"


At the moment easyHttp is the only way of doing SFTP (amongst other things), as FetchURL and FTPDownload do not have that capability.
The parsing of the output of the FTP LIST command is done within Igor (for the FTP* operations) using a slightly modified version of the code available at http://cr.yp.to/ftpparse.html

It doesn't use regular expressions and is C code, but it's not all that complicated and you could probably convert it to Igor code without too much difficulty.
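
For anyone who wants to avoid regular expressions altogether, here is a rough token-scanning sketch in Igor. It is not a port of ftpparse, only the same general idea applied to the UNIX-style lines in the first post: walk the space-separated fields until a month abbreviation turns up, skip the day and the time-or-year, and take the rest of the line as the name. The function name is hypothetical, fields are assumed to be separated by spaces (not tabs), and it is untested.

```igor
// Hypothetical helper: returns the name field of a UNIX-style listing line
// without using regular expressions, or "" if no month field is found.
Function/S NameWithoutRegex(line)
	String line

	String months = "Jan;Feb;Mar;Apr;May;Jun;Jul;Aug;Sep;Oct;Nov;Dec;"
	String tok
	Variable len = strlen(line)
	Variable pos = 0, tokStart
	Variable fieldsToSkip = -1	// stays -1 until a month name has been seen

	do
		// skip the blanks in front of the next field
		do
			if (pos >= len || CmpStr(line[pos], " ") != 0)
				break
			endif
			pos += 1
		while (1)
		if (pos >= len)
			return ""	// ran out of text before reaching a name
		endif
		if (fieldsToSkip == 0)
			return line[pos, len - 1]	// rest of the line, spaces included
		endif
		// read one field
		tokStart = pos
		do
			if (pos >= len || CmpStr(line[pos], " ") == 0)
				break
			endif
			pos += 1
		while (1)
		tok = line[tokStart, pos - 1]
		if (fieldsToSkip > 0)
			fieldsToSkip -= 1	// this field was the day or the time/year
		elseif (WhichListItem(tok, months) >= 0)
			fieldsToSkip = 2	// day and time-or-year still precede the name
		endif
	while (1)
	return ""
End
```

An owner or group that is literally named after a month would trip this up, so treat it as a starting point rather than a finished parser.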