
Parsing file system listings using SplitString and PCRE

andyfaff
For example, given the following:
dr-xr-xr-x 8 0 0 4096 Aug 2 00:00 .
drwxr-xr-x 14 plaus_scientists 626688 Aug 2 00:00 ..
dr-xr-xr-x 3 0 0 45056 Jan 6 2011 028
dr-xr-xr-x 3 0 0 4096 Feb 21 03:54 029
dr-xr-xr-x 4 0 0 4096 Apr 1 08:00 030
dr-xr-xr-x 4 0 0 4096 May 11 01:16 031
dr-xr-xr-x 4 0 0 53248 Jun 16 00:00 032
dr-xr-xr-x 4 0 0 57344 Aug 2 00:00 033
-rwxrwx--- 1 plaus staff 353 Nov 28 2010 FIZscan8210.itx
drwxr-x--- 2 plaus staff 81920 Mar 5 2010 0001000
drwxr-x--- 2 plaus staff 81920 Jun 28 2010 0002000
drwxr-x--- 2 plaus staff 81920 Jun 24 2010 0000000
I need to parse out all the directory names (lines starting with d), bearing in mind that directory names may contain spaces. I then need to do the same for file names (lines starting with -).
cheers,
Andrew.
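If every entry line has the same eight whitespace-separated fields before the name (permissions, link count, owner, group, size, month, day, and time or year), a regex is not strictly needed. Split each line at most eight times, and whatever remains is the name, spaces included. Below is a minimal Python sketch under that assumption. `split_listing` is a hypothetical helper name, and an owner or group containing spaces would break it.

```python
def split_listing(listing):
    """Return (directories, files) parsed from ls -l style text.

    Assumes 8 whitespace-separated fields precede each name.
    """
    directories, files = [], []
    for line in listing.splitlines():
        # maxsplit=8 leaves field 9 (the name) intact, internal spaces included
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # malformed or truncated line: skip it
        kind, name = fields[0][0], fields[8]
        if name in (".", ".."):
            continue  # current/parent directory entries
        if kind == "d":
            directories.append(name)
        elif kind == "-":
            files.append(name)
    return directories, files


sample = """dr-xr-xr-x 3 0 0 45056 Jan 6 2011 028
-rwxrwx--- 1 plaus staff 353 Nov 28 2010 FIZscan8210.itx
drwxr-x--- 2 plaus staff 81920 Mar 5 2010 my data dir"""
dirs, files = split_listing(sample)
print(dirs)   # ['028', 'my data dir']
print(files)  # ['FIZscan8210.itx']
```

Symbolic links (type `l`) are skipped here; in `ls -l` output their names also carry a ` -> target` suffix that would need separate handling.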
^([\-ld])(([\-r][\-w][\-xs]){3})\s+(\d+)\s+(\w+)\s+(\w+)\s+(\d+)\s+(((\w{3})\s+(\d{1,2})\s+(\d{1,2}):(\d{2}))|((\w{3})\s+(\d{1,2})\s+(\d{4})))\s+(.+)$
Arrgh! Are there any regex experts who could turn this into something SplitString can use?
August 8, 2011 at 11:57 pm
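One way to debug a pattern like this is to try it outside Igor first. Below is a rough Python translation using the standard `re` module. The group names are hypothetical, and Python writes named groups as `(?P<name>...)`. The two date forms are folded into one alternation (month and day, then either HH:MM or YYYY), and the day allows one or two digits so that entries such as `Aug 2 00:00` match.

```python
import re

# Rough Python translation of the PCRE pattern above (group names are hypothetical).
LS_LINE = re.compile(
    r"^(?P<type>[-ld])"                    # d = directory, - = file, l = link
    r"(?P<perms>(?:[-r][-w][-xs]){3})\s+"  # e.g. rwxr-x---
    r"(?P<links>\d+)\s+"
    r"(?P<owner>\w+)\s+"
    r"(?P<group>\w+)\s+"
    r"(?P<size>\d+)\s+"
    r"(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+"
    # recent entries show HH:MM, older ones show YYYY
    r"(?:(?P<hour>\d{1,2}):(?P<minute>\d{2})|(?P<year>\d{4}))\s+"
    r"(?P<name>.+)$"                       # rest of the line, spaces allowed
)


def classify(listing):
    """Return (directories, files) from ls -l style text, skipping . and .."""
    directories, files = [], []
    for line in listing.splitlines():
        m = LS_LINE.match(line)
        if m is None or m["name"] in (".", ".."):
            continue  # unparseable line or self/parent entry
        if m["type"] == "d":
            directories.append(m["name"])
        elif m["type"] == "-":
            files.append(m["name"])
    return directories, files


m = LS_LINE.match("-rwxrwx--- 1 plaus staff 353 Nov 28 2010 FIZscan8210.itx")
print(m["year"], m["name"])  # 2010 FIZscan8210.itx
```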
Otherwise, one approach might be to scan each input line backward, capturing text until you reach either a YYYY (year) or an HH:MM (time) field; whatever follows that field is the directory or file name.
A one-line regex solution is beyond me, but I would like to see one too.
--
J. J. Weimer
Chemistry / Chemical & Materials Engineering, UAHuntsville
August 9, 2011 at 05:08 am
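The backward scan can be sketched in Python as follows (`name_from_line` is a hypothetical name). It walks the whitespace-separated tokens from the right, stops at the first one that looks like a year or an HH:MM time, and returns everything after it as the name. The final token is never tested, so an entry named `2010` still works, and the scan does not depend on how many columns precede the date.

```python
import re

# Tokens that mark the end of the date columns: a year (YYYY) or a time (HH:MM).
DATE_TAIL = re.compile(r"\d{4}|\d{1,2}:\d{2}")


def name_from_line(line):
    """Return the entry name by scanning tokens backward to the year/time field.

    Returns None if no such field is found.
    """
    tokens = list(re.finditer(r"\S+", line))
    # The last token always belongs to the name, so start one token earlier.
    for tok in reversed(tokens[:-1]):
        if DATE_TAIL.fullmatch(tok.group()):
            # Everything after the date token, minus the separating whitespace.
            return line[tok.end():].lstrip()
    return None


print(name_from_line("drwxr-x--- 2 plaus staff 81920 Mar 5 2010 my data dir"))  # my data dir
```

The obvious weakness: a name that itself contains a standalone year or time token is cut short, so `run 2011 final` comes back as `final`.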
August 9, 2011 at 06:02 am
I'm using easyHttp to build the directory structure of an SFTP server and then retrieve certain files. To start, you list the directory contents (as shown in the first post) using something like:
Next, you parse the directory contents (using the regex I just posted) to work out which entries are files and which are directories, and build a tree of directories and files. Finally, you retrieve the files with:
At the moment, easyHttp is the only way to do SFTP (among other things), since FetchURL and FTPDownload do not support it.
August 9, 2011 at 06:12 am
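The list, parse, and build-a-tree steps can be sketched as a recursive walk. Here `list_dir` is a hypothetical callable standing in for whatever returns the listing text for a remote path (an easyHttp call in this workflow). Its real signature is not shown here, so the sketch simply takes it as a parameter. Parsing assumes eight whitespace-separated fields precede each name.

```python
from typing import Callable


def parse_entries(listing):
    """Yield (is_dir, name) pairs; assumes 8 whitespace-separated fields precede each name."""
    for line in listing.splitlines():
        fields = line.split(None, 8)
        if len(fields) == 9 and fields[8] not in (".", ".."):
            yield fields[0].startswith("d"), fields[8]


def build_tree(path: str, list_dir: Callable[[str], str]) -> dict:
    """Recursively list `path`; directories map to sub-dicts, files map to None.

    `list_dir` is a hypothetical stand-in for the call that fetches the remote
    listing text for a path; it is not a real API.
    """
    tree = {}
    for is_dir, name in parse_entries(list_dir(path)):
        child = path.rstrip("/") + "/" + name
        tree[name] = build_tree(child, list_dir) if is_dir else None
    return tree


# Usage with an in-memory fake server instead of a real SFTP connection.
fake = {
    "/": "drwxr-xr-x 2 u g 4096 Mar 5 2010 runs\n-rw-r--r-- 1 u g 353 Nov 28 2010 notes.txt",
    "/runs": "-rw-r--r-- 1 u g 10 Jan 6 2011 FIZscan8210.itx",
}
print(build_tree("/", fake.__getitem__))
# {'runs': {'FIZscan8210.itx': None}, 'notes.txt': None}
```

Only `d` entries are recursed into, so symbolic links end up as leaves and cannot send the walk into a loop.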
It's C code and doesn't use regular expressions, but it isn't very complicated, so you could probably convert it to Igor code without much difficulty.
August 9, 2011 at 08:10 am