[X-Unix] question about 'find'

David Ledger dledger at ivdcs.demon.co.uk
Wed Nov 30 16:23:41 PST 2005


>From: "Stewart C. Russell" <scruss at scruss.com>
>Eugene wrote:
>>  Sending the list of filenames to xargs(1) is always nice,
>>  especially because it is much more efficient and speedier.
>
>You do have to be careful with xargs if you're processing a *lot* of
>arguments. BSD xargs has a default argument limit of 5000 (hidden in the
>'-n' para of the manual), so if you're calling xargs through sort (say),
>the results may not be what you expect.
>
>I think (but could be wrong) that the limit for GNU xargs might be
>different, so cross-platform scripters should be wary. I was going to
>give an example, but OS X doesn't include seq(1), to my dismay.

That's not what it means.

As a pedantic sidenote, the arguments to xargs are what you type as 
arguments on its command line. The stream of data (pathnames, if fed 
by find) is not arguments to xargs itself.

There is a maximum command line length. If a shell wildcard expansion 
would blow this limit, the shell tells you with 'Argument list too 
long' or similar. It's this limit that stops you writing
     file $(find /)
or similar.
xargs avoids this by taking the data it receives on stdin and running 
the sub-command with as many data items as arguments as it thinks it 
can handle. When one invocation of the sub-command has finished, it 
invokes another with the next batch of data as arguments, and so on 
until the data stream is finished. On OS X, xargs thinks all commands 
can cope with 5000 arguments unless you specify otherwise with '-n 
<number>'.
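The batching is easy to see if you force tiny batches with '-n' (echo 
here just stands in for whatever sub-command you'd really run):

```shell
# Feed six words to xargs; with -n 2 it invokes echo three times,
# two arguments per invocation.
printf '%s\n' a b c d e f | xargs -n 2 echo
# -> a b
#    c d
#    e f
```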

At one time in BSD, there was a real 1024-character command line 
buffer limit, and xargs built up each invocation of the sub-command 
with enough arguments to come as near as possible to 1024 characters 
without overflowing. What we have now is similar, but solves a 
technically slightly different problem.
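The current limit covers the combined size of the argument and 
environment strings passed to exec, and it varies by platform; on any 
POSIX system you can ask for it with getconf (the number you get back 
will differ from system to system):

```shell
# Print the maximum combined length of argv plus the environment,
# in bytes, for the system this runs on.
getconf ARG_MAX
```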

>>  However, unlike other flavors, OS X filenames tend to contain
>>  whitespace characters that break traditional Unix tools  ...
>
>And many others, such as brackets/braces. I assume (hope!) that '/' is
>no longer legal, for it was the scourge of the Solaris-based CAP/AUFS admin.
>
>What's list wisdom on methods of dealing with special characters?

Conventional Unix wisdom is 'If you put spaces or other special 
characters in a file name you will get what you deserve' (ie, chaos). 
Unfortunately we can't take that approach.

I define a shell function which I call fnf (FileName Fix):
     fnf() { sed "s/\([      '\"]\)/\\\\\1/g"; }
which, if fed pathnames one per line on stdin, squirts out the same 
but with spaces, tabs, single and double quotes backslash quoted. I 
then just use this when I need to. (The whitespace after the '[' is 
<space><tab>). When I find I need '$', '\' or any of the others I'll 
work out how to quote them in the function and add them.
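As a quick check of fnf (the sample filename is just for illustration; 
as above, the character class contains a literal space and a literal 
tab):

```shell
# fnf (FileName Fix): backslash-quote space, tab, single quote
# and double quote in each line of stdin.
fnf() { sed "s/\([ 	'\"]\)/\\\\\1/g"; }

# A name containing a space comes out backslash-quoted:
printf '%s\n' "My File.txt" | fnf
# -> My\ File.txt
```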

>  I find
>'while' to be a somewhat useful construct, as in:
><command> | while read file
>do
>   # do something based on $file
>done

If you mean piping the stream into a while loop and invoking the 
sub-command once for each item, that's even less efficient than 
'-exec'.  xargs is the rough equivalent of
     <command> | {
         i=0
         files=""
         while read file; do
             files="$files $file"
             i=$((i + 1))
             [ $i -lt 5000 ] && continue
             sub-command $files
             files=""
             i=0
         done
         # don't forget the final, partial batch
         [ -n "$files" ] && sub-command $files
     }
(the braces keep the leftover-batch check in the same subshell as the 
loop, since the pipe runs the loop in a subshell in most shells).
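As a side note on the whitespace question above: GNU and BSD 
(including OS X) find and xargs both support NUL-terminated records, 
which avoid the quoting problem entirely; the /tmp path and '*.txt' 
pattern here are only placeholders:

```shell
# -print0 terminates each pathname with a NUL byte instead of a
# newline; xargs -0 splits only on NUL, so spaces, quotes and even
# newlines embedded in filenames pass through untouched.
find /tmp -name '*.txt' -print0 | xargs -0 ls -ld
```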

David


-- 
David Ledger - Freelance Unix Sysadmin in the UK.
Chair of HPUX SysAdmin SIG of hpUG technical user group (www.hpug.org.uk)
david.ledger at ivdcs.co.uk
www.ivdcs.co.uk
