[X-Unix] question about 'find'
David Ledger
dledger at ivdcs.demon.co.uk
Wed Nov 30 16:23:41 PST 2005
>From: "Stewart C. Russell" <scruss at scruss.com>
>Eugene wrote:
>> Sending the list of filenames to xargs(1) is always nice,
>> especially because it is much more efficient and speedier.
>
>You do have to be careful with xargs if you're processing a *lot* of
>arguments. BSD xargs has a default argument limit of 5000 (hidden in the
>'-n' para of the manual), so if you're calling xargs through sort (say),
>the results may not be what you expect.
>
>I think (but could be wrong) that the limit for GNU xargs might be
>different, so cross-platform scripters should be wary. I was going to
>give an example, but OS X doesn't include seq(1), to my dismay.
That's not what it means.
As a pedantic sidenote: the arguments to xargs are what you type on
its command line. The stream of data on stdin (pathnames, if fed by
find) is not arguments to xargs; it becomes arguments to the
sub-command.
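To illustrate the distinction, a small sketch (echo stands in for the sub-command; the item names are made up):

```shell
# xargs' own arguments here are '-n', '1', 'echo' and 'got'.
# The stdin stream (x, y) supplies the sub-command's arguments.
printf '%s\n' x y | xargs -n 1 echo got
# prints: got x
#         got y
```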
There is a maximum command line length. If a shell wildcard expansion
would blow this limit, the shell tells you with 'Argument list too
long' or similar. It's this limit that stops you writing
file $(find /)
or similar.
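The limit is system-dependent; on POSIX systems you can query it (in bytes) with getconf:

```shell
# Print the maximum combined length of the arguments and environment
# passed to exec, in bytes. The value varies from system to system.
getconf ARG_MAX
```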
xargs avoids this by taking the data it receives on stdin and running
the sub-command with as many of those data items as arguments as it
thinks the command can handle. When one invocation of the sub-command
has finished, it invokes another with the next batch of data as
arguments, and so on until the data stream is exhausted. On OS X,
xargs assumes all commands can cope with 5000 arguments unless you
specify otherwise with '-n <number>'.
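A small illustration of the batching, with a toy batch size forced via '-n':

```shell
# Seven items on stdin and a batch size of 3, so xargs invokes
# echo three times: twice with 3 arguments, once with the remainder.
printf '%s\n' a b c d e f g | xargs -n 3 echo
# prints: a b c
#         d e f
#         g
```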
At one time in BSD, there was a real 1024-character command line
buffer limit, and xargs built up each invocation of the sub-command
with sufficient arguments to come as near as possible to 1024
characters without overflowing. What we have now is similar, but
solves a technically slightly different problem.
>> However, unlike other flavors, OS X filenames tend to contain
>> whitespace characters that break traditional Unix tools ...
>
>And many others, such as brackets/braces. I assume (hope!) that '/' is
>no longer legal, for it was the scourge of the Solaris-based CAP/AUFS admin.
>
>What's list wisdom on methods of dealing with special characters?
Conventional Unix wisdom is 'If you put spaces or other special
characters in a file name you will get what you deserve' (i.e., chaos).
Unfortunately we can't take that approach.
I define a shell function which I call fnf (FileName Fix):
fnf() { sed "s/\([ '\"]\)/\\\\\1/g"; }
which, if fed pathnames one per line on stdin, squirts out the same
names but with spaces, tabs, and single and double quotes
backslash-quoted. I then just use this when I need it. (The
whitespace after the '[' is <space><tab>). When I find I need '$',
'\' or any of the others, I'll work out how to quote them in the
function and add them.
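For illustration, here is a copy of the function (single-quoted, and without the tab, which rarely survives email) and a sample run on made-up filenames:

```shell
# fnf: backslash-quote spaces, single quotes and double quotes in
# pathnames read one per line on stdin (tab omitted here for clarity).
fnf() { sed 's/\([ '\''"]\)/\\\1/g'; }

printf '%s\n' 'a file.txt' "it's here.txt" | fnf
# prints: a\ file.txt
#         it\'s\ here.txt
```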
> I find
>'while' to be a somewhat useful construct, as in:
><command> | while read file
>do
> # do something based on $file
>done
If you mean piping the stream into a while loop and invoking the
sub-command once per item, that's even less efficient than
'-exec'. xargs is the rough equivalent of
<command> | {
    i=0
    files=""
    while read file; do
        files="$files $file"
        i=$((i + 1))
        [ $i -lt 5000 ] && continue
        sub-command $files
        files=""
        i=0
    done
    # don't forget the final, partial batch
    [ -n "$files" ] && sub-command $files
}
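The same batching idea can be exercised with a toy batch size and echo standing in for the sub-command (the item names are illustrative):

```shell
# Batch size 3, echo as the sub-command: two invocations result.
printf '%s\n' one two three four five | {
    i=0
    files=""
    while read file; do
        files="$files $file"
        i=$((i + 1))
        [ $i -lt 3 ] && continue
        echo $files
        files=""
        i=0
    done
    # flush the final, partial batch
    [ -n "$files" ] && echo $files
}
# prints: one two three
#         four five
```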
David
--
David Ledger - Freelance Unix Sysadmin in the UK.
Chair of HPUX SysAdmin SIG of hpUG technical user group (www.hpug.org.uk)
david.ledger at ivdcs.co.uk
www.ivdcs.co.uk