On 03/05/2006, at 15:40, x-unix-request at listserver.themacintoshguy.com wrote:

> I'm trying to write a bash script for something I thought would be
> simple, but haven't been able to figure it out.
>
> I have some files that are essentially text files, but have binary
> data in them. For instance, using grep I need to use the "-a" option
> to get any output.
>
> When one opens these files in a text editor you see a readable text

Hmm, sounds like they might be UTF-8? Such files look like normal text
files except for some characters which aren't ASCII. In fact, UTF-8 and
UTF-16 text files will often (but not always; XML files, for example,
may not) start with a special sequence, the byte order mark, which
tells the reading software which encoding the file uses.

As to the end, this could be some kind of 'signature' in a non-ASCII
code. Try opening the files in TextEdit with an encoding other than the
default, just to see. Or use Xcode to reinterpret the file contents.

You can use od on the command line to look at the byte sequences
('od -b' for octal byte output, 'od -c' for character output, where
non-displayable characters are shown in octal or as a special escape
code).

Once you know exactly what you have and what you want to throw away, it
should be quite simple to use sed, awk, or perl to get the job done.
You could also just delete any characters less than a space with
tr -d 'character set' (taking care to leave newline and tab out of the
set, since both are also less than a space).

If it is just the first line, and then everything from a well-known
point onward, the sed command would be:

    sed '1,1d; /pattern/,$d' file >new_file

where pattern is a unique pattern; the line containing it and all lines
after it are discarded. You could also use a substitution if you want
to keep the matching line itself and discard only what follows it.

There are, of course, many other ways of accomplishing the same!

Robert

> file. However, the first line of the file is a few "garbage"
> characters, and the last 1-10 pages are all "garbage" binary
> characters.
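
For what it's worth, here is a rough sketch of a variant that keeps the line containing the end marker instead of deleting it. It uses sed's q command (print the current line, then quit) rather than the range delete shown in the reply. The function name strip_garbage and the marker END-OF-TEXT are made up for illustration, and the sketch assumes the marker is a plain string with no regex metacharacters or slashes. I have not tried it on files containing NUL bytes, where sed implementations may differ.

```shell
#!/bin/sh
# strip_garbage FILE MARKER   (hypothetical helper, not from the thread)
# Prints FILE without its first line, stopping after the first line
# that contains MARKER. The marker line itself is kept, including
# anything that follows the marker on that same line.
strip_garbage() {
    # LC_ALL=C asks sed to treat the input as plain bytes, so stray
    # non-ASCII bytes are not rejected as invalid multibyte text.
    #   1d         delete the first line
    #   /MARKER/q  print the matching line, then quit
    LC_ALL=C sed -e '1d' -e "/$2/q" "$1"
}
```

It would be used along the lines of: strip_garbage file 'END-OF-TEXT' >new_file
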
>
> All I'd like to do is make a script that would strip off the first
> line and then remove all the garbage characters from the end of the
> file. The text of the files always ends with the same set of
> characters so I had hoped to find a way to basically do something
> like:
>
> delete from the end of $EndString to the end of the file
>
> where $EndString would be the last text I want to keep and is unique
> in the file.
>
> The fact that the file has binary data may make line counting hard,
> grep didn't seem to be able to return a line number.
>
> Ben

Departement Informatik FGB    tel. +41 (0)61 267 14 66
Universität Basel             fax. +41 (0)61 267 14 61
Robert Frank
Klingelbergstrasse 50         Robert.Frank at unibas.ch
CH-4056 Basel
Switzerland                   http://www.informatik.unibas.ch/personen/frank_r.html