On 28 Sep, 2005, at 16:07, Doug McNutt intoned:

> At 15:16 -0400 9/28/05, Richard Nagle wrote:
>
>> What would be the fastest way,
>> of splitting a large text document into separate doc,
>> via a common identifier "From ???@???"
>>
>> This From ???@???, is at the top of every email document,
>> inside this large single text file.
>> this file contains about 15,000 emails.
>
> That sounds like a job for perl.
>
> I was fussing with mailboxes produced by Eudora that look a lot
> like that. (This list for instance.) I was collecting and sorting
> by Subject: line to make one file for each subject. It's been
> a while since I looked at it. When I realized I needed to handle
> "Antwort", "AW", and others I got discouraged. Ask off line if
> you'd like a copy of my last known perl script.

The original file sounds like it is in standard "mbox" format. The first line of each message starts with "From " (the word "From" followed by a space; this is the envelope line, not the "From:" header). Each message is separated from the next by a single blank line above the next "From " line. (This is the format used by Apple's Mail.app to store messages.)

Briefly, "mbox" (all messages in one file) is the standard unix "mail" program format, while one file per message is the standard format for the "mh" program. These two formats have existed "forever" and there are any number of tools around to convert from one to the other.

That being the case, there are a ton of "digestifiers" and "undigestifiers" floating around on the net (since the beginning of the ARPAnet) which will either combine single messages into one file or take the single file and split it into one file per message.

T.T.F.N.
William H. Magill
# Beige G3 [Rev A motherboard - 300 MHz 768 Meg] OS X 10.2.8
# Flat-panel iMac (2.1) [800MHz - Super Drive - 768 Meg] OS X 10.4.1
# PWS433a [Alpha 21164 Rev 7.2 (EV56)- 64 Meg] Tru64 5.1a
# XP1000 [Alpha 21264-3 (EV6) - 256 meg] FreeBSD 5.3
# XP1000 [Alpha 21264-A (EV 6.7) - 384 meg] FreeBSD 5.3
magill at mcgillsociety.org
magill at acm.org
magill at mac.com
whmagill at gmail.com
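
P.S. As a rough illustration of the mbox layout described above, here is a minimal Python sketch of an "undigestifier". It is not any particular tool from the net. The function names `split_mbox` and `write_messages` and the output names `msg00001.txt` etc. are made up for this example. It assumes a new message starts at a line beginning with "From " that is either the first line of the file or preceded by a blank line. If the mailbox lacks those blank lines, drop the `prev_blank` check.

```python
from pathlib import Path


def split_mbox(text):
    """Split mbox-style text into a list of message strings.

    Assumption: a message starts at a line beginning with "From "
    that is at the top of the file or follows a blank line.
    """
    messages = []
    current = []
    prev_blank = True  # treat the start of the file like a blank line
    for line in text.splitlines(keepends=True):
        if line.startswith("From ") and prev_blank and current:
            # Separator found: close the message collected so far.
            messages.append("".join(current))
            current = []
        current.append(line)
        prev_blank = line.strip() == ""
    if current:
        messages.append("".join(current))
    return messages


def write_messages(messages, out_dir):
    """Write each message to out_dir as msg00001.txt, msg00002.txt, ...

    The file naming is hypothetical; returns the list of paths written.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, msg in enumerate(messages, start=1):
        path = out / f"msg{i:05d}.txt"
        path.write_text(msg)
        paths.append(path)
    return paths
```

To use it, read the big file into a string, pass it to `split_mbox`, and hand the result to `write_messages` along with a directory name. Any text before the first "From " line ends up in the first output file. A body line that starts with "From " right after a blank line would be mistaken for a separator, which is why mbox writers usually escape such lines as ">From ". Python also ships a standard `mailbox` module that reads mbox files, which is probably the better choice for anything beyond a quick one-off.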