The first problem:
I have more than 12,500 emails stored as text files. Most of these files contain quoted text from previous emails, which I need to remove before I can analyse the data.
(Example of a standard email text file:)
From: …
To: …
Subject line: …
Text body: …
(If the email has quoted text this sequence is repeated:)
From: …
To: …
Subject line: …
Text body: …
From: …
To: …
Subject line: …
Text body: …
I have a script that sieves through all the text files and counts the number of times the string ‘From:’ occurs; if it occurs more than once, the script tells me that the file needs to be abridged.
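For reference, a check of this kind could be sketched in Python roughly as follows. This is not the actual script, only an illustration under some assumptions: the files have a `.txt` extension, they sit in a single folder, and every header line starts with `From:` at the beginning of the line. The names `count_markers` and `files_needing_trim` are made up for the example.

```python
from __future__ import annotations

from pathlib import Path

MARKER = "From:"  # assumed to appear at the start of each header line


def count_markers(text: str) -> int:
    """Count the lines in `text` that begin with the marker."""
    return sum(1 for line in text.splitlines() if line.startswith(MARKER))


def files_needing_trim(folder: str) -> list[Path]:
    """Return the .txt files in `folder` that contain more than one marker line."""
    flagged = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" keeps the scan going if a file has odd bytes
        text = path.read_text(encoding="utf-8", errors="replace")
        if count_markers(text) > 1:
            flagged.append(path)
    return flagged
```

Counting only lines that start with the marker avoids flagging an email whose body merely mentions "From:" in the middle of a sentence.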
The script I need should delete everything in the text file from the second occurrence of the string ‘From:’ onward, so that only the first email remains.
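A minimal sketch of one way this could be done in Python is given here, under stated assumptions: each header line starts with `From:` at the very beginning of a line (so quoted headers prefixed with `> ` would not be detected), the files are UTF-8 text with a `.txt` extension, and the trimmed copies go to a separate output folder so the originals stay untouched. The names `strip_quoted` and `strip_folder` are hypothetical.

```python
import re
from pathlib import Path

# Matches "From:" only at the start of a line (assumption about the file layout)
FROM_LINE = re.compile(r"^From:", re.MULTILINE)


def strip_quoted(text: str) -> str:
    """Return `text` cut off just before the second line starting with 'From:'."""
    matches = list(FROM_LINE.finditer(text))
    if len(matches) < 2:
        return text  # no quoted email found, leave the text unchanged
    # Keep everything before the second header, tidy trailing whitespace
    return text[:matches[1].start()].rstrip() + "\n"


def strip_folder(src: str, dst: str) -> None:
    """Write a trimmed copy of every .txt file in `src` into `dst`."""
    out = Path(dst)
    out.mkdir(parents=True, exist_ok=True)
    for path in sorted(Path(src).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        (out / path.name).write_text(strip_quoted(text), encoding="utf-8")
```

Calling `strip_folder("emails", "emails_trimmed")` (both folder names are placeholders) would process the whole collection in one pass. Writing to a new folder rather than overwriting in place makes it easy to spot-check a few results before discarding anything.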
Many thanks to anyone who can offer some insight into this problem.
Ali