The first problem:

I have 12500+ emails stored as text files. The majority of these text files contain quoted text from previous emails which I need to remove for the subsequent analysis of the data.

(Example of a standard email text file:)

From: …
To: …
Subject line: …
Text body: …

(If the email has quoted text this sequence is repeated:)

From: …
To: …
Subject line: …
Text body: …
From: …
To: …
Subject line: …
Text body: …

I have a script that goes through all the text files and counts the number of times the string ‘From:’ occurs; if it occurs more than once, the script tells me that the file needs to be abridged.

What I need is a script that removes all text in the file from the second occurrence of the string ‘From:’ onward.

Many thanks to anyone who can offer some insight into this problem.
Ali

Last Post by Enalicho
Well, how much do you know about file manipulation? Your best bet is to:

1) Open the text file
2) Read and store all text up to the pattern you want (here, the second ‘From:’)
3) Write a new file containing the stored text
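
The three steps above could be sketched in Python roughly as follows. This is only a minimal sketch under a few assumptions that the thread does not confirm: the files are UTF-8 text, the marker `From:` always sits at the start of a line, and the function names `strip_quoted` and `abridge_file` are made up for illustration.

```python
from pathlib import Path

MARKER = "From:"  # header string named in the question


def strip_quoted(text: str, marker: str = MARKER) -> str:
    """Return the text before the second line that starts with marker.

    Assumption: the marker begins a line; a 'From:' in the middle of a
    sentence is not counted.
    """
    kept = []
    seen = 0
    for line in text.splitlines(keepends=True):
        if line.startswith(marker):
            seen += 1
            if seen == 2:
                break  # everything from here on is treated as quoted mail
        kept.append(line)
    return "".join(kept)


def abridge_file(src: Path, dst: Path) -> None:
    """Step 1: open src. Step 2: keep text up to the pattern. Step 3: write dst."""
    text = src.read_text(encoding="utf-8", errors="replace")  # assumed encoding
    dst.write_text(strip_quoted(text), encoding="utf-8")
```

Writing to a separate output file, as the steps suggest, leaves the original emails untouched in case the marker matches somewhere unexpected. To handle the whole collection, one could call `abridge_file` in a loop over something like `Path(folder).glob("*.txt")`, where `folder` is whatever directory holds the emails.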
