The first problem:

I have 12,500+ emails stored as text files. Most of these files contain quoted text from previous emails, which I need to remove before the subsequent analysis of the data.

(Example of a standard email text file:)

From: …
To: …
Subject line: …
Text body: …

(If the email has quoted text, this sequence is repeated:)

From: …
To: …
Subject line: …
Text body: …
From: …
To: …
Subject line: …
Text body: …

I have a script that sieves through all the text files and counts the number of times the string ‘From:’ occurs; if it occurs more than once, the script tells me that the file needs to be abridged.
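For reference, a counting script like the one described might look roughly like this. It is only a sketch, not Ali's actual script: it assumes the emails are `.txt` files in one folder and are UTF-8 encoded, and the function names are made up for illustration. Note that `str.count` counts every ‘From:’, including any that happen to appear inside a message body.

```python
from pathlib import Path


def count_from(text: str) -> int:
    """Return how many times the string 'From:' occurs anywhere in text."""
    return text.count("From:")


def files_needing_abridgement(folder: str) -> list:
    """Return the .txt files in folder (assumed extension) with more than one 'From:'."""
    flagged = []
    for path in sorted(Path(folder).glob("*.txt")):
        # errors="replace" stops one badly encoded email from aborting the whole run
        text = path.read_text(encoding="utf-8", errors="replace")
        if count_from(text) > 1:
            flagged.append(path)
    return flagged


if __name__ == "__main__":
    sample = "From: a\nTo: b\nSubject line: hi\nText body: reply\nFrom: b\nTo: a\n"
    print(count_from(sample))  # 2
```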

What I need now is a script that deletes all text in the file from the second occurrence of the string ‘From:’ onward.
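The core of that operation is just finding the second ‘From:’ and slicing the string there. Here is a minimal sketch; the function name `truncate_at_second_from` is hypothetical, and it assumes the quoted email always begins with its own ‘From:’ line, so everything from that point on can be dropped. If ‘From:’ can also appear inside a message body, matching only at the start of a line (for example with `re.finditer(r"^From:", text, re.MULTILINE)`) would be safer.

```python
def truncate_at_second_from(text: str, marker: str = "From:") -> str:
    """Keep everything before the second occurrence of marker.

    If marker occurs fewer than two times, text is returned unchanged.
    """
    first = text.find(marker)
    if first == -1:
        return text  # no header at all
    second = text.find(marker, first + len(marker))
    if second == -1:
        return text  # only one header: nothing quoted to remove
    return text[:second]


if __name__ == "__main__":
    sample = "From: a\nTo: b\nText body: reply\nFrom: b\nTo: a\n"
    print(repr(truncate_at_second_from(sample)))  # 'From: a\nTo: b\nText body: reply\n'
```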

Many thanks to anyone who can give some insight into this problem.
Ali

Enalicho

Well, how much do you know about file manipulation? Your best bet is to:

1) Open a text file.
2) Read and store all text up to the pattern you want.
3) Write a new file containing the stored text.
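The three steps above could be sketched in Python along these lines. This is just one way to do it, under a few assumptions: the emails are `.txt` files in a single folder, they are UTF-8 encoded, and the abridged copies go to a separate output folder so the originals stay untouched. The folder names `emails` and `emails_abridged` and the function names are hypothetical. `str.split(marker, 2)` splits at most twice, so the first two pieces hold everything before the second ‘From:’.

```python
from pathlib import Path

MARKER = "From:"


def abridge_file(src: Path, dst: Path, marker: str = MARKER) -> bool:
    """Write src to dst, dropping everything from the second marker onward.

    Returns True if text was removed, False if the file was copied unchanged.
    """
    # 1) Open the text file and read it
    text = src.read_text(encoding="utf-8", errors="replace")
    # 2) Keep the text up to (not including) the second occurrence of marker
    parts = text.split(marker, 2)
    if len(parts) < 3:
        kept = text  # zero or one marker: nothing to strip
    else:
        kept = parts[0] + marker + parts[1]
    # 3) Write a new file containing only the kept text
    dst.parent.mkdir(parents=True, exist_ok=True)
    dst.write_text(kept, encoding="utf-8")
    return kept != text


def abridge_folder(src_dir: str, dst_dir: str) -> int:
    """Abridge every .txt file in src_dir into dst_dir; return how many changed."""
    changed = 0
    for src in sorted(Path(src_dir).glob("*.txt")):
        if abridge_file(src, Path(dst_dir) / src.name):
            changed += 1
    return changed


if __name__ == "__main__":
    # Hypothetical folder names; the original files are never overwritten.
    print(abridge_folder("emails", "emails_abridged"))
```

Writing to a new folder, rather than overwriting in place, means a mistake in the pattern costs nothing: you can inspect the output and simply rerun.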
