A few weeks back, prolific anti-spam researcher John Graham-Cumming announced a new site, SpamOrHam.org, where you can donate your time to spam research. That's right: visitors to the site can spend a few minutes (or hours, or weeks, as you deem fit) looking at messages one at a time and judging whether each is spam or ham (a legitimate, non-spam message), thereby helping The Cause. This is important because (a) it helps improve the quality of the collection, and (b) coincidentally, it helps build a benchmark for how good or bad actual humans really are at making this judgment. The preliminary results are already interesting.
(Some paranoid visitors might be slightly disturbed by the repeated display of CAPTCHA images, but maybe it's just a glitch. Pay no attention to the man with the British accent behind the curtain. Actually, there's apparently a good reason for those.)
The actual messages which you are being asked to classify are from the Enron corpus. Enron Corporation, if you recall, was briefly the biggest bankruptcy of all time (until it was trumped by WorldCom) — incidentally, the sentences for the former CEOs were announced just last week. Anyway, as part of the Enron investigations, the US Federal Energy Regulatory Commission seized Enron's email records and released them to the public in 2003. Although the database is free for anyone to download and browse, the interface at SpamOrHam actually makes it much easier (for one thing, you don't have to download several hundred megabytes of compressed data to your local hard drive).
Now if you dive in and start clicking, you will find that many spammers have not changed their basic modus operandi at all during the last five years: the Viagra and stock spams are the same ones we are seeing today (though the obfuscation techniques may have evolved a little bit). Also, a surprising number of phishing messages date from as early as 2001.
What's more, amongst all the spam and business email, there is a healthy dose of office flirtation, class reunion planning, and various family-related email. This is probably highly typical of organizations which do not have a strict policy on separating business and private email.
The moral? Maybe you want to keep your work email separate from your private communications, unless you want to risk having them exposed in a similar investigation one day. (A subpoena is a subpoena even if you're innocent!)