In my day job, I have access to a metric shitload of data on SEO campaigns. We're talking thousands of websites here (I oversee campaigns now, rather than getting directly involved in them), from a European budget airline, to small businesses, to the usual spammy subjects like payday loans and stuff. The factors below are ones I have seen deindex sites, and I am pretty certain they cause deindexation: the complete removal of a site from Google.
Is this a complete list? No. Is it everything Google says causes deindexation? No, because some things Google says it penalises I have been doing for years without any trouble (paid links? Fine.). Likewise, certain things Google doesn't say much about are pretty dangerous. Anyway, here, in some sort of order, are the things that can quite royally screw up your site.
1. Malware – I've yet to see anything ruin a site more systematically than getting hit by malware: dodgy software hosted on your site, either put there deliberately or planted by a hacker. I've seen sites disappear from the rankings in about a week, usually with a day or two's warning.

[Image: About to go down (no pun intended)]
How To Find This: Sign up for Google Webmaster Tools; you will get advance warning of this in the dashboard. Also make sure you have the common email addresses set up (see Google Notices) so that you receive an email notification as well.
How To Fix This: Log in, remove the malware from the site, change your FTP passwords, and breathe easier.
2. Noindex/Blocking The Site – Sites in development often block search engines, either through robots.txt or by sticking a "noindex" meta tag in the page's head (or, if you're building a WordPress blog, by ticking the "we don't want our blog appearing in search engines" box). However, many webmasters forget to remove the block when the site goes live. Please check this.
How To Find This: Again, Google Webmaster Tools will notify you, though more subtly. If you have an XML sitemap (and you should; Yoast's WordPress SEO plugin can create one), Webmaster Tools should show that your XML sitemap is blocked by robots.
How To Fix This: Remove the noindex tag. Check your robots.txt. Throw the web team in a river.
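If you'd rather not wait for Webmaster Tools to tell you, the check itself is easy to script. Below is a minimal Python sketch, assuming you've already fetched the robots.txt text, the page's HTML and (optionally) its X-Robots-Tag response header yourself. The function names are hypothetical; the X-Robots-Tag header check is an addition of mine, since a noindex can be sent that way as well as in a meta tag. It uses only the standard library's urllib.robotparser and html.parser.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


def blocked_by_robots(robots_txt: str, url: str, user_agent: str = "Googlebot") -> bool:
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(user_agent, url)


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        if attr_map.get("name", "").lower() in ("robots", "googlebot"):
            content = attr_map.get("content", "")
            self.directives.extend(d.strip().lower() for d in content.split(","))


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page's robots meta tags or X-Robots-Tag header contain 'noindex'."""
    parser = _RobotsMetaParser()
    parser.feed(html)
    header_directives = [d.strip().lower() for d in x_robots_tag.split(",")]
    return "noindex" in parser.directives or "noindex" in header_directives
```

For example, a leftover development robots.txt of `User-agent: *` followed by `Disallow: /` makes `blocked_by_robots` return True for every URL on the site, which is exactly the mistake described above.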
3. Duplicate Content – I came across a site that was largely duplicate content. It's not impossible to rank with it (and duplicate content can serve a purpose by being engaging and useful: think of the iPad's technical specifications, or last week's football results), but it makes your life hard. Accidental duplicate content (WordPress in particular, with its post replication: the same post appearing under both the category-based URL structure and the date-based one) isn't so bad, but cross-domain duplicate content can really screw up your rankings. Scrapers aren't really a huge issue; what concerns me more is a page with strong links (such as a Digg submission) outranking your own content. My advice is that whenever you submit a link to Digg or any indexable page on a social network, don't use the first line of your post. Meta descriptions (if they differ from the first line of your post) are fine, though.
How To Find This: Run your pages through a tool such as Copyscape and see whether it flags up any copies of your text.

[Image: Cheeky sod...]
How To Fix This: Rewrite whatever duplicated content you can. Don't worry too much about scrapers: embed an RSS footer (Yoast's WordPress SEO plugin can do this) so that scraped copies carry links back to you. Google will usually treat you as the authoritative source of the article.
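If you want a rough feel for how much two pages overlap before reaching for Copyscape, a common approach is to split each text into overlapping word sequences ("shingles") and compare the sets. The Python sketch below is just that generic technique, not how Copyscape itself works; the function names and the default shingle size of 5 words are my own assumptions.

```python
import re


def shingles(text: str, size: int = 5) -> set[tuple[str, ...]]:
    """Return the set of overlapping `size`-word sequences in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    # If the text has fewer than `size` words, the range is empty and so is the set.
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap(a: str, b: str, size: int = 5) -> float:
    """Jaccard similarity of the two texts' shingle sets: 0.0 = nothing shared, 1.0 = identical."""
    sa, sb = shingles(a, size), shingles(b, size)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)
```

A score close to 1.0 suggests a near-verbatim copy. Where you draw the line below that is a judgement call, and short texts with only a handful of shingles will give noisy scores.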
4. Spam Links ON Your Site – Spammy links pointing to your site don't really do that much damage (despite what Google says); otherwise it would be easy and cost-effective to sabotage a competitor's site just by pushing thousands of dodgy links at it. Hundreds of spammy links on your site pointing to other websites, however, can lead to deindexation (though it can take a few months of attack for that to happen). A few blog comments that slip through the net won't cause you problems, but keep an eye on forums, particularly unmoderated ones. These are prime targets.
How To Find This: Ironically, a great way to spot an attack is a sudden influx of search traffic in Google Analytics for a variety of phrases, many of them unrelated to your site. Here is an example of the sort of increase in search traffic you can see before getting hit.

[Image: Ooooh shit...]
How To Fix This: In Google Analytics, find which pages are attracting the unrelated traffic, then close the exploit. It's usually forums or comment forms.
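Analytics tells you where the odd traffic is landing; to see what has actually been injected, you can count outbound links per page. Below is a minimal Python sketch, assuming you've already downloaded the HTML of your forum and comment pages into a dict keyed by URL. The function names and the default threshold of 20 external links are hypothetical, so tune the threshold to what's normal for your site. It treats `www.` and bare hostnames as the same site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class _LinkCollector(HTMLParser):
    """Collects the href of every <a> tag in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def _host(url: str) -> str:
    """Hostname without a leading 'www.', so example.com and www.example.com match."""
    return (urlparse(url).hostname or "").removeprefix("www.")


def external_links(html: str, page_url: str) -> list[str]:
    """Absolute http(s) URLs linked from html that point to a different host than page_url."""
    collector = _LinkCollector()
    collector.feed(html)
    own_host = _host(page_url)
    result = []
    for href in collector.hrefs:
        absolute = urljoin(page_url, href)  # resolves relative links like "/about"
        if urlparse(absolute).scheme in ("http", "https") and _host(absolute) != own_host:
            result.append(absolute)
    return result


def flag_spammy_pages(pages: dict[str, str], threshold: int = 20) -> dict[str, int]:
    """Map each page URL to its external-link count, keeping only pages at or above threshold."""
    counts = {url: len(external_links(html, url)) for url, html in pages.items()}
    return {url: n for url, n in counts.items() if n >= threshold}
```

Pages that come back from `flag_spammy_pages` are the ones to clean up first, and their templates (forum posts, comment forms) are where the exploit most likely needs closing.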
Anyway, it's not the most comprehensive list, but if you find yourself thinking "oh my god! My site's tanked and it's because of XYZ!", it rarely is; the cause is more likely something fundamental. It's far easier to screw up a site from the inside than from the outside. That said, it's also easier to fix.