Friday, August 12, 2005

from the obey-your-bayesian-filter dept.

Ninety percent of everything is crud.
-- Sturgeon's Law

Sturgeon was clueless. The real number is closer to 100%.
-- Anonymous Google employee

SILLYCON VALLEY -- With Yahoo rapidly expanding the size of its search engine database, Google has decided to take a different approach: shrinking the size of its universe by removing the crud that no sane person (marketing weasels excluded) ever wants to look at.

"Our name might be Google, but there's no reason to maintain a database with over a googol different scams, schemes, and various other species of shit," said a Google spokesperson. "Those yodeling idiots can have their billions of pages of penis-enlarging, mortgage-shrinking insanity."

Google has quietly launched a filter system, dubbed SturgeonAssassin, to eliminate cruddy websites from its database. Of course, the system remains in Alpha testing and can only be accessed through the undocumented "nocrud:" operator.

The idea is quite simple. Users can enable and disable different filters based on their personal preference, and SturgeonAssassin will adjust the page ranking of each site accordingly. The form looks something like this:

Filter sites that have the following properties:

[X] Domain has more than two hyphens (such as

[X] Contains a BLINK tag

[X] Has a "Best viewed with Internet Explorer" disclaimer

[X] Tries to set a third-party cookie from a shady advertising bureau

[X] Tries to set a cookie, period

[X] Repeatedly misuses the words "they're" and "their," or "its" and "it's"

[X] Has a "last updated" notice containing a date before 1997

[X] Includes, or links to, some kind of "mission statement"

[X] Has a high buzzword concentration (at least 1 buzzword per 25 words of text)

[X] Features "tips" about search engine optimization

[X] HTML title includes a phrase like "Title Goes Here" or "Adobe GoLive 4.0"

[X] Code has unnecessary FONT tags

[X] Code has unnecessary TABLE tags

[X] Webmaster obviously doesn't have a clue about the ALT attribute in image tags

[X] Links to a copyright or legal notice that contains more copy than the rest of the site combined

[X] Legal notice prohibits "linking" to the site

[X] URL ends with .htm

[X] Features images in .bmp format -- or worse, embedded in Word documents

[X] Launches pop-up ads using Flash applets designed specifically to bypass Firefox's pop-up blocker

[X] Launches pop-up ads, period

[X] Requires, or simply hints about requiring, user registration

[X] Page contains "tag soup" obviously produced by a Microsoft product

[X] Includes the phrase "As Seen On TV!"

[X] Features text-based ads from an advertising network other than Google

[X] Publishes fake news or sarcasm directed at Google's attempts at world domination

[X] Whois record contains obviously bogus contact info, such as "123 Fake Street"

[X] Attempts to disable the right-click context menu, hide the back button, or perform other nefarious tricks

[X] Warns that the site is best viewed at a weird resolution, such as "320x240" or "4096x3192"

[X] Has a serious lack of proper punctuation with run on sentences that continue on and on the webmaster is obviously some kind have clueless product of the american education system either or learned english by reading nothing but slashdot comments grammar is important

[X] Every link points to a domain that was registered within the last 3 hours

[X] Contains a high concentration of dollar signs (Perl programming sites excluded)

"This is the next logical step in the arms race against blighted websites," explained a random Google Ph.D. "Moreover, the planned Beta version of the filter (due in 2010) will push the envelope by automatically assuming that all websites are crap until proven otherwise. This approach will more closely match the perception of average Internet users. Less is more -- and we're not talking about command-line pipe buffering commands."

Yahoo has quickly downplayed Google's latest innovation. "Those eggheads are out of touch with cold reality. Most people secretly enjoy crap. How do you think Hollywood has managed to make so much money over the years? It's not because of their creative or artistic triumphs! Crap is king, and that's why Yahoo hopes to expand our index to include everything, ranging from the best of the best (the Yahoo homepage) to the absolute worst (Aunt Bertha's Day-By-Day Timeline Of Her Dead Cat Princess III). More is more!"

The staff of Humorix is also eyeing the new filter with suspicion. "We poke fun at Google, we have dubious HTML coding standards, and our grammar and spelling leafs much bee two desired. If this becomes a standard feature of Google, we're screwed!"

