Fun with stats, or, what’s offline about you.
August 22nd, 2006 - Fred Stutzman
Fun little stats report from the new ClaimID link status checker!
- We checked 3% of all links in the ClaimID DB
- Of those links, 34 were found to be offline/erroring/etc. That’s 4.5% of the checked set.
- Of those 34, 11 turned out to be false positives after the fact, lowering the percentage of offline links to 3% of the checked set. The margin of error on the entire set (all ClaimID links) was +/- 3.5%, so we’ll have to get a bigger sample to further generalize.
- Of the false positives, the majority were from two sites (Amazon and IMDB) that simply don’t like the way our status checking monkey operates.
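For anyone who wants to check the arithmetic in the list above, here is a rough Python sketch. The post never states the size of the checked set, so `checked` and `population` are backed out of the rounded 4.5% and 3% figures and are estimates only. The margin-of-error formula (95% confidence, worst-case proportion of 0.5, finite population correction) is also an assumption; it happens to land near the quoted +/- 3.5, but it is not necessarily how the original number was computed.

```python
import math

# Figures taken from the post.
offline_flagged = 34      # links flagged as offline/erroring
false_positives = 11      # flagged links that were actually fine
flagged_share = 0.045     # "4.5% of the checked set"
sample_fraction = 0.03    # "3% of all links in the ClaimID DB"

# Assumption: estimate the sample and population sizes from rounded percentages.
checked = round(offline_flagged / flagged_share)
population = round(checked / sample_fraction)

# Links still considered offline once false positives are removed.
confirmed_offline = offline_flagged - false_positives
offline_rate = confirmed_offline / checked


def margin_of_error(n, N, p=0.5, z=1.96):
    """Normal-approximation margin of error for a proportion.

    n: sample size, N: population size, p: assumed proportion
    (0.5 is the worst case), z: z-score (1.96 for ~95% confidence).
    Includes the finite population correction.
    """
    fpc = math.sqrt((N - n) / (N - 1))
    return z * math.sqrt(p * (1 - p) / n) * fpc


moe = margin_of_error(checked, population)

print(f"estimated checked links: {checked}")
print(f"offline rate after removing false positives: {offline_rate:.1%}")
print(f"margin of error (worst case, 95%): +/- {moe:.1%}")
```

Under these assumptions the offline rate comes out at about 3.0% and the margin of error at about 3.5%, which is why a larger sample is needed before saying much about the whole database.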
Honestly, I’m pretty pleased with these results. The status checker was tested on a fairly robust training set, but I was still worried about what was going to happen when we rolled live. We made a bunch of behind-the-scenes tweaks over the past few days, and to have an overall false-positive rate of 1.5% really isn’t that bad. If we factor in the two sites that don’t like us (we can account for them in code), our false-positive rate is way under 1 percent. Not bad at all.
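The false-positive numbers can be sanity-checked the same way. This is only a sketch: the checked-set size is an estimate (34 divided by 4.5%), and the post says only that "the majority" of false positives came from Amazon and IMDB, so the code takes the smallest count that qualifies as a majority and treats the result as an upper bound.

```python
# Figure taken from the post.
false_positives = 11

# Assumption: checked-set size estimated as 34 / 0.045; not stated in the post.
checked = 756


def false_positive_rate(count, checked_links):
    """Share of checked links that were wrongly flagged as offline."""
    return count / checked_links


overall = false_positive_rate(false_positives, checked)

# "The majority" of 11 means at least 6 came from the two sites (assumption:
# the exact count is unknown, so use the minimum that counts as a majority).
min_from_two_sites = false_positives // 2 + 1
remaining_at_most = false_positives - min_from_two_sites

# Upper bound on the rate once those two sites are handled in code.
adjusted_upper_bound = false_positive_rate(remaining_at_most, checked)

print(f"overall false-positive rate: {overall:.1%}")
print(f"rate excluding the two sites: at most {adjusted_upper_bound:.2%}")
```

Even in the least favorable reading of "majority", the adjusted rate stays below 1 percent.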
Anyway, we’re going to be working to make this functionality continuously smarter over the next few days and weeks. We’re also going to roll in some new feature requests and tweaks that have come from our user feedback lists, and one especially awesome request from Lyceum Architect John Joseph Bachir.
