Just in time for the holidays, here's another post about our statistics, and this time we'll describe how we deal with metrics issues, how we think we can improve the kinds of statistics we provide, and admit that despite all this number crunching, we still don't know how many dribs are in a drab (but we know that the answer involves Planck's constant).
With over 500,000 feeds now managed, we deal with statistics anomalies like spiked/tanked subscriber counts, podcast counts, and click counts on a weekly, if not daily, basis. Some of these are larger issues than others, obviously. We're sure that the good people at ComScore, HitWise, and other CamelCase-named statistics companies would agree that there are always issues and anomalies popping up that have to be beaten back with gusto like so many zombies in Dawn (or Shaun) of the Dead.
The goal we always set for ourselves is to maintain apples-to-apples comparisons across all types of counting and aggregator/client treatment. In other words, regardless of what bucket some metric goes in, you should always be able to look at a couple of different pieces of the data (feeds, aggregators, podcatchers, etc.) and say "these make sense relative to one another." You set up some heuristics and algorithms, try to apply them as universally as possible, and take your lumps. It's like the never-ending "uniques" debate that the web stats community has: you try to plant some stakes in the ground that get you to reasonable conclusions when you consider all the data, and then jump off the next bridge when you come to it.
Some of the metrics issues that we are continually addressing include:
Across the board, we're seeing more and more distinct kinds of user-agents requesting feeds. Here's a quick chart of the growth in unique user-agents we've seen polling feeds just in the last six months.

Caveat Emptor: These chart numbers don't include user-agents with spammy identifiers that are obviously just long random strings, and the hundreds of agents that differ only by version, like "Shmucky-bot/1.0" and "Shmucky-bot/2.0", are counted as one distinct user-agent. All of this data also excludes the millions of requests a day we capture from clients with completely blank identifiers. Still, you can see the current count is well over 8,000 different kinds of feed-reading entities: everything from aggregators and search crawlers to thousands of mobile feed readers, hundreds of podcatchers, loads of language-specific agents, specialty browser toolbars, and more.
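For the curious, here is a rough sketch of what counting rules like these might look like in Python. This is not our production code; the function names, the "24 or more unbroken alphanumeric characters means spam" rule, and the version-stripping pattern are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Assumed rule: drop everything from the first "/<digits>" onward, so
# "Shmucky-bot/1.0" and "Shmucky-bot/2.0" collapse into one family.
VERSION_SUFFIX = re.compile(r"/[\d.]+.*$")

# Assumed rule: a long unbroken run of letters and digits is treated as a
# spammy random identifier. The threshold of 24 is arbitrary.
RANDOM_LOOKING = re.compile(r"^[A-Za-z0-9]{24,}$")


def normalize_agent(raw):
    """Return a family name for a user-agent, or None if it is not counted."""
    if raw is None:
        return None
    agent = raw.strip()
    if not agent:
        # Completely blank identifiers are excluded from the counts.
        return None
    if RANDOM_LOOKING.match(agent):
        # Long random strings are excluded as spam.
        return None
    return VERSION_SUFFIX.sub("", agent).lower()


def count_agent_families(raw_agents):
    """Tally requests per user-agent family, skipping excluded agents."""
    families = Counter()
    for raw in raw_agents:
        family = normalize_agent(raw)
        if family is not None:
            families[family] += 1
    return families


sample = [
    "Shmucky-bot/1.0",
    "Shmucky-bot/2.0",
    "",
    "x9Kq2LmZ8vT4pR7wB3nY6cJ1hG5dF0sA",
    "NetNewsWire/2.0 (Mac OS X)",
]
families = count_agent_families(sample)
distinct = len(families)
```

With the sample above, the two Shmucky-bot versions land in a single family, and the blank and random-looking identifiers are skipped. A rule this blunt would also lump every "Mozilla/5.0 (...)" browser together, which is exactly the kind of lump-taking heuristic trade-off described earlier.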
One of the questions we bounce around here is "what can we do to help people get more information about their statistics in order to better understand how their content is being distributed?" (although we don't speak to ourselves so eloquently). There are a few things we're always working on in this department: