Staticfree Blog

Sun, 14 Dec 2003

Filter chains

While thinking about how news propagates through an information network, Dyfrgi and I came up with the idea (perhaps not original, but new to us) of distributed, pluggable RSS filter modules. They would be somewhat like Localfeeds, but with optional control lines. I like to think of them the way Galan treats LADSPA plugins: a network of connected modules with separate control lines.


You could run a site hosting one of these filters that does Bayesian filtering. The filter is given an OPML file of your feed collection (or, if you're chaining filters, simply the output feed of the previous filter in the chain). The filter reads the file, buffers the contents according to your refresh rules, filters them, and generates an output feed. The output feed embeds XML-RPC discovery information so that the aggregator consuming it can control the filters.
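A minimal sketch of the first step, reading the subscription list out of the OPML file, using Python's standard `xml.etree.ElementTree`. It assumes the common OPML subscription-list convention where each feed is an `outline` element carrying an `xmlUrl` attribute, possibly nested inside folder outlines; the sample file and its URLs are made up for illustration.

```python
import xml.etree.ElementTree as ET


def feed_urls_from_opml(opml_text: str) -> list[str]:
    """Return the xmlUrl of every <outline> in an OPML subscription list.

    iter() walks the whole tree, so feeds nested inside folder outlines
    are found too. Folder outlines have no xmlUrl and are skipped.
    """
    root = ET.fromstring(opml_text)
    urls = []
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            urls.append(url)
    return urls


# Hypothetical sample collection: one folder with a feed, plus one top-level feed.
SAMPLE_OPML = """<?xml version="1.0"?>
<opml version="1.1">
  <head><title>My feeds</title></head>
  <body>
    <outline text="News">
      <outline text="News Site" type="rss" xmlUrl="http://news-site.com/articles.rss"/>
    </outline>
    <outline text="Other" type="rss" xmlUrl="http://foo.bar.cow/other.rss"/>
  </body>
</opml>"""

if __name__ == "__main__":
    print(feed_urls_from_opml(SAMPLE_OPML))
```

The filter would then fetch each of these URLs (or, when chained, the single upstream output feed) before buffering and filtering.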

Requirements

  • XML-RPC control API: needs user authentication (who is allowed to issue XML-RPC commands?) and OPML or feed subscription management (add, list, delete)
  • XML-RPC API discovery [I know there's a protocol out there for it]
  • Module construction: one or more input feeds, one output feed, one control line
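The module shape in the last bullet can be sketched as a small Python class. Everything here is hypothetical: inputs are modelled as callables that return lists of items (plain dicts), the filtering rule is a simple predicate, and the control line only covers the add/list/delete subscription commands from the first bullet, with authentication left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

Item = dict  # hypothetical article shape, e.g. {"id": "123", "title": "..."}


@dataclass
class FilterModule:
    """One filter module: one or more inputs, one output, one control line.

    All names are illustrative; the post does not fix this interface.
    """
    inputs: list[Callable[[], list[Item]]]  # each input returns its current items
    keep: Callable[[Item], bool]            # the filtering rule (Bayesian, keyword, ...)
    subscriptions: list[str] = field(default_factory=list)

    def output(self) -> list[Item]:
        """The single output feed: merge every input, keep what passes the rule."""
        return [item for source in self.inputs for item in source() if self.keep(item)]

    def control(self, command: str, url: Optional[str] = None) -> Optional[list[str]]:
        """The single control line: subscription management (add, list, delete)."""
        if command == "add":
            self.subscriptions.append(url)
        elif command == "delete":
            self.subscriptions.remove(url)
        elif command == "list":
            return list(self.subscriptions)
        else:
            raise ValueError(f"unknown command: {command!r}")
        return None


if __name__ == "__main__":
    feed_a = lambda: [{"id": "1", "title": "New blog post"}]
    feed_b = lambda: [{"id": "2", "title": "Cheap pills"}]
    no_spam = FilterModule([feed_a, feed_b], keep=lambda i: "pills" not in i["title"])
    # Chaining: one module's output method is a valid input for the next module.
    short = FilterModule([no_spam.output], keep=lambda i: len(i["title"]) < 40)
    print([i["id"] for i in short.output()])
```

Because `output` has the same shape as an input, modules compose into chains without any extra glue.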

Example

I'm using HTTP GET-style queries here in place of XML-RPC. The transport used for control doesn't really matter, as long as the filter and the aggregator agree on it. The user/pass pair could be replaced by a standard session-ID model.
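To illustrate that the transport is interchangeable, here is the register command encoded as an XML-RPC request body with Python's standard `xmlrpc.client` module. The method name `filter.register` and the positional parameter order (url, user, pass) are hypothetical; no XML-RPC method set is defined for these filters.

```python
import xmlrpc.client


def register_request(feed_url: str, user: str, password: str) -> str:
    """Client side: build the XML-RPC body for a register call.

    'filter.register' and the (url, user, pass) order are hypothetical.
    """
    return xmlrpc.client.dumps((feed_url, user, password), methodname="filter.register")


def decode_request(body: str) -> tuple[str, tuple]:
    """Server side: recover (method name, params) from the request body."""
    params, method = xmlrpc.client.loads(body)
    return method, params


if __name__ == "__main__":
    body = register_request("http://foo.bar.cow/my/feeds.opml", "bar", "foo1234")
    print(body)
    print(decode_request(body))
```

A real deployment would more likely use `xmlrpc.client.ServerProxy` on the aggregator side and `xmlrpc.server.SimpleXMLRPCServer` on the filter side, which wrap this encoding and the HTTP POST.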

Register

Something like this, for an OPML feed collection:

http://example.com/filters/bayesian?register=http://foo.bar.cow/my/feeds.opml;user=bar;pass=foo1234

Or like this, for a single feed:

http://example.com/filters/bayesian?register=http://news-site.com/articles.rss;user=bar;pass=foo1234;refresh=30
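On the filter side, these `;`-separated queries could be parsed with a small helper like the one below. This is a sketch, not a specified API: the function names are hypothetical, and it deliberately uses `urllib.parse.unquote` rather than `unquote_plus` so that a literal `+` (as in the `score=+2` update below) is not turned into a space.

```python
from urllib.parse import unquote, urlsplit

COMMANDS = ("register", "getfeed", "update")  # the commands used in these examples


def parse_control_query(url: str) -> dict[str, str]:
    """Split a ';'-separated control query into a dict of strings."""
    params: dict[str, str] = {}
    for pair in urlsplit(url).query.split(";"):
        if not pair:
            continue  # tolerate a trailing ';'
        key, _, value = pair.partition("=")
        params[unquote(key)] = unquote(value)
    return params


def command_of(params: dict[str, str]) -> tuple[str, str]:
    """Return (command, target feed or OPML URL) for the first command present."""
    for name in COMMANDS:
        if name in params:
            return name, params[name]
    raise ValueError("no register/getfeed/update parameter in query")


if __name__ == "__main__":
    url = ("http://example.com/filters/bayesian?register=http://news-site.com/articles.rss;"
           "user=bar;pass=foo1234;refresh=30")
    params = parse_control_query(url)
    print(command_of(params), params["refresh"])
```

Since the registered value is itself a URL, a feed URL containing its own `?` or `;` would have to be percent-encoded before being placed in the query.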

[the user waits for the system to cache the filtered articles]
The user's aggregator then queries the filter:

http://example.com/filters/bayesian?getfeed=http://foo.bar.cow/my/feeds.opml;user=bar;pass=foo1234

The user interacts with the filter through the API advertised in the feed:

http://example.com/filters/bayesian?update=http://foo.bar.cow/my/feeds.opml;user=[...];articleid=123;score=+2
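One way a Bayesian filter could act on these `update` commands is a naive Bayes classifier with "liked" and "disliked" classes, trained from the user's scores. This is a hypothetical design, not one the post specifies: the sign of `score` picks the class, its magnitude is treated as a training weight (an assumption), and add-one smoothing keeps unseen words from zeroing out a probability.

```python
import math
import re
from collections import Counter


class FeedbackFilter:
    """Naive Bayes 'liked'/'disliked' classifier trained by update commands.

    Hypothetical design; the post only says the filter is Bayesian and
    accepts update=...;articleid=...;score=N.
    """

    LABELS = ("liked", "disliked")

    def __init__(self) -> None:
        self.articles: dict[str, str] = {}  # articleid -> text seen in the input feeds
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    @staticmethod
    def tokens(text: str) -> list[str]:
        return re.findall(r"[a-z0-9']+", text.lower())

    def add_article(self, articleid: str, text: str) -> None:
        self.articles[articleid] = text

    def update(self, articleid: str, score: int) -> None:
        """Handle one update command: sign picks the class, size is the weight."""
        if score == 0:
            return
        label = "liked" if score > 0 else "disliked"
        weight = abs(score)  # assumption: score=+2 counts as two positive examples
        self.doc_counts[label] += weight
        for word in self.tokens(self.articles[articleid]):
            self.word_counts[label][word] += weight

    def p_liked(self, text: str) -> float:
        """Posterior probability that the user likes `text` (add-one smoothing)."""
        words = self.tokens(text)
        vocab = set(words).union(*self.word_counts.values())
        total_docs = sum(self.doc_counts.values())
        log_score = {}
        for label in self.LABELS:
            log_p = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(self.word_counts[label].values()) + len(vocab)
            for word in words:
                log_p += math.log((self.word_counts[label][word] + 1) / denom)
            log_score[label] = log_p
        # Normalise in log space to avoid underflow on long articles.
        top = max(log_score.values())
        weights = {label: math.exp(v - top) for label, v in log_score.items()}
        return weights["liked"] / sum(weights.values())
```

With the query parser from the Register section, an update would translate to something like `f.update(params["articleid"], int(params["score"]))`; `int("+2")` is `2`.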

Pull variation

Dyfrgi noted that there are two models this can take:

13:12 <@dyfrgi> It can either a) filter when an aggregator requests a feed, calling down all the way to the final element, which needs to fetch the actual RSS, or b) have something polling the RSS regularly and cache the filtered representation for later dispersal.

Localfeeds takes the b) model, while a chain of filters might be better suited to the a) model. With the a) model, the user's aggregator polls the filter's output feed, and the filter in turn polls and filters the input RSS feed. Every filter in the chain then runs at request time, so there is no need to worry about delays introduced by each filter caching its filtered feed.
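The difference between the two models can be sketched with two small filter constructors and a fake clock. The names are hypothetical and feeds are reduced to lists of titles. A pull filter (model a) calls its upstream on every request; a caching filter (model b) refetches only when its cache is at least `refresh` seconds old, so a chain of caching filters can keep serving stale items after the source has changed.

```python
from typing import Callable

Source = Callable[[], list[str]]  # returns a feed's current article titles


def pull_filter(upstream: Source, keep: Callable[[str], bool]) -> Source:
    """Model a): filter on request, fetching from upstream every time."""
    return lambda: [title for title in upstream() if keep(title)]


def caching_filter(upstream: Source, keep: Callable[[str], bool],
                   refresh: float, clock: Callable[[], float]) -> Source:
    """Model b): refetch at most every `refresh` seconds, else serve the cache."""
    state = {"fetched_at": None, "items": []}

    def serve() -> list[str]:
        now = clock()
        if state["fetched_at"] is None or now - state["fetched_at"] >= refresh:
            state["items"] = [title for title in upstream() if keep(title)]
            state["fetched_at"] = now
        return list(state["items"])

    return serve


if __name__ == "__main__":
    now = [0.0]           # fake clock, so the run is deterministic
    origin = ["news 1"]   # stands in for the real RSS feed at the end of the chain
    fetch = lambda: list(origin)
    not_spam = lambda t: "spam" not in t
    is_news = lambda t: t.startswith("news")

    pulled = pull_filter(pull_filter(fetch, not_spam), is_news)
    cached = caching_filter(caching_filter(fetch, not_spam, 30, lambda: now[0]),
                            is_news, 30, lambda: now[0])
    pulled(), cached()    # both chains see "news 1" at t=0
    origin += ["spam offer", "news 2"]
    now[0] = 10.0
    print(pulled())       # fresh: the whole chain ran on this request
    print(cached())       # stale: the outer cache is only 10 s old
```

The trade-off is that model a) hits the source feed once per aggregator request, while model b) bounds that load at the cost of freshness.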

Authentication

Another problem is authentication chaining: you don't want every filter to know the credentials of every other filter down the line. One way around this is a common convention where a filter serves its output RSS feed based on the username alone, but requires the password for any change to it.
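A minimal sketch of that convention, reusing the query-parameter names from the examples (`user`, `pass`, `getfeed`). The `UserStore` class and `authorize` function are hypothetical, and storing passwords as salted PBKDF2 hashes via `hashlib.pbkdf2_hmac` is an assumption rather than something the design requires.

```python
import hashlib
import hmac
import os


class UserStore:
    """In-memory users with salted PBKDF2 password hashes (hypothetical storage)."""

    ITERATIONS = 100_000

    def __init__(self) -> None:
        self._users: dict[str, tuple[bytes, bytes]] = {}  # username -> (salt, hash)

    def _hash(self, password: str, salt: bytes) -> bytes:
        return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, self.ITERATIONS)

    def add(self, username: str, password: str) -> None:
        salt = os.urandom(16)
        self._users[username] = (salt, self._hash(password, salt))

    def exists(self, username: str) -> bool:
        return username in self._users

    def check(self, username: str, password: str) -> bool:
        if username not in self._users:
            return False
        salt, expected = self._users[username]
        # Constant-time comparison so timing doesn't leak how much of the hash matched.
        return hmac.compare_digest(self._hash(password, salt), expected)


def authorize(store: UserStore, params: dict[str, str]) -> bool:
    """Reading the output feed needs only a username; anything else needs the password."""
    user = params.get("user", "")
    if "getfeed" in params:
        return store.exists(user)
    return store.check(user, params.get("pass", ""))
```

Under this scheme, a filter further along the chain only needs the upstream filter's username to read its output, and never holds the password that would let it change the upstream filter's subscriptions or scores.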

Searching for related ideas, I found Content Pipeline, which has some good ideas for potential filters.

Comments

Re: Filter chains

@ Thu, 25 Mar 2004 02:41

hey thats pretty cool