Bozo Detection

Universal Feed Parser can parse feeds whether they are well-formed XML or not. However, since some applications may wish to reject or warn users about non-well-formed feeds, Universal Feed Parser sets the bozo bit when it detects that a feed is not well-formed. Thanks to Tim Bray for suggesting this terminology.

Detecting a non-well-formed feed

>>> d = feedparser.parse('http://feedparser.org/docs/examples/atom10.xml')
>>> d.bozo
0
>>> d = feedparser.parse('http://feedparser.org/tests/illformed/rss/aaa_illformed.xml')
>>> d.bozo
1
>>> d.bozo_exception
<xml.sax._exceptions.SAXParseException instance at 0x00BAAA08>
>>> exc = d.bozo_exception
>>> exc.getMessage()
"expected '>'\\n"
>>> exc.getLineNumber()
6

There are many reasons an XML document could be non-well-formed besides this example (incomplete end tags). See Character Encoding Detection for some other ways to trip the bozo bit.
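As a minimal sketch of how an application might act on the bozo bit, the helper below turns a parsed result into a warning message, or returns None when the feed is well-formed. The name describe_bozo is hypothetical and is not part of Universal Feed Parser. The sketch assumes only the bozo and bozo_exception keys shown above, and it treats the line number as optional because not every exception that trips the bozo bit is a SAX parse error.

```python
def describe_bozo(parsed):
    """Return None for a well-formed feed, else a short warning string.

    `parsed` is expected to be the result of feedparser.parse(). Any
    mapping with the same keys also works, which keeps this
    hypothetical helper usable without network access.
    """
    if not parsed.get("bozo"):
        return None

    exc = parsed.get("bozo_exception")

    # SAX parse errors expose getLineNumber(); other exceptions
    # (for example encoding-related ones) may not, so probe for it.
    get_line = getattr(exc, "getLineNumber", None)
    if callable(get_line):
        return "feed is not well-formed (line %s): %s" % (get_line(), exc)
    return "feed is not well-formed: %s" % (exc,)


# Usage sketch (requires feedparser and network access):
#
#   import feedparser
#   d = feedparser.parse('http://feedparser.org/tests/illformed/rss/aaa_illformed.xml')
#   warning = describe_bozo(d)
#   if warning is not None:
#       print(warning)
```

Whether to reject the feed outright or merely warn the user is left to the caller; the helper only reports what the parser recorded.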
