There's a "wealth" of bad HTML out there on the web.

fix-html seems to work pretty well for cleaning it up.

This software is owned by the University of California, Irvine, and it is not distributed under any version of the GPL. The GPL is a fine series of licenses, but the owners of the software need it to be distributed under these terms.

You'll need the BeautifulSoup Python module on your Python module path for fix-html to work.
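
A quick way to check whether BeautifulSoup is on your module path is sketched below. This is only an illustration, not part of fix-html: the helper name first_importable is made up here, and because this page doesn't say which BeautifulSoup release fix-html expects, the sketch looks for both the legacy module name (BeautifulSoup, used by the 3.x series) and the newer package name (bs4).

```python
import importlib.util


def first_importable(names):
    """Return the first name in `names` that can be found on the module path, else None.

    Hypothetical helper for illustration; it only locates modules, it does not import them.
    """
    for name in names:
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    # Assumption: fix-html may want either the legacy 3.x module name
    # ("BeautifulSoup") or the 4.x package name ("bs4").
    found = first_importable(["BeautifulSoup", "bs4"])
    if found:
        print("BeautifulSoup is importable as", found)
    else:
        print("BeautifulSoup was not found on the Python module path")
```

If nothing is found, installing the module or adding its directory to PYTHONPATH should make it visible.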

Another option is tidy.



