fix-html seems to work pretty well for cleaning it up.
This software is owned by the University of California, Irvine, and is not distributed under any version of the GPL. The GPL is a fine series of licenses, but the owners of the software need it to be distributed under these terms.
You'll need the BeautifulSoup Python module on your Python module path for fix-html to work.
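To check whether BeautifulSoup is visible on your module path before running fix-html, something like the following rough sketch may help. It is not part of fix-html. The helper name `find_beautifulsoup` is made up for illustration, and the two module names it tries (`BeautifulSoup` for the older 3.x releases, `bs4` for the 4.x releases) are an assumption, since it is not stated here which release fix-html expects.

```python
import importlib.util


def find_beautifulsoup():
    """Return the name of the first BeautifulSoup module found on the path, or None.

    Hypothetical helper: the candidate names are assumptions, not something
    fix-html itself documents.
    """
    for name in ("BeautifulSoup", "bs4"):
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == "__main__":
    found = find_beautifulsoup()
    if found is None:
        print("BeautifulSoup was not found on the Python module path.")
    else:
        print("Found BeautifulSoup as module:", found)
```

If only `bs4` is found and fix-html fails on its import, it probably expects the older release (or the reverse), so check the import line at the top of the script.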
Another option is tidy.
You can e-mail the author with questions or comments: