Symphonious

Living in a state of accord.

HTML Diff Tools?

Anyone know of good HTML diff tools that actually work well? For that matter, does anyone know of any diff tools that work well with plain text that isn't line-based, e.g. the kind of prose you'd find in a blog entry or a book? I'm guessing a combination of word-based and line-based diffing might work okay for that type of thing, but I haven't come across much that actually tries to deal with the problem. At least, not in a way that aims to provide a meaningful result for humans rather than just a form of compression for updates.

I could probably wrangle the HTML side of it well enough if I had an existing tool that could take the plain text and provide meaningful differences.
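
To make the word-based idea concrete, here is a minimal sketch using only standard Unix tools. It assumes the two versions are already available as plain-text files; the function names `words` and `word_diff` are made up for illustration and are not part of any existing tool.

```shell
#!/bin/sh
# Hypothetical helper: put each whitespace-separated word of file $1 on its
# own line, so that a line-based diff effectively compares words.
words() {
    tr -s '[:space:]' '\n' < "$1"
}

# Hypothetical helper: word-level unified diff of two plain-text files.
word_diff() {
    old_words=$(mktemp)
    new_words=$(mktemp)
    words "$1" > "$old_words"
    words "$2" > "$new_words"
    diff -u "$old_words" "$new_words"
    status=$?   # diff exits 1 when the files differ
    rm -f "$old_words" "$new_words"
    return $status
}
```

Calling `word_diff old.txt new.txt` then reports changed words instead of changed lines, so re-wrapping a paragraph no longer shows up as a difference. The output is still ordinary diff output, so it does not solve the problem of presenting changes nicely to humans.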

Category: General
  • Leon Brooks says:

    If you’re looking for differences in the words rather than differences in the HTML structure or the like, you could try this:

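    # Start from clean output files.
    rm -f file1.txt file2.txt
    # Write each word of the rendered page on its own line.
    for word in $(lynx -dump "$URL1"); do echo "$word" >>file1.txt; done
    for word in $(lynx -dump "$URL2"); do echo "$word" >>file2.txt; done
    # Context diff of the two word lists.
    diff -dc file1.txt file2.txt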

    October 11, 2005 at 1:45 am
  • Adrian Sutton says:

    Yeah that’s not a bad way of doing it but would give weird results when changes match up across paragraphs. Possibly doing that on a per-paragraph basis would work so that you see the changes in each paragraph separately. I guess I’ll have to play around and see what works best.

    October 11, 2005 at 6:24 am
  • Byron Ellacott says:

    http://www.logilab.org/projects/xmldiff

    It has an HTML mode, but probably uses libxml2 under the hood, and libxml2’s HTML parser is extremely unforgiving. Passing HTML through tidy with the -asxml flag usually produces usable results for me, even with the most badly mangled of HTML.

    I haven’t used xmldiff, but I have been pondering differencing on XML files as a general case.

    October 11, 2005 at 8:53 am
  • Adrian Sutton says:

    I should have noted, the HTML in this case is coming out of EditLive! for Java so it’s perfectly well formed XHTML and I can depend on that. The real problem with xmldiff though is that it doesn’t give anything close to human readable output – it’s effectively designed for diff/patch style things not highlighting to users what changes occurred. Looks quite useful though.

    October 11, 2005 at 9:01 am
  • Ben Finney says:

    The ‘wdiff’ program (in the Debian package ‘wdiff’) is a front-end to GNU ‘diff’ that compares word sequences, not lines. That may be closer to something you can use.

    October 11, 2005 at 12:49 pm
  • james says:

    Found a good tool at http://www.jamesdom.com . It diffs by element and even renders the source.

    December 30, 2006 at 12:26 pm
  • Guy Van den Broeck says:

    I’ll add to the blog spam and point you to my HTML differ at http://code.google.com/p/daisydiff/ .
    It should work better than other existing solutions and it’s completely free!

    October 14, 2007 at 12:26 am
  • Rob Dawson says:

    Ironically this came up as number 3 when doing a search for HTML diff tools in 2009 :( (Ironic because I’m looking for an HTML diff tool for Java, and I work at Ephox currently :))

    January 8, 2009 at 10:05 am
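
The per-paragraph idea raised in the comments could be sketched roughly as follows. This is only an illustration under a strong assumption: both files contain the same blank-line-separated paragraphs in the same order, so paragraph N of one file is compared with paragraph N of the other. The function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count blank-line-separated paragraphs in file $1.
paragraph_count() {
    awk 'BEGIN { RS = "" } END { print NR }' "$1"
}

# Hypothetical helper: print paragraph number $2 of file $1, one word per line.
paragraph_words() {
    awk -v n="$2" 'BEGIN { RS = "" } NR == n { for (i = 1; i <= NF; i++) print $i }' "$1"
}

# Hypothetical helper: word-level diff of each paragraph pair, in order.
# Assumes both files have the same paragraphs in the same order; extra
# paragraphs in the second file are ignored.
per_paragraph_diff() {
    old_words=$(mktemp)
    new_words=$(mktemp)
    count=$(paragraph_count "$1")
    n=1
    while [ "$n" -le "$count" ]; do
        paragraph_words "$1" "$n" > "$old_words"
        paragraph_words "$2" "$n" > "$new_words"
        # Only report paragraphs whose word lists differ.
        if ! cmp -s "$old_words" "$new_words"; then
            echo "== paragraph $n =="
            # diff exits 1 when the inputs differ, which is expected here.
            diff "$old_words" "$new_words" || true
        fi
        n=$((n + 1))
    done
    rm -f "$old_words" "$new_words"
}
```

Because each paragraph is diffed on its own, a change in one paragraph can never be matched against words in another. Inserted or deleted paragraphs would break the pairing, so a real tool would need to align paragraphs first.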
