ASCII, Dammit!

I just discovered a little chunk of code that made me laugh out loud. Seriously, a coworker came over to ask what was going on and I had to explain it. In case you're wondering too, the chunk of code is called AsciiDammit.py.


One of the banes of a web programmer's existence is dealing with user input, which can come in almost any form. You have to allow for malicious hackers who try to crash your site using strangely formatted text, sure, but you also have to allow for users who paste in text from Microsoft apps like Word, which can crash your site just as effectively.

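Microsoft, for their own insidious reasons, uses non-standard character codes (the bytes 0x80 through 0x9F in its Windows-1252 character set) to represent things like "smart quotes", short and long dashes, etc. Those byte values are control characters in ISO-8859-1 and aren't valid on their own in UTF-8, so they cause errors in other programs unless they are converted. That means every web programmer who hasn't sold out to The Man has to scrub user input before doing anything else with it.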
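To make the problem concrete, here is a minimal sketch of that scrubbing step, assuming the input arrives as raw bytes in Windows-1252 (cp1252). The function name `ascii_scrub` and the replacement table are hypothetical and only cover a few common characters. This is not Leonard Richardson's code, just an illustration using Python's standard codecs.

```python
# Hypothetical table: Unicode characters that cp1252 stores in the
# 0x80-0x9F range, mapped to rough plain-ASCII equivalents.
ASCII_REPLACEMENTS = {
    "\u2018": "'",    # left single quote (cp1252 byte 0x91)
    "\u2019": "'",    # right single quote / apostrophe (0x92)
    "\u201c": '"',    # left double quote (0x93)
    "\u201d": '"',    # right double quote (0x94)
    "\u2013": "-",    # en dash, the "short dash" (0x96)
    "\u2014": "--",   # em dash, the "long dash" (0x97)
    "\u2026": "...",  # horizontal ellipsis (0x85)
}


def ascii_scrub(raw: bytes) -> str:
    """Decode cp1252 bytes and flatten 'smart' punctuation to ASCII."""
    # Bytes that cp1252 leaves undefined become U+FFFD instead of raising.
    text = raw.decode("cp1252", errors="replace")
    text = text.translate(str.maketrans(ASCII_REPLACEMENTS))
    # Any other non-ASCII character becomes '?' so downstream code never chokes.
    return text.encode("ascii", errors="replace").decode("ascii")


print(ascii_scrub(b"\x93Smart\x94 quotes \x96 and dashes"))
```

Decoding first and translating second keeps the table readable (it's keyed by real characters rather than magic byte values), and the final ASCII encode acts as a safety net for anything the table doesn't cover.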

Bless Leonard Richardson! He wrote a little Python module that handles this chore, and its name perfectly captures my feelings about the whole business: "AsciiDammit.py". I love it! It also has a function called htmlDammit() that works on HTML with "smart quotes" in it. Genius!
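For the HTML case, here is a similar sketch, again assuming cp1252 input. Instead of flattening smart quotes to plain ASCII, it swaps them for named HTML entities and leaves existing markup alone. `html_scrub` and its entity table are hypothetical; this isn't necessarily how htmlDammit() does it.

```python
# Hypothetical table: cp1252 "smart" punctuation mapped to named HTML entities.
HTML_ENTITIES = {
    "\u2018": "&lsquo;",
    "\u2019": "&rsquo;",
    "\u201c": "&ldquo;",
    "\u201d": "&rdquo;",
    "\u2013": "&ndash;",
    "\u2014": "&mdash;",
    "\u2026": "&hellip;",
}


def html_scrub(raw: bytes) -> str:
    """Decode cp1252 bytes and replace smart punctuation with HTML entities."""
    text = raw.decode("cp1252", errors="replace")
    # Tags like <p> pass through untouched; only the mapped characters change.
    text = text.translate(str.maketrans(HTML_ENTITIES))
    # Any remaining non-ASCII character becomes a numeric reference such as &#233;.
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


print(html_scrub(b"<p>\x93It\x92s fine\x94</p>"))
```

Because the output is pure ASCII with entities, a browser still renders the curly quotes, but nothing in the pipeline ever has to guess the encoding.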

P.S. If you're interested in alternatives to Word, there were a couple of good articles about them the other day.

UPDATE (2008-2-6): I just figured out that the Leonard Richardson behind ASCII, Dammit! is the same Leonard Richardson who co-wrote RESTful Web Services and the most excellent HTML parsing lib BeautifulSoup. His site is chock-full o' interesting stuff, like a script that turns ReST-formatted text into presentation slides. I wanna be like Leonard when I grow up. :)