2012-03-21

Batch Straightening/Deskewing and Autocropping Scanned Images

Recently, I had the need to scan over 300 hand drawn "line art" pictures.

Vuescan's 'autocrop' is beyond worthless: it can't reliably crop where it should, so I ended up just scanning everything at 'maximum'. That left me with a mass of images flanked by black on all sides, many of them slightly off center and some very noticeably skewed.

For such a common task (take [x] amount of images, automatically do [y] and [z] to them), there seems to be precious little out there that'll do what I needed.

In my searching, I found that Adobe Photoshop is fabled to have a batch 'Straighten and Crop' function. I don't have Photoshop, so I don't know if it does.

I DO have GIMP, though, and stopped just shy of copying a random script off the Internet into my plugins directory to try.

Prior to that, I verified that none of these could do this in a batch:
- Irfanview,
- Xnview,
- Paintshop Pro 9.
Based on Google searches, neither can:
- Windows Live Gallery,
- Paint.net.

Many programs, of course, make it somewhat easy to select a given color, invert selection, crop to selection, and then slightly adjust the rotation. None (unless Photoshop's hype is correct) actually allow this to be done in a batch.

That brings us to ImageMagick (and not, sadly, its supposedly faster cousin, GraphicsMagick, which lacks a deskew option of its own.)

ImageMagick will happily let you batch what you want, and comes with the facility to autocrop and deskew.

Like all good open source apps, though, ImageMagick's documentation leaves much to be desired: I never did figure out what the percentage argument to -deskew does (from a few tests, I couldn't see any difference between 1%, 40% (default), and 100%), or how to make the crop option for deskew actually work the way I wanted it to (i.e., -set option:deskew:auto-crop [x]).
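
A quick way to check whether the threshold changes anything is to ask ImageMagick which angle it settled on, rather than eyeballing the output. The sketch below assumes the installed version exposes the detected angle as %[deskew:angle] after -deskew runs; that is worth confirming on your own build. The function name deskew_angles is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the angle -deskew reports at several thresholds.
# Usage: deskew_angles scan.tif [threshold ...]
deskew_angles() {
    img=$1
    shift
    # With no thresholds given, fall back to the three values tried above.
    [ "$#" -gt 0 ] || set -- 1% 40% 100%
    for t in "$@"; do
        # info: prints the -format string instead of writing an image.
        # %[deskew:angle] is assumed to be set by -deskew in this IM version.
        angle=$(convert "$img" -deskew "$t" -format '%[deskew:angle]' info:)
        printf '%s\t%s\n' "$t" "$angle"
    done
}
```

Running deskew_angles some_scan.tif prints one threshold and angle per line. If every line shows the same angle, that would at least agree with the "no visible difference" result from the manual tests.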

Also, in an effort to be as lazy as possible, I tried Fred Weinhaus' 'unrotate' script. Unfortunately, it was worse than junk; even after filling the black borders to be white, it still did the 'wrong thing' 100% of the time. Perhaps it's a version/configuration issue. Whatever the case, had it worked, it'd have been exactly what I needed. Since it didn't, I ended up with the following, which worked spectacularly well for me:

mkdir -p out
for a in *.tif; do
    # %t is the input's base name, without directory or extension
    convert "$a" -set filename:f "%t" -background black -fuzz 75% -deskew 50% -trim +repage "out/%[filename:f]_cropped.png"
done

Some notes for my fellow IM noobs: The -background black is to fill the areas created with the -deskew option. By default, they're white.  My initial (very naive) solution was to do this:

-fill black -floodfill +0+0 white -floodfill +0+90% white -floodfill +90%+1% white -floodfill +90%+98% white

which worked for all but 6 images in a batch of 142. The -fuzz 75% is for -trim: the 'black' around the image's sides is more of a 'mostly black with random coloured pixels thrown in', so an exact colour match won't do. I'm not positive the +repage is needed (again, ImageMagick's documentation is... lacking), but -trim leaves the original canvas size and offset behind in the image's metadata, and +repage resets them to match the trimmed image, so it seems like a good idea. Also, for half my images, I needed to change from portrait to landscape orientation, which was easily accomplished with: -rotate 90.
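
Since half the batch needed the extra rotation, the two cases can share one small function. This is only a sketch: the name deskew_crop, the argument order, and putting -rotate after -trim are my own assumptions, not something prescribed by ImageMagick. The output name is built in the shell instead of with filename:f, which should give the same result.

```shell
#!/bin/sh
# Hypothetical helper: deskew, trim and optionally rotate a set of scans.
# Usage: deskew_crop OUTDIR ROTATION FILE...
# ROTATION is passed straight to -rotate; 0 leaves the orientation alone.
deskew_crop() {
    outdir=$1
    rotation=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        # Skip names that do not exist, e.g. a literal *.tif from an empty glob.
        [ -e "$f" ] || continue
        base=${f##*/}      # drop the directory part
        base=${base%.*}    # drop the extension
        convert "$f" -background black -fuzz 75% -deskew 50% -trim \
            -rotate "$rotation" +repage "$outdir/${base}_cropped.png"
    done
}
```

For example, deskew_crop out 0 portrait/*.tif handles the images that are already the right way up, and deskew_crop out 90 landscape/*.tif handles the rest (the two directory names are placeholders).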

Further note: mogrify, on my setup, seems to always ignore -path. After batching the deskew/trim, I needed to up the DPI for publication (tiff pixel reduction gave me 150dpi from 300dpi scans), but this:

mogrify -path jpg/ -units PixelsPerInch -resample 300 -format jpg *.png

still leaves the jpegs in the same directory as the PNGs.
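
One way around this is to skip mogrify and loop over convert, naming the output path explicitly. The sketch below is a workaround, not a diagnosis of why -path was ignored; the function name png_to_jpg and its argument order are invented for illustration, and the ImageMagick options are the same ones used in the mogrify command.

```shell
#!/bin/sh
# Hypothetical helper: convert files to JPEG in a separate directory.
# Usage: png_to_jpg OUTDIR DPI FILE...
png_to_jpg() {
    outdir=$1
    dpi=$2
    shift 2
    mkdir -p "$outdir" || return 1
    for f in "$@"; do
        # Skip names that do not exist, e.g. a literal *.png from an empty glob.
        [ -e "$f" ] || continue
        base=${f##*/}    # drop the directory part
        # Same options as the mogrify call, with an explicit output path.
        convert "$f" -units PixelsPerInch -resample "$dpi" \
            "$outdir/${base%.*}.jpg"
    done
}
```

Called as png_to_jpg jpg 300 *.png, this should put the resampled JPEGs under jpg/ and leave the PNGs where they are.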

Reference:
GIMP Plugin Registry - Deskew - Batch script in comments. (No auto-crop) - Didn't try this...
Fred Weinhaus' Unrotate Script - Tried this, couldn't get it to produce worthwhile results with any input I gave it.
ImageMagick Documentation - Some of the command-line parameters even work!
Scan Tailor - A neat, open source program I ran across while searching for a solution; it has an 'automatic batch mode', but unfortunately, its automatic deskew/content selection isn't really there yet. However, it is very useful for tweaking the few images too skewed for IM to properly handle.