6 Dec 2016

A plate of snow, and other images

Focal cropping, feature detection and cloud vision

Tom Dyson

Director, Torchbox

From its origins with the Royal College of Art, Wagtail has always enjoyed strong support for managing and displaying images. An early feature was editorial control over an image's 'focal point': editors can specify the area of a photo that crops should be built around, for example when a listing page displays thumbnails in different aspect ratios from the originals.
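To make the idea concrete, here is a minimal sketch of focal-point cropping in Python. It is not Wagtail's implementation: the `Rect` class and `crop_around_focal_point` function are hypothetical names invented for illustration, and the strategy (take the largest crop of the requested aspect ratio, centre it on the focal area, then nudge it back inside the image) is just one simple way to do it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """An axis-aligned rectangle in pixel coordinates (hypothetical helper)."""
    left: float
    top: float
    right: float
    bottom: float

    @property
    def centre(self):
        return ((self.left + self.right) / 2, (self.top + self.bottom) / 2)


def crop_around_focal_point(image_width, image_height, focal, target_ratio):
    """Return the largest crop with aspect ratio `target_ratio` (width / height),
    centred as closely as possible on the focal rectangle while staying
    entirely inside the image."""
    # Largest box of the requested shape that fits in the image.
    crop_width = min(image_width, image_height * target_ratio)
    crop_height = crop_width / target_ratio

    # Centre the box on the focal area, then clamp it to the image bounds.
    cx, cy = focal.centre
    left = min(max(cx - crop_width / 2, 0), image_width - crop_width)
    top = min(max(cy - crop_height / 2, 0), image_height - crop_height)
    return Rect(left, top, left + crop_width, top + crop_height)


# A 1000x500 photo whose subject sits near the right-hand edge,
# cropped to a square thumbnail.
focal = Rect(800, 100, 900, 200)
square = crop_around_focal_point(1000, 500, focal, target_ratio=1.0)
print(square)
```

Without a focal point, a square crop of this photo would typically be taken from the middle and cut the subject off; with one, the crop slides to the right-hand half of the image.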

A couple of releases later we added feature detection, which identifies features (typically faces) in photos when they're first uploaded. For sites with large numbers of images this can be a useful shortcut for those editors who don't relish the mildly therapeutic task of drawing rectangles round focal points.

Finding faces is a clever trick, but the most recent work on image management in Wagtail would have felt like science fiction a few years ago. Martin Sandström, from Swedish agency Fröjd, released wagtail-alt-generator, which uses Microsoft's computer vision service to insert image descriptions and tags at the point of upload. Here's a rapid-fire screencast showing it in action:

I select eight photos from my desktop and drag them onto Wagtail's bulk uploader. As each file is uploaded, it's assigned a description and tags based on the content of the image. If you watch carefully you'll spot that not all the descriptions are perfectly accurate: my dog Moscow isn't at all brown, either in real life or in this photo, and the sardine heads discarded from my triumphant tagliatelle con le sarde are weirdly identified as a 'plate covered in snow.' This will improve, of course, as Microsoft's training set grows to include more severed fish heads.

There are other computer-vision-as-a-service providers, although Microsoft's is the only one I've seen that generates natural language descriptions as well as tags. Martin has already started work on a pluggable-provider refactor which will make it easy for developers to swap between image recognition services as they compete on features, quality and cost. As computers chip away at mundane tasks like content tagging, humans can optimise their time crafting perfect sentences. When computers can do that too, we can focus on eating pasta and taking more dog photos.
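For developers wondering what a pluggable-provider design might look like, here is a rough sketch in Python. It does not reflect Martin's actual refactor or the wagtail-alt-generator API: `ImageAnalysis`, `VisionProvider`, `StaticProvider` and `annotate_upload` are all hypothetical names, and the stand-in provider returns canned results rather than calling any real service.

```python
from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class ImageAnalysis:
    """What any provider is expected to return (hypothetical shape)."""
    description: str
    tags: list[str] = field(default_factory=list)


class VisionProvider(Protocol):
    """Anything with an `analyse` method can act as a provider."""

    def analyse(self, image_bytes: bytes) -> ImageAnalysis: ...


class StaticProvider:
    """Stand-in provider returning canned output; a real one would call
    a computer vision service over the network."""

    def __init__(self, description, tags):
        self._description = description
        self._tags = list(tags)

    def analyse(self, image_bytes: bytes) -> ImageAnalysis:
        return ImageAnalysis(self._description, list(self._tags))


def annotate_upload(image_bytes: bytes, provider: VisionProvider) -> dict:
    """Build the fields to store against an uploaded image. The calling
    code never needs to know which service produced them."""
    analysis = provider.analyse(image_bytes)
    return {"alt_text": analysis.description, "tags": sorted(set(analysis.tags))}


provider = StaticProvider("a plate covered in snow", ["plate", "snow", "plate"])
print(annotate_upload(b"fake image bytes", provider))
```

Swapping services then means writing another class with the same `analyse` method and passing it in, which is the kind of flexibility that lets a site follow whichever provider is currently ahead on features, quality or cost.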