Computer Vision: Cropping Faces From Images Using OpenCV2

[caption id="attachment_99" align="aligncenter" width="230"] beatles.jpg[/caption]

This text stands as a short introduction to face detection using OpenCV's Python libraries.

Who This is For

This is for beginners like myself.

Why I'm Doing This

Alright, so I love artificial intelligence, but I don't have practical knowledge of the subject, really -- at least, not in a large sense.  This brings me to a project I've been working on involving computer vision.

I thought it'd be cool to have a web application that takes images of people, finds the faces in the images, crops the faces, then saves each face in separate files.  I think it could make way for image recognition in social networking applications.

Face-associative information could greatly personalize the user experience by pairing interests with physical identity rather than words.  For example: Let's say you have a service like MySocialCloud, which gives users the ability to effortlessly showcase their websurfing content.  What would happen if you put a face to that?  Well, nothing really all that different from using a textual name.  HOWEVER -- if you were to fuse your service with some other service that has faces attached to interests, you could modify a user's space by what you find associated with their face in your partnering service.  Then, the user could have a more personalized experience.

So let's say a MySocialCloud user links their cloud account to their Facebook account.  Then, let's have the user tagged in a number of photos in which there's clearly some sort of activity going on -- let's say soccer.  MySocialCloud could pair the activity with the user through face recognition.  Now, the MySocialCloud user identifies with soccer without having to tell the MySocialCloud backend explicitly that soccer is of interest.  More importantly, MySocialCloud could then make content suggestions based on insightful material -- images.  AND, the user has a greater chance of being exposed to more aspects of living, which is something I greatly favor.

Maybe that idea isn't the most practical or rational, but it's something along the lines of what I have in mind.  I even thought of a fun way to track people over time by their faces by crowdsourcing (actually, a friend of mine thought of the crowdsourcing part; I just thought of putting a location to that).   Unfortunately, I'm not going to explain that here.

Getting Right to It

Here's the code you'll end up with:

'''  
facechop.py

-Takes an image and detects a face in it.  
-For each face, an image file is generated
    -the images are strictly of the faces
'''

import os

import cv2

def facechop(image):
    # Training data: the frontal-face Haar cascade shipped with OpenCV.
    facedata = "haarcascade_frontalface_default.xml"
    cascade = cv2.CascadeClassifier(facedata)

    img = cv2.imread(image)

    # Note: minisize is the image's own (width, height), so this resize
    # is a no-op; detection would behave the same on img directly.
    minisize = (img.shape[1],img.shape[0])
    miniframe = cv2.resize(img, minisize)

    faces = cascade.detectMultiScale(miniframe)

    # imwrite fails silently when the output directory is missing.
    os.makedirs("faces", exist_ok=True)

    for f in faces:
        x, y, w, h = [ v for v in f ]
        cv2.rectangle(img, (x,y), (x+w,y+h), (255,255,255))

        # Crop the face region out of the full image and save it, using
        # the face's y coordinate as a (mostly) unique file name.
        sub_face = img[y:y+h, x:x+w]
        face_file_name = "faces/face_" + str(y) + ".jpg"
        cv2.imwrite(face_file_name, sub_face)

    cv2.imshow(image, img)

    return

if __name__ == '__main__':
    facechop("beatles.jpg")

    while(True):
        key = cv2.waitKey(20)
        if key in [27, ord('Q'), ord('q')]: 
            break

What This Does

This code will take a file named beatles.jpg and cut out (crop) the faces from it, saving each face in separate files.

The Break-Down

**Important:** You need to have OpenCV 2 installed.  For install details, see the WillowGarage wiki: http://opencv.willowgarage.com/wiki/InstallGuide

First, you need to worry about "training data" and where to put it.  Training data is the information your bot/script/program refers to as an example of how to classify an environment or other agents.  In this case, we want data that shows our application what a face looks like so that it can pick faces out of an image.  Therefore, into our code goes:

    facedata = "haarcascade_frontalface_default.xml"  
    cascade = cv2.CascadeClassifier(facedata)

The face data is the training data, which is found in a file provided with the OpenCV library called haarcascade_frontalface_default.xml.  This file can be found in the data/haarcascades/ directory of your OpenCV install.  I suggest you copy it from there and paste it where your script will run.  For the record, this classifier was trained for Haar-feature-based detection of frontal faces.

A "cascade," in this case, is a series of classifier stages arranged like a waterfall: cheap, fast checks run first and reject most non-face regions, and only the regions that survive flow down to the more detailed (and expensive) stages.  See the etymology of the word "cascade" -- the name fits better than I first gave it credit for.

Then we need to store the image somewhere:

img = cv2.imread(image)

An image is, essentially, a matrix of colors.  Each pixel holds a color, typically as three channel values on the RGB scale (OpenCV, for the record, stores them in BGR order).  So each image is treated as a 2-dimensional array that looks sorta like this (in concept):

$$IMAGE =\begin{bmatrix}0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 0\\0 & 0 & 0 & 0 & 0 & 0\\\end{bmatrix}$$

So above, you'd have zero pixel information for a 6x6 image.  Now scale this up and you've pictured an image in its simplest form.  When you think of the math involved, you really just need to think of how each pixel is being manipulated.  Of course, you won't do arithmetic for each pixel yourself -- you'll apply matrix algebra to your images for such manipulations.

Luckily, for this tutorial, you don't need to worry about the math because OpenCV does it for you.  Calculating and locating vertices is part of the "magic" in the library.  To extract faces from our images, we'll treat the image as a 2D array.  One gotcha: NumPy indexes rows first, so the pixel at x = 2, y = 4 is accessed as img[4, 2], not img[2, 4].
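Since cv2.imread hands you a NumPy array, you can check the rows-first behavior on a toy "image" without loading a real file.  A minimal sketch:

```python
import numpy as np

# A 4x6 single-channel "image": 4 rows (height) by 6 columns (width).
img = np.zeros((4, 6), dtype=np.uint8)

# NumPy indexes rows first, so the pixel at x = 2, y = 1 is img[1, 2].
img[1, 2] = 255
print(img[1, 2])     # 255

# Slicing works the same way: rows (y) first, then columns (x).
roi = img[0:2, 1:4]  # a 2-row by 3-column sub-region
print(roi.shape)     # (2, 3)
```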

Simple enough? Yes.

Next the code sets up a "miniframe."  One quirk worth flagging: minisize is built straight from img.shape, so it's just the original image's own (width, height) -- the resize below produces a frame the same size as the original, and detection would behave identically on img directly.  The usual reason for a step like this is to downscale a large photo so detection runs faster.

minisize = (img.shape[1],img.shape[0])  
miniframe = cv2.resize(img, minisize)

We then need to sniff out faces in the frame.  Every face the classifier detects comes back in a single array, like so:

faces = cascade.detectMultiScale(miniframe)

Cropping Each Face

Now we want to crop each face.  For this, we need to do 2 things:

  1. Detect the vertices.
  2. Detect the size of the rectangle in which the face lies.

The vertices of a rectangle are just the corners, and each vertex has a coordinate.  The detector doesn't hand us all four corners, though -- each detection is a top-left corner (x, y) plus a width and height (w, h), and the other corners follow from those.  We unpack the four values so we can crop with them:

for f in faces:
    x, y, w, h = [ v for v in f ]

Once we have those values, we can draw a rectangle.  I decided to go with the color RGB(255,255,255), wherein 255 is the maximum value for each of R, G, and B respectively, which together make white.  Hence, the absolutely clear statement:

cv2.rectangle(img, (x,y), (x+w,y+h), (255,255,255))

Now, we want to find something dubbed the "region of interest," or ROI for short.  That's just a region within a region -- in this case, the bigger region is the entire image and the region within is a face.  For this, we use the values we unpacked previously.  If you know basic geometry (and remember that NumPy slices rows first), you can probably visualize the box carved out by the following line of code:

sub_face = img[y:y+h, x:x+w]

I called it sub_face to indicate that we're defining a sub-region which happens to be a detected face.

Saving Each Face in Separate Files

Now we have to save each face.  Remember that on each pass through the loop, one face is stored in our ROI variable sub_face, and each one needs a unique file name.  The naming scheme is totally up to you.  I decided the easiest way to avoid naming disputes was to reuse one of the detected coordinates, reasoning that each face is found at a different spot in the image.  There's still a chance two faces share the same y value -- in which case one file would overwrite the other -- and in anything more demanding I'd guard against that.  But I'm still a programmer, so I'm lazy.  For the purpose of this tutorial, a coordinate will do.  Thus:
face_file_name = "faces/face_" + str(y) + ".jpg"
Voila!  Now we have a makeshift naming pattern.  Cool.  In case you're wondering how your beatles.jpg file will display: [caption id="attachment_121" align="aligncenter" width="384"] beatles.jpg output[/caption]
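If you'd rather not gamble on unique y values, a running counter removes the collision risk entirely.  A minimal sketch -- the detections below are hypothetical stand-ins for the cascade's output, and note the first two share a y value, which would collide under y-based names:

```python
# Hypothetical (x, y, w, h) detections standing in for detectMultiScale output.
faces = [(10, 40, 50, 50), (200, 40, 48, 48), (90, 250, 52, 52)]

names = []
for i, (x, y, w, h) in enumerate(faces):
    # The loop index is unique by construction, unlike a coordinate.
    names.append("faces/face_{}.jpg".format(i))

print(names)  # ['faces/face_0.jpg', 'faces/face_1.jpg', 'faces/face_2.jpg']
```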

After we've created a filename for each face, we need to save each face to its own file.  This is easy!  No math involved!  Just write the image (one catch: cv2.imwrite fails silently if the faces/ directory doesn't exist, so create it first):

cv2.imwrite(face_file_name, sub_face)
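To cover that catch, create the directory once before the loop runs.  A minimal guard (os.makedirs with exist_ok requires Python 3):

```python
import os

# cv2.imwrite returns False and writes nothing when the target
# directory is missing, so make sure faces/ exists up front.
os.makedirs("faces", exist_ok=True)
print(os.path.isdir("faces"))  # True
```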

Fairly easy, right?  Right.

The following is optional.  We don't need to show the image if we don't want to -- I just figure it's nice to see what your program detected.  In a headless back-end, you'd skip it:

   cv2.imshow(image, img)

Now, we just need to run our script.  There isn't much to explain.  It's fundamental Python:

if __name__ == '__main__':
    facechop("beatles.jpg")

    while(True):
        key = cv2.waitKey(20)
        if key in [27, ord('Q'), ord('q')]:
            break

Remember that facechop() needs a path that actually resolves to your image (the full path is safest) for this to work properly.  Once you're done, run the code and you should get images like this:

[caption id="attachment_124" align="aligncenter" width="89"]face_90.jpg[/caption]

[caption id="attachment_125" align="aligncenter" width="88"]face_99.jpg[/caption]

[caption id="attachment_126" align="aligncenter" width="96"]face_234.jpg[/caption]

[caption id="attachment_127" align="aligncenter" width="95"]face_252.jpg[/caption]

What I'm Doing With This

This program up there isn't useful alone.  It doesn't do anything.  So what I decided to do is apply it as a Django application.  A user will be able to use an ImageField to designate a picture for the script to process.  The script will detect faces in the image and save each face to a folder.  After that, the pictures can be used for identification.  I figure I can make it crop the first face detected, ask the user if that's them (because it might be a group picture), and if it is, store the face in a database with other user information.  Then regular facial recognition algorithms could be run over the stored faces for whatever.  A person could use their face as somewhat of an OpenID.  Oh my God!  IDEA!  haha

Next tutorial should be on saving user information in databases using Django.  There might be some stuff in between that.  Been trying to become competent in the area of data structures and algorithms.  Scared of coding interviews.  I'd hate to mess up on a question about linked lists or hash tables.  Stay tuned!  Thanks for reading!

Useful Links

http://docs.opencv.org/modules/highgui/doc/user_interface.html - User Interface stuff for OpenCV

http://www.cse.unr.edu/~bebis/CS474/Lectures/GeometricTransformations.ppt  - Image Processing Fundamentals (Powerpoint)

 

Greg

Software Engineer

Subscribe to GregBlogs