
Raspberry Pi Security Cam From Scratch: Attempting Facial Detection and Performance Bottlenecking [Part 4]

In this part, I attempt to add a facial detection function to my security camera, because I noticed that real-time facial detection libraries are available online. The result so far is not satisfactory by any means. I will try to improve it in the future.

Installing OpenCV

Install Python 3.5 for better compatibility.

Please be aware of the OpenCV version. I installed OpenCV 4 on the Raspberry Pi in July 2020. By 2020, there were already a huge number of outdated tutorials on installing OpenCV on the Pi. I installed OpenCV by following the LearnOpenCV tutorial. At the end of that article, you can even get an install script when you subscribe to their newsletter. However, I still ran into two errors that I had to google for solutions. The fixes involved downgrading some libraries or reinstalling a library.
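Since so many tutorials target older releases, it can help to confirm which versions actually ended up installed before debugging anything else. The snippet below is a minimal sketch of such a check. `installed_version` is a hypothetical helper name, and it assumes the modules expose a `__version__` attribute, as `cv2` does.

```python
import importlib


def installed_version(module_name):
    """Return the module's __version__ string.

    Returns None when the module is not installed or has no __version__.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


if __name__ == "__main__":
    # Print the versions of the two libraries used in this part.
    for name in ("cv2", "dlib"):
        print(name, installed_version(name))
```

If either line prints None, the library is not importable from the Python interpreter you are running, which is a common symptom of installing into a different Python version.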

Facial Detection V1

This is my first time working with facial detection. I tried to understand it by gluing together some code I found online. Here, 07-28-2020-12-28-26.h264 is a 30-second, 2MB bitrate video file from the security camera, and shape_predictor_68_face_landmarks.dat is a 100MB trained model file that I got from a repository.

import cv2
import dlib
from datetime import datetime

# Load the detector
detector = dlib.get_frontal_face_detector()

# Load the predictor
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

# Open the recorded video
cap = cv2.VideoCapture('07-28-2020-12-28-26.h264')

TIME_FORMAT = "%m-%d-%Y-%H-%M-%S.h264"

fps = 30
skip_time = 5 * fps   # frames to ignore after a successful detection (5 seconds)
skip_frame = 0        # frames to skip between detection attempts (0 = check every frame)

count = 0             # number of images saved so far
status = False        # True while ignoring frames after a detection
success_time = datetime.now()  # time of the most recent detection
frame_passed = 0      # frames ignored since the last detection
frame_count = 0       # frames skipped since the last detection attempt

while True:
    ret, frame = cap.read()
    if not ret:
        # No more frames in the video
        break

    if status:
        # A face was found recently: ignore frames until skip_time has passed
        if frame_passed == skip_time:
            frame_passed = 0
            status = False
            frame_count = 0
        else:
            frame_passed += 1
    elif frame_count == skip_frame:
        # Convert image into grayscale
        gray = cv2.cvtColor(src=frame, code=cv2.COLOR_BGR2GRAY)

        # Use detector to find faces
        faces = detector(gray)

        for face in faces:
            x1 = face.left()    # left point
            y1 = face.top()     # top point
            x2 = face.right()   # right point
            y2 = face.bottom()  # bottom point

            # Create landmark object
            landmarks = predictor(image=gray, box=face)

            # Loop through all the points
            for n in range(0, 68):
                x = landmarks.part(n).x
                y = landmarks.part(n).y

                # Draw a circle
                cv2.circle(img=frame, center=(x, y), radius=3, color=(0, 255, 0), thickness=-1)

            # Save the annotated frame and start ignoring frames for a while
            cv2.imwrite("frame%d.jpg" % count, frame)
            count += 1
            status = True
            success_time = datetime.now()
        frame_count = 0
    else:
        frame_count += 1

# When everything is done, release the video capture object
cap.release()

# Close all windows
cv2.destroyAllWindows()

The code reads frames and runs facial detection on each frame. When every frame is processed, it needs 10 minutes of computation time, which is not acceptable for real-time use.
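To put that number in perspective, here is a rough back-of-the-envelope calculation. It assumes the clip is exactly 30 seconds at 30 FPS (the `fps` value used in the script) and that the run took exactly 10 minutes, so the results are only approximate.

```python
# Assumed inputs: a 30-second clip at 30 FPS, processed in about 10 minutes.
video_seconds = 30
fps = 30
processing_seconds = 10 * 60

total_frames = video_seconds * fps                      # frames in the clip
seconds_per_frame = processing_seconds / total_frames   # average cost of one frame
slowdown = processing_seconds / video_seconds           # how far behind real time

print("frames:", total_frames)
print("seconds per frame: %.2f" % seconds_per_frame)
print("times slower than real time: %.0f" % slowdown)
```

Under these assumptions the Pi spends roughly two thirds of a second on each of the 900 frames, about 20 times slower than the video plays.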

I made a quick improvement by running detection only on every Nth frame, and by skipping a longer stretch of frames after one successful detection.

When skipping 30 frames, which means checking about one frame per second, I got a 25% hit rate for successful detections in the 30-second video. Even when skipping 30 frames at a time, the computation time is still over a minute. It is not usable for a security camera.
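The effect of the two kinds of skipping can be illustrated without any video at all. The following is a simplified sketch, not the exact counter logic of the script: `frames_to_process` and `has_face` are hypothetical names, and `has_face` stands in for the real detector.

```python
def frames_to_process(total_frames, stride, cooldown, has_face):
    """Return the indices of the frames on which detection would run.

    After a frame without a face, the next `stride` frames are skipped.
    After a frame with a face, the next `cooldown` frames are skipped.
    """
    processed = []
    index = 0
    while index < total_frames:
        processed.append(index)
        if has_face(index):
            index += cooldown + 1
        else:
            index += stride + 1
    return processed


if __name__ == "__main__":
    # 900 frames (30 s at 30 FPS), skip 30 frames between checks,
    # skip 150 frames (5 s) after a hit, and pretend no frame has a face.
    checked = frames_to_process(900, 30, 150, lambda i: False)
    print(len(checked), "frames checked out of 900")
```

With no detections, a 900-frame clip and a skip of 30 frames leave only 30 frames to analyze, which is why the skip value has such a large effect on the run time.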

Multiprocessing Optimization

To improve the computation time, I can use parallel computing. When inspecting top, you can see that the Pi uses only a single process, which utilizes 100% of one CPU core. When you use parallel computing, there are multiple processes that each utilize 100% CPU. This is my attempt at optimizing the performance.

from multiprocessing import Pool
import time
import cv2
import dlib
from datetime import datetime

# Number of frames to skip between kept frames
# (30 matches the skipping described in this article)
SKIP_FRAME = 30

TIME_FORMAT = "%m-%d-%Y-%H-%M-%S.h264"

# Load the detector and the predictor at module level so that every
# worker process can use them
detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def f(frame_i):
    gray = cv2.cvtColor(src=frame_i, code=cv2.COLOR_BGR2GRAY)

    # Use detector to find faces
    faces = detector(gray)

    if len(faces) > 0:
        cv2.imwrite("frame%s.jpg" % datetime.now().strftime("%H:%M:%S"), frame_i)

    # The landmark drawing is commented out in this version:
    # for face in faces:
    #     x1 = face.left()    # left point
    #     y1 = face.top()     # top point
    #     x2 = face.right()   # right point
    #     y2 = face.bottom()  # bottom point
    #
    #     # Create landmark object
    #     landmarks = predictor(image=gray, box=face)
    #
    #     # Loop through all the points
    #     for n in range(0, 68):
    #         x = landmarks.part(n).x
    #         y = landmarks.part(n).y
    #
    #         # Draw a circle
    #         cv2.circle(img=frame_i, center=(x, y), radius=3, color=(0, 255, 0), thickness=-1)
    #
    #     cv2.imwrite("frame%s.jpg" % datetime.now().strftime("%H:%M:%S"), frame_i)
    #     print("Success")

if __name__ == '__main__':

    # Open the recorded video
    cap = cv2.VideoCapture('07-28-2020-12-28-26.h264')

    image_list = []
    count = 0

    reading_time = time.time()

    # Read the whole video, keeping one frame after every SKIP_FRAME skipped frames
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        if count == SKIP_FRAME:
            image_list.append(frame)
            count = 0
        else:
            count = count + 1
    cap.release()

    reading_end = time.time() - reading_time

    print("begin multiprocessing")
    start = time.time()

    # start 4 worker processes
    with Pool(processes=4) as pool:
        pool.map(f, image_list)
    diff = time.time() - start

    print("face detection time:", diff)
    print("total time:", reading_end + diff)
When skipping 30 frames at a time, this version is about 2x faster than the previous code. However, there are some problems with this code.

  • Reading the frames took 27 seconds for a 30-second video. I need to parallelize the reading step.
  • Currently, I save every frame I read into a memory buffer. Memory becomes an issue when loading a larger video. I cannot even load all the frames from this 30-second video.
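One possible way to ease the memory problem is to hand frames to the workers in small batches instead of building one big list first. The sketch below is only an illustration under that assumption and I have not measured it on the Pi. `sampled_frames` and `batches` are hypothetical helper names, and `read_frame` is any callable that returns an (ok, frame) pair the way `cap.read` does.

```python
from itertools import islice


def sampled_frames(read_frame, stride):
    """Yield every `stride`-th frame from a reader returning (ok, frame)."""
    index = 0
    while True:
        ok, frame = read_frame()
        if not ok:
            return
        if index % stride == 0:
            yield frame
        index += 1


def batches(items, batch_size):
    """Yield lists of at most `batch_size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


if __name__ == "__main__":
    # Stand-in reader that produces the numbers 0..9 instead of real frames.
    numbers = iter(range(10))

    def fake_read():
        try:
            return True, next(numbers)
        except StopIteration:
            return False, None

    for batch in batches(sampled_frames(fake_read, 3), 2):
        print(batch)
```

In the multiprocessing script, each batch could be passed to `pool.map(f, batch)` in turn, so that only `batch_size` frames are held in memory at once. This does not parallelize the reading itself, so the first problem would remain.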

After I learn facial detection more systematically, I will optimize this code.