Raspberry Pi Security Cam From Scratch: Attempting Facial Detection and Hitting Performance Bottlenecks [Part 4]
In this part, I attempt to add facial detection to my security camera, since I noticed that real-time facial detection libraries are available online. The result so far is not satisfying by any means; I will try to improve it in the future.
Installing OpenCV
Be aware of the OpenCV version. I installed OpenCV 4 on the Raspberry Pi in July 2020, when there were already plenty of outdated tutorials on installing OpenCV on the Pi. I followed the LearnOpenCV tutorial; at the end of that article, you can even get an install script by subscribing to their newsletter. Even so, I hit two errors that I had to google around for, both of which came down to downgrading or reinstalling a library.
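Whichever tutorial you follow, it is worth confirming which version you actually ended up with before going further. A one-line check from Python:

import cv2

# Confirm the installed build; for this series it should be 4.x
print(cv2.__version__)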
Facial Detection V1
This is my first experience with facial detection, so I tried to make sense of it by gluing together some code I found online. In the script below, 07-28-2020-12-28-26.h264 is a 30-second, 2 Mbps video file from the security camera, and shape_predictor_68_face_landmarks.dat is a roughly 100 MB pre-trained landmark model that I downloaded from a repository.
import cv2
import dlib
from datetime import datetime

# Load the frontal face detector
detector = dlib.get_frontal_face_detector()
# Load the 68-point landmark predictor
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
# Open the recorded video
cap = cv2.VideoCapture('07-28-2020-12-28-26.h264')

count = 0
status = False                  # True while cooling down after a detection
success_time = datetime.now()
TIME_FORMAT = "%m-%d-%Y-%H-%M-%S.h264"  # filename format used by the camera
frame_passed = 0
fps = 30
skip_time = 5 * fps             # after a hit, skip 5 seconds' worth of frames
skip_frame = 0                  # process every (skip_frame + 1)-th frame
frame_count = 0

while True:
    ret, frame = cap.read()
    if not ret:                 # end of video
        break
    if status:
        # Cool-down: ignore frames for skip_time frames after a success
        if frame_passed == skip_time:
            frame_passed = 0
            status = False
            frame_count = 0
        else:
            frame_passed += 1
    else:
        if frame_count == skip_frame:
            # Convert the frame into grayscale for the detector
            gray = cv2.cvtColor(src=frame, code=cv2.COLOR_BGR2GRAY)
            # Use the detector to find faces
            faces = detector(gray)
            for face in faces:
                x1 = face.left()    # left point
                y1 = face.top()     # top point
                x2 = face.right()   # right point
                y2 = face.bottom()  # bottom point
                # Create the landmark object for this face
                landmarks = predictor(image=gray, box=face)
                # Loop through all 68 points and draw a circle on each
                for n in range(0, 68):
                    x = landmarks.part(n).x
                    y = landmarks.part(n).y
                    cv2.circle(img=frame, center=(x, y), radius=3,
                               color=(0, 255, 0), thickness=-1)
            if len(faces) > 0:
                # Save the annotated frame and start the cool-down
                cv2.imwrite("frame%d.jpg" % count, frame)
                count += 1
                status = True
                success_time = datetime.now()
                print("Success")
            frame_count = 0
        else:
            frame_count += 1

# When everything is done, release the video capture
cap.release()
# Close all windows
cv2.destroyAllWindows()
The code reads frames one by one and runs facial detection on each. Processing every frame of the 30-second video took about 10 minutes of computation, which is unacceptable for real-time use.
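To see where the time goes, I can time the detector on a single frame. This is a rough sketch of such a benchmark (my addition, assuming the same video and detector as above):

import time
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
cap = cv2.VideoCapture('07-28-2020-12-28-26.h264')
ret, frame = cap.read()   # grab one frame to benchmark the detector on
cap.release()

gray = cv2.cvtColor(src=frame, code=cv2.COLOR_BGR2GRAY)
start = time.perf_counter()
faces = detector(gray)
print("one detection took %.2f s" % (time.perf_counter() - start))
# A 30-second video at 30 FPS has ~900 frames; multiply to estimate total time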
As a quick improvement, I skip some frames during normal processing and skip a further stretch of frames after each successful detection.
When skipping 30 frames at a time, i.e., processing roughly one frame per second (about 30 of the video's ~900 frames), I got a 25% hit rate for successful detections on the 30-second video. Even with that much skipping, the computation time is still over a minute. That is not usable for a security camera.
Multiprocessing Optimization
To improve the computation time, I can use parallel computing. Inspecting top while the script runs shows a single process pinned at 100% CPU, meaning only one of the Pi's cores is doing work; with parallel processing there should be multiple processes, each using 100% of a core.
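A quick way to see how many worker processes make sense is to ask Python for the core count:

import multiprocessing

# Recent Raspberry Pi models report 4 cores, which is why the pool below uses 4
print(multiprocessing.cpu_count())

With four cores available, this is my attempt at optimizing the performance.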
from multiprocessing import Pool
import cv2
import dlib
from datetime import datetime

def f(frame_i):
    # Convert the frame into grayscale for the detector
    gray = cv2.cvtColor(src=frame_i, code=cv2.COLOR_BGR2GRAY)
    # Use the detector to find faces
    faces = detector(gray)
    if len(faces) > 0:
        # Timestamp the filename so parallel workers do not overwrite each other
        cv2.imwrite("frame%s.jpg" % datetime.now().strftime("%H:%M:%S"), frame_i)
        print("Success")
        # Landmark drawing is disabled to keep the workers fast:
        # for face in faces:
        #     landmarks = predictor(image=gray, box=face)
        #     for n in range(0, 68):
        #         x = landmarks.part(n).x
        #         y = landmarks.part(n).y
        #         cv2.circle(img=frame_i, center=(x, y), radius=3,
        #                    color=(0, 255, 0), thickness=-1)

SKIP_FRAME = 0
count = 0

if __name__ == '__main__':
    # Load the detector and predictor before the pool forks, so the worker
    # processes inherit them (this relies on Linux's fork start method)
    detector = dlib.get_frontal_face_detector()
    # The predictor is only needed if the landmark drawing above is re-enabled
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")
    # Open the recorded video
    cap = cv2.VideoCapture('07-28-2020-12-28-26.h264')

    # Read every (SKIP_FRAME + 1)-th frame into a memory buffer
    image_list = []
    reading_time = datetime.now()
    while True:
        ret, frame = cap.read()
        if ret:
            if count == SKIP_FRAME:
                image_list.append(frame)
                count = 0
            else:
                count = count + 1
        else:
            break
    print(len(image_list))
    reading_end = datetime.now() - reading_time
    print(str(reading_end))

    # Start 4 worker processes and run the detector over the buffered frames
    print("begin multiprocessing")
    start = datetime.now()
    with Pool(processes=4) as pool:
        pool.map(f, image_list)
    diff = datetime.now() - start
    print("face detection time:")
    print(str(diff))
    print("total time:")
    print(str(diff + reading_end))
    cap.release()
With the same 30-frame skipping, this version is about 2x faster than the previous code. However, it still has some problems:
- Reading all the frames took 27 seconds for a 30-second video. I need to parallelize the reading as well.
- Currently, I buffer every decoded frame in memory. Memory becomes an issue with larger videos; I could not even load all the frames of this 30-second video. A streaming sketch that addresses both problems follows this list.
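One way to attack both problems at once, which I have not tried on the Pi yet, is a producer-consumer pipeline: the main process decodes frames and pushes them into a bounded multiprocessing.Queue while worker processes pull frames off the queue and run the detector. Decoding then overlaps with detection, and the queue bound caps how many frames sit in memory. This is a minimal sketch under those assumptions; NUM_WORKERS, worker, and the None sentinels are my own names, not from the code above:

from multiprocessing import Process, Queue
import cv2
import dlib
from datetime import datetime

NUM_WORKERS = 4

def worker(q):
    # Each worker builds its own detector instead of inheriting one
    detector = dlib.get_frontal_face_detector()
    while True:
        frame = q.get()
        if frame is None:     # sentinel: no more frames are coming
            break
        gray = cv2.cvtColor(src=frame, code=cv2.COLOR_BGR2GRAY)
        if len(detector(gray)) > 0:
            # Include microseconds so two hits in the same second do not collide
            cv2.imwrite("frame%s.jpg" % datetime.now().strftime("%H-%M-%S-%f"),
                        frame)
            print("Success")

if __name__ == '__main__':
    q = Queue(maxsize=32)     # bounded queue: at most 32 frames in flight
    workers = [Process(target=worker, args=(q,)) for _ in range(NUM_WORKERS)]
    for w in workers:
        w.start()
    cap = cv2.VideoCapture('07-28-2020-12-28-26.h264')
    while True:
        ret, frame = cap.read()
        if not ret:
            break
        q.put(frame)          # blocks when the queue is full
    cap.release()
    for _ in workers:
        q.put(None)           # one sentinel per worker
    for w in workers:
        w.join()

The maxsize bound makes the producer block whenever the workers fall behind, which is what keeps memory use flat regardless of the video length.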
After I study facial detection more systematically, I will optimize this code further.