
Fall Detection using Pose Estimation

Developing a simple AI to detect falls

Pose estimation detection (Screen capture from 大阪 道頓堀 ライブカメラ osaka Dotonbori LiveCamera by ラジカルビデオジョッキー RVJJP)

Fall detection has become an important stepping stone in the research of action recognition: training an AI to classify general actions such as walking and sitting down. What humans interpret as the obvious action of a person falling flat on their face is, to an AI, just a sequence of jumbled pixels. To enable the AI to make sense of the input it receives, we need to teach it to detect certain patterns and shapes, and to formulate its own rules.

To build an AI to detect falls, I decided not to go through the torture of amassing a large dataset and training a model specifically for this purpose. Instead, I used pose estimation as the building block.

Pose Estimation

Pose estimation is the localisation of human joints, commonly known as keypoints, in images and video frames. Typically, each person is represented by a set of keypoints, and lines are drawn between keypoint pairs, effectively mapping a rough shape of the person. Pose estimation methods vary in their input types and detection approaches. For a more in-depth guide to pose estimation, do check out this article by Sudharshan Chandra Babu.
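To make this concrete, a detected pose is commonly represented as named keypoints with pixel coordinates and confidence scores, following the 17-keypoint COCO convention used by many models, including OpenPifPaf. The coordinates below are made up purely for illustration:

```python
# Illustrative only: all coordinates and confidences here are made up.
# Each keypoint is (x_pixel, y_pixel, confidence).
person = {
    'nose': (412, 230, 0.91),
    'left_shoulder': (395, 280, 0.88),
    'right_shoulder': (430, 282, 0.85),
    # ... remaining COCO keypoints (elbows, wrists, hips, knees, ankles) ...
}

# A rough skeleton is drawn by connecting predefined keypoint pairs.
skeleton_pairs = [
    ('left_shoulder', 'right_shoulder'),
    ('left_shoulder', 'left_elbow'),
    ('right_shoulder', 'right_elbow'),
]
```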

To make this model easily accessible to everyone, I chose RGB images, read and processed with OpenCV, as the input. This makes the model compatible with typical webcams, video files, and even HTTP/RTSP streams.
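As a rough illustration, a single OpenCV capture loop can serve all three source types; the source values below are placeholders, not the ones used in this project:

```python
import cv2

# cv2.VideoCapture accepts a device index, a file path, or a stream URL,
# so the same loop serves webcams, video files, and HTTP/RTSP streams.
source = 0  # e.g. 0 for a webcam, "videos/fall.mp4", or "rtsp://<camera-ip>/stream"

cap = cv2.VideoCapture(source)
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # OpenCV returns BGR frames; convert to RGB for models that expect RGB input.
    rgb_frame = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    # ... run pose estimation on rgb_frame ...
cap.release()
```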

Pretrained Model

The pose estimation model that I utilised was OpenPifPaf by VITA lab at EPFL. The detection approach is bottom-up, which means that the AI first analyses the entire image and figures out all the keypoints it sees. Then, it groups keypoints together to determine the people in the image. This differs from a top-down approach, where the AI uses a basic person detector to identify regions of interest, before zooming in to identify individual keypoints. To learn more about how OpenPifPaf was developed, do check out their CVPR 2019 paper, or read their source code.
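For reference, recent OpenPifPaf releases expose a high-level Predictor API. A minimal sketch, assuming one of the library's published checkpoints (not necessarily the exact configuration I used), might look like this:

```python
import numpy as np
import openpifpaf

# 'shufflenetv2k16' is one of OpenPifPaf's published checkpoints.
predictor = openpifpaf.Predictor(checkpoint='shufflenetv2k16')

# Placeholder frame; in practice this comes from the OpenCV loop above.
rgb_frame = np.zeros((480, 640, 3), dtype=np.uint8)

predictions, _, _ = predictor.numpy_image(rgb_frame)
for pred in predictions:
    # pred.data is an (N, 3) array of keypoints: x, y, confidence.
    print(pred.data)
```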

Multi-Stream Input

Most open-source models can only process a single input at any one time. To make this more versatile and scalable in the future, I made use of the multiprocessing library in Python to process multiple streams concurrently using subprocesses. This allows us to fully leverage multiple processors on machines with this capability.
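A minimal sketch of this setup, with placeholder video sources, spawns one subprocess per stream:

```python
import multiprocessing as mp

def process_stream(source):
    """Run capture and pose estimation for one stream in its own process."""
    import cv2  # imported inside so each subprocess creates its own capture handle
    cap = cv2.VideoCapture(source)
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        # ... pose estimation and fall detection on frame ...
    cap.release()

if __name__ == '__main__':
    # Placeholder sources; any mix of webcams, files, and RTSP URLs works.
    sources = ['videos/street_1.mp4', 'videos/street_2.mp4']
    workers = [mp.Process(target=process_stream, args=(src,)) for src in sources]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
```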

The pose estimation model is able to run concurrently on the two streams (Top, Bottom: Video from CHINA I Beijing I Street Scenes by gracetheglobe)

Person Tracking

In video frames with multiple people, it can be difficult to determine which person has fallen, because the algorithm needs to correlate the same person across consecutive frames. But how does it know it is looking at the same person when he or she is constantly moving?

The solution is to implement a multiple person tracker. It doesn’t have to be fancy; a simple general object tracker will suffice. Tracking is pretty straightforward and can be outlined in the following steps (a minimal code sketch follows the list):

  1. Compute centroids (taken as the neck points)
  2. Assign unique ID to each centroid
  3. Compute new centroids in the next frame
  4. Calculate the Euclidean distance between centroids of the current and previous frame, and correlate them based on the minimum distance
  5. If a correlation is found, update the new centroid with the ID of the old centroid
  6. If no correlation is found, assign the new centroid a unique ID (a new person has entered the frame)
  7. If a person stays out of the frame for a set number of frames, remove the centroid and its ID
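Below is a minimal sketch of such a tracker. The matching here is greedy and the parameters are illustrative; it is not my exact implementation:

```python
import numpy as np

class CentroidTracker:
    """A minimal sketch of the tracking steps above."""

    def __init__(self, max_missing=30):
        self.next_id = 0
        self.centroids = {}  # ID -> (x, y) neck point
        self.missing = {}    # ID -> consecutive frames without a match
        self.max_missing = max_missing

    def update(self, new_centroids):
        existing_ids = list(self.centroids)
        matched = set()
        unmatched = list(new_centroids)

        # Steps 4-5: correlate old and new centroids by minimum Euclidean distance.
        for obj_id in existing_ids:
            if not unmatched:
                break
            old = self.centroids[obj_id]
            dists = [np.linalg.norm(np.subtract(old, c)) for c in unmatched]
            self.centroids[obj_id] = unmatched.pop(int(np.argmin(dists)))
            self.missing[obj_id] = 0
            matched.add(obj_id)

        # Step 7: count misses for unmatched IDs and drop stale ones.
        for obj_id in existing_ids:
            if obj_id not in matched:
                self.missing[obj_id] += 1
                if self.missing[obj_id] > self.max_missing:
                    del self.centroids[obj_id]
                    del self.missing[obj_id]

        # Steps 2 and 6: any leftover centroid is a new person entering the frame.
        for c in unmatched:
            self.centroids[self.next_id] = c
            self.missing[self.next_id] = 0
            self.next_id += 1

        return self.centroids
```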
Simple person tracking (Video from CHINA I Beijing I Street Scenes by gracetheglobe)

If you want a step-by-step tutorial on object tracking with actual code, check out this post by Adrian Rosebrock.

Fall Detection Algorithm

The initial fall detection algorithm I conceptualised was relatively simple. First, I chose the neck as the stable reference point (compare that with swinging arms and legs). Next, I calculated the perceived height of the person from the bounding box enclosing the entire person. Then, I computed the vertical distance between neck points at set frame intervals. If the vertical distance exceeded half the person’s perceived height, the algorithm would signal a fall.
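In code, the core check might look something like this sketch; the frame interval and the half-height threshold are illustrative values:

```python
def is_fall(neck_history, person_height, interval=25):
    """Flag a fall when the neck point drops more than half the person's
    perceived height within `interval` frames.

    neck_history: list of (x, y) neck points, one per frame
                  (image y grows downward).
    person_height: height of the person's bounding box, in pixels.
    """
    if len(neck_history) <= interval:
        return False
    _, y_old = neck_history[-interval - 1]
    _, y_now = neck_history[-1]
    vertical_drop = y_now - y_old  # positive when moving down the frame
    return vertical_drop > 0.5 * person_height
```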

However, after coming across multiple YouTube videos of people falling, I realised that falls happen in many different ways and orientations. Some falls were not detected when the camera viewed the scene at an angle, as the victims did not appear to undergo a drastic change in motion. My model was also not robust enough and kept throwing false positives when people bent down to tie their shoelaces or ran straight down the video frame.

I decided to implement more features to refine my algorithm (a sketch of these checks follows the list):

  • Instead of analysing one-dimensional motion (the y-axis), I analysed two-dimensional motion (both the x- and y-axes) to accommodate different camera angles.
  • Added a bounding-box check to see whether the width of the person was larger than their height, which implies that the person is on the ground rather than upright. This eliminated false positives from fast-moving pedestrians and cyclists.
  • Added a two-point check to watch for falls only when both the person’s neck and ankle points are detected. This prevents inaccurate computation of the person’s height when the person is partially occluded.
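A sketch of these refined checks, with illustrative thresholds and keypoint names, could look like this:

```python
import math

def is_fall_refined(neck_history, bbox, keypoints, interval=25):
    """Refined heuristic combining the three checks above."""
    # Two-point check: only proceed if both neck and ankle are visible,
    # so the person's height can be estimated reliably.
    if keypoints.get('neck') is None or keypoints.get('ankle') is None:
        return False
    if len(neck_history) <= interval:
        return False

    nx, ny = keypoints['neck']
    ax, ay = keypoints['ankle']
    person_height = math.hypot(ax - nx, ay - ny)

    # Two-dimensional motion: total neck displacement over the interval,
    # so falls at an angle to the camera are still caught.
    x_old, y_old = neck_history[-interval - 1]
    x_now, y_now = neck_history[-1]
    sudden_motion = math.hypot(x_now - x_old, y_now - y_old) > 0.5 * person_height

    # Bounding-box check: a person on the ground is wider than they are tall,
    # which filters out fast-moving upright pedestrians and cyclists.
    _, _, box_w, box_h = bbox
    lying_down = box_w > box_h

    return sudden_motion and lying_down
```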
Results of the improved fall detection algorithm (Top, Center, Bottom: Video from 50 Ways to Fall by Kevin Parry)

Test Results

As of this writing, extensive fall detection datasets are scarce. I chose the UR Fall Detection Dataset to test my model as it contains different fall scenarios. Out of a total of 30 videos, the model correctly identified 25 falls and missed the other 5 because the subject fell out of the frame. This gave me a recall of 83.33% and, with no false positives, an F1 score of 90.91%.
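For clarity, the arithmetic behind these figures works out as follows:

```python
# Metric arithmetic for the test above (fp = 0 is implied by the F1 score).
tp, fn, fp = 25, 5, 0

recall = tp / (tp + fn)                              # 25 / 30 ≈ 0.8333
precision = tp / (tp + fp)                           # 25 / 25 = 1.0
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.9091

print(f"recall={recall:.2%}, precision={precision:.2%}, F1={f1:.2%}")
```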

These results can be considered a good start but are far from conclusive due to the small sample size. The lack of other fall-like actions such as tying shoelaces also meant that I could not stress test my model for false positives.

The test was executed on two NVIDIA Quadro GV100s and achieved an average of 6 FPS, which is barely sufficient for real-time processing. The computation is extremely intensive because of the network’s numerous layers. Models that claim to run above 15 FPS are typically inaccurate, or are backed by monstrous GPUs.

Modified OpenPose model with MobileNetV2 as the backbone network (Video from 大阪 道頓堀 ライブカメラ osaka Dotonbori LiveCamera by ラジカルビデオジョッキー RVJJP). This processed at an average of 11 FPS on an Intel Xeon CPU, but is highly inaccurate.
Performance comparison across three processors. Chart by author.

Applications

Fall detection can be applied in many scenarios to provide assistance. A non-exhaustive list includes:

  • Drunk people
  • The elderly
  • Kids in the playground
  • People who suffer from medical conditions like heart attacks or strokes
  • Careless people who trip and fall

For the more serious cases, swift response to a fall can mean the difference between life and death.

Future Development

The accuracy of fall detection is heavily dependent on the Pose Estimation accuracy. Typical pose estimation models are trained on clean images with a full-frontal view of the subject. However, falls cause the subject to be contorted in weird poses, and most pose estimation models are not able to accurately define the skeleton in such scenarios. Furthermore, the models are not robust enough to overcome occlusions or image noise.

To attain a human-level detection accuracy, current pose estimation models will need to be retrained on a larger variety of poses, and include lower-resolution images with occlusions.

Current hardware limitations also impede the ability of pose estimation models to run smoothly on videos with high frame rates. It will be some time before these models can run easily on any laptop with a basic GPU, or even on a CPU alone.

Apart from pose estimation, a deep learning model trained specifically on falls would likely perform as well or even better. Such a model would have to be trained carefully to distinguish falls from other fall-like actions, and this would require extensive, publicly available fall datasets. It would, however, be limited in scope, as it could only identify one particular action rather than a variety of actions.

Another possible approach would be a knowledge-based system, in which the model is developed to learn the way humans do. This can be achieved via a rule-based system, which makes decisions according to predefined rules, or a case-based system, which applies similarities from past cases to make an informed judgement about a new one.

Conclusion

To solve the more difficult problem of general action recognition, which comprises a plethora of actions, we must first understand and master the intricacies of detecting a single action. If we can develop a model that identifies a fall as easily as you or I would, we will be able to extract patterns that allow the model to detect other types of actions just as easily.

The path to action recognition is undoubtedly still a challenging one, but just as with other cutting-edge models such as OpenAI’s GPT-3, we will discover new techniques previously unheard of.


If you would like to share any ideas or opinions, do leave a comment below, or connect with me on LinkedIn.

If you would like to see how I developed the full model, do check out my GitHub repository for the source code.

