Dynamic Sitting Posture Recognition and Correction
Identifying people's sitting posture in real time and rebuilding the awareness of muscle and spatial position for back pain sufferers

Abstract
Lower back pain (LBP) has recently become a severe and common problem for office workers. Many people, including office workers and students who work with poor sitting postures, experience lower back pain, which makes everyday movement difficult and causes other inconveniences. There are many treatments for lower back pain, but most rely on ergonomic equipment or physiotherapy, which can only offer light relief rather than solving the problem at its source.
The aim of this project is to design a dynamic sitting posture recognition and correction system. The system identifies a person's sitting posture and feeds this information back to patients in real time, letting them realise what posture they are currently in, with the goal of rebuilding their awareness of muscle and spatial position.
The OpenPose Python library is chosen to implement this system. The library contains a convolutional neural network that has been trained over 200,000 iterations and can identify human body key points such as the head, neck and shoulders. By comparing the spatial relationships between these key points, the system identifies a person's sitting posture. The advantage of using the OpenPose library is that no dedicated multi-camera setup is necessary: any input containing a seated person, whether a webcam stream, a picture or a video, can be analysed, and posture can be detected from a single input image.
1 OpenPose
OpenPose is the first real-time multi-person system to jointly detect body, hand, face and foot key points, and it is the most popular bottom-up approach to deep-learning-based human pose estimation. It provides the following functionalities [1, 2]:
(1) 2D multi-person keypoints real-time detection
(2) 3D single person keypoints real-time detection
(3) Calibration toolbox
(4) Single person tracking
Human pose estimation is performed on a skeleton representation. The skeleton represents the orientation of a person in a 2D graphical format: all detected key points of the human body, including the body, face, hands, feet, lumbar region and shoulders, each with their coordinates, are connected to form a human skeleton that describes the person's pose. A sample human skeleton is shown below.

OpenPose first detects every key point and then groups the key points into distinct individuals. The flow chart below shows the architecture of OpenPose. Important features are first extracted from the image by the VGG-19 layers. Two parallel branches of convolutional layers then process these features. The top branch predicts a set of 18 confidence maps, each representing a particular body part of the human skeleton. The bottom branch predicts 38 Part Affinity Fields (PAFs), which represent the degree of association between parts of the human pose skeleton. Based on the part confidence maps from the top branch, bipartite graphs are formed between candidate pairs of parts, and the calculated PAF values are used to discard the weaker connections in these graphs. Following these steps, human pose skeletons can be computed and assigned to the individuals they belong to in the image.
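As a rough sketch of how these key points can be obtained in practice, the Python API bundled with OpenPose can be driven along the following lines. The import path, wrapper calls and model_folder location are assumptions that depend on the local OpenPose build and version:

import cv2
from openpose import pyopenpose as op  # import path depends on how OpenPose was built

# Configure and start the OpenPose wrapper; model_folder is assumed to point to the
# downloaded OpenPose models directory.
params = {"model_folder": "models/", "model_pose": "COCO"}
opWrapper = op.WrapperPython()
opWrapper.configure(params)
opWrapper.start()

# Run pose estimation on a single image (a webcam or video frame works the same way).
datum = op.Datum()
datum.cvInputData = cv2.imread("sitting_person.jpg")
opWrapper.emplaceAndPop(op.VectorDatum([datum]))  # older builds accept a plain list here

# datum.poseKeypoints has shape (num_people, num_keypoints, 3): x, y, confidence.
keypoints = datum.poseKeypoints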


To design this dynamic imaging system, a human pose skeleton map alone is not enough. Particular body joints, including the head, the two shoulder tips, the spine, the hands and the legs, are each given a spatial coordinate; by comparing the spatial relations between these coordinates, the system is able to detect the sitting posture. For instance, when the angle between the spine and the leg is less than 70 degrees, the person is hunching their back.
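For illustration, the joints of interest can be read out of the OpenPose keypoint array by index. The ordering below follows the common COCO 18-keypoint layout and is an assumption to be checked against the model variant actually in use; joint_xy is a hypothetical helper:

# Assumed COCO 18-keypoint ordering; verify against the OpenPose model variant in use.
COCO = {"Nose": 0, "Neck": 1, "RShoulder": 2, "RElbow": 3, "RWrist": 4,
        "LShoulder": 5, "LElbow": 6, "LWrist": 7, "RHip": 8, "RKnee": 9,
        "RAnkle": 10, "LHip": 11, "LKnee": 12, "LAnkle": 13,
        "REye": 14, "LEye": 15, "REar": 16, "LEar": 17}

def joint_xy(keypoints, person, name, min_conf=0.1):
    """Return the (x, y) coordinate of a named joint for one person, or None if the
    joint was not detected with sufficient confidence."""
    x, y, conf = keypoints[person, COCO[name]]
    return (float(x), float(y)) if conf >= min_conf else None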
2 Algorithm Structure
The model is built on recognising human-object interaction, head pose, body orientation and human pose. The proposed algorithm takes the detection results as inputs, generates a probabilistic model of personal space and updates it. First, the pose, body orientation and head direction help locate the target object and reinforce the correct human-object interaction behaviour. Second, the pose, body orientation and head direction update the parameters of the single-person personal space model. The affordance space of the detected objects is re-assigned based on human-object interaction behaviour, while each human-object interaction behaviour is associated with its activity space.

3 Pose Estimation
Since a 2D camera is used in our study, we employ the OpenPose method to detect and collect key points on the human body: the neck, left hip, right hip, left knee and right knee. From these we compute the 'spine' vectors (from the left hip to the neck and from the right hip to the neck) and the 'leg' vectors (from the left hip to the left knee and from the right hip to the right knee). The left spine-leg angle is the angle between the left spine and left leg vectors, and the right spine-leg angle is the angle between the right spine and right leg vectors, as shown in Figure 1.
The left or right spine-leg angle θ can be derived with the formula below, where V_spine and V_leg represent the spine and leg vectors, and P_neck, P_hip and P_knee represent the key points of the neck, hip and knee respectively:

V_spine = P_neck − P_hip,   V_leg = P_knee − P_hip

θ = arccos( (V_spine · V_leg) / (|V_spine| |V_leg|) )
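A minimal NumPy translation of this formula might look as follows; spine_leg_angle is a hypothetical helper name, and its inputs are the (x, y) coordinates of the detected key points:

import numpy as np

def spine_leg_angle(p_neck, p_hip, p_knee):
    """Angle in degrees between the spine vector (hip -> neck) and the leg vector (hip -> knee)."""
    v_spine = np.asarray(p_neck, dtype=float) - np.asarray(p_hip, dtype=float)
    v_leg = np.asarray(p_knee, dtype=float) - np.asarray(p_hip, dtype=float)
    cos_theta = np.dot(v_spine, v_leg) / (np.linalg.norm(v_spine) * np.linalg.norm(v_leg))
    return np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0)))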
In a real-life setting, the person is not always directly in front of the camera. If only one spine-leg angle is detected, the final spine-leg angle is defined by that angle. If both angles are detected, the final spine-leg angle θ is defined as the mean of the two angles:
if neither the left nor the right spine-leg angle is detected then
    Use the previous spine-leg angle
else
    if the left and right spine-leg angles both exist then
        spine-leg angle ← the mean of (left spine-leg angle, right spine-leg angle)
    else if left spine-leg angle > right spine-leg angle then
        Use the measurement of the left spine-leg angle
    else
        Use the measurement of the right spine-leg angle
    end if
end if
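This selection rule can be written as a small Python helper (a sketch that assumes undetected angles are passed as None and that the previous frame's angle is available):

def final_spine_leg_angle(left, right, previous):
    """Combine the left and right spine-leg angles into a single estimate."""
    if left is None and right is None:
        return previous                     # neither side detected: keep the previous value
    if left is not None and right is not None:
        return (left + right) / 2.0         # both detected: use the mean
    return left if left is not None else right  # only one side detected: use it directly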
This procedure contains two phases:
Phase 1: Spine-leg angle model learning
1. Collect spine-leg angle samples of the standing pose from the training images/videos.
2. Estimate the model parameters, that is, the mean and the (co)variance, from the sample data.
3. Save the parameters for later use in Phase 2.
Phase 2: Pose detection
1. Set a threshold probability.
2. Take the video from the 2D camera as input and obtain the input spine-leg angle. Evaluate the Gaussian function at this angle and compare the result with the threshold probability.
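Phase 1 amounts to fitting a one-dimensional Gaussian to the collected angle samples. A minimal sketch, where the function name and the output file are placeholders:

import json
import numpy as np

def fit_angle_model(angle_samples, path="standing_angle_model.json"):
    """Estimate the mean and standard deviation of the spine-leg angle for a pose class
    (Phase 1) and save them for reuse in Phase 2."""
    samples = np.asarray(angle_samples, dtype=float)
    model = {"mean": float(samples.mean()), "std": float(samples.std(ddof=1))}
    with open(path, "w") as f:
        json.dump(model, f)
    return model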
Hence, we propose the following Threshold Probability algorithm for detecting body pose. First of all, we calculate P(standing | θ_spine−leg = α) by applying the Gaussian equation, where μ and σ are the mean and standard deviation learned in Phase 1:

P(standing | θ_spine−leg = α) = (1 / (σ√(2π))) · exp(−(α − μ)² / (2σ²))
if P(standing | θ_spine−leg = α) > P(threshold) then
    pose = Standing
else
    pose = Sitting
end if
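Putting the two phases together, the threshold test can be sketched as below; the Gaussian density and the comparison follow the rule above, while the function names and the threshold value are placeholders to be replaced by the parameters learned in Phase 1:

import math

def gaussian_pdf(x, mean, std):
    """Gaussian probability density used to score the observed spine-leg angle."""
    return math.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * math.sqrt(2 * math.pi))

def classify_pose(angle, mean, std, threshold):
    """Label the pose 'standing' if the standing-model density at the observed angle
    exceeds the chosen threshold probability, otherwise 'sitting'."""
    return "standing" if gaussian_pdf(angle, mean, std) > threshold else "sitting"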
A simple hypothesis test has completely specified models under both the null and alternative hypotheses, which are written as:
H0: Pose = standing

H1: Pose = sitting

The reason for choosing standing as H0 is that, when people are standing, they have less room to lean the upper body, while they can lean forwards and backwards over a large range of angles when sitting. The data collected in Phase 1 also confirm this quantitatively. When people sit facing the camera, the spine-leg angles follow the distribution G(152.15, 5.12²); when they sit side-on to the camera, the spine-leg angles follow G(103.84, 7.99²). Meanwhile, the probability density function (PDF) of the standing pose is G(168.03, 2.18²). The variance of the standing-pose PDF is much smaller, and when people are sitting, the mean angle differs considerably between body orientations.
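For later use in the likelihood ratio test, the fitted distributions reported above can be kept in a small lookup table (a sketch; the numbers are simply those quoted in this section, stored as mean and standard deviation pairs):

# Gaussian parameters (mean, standard deviation) of the spine-leg angle, as reported above.
ANGLE_MODELS = {
    "sitting_front": (152.15, 5.12),  # sitting facing the camera
    "sitting_side":  (103.84, 7.99),  # sitting side-on to the camera
    "standing":      (168.03, 2.18),
}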
The likelihood ratio test is based on the likelihood ratio, which is often denoted by Λ. In this case, the likelihood ratio is defined as:

Λ = P(θ_spine−leg = α | H0) / P(θ_spine−leg = α | H1)
The likelihood ratio test provides the decision rule as follows:
If Λ > C, do not reject H0; if Λ < C, reject H0; if Λ = C, reject H0 with probability q. The value of C is selected to meet a desired significance level α; C and q are derived from the relation:
q · P (Λ = C | H0) + P (Λ < C | H0) = α
We set the significance level α = 0.05; for the standing-pose distribution G(168.03, 2.18²) this corresponds to the two-sided condition:

P(|θ_spine−leg − 168.03| < 1.96 × 2.18 | H0) = 0.95

The inequality can be transformed as follows:

163.76 < θ_spine−leg < 172.30
Thus, we can decide whether the pose is standing by examining whether the spine-leg angle falls within the range (163.76, 172.30). Since the standing angle is known to be larger than the sitting angle, this range can be extended to (163.76, 180). Therefore, we set the spine-leg angle range for standing to (163.76, 180).
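In practice this reduces to a simple interval test on the measured spine-leg angle (a sketch using the range derived above):

def is_standing(spine_leg_angle):
    """Standing if the spine-leg angle falls inside the acceptance range (163.76, 180)
    derived from the likelihood ratio test."""
    return 163.76 < spine_leg_angle <= 180.0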


4 Results
When Figure 8 (shown below) is given as input, the system detects the only person in the picture and outputs a skeleton posture map. By comparing the angle between the spine and the leg and evaluating the positions of the hands and knees, it then reports this information to the person being observed, as seen in Figure 9.


5 Conclusion and Future work
In this project, the dynamic imaging system can estimate a person's sitting posture with high accuracy, reporting whether they are sitting straight, reclined or hunching their back. With this system, more people with lower back pain can start to rebuild their awareness of muscle and spatial position and develop the habit of sitting in the right way. Also, the input can be images from any source, so extra cameras are no longer necessary; hence the system saves computational resources.
Due to my limited experience and the hardware constraints, the system cannot currently detect sitting posture at a high frame rate. In future, the system could achieve a higher detection rate and greater accuracy with a better structure and a faster CPU.
References
[1] Cao, Z., Hidalgo, G., Simon, T., Wei, S.E. and Sheikh, Y. (2019) “OpenPose: Realtime Multi-Person 2D Pose Estimation Using Part Affinity Fields.” ArXiv.org. Available at arxiv.org/abs/1812.08008. [Accessed…]
[2] CMU-Perceptual-Computing-Lab (2020) "CMU-Perceptual-Computing-Lab/Openpose." GitHub. Available at github.com/CMU-Perceptual-Computing-Lab/openpose. [Accessed…]