the result of output of head pose #10

Open · eugeneYz opened this issue Jan 8, 2022 · 2 comments

eugeneYz commented Jan 8, 2022

A splendid job! My work is about to make use of the head pose (R and T), so how do I print the result?

@clankill3r

I'm wondering the same thing. The README states:

The predictions include the 3DoF rotation matrix R (Pitch, Yaw, Roll), and the 3DoF translation matrix T (x, y, z),

Some code can be found in __init__.py:

def pose(frame, results, color):
    landmarks, params = results

    # rotate matrix
    R = params[:3, :3].copy()

    # decompose matrix to ruler angle
    euler = rotationMatrixToEulerAngles(R)
    print(f"Pitch: {euler[0]}; Yaw: {euler[1]}; Roll: {euler[2]};")

    draw_projection(frame, R, landmarks, color)

(ruler should be euler)
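rotationMatrixToEulerAngles itself isn't shown in the excerpt; a common reference implementation (my sketch of what it presumably does, not necessarily this repo's exact code) is:

import numpy as np

def rotationMatrixToEulerAngles(R):
    # decompose a 3x3 rotation matrix into x/y/z euler angles,
    # with a fallback for the near-gimbal-lock case
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:
        x = np.arctan2(R[2, 1], R[2, 2])
        y = np.arctan2(-R[2, 0], sy)
        z = np.arctan2(R[1, 0], R[0, 0])
    else:  # sy ~ 0: gimbal lock
        x = np.arctan2(-R[1, 2], R[1, 1])
        y = np.arctan2(-R[2, 0], sy)
        z = 0.0
    return np.degrees([x, y, z])  # (pitch, yaw, roll) in pose() above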

The interesting thing is that draw_projection is only given the rotation matrix, so the rest of the params matrix is lost. Looking at draw_projection:

def draw_projection(frame, R, landmarks, color, thickness=2):
    # build projection matrix
    radius = np.max(np.max(landmarks, 0) - np.min(landmarks, 0)) // 2
    projections = build_projection_matrix(radius)

    # refine rotate matrix
    rotate_matrix = R[:, :2]
    rotate_matrix[:, 1] *= -1

    # 3D -> 2D
    center = np.mean(landmarks[:27], axis=0)
    points = projections @ rotate_matrix + center
    points = points.astype(np.int32)

    # draw poly
    cv2.polylines(frame, np.take(points, [
        [0, 1], [1, 2], [2, 3], [3, 0],
        [0, 4], [1, 5], [2, 6], [3, 7],
        [4, 5], [5, 6], [6, 7], [7, 4]
    ], axis=0), False, color, thickness, cv2.LINE_AA)
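build_projection_matrix isn't shown either. Judging by the twelve index pairs passed to cv2.polylines (a box wireframe), it presumably returns eight 3D corners scaled by the face radius; a sketch of that assumption (not the repo's code):

def build_projection_matrix(radius):
    # hypothetical: eight corners of a unit box scaled by radius;
    # points 0..3 form one face, 4..7 the opposite face, matching
    # the edge list drawn above
    corners = np.array([
        [-1, -1, -1], [-1, 1, -1], [1, 1, -1], [1, -1, -1],
        [-1, -1, 1], [-1, 1, 1], [1, 1, 1], [1, -1, 1],
    ], dtype=np.float32)
    return corners * radius  # (8, 3), so (8, 3) @ (3, 2) -> (8, 2)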

So the center of where the box is drawn is taken from the mean of the landmarks, and that center is 2D. This makes me wonder how useful the XYZ translation data inside the matrix actually is.

Right now I'm trying to get it out; I will post more when I've figured out more.
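My working assumption (not confirmed by the repo) is that params is a 3x4 or 4x4 pose matrix with the usual [R | t] layout, in which case getting the full 6DoF pose out would look like:

# assumes the [R | t] layout described above
R = params[:3, :3]                   # 3x3 rotation
t = params[:3, 3]                    # translation (x, y, z)
pitch, yaw, roll = rotationMatrixToEulerAngles(R)
print(f"pitch={pitch:.1f} yaw={yaw:.1f} roll={roll:.1f} t={t}")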


clankill3r commented Jan 8, 2025

One thing I just noticed: the radius is also taken from the landmarks.
Using the center plus the radius to estimate depth can be really poor. Say you walk backwards while your face stays at the left edge of the image: the image x coordinate barely changes, but in the real world you would have to move backwards and to the right at the same time, otherwise you would drift toward the center of the frame with each step.
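To make that concrete with a pinhole model (fx and cx here are made-up intrinsics, nothing from this repo): a fixed image column u maps to a different world X at every depth Z, so image position alone can't give you the translation.

fx, cx = 600.0, 320.0      # hypothetical camera intrinsics
u = 50.0                   # face stays near the left image edge

for Z in (1.0, 2.0, 3.0):  # depth in metres
    X = (u - cx) * Z / fx  # pinhole back-projection
    print(f"Z={Z:.0f}m -> X={X:+.2f}m")
# Z=1m -> X=-0.45m, Z=2m -> X=-0.90m, Z=3m -> X=-1.35m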

If I visualise what I expect to be the translation data, using t = params[:3, 3].copy(), then I get:

[Screenshot 2025-01-08 204111: visualization of the extracted translation vector]

For now I will check:

headposeplus
deep-head-pose (Hopenet)
Lightweight-Head-Pose-Estimation
Dense-Head-Pose-Estimation

And maybe I'll come back if hacking the xyz into this one proves to be the best option.
