the result of output of head pose #10

Open · eugeneYz opened this issue Jan 8, 2022 · 2 comments

eugeneYz commented Jan 8, 2022

A splendid job! My work is about to make use of the head pose (R and T), so how do I print the result?

@clankill3r

I'm wondering the same thing. The README states:

The predictions include the 3DoF rotation matrix R (Pitch, Yaw, Roll), and the 3DoF translation matrix T (x, y, z),

Some code can be found in __init__.py:

def pose(frame, results, color):
    landmarks, params = results

    # rotate matrix
    R = params[:3, :3].copy()

    # decompose matrix to ruler angle
    euler = rotationMatrixToEulerAngles(R)
    print(f"Pitch: {euler[0]}; Yaw: {euler[1]}; Roll: {euler[2]};")

    draw_projection(frame, R, landmarks, color)

(ruler should be euler)
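rotationMatrixToEulerAngles itself isn't shown in the excerpt; a common reference implementation (my sketch of what it presumably does, not necessarily this repo's exact code) is:

import numpy as np

def rotationMatrixToEulerAngles(R):
    # decompose a 3x3 rotation matrix into x/y/z euler angles,
    # with a fallback for the near-gimbal-lock case
    sy = np.sqrt(R[0, 0] ** 2 + R[1, 0] ** 2)
    if sy > 1e-6:
        x = np.arctan2(R[2, 1], R[2, 2])
        y = np.arctan2(-R[2, 0], sy)
        z = np.arctan2(R[1, 0], R[0, 0])
    else:  # sy ~ 0: gimbal lock
        x = np.arctan2(-R[1, 2], R[1, 1])
        y = np.arctan2(-R[2, 0], sy)
        z = 0.0
    return np.degrees([x, y, z])  # (pitch, yaw, roll) in pose() above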

The interesting thing is that draw_projection is only given the rotation matrix, so the rest of the params matrix is lost. Looking at draw_projection:

def draw_projection(frame, R, landmarks, color, thickness=2):
    # build projection matrix
    radius = np.max(np.max(landmarks, 0) - np.min(landmarks, 0)) // 2
    projections = build_projection_matrix(radius)

    # refine rotate matrix
    rotate_matrix = R[:, :2]
    rotate_matrix[:, 1] *= -1

    # 3D -> 2D
    center = np.mean(landmarks[:27], axis=0)
    points = projections @ rotate_matrix + center
    points = points.astype(np.int32)

    # draw poly
    cv2.polylines(frame, np.take(points, [
        [0, 1], [1, 2], [2, 3], [3, 0],
        [0, 4], [1, 5], [2, 6], [3, 7],
        [4, 5], [5, 6], [6, 7], [7, 4]
    ], axis=0), False, color, thickness, cv2.LINE_AA)
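build_projection_matrix isn't shown either. Judging by the twelve index pairs passed to cv2.polylines (a box wireframe), it presumably returns eight 3D corners scaled by the face radius; a sketch of that assumption (not the repo's code):

def build_projection_matrix(radius):
    # hypothetical: eight corners of a unit box scaled by radius;
    # points 0..3 form one face, 4..7 the opposite face, matching
    # the edge list drawn above
    corners = np.array([
        [-1, -1, -1], [-1, 1, -1], [1, 1, -1], [1, -1, -1],
        [-1, -1, 1], [-1, 1, 1], [1, 1, 1], [1, -1, 1],
    ], dtype=np.float32)
    return corners * radius  # (8, 3), so (8, 3) @ (3, 2) -> (8, 2)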

So the center of where the box is drawn is taken from the mean of the landmarks, and that center is 2D. This makes me wonder how useful the XYZ translation data inside the matrix actually is.

Right now I'm trying to get it out; I will post more when I've figured out more.
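My working assumption (not confirmed by the repo) is that params is a 3x4 or 4x4 pose matrix with the usual [R | t] layout, in which case getting the full 6DoF pose out would look like:

# assumes the [R | t] layout described above
R = params[:3, :3]                   # 3x3 rotation
t = params[:3, 3]                    # translation (x, y, z)
pitch, yaw, roll = rotationMatrixToEulerAngles(R)
print(f"pitch={pitch:.1f} yaw={yaw:.1f} roll={roll:.1f} t={t}")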


clankill3r commented Jan 8, 2025

One thing I just noticed: the radius is also taken from the landmarks.
Using the center plus the radius to estimate depth can be really poor. Say you walk backwards while your face stays at the left edge of the image: the image x coordinate barely changes, but in the real world you would have to move backwards and to the right at the same time, otherwise you would drift toward the center of the frame with each step.
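To make that concrete with a pinhole model (fx and cx here are made-up intrinsics, nothing from this repo): a fixed image column u maps to a different world X at every depth Z, so image position alone can't give you the translation.

fx, cx = 600.0, 320.0      # hypothetical camera intrinsics
u = 50.0                   # face stays near the left image edge

for Z in (1.0, 2.0, 3.0):  # depth in metres
    X = (u - cx) * Z / fx  # pinhole back-projection
    print(f"Z={Z:.0f}m -> X={X:+.2f}m")
# Z=1m -> X=-0.45m, Z=2m -> X=-0.90m, Z=3m -> X=-1.35m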

If I visualise what I expect to be the translation data, using t = params[:3, 3].copy(), then I get:

[Screenshot 2025-01-08 204111: visualization of the extracted translation vector]

For now I will check:

headposeplus
deep-head-pose (Hopenet)
Lightweight-Head-Pose-Estimation
Dense-Head-Pose-Estimation

And maybe I'll come back if hacking the xyz into this one proves to be the best option.
