
Direction vector computation issue #14

Open
Eurususu opened this issue Dec 30, 2024 · 2 comments
Comments

@Eurususu

Hi, regarding the direction-vector calculation currently provided in utils/helpers.py:

dx = int(-length * np.sin(pitch) * np.cos(yaw))
dy = int(-length * np.sin(yaw))

I found that it is not accurate. For example:
[image]
Using the pitch and yaw given there, draw_gaze produces the following result:

import cv2
import numpy as np
from utils.helpers import draw_gaze

yaw = np.radians(-58.69)
pitch = np.radians(-15.04)
bbox = [0, 0, 224, 224]
frame = cv2.imread('/home/jia/anktechDrive/09_dataset/test/Image/unused/Face/1.jpg')
draw_gaze(frame, bbox, pitch, yaw)
cv2.imshow('test', frame)
cv2.waitKey()

[image]
The gaze direction is clearly wrong.
If I swap pitch and yaw, the result is:

yaw = np.radians(-15.04)
pitch = np.radians(-58.69)
bbox = [0, 0, 224, 224]
frame = cv2.imread('/home/jia/anktechDrive/09_dataset/test/Image/unused/Face/1.jpg')
draw_gaze(frame, bbox, pitch, yaw)
cv2.imshow('test', frame)
cv2.waitKey()

[image]
The result is now close to what was expected.

Since the pitch and yaw really are -58.69 and -15.04 respectively, the computation itself is probably wrong. I changed it to:

dx = int(-length * np.sin(yaw) * np.cos(pitch))
dy = int(-length * np.sin(pitch))

With this change, passing in the correct angles also yields a reasonable visualization.
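For reference, the corrected projection can be isolated into a small helper (a sketch; gaze_delta is a hypothetical name, and the repo's draw_gaze additionally computes the bbox center and draws the arrow with cv2):

```python
import numpy as np

def gaze_delta(pitch, yaw, length=100):
    # Corrected 2D projection of the gaze direction: yaw drives the
    # horizontal pixel offset, pitch the vertical one. Angles in radians,
    # image coordinates (y grows downward).
    dx = int(-length * np.sin(yaw) * np.cos(pitch))
    dy = int(-length * np.sin(pitch))
    return dx, dy

# With the angles from this issue, the arrow points slightly right and
# strongly down in image coordinates:
dx, dy = gaze_delta(np.radians(-58.69), np.radians(-15.04))  # -> (13, 85)
```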

I hope the author can take a look at this. If the formula is indeed wrong, is the model inference pipeline affected as well?

@yakhyo
Owner

yakhyo commented Dec 30, 2024

Hi @Eurususu, thank you for pointing that out. I believe it's a case of variable name mismatches, but it does not impact the overall model functionality.

Below are the results I obtained when performing inference on the image above using the ResNet34 model from this repository*:
[image]

The naming mismatch issue likely originates from the dataset. While parsing it, I couldn't find explicit details specifying whether the labels followed a pitch-then-yaw order. I opted for pitch and yaw, and the rest of the code was adjusted accordingly.

Let me know if you have any questions or need further improvements for the code or its explanations!

*yellow line shows direction of gaze

@Eurususu
Author

Thanks for replying!

The whole pipeline is fine, but some variables are misnamed, and the root cause of that is the label file. I tried to parse the label file as follows:

# direction vector -> (yaw, pitch)
def GazeTo2d(gaze):
    yaw = np.arctan2(gaze[0], -gaze[2])
    pitch = np.arcsin(gaze[1])
    return np.array([yaw, pitch])

# direction vector taken from the label file
gaze = np.array([0.3161872829071495, -0.12964845241024991, -0.9397961911581795])
# preprocessing as done in the dataset code
gaze = gaze * 180 / np.pi
# final yaw and pitch
pitch, yaw = gaze[1], gaze[0]
bbox = [0, 0, 224, 224]
frame = cv2.imread('/home/jia/anktechDrive/09_dataset/test/Image/unused/Face/1.jpg')
draw_gaze(frame, bbox, pitch, yaw)
cv2.imshow('test', frame)
cv2.waitKey()

This visualization is correct. However, two things are confusing.

First, if I use the correct formula

dx = int(-length * np.sin(yaw) * np.cos(pitch))
dy = int(-length * np.sin(pitch))

the visualization of the label is not right. I checked the whole pipeline and found no major problems, so the issue must come from the label file.

Second, the * 180 / pi step. I don't really understand what it means, but it works:

gaze = gaze * 180 / np.pi

As we know, * 180 / pi converts radians to degrees, yet the input here is a unit direction vector, not an angle in radians.
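A possible reading of that step (my guess, not confirmed by the dataset authors): for a unit gaze vector the x and y components are approximately the yaw and pitch in radians, so relabeling a component as degrees via * 180 / pi lands close to the true angle. A quick numeric check on the label vector quoted above:

```python
import numpy as np

# Unit direction vector taken from the label file (quoted above).
gaze = np.array([0.3161872829071495, -0.12964845241024991, -0.9397961911581795])

# Exact angles, using the same convention as GazeTo2d:
yaw_true = np.degrees(np.arctan2(gaze[0], -gaze[2]))   # ~18.60 deg
pitch_true = np.degrees(np.arcsin(gaze[1]))            # ~-7.45 deg

# The `* 180 / pi` step relabels the raw components as degrees, i.e. it
# treats x ~ yaw and y ~ pitch in radians. For small angles this is close
# to the exact values:
yaw_approx = gaze[0] * 180 / np.pi                     # ~18.12 deg
pitch_approx = gaze[1] * 180 / np.pi                   # ~-7.43 deg
```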


These two points may not be a big problem in practice, but for annotation data they are worth exploring. I am now using a Unity tool to annotate pictures; the labeling process is straightforward:
[image]
The resulting direction vector matches the one computed with this formula:

def gaze_direction_from_angles(pitch, yaw):
    x = np.cos(pitch) * np.sin(yaw)
    y = np.sin(pitch)
    z = np.cos(pitch) * np.cos(yaw)
    norm = np.sqrt(x**2 + y**2 + z**2)
    return np.array([x / norm, y / norm, z / norm])
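One caveat worth noting here (an observation, not a confirmed root cause): this formula and the GazeTo2d above use opposite z-sign conventions. GazeTo2d negates z (a gaze360-style vector points toward the camera, z < 0), while this formula produces z > 0. As a result pitch round-trips exactly but yaw does not, unless z is flipped first. A minimal sketch:

```python
import numpy as np

def gaze_direction_from_angles(pitch, yaw):
    # Angles -> unit direction vector, with z = +cos(pitch)*cos(yaw).
    x = np.cos(pitch) * np.sin(yaw)
    y = np.sin(pitch)
    z = np.cos(pitch) * np.cos(yaw)
    n = np.sqrt(x**2 + y**2 + z**2)
    return np.array([x, y, z]) / n

def GazeTo2d(gaze):
    # Vector -> (yaw, pitch), assuming a gaze360-style vector with z < 0.
    yaw = np.arctan2(gaze[0], -gaze[2])
    pitch = np.arcsin(gaze[1])
    return np.array([yaw, pitch])

pitch, yaw = np.radians(-20.0), np.radians(30.0)
v = gaze_direction_from_angles(pitch, yaw)

yaw2, pitch2 = GazeTo2d(v)                          # yaw2 != yaw (z sign)
yaw3, pitch3 = GazeTo2d(v * np.array([1, 1, -1]))   # flipping z recovers yaw
```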

If I use the annotated direction vector directly for visualization, there is no problem; this is simple and easy to understand. Of course, draw_gaze here uses the corrected formula:

gaze = np.array([-0.825, -0.259, 0.501])
gaze = GazeTo2d(gaze)
pitch, yaw = gaze[1], gaze[0]
bbox = [0, 0, 224, 224]
frame = cv2.imread('/home/jia/anktechDrive/09_dataset/test/Image/unused/Face/1.jpg')
draw_gaze(frame, bbox, pitch, yaw)
cv2.imshow('test', frame)
cv2.waitKey()

But it will be hard to convert this annotated data into the gaze360 label format, which means none of the public datasets can be used, nor the pre-trained weights.
