Annotate rigid objects in 2D image with standard 3D cube #3387
Comments
This is such a cool feature ...
@hnuzhy, I agree that we need to improve the functionality. Your explanation is really helpful. Could you please describe your research area and organization? Unfortunately, my team has a huge number of requests and we already have an approximate roadmap for Q3'21 and Q4'21. I'm therefore trying to clarify details that will help me raise the priority of the feature.
Yes, it is a pretty cool function, but it is not easy to implement :-(
@nmanovic Hi, I'm glad you agree with my proposal. I am a PhD student in the computer department at SJTU. My research field is the intersection of AI and education, and my specific research direction is object detection and pose estimation in computer vision. I would like to explain the motivation for this request from two aspects.

Aspect one: Academic Value
Recently, I have been studying methods for detecting students' attention in the classroom. Head orientation (head pose estimation) is one of the key factors. However, as far as I know, multi-person head pose estimation in 2D images is not well developed. There are some SOTA algorithms for head pose estimation on a single, well-cropped head, including FSA-Net (CVPR 2019) and WHENet (BMVC 2020). But their results are not ideal, and they are not easy to extend to multiple people in a single image. Most importantly, the datasets used by these algorithms are obtained either by 3D head projection (300W-LP & AFLW2000-3D) or from 3D Euler angles collected by depth cameras in an experimental scene (CMU Panoptic Studio Dataset).

Datasets have always been the cornerstone of deep learning algorithms, and head pose estimation is no exception. Therefore, I want to try annotating the 3D head orientation, that is, the three Euler angles of the head, directly in the 2D image. As first mentioned in this issue, the most accurate annotation scheme relies on interacting freely with a 3D cube on 2D images. In my opinion, once such a dataset is constructed, it will help drive progress in the corresponding algorithm research. For example, a bottom-up method could be designed to predict the pose of all heads in the image in a single pass. In addition, compared with a single cropped head image, the complete scene and human body information in the original image can support more accurate head pose estimation.

Aspect two: Enhancement Feasibility
After investigation, I didn't find any tools with real 3D cube annotation. Fortunately, I found functionally similar options in CVAT. The first is
Here are three examples of rough annotation results with
In a word, it is very useful to add interactive annotation of rigid 3D shapes (which can only be rotated, translated, and scaled) to 2D images. In addition to supporting head orientation marking, the new function can be extended to the annotation of other rigid objects. Once similar datasets of general objects are constructed, we can try to develop a simple and direct 3D object pose estimation algorithm based only on 2D images. We expect such a method to be comparable to estimation algorithms based on RGB-D or 3D point clouds.

Finally, I am not well placed to propose an overall framework for this enhancement in terms of CVAT's UI design or code changes, but I am willing to do what I can. I sincerely thank CVAT's main contributors for their work, and hope you will carefully consider adding this task to the roadmap.
I support the request. We also have a need for such functionality.
This is a growing request from the automotive industry as well; we need cuboid annotations to be done on RGB images, not point clouds.
Seconding this; it would be immensely valuable for pose estimation of objects in robotics.
Hello everyone. For those who are interested in this question, you can refer to the 2D head pose annotation tool I mentioned in https://github.com/hnuzhy/HeadAttribute/.
My actions before raising this issue
I have read and searched the official docs and past issues for a solution. No one has reported the same problem.
Expected Behaviour
I want to annotate the head orientation of people in a 2D image with a standard 3D cube. Here, the head is treated as a rigid object. A standard cube is defined as follows: the three edges meeting at any vertex are mutually perpendicular, and all twelve edges are equal in length (unit length).
After labeling, we would obtain the eight projected vertices of the cube in the two-dimensional image coordinate system. If three Euler angles (pitch, yaw, roll) are used to represent the orientation of the head, these precise projection points can be converted into the corresponding angles.
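To make the relationship between Euler angles and the eight projected vertices concrete, here is a minimal sketch. It is not CVAT code. It assumes an orthographic projection (the z coordinate is simply dropped), a cube centered on the annotation point, and the rotation order R = Rz(roll) · Ry(yaw) · Rx(pitch); the issue does not fix any of these conventions, and the function names `rotation_matrix` and `project_cube` are hypothetical.

```python
import math
from itertools import product


def rotation_matrix(pitch_deg, yaw_deg, roll_deg):
    """Return R = Rz(roll) * Ry(yaw) * Rx(pitch) as a 3x3 nested list.

    The axis assignment and multiplication order are assumptions made
    for this sketch, not a convention taken from the issue or from CVAT.
    """
    p, y, r = (math.radians(a) for a in (pitch_deg, yaw_deg, roll_deg))
    cp, sp = math.cos(p), math.sin(p)
    cy, sy = math.cos(y), math.sin(y)
    cr, sr = math.cos(r), math.sin(r)
    return [
        [cr * cy, cr * sy * sp - sr * cp, cr * sy * cp + sr * sp],
        [sr * cy, sr * sy * sp + cr * cp, sr * sy * cp - cr * sp],
        [-sy, cy * sp, cy * cp],
    ]


def project_cube(pitch_deg, yaw_deg, roll_deg, center=(0.0, 0.0), scale=1.0):
    """Project the 8 vertices of a unit cube onto the image plane.

    The cube is centered at the origin with edge length 1, rotated by R,
    scaled, shifted to `center`, and projected orthographically.
    """
    rot = rotation_matrix(pitch_deg, yaw_deg, roll_deg)
    points = []
    for vertex in product((-0.5, 0.5), repeat=3):
        x = sum(rot[0][k] * vertex[k] for k in range(3))
        y = sum(rot[1][k] * vertex[k] for k in range(3))
        points.append((center[0] + scale * x, center[1] + scale * y))
    return points


if __name__ == "__main__":
    for point in project_cube(20.0, 30.0, 10.0, center=(320.0, 240.0), scale=80.0):
        print(f"{point[0]:.1f}, {point[1]:.1f}")
```

Under these assumptions, the annotation only needs to store the center, the scale, and the three angles, because the eight 2D vertices can always be regenerated from them. A perspective camera model would change the projection step but not the overall idea.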
Current Behaviour
Current cuboid annotation
The cuboid annotation function currently provided in CVAT is not suitable for rigid objects.
These conditions mean that the cuboid cannot be used to mark head orientation. In addition, I think such a cuboid is also unsuitable for labeling cars, chairs, and other rigid objects.
Alternative choice: polyline
As an alternative, I tried to annotate three consecutive non-coplanar edges of the cube using the polyline label. In this way, the four points of the three edges can be used to estimate the Euler angles. However, this alternative only solves the third problem of the cuboid label mentioned above; the first and second problems remain unsolved. What we actually get are rotated cuboids.
Possible Solution
I have three suggestions or roadmaps for adding a unit cube label in a new version of CVAT.
Improve cuboid: The current cuboid is actually oblique. However, objects in the real world should be marked with regular cuboids in which the three edges at each vertex are mutually perpendicular. At the same time, we need to free the third dimension of the cuboid and allow it to rotate freely. I don't know whether this is easy to implement in TypeScript. Three.js and other open-source packages may be useful references.
Modify cuboid-3d: As far as I know, recent versions of CVAT already support 3D point cloud annotation. Is it possible to port the 3D cuboid module to 2D image annotation? I'm not very familiar with point cloud annotation, so I cannot offer a detailed opinion here.
Add cube: If possible, consider adding a new cube label to the shape buttons on the left side of CVAT. Users could then choose to add a new 3D cube shape. The cube instance would support rotation by any angle about all three axes. The software would automatically record the final Euler angles once the cube's pose is fixed.
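The conversion from annotated points to Euler angles that this proposal relies on could be sketched as follows. It is only an illustration under stated assumptions: the projection is orthographic, the four points are one cube vertex plus the far ends of the three edges along the cube's local x, y, and z axes (a hypothetical annotation order), the image axes are the first two axes of a right-handed frame, and the angles follow R = Rz(roll) · Ry(yaw) · Rx(pitch). The function names are hypothetical, and the sketch also reports how far the input is from a true cube projection, which is the property an oblique cuboid violates.

```python
import math


def _unit(vec):
    """Return (normalized vector, original length)."""
    length = math.sqrt(sum(c * c for c in vec))
    return [c / length for c in vec], length


def rotation_rows_from_points(o, px, py, pz):
    """Rebuild the rotation matrix rows from four annotated 2D points.

    o is the shared vertex; px, py, pz are the far ends of the edges
    along the cube's local x, y and z axes (assumed annotation order).
    The projected edges form the 2x3 matrix scale * R[0:2, :].
    """
    edges = [(p[0] - o[0], p[1] - o[1]) for p in (px, py, pz)]
    row0, len0 = _unit([e[0] for e in edges])  # direction of R[0]
    row1, len1 = _unit([e[1] for e in edges])  # direction of R[1]
    # The third row of a right-handed rotation matrix is row0 x row1.
    row2 = [
        row0[1] * row1[2] - row0[2] * row1[1],
        row0[2] * row1[0] - row0[0] * row1[2],
        row0[0] * row1[1] - row0[1] * row1[0],
    ]
    # For an exact cube projection: len0 / len1 == 1 and row0 . row1 == 0.
    skew = sum(a * b for a, b in zip(row0, row1))
    return (row0, row1, row2), (len0 / len1, skew)


def euler_from_four_points(o, px, py, pz):
    """Return (pitch, yaw, roll) in degrees for R = Rz(roll) Ry(yaw) Rx(pitch)."""
    (row0, row1, row2), _ = rotation_rows_from_points(o, px, py, pz)
    yaw = -math.asin(max(-1.0, min(1.0, row2[0])))
    pitch = math.atan2(row2[1], row2[2])
    roll = math.atan2(row1[0], row0[0])
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))


if __name__ == "__main__":
    # An unrotated cube: the z edge projects onto the shared vertex.
    points = ((100.0, 100.0), (150.0, 100.0), (100.0, 150.0), (100.0, 100.0))
    print(euler_from_four_points(*points))
    print(rotation_rows_from_points(*points)[1])
```

Hand-drawn points will rarely satisfy the two consistency conditions exactly, so a real tool would probably either constrain the interaction so that only valid cube projections can be drawn, or fit the nearest rotation to the annotated points. The angles also become ambiguous when yaw approaches ±90 degrees (gimbal lock).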
Here are two examples of 3D model interaction. The first is the rotation of a 3D head model in Mayavi, where the interaction relies on both the mouse and the keyboard. The second uses the 3D image editing tool in Windows 10 to place and manipulate 3D models on 2D images, using only the mouse.
Example 1
Example 2
Next steps
Looking forward to your reply. I am willing to do whatever I can to help advance this feature.