
What is the conversion relationship between the annotations in BasicAI LiDAR Fusion format and Kitti 3D Object Detection format? #308

Open
gravity-123 opened this issue Dec 25, 2024 · 3 comments

@gravity-123

I want to make a KITTI-like dataset without images, so I studied the conversion of annotations from the BasicAI LiDAR Fusion format to the KITTI 3D Object Detection format.
I used the "LiDAR without any configs" dataset provided at https://docs.basic.ai/docs/upload-data and annotated it. An annotation example is shown in the screenshot below.
[Screenshot 2024-12-25 21-21-23: annotation example]
When I exported, I saved both the BasicAI LiDAR Fusion format and the KITTI 3D Object Detection format. I compared the 3D bounding box annotations in the JSON file and in the txt file. As shown in the screenshot, the box dimensions are the same, only in a different order. But how is the center position obtained?
[Screenshot 2024-12-25 21-22-56: JSON vs. txt box annotations]
Is it obtained as txt_position = camera_external * json_position? The result is wrong when I calculate it this way. I read the description of the camera extrinsic parameters at https://docs.basic.ai/docs/camera-intrinsic-extrinsic-and-distortion-in-camera-calibration#practice-in-basicai-, shown in the screenshot below; the ordering of the extrinsic parameters given there also seems to be different.
[Screenshot 2024-12-25 21-25-32: extrinsic parameter description from the docs]
The screenshot below shows the parameters corresponding to the data above. This is the second extrinsic parameter in LiDAR_Fusion_without_any_configs/camera_config/08.json.
[Screenshot 2024-12-25 21-31-38: extrinsic from camera_config/08.json]
So how should the position conversion from BasicAI LiDAR Fusion format to KITTI 3D Object Detection format be calculated? Is my formula wrong, or is there a problem with the camera extrinsic parameters?
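In code, the calculation I tried looks like the following minimal sketch (the values here are placeholders; ext_matrix would come from camera_config and json_position from center3D in the result JSON):

import numpy as np

# Placeholder extrinsic and center; substitute the real values from the export.
ext_matrix = np.eye(4)
json_position = np.array([0.0, 0.0, 0.0])
# Apply the 4x4 extrinsic to the box center in homogeneous coordinates.
txt_position = (ext_matrix @ np.append(json_position, 1.0))[:3]
print(txt_position)  # does not match the txt file for my data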

@ShirleySprite

ShirleySprite commented Jan 17, 2025

Hello,

The KITTI format requires camera parameters because KITTI coordinates consist of 2D boxes on the image and 3D boxes in the camera coordinate system. Please note that it is the camera coordinate system, not the point cloud coordinate system. Moreover, the representation of these 3D coordinates is specific to the KITTI format. The conversion code is as follows:

import math

import numpy as np

def alpha_in_pi(a):
    # Wrap an angle into the interval [-pi, pi).
    pi = math.pi
    return a - math.floor((a + pi) / (2 * pi)) * 2 * pi

def gen_alpha(rz, ext_matrix, lidar_center):
    # rz: box yaw in the LiDAR frame; ext_matrix: 4x4 LiDAR-to-camera
    # extrinsic; lidar_center: box center (x, y, z) in the LiDAR frame.
    lidar_center = np.hstack((lidar_center, np.array([1])))
    # Transform the heading direction into the camera frame to get rotation_y.
    cam_point = ext_matrix @ np.array([np.cos(rz), np.sin(rz), 0, 1])
    cam_point_0 = ext_matrix @ np.array([0, 0, 0, 1])
    ry = -1 * alpha_in_pi(np.arctan2(cam_point[2] - cam_point_0[2], cam_point[0] - cam_point_0[0]))
    # Observation angle alpha: rotation_y minus the viewing angle of the center.
    cam_center = ext_matrix @ lidar_center
    theta = alpha_in_pi(np.arctan2(cam_center[0], cam_center[2]))
    alpha = ry - theta
    return ry, alpha

# Assuming obj_3d, rect, ext_matrix, label, truncated, occluded, x_list and
# y_list (the projected 2D box coordinates) are defined elsewhere.
contour_3d = obj_3d[rect['trackId']]['contour']
length, width, height = contour_3d['size3D'].values()
cur_rz = contour_3d['rotation3D']['z']

ry, alpha = gen_alpha(cur_rz, ext_matrix, np.array(list(contour_3d['center3D'].values())))

# KITTI stores the location of the bottom-face center in camera coordinates,
# so shift the center down by height / 2 in the LiDAR frame before applying
# the extrinsic.
point = list(contour_3d['center3D'].values())
temp = np.hstack((np.array([point[0], point[1], point[2] - height / 2]), np.array([1])))
x, y, z = list(ext_matrix @ temp)[:3]
score = 1
string = f"{label} {truncated:.2f} {occluded} {alpha:.2f} " \
         f"{min(x_list):.2f} {min(y_list):.2f} {max(x_list):.2f} {max(y_list):.2f} " \
         f"{height:.2f} {width:.2f} {length:.2f} " \
         f"{x:.2f} {y:.2f} {z:.2f} {ry:.2f} {score}\n"
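For context, here is a minimal sketch of how ext_matrix might be loaded from an export like yours; the file path and the 'camera_external' key are assumptions based on the dataset layout you linked, so adjust them to your data:

import json

import numpy as np

# Hypothetical path; the second entry is indexed because you referenced the
# second extrinsic in camera_config/08.json.
with open('LiDAR_Fusion_without_any_configs/camera_config/08.json') as f:
    cam_cfgs = json.load(f)

# Assumption: each entry stores its extrinsic as a flat 16-element
# 'camera_external' list, reshaped row-major into a 4x4 matrix.
ext_matrix = np.array(cam_cfgs[1]['camera_external']).reshape(4, 4)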

@gravity-123 (Author)

I don't know enough about how to pass the parameters from the JSON file into your program, so I assigned them directly to the relevant variables, but the final result differs from the actual KITTI annotation. I don't know whether the problem is in my assignments or in your program. Could you verify that this conversion idea is correct in a more concrete way, by calculating the specific values? Below is my modified code.

import numpy as np
import math

def alpha_in_pi(a):
    pi = math.pi
    return a - math.floor((a + pi) / (2 * pi)) * 2 * pi

def gen_alpha(rz, ext_matrix, lidar_center):
    lidar_center = np.hstack((lidar_center, np.array([1])))
    cam_point = ext_matrix @ np.array([np.cos(rz), np.sin(rz), 0, 1])
    cam_point_0 = ext_matrix @ np.array([0, 0, 0, 1])
    ry = -1 * (alpha_in_pi(np.arctan2(cam_point[2] - cam_point_0[2], cam_point[0] - cam_point_0[0])))
    cam_center = ext_matrix @ lidar_center.T
    theta = alpha_in_pi(np.arctan2(cam_center[0], cam_center[2]))
    alpha = ry - theta
    return ry, alpha

ext_matrix = np.array([[ 0.70944543, -0.70469849, -0.00933913, -0.06090484],
                       [ 0.00722819,  0.0205264,  -0.99976318,  1.74850379],
                       [ 0.7047233,   0.70920992,  0.01965606, -7.00787134],
                       [ 0.,          0.,          0.,          1.        ]])
# Assuming obj_3d, rect, contour_3d are defined elsewhere with appropriate data.
# contour_3d = obj_3d[rect['trackId']]['contour']
length, width, height = 4.2972, 1.8512, 1.4052
center3D = [13.7449, 18.8812, 0.5131]
cur_rz = -2.3562

ry, alpha = gen_alpha(cur_rz, ext_matrix, center3D)

point = center3D
temp = np.hstack((np.array([point[0], point[1], point[2] - height / 2]), np.array([1])))
x, y, z = list((ext_matrix @ temp))[:3]
score = 1
# string = f"{label} {truncated:.2f} {occluded} {alpha:.2f} " \
#          f"{min(x_list):.2f} {min(y_list):.2f} {max(x_list):.2f} {max(y_list):.2f} " \
#          f"{height:.2f} {width:.2f} {length:.2f} " \
#          f"{x:.2f} {y:.2f} {z:.2f} {ry:.2f} {score}\n"
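To compare against the exported KITTI annotation, I print the computed 3D fields in the same order they appear in the txt line:

# Print the computed 3D fields in KITTI txt order (h w l x y z ry, plus alpha)
# for a direct comparison with the exported annotation.
print(f"{height:.2f} {width:.2f} {length:.2f} "
      f"{x:.2f} {y:.2f} {z:.2f} {ry:.2f} (alpha={alpha:.2f})")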

The xtreme1 annotation, camera parameters, and KITTI annotation generated by the software for this object are shown in the images below. I am sure that these two annotations correspond to each other, because their length, width, and height are identical.

[Image: xtreme1 annotation]

[Image: camera parameters]

[Image: KITTI annotation]
In addition, I wrote a script, xtreme1_to_kitti.py, to generate annotations and have run it in some point cloud detection tasks. It seems to work, but its conversion method differs from the code you gave, so I can't completely confirm that my approach is correct.
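One way to sanity-check either conversion independently is a round-trip: map the KITTI location back into the LiDAR frame and compare it with center3D from the JSON. A sketch reusing x, y, z, height, and ext_matrix from the code above, and assuming the KITTI location is the bottom-face center in camera coordinates as in your snippet:

import numpy as np

# Round-trip check: invert the extrinsic to map the camera-frame location
# back to the LiDAR frame, then undo the bottom-face offset.
cam_bottom = np.array([x, y, z, 1.0])
lidar_bottom = np.linalg.inv(ext_matrix) @ cam_bottom
lidar_center = lidar_bottom[:3] + np.array([0.0, 0.0, height / 2])
print(lidar_center)  # should match center3D from the BasicAI result JSON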

@ShirleySprite

I am really sorry, but there is a small caveat you should understand. Neither the person who wrote this conversion script nor I, who took it over later, actually use the converted KITTI format ourselves; we do not train models with it. The script is intended for others to use, so after it was written we made changes whenever users reported issues, and assumed it was correct when they did not. If you ask me whether this conversion script has issues, I really cannot give you a definite answer. I can only say that if it does not meet your expectations, then it has issues for you. Please feel free to use whichever conversion method meets your expectations. This script is just a reference, and its details can be fully adjusted to your needs.
