Welcome to my GitHub 😄
- 📚 I’m currently pursuing my master’s degree in Computer Science at the Hebrew University of Jerusalem, after completing a dual-major BSc in CS & Physics.
- My interests include:
- Machine Learning
- Deep Learning
- SLAM (Simultaneous Localization and Mapping)
- Image Processing
- 3D Computer Vision
- Camera Calibration and Geometry
- 🔬 For my MSc, I’m part of the Micro-Flight Lab. Our goal is to understand the mechanisms of insect flight and implement them in biomimetic robots.
- My role is developing and utilizing an end-to-end computer vision system, which includes:
- Multi-camera calibration for capturing high-speed videos (16,000 fps) of flying insects (see the calibration sketch after this list).
- Engineering an ensemble of instance segmentation and pose estimation deep neural networks (DNNs); labeling, training, and deploying them to analyze insects’ flight dynamics.
- Performing 3D reconstruction and simulation of insects’ wings and body dynamics using multi-camera detections, applying novel optimization methods that achieved state-of-the-art results in the field.
- Devising and utilizing several validation methods to ensure the stability and accuracy of the system.
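To give a concrete flavor of the calibration step, here is a minimal per-camera sketch using OpenCV’s standard checkerboard workflow. This is an illustrative example, not the lab’s exact procedure; the board size, the `calibration_frames/cam0` path, and the refinement parameters are all assumptions.

```python
# Minimal per-camera intrinsic calibration sketch (OpenCV checkerboard workflow).
# Board size, paths, and refinement settings are illustrative assumptions.
import glob

import cv2
import numpy as np

CHECKERBOARD = (6, 9)  # inner-corner grid of the target; hypothetical value

# 3D reference points of the board in its own plane (z = 0)
objp = np.zeros((CHECKERBOARD[0] * CHECKERBOARD[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHECKERBOARD[0], 0:CHECKERBOARD[1]].T.reshape(-1, 2)

obj_points, img_points = [], []
for path in glob.glob("calibration_frames/cam0/*.png"):  # hypothetical path
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, CHECKERBOARD)
    if found:
        # Refine detected corners to sub-pixel accuracy
        corners = cv2.cornerSubPix(
            gray, corners, (11, 11), (-1, -1),
            (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 1e-3))
        obj_points.append(objp)
        img_points.append(corners)

# Recover the intrinsic matrix K, distortion coefficients, and per-view extrinsics
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, gray.shape[::-1], None, None)
print(f"RMS reprojection error: {rms:.3f} px")
```

Repeating this per camera, plus a shared-target step to relate the cameras to each other, yields the projection matrices used downstream for triangulation.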
- My other projects span a range of machine learning, deep learning, and computer vision challenges.
Explore My Work at Micro-Flight Lab
Get a glimpse into the research and development I conducted during my time at the Micro-Flight Lab.
Four synchronized and calibrated high-speed cameras, capturing at an impressive 16,000 frames per second (fps), record flying insects in real time. For context, the wingbeat frequency here is around 200 Hz, so a single wingbeat spans roughly 70–80 frames, allowing us to capture intricate details of wing and body dynamics.
Below is a raw video from the lab’s high-speed camera setup:
movie.53.b.-.Made.with.Clipchamp.mp4
The Method
- We developed a robust 3D tracking system to capture specific feature points on a fly’s wings and body.
- We employed a multi-camera 2D tracker powered by an ensemble of deep learning models, and aggregated the multi-view detections into accurate 3D reconstructions (a triangulation sketch follows this list).
- We derived the fly’s geometry (body orientation and wing angles) from these tracked points.
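The multi-view aggregation step can be illustrated with a standard direct linear transform (DLT) triangulation. This is a generic sketch under the usual pinhole-camera assumptions, not the lab’s exact optimization; `Ps` and `pts2d` are assumed inputs.

```python
import numpy as np

def triangulate_dlt(Ps, pts2d):
    """Triangulate one 3D point from N >= 2 views via the direct linear
    transform (DLT). Ps: list of 3x4 camera projection matrices;
    pts2d: list of corresponding (x, y) detections, one per camera."""
    A = []
    for P, (x, y) in zip(Ps, pts2d):
        # Each view contributes two linear constraints on the homogeneous point X:
        # x * (P[2] @ X) = P[0] @ X  and  y * (P[2] @ X) = P[1] @ X
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    # Least-squares solution: the right singular vector of A with the
    # smallest singular value
    _, _, Vt = np.linalg.svd(np.asarray(A))
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize
```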
Main Challenges
- Working with a small, custom dataset, meaning no off-the-shelf solutions were available.
- Handling multiple self-occlusions, where feature points may be unseen in many frames, and determining which cameras should be used at any given time (see the confidence-gating sketch after this list).
- Ensuring robustness and outlier-free analysis for every recorded movie.
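One simple way to frame the per-frame camera-selection problem is confidence gating: keep only the views whose detection confidence clears a threshold, and triangulate from the survivors. This is a hypothetical sketch; `confidences` (e.g., heatmap peak values) and the threshold are assumptions, and the lab’s actual criterion may be more elaborate.

```python
CONF_THRESHOLD = 0.5  # illustrative value, not a tuned parameter

def select_views(Ps, pts2d, confidences, thresh=CONF_THRESHOLD):
    """Keep only views whose 2D detection confidence clears the threshold.
    Returns None when fewer than two usable views remain, so the point can
    be flagged for interpolation or smoothing downstream."""
    keep = [i for i, c in enumerate(confidences) if c >= thresh]
    if len(keep) < 2:
        return None  # point unresolved in this frame
    return [Ps[i] for i in keep], [pts2d[i] for i in keep]
```

The surviving subset then feeds a triangulation routine such as `triangulate_dlt` above.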
The following videos showcase the output of our video analysis pipeline.
video.roni.2.-.Made.with.Clipchamp.mp4
analysis.2.mp4
On the left side, you’ll see the hand-picked feature points, automatically detected across all four views. This detection is powered by a trained ensemble of deep neural networks, which identifies points in each view independently. These 2D detections from multiple views are later aggregated into optimal 3D points using novel optimization techniques.
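As an illustration of how an ensemble’s per-view outputs might be fused before triangulation, here is a minimal sketch that takes the median over the models’ heatmaps and reads off the peak. The array shapes and the median rule are assumptions, not the lab’s exact aggregation.

```python
import numpy as np

def ensemble_peak(heatmaps):
    """Fuse per-model heatmaps for one view and one feature point.
    heatmaps: array of shape (n_models, H, W). The median suppresses
    single-model outliers; the argmax of the fused map gives the 2D
    detection, and its value serves as a confidence score."""
    fused = np.median(heatmaps, axis=0)
    y, x = np.unravel_index(np.argmax(fused), fused.shape)
    return (x, y), fused[y, x]
```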
On the right side, you’ll observe the reconstructed 3D points, enhanced with additional annotations:
- The green plane represents the 'stroke plane': the imaginary plane through which the insect's wings move during each wingbeat (a plane-fitting sketch follows this list).
- The three arrows in the center define the insect's internal 3D coordinate system.
- Each wing features two arrows representing the chord and span (x and y axes) of the wing's coordinate system.
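For intuition, a stroke plane like the one above can be estimated with a least-squares plane fit to the wingtip trajectory over a wingbeat. This SVD-based sketch is a generic method, not necessarily the lab’s definition; `wingtip_pts` is an assumed input.

```python
import numpy as np

def fit_stroke_plane(wingtip_pts):
    """Least-squares plane through 3D wingtip positions (N x 3 array).
    Returns a point on the plane (the centroid) and the unit normal,
    i.e., the singular vector with the smallest singular value of the
    mean-centered points."""
    centroid = wingtip_pts.mean(axis=0)
    _, _, Vt = np.linalg.svd(wingtip_pts - centroid)
    normal = Vt[-1]
    return centroid, normal / np.linalg.norm(normal)
```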
Dive into interactive visualizations showcasing my research and analyses.
- The static movie provides an interactive summary of the flight event, visualizing the positions of the wingtips and the center of mass for each frame. Additionally, it depicts the orientation of the fly at every wingbeat (~70 frames), represented as a cross marker.
- The dynamic movie offers an interactive simulation of the flight dynamics across all frames, showcasing the tracked 3D points, the fly's internal moving coordinate system, and each wing's coordinate system, represented by its chord and span.
Click below to open in a new tab:
The input to the pose estimation CNN is a 5-channel image comprising three temporal channels (corresponding to frames at -7, 0, and +7 relative to the current frame) and two binary segmentation masks, which act as an attention mechanism to address left-right wing ambiguity. The output is a multi-channel image containing C Gaussian heatmaps, each representing a distinct feature point.
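A sketch of how such an input tensor and the target heatmaps could be assembled is shown below. The frame offsets (-7, 0, +7) and the two masks follow the description above; the Gaussian sigma and the array names are illustrative assumptions.

```python
import numpy as np

def make_input(frames, t, mask_left, mask_right):
    """Stack the 5-channel input for frame t: three temporal channels
    (t-7, t, t+7) plus two binary wing-segmentation masks acting as an
    attention signal for left/right disambiguation."""
    return np.stack([frames[t - 7], frames[t], frames[t + 7],
                     mask_left, mask_right], axis=-1)

def gaussian_heatmap(shape, center, sigma=3.0):  # sigma is an assumed value
    """Target heatmap for one feature point: a 2D Gaussian centered on the
    annotated (x, y) location. The network outputs one such channel per
    feature point (C channels in total)."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    cx, cy = center
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
```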