Tracking Objects as Points
We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art.
CenterTrack applies a detection model to a pair of images and detections from the prior frame.
CenterTrack localizes objects and predicts their associations with the previous frame.
CenterTrack is simple, online (no peeking into the future), and real-time.
CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes.
Tracking-by-detection. These models rely on a given, accurate object detector to identify objects and then link them through time in a separate stage.
Recent work on simultaneous detection and tracking [1, 8] has made progress in alleviating some of this complexity.
Each object is represented by a single point at the center of its bounding box.
Specifically, we adopt the recent CenterNet detector to localize object centers.
We condition the detector on two consecutive frames, as well as a heatmap of prior tracklets, represented as points. We train the detector to also output an offset vector from the current object center to its center in the previous frame.
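One way to picture the conditioning described above is as a single multi-channel input: two RGB frames plus one heatmap plane give seven channels. The sketch below uses plain nested lists in channel-first layout. The function name `stack_tracking_input` and the choice of simple channel concatenation are assumptions for illustration; an actual network may fuse these inputs differently.

```python
def stack_tracking_input(current_frame, previous_frame, prior_heatmap):
    """Concatenate the tracker inputs along the channel axis.

    current_frame, previous_frame: lists of 3 channel planes, each H x W.
    prior_heatmap: a single H x W plane of prior object centers.
    Returns a list of 7 planes (hypothetical layout: current, previous, heatmap).
    """
    height, width = len(prior_heatmap), len(prior_heatmap[0])
    planes = list(current_frame) + list(previous_frame) + [prior_heatmap]
    # All planes must agree spatially before they can be stacked.
    for plane in planes:
        if len(plane) != height or any(len(row) != width for row in plane):
            raise ValueError("all planes must share the same height and width")
    return planes


def zeros(height, width):
    """Helper for the example: an H x W plane filled with 0.0."""
    return [[0.0] * width for _ in range(height)]


frame_t = [zeros(4, 6) for _ in range(3)]
frame_prev = [zeros(4, 6) for _ in range(3)]
stacked = stack_tracking_input(frame_t, frame_prev, zeros(4, 6))
```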
If each object in past frames is represented by a single point, a constellation of objects can be represented by a heatmap of points.
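As a rough illustration of a heatmap of points, the sketch below renders each prior object center as a Gaussian bump on a single-channel grid. The function name `render_point_heatmap` and the fixed `sigma` are assumptions made for this example; a real detector would more likely scale the bump with object size.

```python
import math


def render_point_heatmap(points, height, width, sigma=2.0):
    """Render (x, y) center points as a single-channel H x W heatmap."""
    heatmap = [[0.0] * width for _ in range(height)]
    for cx, cy in points:
        for y in range(height):
            for x in range(width):
                sq_dist = (x - cx) ** 2 + (y - cy) ** 2
                value = math.exp(-sq_dist / (2 * sigma ** 2))
                # Element-wise max keeps each peak at 1.0 instead of
                # summing the contributions of overlapping bumps.
                heatmap[y][x] = max(heatmap[y][x], value)
    return heatmap


# Two prior objects centered at (x=2, y=3) and (x=6, y=1) on a 5 x 8 grid.
hm = render_point_heatmap([(2, 3), (6, 1)], height=5, width=8)
```

Taking the maximum rather than the sum means nearby objects stay distinguishable as separate peaks.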
Point-based tracking simplifies object association across time. A simple displacement prediction, akin to sparse optical flow, allows objects in different frames to be linked.
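The linking step can be sketched as a greedy nearest-neighbor match: each detection is shifted by its predicted displacement and paired with the closest unmatched prior center. This is a minimal sketch under stated assumptions: the offset is taken to point from the current center to the previous one, `radius` is a fixed hypothetical matching threshold, and the dictionary keys and function name `greedy_associate` are illustrative only.

```python
import math


def greedy_associate(detections, prev_tracks, radius, next_id):
    """Link detections to prior tracks using predicted center offsets.

    detections: dicts with 'center' (x, y), 'offset' (dx, dy), 'score'.
    prev_tracks: dicts with 'center' (x, y) and 'id'.
    Returns (detections with an 'id' added, updated next_id).
    """
    matched = set()
    results = []
    # Higher-confidence detections get first pick of the prior tracks.
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        cx, cy = det["center"]
        dx, dy = det["offset"]
        # Assumed convention: offset points from current to previous center.
        px, py = cx + dx, cy + dy
        best, best_dist = None, radius
        for i, trk in enumerate(prev_tracks):
            if i in matched:
                continue
            dist = math.hypot(trk["center"][0] - px, trk["center"][1] - py)
            if dist <= best_dist:
                best, best_dist = i, dist
        if best is None:
            # No prior track within the radius: start a new track.
            track_id = next_id
            next_id += 1
        else:
            matched.add(best)
            track_id = prev_tracks[best]["id"]
        results.append({**det, "id": track_id})
    return results, next_id


prev = [{"id": 1, "center": (10, 10)}, {"id": 2, "center": (50, 50)}]
dets = [
    {"center": (12, 11), "offset": (-2, -1), "score": 0.9},
    {"center": (80, 80), "offset": (0, 0), "score": 0.8},
]
linked, next_id = greedy_associate(dets, prev, radius=5.0, next_id=3)
```

Here the first detection inherits track 1, while the second has no prior center within the radius and starts a new track.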
Joint detection and tracking. These methods merge the detector and the tracker into a single model.
Motion prediction. Early approaches [2, 47] used Kalman filters to model object velocities. Our center offset prediction is analogous to sparse optical flow, but is learned together with the detection network and does not require dense supervision.
Heatmap-conditioned keypoint estimation. A rendered heatmap of prior keypoints [4, 11, 29, 44] is especially appealing in tracking for two reasons. First, the information in the previous frame is freely available and does not slow down the detector. Second, conditional tracking can reason about occluded objects that may no longer be visible in the current frame. The tracker can simply learn to keep those detections from the prior frame around.
3D object detection and tracking.
Tracking objects as points
Association through offsets
Training on video data
Training on static image data
End-to-end 3D object tracking
Simple Unsupervised Multi-Object Tracking
Multiple People Tracking by Lifted Multicut and Person Re-identification