Tracking Objects as Points

Main contributions

Builds a joint object detection and tracking algorithm that is simpler, faster, and more accurate.

We present a simultaneous detection and tracking algorithm that is simpler, faster, and more accurate than the state of the art.
CenterTrack applies a detection model to a pair of images and to the detections from the prior frame.
CenterTrack localizes objects and predicts their associations with the previous frame.
CenterTrack is simple, online (no peeking into the future), and real-time.
CenterTrack is easily extended to monocular 3D tracking by regressing additional 3D attributes.

Introduction

Tracking-by-detection: these models rely on a given, accurate detector to recognize objects and then link them up through time in a separate stage.

Recent work on simultaneous detection and tracking [1, 8] has made progress in alleviating some of this complexity.

Each object is described by the center point of its bounding box (bbox), and that center point is tracked over time.

Each object is represented by a single point at the center of its bounding box.
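To make the point representation concrete, here is a minimal sketch of converting a corner-format box to a center point plus size and back. In CenterNet-style detectors the box size is regressed at the center point, so (cx, cy, w, h) carries the same information as the corners. The names `CenterBox`, `box_to_center`, and `center_to_box` are hypothetical helpers for illustration only.

```python
from typing import NamedTuple


class CenterBox(NamedTuple):
    """An object as its center point plus the box size predicted at that point."""
    cx: float
    cy: float
    w: float
    h: float


def box_to_center(x1: float, y1: float, x2: float, y2: float) -> CenterBox:
    """Represent a corner-format box (x1, y1, x2, y2) by its center and size."""
    return CenterBox((x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1)


def center_to_box(c: CenterBox) -> tuple:
    """Recover the corner-format box from the center representation."""
    return (c.cx - c.w / 2, c.cy - c.h / 2, c.cx + c.w / 2, c.cy + c.h / 2)
```

For example, `box_to_center(10, 20, 50, 60)` gives a center of (30.0, 40.0) with width and height 40, and `center_to_box` maps it back to the original corners.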

Uses the CenterNet detector.

Specifically, we adopt the recent CenterNet detector to localize object centers.
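CenterNet localizes object centers as local peaks of a predicted heatmap. The sketch below decodes peaks in plain Python by keeping pixels that are at least as large as their 8 neighbours, which is the comparison a 3x3 max-pooling step performs. The function name `extract_peaks`, the score `threshold`, and the cap `k` are assumptions chosen for illustration, not values taken from the paper.

```python
def extract_peaks(heatmap, threshold=0.3, k=100):
    """Return up to k (score, x, y) local maxima of a 2D heatmap (list of rows).

    A pixel is a peak if its score is >= every score in its 3x3 neighbourhood
    (the neighbourhood includes the pixel itself). threshold and k are
    hypothetical defaults.
    """
    h, w = len(heatmap), len(heatmap[0])
    peaks = []
    for y in range(h):
        for x in range(w):
            v = heatmap[y][x]
            if v < threshold:
                continue
            # Scores in the 3x3 window around (x, y), clipped at the borders.
            window = [
                heatmap[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            ]
            if v >= max(window):
                peaks.append((v, x, y))
    # Highest-scoring peaks first.
    peaks.sort(reverse=True)
    return peaks[:k]
```

A neighbouring pixel with a high but non-maximal score (such as 0.5 next to 0.9) is suppressed, so each object yields one center.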

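The current and previous frames, together with a heatmap of the objects in the previous frame, are fed into the network. Besides the bboxes, the network also outputs the offset between each current bbox center and the center of the same object in the previous frame.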

We condition the detector on two consecutive frames, as well as a heatmap of prior tracklets, represented as points. We train the detector to also output an offset vector from the current object center to its center in the previous frame.
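The offset head needs a training target. Below is a minimal sketch, assuming objects carry ground-truth track ids so that the same object can be found in both frames: the target is the vector pointing from the current center back to the previous center. The function name `offset_targets` is hypothetical, and leaving objects that are new in frame t without a target is a convention assumed here for simplicity.

```python
def offset_targets(curr_centers, prev_centers):
    """Displacement targets: previous center minus current center.

    curr_centers, prev_centers: dict track_id -> (x, y) center in frame t
    and frame t-1. Objects absent from frame t-1 get no target
    (hypothetical convention: they are simply left out).
    """
    targets = {}
    for tid, (cx, cy) in curr_centers.items():
        if tid in prev_centers:
            px, py = prev_centers[tid]
            # Vector from the current center to the previous center.
            targets[tid] = (px - cx, py - cy)
    return targets
```

At inference time, adding the predicted offset to a current center gives an estimate of where that object was in the previous frame.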

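A greedy algorithm matches each object's predicted position in the previous frame with the objects actually detected in that frame.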
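The greedy matching step can be sketched as follows. Detections are visited in order of decreasing confidence; each one is shifted by its predicted offset to estimate its previous-frame position and is linked to the closest still-unmatched prior track within a radius. Using the geometric mean of the predicted box width and height as that radius is an assumption here, as are the dict-based input format and the name `greedy_associate`.

```python
import math


def greedy_associate(detections, prev_tracks, next_id):
    """Greedily link current detections to tracks from the previous frame.

    detections: list of dicts with keys 'score', 'center' (x, y),
        'offset' (dx, dy) and 'size' (w, h) for the current frame.
    prev_tracks: dict track_id -> (x, y) center in the previous frame.
    next_id: first unused track id.
    Returns (assignments, next_id); assignments maps detection index -> track id.
    """
    unmatched = dict(prev_tracks)
    assignments = {}
    # Highest-confidence detections claim prior tracks first.
    order = sorted(range(len(detections)), key=lambda i: -detections[i]["score"])
    for i in order:
        det = detections[i]
        (x, y), (dx, dy), (w, h) = det["center"], det["offset"], det["size"]
        # Estimated position of this object in the previous frame.
        px, py = x + dx, y + dy
        radius = math.sqrt(w * h)  # assumed: geometric mean of box size
        best_id, best_dist = None, radius
        for tid, (tx, ty) in unmatched.items():
            dist = math.hypot(px - tx, py - ty)
            if dist < best_dist:
                best_id, best_dist = tid, dist
        if best_id is None:
            # No prior track close enough: start a new track.
            assignments[i] = next_id
            next_id += 1
        else:
            assignments[i] = best_id
            del unmatched[best_id]
    return assignments, next_id
```

Prior tracks that stay unmatched are simply dropped, so an object that is missed for even one frame receives a new id when it is detected again.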

If each object in past frames is represented by a single point, a constellation of objects can be represented by a heatmap of points.
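A minimal sketch of rendering such a heatmap of points and stacking it with the two frames as network input, assuming RGB frames stored channel-first as nested lists and a single class-agnostic heatmap channel (3 + 3 + 1 = 7 input channels). Each prior center becomes a Gaussian bump with peak 1.0; overlapping bumps are merged with `max`. The fixed `sigma` is a hypothetical value (CenterNet-style ground-truth heatmaps scale it with box size), and both function names are illustrative.

```python
import math


def render_prior_heatmap(centers, height, width, sigma=2.0):
    """Render prior-frame object centers as one heatmap channel.

    centers: iterable of (x, y) pixel coordinates. sigma is hypothetical.
    """
    hm = [[0.0] * width for _ in range(height)]
    for cx, cy in centers:
        for y in range(height):
            for x in range(width):
                g = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                # Keep the strongest response where bumps overlap.
                hm[y][x] = max(hm[y][x], g)
    return hm


def stack_inputs(frame_t, frame_prev, prior_heatmap):
    """Concatenate channels: 3 (frame t) + 3 (frame t-1) + 1 (prior heatmap)."""
    return [*frame_t, *frame_prev, prior_heatmap]
```

Taking the maximum rather than the sum keeps every peak at 1.0 even when two objects are close together.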

Point-based tracking simplifies object association across time. A simple displacement prediction, akin to sparse optical flow, allows objects in different frames to be linked.

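Because the model tends to simply copy the previous frame's result, the training loss becomes misleadingly small; in effect, the model refuses to learn to track. Aggressive data augmentation is therefore applied during training. The model can even be trained directly on static images.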
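One way to realize this augmentation is sketched below, under stated assumptions. `perturb_prior_centers` corrupts the prior-frame centers with Gaussian jitter, random drops (missed detections), and spurious nearby points (false positives), so copying the prior heatmap no longer gives a low loss. `simulate_prev_centers` illustrates static-image training by treating a randomly scaled and translated copy of the image as the "previous frame"; only the label transform is shown, and the same transform would have to be applied to the image itself. All names, probabilities, and ranges are hypothetical.

```python
import random


def perturb_prior_centers(centers, rng, jitter=2.0, p_drop=0.4, p_false=0.1):
    """Corrupt prior-frame centers so the model cannot just copy them.

    All probabilities and the jitter scale are hypothetical values.
    """
    out = []
    for cx, cy in centers:
        if rng.random() < p_false:
            # Spurious center near a real object (simulated false positive).
            out.append((cx + rng.gauss(0, 4 * jitter), cy + rng.gauss(0, 4 * jitter)))
        if rng.random() < p_drop:
            continue  # simulated missed detection
        # Simulated localization error.
        out.append((cx + rng.gauss(0, jitter), cy + rng.gauss(0, jitter)))
    return out


def simulate_prev_centers(centers, rng, max_shift=8.0, max_scale=0.05):
    """Static-image training: fake a previous frame by a random scale + shift.

    The same affine transform must also be applied to the image (not shown).
    """
    s = 1.0 + rng.uniform(-max_scale, max_scale)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    return [(s * x + tx, s * y + ty) for x, y in centers]
```

Passing a seeded `random.Random` instance keeps the augmentation reproducible across runs.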

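The algorithm can only associate objects between two consecutive frames; it cannot recover a track after the object has been occluded or has disappeared from view.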

It gives up this capability in exchange for model simplicity and for accuracy and speed in short-term tracking.