Analytics Service Spec does not provide enough information to interpret point-coordinate data #430
Unanswered
zachvictor asked this question in General
Replies: 1 comment
Hi Victor, thanks for the detailed problem description. The analytics normalized coordinate system is based on the encoded video image, as defined by the VideoSource Bounds property. Devices may stream shapes and bounding boxes in pixel coordinates as long as they provide a normalizing transform. If that transform is missing, the device does not adhere to the specification, even if it passes the test tool, which only checks the content of GetSupportedMetadata and not the actual streamed shape information. Hope this addresses your question.
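To make the relationship concrete, here is a minimal sketch (in Python, with a hypothetical 640x480 frame) of mapping pixel coordinates onto the [-1, 1] normalized space tied to the VideoSource Bounds. The y-axis flip is an assumption: pixel rows are taken to increase downward while normalized y increases upward.

```python
def pixel_to_normalized(x_px, y_px, width, height):
    """Map pixel coordinates to ONVIF-style [-1, 1] normalized coordinates.

    Assumes the frame size comes from the VideoSource Bounds property,
    pixel (0, 0) is the top-left corner, and normalized (0, 0) is the
    frame center with y increasing upward.
    """
    x_norm = (2.0 * x_px / width) - 1.0
    y_norm = 1.0 - (2.0 * y_px / height)
    return x_norm, y_norm

# Hypothetical 640x480 frame: the center pixel maps to the origin.
print(pixel_to_normalized(320, 240, 640, 480))  # (0.0, 0.0)
```

The point of the sketch is that `width` and `height` are required inputs: without the frame dimensions from the VideoSource Bounds, pixel coordinates cannot be normalized at all.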
Discussed in #409
Originally posted by zachvictor April 1, 2024
Source aspect ratio and/or resolution needed for object coordinate data in video analytics metadata
Coordinate data cannot be interpreted correctly without the source resolution or aspect ratio.
Context
Object detections in ONVIF Profile video analytics metadata follow the Analytics Service Specification.1 An Object may have a ShapeDescriptor containing a BoundingBox, CenterOfGravity, and/or Polygon containing point-coordinate data.2
Per the spec, spatial relations involving coordinates use a [-1, 1]-normalized coordinate system, which places (0, 0) at the center of the frame, with values increasing toward the top right (as in the Cartesian plane).3 The spec further defines a Transformation type for transformations between coordinate systems.4
In the spec, the example gives coordinate data (presumably in pixels) and provides a Transformation to yield normalized coordinates. For example:5
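The spec's example is not reproduced above. As a hedged sketch (hypothetical 640x480 resolution, and assuming the Transformation is applied as value' = value * scale + translate), a Translate/Scale pair that normalizes pixel coordinates might be derived and applied like this:

```python
def transformation_for(width, height):
    """Derive a hypothetical Translate/Scale pair that maps pixel
    coordinates onto the [-1, 1] normalized space, assuming the
    transform is applied as value' = value * scale + translate."""
    return {
        "Translate": {"x": -1.0, "y": -1.0},
        "Scale": {"x": 2.0 / width, "y": 2.0 / height},
    }

def apply_transformation(t, x, y):
    # value' = value * scale + translate, per the assumption above
    return (x * t["Scale"]["x"] + t["Translate"]["x"],
            y * t["Scale"]["y"] + t["Translate"]["y"])

t = transformation_for(640, 480)
# Corners of the hypothetical 640x480 frame land on the unit square.
print(apply_transformation(t, 0, 0))      # (-1.0, -1.0)
print(apply_transformation(t, 640, 480))  # (1.0, 1.0)
```

Note that the scale factors (here 2/640 and 2/480) encode the source resolution implicitly; a consumer that receives only the Transformation, without the frame dimensions, cannot recover the aspect ratio from the normalized output alone.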
Issues
Even if the metadata provides a Transformation, without the source image's resolution or aspect ratio, there is not enough information to interpret the coordinates correctly. Pixel units refer to a source resolution. Normalized units refer to a source aspect ratio. So, the standard is lacking: given metadata and video streams of a device that observes the Analytics Service Spec fully and beyond the minimal requirements for conformance, the ONVIF standard does not provide enough information to interpret point-coordinate data correctly.
Example:
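As an illustrative sketch of the ambiguity (hypothetical resolutions, not drawn from the spec): the same normalized coordinate denotes different image positions depending on which source frame the normalization refers to.

```python
def normalized_to_pixel(x_norm, y_norm, width, height):
    """Invert a [-1, 1] normalization back to pixel coordinates,
    assuming normalized (0, 0) is the frame center."""
    x_px = (x_norm + 1.0) * width / 2.0
    y_px = (y_norm + 1.0) * height / 2.0
    return x_px, y_px

# The same normalized point under two candidate source frames:
p_16x9 = normalized_to_pixel(0.5, 0.5, 1920, 1080)  # (1440.0, 810.0)
p_4x3  = normalized_to_pixel(0.5, 0.5, 1440, 1080)  # (1080.0, 810.0)
print(p_16x9, p_4x3)
```

The normalized point (0.5, 0.5) lands on different horizontal positions under the 16:9 and 4:3 frames, so a consumer that does not know the source aspect ratio cannot place the detection correctly.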
References
Footnotes
ONVIF™ Analytics Service Specification. Version 23.12. December 2023. PDF ↩
ibid., 5.3.1 Objects (pp. 12–15), and 5.3.3 Shape descriptor (pp. 16–17). ↩
ibid., 5.2.2 Spatial Relation (pp. 10–12). ↩
ibid. I have studied ONVIF metadata produced by a number of devices of manufacturers including Axis, Hanwha, and i-PRO. I have not observed the use of the Transformation type in any of these ONVIF implementations. If you know of any implementation that uses the Transformation type, I would be very grateful for any information you could provide, especially the manufacturer, model, and an example XML document. ↩
ibid., 5.3.1 Objects, p.13, "Example". ↩
Search ONVIF Conformant Products for, e.g.,
Hanwha PND-A6081RV and Hanwha XND-8083RV. Edited: strike, these are not valid examples. ↩
See ONVIF® Profile M Specification, Version 1.0, June 2021. 8.7 Object classification, 8.7.1 Device requirements (if supported). PDF ↩
Search ONVIF Conformant Products for, e.g., Axis Q1656-LE and Axis P3267-LVE. Axis ARTPEC-8 devices give coordinates already normalized, yet the frames used for object inference are (evidently) the stream characterized by their "Capture Mode": i.e., the lowest-level, highest-resolution stream—which, incidentally, their proprietary APIs describe with an aspect ratio, not a resolution. Even if the default Stream Profile is 16:9, if the Capture Mode is 4:3, then the normalized coordinates always refer to a 4:3 frame.