Pose Estimation using Markers

Image courtesy to TagSlam

Throughout my research, I had to track the $SE(3)$ pose of various objects for several applications. In this post ill summarize the ROS-based implementation of some of the marker-based methods. Basically, most of the marker-based methods that ill cover only requires a USB-Cam, except the last one, which requires the whole OptiTrack motion captures system. We uses OptiTrack at the lab for for the finer applications.

Most of the videos, figures and explanation are taken from the authors implementation and are not mine. I just gather them for future use.

In the following, ill use the notation $T^{1}_{0}$ to describe the homogeneous transformation matrix between coordinates frame 1 to 0. Detailed explanation about marker-based transformation can be seen Here

AprilTag Markers

The AprilTag come in several different families, depending on how many bits a tag can represent. Some tag families have large, coarse bit blocks (16h5), while others are fine and smaller (36h11- most common). For lower resolution camera, consider the 16h5 family, For better ill suggest the 36h11 and for circular object you should use the 21h7 family.

A full detailed explanation how to generate tags is described:

Measuring the tag Size

When entering (ill explain it later) the size of your tag for pose estimation it is important to know what “size” is actually a measure of. The measurement that is called “size” is shown as “edge length” in the image below. The tag size should not be measured from the outside of the tag. The tag size is defined as the distance between the detection corners, or alternately, the length of the edge between the white border and the black border. The following illustration marks the detection corners with red Xs and the tag size with a red arrow for a tag from the 48h12Custom tag family.

Coordinate system

The coordinate system has the origin at the camera center. The z-axis points from the camera center out the camera lens. The x-axis is to the right in the image taken by the camera, and y is up.



Starting with a working ROS installation (Kinetic and Melodic are supported):

export ROS_DISTRO=melodic               # Set this to your distro, e.g. kinetic or melodic
source /opt/ros/$ROS_DISTRO/setup.bash  # Source your ROS distro
mkdir -p ~/catkin_ws/src                # Make a new workspace
cd ~/catkin_ws/src                      # Navigate to the source space
git clone https://github.com/AprilRobotics/apriltag.git      # Clone Apriltag library
git clone https://github.com/AprilRobotics/apriltag_ros.git  # Clone Apriltag ROS wrapper
cd ~/catkin_ws                          # Navigate to the workspace
rosdep install --from-paths src --ignore-src -r -y  # Install any missing packages
catkin build    # Build all packages in the workspace (catkin_make_isolated will work also)

The package works as shown in the below figure. The following default input topics are subscribed to (which can be remapped based on your needs):

  • /camera/image_rect: a sensor_msgs/Image topic which contains the image (e.g. a frame of a video stream coming from a camera). The image is assumed to be undistorted, i.e. produced by a pinhole camera. I recommend to use the image_proc_node, which is meant to sit between the camera driver and vision processing nodes. image_proc removes camera distortion from the raw image stream, and if necessary will convert Bayer or YUV422 format image data to color.
  • /camera/camera_info: a sensor_msgs/CameraInfo topic which contains the camera calibration matrix in /camera/camera_info/K. One can obtain a specific camera’s K via camera intrinsics calibration using any camera calibarion methods (Option 1, Option 2, Option 3 and so on..).

The Apriltag ROS package takes in a rectified camera feed and returns a list of detected tags and their 3D locations. In order for this to work the software needs to know what tags it is looking for and how large they are. These are defined in 2 config files: config/settings.yaml and config/tags.yaml.

The behavior of the ROS wrapper is fully defined by the two configuration files config/tags.yaml (which defines the tags and tag bundles to look for) and config/settings.yaml (which configures the core AprilTag 2 algorithm itself). Then, the following topics are output:

  • /tf: relative pose between the camera frame and each detected tag’s or tag bundle’s frame (specified in tags.yaml) using tf.
  • /tag_detections: the same information as provided by the /tf topic but as a custom message carrying the tag ID(s), size(s) and geometry_msgs/PoseWithCovarianceStamped pose information.
  • /tag_detections_image: the same image as input by /camera/image_rect but with the detected tags highlighted.

settings.yaml parameters

this file configures the detection algorithm parameters (most are self-explained)

tag_family:        'tag36h11' # Tag family
tag_border:        1          # Size (in bits) of the black border. Always 1 if made by optitag
tag_threads:       2          # Number of detection thread. Tune per your CPU
tag_decimate:      1.0        # Reduce the resolution of the image by this number. Increases speed at the sacrifice of detecting smaller tags
tag_blur:          0.0        # tag_blur>0 blurs the image and tag_blur<0 sharpens the image
tag_refine_edges:  1          # improves edge detection and therefore improves pose estimation. Light computation
tag_refine_decode: 0          # reduces false negative detection rate. Medium computation
tag_refine_pose:   0          # improves pose estimation accuracy. Heavy computation
tag_debug:         0          # save debug images to ~/.ros Careful if running with video
publish_tf:        true       # publish tag/bundle poses to /tf topic

tags.yaml parameters

This file tells the algorithm what tags to look for in the environment and how large they are so they can be placed in 3D space. The software assumes distance measurements are in meters and orientation is in quaternions (One can use tf.transformation package for other transformations)

    {id: 10, size: 0.15},
    {id: 20, size: 0.1},
    {id: 30, size: 0.07}
      name: 'my_bundle',
          {id: 0, size: 0.05, x: 0.0000, y: 0.0000, z: 0.0, qw: 1.0, qx: 0.0, qy: 0.0, qz: 0.0},
          {id: 4, size: 0.05, x: 0.0548, y: -0.0522, z: 0.0, qw: 1.0, qx: 0.0, qy: 0.0, qz: 0.0},
          {id: 3, size: 0.05, x: -0.0580, y: -0.0553, z: 0.0, qw: 1.0, qx: 0.0, qy: 0.0, qz: 0.0},
          {id: 2, size: 0.05, x: 0.0543, y: 0.0603, z: 0.0, qw: 1.0, qx: 0.0, qy: 0.0, qz: 0.0},
          {id: 1, size: 0.05, x: -0.0582, y: 0.0573, z: 0.0, qw: 1.0, qx: 0.0, qy: 0.0, qz: 0.0}

As we can see in the above configuration, We can define 2 tracking methods, standalone_tags and tag_bundles. For standalone tags, i.e. each marker represents a unique object in the environment we provide an ID and size for each tag you want to detect. For tag_bundles is the new feature which let you track an object which represented by multiple tags to overcome occlusion and such. Upon detection of a single or multiple tags in the bundle, the software will report the 6 DOF pose of the bundle’s origin. When we create a bundle you specify a list of tags. Each tag has a tag ID, size, and 6 DOF location of the tag in reference to the bundle’s origin.

Important notes mentioned by the authors:

  • No tag ID should appear twice with different sizes (this creates ambiguity in the detection)
  • No tag ID should appear twice in the image (this creates ambiguity in the detection)
  • It is fine for a tag with the same ID to be listed both in standalone_tags and in tag_bundles, as long as it has the same size.

A complete launch file which:

  • Open the camera stream from the video_device with width and height parameters.
  • Launch the image_proc node to remove distortions.
  • Load the settings.yaml and tags.yaml configurations files to the parameters server.
  • Run the apriltag_ros core node and publish the detected transformation.

  <node name="camera" pkg="usb_cam" type="usb_cam_node" output="screen" >
    <param name="video_device" value="/dev/video4" />
    <param name="image_width" value="1280" />
    <param name="image_height" value="720" />
    <param name="pixel_format" value="yuyv" />
    <param name="camera_frame_id" value="rgb_cam_link" />

  <node pkg="image_proc" type="image_proc" name="iamge_proc_node" ns="camera" />

  <arg name="launch_prefix" default="" /> <!-- set to value="gdbserver localhost:10000" for remote debugging -->
  <arg name="node_namespace" default="apriltag_ros_continuous_node" />
  <arg name="camera_name" default="/camera" />
  <arg name="camera_frame" default="camera" />
  <arg name="image_topic" default="image_rect" />

  <!-- Set parameters -->
  <rosparam command="load" file="$(find apriltag_ros)/config/settings.yaml" ns="$(arg node_namespace)" />
  <rosparam command="load" file="$(find apriltag_ros)/config/tags.yaml" ns="$(arg node_namespace)" />

  <node pkg="apriltag_ros" type="apriltag_ros_continuous_node" name="$(arg node_namespace)" clear_params="true" output="screen" launch-prefix="$(arg launch_prefix)" >
    <!-- Remap topics from those used in code to those on the ROS network -->
    <remap from="image_rect" to="$(arg camera_name)/$(arg image_topic)" />
    <remap from="camera_info" to="$(arg camera_name)/camera_info" />

    <param name="camera_frame" type="str" value="$(arg camera_frame)" />
    <param name="publish_tag_detections_image" type="bool" value="true" />      <!-- default: false -->


ArTag Markers

In this part ill explain the usage of ar_track_alvar package for AR tag tracking. This package is a ROS wrapper for Alvar, an open source AR tag tracking library.

ar_track_alvar node has 4 main functionalities:

  • Generating AR tags of varying size, resolution, and data/ID encoding
  • Identifying and tracking the pose of individual AR tags, optionally integrating kinect depth data (when a kinect is available) for better pose estimates. Identifying and tracking the pose of bundles consisting of multiple tags. This allows for more stable pose estimates, robustness to occlusions, and tracking of multi-sided objects.

Alvar features adaptive thresholding to handle a variety of lighting conditions, optical flow based tracking for more stable pose estimation, and an improved tag identification method that does not significantly slow down as the number of tags increases.


for melodic-devel:

sudo apt-get install ros-melodic-ar-track-alvar ros-melodic-ar-track-alvar-msgs ros-melodic-image-proc

or just clone and build the github repo Github.

In order to use AR Marker properly with your camera, be sure to add the camera model to the launch command when using AR Marker (camera_calibration process as described in aprtiltag section.

Generating AR tags

Two pdf files are in the markers directory containing tags 0-8 and 9-17, respectively. Alternativly, we can get the tags from ar_track_alvar and resize to our use. If you want to generate your own markers with different ID numbers, border widths, or sizes, run:

rosrun ar_track_alvar createMarker 0 -s 10.0

This will create MarkerData_0.png file that stores a 10cm x 10cm marker with id 0. Print this file on a sheet of paper.

Due to differences in printer setups, the actual size of the printed marker may be different. Make sure the marker_size parameter represents the actual size (in centimeters) of the AR tag.


In order to identify and track the poses of (possibly) multiple AR tags that are each considered individually. The packges uses the node individualMarkers which takes the following parameters:

  • marker_size (double) – The width in centimeters of one side of the black square marker border
  • max_new_marker_error (double) – A threshold determining when new markers can be detected under uncertainty
  • max_track_error (double) – A threshold determining how much tracking error can be observed before an tag is considered to have disappeared
  • camera_image (string) – The name of the topic that provides camera frames for detecting the AR tags. This can be mono or color, but should be an UNrectified image, since rectification takes place in this package
  • camera_info (string) – The name of the topic that provides the camera calibration parameters so that the image can be rectified
  • output_frame (string) – The name of the frame that the published Cartesian locations of the AR tags will be relative to

IMPORTANT: The node assumes that a Kinect camera being used as the camera, so that depth data can be integrated for better pose estimates. If you are not using a Kinect or do not desire to use depth data improvements, use individualMarkersNoKinect instead.

To use the package we need to create a new launch file. Inside launch/ directory add alvar_track.launch with the following content:


<arg name="marker_frame_id"     default="world"/>
<arg name="user_marker_size"	  default="7.0"/>

<arg name="camera_model" default="astra_pro" doc="model type [astra_pro, realsense_d435, raspicam]"/>
<arg name="camera_namespace" default="camera"/>
<arg name="rgb_camera_info_url"   default="package://open_manipulator_p_camera/camera_info/$(arg camera_model).yaml" />
<arg name="depth_camera_info_url" default="" />

<include file="$(find ar_track_alvar)/launch/pr2_indiv_no_kinect.launch">
  <arg name="marker_size" value="$(arg user_marker_size)" />
  <arg name="max_new_marker_error" value="0.08" />
  <arg name="max_track_error" value="0.2" />
  <arg name="cam_image_topic" value="$(arg camera_namespace)/image_raw" />
  <arg name="cam_info_topic" value="$(arg camera_namespace)/camera_info" />
  <arg name="output_frame" value="$(arg marker_frame_id)" />



Sometimes it is advantageous to treat “bundles” of multiple tags as a single unit. For example, this can allow for the estimation of the pose of a many-sided object, even when some of the tags cannot be seen. A tag bundle is defined by an XML file that lists a set of tag IDs and their positions relative to a master tag. The master tag always comes first in the XML file and defines a coordinate system for the rest of the tags.

IMPORTANT: this coordinate system is different from the standard system used in ROS! In this system, when facing the tag, positive-z comes out of the front of the tag toward the viewer, positive-x is to the right, and positive-y is up.

To create a bundle, first choose which tag you want to be the master tag. Treat the center of the master tag as (0,0,0). Then, after placing the rest of the tags, measure the x, y, and z coordinate for each of the 4 corners of all of the tags, relative to the master tag origin. Enter these measurements for each tag into the XML file starting with the lower left corner and progressing counter-clockwise around the tag. After creating the XML file all you need is to add the following parameter to your launch file.

  • bundle_files (multiple strings) – A list of XML file names, one for each bundle you wish to detect.

An example XML file for 2 markers:

<?xml version="1.0" encoding="UTF-8" standalone="no" ?>
<multimarker markers="2">
    <marker index="8" status="1">
        <corner x="-2.2" y="-2.2" z="0.0" />
        <corner x="2.2" y="-2.2" z="0.0" />
        <corner x="2.2" y="2.2" z="0.0" />
        <corner x="-2.2" y="2.2" z="0.0" />
    <marker index="9" status="1">
        <corner x="-2.2" y="11.8" z="0.0" />
        <corner x="2.2" y="11.8" z="0.0" />
        <corner x="2.2" y="16.2" z="0.0" />
        <corner x="-2.2" y="16.2" z="0.0" />

ArUco Marker




Osher Azulay
Osher Azulay

My research interests include robotic manipulation, deep reinforcement learning.