Augmented Reality tries to fuse real-world objects with virtual content. To place a 3D model in a scene, we need to know its pose relative to the camera that captures the video frames. We will use a Euclidean transformation in the Cartesian coordinate system to represent such a pose.
The position of the marker in 3D and its corresponding projection in 2D are related by the following equation:
P = A * [R|T] * M;
M denotes a point in 3D space
[R|T] denotes a 3x4 matrix representing a Euclidean transformation (rotation R and translation T)
A denotes the camera matrix, that is, the matrix of intrinsic parameters
P denotes the projection of M in screen space
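To make the equation concrete, here is a minimal sketch in plain Python that applies P = A * [R|T] * M to a single point. The function name project_point and all numeric values (focal length, principal point, marker size, camera distance) are assumptions chosen for illustration and do not come from the text. The sketch treats M as a homogeneous vector (x, y, z, 1) and divides the result by its third component to get pixel coordinates.

```python
def project_point(A, RT, M):
    """Project a 3D point M = (x, y, z) to pixel coordinates using P = A * [R|T] * M."""
    # Append 1 so the 3x4 matrix [R|T] can apply rotation and translation at once.
    Mh = [M[0], M[1], M[2], 1.0]
    # Point expressed in the camera coordinate system: [R|T] * M.
    cam = [sum(RT[r][c] * Mh[c] for c in range(4)) for r in range(3)]
    # Homogeneous image point: A * ([R|T] * M).
    p = [sum(A[r][c] * cam[c] for c in range(3)) for r in range(3)]
    # Divide by the third component to obtain pixel coordinates.
    return (p[0] / p[2], p[1] / p[2])


# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
A = [[800.0, 0.0, 320.0],
     [0.0, 800.0, 240.0],
     [0.0, 0.0, 1.0]]

# Hypothetical pose: no rotation, marker plane 5 units in front of the camera.
RT = [[1.0, 0.0, 0.0, 0.0],
      [0.0, 1.0, 0.0, 0.0],
      [0.0, 0.0, 1.0, 5.0]]

# One corner of a unit-size marker centered at the origin of its own plane.
corner = (0.5, 0.5, 0.0)

u, v = project_point(A, RT, corner)
print((u, v))  # (400.0, 320.0)
```

Pose estimation solves the inverse problem: given A, the known 3D corner positions M, and their detected 2D projections P, recover [R|T].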
After the marker detection step, we know the positions of the four marker corners in 2D (their projections in screen space). In the next section, you will learn how to obtain the A matrix and the M vector parameters, and how to calculate the [R|T] transformation from them.