We have all of this data in our 3D world, representing objects and other visuals that we need to get onto the screen; however, how do we go about describing what is in our limited view? For this, we need a camera. A camera in this case is a virtual representation of our view into the world. It stores information about where we are, what direction we are looking in, and how we need to transform the third dimension so that it looks either realistic or useful. Cameras can also define part of the gameplay. Players will play a first-person shooter (where the camera is in the head of the player character) differently from a third-person shooter (where the camera sits over the shoulder of the player character). Back in Chapter 2, Drawing 2D Sprites, I described the two different projections we use to do that transformation: the perspective and orthographic projections.
By combining the camera's position, orientation, and projection into a single matrix, we have a nice and easy way to transform all of the different vertices through the different spaces until we reach the correct location on the 2D screen to display the pixels.
Now we will take a quick look at the different spaces, and how we can use a camera to go from a collection of vertices to pixels on the screen.
The vertices begin in Model Space. This is a coordinate space where all of the vertices are relative to an origin that was specified when the model was created. For example, the origin of a character model could be at the center of its body; however, that character may be nowhere near [0,0,0] when placed in the world. We can use this space to work with the single model without worrying about the fact that we may use this model at a distant location in the world.
These vertices are then transformed into World Space by using matrices derived from the position, rotation, and scale of the model. Now we are putting this object in context with everything else in the world.
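To make the model-to-world step concrete, here is a minimal sketch in plain C++. The `Vec4` and `Mat4` types and every helper name are hypothetical and exist only for this illustration; they are not DirectXMath types. The sketch assumes the row-vector convention (a vertex is multiplied on the left of the matrix), a uniform scale, and a rotation around the Y axis only. DirectXMath offers `XMMatrixScaling`, `XMMatrixRotationY`, and `XMMatrixTranslation` for the same building blocks.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration; these are not DirectXMath types.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

// Row-vector convention: result = v * M.
Vec4 Transform(const Vec4& v, const Mat4& M) {
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += in[row] * M.m[row][col];
    return { out[0], out[1], out[2], out[3] };
}

// Returns a * b, which means "apply a first, then b" for row vectors.
Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

Mat4 Scaling(float s) {
    assert(s != 0.0f); // a zero scale would collapse the model to a point
    return {{ { s, 0.0f, 0.0f, 0.0f },
              { 0.0f, s, 0.0f, 0.0f },
              { 0.0f, 0.0f, s, 0.0f },
              { 0.0f, 0.0f, 0.0f, 1.0f } }};
}

// Rotation around the Y axis by `angle` radians.
Mat4 RotationY(float angle) {
    const float c = std::cos(angle);
    const float s = std::sin(angle);
    return {{ { c, 0.0f, -s, 0.0f },
              { 0.0f, 1.0f, 0.0f, 0.0f },
              { s, 0.0f, c, 0.0f },
              { 0.0f, 0.0f, 0.0f, 1.0f } }};
}

Mat4 Translation(float x, float y, float z) {
    return {{ { 1.0f, 0.0f, 0.0f, 0.0f },
              { 0.0f, 1.0f, 0.0f, 0.0f },
              { 0.0f, 0.0f, 1.0f, 0.0f },
              { x, y, z, 1.0f } }};
}

// World matrix: scale first, then rotate, then move into place.
Mat4 MakeWorld(float scale, float yaw, float px, float py, float pz) {
    return Multiply(Multiply(Scaling(scale), RotationY(yaw)),
                    Translation(px, py, pz));
}
```

The order matters here: scaling and rotating happen around the model's own origin, and only then is the model moved to its position in the world. Translating first would make the rotation swing the model around the world origin instead.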
Once it has its place within the world, we need to think about how to get everything relative to the camera. The next step is to transform the vertices into View Space, which is a coordinate space relative to the camera (the camera is at [0,0,0]).
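As a rough sketch of how a view matrix could be built, the following plain C++ constructs a left-handed look-at matrix from a camera position, a target point, and an up direction. The `Vec3` and `Mat4` types and the helper names are hypothetical and defined here only for illustration; the layout assumes row vectors. DirectXMath provides `XMMatrixLookAtLH` for this job, so treat this as an illustration of the idea rather than a replacement.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration; these are not DirectXMath types.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };

Vec3 Subtract(const Vec3& a, const Vec3& b) {
    return { a.x - b.x, a.y - b.y, a.z - b.z };
}

float Dot(const Vec3& a, const Vec3& b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

Vec3 Cross(const Vec3& a, const Vec3& b) {
    return { a.y * b.z - a.z * b.y,
             a.z * b.x - a.x * b.z,
             a.x * b.y - a.y * b.x };
}

Vec3 Normalize(const Vec3& v) {
    const float length = std::sqrt(Dot(v, v));
    assert(length > 0.0f); // the eye and target must not coincide
    return { v.x / length, v.y / length, v.z / length };
}

// Builds a left-handed view matrix (row-vector convention).
Mat4 LookAtLH(const Vec3& eye, const Vec3& target, const Vec3& up) {
    const Vec3 zAxis = Normalize(Subtract(target, eye)); // camera forward
    const Vec3 xAxis = Normalize(Cross(up, zAxis));      // camera right
    const Vec3 yAxis = Cross(zAxis, xAxis);              // camera up
    return {{ { xAxis.x, yAxis.x, zAxis.x, 0.0f },
              { xAxis.y, yAxis.y, zAxis.y, 0.0f },
              { xAxis.z, yAxis.z, zAxis.z, 0.0f },
              { -Dot(xAxis, eye), -Dot(yAxis, eye), -Dot(zAxis, eye), 1.0f } }};
}

// Transforms a point (w is assumed to be 1) by an affine matrix.
Vec3 TransformPoint(const Vec3& p, const Mat4& M) {
    return { p.x * M.m[0][0] + p.y * M.m[1][0] + p.z * M.m[2][0] + M.m[3][0],
             p.x * M.m[0][1] + p.y * M.m[1][1] + p.z * M.m[2][1] + M.m[3][1],
             p.x * M.m[0][2] + p.y * M.m[1][2] + p.z * M.m[2][2] + M.m[3][2] };
}
```

With a camera at [0,0,-5] looking at the world origin, transforming the camera's own position by this matrix gives [0,0,0], and the world origin ends up at [0,0,5], five units straight ahead of the camera.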
Now that we have everything relative to the camera, we need to project the vertices in front of us onto the 2D screen plane, which transforms the vertices into Screen Space (also known as Post Projection Space). Now we have vertices in their correct locations relative to the screen (with an origin in the center of the screen), allowing the API to handle the final transforms and clipping required to get a 2D pixel grid for rendering.
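The projection step can be sketched in plain C++ as well. This example assumes a left-handed perspective projection with a depth range of 0 to 1 (the same layout that DirectXMath's `XMMatrixPerspectiveFovLH` produces) and the row-vector convention. The types and helper names are hypothetical and defined inline so the sketch stands on its own. The division by `w` is shown explicitly here, although during rendering that step is performed for us after the vertex shader has run.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical minimal types for illustration; these are not DirectXMath types.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

// Row-vector convention: result = v * M.
Vec4 Transform(const Vec4& v, const Mat4& M) {
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += in[row] * M.m[row][col];
    return { out[0], out[1], out[2], out[3] };
}

// Left-handed perspective projection; fovY is the vertical field of view in radians.
Mat4 PerspectiveFovLH(float fovY, float aspect, float nearZ, float farZ) {
    assert(nearZ > 0.0f && farZ > nearZ);
    const float h = 1.0f / std::tan(fovY * 0.5f); // vertical scale
    const float w = h / aspect;                   // horizontal scale
    const float q = farZ / (farZ - nearZ);        // depth scale
    return {{ { w, 0.0f, 0.0f, 0.0f },
              { 0.0f, h, 0.0f, 0.0f },
              { 0.0f, 0.0f, q, 1.0f },
              { 0.0f, 0.0f, -nearZ * q, 0.0f } }};
}

// Divides by w so that visible x and y land in [-1, 1] and depth in [0, 1].
Vec4 PerspectiveDivide(const Vec4& clip) {
    assert(clip.w != 0.0f);
    return { clip.x / clip.w, clip.y / clip.w, clip.z / clip.w, 1.0f };
}
```

After the divide, a vertex on the near plane has a depth of 0 and a vertex on the far plane has a depth of 1. With a 90 degree field of view and a square aspect ratio, a view space vertex at [5,5,5] lands exactly on the top-right corner of the visible area, because its x and y offsets equal its distance from the camera.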
By combining these transformations, we form a World-View-Projection transform that takes a vertex right through to screen space. This is done within the vertex shader, which will be explained later on.
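The following plain C++ sketch shows why combining the matrices works: multiplying a vertex by the combined matrix gives the same result as applying the three transforms one after another. The `Vec4` and `Mat4` types and the helper names are hypothetical and assume the row-vector convention, where the combined matrix is world times view times projection, in that order. With DirectXMath, the multiplication itself could be done with `XMMatrixMultiply`.

```cpp
#include <cassert>

// Hypothetical minimal types for illustration; these are not DirectXMath types.
struct Vec4 { float x, y, z, w; };
struct Mat4 { float m[4][4]; };

// Row-vector convention: result = v * M.
Vec4 Transform(const Vec4& v, const Mat4& M) {
    const float in[4] = { v.x, v.y, v.z, v.w };
    float out[4] = { 0.0f, 0.0f, 0.0f, 0.0f };
    for (int col = 0; col < 4; ++col)
        for (int row = 0; row < 4; ++row)
            out[col] += in[row] * M.m[row][col];
    return { out[0], out[1], out[2], out[3] };
}

// Returns a * b, which means "apply a first, then b" for row vectors.
Mat4 Multiply(const Mat4& a, const Mat4& b) {
    Mat4 r = {};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r.m[i][j] += a.m[i][k] * b.m[k][j];
    return r;
}

// Combines the three stages into one matrix: world first, then view, then projection.
Mat4 MakeWorldViewProjection(const Mat4& world, const Mat4& view,
                             const Mat4& projection) {
    return Multiply(Multiply(world, view), projection);
}

// Applies the three stages one at a time; useful for checking the combined matrix.
Vec4 TransformStepByStep(const Vec4& v, const Mat4& world, const Mat4& view,
                         const Mat4& projection) {
    const Vec4 inWorld = Transform(v, world);
    const Vec4 inView = Transform(inWorld, view);
    return Transform(inView, projection);
}
```

The benefit is that the three matrices only need to be multiplied together once per object, and every vertex of that object then needs just a single matrix multiplication.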
Direct3D doesn't have any functions to build these matrices for you, but DirectXMath (also provided in the Windows 8 SDK) has plenty of functions to do the math and ensure you get the right result.