Tools
Telecommunication can be seen as composed of two elements, signaling and media, and so can the tools used to debug and troubleshoot it.
But this comes with a caveat: anything that runs on a server, or through wires, is just packets of data. So for media too, we have to capture the stream of data describing our audio (or video, or fax), convert it into a playable format, and then listen to what the end user actually experienced.
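As a rough illustration of that capture-then-convert step, here is a minimal Python sketch. It assumes the RTP packets have already been pulled out of a capture as raw byte strings, and that the audio is G.711 mu-law (RTP payload type 0, 8 kHz mono). The function names `parse_rtp`, `ulaw_to_pcm16`, and `rtp_to_wav` are hypothetical, made up for this example. Packets are taken in arrival order, with no reordering, jitter buffering, or loss concealment, so this is a sketch of the idea rather than a replacement for a real analysis tool.

```python
import struct
import wave


def parse_rtp(packet: bytes):
    """Split one RTP packet into (payload_type, sequence, timestamp, payload)."""
    if len(packet) < 12:
        raise ValueError("too short for an RTP header")
    b0, b1, seq, ts, _ssrc = struct.unpack("!BBHII", packet[:12])
    if b0 >> 6 != 2:
        raise ValueError("not RTP version 2")
    offset = 12 + 4 * (b0 & 0x0F)          # skip the CSRC list, if any
    if b0 & 0x10:                          # X bit: a header extension follows
        if len(packet) < offset + 4:
            raise ValueError("truncated header extension")
        (ext_words,) = struct.unpack("!H", packet[offset + 2:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(packet)
    if b0 & 0x20:                          # P bit: last byte is the padding length
        end -= packet[-1]
    return b1 & 0x7F, seq, ts, packet[offset:end]


def ulaw_to_pcm16(byte: int) -> int:
    """Decode one G.711 mu-law byte into a signed 16-bit linear sample."""
    u = ~byte & 0xFF                       # mu-law bytes are stored inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    sample = (((mantissa << 3) + 0x84) << exponent) - 0x84
    return -sample if u & 0x80 else sample


def rtp_to_wav(packets, wav_path, rate=8000):
    """Write the PCMU payloads of `packets` to a mono 16-bit WAV file.

    Returns the number of samples written. Assumption: packets arrive in
    order and all belong to one stream; other payload types are skipped.
    """
    chunks = []
    total = 0
    for packet in packets:
        payload_type, _seq, _ts, payload = parse_rtp(packet)
        if payload_type != 0:              # 0 = PCMU in the static RTP/AVP table
            continue
        samples = [ulaw_to_pcm16(b) for b in payload]
        chunks.append(struct.pack("<%dh" % len(samples), *samples))
        total += len(samples)
    with wave.open(wav_path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(rate)
        out.writeframes(b"".join(chunks))
    return total
```

Calling `rtp_to_wav(packets, "call.wav")` on a list of captured packets would then give a file that any audio player can open, which is the point where artifacts, low volume, or clipping become audible.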
Bottom line: packet capture, analysis, conversion, editing, archiving, slicing, and dicing are the bread and butter of diagnosing VoIP in everything that concerns codecs, routing, networking, infrastructure, and the like, while media replaying (say, listening to or watching captured RTP packets) plays a lesser role in itself. But the actual end user experience can only be understood via media replaying (are there audio artifacts? Low volume? Echo? Noise? Clipping?). Understanding the end user experience "in their own words" is fundamental to smooth support assistance ("then, since...