Pros and cons of Megatron-LM and Mesh-TensorFlow
Mesh-TensorFlow is built on top of TensorFlow, which is less widely used than PyTorch, the framework that Megatron-LM builds on. More importantly, TensorFlow code is generally more complicated to write than the equivalent PyTorch code.
From a research standpoint, Mesh-TensorFlow has also attracted less significant research work than Megatron-LM.
In short, we suggest that you use Megatron-LM.