Using TFS
TFS is a native model server for TensorFlow 1, TensorFlow 2, and Keras models. It is designed to provide a flexible and high-performance runtime environment with an extensive management API and operational features (such as logging and metrics). AWS provides TFS as part of TensorFlow inference containers (https://github.com/aws/deep-learning-containers/tree/master/tensorflow/inference/docker).
Reviewing TFS concepts
TFS has a concept known as a servable, which encapsulates all of the model and code assets required for inference. To prepare a servable for TFS, you need to package the trained model in the SavedModel format. A SavedModel contains a complete TensorFlow program, including trained parameters and computation. It does not require the original model-building code to run, which makes it useful for sharing or deploying across the TensorFlow ecosystem (for example, with TFLite, TensorFlow.js, or TFS). You can package more than one model as well as specific model lookups or...