Model vs Estimator

rongxian 2022. 1. 3. 17:01

Background

The Estimators API was added to Tensorflow in Release 1.1, and provides a high-level abstraction over lower-level Tensorflow core operations. It works with an Estimator instance, which is TensorFlow's high-level representation of a complete model.

Keras is similar to the Estimators API in that it abstracts deep learning model components such as layers, activation functions and optimizers, making models easier for developers to build. It is a model-level library and does not handle low-level operations; that is the job of tensor manipulation libraries, or backends. At the time, Keras supported three backends: Tensorflow, Theano and CNTK.

Keras was not part of Tensorflow until Release 1.4.0 (2 Nov 2017). Now, when you use tf.keras (or talk about 'Tensorflow Keras'), you are simply using the Keras interface with the Tensorflow backend to build and train your model.

So both the Estimators API and the Keras API provide a high-level API over the low-level core Tensorflow API, and you can use either to train your model. But in most cases, if you are working with Tensorflow, you'd want to use the Estimators API for the reasons listed below.

Distribution

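You can conduct distributed training across multiple servers with the Estimators API. At the time of the original answer (TensorFlow 1.x), you could not do this with the Keras API directly. In TensorFlow 2.x, Keras models can also be trained with tf.distribute strategies.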

The Tensorflow Keras Guide says:

The Estimators API is used for training models for distributed environments.

And the Tensorflow Estimators Guide says:

You can run Estimator-based models on a local host or on a distributed multi-server environment without changing your model. Furthermore, you can run Estimator-based models on CPUs, GPUs, or TPUs without recoding your model.
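On a multi-server cluster, tf.estimator.RunConfig reads the cluster layout from the TF_CONFIG environment variable. This is what lets the same model code run either locally or distributed. The following is a minimal sketch of building that variable using only the standard library. The host names, the ports and the make_tf_config helper are hypothetical placeholders.

```python
import json
import os

# Hypothetical parameter-server cluster; host names and ports are placeholders.
CLUSTER = {
    "chief": ["host0.example.com:2222"],
    "worker": ["host1.example.com:2222", "host2.example.com:2222"],
    "ps": ["host3.example.com:2222"],
}


def make_tf_config(task_type, task_index):
    """Hypothetical helper: the TF_CONFIG JSON string for one process."""
    return json.dumps({
        "cluster": CLUSTER,
        "task": {"type": task_type, "index": task_index},
    })


# Each machine sets only its own role; the model code stays identical everywhere.
os.environ["TF_CONFIG"] = make_tf_config("worker", 0)
print(os.environ["TF_CONFIG"])
```

Every machine runs the same script with its own task type and index, then calls tf.estimator.train_and_evaluate(estimator, train_spec, eval_spec). Without TF_CONFIG, the Estimator simply trains locally.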

Pre-made Estimators

Whilst Keras provides abstractions that make building your models easier, you still have to write code to build your model. With Estimators, Tensorflow provides Pre-made Estimators: ready-made models you can use straight away by simply plugging in the hyperparameters.

Working with Pre-made Estimators is similar to working with scikit-learn. For example, tf.estimator.LinearRegressor from Tensorflow is similar to sklearn.linear_model.LinearRegression from scikit-learn.
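The following is a minimal sketch of a Pre-made Estimator. It assumes a TensorFlow version that still includes tf.estimator, which is deprecated in TensorFlow 2.x. The synthetic data, the feature name "x" and the input function names are made up for illustration. Only a description of the input features is supplied; the linear model, the loss and the default optimizer all come with LinearRegressor.

```python
import numpy as np
import tensorflow as tf

# Synthetic data made up for this sketch: y = 3x + 2 plus a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=256).astype(np.float32)
y = (3.0 * x + 2.0 + rng.normal(0.0, 0.05, size=256)).astype(np.float32)

# The pre-made Estimator only needs to know what the input features look like.
feature_columns = [tf.feature_column.numeric_column("x")]
regressor = tf.estimator.LinearRegressor(feature_columns=feature_columns)


def train_input_fn():
    # Features are a dict keyed by feature-column name; labels are a tensor.
    ds = tf.data.Dataset.from_tensor_slices(({"x": x}, y))
    return ds.shuffle(256).repeat(50).batch(32)


def predict_input_fn():
    # Prediction input has features only, no labels.
    new_x = np.array([0.0, 1.0], dtype=np.float32)
    return tf.data.Dataset.from_tensor_slices({"x": new_x}).batch(2)


regressor.train(input_fn=train_input_fn)

# predict() yields one dict per example; regressors put the value under "predictions".
predictions = [float(p["predictions"][0]) for p in regressor.predict(input_fn=predict_input_fn)]
print(predictions)
```

The scikit-learn counterpart would be LinearRegression().fit(x.reshape(-1, 1), y). The Estimator version instead consumes its data through input functions that return a tf.data.Dataset.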

Integration with Other Tensorflow Tools

Tensorflow provides a visualization tool called TensorBoard that helps you visualize your graph and statistics. An Estimator automatically writes summaries, such as the training loss, to its model directory, where TensorBoard can pick them up.
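The code example later in this post passes no model_dir, so the Estimator writes its checkpoints and summaries to a temporary folder (/tmp/tmp1pvrd1zf in the logs). The following sketch instead fixes both the directory and the summary interval. It assumes a TensorFlow version that still includes tf.estimator, and the path is a hypothetical placeholder.

```python
import tensorflow as tf

# Hypothetical output directory; any writable path works.
MODEL_DIR = "/tmp/mnist_estimator"

# Checkpoints and TensorBoard event files (loss, global_step/sec, ...) go to
# model_dir; save_summary_steps controls how often summaries are written.
config = tf.estimator.RunConfig(model_dir=MODEL_DIR, save_summary_steps=50)

print(config.model_dir, config.save_summary_steps)
```

Pass config=config to tf.keras.estimator.model_to_estimator (or to any Estimator constructor) to send the summaries to that directory. Running tensorboard --logdir /tmp/mnist_estimator then displays them.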

Converting Keras Model to Estimator

To migrate a Keras model to an Estimator, use the tf.keras.estimator.model_to_estimator function.

Source: https://stackoverflow.com/questions/51455863/whats-the-difference-between-a-tensorflow-keras-model-and-estimator


Code example

import numpy as np
import tensorflow as tf

LABEL_DIMENSIONS = 10  # MNIST has 10 digit classes

(X_train, y_train), (X_test, y_test) = tf.keras.datasets.mnist.load_data()
TRAINING_SIZE = len(X_train)  # 60000 training images
TEST_SIZE = len(X_test)       # 10000 test images

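# Scale pixel values from [0, 255] to [0, 1].
X_train = X_train.astype(np.float32) / 255.
X_test = X_test.astype(np.float32) / 255.

# Add a channel axis, (N, 28, 28) -> (N, 28, 28, 1), as Conv2D expects.
X_train = np.expand_dims(X_train, axis=-1)
X_test = np.expand_dims(X_test, axis=-1)

# One-hot encode the labels to match categorical_crossentropy.
y_train = tf.keras.utils.to_categorical(y_train, LABEL_DIMENSIONS)
y_test = tf.keras.utils.to_categorical(y_test, LABEL_DIMENSIONS)

print(X_train.shape)  # (60000, 28, 28, 1)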

inputs = tf.keras.Input(shape=(28,28,1))
x = tf.keras.layers.Conv2D(filters=32, kernel_size=(3,3), activation='relu')(inputs)
x = tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=2)(x)
x = tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu')(x)
x = tf.keras.layers.MaxPooling2D(pool_size=(2,2), strides=2)(x)
x = tf.keras.layers.Conv2D(filters=64, kernel_size=(3,3), activation='relu')(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(64, activation='relu')(x)
predictions = tf.keras.layers.Dense(LABEL_DIMENSIONS, activation='softmax')(x)
model = tf.keras.Model(inputs=inputs, outputs=predictions)
model.summary()

optimizer = tf.keras.optimizers.SGD()
model.compile(loss='categorical_crossentropy',
              optimizer=optimizer, metrics=['accuracy'])
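
# train_distribute=None trains on a single device; uncomment the
# MirroredStrategy line to replicate training across the local GPUs.
strategy = None
# strategy = tf.distribute.MirroredStrategy()
config = tf.estimator.RunConfig(train_distribute=strategy)

# Wrap the compiled Keras model as an Estimator.
estimator = tf.keras.estimator.model_to_estimator(model, config=config)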

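def input_fn(images, labels, epochs, batch_size):
    # Build a tf.data pipeline from the in-memory NumPy arrays.
    dataset = tf.data.Dataset.from_tensor_slices((images, labels))

    SHUFFLE_SIZE = 5000
    # repeat() is applied before batch(), so batches may span epoch boundaries.
    dataset = dataset.shuffle(SHUFFLE_SIZE).repeat(epochs).batch(batch_size)
    # Overlap input preparation with training; tf.data tunes the buffer size.
    dataset = dataset.prefetch(tf.data.AUTOTUNE)

    return dataset
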
BATCH_SIZE = 512
EPOCHS = 50
estimator_train_result = estimator.train(input_fn=lambda:input_fn(X_train, y_train, epochs=EPOCHS, batch_size=BATCH_SIZE))

"""
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Warm-starting with WarmStartSettings: WarmStartSettings(ckpt_to_initialize_from='/tmp/tmp1pvrd1zf/keras/keras_model.ckpt', vars_to_warm_start='.*', var_name_to_vocab_info={}, var_name_to_prev_var_name={})
INFO:tensorflow:Warm-starting from: /tmp/tmp1pvrd1zf/keras/keras_model.ckpt
INFO:tensorflow:Warm-starting variables only in TRAINABLE_VARIABLES.
INFO:tensorflow:Warm-started 10 variables.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 0...
INFO:tensorflow:Saving checkpoints for 0 into /tmp/tmp1pvrd1zf/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 0...
INFO:tensorflow:loss = 2.3034368, step = 0
INFO:tensorflow:global_step/sec: 178.284
INFO:tensorflow:loss = 2.239314, step = 100 (0.562 sec)
INFO:tensorflow:global_step/sec: 191.352
INFO:tensorflow:loss = 2.0096796, step = 200 (0.523 sec)
INFO:tensorflow:global_step/sec: 194.9
INFO:tensorflow:loss = 0.97391784, step = 300 (0.513 sec)
INFO:tensorflow:global_step/sec: 192.675
INFO:tensorflow:loss = 0.54486215, step = 400 (0.519 sec)
INFO:tensorflow:global_step/sec: 184.113
INFO:tensorflow:loss = 0.37952545, step = 500 (0.544 sec)
INFO:tensorflow:global_step/sec: 184.681
INFO:tensorflow:loss = 0.32539126, step = 600 (0.543 sec)
INFO:tensorflow:global_step/sec: 178.475
INFO:tensorflow:loss = 0.27206874, step = 700 (0.559 sec)
INFO:tensorflow:global_step/sec: 183.713
INFO:tensorflow:loss = 0.2957946, step = 800 (0.544 sec)
INFO:tensorflow:global_step/sec: 179.175
INFO:tensorflow:loss = 0.2544106, step = 900 (0.558 sec)
INFO:tensorflow:global_step/sec: 183.919
INFO:tensorflow:loss = 0.18423726, step = 1000 (0.544 sec)
INFO:tensorflow:global_step/sec: 179.745
INFO:tensorflow:loss = 0.2556684, step = 1100 (0.556 sec)
INFO:tensorflow:global_step/sec: 189.261
INFO:tensorflow:loss = 0.18466713, step = 1200 (0.529 sec)
INFO:tensorflow:global_step/sec: 184.846
INFO:tensorflow:loss = 0.23052645, step = 1300 (0.540 sec)
INFO:tensorflow:global_step/sec: 183.611
INFO:tensorflow:loss = 0.14340779, step = 1400 (0.545 sec)
INFO:tensorflow:global_step/sec: 184.251
INFO:tensorflow:loss = 0.17525977, step = 1500 (0.542 sec)
INFO:tensorflow:global_step/sec: 180.979
INFO:tensorflow:loss = 0.1672284, step = 1600 (0.553 sec)
INFO:tensorflow:global_step/sec: 181.072
INFO:tensorflow:loss = 0.15280569, step = 1700 (0.552 sec)
INFO:tensorflow:global_step/sec: 182.724
INFO:tensorflow:loss = 0.18575056, step = 1800 (0.548 sec)
INFO:tensorflow:global_step/sec: 188.865
INFO:tensorflow:loss = 0.15243016, step = 1900 (0.529 sec)
INFO:tensorflow:global_step/sec: 179.765
INFO:tensorflow:loss = 0.20070502, step = 2000 (0.557 sec)
INFO:tensorflow:global_step/sec: 179.259
INFO:tensorflow:loss = 0.09927731, step = 2100 (0.557 sec)
INFO:tensorflow:global_step/sec: 179.88
INFO:tensorflow:loss = 0.10036966, step = 2200 (0.556 sec)
INFO:tensorflow:global_step/sec: 178.489
INFO:tensorflow:loss = 0.1511803, step = 2300 (0.560 sec)
INFO:tensorflow:global_step/sec: 178.423
INFO:tensorflow:loss = 0.13978498, step = 2400 (0.561 sec)
INFO:tensorflow:global_step/sec: 179.009
INFO:tensorflow:loss = 0.09983465, step = 2500 (0.559 sec)
INFO:tensorflow:global_step/sec: 178.036
INFO:tensorflow:loss = 0.12313335, step = 2600 (0.562 sec)
INFO:tensorflow:global_step/sec: 177.117
INFO:tensorflow:loss = 0.09517769, step = 2700 (0.564 sec)
INFO:tensorflow:global_step/sec: 175.814
INFO:tensorflow:loss = 0.1088136, step = 2800 (0.568 sec)
INFO:tensorflow:global_step/sec: 179.151
INFO:tensorflow:loss = 0.11427465, step = 2900 (0.559 sec)
INFO:tensorflow:global_step/sec: 176.516
INFO:tensorflow:loss = 0.1161906, step = 3000 (0.566 sec)
INFO:tensorflow:global_step/sec: 185.02
INFO:tensorflow:loss = 0.12519513, step = 3100 (0.541 sec)
INFO:tensorflow:global_step/sec: 179.65
INFO:tensorflow:loss = 0.123464614, step = 3200 (0.557 sec)
INFO:tensorflow:global_step/sec: 178.158
INFO:tensorflow:loss = 0.08784182, step = 3300 (0.561 sec)
INFO:tensorflow:global_step/sec: 180.627
INFO:tensorflow:loss = 0.054795217, step = 3400 (0.555 sec)
INFO:tensorflow:global_step/sec: 177.08
INFO:tensorflow:loss = 0.07353416, step = 3500 (0.564 sec)
INFO:tensorflow:global_step/sec: 179.721
INFO:tensorflow:loss = 0.09652375, step = 3600 (0.556 sec)
INFO:tensorflow:global_step/sec: 178.426
INFO:tensorflow:loss = 0.10172101, step = 3700 (0.560 sec)
INFO:tensorflow:global_step/sec: 179.012
INFO:tensorflow:loss = 0.08302882, step = 3800 (0.559 sec)
INFO:tensorflow:global_step/sec: 178.118
INFO:tensorflow:loss = 0.09580868, step = 3900 (0.561 sec)
INFO:tensorflow:global_step/sec: 178.444
INFO:tensorflow:loss = 0.0684932, step = 4000 (0.560 sec)
INFO:tensorflow:global_step/sec: 176.784
INFO:tensorflow:loss = 0.081661016, step = 4100 (0.566 sec)
INFO:tensorflow:global_step/sec: 181.015
INFO:tensorflow:loss = 0.07420032, step = 4200 (0.552 sec)
INFO:tensorflow:global_step/sec: 180.23
INFO:tensorflow:loss = 0.06586884, step = 4300 (0.555 sec)
INFO:tensorflow:global_step/sec: 180.78
INFO:tensorflow:loss = 0.08294612, step = 4400 (0.553 sec)
INFO:tensorflow:global_step/sec: 180.108
INFO:tensorflow:loss = 0.09678737, step = 4500 (0.556 sec)
INFO:tensorflow:global_step/sec: 180.694
INFO:tensorflow:loss = 0.07421845, step = 4600 (0.553 sec)
INFO:tensorflow:global_step/sec: 180.526
INFO:tensorflow:loss = 0.06995378, step = 4700 (0.553 sec)
INFO:tensorflow:global_step/sec: 179.617
INFO:tensorflow:loss = 0.082376555, step = 4800 (0.557 sec)
INFO:tensorflow:global_step/sec: 180.769
INFO:tensorflow:loss = 0.08412885, step = 4900 (0.553 sec)
INFO:tensorflow:global_step/sec: 181.961
INFO:tensorflow:loss = 0.06900973, step = 5000 (0.550 sec)
INFO:tensorflow:global_step/sec: 179.384
INFO:tensorflow:loss = 0.06294246, step = 5100 (0.557 sec)
INFO:tensorflow:global_step/sec: 181.96
INFO:tensorflow:loss = 0.07214558, step = 5200 (0.550 sec)
INFO:tensorflow:global_step/sec: 182.539
INFO:tensorflow:loss = 0.05346456, step = 5300 (0.548 sec)
INFO:tensorflow:global_step/sec: 181.052
INFO:tensorflow:loss = 0.07074894, step = 5400 (0.553 sec)
INFO:tensorflow:global_step/sec: 179.281
INFO:tensorflow:loss = 0.07418828, step = 5500 (0.558 sec)
INFO:tensorflow:global_step/sec: 180.93
INFO:tensorflow:loss = 0.092001915, step = 5600 (0.553 sec)
INFO:tensorflow:global_step/sec: 178.848
INFO:tensorflow:loss = 0.06587793, step = 5700 (0.559 sec)
INFO:tensorflow:global_step/sec: 177.853
INFO:tensorflow:loss = 0.04739695, step = 5800 (0.562 sec)
INFO:tensorflow:Calling checkpoint listeners before saving checkpoint 5860...
INFO:tensorflow:Saving checkpoints for 5860 into /tmp/tmp1pvrd1zf/model.ckpt.
INFO:tensorflow:Calling checkpoint listeners after saving checkpoint 5860...
INFO:tensorflow:Loss for final step: 0.053352103.
"""


estimator.evaluate(lambda: input_fn(X_test, y_test, epochs=1, batch_size=BATCH_SIZE))
"""
INFO:tensorflow:Calling model_fn.
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Starting evaluation at 2022-01-03T08:03:44
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Restoring parameters from /tmp/tmp1pvrd1zf/model.ckpt-5860
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Inference Time : 0.30421s
INFO:tensorflow:Finished evaluation at 2022-01-03-08:03:45
INFO:tensorflow:Saving dict for global step 5860: accuracy = 0.981, global_step = 5860, loss = 0.062440597
INFO:tensorflow:Saving 'checkpoint_path' summary for global step 5860: /tmp/tmp1pvrd1zf/model.ckpt-5860
{'accuracy': 0.981, 'loss': 0.062440597, 'global_step': 5860}
"""