Training scripts #15
- Mobilenet using AdamW + SGDW training on Emore dataset (optimizer sketch below)
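A minimal sketch of the decoupled weight decay behind AdamW / SGDW, assuming TF >= 2.11 where the built-in Keras optimizers accept a `weight_decay` argument; the learning rates and decay strength are illustrative, and the AdamW-then-SGDW staging is my reading of the item above, not a confirmed schedule:

```python
import tensorflow as tf

# Decoupled weight decay (Loshchilov & Hutter): the decay is applied to the
# weights directly instead of being folded into the loss as an L2 term.
def build_optimizer(stage, lr=0.1, weight_decay=5e-4):
    if stage == "adamw":  # e.g. early epochs
        return tf.keras.optimizers.AdamW(learning_rate=lr * 0.01, weight_decay=weight_decay)
    # SGDW: plain momentum SGD with the same decoupled decay.
    return tf.keras.optimizers.SGD(learning_rate=lr, momentum=0.9, weight_decay=weight_decay)
```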
- Ghostnet using SGD + L2 regularizer + cosine lr decay training on MS1MV3 dataset (see the sketch below)
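The SGD + L2 + cosine lr decay recipe recurs through this list; here is a minimal Keras sketch of it. The initial lr, epoch count, batch size, and the 5e-4 L2 strength are assumptions for illustration, not the repo's exact values:

```python
import tensorflow as tf

# Cosine learning-rate decay over the whole run, driven by the global step.
steps_per_epoch = 45000  # illustrative: ~5.8M MS1MV3 images / batch 128
lr_schedule = tf.keras.optimizers.schedules.CosineDecay(
    initial_learning_rate=0.1, decay_steps=steps_per_epoch * 25
)
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)

# "L2 regularizer" means an explicit kernel_regularizer on conv / dense layers,
# which adds the penalty to the loss (unlike the decoupled decay shown above).
regularizer = tf.keras.regularizers.l2(5e-4)
dense = tf.keras.layers.Dense(512, kernel_regularizer=regularizer)
```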
- Ghostnet strides=1 float16, using SGD + L2 regularizer + cosine lr decay training on MS1MV3 dataset (mixed-precision sketch below)
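For the float16 variant, a sketch of Keras mixed precision: layers compute in float16 while variables and the final output stay float32 for numerical stability. The 112x112 input and layer choices here are placeholders, not the Ghostnet definition:

```python
import tensorflow as tf
from tensorflow.keras import mixed_precision

# All layers now compute in float16 while variables remain float32.
mixed_precision.set_global_policy("mixed_float16")

inputs = tf.keras.Input((112, 112, 3))
x = tf.keras.layers.Conv2D(64, 3, strides=1, padding="same")(inputs)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dense(512)(x)
# Cast the embedding back to float32 so the loss runs in full precision.
outputs = tf.keras.layers.Activation("linear", dtype="float32")(x)
model = tf.keras.Model(inputs, outputs)
```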
- Botnet50 using SGD + L2 regularizer + cosine lr decay training on MS1MV3 dataset
- Resnet50V2 / Resnet101V2 swish using SGD + L2 regularizer + cosine lr decay training on MS1MV3 dataset
- r50 swish using SGD + L2 regularizer + cosine lr decay training on MS1MV3 dataset
- se_r50 swish with stochastic depth SD (1, 0.8) using SGD + L2 regularizer + cosine lr decay + randaug training on MS1MV3 dataset (sketch below)
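Reading "(1, 0.8)" as stochastic depth survival probabilities decayed linearly from 1.0 at the first block to 0.8 at the last (an assumption), a sketch of the per-sample residual-branch drop:

```python
import tensorflow as tf

class StochasticDepth(tf.keras.layers.Layer):
    """Drops the whole residual branch per sample during training."""
    def __init__(self, survival_prob, **kwargs):
        super().__init__(**kwargs)
        self.survival_prob = survival_prob

    def call(self, inputs, training=None):
        shortcut, residual = inputs
        if not training:
            # Original Huang et al. convention: scale the branch at inference.
            return shortcut + self.survival_prob * residual
        batch = tf.shape(residual)[0]
        keep = tf.cast(
            tf.random.uniform([batch, 1, 1, 1]) < self.survival_prob,
            residual.dtype,
        )
        return shortcut + keep * residual

num_blocks = 16  # illustrative
survival = [1.0 - 0.2 * i / (num_blocks - 1) for i in range(num_blocks)]
```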
- Converted official r18 / r34 / r50 / r100 models trained on Glint360k with partial FC + CosFace (margin sketch below)
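For reference, an illustrative CosFace (large margin cosine loss) logits function; s=64 and m=0.35 are the paper's typical values, not necessarily what these converted models used. Partial FC additionally computes this against only a sampled subset of class centers each step (see the sampling sketch further down):

```python
import tensorflow as tf

def cosface_logits(embeddings, weights, labels, s=64.0, m=0.35):
    # Cosine similarity between L2-normalized embeddings and class centers.
    norm_emb = tf.math.l2_normalize(embeddings, axis=1)
    norm_w = tf.math.l2_normalize(weights, axis=0)
    cosine = tf.matmul(norm_emb, norm_w)
    # Subtract the margin m from the target-class cosine only, then scale.
    one_hot = tf.one_hot(labels, tf.shape(weights)[1], dtype=cosine.dtype)
    return s * (cosine - one_hot * m)
```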
- EfficientNetV2S swish, drop_conn 0.2, dropout 0.2, using SGD + L2 regularizer + cosine lr decay + randaug training on MS1MV3 dataset
- EfficientNetV2S swish, drop_conn 0.2, dropout 0.2, using AdamW + cosine lr decay + randaug training on MS1MV3 dataset
- EfficientNetV2S swish, drop_conn 0.2, dropout 0.2, using AdamW + cosine lr decay + randaug + MagFace training on MS1MV3 dataset (margin sketch below)
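A hedged sketch of MagFace's magnitude-aware margin: the margin grows linearly with the unnormalized feature magnitude a over [l_a, u_a], and a convex regularizer pulls magnitudes toward u_a. The constants below are the paper's defaults, not confirmed values for this script:

```python
import tensorflow as tf

def magface_margin(a, l_a=10.0, u_a=110.0, l_m=0.45, u_m=0.8):
    # Linear map from magnitude range [l_a, u_a] to margin range [l_m, u_m].
    a = tf.clip_by_value(a, l_a, u_a)
    return (u_m - l_m) / (u_a - l_a) * (a - l_a) + l_m

def magface_regularizer(a, u_a=110.0):
    # g(a): convex penalty with its minimum at a = u_a, pulling magnitudes
    # toward u_a; weighted into the loss alongside the margin softmax.
    return a / (u_a ** 2) + 1.0 / a
```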
- r50 / r100 PReLU, dropout 0.4, using SGD + L2 regularizer + randaug + AdaFace training on MS1MV3 dataset
- r100 PReLU, dropout 0.4, using SGD + L2 regularizer + randaug + AdaFace training on Glint360K dataset with partial FC (sampling sketch below)
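A hypothetical sketch of the Partial FC idea used here: instead of computing margin logits against all ~360K Glint360K class centers, each step uses the batch's positive classes plus a random sample of negatives (the Partial FC paper's sample ratio is around 0.1). A real implementation also de-duplicates the sampled set and remaps labels into it; labels are assumed int32:

```python
import tensorflow as tf

def sample_classes(labels, num_classes, sample_ratio=0.1):
    positives, _ = tf.unique(labels)  # classes present in this batch
    num_sample = int(sample_ratio * num_classes)
    # Random negative centers; overlaps with positives are tolerated here.
    negatives = tf.random.uniform([num_sample], maxval=num_classes, dtype=tf.int32)
    return tf.concat([positives, negatives], axis=0)
```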
- SE_MobileFaceNet using SGD + cosine lr decay training on MS1MV3 dataset
- ResNet101V2 using nadam, then fine-tuning with triplet loss: fine-tuning with `optimizer.BatchHardTripletLoss` only may reach a better score on these eval datasets, but may harm the margin distance between different classes; `bottleneckOnly` may not be necessary (triplet sketch below).
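Since the repo's `optimizer.BatchHardTripletLoss` isn't shown here, this is a generic batch-hard triplet loss sketch (Hermans et al., "In Defense of the Triplet Loss") on cosine distance; the 0.35 margin is an assumption:

```python
import tensorflow as tf

def batch_hard_triplet_loss(labels, embeddings, margin=0.35):
    emb = tf.math.l2_normalize(embeddings, axis=1)
    dists = 1.0 - tf.matmul(emb, emb, transpose_b=True)  # cosine distance
    same = tf.equal(labels[:, None], labels[None, :])
    # Hardest positive: farthest same-class sample; hardest negative:
    # closest different-class sample.
    hardest_pos = tf.reduce_max(tf.where(same, dists, tf.zeros_like(dists)), axis=1)
    hardest_neg = tf.reduce_min(
        tf.where(same, tf.fill(tf.shape(dists), 1e9), dists), axis=1
    )
    return tf.reduce_mean(tf.nn.relu(hardest_pos - hardest_neg + margin))
```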