Image Classification

MobileNet v1

Use case: Image classification

Model description

MobileNet is a well-known architecture that can be used in multiple use cases. Two parameters adapt it to the complexity of a given use case: the input size and the width multiplier, called alpha. The alpha parameter increases or decreases the number of filters in each layer, which also reduces the number of multiply-adds and therefore the inference time.

The original paper demonstrates the performance of MobileNet models using alpha values of 1.0, 0.75, 0.5 and 0.25.

(source: https://keras.io/api/applications/mobilenet/)
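To make the effect of alpha concrete, here is a minimal pure-Python sketch (not the Keras implementation) of how a width multiplier scales the per-layer filter counts from the MobileNet v1 paper. The `BASE_FILTERS` list and the truncating rounding are simplifying assumptions:

```python
# Sketch: the width multiplier "alpha" scales every layer's filter count.
# Channel counts scale linearly with alpha, so the parameters and
# multiply-adds of the pointwise (1x1) convolutions scale roughly
# with alpha**2 -- which is why smaller alphas run so much faster.
BASE_FILTERS = [32, 64, 128, 128, 256, 256,
                512, 512, 512, 512, 512, 512, 1024, 1024]

def scaled_filters(alpha):
    # Assumed behavior: each base filter count is multiplied by alpha.
    return [int(f * alpha) for f in BASE_FILTERS]

for alpha in (1.0, 0.75, 0.5, 0.25):
    widths = scaled_filters(alpha)
    print(alpha, widths[0], widths[-1])  # first conv and last layer widths
```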

The model is quantized to int8 using the TensorFlow Lite converter.
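The int8 scheme used by the TensorFlow Lite converter is an affine mapping, real ≈ scale × (q − zero_point), with q clamped to [−128, 127]. A minimal sketch of that arithmetic, with hypothetical calibration values:

```python
# Sketch of int8 affine quantization: real ~= scale * (q - zero_point).
def quantize(x, scale, zero_point):
    # Round to the nearest integer grid point, then clamp to int8 range.
    q = round(x / scale) + zero_point
    return max(-128, min(127, q))

def dequantize(q, scale, zero_point):
    return scale * (q - zero_point)

scale, zp = 0.02, -5            # hypothetical values from calibration
q = quantize(1.0, scale, zp)
print(q, dequantize(q, scale, zp))   # -> 45 1.0
print(quantize(100.0, scale, zp))    # saturates at 127
```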

Network information

| Network Information | Value |
|---------------------|-------|
| Framework           | TensorFlow Lite |
| MParams (alpha=1.0) | 1.3 M |
| Quantization        | int8 |
| Provenance          | https://www.tensorflow.org/api_docs/python/tf/keras/applications/mobilenet |
| Paper               | https://arxiv.org/abs/1704.04861 |

Network inputs / outputs

For an image resolution of NxM and P classes:

| Input Shape  | Description |
|--------------|-------------|
| (1, N, M, 3) | Single NxM RGB image with UINT8 values between 0 and 255 |

| Output Shape | Description |
|--------------|-------------|
| (1, P)       | Per-class confidence for P classes in FLOAT32 |
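Post-processing the (1, P) FLOAT32 output amounts to taking the arg-max over the class scores. A minimal sketch, using the five Flowers class names as an assumed label list and a hypothetical model output:

```python
# Sketch: map the (1, P) FLOAT32 output to a class label via arg-max.
CLASS_NAMES = ["daisy", "dandelion", "roses", "sunflowers", "tulips"]  # assumed P = 5

def top1(scores):
    # Index of the highest per-class confidence.
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return CLASS_NAMES[idx], scores[idx]

output = [[0.02, 0.05, 0.80, 0.03, 0.10]]  # hypothetical output, shape (1, 5)
label, conf = top1(output[0])
print(label, conf)  # -> roses 0.8
```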

Recommended platforms

| Platform | Supported | Recommended |
|----------|-----------|-------------|
| STM32L0  | []        | []          |
| STM32L4  | [x]       | []          |
| STM32U5  | [x]       | []          |
| STM32H7  | [x]       | [x]         |
| STM32MP1 | [x]       | [x]         |
| STM32MP2 | [x]       | [x]         |
| STM32N6  | [x]       | [x]         |

Performances

Metrics

  • Measurements are done with the default STM32Cube.AI configuration, with the input / output allocated option enabled.
  • tfs stands for "training from scratch", meaning that the model weights were randomly initialized before training.
  • tl stands for "transfer learning", meaning that the model backbone weights were initialized from a pre-trained model, then only the last layer was unfrozen during the training.
  • fft stands for "full fine-tuning", meaning that the full model weights were initialized from a transfer learning pre-trained model, and all the layers were unfrozen during the training.

Reference NPU memory footprint on the food-101 and ImageNet datasets (see Accuracy for details on the datasets)

| Model | Dataset | Format | Resolution | Series | Internal RAM (KiB) | External RAM (KiB) | Weights Flash (KiB) | STM32Cube.AI version | STEdgeAI Core version |
|-------|---------|--------|------------|--------|--------------------|--------------------|---------------------|----------------------|-----------------------|
| MobileNet v1 0.25 fft | food-101 | Int8 | 224x224x3 | STM32N6 | 588  | 0.0 | 304.72  | 10.2.0 | 2.2.0 |
| MobileNet v1 0.5 fft  | food-101 | Int8 | 224x224x3 | STM32N6 | 588  | 0.0 | 992.67  | 10.2.0 | 2.2.0 |
| MobileNet v1 1.0 fft  | food-101 | Int8 | 224x224x3 | STM32N6 | 1568 | 0.0 | 3602.97 | 10.2.0 | 2.2.0 |
| MobileNet v1 0.25     | ImageNet | Int8 | 224x224x3 | STM32N6 | 588  | 0.0 | 533.38  | 10.2.0 | 2.2.0 |
| MobileNet v1 0.5      | ImageNet | Int8 | 224x224x3 | STM32N6 | 588  | 0.0 | 1446.06 | 10.2.0 | 2.2.0 |
| MobileNet v1 1.0      | ImageNet | Int8 | 224x224x3 | STM32N6 | 1568 | 0.0 | 4505.86 | 10.2.0 | 2.2.0 |

Reference NPU inference time on the food-101 and ImageNet datasets (see Accuracy for details on the datasets)

| Model | Dataset | Format | Resolution | Board | Execution Engine | Inference time (ms) | Inf / sec | STM32Cube.AI version | STEdgeAI Core version |
|-------|---------|--------|------------|-------|------------------|---------------------|-----------|----------------------|-----------------------|
| MobileNet v1 0.25 fft | food-101 | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 2.81  | 355.87 | 10.2.0 | 2.2.0 |
| MobileNet v1 0.5 fft  | food-101 | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 6.03  | 165.83 | 10.2.0 | 2.2.0 |
| MobileNet v1 1.0 fft  | food-101 | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 16.79 | 59.55  | 10.2.0 | 2.2.0 |
| MobileNet v1 0.25     | ImageNet | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 3.56  | 280.89 | 10.2.0 | 2.2.0 |
| MobileNet v1 0.5      | ImageNet | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 7.35  | 136.05 | 10.2.0 | 2.2.0 |
| MobileNet v1 1.0      | ImageNet | Int8 | 224x224x3 | STM32N6570-DK | NPU/MCU | 19.26 | 51.93  | 10.2.0 | 2.2.0 |

Reference MCU memory footprint based on the Flowers and ImageNet datasets (see Accuracy for details on the datasets)

| Model | Format | Resolution | Series | Activation RAM (KiB) | Runtime RAM (KiB) | Weights Flash (KiB) | Code Flash (KiB) | Total RAM (KiB) | Total Flash (KiB) | STM32Cube.AI version |
|-------|--------|------------|--------|----------------------|-------------------|---------------------|------------------|-----------------|-------------------|----------------------|
| MobileNet v1 0.25 fft | Int8 | 224x224x3 | STM32H7 | 272.96  | 16.38 | 214.69  | 67.24  | 289.34  | 281.93  | 10.2.0 |
| MobileNet v1 0.5 fft  | Int8 | 224x224x3 | STM32H7 | 449.58  | 16.38 | 812.61  | 80.61  | 465.96  | 893.22  | 10.2.0 |
| MobileNet v1 0.25 fft | Int8 | 96x96x3   | STM32H7 | 66.96   | 16.33 | 214.69  | 67.19  | 83.29   | 281.88  | 10.2.0 |
| MobileNet v1 0.25 tfs | Int8 | 96x96x1   | STM32H7 | 52.8    | 16.33 | 214.55  | 69.28  | 69.13   | 283.83  | 10.2.0 |
| MobileNet v1 0.25     | Int8 | 224x224x3 | STM32H7 | 267.2   | 16.44 | 467.33  | 68.37  | 283.64  | 535.7   | 10.2.0 |
| MobileNet v1 0.5      | Int8 | 224x224x3 | STM32H7 | 404.28  | 16.44 | 1314    | 81.72  | 447.51  | 1395.72 | 10.2.0 |
| MobileNet v1 1.0      | Int8 | 224x224x3 | STM32H7 | 1331.13 | 16.48 | 4157.09 | 108.46 | 1347.61 | 4265.55 | 10.2.0 |

Reference MCU inference time based on the Flowers and ImageNet datasets (see Accuracy for details on the datasets)

| Model | Format | Resolution | Board | Execution Engine | Frequency | Inference time (ms) | STM32Cube.AI version |
|-------|--------|------------|-------|------------------|-----------|---------------------|----------------------|
| MobileNet v1 0.25 fft | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 166.9   | 10.2.0 |
| MobileNet v1 0.5 fft  | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 471.68  | 10.2.0 |
| MobileNet v1 0.25 fft | Int8 | 96x96x3   | STM32H747I-DISCO | 1 CPU | 400 MHz | 30.63   | 10.2.0 |
| MobileNet v1 0.25 tfs | Int8 | 96x96x1   | STM32H747I-DISCO | 1 CPU | 400 MHz | 29.04   | 10.2.0 |
| MobileNet v1 0.25     | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 170.37  | 10.2.0 |
| MobileNet v1 0.5      | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 477.79  | 10.2.0 |
| MobileNet v1 1.0      | Int8 | 224x224x3 | STM32H747I-DISCO | 1 CPU | 400 MHz | 1656.41 | 10.2.0 |

Reference MPU inference time based on the Flowers dataset (see Accuracy for details on the dataset)

| Model | Format | Resolution | Quantization | Board | Execution Engine | Frequency | Inference time (ms) | %NPU | %GPU | %CPU | X-LINUX-AI version | Framework |
|-------|--------|------------|--------------|-------|------------------|-----------|---------------------|------|------|------|--------------------|-----------|
| MobileNet v1 0.25 fft | Int8 | 224x224x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz  | 14.27  | 7.54  | 92.46 | 0   | v6.1.0 | OpenVX |
| MobileNet v1 0.5 fft  | Int8 | 224x224x3 | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz  | 32.79  | 3.83  | 96.17 | 0   | v6.1.0 | OpenVX |
| MobileNet v1 0.25 fft | Int8 | 96x96x3   | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz  | 3.81   | 15.36 | 84.64 | 0   | v6.1.0 | OpenVX |
| MobileNet v1 0.25 tfs | Int8 | 96x96x1   | per-channel** | STM32MP257F-DK2 | NPU/GPU | 800 MHz  | 3.66   | 13.91 | 86.09 | 0   | v6.1.0 | OpenVX |
| MobileNet v1 0.25 fft | Int8 | 224x224x3 | per-channel   | STM32MP157F-DK2 | 2 CPU   | 800 MHz  | 33.91  | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v1 0.5 fft  | Int8 | 224x224x3 | per-channel   | STM32MP157F-DK2 | 2 CPU   | 800 MHz  | 90.6   | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v1 0.25 fft | Int8 | 96x96x3   | per-channel   | STM32MP157F-DK2 | 2 CPU   | 800 MHz  | 6.32   | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v1 0.25 tfs | Int8 | 96x96x1   | per-channel   | STM32MP157F-DK2 | 2 CPU   | 800 MHz  | 5.83   | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v1 0.25 fft | Int8 | 224x224x3 | per-channel   | STM32MP135F-DK2 | 1 CPU   | 1000 MHz | 52.39  | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v1 0.5 fft  | Int8 | 224x224x3 | per-channel   | STM32MP135F-DK2 | 1 CPU   | 1000 MHz | 144.47 | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v1 0.25 fft | Int8 | 96x96x3   | per-channel   | STM32MP135F-DK2 | 1 CPU   | 1000 MHz | 9.31   | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |
| MobileNet v1 0.25 tfs | Int8 | 96x96x1   | per-channel   | STM32MP135F-DK2 | 1 CPU   | 1000 MHz | 9.37   | NA    | NA    | 100 | v6.1.0 | TensorFlowLite 2.18.0 |

** To get the most out of the STM32MP25 NPU hardware acceleration, use per-tensor quantization.
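The difference between the two quantization granularities mentioned above can be sketched in a few lines: per-tensor quantization uses a single scale for the whole weight tensor, while per-channel quantization computes one scale per output channel. The weights and the symmetric-scale formula below are illustrative assumptions:

```python
# Sketch: per-tensor vs. per-channel symmetric int8 scales.
weights = [[0.5, -0.25, 0.1],      # output channel 0: large values
           [0.01, 0.02, -0.015]]   # output channel 1: small values

def scale_for(values):
    # Symmetric int8 scale: map the maximum magnitude onto 127.
    return max(abs(v) for v in values) / 127.0

# One scale shared across the whole tensor (per-tensor) ...
per_tensor_scale = scale_for([v for row in weights for v in row])
# ... versus one scale per output channel (per-channel).
per_channel_scales = [scale_for(row) for row in weights]

print(per_tensor_scale)    # dominated by the largest channel
print(per_channel_scales)  # the small-valued channel keeps a finer scale
```

Per-channel scales preserve accuracy for channels with small weights, but some NPUs (such as the STM32MP25's) run fastest with a single per-tensor scale, which is the trade-off the footnote points to.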

Accuracy with Flowers dataset

Dataset details: link, License: CC BY 2.0, Quotation: [1], Number of classes: 5, Number of images: 3,670

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v1 0.25 tfs | Float | 224x224x3 | 88.83 % |
| MobileNet v1 0.25 tfs | Int8  | 224x224x3 | 89.37 % |
| MobileNet v1 0.25 tl  | Float | 224x224x3 | 85.83 % |
| MobileNet v1 0.25 tl  | Int8  | 224x224x3 | 83.24 % |
| MobileNet v1 0.25 fft | Float | 224x224x3 | 93.05 % |
| MobileNet v1 0.25 fft | Int8  | 224x224x3 | 92.1 %  |
| MobileNet v1 0.5 tfs  | Float | 224x224x3 | 92.1 %  |
| MobileNet v1 0.5 tfs  | Int8  | 224x224x3 | 91.55 % |
| MobileNet v1 0.5 tl   | Float | 224x224x3 | 88.56 % |
| MobileNet v1 0.5 tl   | Int8  | 224x224x3 | 87.74 % |
| MobileNet v1 0.5 fft  | Float | 224x224x3 | 95.1 %  |
| MobileNet v1 0.5 fft  | Int8  | 224x224x3 | 94.41 % |
| MobileNet v1 0.25 fft | Float | 96x96x3   | 87.47 % |
| MobileNet v1 0.25 fft | Int8  | 96x96x3   | 87.06 % |
| MobileNet v1 0.25 tfs | Float | 96x96x1   | 74.93 % |
| MobileNet v1 0.25 tfs | Int8  | 96x96x1   | 74.93 % |

Accuracy with Plant-village dataset

Dataset details: link, License: CC0 1.0, Quotation: [2], Number of classes: 39, Number of images: 61,486

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v1 0.25 tfs | Float | 224x224x3 | 99.92 % |
| MobileNet v1 0.25 tfs | Int8  | 224x224x3 | 99.92 % |
| MobileNet v1 0.25 tl  | Float | 224x224x3 | 85.38 % |
| MobileNet v1 0.25 tl  | Int8  | 224x224x3 | 83.7 %  |
| MobileNet v1 0.25 fft | Float | 224x224x3 | 99.95 % |
| MobileNet v1 0.25 fft | Int8  | 224x224x3 | 99.82 % |
| MobileNet v1 0.5 tfs  | Float | 224x224x3 | 99.9 %  |
| MobileNet v1 0.5 tfs  | Int8  | 224x224x3 | 99.83 % |
| MobileNet v1 0.5 tl   | Float | 224x224x3 | 93.05 % |
| MobileNet v1 0.5 tl   | Int8  | 224x224x3 | 92.7 %  |
| MobileNet v1 0.5 fft  | Float | 224x224x3 | 99.94 % |
| MobileNet v1 0.5 fft  | Int8  | 224x224x3 | 99.85 % |

Accuracy with Food-101 dataset

Dataset details: link, Quotation: [3], Number of classes: 101, Number of images: 101,000

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v1 0.25 tfs | Float | 224x224x3 | 72.16 % |
| MobileNet v1 0.25 tfs | Int8  | 224x224x3 | 71.13 % |
| MobileNet v1 0.25 tl  | Float | 224x224x3 | 43.21 % |
| MobileNet v1 0.25 tl  | Int8  | 224x224x3 | 39.89 % |
| MobileNet v1 0.25 fft | Float | 224x224x3 | 72.36 % |
| MobileNet v1 0.25 fft | Int8  | 224x224x3 | 69.52 % |
| MobileNet v1 0.5 tfs  | Float | 224x224x3 | 76.97 % |
| MobileNet v1 0.5 tfs  | Int8  | 224x224x3 | 76.37 % |
| MobileNet v1 0.5 tl   | Float | 224x224x3 | 48.78 % |
| MobileNet v1 0.5 tl   | Int8  | 224x224x3 | 45.89 % |
| MobileNet v1 0.5 fft  | Float | 224x224x3 | 76.72 % |
| MobileNet v1 0.5 fft  | Int8  | 224x224x3 | 74.82 % |
| MobileNet v1 1.0 fft  | Float | 224x224x3 | 80.38 % |
| MobileNet v1 1.0 fft  | Int8  | 224x224x3 | 79.43 % |

Accuracy with ImageNet dataset

Dataset details: link, Quotation: [4], Number of classes: 1000. To perform the quantization, we calibrated the activations with a random subset of the training set. For the sake of simplicity, the accuracy reported here was estimated on the 50,000 labelled images of the validation set.

| Model | Format | Resolution | Top 1 Accuracy |
|-------|--------|------------|----------------|
| MobileNet v1 0.25 | Float | 224x224x3 | 48.96 % |
| MobileNet v1 0.25 | Int8  | 224x224x3 | 46.34 % |
| MobileNet v1 0.5  | Float | 224x224x3 | 62.11 % |
| MobileNet v1 0.5  | Int8  | 224x224x3 | 59.92 % |
| MobileNet v1 1.0  | Float | 224x224x3 | 69.52 % |
| MobileNet v1 1.0  | Int8  | 224x224x3 | 68.64 % |

Retraining and integration in a simple example

Please refer to the stm32ai-modelzoo-services GitHub repository here.

References

[1] "Tf_flowers : tensorflow datasets," TensorFlow. [Online]. Available: https://www.tensorflow.org/datasets/catalog/tf_flowers.

[2] J. Arun Pandian and G. Geetharamani, "Data for: Identification of Plant Leaf Diseases Using a 9-layer Deep Convolutional Neural Network," Mendeley Data, V1, 2019. doi: 10.17632/tywbtsjrjv.1

[3] L. Bossard, M. Guillaumin, and L. Van Gool, "Food-101 -- Mining Discriminative Components with Random Forests," European Conference on Computer Vision, 2014.

[4] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, "ImageNet: A Large-Scale Hierarchical Image Database," IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2009.
