Every byte and every operation counts when you’re trying to build a faster model, especially if the model has to run on a device. Neural Architecture Search (NAS) algorithms develop sophisticated model architectures by searching a larger model space than is possible by hand. Various NAS algorithms, such as MNasNet and TuNAS, have been proposed and have discovered several efficient model architectures, including MobileNetV3 and EfficientNet.
Here, we present LayerNAS, an approach that formulates the multi-objective NAS problem within the framework of combinatorial optimization to significantly reduce its complexity. This leads to a dramatic reduction in the number of model candidates that must be searched, less computation required for multi-trial searches, and the discovery of model architectures that perform better overall. Using a search space built on backbones taken from MobileNetV2 and MobileNetV3, we find models with top-1 accuracy on ImageNet up to 4.9% better than state-of-the-art alternatives.
Formulation of the problem
NAS solves different problems in different search spaces. To understand what LayerNAS solves, let’s start with a simple example: you’re the owner of GBurger and you’re building a flagship burger that consists of three layers, each with four options at different costs. The burger tastes different depending on the mix of options you choose. You want to make the tastiest burger you can within a certain budget.
Build your burger with different options available for each layer, each with different costs and benefits.
Just like a neural network architecture, the search space for the perfect burger follows a layerwise pattern, where each layer has several options with different trade-offs between cost and performance. This simplified model illustrates a common approach to setting up search spaces. For example, for models based on convolutional neural networks (CNNs), such as MobileNet, the NAS algorithm can select among different options for each convolution layer, such as the number of filters, the stride, or the kernel size; a minimal sketch of such a search space is shown below.
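To make the setup concrete, here is a minimal sketch of how such a layerwise search space might be declared in Python. The layer names and option values are hypothetical, not the actual MobileNet-based search space used in the paper; the point is only that the number of candidate architectures grows exponentially with the number of layers.

```python
# Hypothetical layerwise search space: each layer offers a few options
# (number of filters, kernel size), each with its own cost/accuracy trade-off.
search_space = {
    "conv_1": {"filters": [16, 24, 32, 48], "kernel_size": [3, 5]},
    "conv_2": {"filters": [32, 48, 64, 96], "kernel_size": [3, 5]},
    "conv_3": {"filters": [64, 96, 128, 160], "kernel_size": [3, 5]},
}

# Exhaustive search must consider every combination of options across layers,
# so the number of candidates grows exponentially with depth.
num_candidates = 1
for options in search_space.values():
    num_candidates *= len(options["filters"]) * len(options["kernel_size"])
print(num_candidates)  # 8 * 8 * 8 = 512 candidates for just three layers
```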
Method
We base our approach on search spaces that satisfy two conditions:
- An optimal model can be constructed by taking one of the model candidates generated from searching the previous layer and applying one of the search options to the current layer.
- If we impose a FLOPs constraint on the model up to the current layer, we can derive a constraint on the previous layers by subtracting the FLOPs of the current layer.
Under these conditions, it is possible to search linearly, from layer 1 to layer n, knowing that when searching for the best option for layer i, a change in any previous layer will not improve the performance of the model. We can then bucket candidates by their cost, so that only a limited number of candidates are stored per layer. If two models have the same FLOPs but one has better accuracy, we keep only the better one and assume this will not affect the architecture of the following layers. Whereas the search space of a full treatment would expand exponentially with the number of layers, since the full range of options is available at each layer, our layerwise cost-based approach lets us significantly reduce the search space while being able to reason rigorously about the polynomial complexity of the algorithm. Our experimental evaluation shows that within these constraints we are able to find top-quality models.
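As a rough illustration of the pruning rule described above (not the authors' implementation), the snippet below keeps only the most accurate candidate within each FLOPs bucket, so the number of candidates stored per layer is bounded by the number of buckets rather than growing with the search. The candidate triples and the bucket width are made up.

```python
# Hypothetical (architecture, FLOPs, accuracy) candidates for one layer.
candidates = [
    ("arch_a", 120e6, 0.71),
    ("arch_b", 125e6, 0.69),   # same FLOPs bucket as arch_a, lower accuracy -> dropped
    ("arch_c", 210e6, 0.74),
]

BUCKET_WIDTH = 50e6            # bucket width in FLOPs (illustrative)
best_per_bucket = {}
for arch, flops, acc in candidates:
    bucket = int(flops // BUCKET_WIDTH)
    if bucket not in best_per_bucket or acc > best_per_bucket[bucket][2]:
        best_per_bucket[bucket] = (arch, flops, acc)

print(list(best_per_bucket.values()))
# Only arch_a and arch_c survive: one candidate per cost bucket.
```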
NAS as a combinatorial optimization problem
Using a layerwise cost-based approach, we reduce NAS to a combinatorial optimization problem. That is, for layer i, we can compute the cost and reward after training with a given component Sᵢ. This implies the following combinatorial problem: how do we get the best reward if we choose one option per layer within a cost budget? This problem can be solved in many different ways, one of the simplest being dynamic programming, as described in the following pseudocode:
```
while True:
    # select a candidate to search in Layer i
    candidate = select_candidate(layeri)
    if searchable(candidate):
        # Use the layerwise structural information to generate the children.
        children = generate_children(candidate)
        reward = train(children)
        bucket = bucketize(children)
        if memorial_table[i][bucket] < reward:
            memorial_table[i][bucket] = children
    move to next layer
```
LayerNAS pseudocode.
An illustration of the LayerNAS approach for the example of trying to create the best burger within a $7–$9 budget. We have four options for the first layer, resulting in four burger candidates. By applying the four options for the second layer, we have 16 candidates in total. We then bucket them into $1–$2, $3–$4, $5–$6, and $7–$8 ranges, keeping only the tastiest burger in each bucket, i.e., four candidates. Then, for these four candidates, we build 16 candidates using the pre-selected options for the first two layers and four options per candidate for the third layer. We bucket them again, select the burgers within the budget range, and keep the best one.
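Putting the pieces together, here is a runnable sketch of the bucketed layerwise search applied to the burger example. It is a simplified stand-in for LayerNAS: the option names, prices, and tastiness scores are invented, the reward of a combination is just the sum of its parts, and in a real NAS run the reward would come from training and evaluating each candidate model.

```python
BUDGET = (7.0, 9.0)        # target total cost range, as in the illustration
BUCKET_WIDTH = 2.0         # cost buckets: $1–$2, $3–$4, $5–$6, $7–$8, ...

# (option, cost, tastiness) choices for each of the three layers (made up).
LAYERS = [
    [("plain", 1.0, 0.3), ("sesame", 1.5, 0.5), ("brioche", 2.5, 0.8), ("pretzel", 3.0, 0.7)],
    [("veggie", 1.5, 0.4), ("chicken", 2.0, 0.6), ("beef", 3.0, 0.8), ("wagyu", 4.0, 0.9)],
    [("lettuce", 0.5, 0.2), ("cheese", 1.0, 0.5), ("bacon", 2.0, 0.7), ("truffle", 3.5, 0.9)],
]

# Search layer by layer, keeping only the best candidate per cost bucket.
candidates = [((), 0.0, 0.0)]                      # (options, cost, reward)
for layer in LAYERS:
    best_per_bucket = {}
    for opts, cost, reward in candidates:
        for name, option_cost, option_reward in layer:
            child = (opts + (name,), cost + option_cost, reward + option_reward)
            bucket = int(child[1] // BUCKET_WIDTH)
            if bucket not in best_per_bucket or child[2] > best_per_bucket[bucket][2]:
                best_per_bucket[bucket] = child
    candidates = list(best_per_bucket.values())

# Keep the best surviving candidate whose total cost lands inside the budget.
in_budget = [c for c in candidates if BUDGET[0] <= c[1] <= BUDGET[1]]
print(max(in_budget, key=lambda c: c[2]))
# With these made-up numbers: (('brioche', 'beef', 'truffle'), 9.0, 2.5)
```

With these numbers, the bucketed search evaluates 4 + 8 + 12 = 24 child candidates across the three layers, compared with 4³ = 64 for exhaustive search, and the gap widens rapidly as layers and options are added.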
Experimental results
When comparing NAS algorithms, we evaluate the following metrics:
- Quality: What is the most accurate model that the algorithm can find?
- Stability: How reliably does the algorithm find good models? Can it consistently discover high-accuracy models across successive trials?
- Efficiency: How long does the algorithm take to find a high-accuracy model?
We evaluate our algorithm on the standard benchmark NATS-Bench using 100 NAS runs and compare it to other NAS algorithms previously described in the NATS-Bench paper: Random Search, Regularized Evolution, and Proximal Policy Optimization. Below, we visualize the differences between these search algorithms for the metrics described above. For each comparison, we report the mean accuracy and the variation in accuracy (variation is indicated by the shaded region corresponding to the 25% to 75% interquartile range).
The NATS-Bench size search defines a 5-layer CNN model, where each layer can choose from eight different options, each with a different number of channels in the convolutional layers. Our goal is to find the best model with 50% of the FLOPs required by the largest model. LayerNAS stands out because it frames the problem in a different way, separating cost and reward to avoid searching a significant number of irrelevant model architectures. We found that model candidates with fewer channels in the earlier layers tend to perform better, which explains how LayerNAS finds better models much faster than other algorithms: it avoids spending time on models outside the desired cost range. Note that the accuracy curve declines slightly due to the lack of correlation between validation accuracy and test accuracy, i.e., some model architectures with higher validation accuracy have lower test accuracy in the NATS-Bench size search.
We constructed search spaces based on MobileNetV2, MobileNetV2 1.4x, MobileNetV3 Small, and MobileNetV3 Large and searched for the optimal model architecture under various #MAdds (number of multiply-add operations per image) constraints. In all of these settings, LayerNAS finds models with better accuracy on ImageNet. See the paper for details.
Comparison of models under different #MAdds constraints.
Conclusion
In this post, we showed how to recast NAS as a combinatorial optimization problem and proposed LayerNAS as a solution that requires only polynomial search complexity. We compared LayerNAS with existing popular NAS algorithms and showed that it can find improved models on NATS-Bench. We also used the method to find better architectures based on MobileNetV2 and MobileNetV3.
Acknowledgments
We would like to thank Jingyue Shen, Keshav Kumar, Dai Pen, Mingxing Tan, Esteban Real, Peter Yang, Weijun Wang, Qifei Wang, Chuan Dong, Xin Wang, Yingjie Miao, Yun Long, Zhuo Wang, Da-Cheng Juan, Deqiang Chen, Fotis Iliopoulos, Han-Byul Kim, Rino Lee, Andrew Howard, Eric Wei, Rina Panigrahi, Ravi Kumar, and Andrew Tomkins for their contributions, collaboration, and advice.