Max-Affine Spline Insights Into Deep Network Pruning

¹Rice University, ²Meta AI Research

(a) Input-space partitioning shows how deeper layers successively subdivide the space in a toy DN with a 2-dimensional input space and three layers; newly introduced boundaries are drawn in dark and previously built ones in grey. The turning points of splines in later layers are located exactly on earlier ones, and the splines of the final classification layer are exactly the decision boundary (blue lines); (b) Node (structured) pruning removes entire subdivision splines; (c) Weight (unstructured) pruning quantizes the partition splines to be collinear with the input-space axes.


Abstract

State-of-the-art (SOTA) approaches to deep network (DN) training overparametrize the model and then prune a posteriori to obtain a “winning ticket” subnetwork that can achieve high accuracy. Using a recently developed spline interpretation of DNs, we obtain novel insights into how pruning affects a DN's mapping. In particular, viewing DNs as max-affine spline operators lets us pinpoint the impact of pruning on the DN's underlying input-space partition and per-region affine mappings, opening new avenues for understanding why and when pruned DNs are able to maintain high performance.

We also discover that a DN's spline mapping exhibits an early-bird (EB) phenomenon whereby the spline's partition converges at early training stages, bridging the recently developed DN spline theory and the lottery ticket hypothesis. We finally leverage this new insight to develop a principled and efficient pruning strategy that prunes isolated groups of nodes with redundant contributions to the forming of the spline partition. Extensive experiments on four networks and three datasets validate that our new spline-based DN pruning approach reduces training FLOPs by up to 3.5x while achieving similar or even better accuracy than current state-of-the-art methods.
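To make the spline view concrete: every ReLU unit splits the input space with a hyperplane, and the joint sign pattern of all pre-activations, a binary code, identifies which affine region an input falls into. Below is a minimal PyTorch sketch of this idea; the function name and toy network are our own illustration, not taken from the paper's code.

import torch
import torch.nn as nn

def region_code(model: nn.Sequential, x: torch.Tensor) -> torch.Tensor:
    # Concatenate the sign pattern of every linear layer's output.
    # Two inputs with identical codes lie in the same affine region
    # of the spline partition, where the DN is an exact affine map.
    codes, h = [], x
    for layer in model:
        h = layer(h)
        if isinstance(layer, nn.Linear):
            codes.append((h > 0).flatten(1))  # one bit per unit
    return torch.cat(codes, dim=1).to(torch.int8)

# Toy three-layer DN on a 2D input, as in the teaser figure; the last
# bit is the side of the final (classification) spline.
net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(),
                    nn.Linear(8, 8), nn.ReLU(),
                    nn.Linear(8, 1))
print(region_code(net, torch.randn(4, 2)).shape)  # torch.Size([4, 17])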



Q1: How Do Splines Evolve During Training?


An example using a two-layer network and a 2D input space with an X-shaped decision boundary.
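A sketch (our illustration, not the authors' code) of how such partition figures can be rendered: evaluate the sign pattern of a ReLU layer on a dense 2D grid and color each point by its binary code, so the subdivision lines appear exactly where the code changes.

import torch
import torch.nn as nn
import matplotlib.pyplot as plt

torch.manual_seed(0)
layer = nn.Linear(2, 6)              # six units -> six subdivision lines

xs = torch.linspace(-2, 2, 400)
grid = torch.cartesian_prod(xs, xs)  # dense grid over the 2D input space

code = (layer(grid) > 0).long()      # sign pattern per grid point
region_id = (code * 2 ** torch.arange(6)).sum(dim=1)  # integer region label

plt.scatter(grid[:, 0], grid[:, 1], c=region_id, s=1, cmap="tab20")
plt.title("Partition induced by one ReLU layer (6 units)")
plt.show()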



Q2: Can We Understand Network Pruning From a Spline Perspective?



Pruning indeed removes redundant subdivision lines: the decision boundary remains X-shaped until 80% of the nodes are pruned. Ideally, a single blue subdivision line would suffice to provide the two turning points needed by the decision boundary.
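A sketch of what node pruning does in this picture, continuing the snippet above (the helper name is ours): removing a unit deletes its hyperplane, and hence its subdivision line, while the rest of the partition is untouched.

import torch
import torch.nn as nn

def prune_unit(layer: nn.Linear, unit: int) -> None:
    # Structured pruning of one unit: zero its incoming weights and
    # push its bias negative so the ReLU is inactive everywhere, which
    # erases the unit's subdivision line from the input-space partition.
    with torch.no_grad():
        layer.weight[unit].zero_()
        layer.bias[unit] = -1.0

layer = nn.Linear(2, 6)
prune_unit(layer, 3)
# Re-running the coloring sketch above now shows one fewer boundary line.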



Q3: Can We Detect Spline Early-Bird Tickets?




Visualization of spline trajectories, which adapt mainly during the early phase of training, demonstrating the early-bird ticket hypothesis for DN partitions.




Visualization of the early-bird (EB) phenomenon. Each sub-figure visualizes the quantitative distance over the whole training process: both the x- and y-axes index the epochs at which we draw the binary codes representing the DN input-space partition, and each point shows the distance between the binary codes drawn at the x-th and y-th epochs. The distances between consecutive epochs change rapidly during the first few training epochs (dashed red box) and remain similar afterwards; we draw Spline EB tickets at that epoch.
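A hedged sketch of how such a distance matrix and the EB drawing rule can be computed. Here codes[t] is assumed to hold the binary partition code of a fixed probe batch at epoch t (e.g., from the region_code sketch above); the threshold and window values are our placeholders, not the paper's exact settings.

import torch

def hamming(a: torch.Tensor, b: torch.Tensor) -> float:
    # Normalized Hamming distance between two binary partition codes.
    return (a != b).float().mean().item()

def distance_matrix(codes):
    # Entry (i, j) is the distance between the codes drawn at epochs
    # i and j -- the quantity visualized in each sub-figure.
    n = len(codes)
    return torch.tensor([[hamming(codes[i], codes[j]) for j in range(n)]
                         for i in range(n)])

def eb_epoch(codes, threshold=0.02, window=3):
    # Draw the Spline EB ticket at the first epoch after which `window`
    # consecutive epoch-to-epoch distances all stay below `threshold`.
    dists = [hamming(codes[t], codes[t + 1]) for t in range(len(codes) - 1)]
    for t in range(len(dists) - window + 1):
        if all(d < threshold for d in dists[t:t + window]):
            return t + window
    return None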



Q4: What Is the Spline Pruning Policy?




The idea is to merge and remove redundant splines that contribute little to the decision boundary. Left: a small DN input-space partition. Middle: the pruning criterion. Right: the pruned input-space partition produced by that criterion.
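As an illustration of such a criterion (our simplified sketch of the idea, not the paper's exact policy): view each unit's hyperplane as a normalized [weight | bias] vector, and mark pairs whose vectors nearly coincide as redundant, since they draw almost the same subdivision spline and one of each pair can be merged and removed.

import torch
import torch.nn as nn

def redundant_pairs(layer: nn.Linear, tol: float = 0.99):
    # Units whose normalized hyperplanes nearly coincide draw almost
    # the same subdivision spline, so one of each pair is redundant.
    w = torch.cat([layer.weight, layer.bias[:, None]], dim=1)
    w = w / w.norm(dim=1, keepdim=True)
    sim = (w @ w.T).abs()  # |cosine similarity| between hyperplanes
    return [(i, j) for i in range(len(w)) for j in range(i + 1, len(w))
            if sim[i, j] > tol]

layer = nn.Linear(2, 16)
print(redundant_pairs(layer))  # candidate unit pairs to merge and remove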


BibTeX

@article{you2022maxaffine,
    title={Max-Affine Spline Insights Into Deep Network Pruning},
    author={Haoran You and Randall Balestriero and Zhihan Lu and Yutong Kou and Huihong Shi and Shunyao Zhang and Shang Wu and Yingyan Lin and Richard Baraniuk},
    journal={Transactions on Machine Learning Research},
    year={2022},
    url={https://openreview.net/forum?id=bMar2OkxVu},
}