The optimization target for learning filter-wise and channel-wise structured sparsity can be defined as:

E(\mathbf{W}) = E_D(\mathbf{W}) + \lambda_n \sum_{l=1}^{L} \left( \sum_{n_l=1}^{N_l} \left\| \mathbf{W}^{(l)}_{n_l,:,:,:} \right\|_g \right) + \lambda_c \sum_{l=1}^{L} \left( \sum_{c_l=1}^{C_l} \left\| \mathbf{W}^{(l)}_{:,c_l,:,:} \right\|_g \right)

Here E_D(\mathbf{W}) is the data loss, \left\| \cdot \right\|_g is the group Lasso norm (the L2 norm over each group of weights), N_l and C_l are the numbers of filters and input channels in the l-th layer, and \lambda_n, \lambda_c control the strength of filter-wise and channel-wise regularization, respectively.
Our approach tends to remove less important filters and channels. Note that zeroing out a filter in the l-th layer produces a dummy all-zero output feature map, which in turn makes the corresponding input channel in the (l + 1)-th layer useless. Hence, we learn filter-wise and channel-wise structured sparsity simultaneously.
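The combined penalty can be sketched as follows: for a 4-D convolutional weight tensor, the filter-wise term sums the L2 norm of each filter (one group per output filter), and the channel-wise term sums the L2 norm of each input-channel slice. This is a minimal NumPy illustration, not the paper's implementation; the function name `ssl_penalty` and the regularization constants are hypothetical.

```python
import numpy as np

def ssl_penalty(W, lam_n, lam_c):
    """Combined filter-wise and channel-wise group-Lasso penalty.

    W      : conv weights of shape (N_l, C_l, k_h, k_w)
             (output filters x input channels x kernel height x width)
    lam_n  : weight on the filter-wise term (one group per filter)
    lam_c  : weight on the channel-wise term (one group per input channel)
    """
    # L2 norm of each filter: collapse the (channel, height, width) axes.
    filter_norms = np.sqrt((W ** 2).sum(axis=(1, 2, 3)))
    # L2 norm of each input-channel slice: collapse (filter, height, width).
    channel_norms = np.sqrt((W ** 2).sum(axis=(0, 2, 3)))
    return lam_n * filter_norms.sum() + lam_c * channel_norms.sum()

# During training this penalty would be added to the data loss E_D(W);
# groups driven to zero correspond to removable filters/channels.
W = np.random.randn(4, 3, 3, 3)
print(ssl_penalty(W, lam_n=0.1, lam_c=0.1))
```

Because the two group structures overlap on the same tensor, zeroing a filter in layer l both removes its filter-wise group and shrinks the matching channel-wise group in layer l + 1, which is why the two regularizers are learned together.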
Wei Wen et al., "Learning Structured Sparsity in Deep Neural Networks," NIPS 2016.