
DROPOUT

cuizixin · 2018.10.26 15:56
  • Dropout is an important technique for regularization.

Procedure


Imagine that you have one layer that connects to another layer.
The values that go from one layer to the next are often called activations.
For every example you train your network on, take those activations and randomly set half of them to 0.


Completely and randomly, you basically take half of the data that's flowing through your network and just destroy it.
And then you do it again, with a different random half each time.
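
As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The function name, the seeded generator, and the 0.5 keep probability are assumptions for illustration, not from the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout_forward(activations, keep_prob=0.5):
    """Randomly zero activations; keep_prob=0.5 matches "set half of them to 0"."""
    # A fresh random mask is drawn on every call, i.e. for every training example.
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask

# Toy usage: a batch of 4 examples with 6 activations each.
a = np.ones((4, 6))
print(dropout_forward(a))  # roughly half the entries are zeroed, differently per row
```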


So what happens with dropout?


Your network can never rely on any given activation to be present, because it might be squashed at any given moment.
So it is forced to learn a redundant representation for everything, to make sure that at least some of the information remains. It's like a game of whack-a-mole: one activation gets smashed, but there are always one or more that do the same job and that don't get killed, so everything remains fine in the end.


Forcing your network to learn redundant representations might sound very inefficient.
But in practice, it makes things more robust and prevents overfitting.
It also makes your network act as if it were taking the consensus over an ensemble of networks, which is always a good way to improve performance.

If dropout doesn't work for you, you should probably be using a bigger network.

How do you do evaluation when using dropout?

Couldn't you just set keep_prob to 1 at evaluation time? Why do you need to take the average?



When you evaluate a network that's been trained with dropout, you obviously don't want this randomness; you want your evaluation result to be deterministic.
Instead, you're going to want to take the consensus over these redundant models. You get the consensus opinion by averaging the activations, that is, Y_e = E(Y_t).
Here is a trick to make sure this expectation holds. During training, not only do you zero out the activations that you drop out, but you also scale the remaining activations by a factor of 2.
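
This scaling trick (often called inverted dropout) is also why simply setting keep_prob to 1 at evaluation time works: the scaled training-time output already has the right expectation. Here is a minimal NumPy sketch of the idea; the function and variable names are illustrative, not from the original notes:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(activations, keep_prob=0.5, train=True):
    if not train:
        return activations  # evaluation: no masking, fully deterministic
    # Inverted dropout: zero out a random subset, then scale the survivors
    # by 1/keep_prob (a factor of 2 when keep_prob = 0.5) so that the
    # expected value matches the evaluation-time output: Y_e = E(Y_t).
    mask = rng.random(activations.shape) < keep_prob
    return activations * mask / keep_prob

a = np.full((5,), 3.0)
# Averaging many stochastic training-time passes approximates the
# deterministic evaluation-time pass (the "consensus" of the ensemble).
avg = np.mean([dropout(a) for _ in range(10000)], axis=0)
print(avg)                      # each entry is close to 3.0
print(dropout(a, train=False))  # exactly [3. 3. 3. 3. 3.]
```

This is also what TensorFlow 1.x's tf.nn.dropout(x, keep_prob) does: it scales the kept activations by 1/keep_prob during training, so passing keep_prob=1 at evaluation time already gives the averaged consensus.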
