author     Soumith Chintala <soumith@fb.com>    2016-02-04 20:45:22 +0300
committer  Soumith Chintala <soumith@fb.com>    2016-02-04 20:45:29 +0300
commit     47cbf26827dacf13d05d187931fe624d2a36afc8 (patch)
tree       f789be3d9e02178da6b12f2d246f9d57fa0e6a3d /blog
parent     0eca342bd10b7b7e9f06c17a91dad116c760809d (diff)
formatting
Diffstat (limited to 'blog')
-rw-r--r--  blog/_posts/2016-02-04-resnets.md  2
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/blog/_posts/2016-02-04-resnets.md b/blog/_posts/2016-02-04-resnets.md
index 1747fc8..0fe8c38 100644
--- a/blog/_posts/2016-02-04-resnets.md
+++ b/blog/_posts/2016-02-04-resnets.md
@@ -24,7 +24,7 @@ The central idea of the paper itself is simple and elegant. They take a standard
 An example residual block is shown in the figure below.
 
-<p align='center'><img width="100%" src="https://raw.githubusercontent.com/torch/torch.github.io/master/blog/_posts/images/resnets_1.png"></p>
+<p align='center'><img width="30%" src="https://raw.githubusercontent.com/torch/torch.github.io/master/blog/_posts/images/resnets_1.png"></p>
 
 Deep feed-forward conv nets tend to suffer from optimization difficulty. Beyond a certain depth, adding extra layers results in higher training error and higher validation error, even when batch normalization is used. The authors of the ResNet paper argue that this underfitting is unlikely to be caused by vanishing gradients, since this difficulty occurs even with batch normalized networks. The residual network architecture solves this by adding shortcut connections that are summed with the output of the convolution layers.
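The patch itself only resizes the figure, but the shortcut-connection idea described in the quoted excerpt can be sketched as follows. This is a minimal single-channel illustration in numpy (the blog post's own code targets Torch/Lua; `conv3x3` and `residual_block` here are hypothetical helpers, not functions from the post):

```python
import numpy as np

def conv3x3(x, w):
    """Naive 'same'-padded 3x3 convolution over one 2D feature map."""
    h, wd = x.shape
    padded = np.pad(x, 1)
    out = np.zeros_like(x)
    for i in range(h):
        for j in range(wd):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * w)
    return out

def residual_block(x, w1, w2):
    """Two convolutions, with the input added back (the shortcut) before the final ReLU."""
    out = np.maximum(conv3x3(x, w1), 0)   # conv -> ReLU
    out = conv3x3(out, w2)                # second conv
    return np.maximum(out + x, 0)         # shortcut sum, then ReLU

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
w1, w2 = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
y = residual_block(x, w1, w2)
print(y.shape)  # (8, 8) -- same spatial size as the input, so the sum is well-defined
```

Because the block's output has the same shape as its input, the identity shortcut needs no extra parameters, which is what makes stacking many such blocks cheap.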