
Commit 330faf3

small update with repetition
1 parent 7019acc commit 330faf3

File tree

8 files changed, +735 -161 lines changed


doc/Projects/.DS_Store

0 Bytes
Binary file not shown.

doc/pub/week3/html/week3-bs.html

Lines changed: 97 additions & 0 deletions
@@ -46,6 +46,20 @@
 None,
 'reminder-on-books-with-hands-on-material-and-codes'),
 ('Reading recommendations', 2, None, 'reading-recommendations'),
+('From last week: Overarching view of a neural network',
+ 2,
+ None,
+ 'from-last-week-overarching-view-of-a-neural-network'),
+('The optimization problem', 2, None, 'the-optimization-problem'),
+('Parameters of neural networks',
+ 2,
+ None,
+ 'parameters-of-neural-networks'),
+('Other ingredients of a neural network',
+ 2,
+ None,
+ 'other-ingredients-of-a-neural-network'),
+('Other parameters', 2, None, 'other-parameters'),
 ('From last week, overarching discussions of neural networks: '
 'Fine-tuning neural network hyperparameters',
 2,
@@ -249,6 +263,11 @@
 <!-- navigation toc: --> <li><a href="#mathematics-of-deep-learning" style="font-size: 80%;"><b>Mathematics of deep learning</b></a></li>
 <!-- navigation toc: --> <li><a href="#reminder-on-books-with-hands-on-material-and-codes" style="font-size: 80%;"><b>Reminder on books with hands-on material and codes</b></a></li>
 <!-- navigation toc: --> <li><a href="#reading-recommendations" style="font-size: 80%;"><b>Reading recommendations</b></a></li>
+<!-- navigation toc: --> <li><a href="#from-last-week-overarching-view-of-a-neural-network" style="font-size: 80%;"><b>From last week: Overarching view of a neural network</b></a></li>
+<!-- navigation toc: --> <li><a href="#the-optimization-problem" style="font-size: 80%;"><b>The optimization problem</b></a></li>
+<!-- navigation toc: --> <li><a href="#parameters-of-neural-networks" style="font-size: 80%;"><b>Parameters of neural networks</b></a></li>
+<!-- navigation toc: --> <li><a href="#other-ingredients-of-a-neural-network" style="font-size: 80%;"><b>Other ingredients of a neural network</b></a></li>
+<!-- navigation toc: --> <li><a href="#other-parameters" style="font-size: 80%;"><b>Other parameters</b></a></li>
 <!-- navigation toc: --> <li><a href="#from-last-week-overarching-discussions-of-neural-networks-fine-tuning-neural-network-hyperparameters" style="font-size: 80%;"><b>From last week, overarching discussions of neural networks: Fine-tuning neural network hyperparameters</b></a></li>
 <!-- navigation toc: --> <li><a href="#hidden-layers" style="font-size: 80%;"><b>Hidden layers</b></a></li>
 <!-- navigation toc: --> <li><a href="#which-activation-function-should-i-use" style="font-size: 80%;"><b>Which activation function should I use?</b></a></li>
@@ -401,6 +420,84 @@ <h2 id="reading-recommendations" class="anchor">Reading recommendations </h2>
 <li> Raschka et al., chapters 11-13 for NNs and chapter 14 for CNNs, jupyter-notebook sent separately, from <a href="https://github.com/rasbt/machine-learning-book" target="_self">GitHub</a></li>
 <li> Goodfellow et al., chapters 6 and 7 contain most of the neural network background. For CNNs see chapter 9.</li>
 </ol>
+<!-- !split -->
+<h2 id="from-last-week-overarching-view-of-a-neural-network" class="anchor">From last week: Overarching view of a neural network </h2>
+
+<p>The architecture of a neural network defines our model. This model
+aims at describing some function \( f(\boldsymbol{x}) \) that produces
+some final result (outputs or target values \( \boldsymbol{y} \)) given a specific input
+\( \boldsymbol{x} \). Note that here \( \boldsymbol{y} \) and \( \boldsymbol{x} \) are not limited to being
+vectors.
+</p>
+
+<p>The architecture consists of</p>
+<ol>
+<li> An input and an output layer, where the input layer is defined by the inputs \( \boldsymbol{x} \). The output layer produces the model output \( \boldsymbol{\tilde{y}} \), which is compared with the target value \( \boldsymbol{y} \).</li>
+<li> A given number of hidden layers and neurons/nodes/units for each layer (this may vary).</li>
+<li> A given activation function \( \sigma(\boldsymbol{z}) \) with arguments \( \boldsymbol{z} \) to be defined below. The activation functions may differ from layer to layer.</li>
+<li> The last layer, normally called the <b>output</b> layer, has an activation function tailored to the specific problem.</li>
+<li> Finally, we define a so-called cost or loss function which is used to gauge the quality of our model.</li>
</ol>
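
To make the list above concrete, here is a minimal NumPy sketch of such an architecture; the layer sizes, the sigmoid activation in the hidden layers and the linear output activation are illustrative assumptions, not choices made in the lecture notes.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# input layer with 3 features, two hidden layers, one output node
layer_sizes = [3, 5, 4, 1]

# weights W^{(l)} and biases b^{(l)} for each layer l
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

def forward(x):
    """Propagate the inputs x through all layers and return the model output."""
    a = x
    for l, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b
        # sigmoid in the hidden layers; linear activation in the output layer
        a = sigmoid(z) if l < len(weights) - 1 else z
    return a

x = rng.normal(size=(10, 3))   # 10 samples with 3 inputs each
y_tilde = forward(x)           # model output, shape (10, 1)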
+<!-- !split -->
+<h2 id="the-optimization-problem" class="anchor">The optimization problem </h2>
+
+<p>The cost function is a function of the unknown parameters
+\( \boldsymbol{\Theta} \), where the latter is a container for all possible
+parameters needed to define a neural network.
+</p>
+
+<p>If we are dealing with a regression task, a typical cost/loss function
+is the mean squared error
+</p>
+$$
+C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}.
+$$
+
+<p>This function represents one of many possible ways to define
+the so-called cost function. Note that here we have assumed a linear dependence in terms of the parameters \( \boldsymbol{\Theta} \). This is in general not the case.
+</p>
+
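
As an illustration of the mean squared error above, the cost can be evaluated directly with NumPy; the design matrix X, the targets y and the parameter vector theta below are synthetic quantities invented for this sketch, which assumes the linear model of the formula.

import numpy as np

rng = np.random.default_rng(0)

n = 100                                    # number of data points
X = rng.normal(size=(n, 3))                # design matrix (linear model assumed here)
theta = np.array([1.0, -2.0, 0.5])         # parameters Theta in the linear case
y = X @ theta + 0.1 * rng.normal(size=n)   # synthetic targets

residual = y - X @ theta
cost = (residual @ residual) / n           # C(Theta) = (1/n)(y - X theta)^T (y - X theta)
print(cost)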
+<!-- !split -->
+<h2 id="parameters-of-neural-networks" class="anchor">Parameters of neural networks </h2>
+<p>For neural networks the parameters
+\( \boldsymbol{\Theta} \) are given by the so-called weights and biases (to be
+defined below).
+</p>
+
+<p>The weights are given by matrix elements \( w_{ij}^{(l)} \) where the
+superscript indicates the layer number. The biases are typically given
+by vector elements representing each single node of a given layer,
+that is \( b_j^{(l)} \).
+</p>
+
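
A small sketch of how the weights \( w_{ij}^{(l)} \) and biases \( b_j^{(l)} \) translate into a total parameter count; the layer sizes are the same hypothetical ones used in the architecture sketch above.

# number of nodes per layer: input, two hidden layers, output (example values)
layer_sizes = [3, 5, 4, 1]

# each layer l contributes a weight matrix w^{(l)} of shape (n_in, n_out)
# and a bias vector b^{(l)} with one element per node in that layer
n_parameters = sum(n_in * n_out + n_out
                   for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]))
print(n_parameters)   # 3*5+5 + 5*4+4 + 4*1+1 = 49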
+<!-- !split -->
+<h2 id="other-ingredients-of-a-neural-network" class="anchor">Other ingredients of a neural network </h2>
+
+<p>Having defined the architecture of a neural network, the optimization
+of the cost function with respect to the parameters \( \boldsymbol{\Theta} \)
+involves the calculation of gradients. The
+gradients represent the derivatives of a multidimensional object, and the
+optimization is often carried out with various gradient methods, including
+</p>
+<ol>
+<li> various quasi-Newton methods,</li>
+<li> plain gradient descent (GD) with a constant learning rate \( \eta \),</li>
+<li> GD with momentum and other approximations to the learning rates, such as</li>
+<ul>
+<li> Adaptive gradient (ADAgrad)</li>
+<li> Root mean-square propagation (RMSprop)</li>
+<li> Adaptive gradient with momentum (ADAM) and many others</li>
+</ul>
+<li> Stochastic gradient descent and various families of learning rate approximations</li>
+</ol>
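
As one example from the list above, here is a minimal sketch of gradient descent with momentum applied to the linear-regression cost from the optimization section; the learning rate \( \eta \), the momentum parameter and the synthetic data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)
n = 100
X = rng.normal(size=(n, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)

eta = 0.1        # constant learning rate
gamma = 0.9      # momentum parameter
theta = np.zeros(3)
velocity = np.zeros(3)

for epoch in range(200):
    # gradient of C(theta) = (1/n) (y - X theta)^T (y - X theta) with respect to theta
    gradient = -2.0 / n * X.T @ (y - X @ theta)
    velocity = gamma * velocity + eta * gradient
    theta -= velocity

print(theta)     # should approach [1.0, -2.0, 0.5]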
+<!-- !split -->
+<h2 id="other-parameters" class="anchor">Other parameters </h2>
+
+<p>In addition to the above, there are often additional hyperparameters
+which are included in the setup of a neural network. These will be
+discussed below.
+</p>
+
 <!-- !split -->
 <h2 id="from-last-week-overarching-discussions-of-neural-networks-fine-tuning-neural-network-hyperparameters" class="anchor">From last week, overarching discussions of neural networks: Fine-tuning neural network hyperparameters </h2>

doc/pub/week3/html/week3-reveal.html

Lines changed: 91 additions & 0 deletions
@@ -247,6 +247,97 @@ <h2 id="reading-recommendations">Reading recommendations </h2>
 </ol>
 </section>

+<section>
+<h2 id="from-last-week-overarching-view-of-a-neural-network">From last week: Overarching view of a neural network </h2>
+
+<p>The architecture of a neural network defines our model. This model
+aims at describing some function \( f(\boldsymbol{x}) \) that produces
+some final result (outputs or target values \( \boldsymbol{y} \)) given a specific input
+\( \boldsymbol{x} \). Note that here \( \boldsymbol{y} \) and \( \boldsymbol{x} \) are not limited to being
+vectors.
+</p>
+
+<p>The architecture consists of</p>
+<ol>
+<p><li> An input and an output layer, where the input layer is defined by the inputs \( \boldsymbol{x} \). The output layer produces the model output \( \boldsymbol{\tilde{y}} \), which is compared with the target value \( \boldsymbol{y} \).</li>
+<p><li> A given number of hidden layers and neurons/nodes/units for each layer (this may vary).</li>
+<p><li> A given activation function \( \sigma(\boldsymbol{z}) \) with arguments \( \boldsymbol{z} \) to be defined below. The activation functions may differ from layer to layer.</li>
+<p><li> The last layer, normally called the <b>output</b> layer, has an activation function tailored to the specific problem.</li>
+<p><li> Finally, we define a so-called cost or loss function which is used to gauge the quality of our model.</li>
+</ol>
+</section>
+
+<section>
+<h2 id="the-optimization-problem">The optimization problem </h2>
+
+<p>The cost function is a function of the unknown parameters
+\( \boldsymbol{\Theta} \), where the latter is a container for all possible
+parameters needed to define a neural network.
+</p>
+
+<p>If we are dealing with a regression task, a typical cost/loss function
+is the mean squared error
+</p>
+<p>&nbsp;<br>
+$$
+C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}.
+$$
+<p>&nbsp;<br>
+
+<p>This function represents one of many possible ways to define
+the so-called cost function. Note that here we have assumed a linear dependence in terms of the parameters \( \boldsymbol{\Theta} \). This is in general not the case.
+</p>
+</section>
+
+<section>
+<h2 id="parameters-of-neural-networks">Parameters of neural networks </h2>
+<p>For neural networks the parameters
+\( \boldsymbol{\Theta} \) are given by the so-called weights and biases (to be
+defined below).
+</p>
+
+<p>The weights are given by matrix elements \( w_{ij}^{(l)} \) where the
+superscript indicates the layer number. The biases are typically given
+by vector elements representing each single node of a given layer,
+that is \( b_j^{(l)} \).
+</p>
+</section>
+
+<section>
+<h2 id="other-ingredients-of-a-neural-network">Other ingredients of a neural network </h2>
+
+<p>Having defined the architecture of a neural network, the optimization
+of the cost function with respect to the parameters \( \boldsymbol{\Theta} \)
+involves the calculation of gradients. The
+gradients represent the derivatives of a multidimensional object, and the
+optimization is often carried out with various gradient methods, including
+</p>
+<ol>
+<p><li> various quasi-Newton methods,</li>
+<p><li> plain gradient descent (GD) with a constant learning rate \( \eta \),</li>
+<p><li> GD with momentum and other approximations to the learning rates, such as</li>
+<ul>
+
+<p><li> Adaptive gradient (ADAgrad)</li>
+
+<p><li> Root mean-square propagation (RMSprop)</li>
+
+<p><li> Adaptive gradient with momentum (ADAM) and many others</li>
+</ul>
+<p>
+<p><li> Stochastic gradient descent and various families of learning rate approximations</li>
+</ol>
+</section>
+
+<section>
+<h2 id="other-parameters">Other parameters </h2>
+
+<p>In addition to the above, there are often additional hyperparameters
+which are included in the setup of a neural network. These will be
+discussed below.
+</p>
+</section>
+
 <section>
 <h2 id="from-last-week-overarching-discussions-of-neural-networks-fine-tuning-neural-network-hyperparameters">From last week, overarching discussions of neural networks: Fine-tuning neural network hyperparameters </h2>

doc/pub/week3/html/week3-solarized.html

Lines changed: 92 additions & 0 deletions
@@ -73,6 +73,20 @@
 None,
 'reminder-on-books-with-hands-on-material-and-codes'),
 ('Reading recommendations', 2, None, 'reading-recommendations'),
+('From last week: Overarching view of a neural network',
+ 2,
+ None,
+ 'from-last-week-overarching-view-of-a-neural-network'),
+('The optimization problem', 2, None, 'the-optimization-problem'),
+('Parameters of neural networks',
+ 2,
+ None,
+ 'parameters-of-neural-networks'),
+('Other ingredients of a neural network',
+ 2,
+ None,
+ 'other-ingredients-of-a-neural-network'),
+('Other parameters', 2, None, 'other-parameters'),
 ('From last week, overarching discussions of neural networks: '
 'Fine-tuning neural network hyperparameters',
 2,
@@ -330,6 +344,84 @@ <h2 id="reading-recommendations">Reading recommendations </h2>
 <li> Raschka et al., chapters 11-13 for NNs and chapter 14 for CNNs, jupyter-notebook sent separately, from <a href="https://github.com/rasbt/machine-learning-book" target="_blank">GitHub</a></li>
 <li> Goodfellow et al., chapters 6 and 7 contain most of the neural network background. For CNNs see chapter 9.</li>
 </ol>
+<!-- !split --><br><br><br><br><br><br><br><br><br><br>
+<h2 id="from-last-week-overarching-view-of-a-neural-network">From last week: Overarching view of a neural network </h2>
+
+<p>The architecture of a neural network defines our model. This model
+aims at describing some function \( f(\boldsymbol{x}) \) that produces
+some final result (outputs or target values \( \boldsymbol{y} \)) given a specific input
+\( \boldsymbol{x} \). Note that here \( \boldsymbol{y} \) and \( \boldsymbol{x} \) are not limited to being
+vectors.
+</p>
+
+<p>The architecture consists of</p>
+<ol>
+<li> An input and an output layer, where the input layer is defined by the inputs \( \boldsymbol{x} \). The output layer produces the model output \( \boldsymbol{\tilde{y}} \), which is compared with the target value \( \boldsymbol{y} \).</li>
+<li> A given number of hidden layers and neurons/nodes/units for each layer (this may vary).</li>
+<li> A given activation function \( \sigma(\boldsymbol{z}) \) with arguments \( \boldsymbol{z} \) to be defined below. The activation functions may differ from layer to layer.</li>
+<li> The last layer, normally called the <b>output</b> layer, has an activation function tailored to the specific problem.</li>
+<li> Finally, we define a so-called cost or loss function which is used to gauge the quality of our model.</li>
+</ol>
+<!-- !split --><br><br><br><br><br><br><br><br><br><br>
+<h2 id="the-optimization-problem">The optimization problem </h2>
+
+<p>The cost function is a function of the unknown parameters
+\( \boldsymbol{\Theta} \), where the latter is a container for all possible
+parameters needed to define a neural network.
+</p>
+
+<p>If we are dealing with a regression task, a typical cost/loss function
+is the mean squared error
+</p>
+$$
+C(\boldsymbol{\Theta})=\frac{1}{n}\left\{\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)^T\left(\boldsymbol{y}-\boldsymbol{X}\boldsymbol{\theta}\right)\right\}.
+$$
+
+<p>This function represents one of many possible ways to define
+the so-called cost function. Note that here we have assumed a linear dependence in terms of the parameters \( \boldsymbol{\Theta} \). This is in general not the case.
+</p>
+
+<!-- !split --><br><br><br><br><br><br><br><br><br><br>
+<h2 id="parameters-of-neural-networks">Parameters of neural networks </h2>
+<p>For neural networks the parameters
+\( \boldsymbol{\Theta} \) are given by the so-called weights and biases (to be
+defined below).
+</p>
+
+<p>The weights are given by matrix elements \( w_{ij}^{(l)} \) where the
+superscript indicates the layer number. The biases are typically given
+by vector elements representing each single node of a given layer,
+that is \( b_j^{(l)} \).
+</p>
+
+<!-- !split --><br><br><br><br><br><br><br><br><br><br>
+<h2 id="other-ingredients-of-a-neural-network">Other ingredients of a neural network </h2>
+
+<p>Having defined the architecture of a neural network, the optimization
+of the cost function with respect to the parameters \( \boldsymbol{\Theta} \)
+involves the calculation of gradients. The
+gradients represent the derivatives of a multidimensional object, and the
+optimization is often carried out with various gradient methods, including
+</p>
+<ol>
+<li> various quasi-Newton methods,</li>
+<li> plain gradient descent (GD) with a constant learning rate \( \eta \),</li>
+<li> GD with momentum and other approximations to the learning rates, such as</li>
+<ul>
+<li> Adaptive gradient (ADAgrad)</li>
+<li> Root mean-square propagation (RMSprop)</li>
+<li> Adaptive gradient with momentum (ADAM) and many others</li>
+</ul>
+<li> Stochastic gradient descent and various families of learning rate approximations</li>
+</ol>
+<!-- !split --><br><br><br><br><br><br><br><br><br><br>
+<h2 id="other-parameters">Other parameters </h2>
+
+<p>In addition to the above, there are often additional hyperparameters
+which are included in the setup of a neural network. These will be
+discussed below.
+</p>
+
 <!-- !split --><br><br><br><br><br><br><br><br><br><br>
 <h2 id="from-last-week-overarching-discussions-of-neural-networks-fine-tuning-neural-network-hyperparameters">From last week, overarching discussions of neural networks: Fine-tuning neural network hyperparameters </h2>
