Commit 14a76bf

Merge pull request #30 from aperloff/bayes_nn

Bayesian Neural Network Documentation

2 parents 6292633 + 68ba211 commit 14a76bf

8 files changed: +305 -0 lines changed

Six image attachments added (625 KB, 208 KB, 836 KB, 441 KB, 85.4 KB, 482 KB).
content/training/BayesianNN.md

Lines changed: 304 additions & 0 deletions
@@ -0,0 +1,304 @@
# Bayesian Neural Network

Usually, a neural network is optimized to obtain fixed values for its weights and biases that allow the model to perform a specific task successfully. In a Bayesian neural network (BNN) the weights and biases are instead described by probability distributions rather than fixed values. Such a model can be viewed as an ensemble of many neural networks, trained using Bayesian inference.

Using a Bayesian approach for neural network training allows the analyzer to estimate the uncertainty of the predictions and makes the model's decisions more robust against variations in the input data.

### Difference between a standard NN and a BNN

![Placeholder](../images/training/BayesianNN/diff.png)

### Training of NN and BNN

=== "NN"
![Placeholder](../images/training/BayesianNN/trainingNN.png)
The parameters ![formula](https://render.githubusercontent.com/render/math?math=\theta) are optimized in order to minimize the loss function.

=== "BNN"
![Placeholder](../images/training/BayesianNN/bayesNN.png)
Instead of point estimates, the training learns probability distributions for the weights and biases that maximize the likelihood of the observed data/label pairs ![formula](https://render.githubusercontent.com/render/math?math=D(x,y)). The parameters of the weight distributions -- their means and standard deviations -- are the result of the loss function optimization.
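
In practice the exact posterior over the weights is intractable, so these distributions are commonly learned with variational inference (e.g. the Bayes-by-Backprop approach): a parametrized distribution q(w), typically a Gaussian with a learnable mean and standard deviation per weight, is fitted by minimizing a loss of the form

$$
\mathcal{L} \;=\; \mathrm{KL}\big(q(w)\,\|\,p(w)\big) \;-\; \mathbb{E}_{q(w)}\big[\log p(D \mid w)\big],
$$

whose optimum yields exactly the means and standard deviations mentioned above.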

#### Training Procedure

1. Introduce a prior distribution over the model parameters w
2. Compute the posterior p(w|D) using Bayes' rule
3. Average the predictions over the posterior distribution (see the formulas below)
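
In formulas, steps 2 and 3 read

$$
p(w \mid D) = \frac{p(D \mid w)\, p(w)}{p(D)}, \qquad
p(y \mid x, D) = \int p(y \mid x, w)\, p(w \mid D)\, \mathrm{d}w .
$$

In practice the integral is approximated by averaging the predictions obtained from a finite number of weight samples drawn from the (approximate) posterior.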

### Prediction of NN and BNN

=== "NN"
![Placeholder](../images/training/BayesianNN/PredictionNN.png)

=== "BNN"
![Placeholder](../images/training/BayesianNN/PredictionBNN.png)
37+
38+
### Uncertainty
39+
40+
There are two types of BNN uncertainties:
41+
42+
=== "Alletonic"
43+
Alletonic - uncertainties due to the lack of knowledge, comes from data or enviroment
44+
![formula](https://render.githubusercontent.com/render/math?math=p (\theta|D) )
45+
=== "Epistemic"
46+
Epistemic - uncertainties of the model parameter
47+
![formula](https://render.githubusercontent.com/render/math?math=p(y|x,\theta))
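
A common way to propagate these uncertainties to a prediction is to repeat the forward pass many times and look at the spread of the outputs. The sketch below assumes a generic `model` that returns a plain prediction array while internally drawing a fresh set of weights from the learned posterior on every call (as, for example, a network built with `tfp.layers.DenseVariational` hidden layers would); `x_test` is a placeholder for the input data.

```python linenums="1"
import numpy as np

n_samples = 100  # number of posterior weight samples

# Each call re-samples the weights, so repeated calls give different outputs.
predictions = np.stack([np.asarray(model(x_test)) for _ in range(n_samples)])

mean_prediction = predictions.mean(axis=0)  # averaged (posterior-mean) prediction
epistemic_std = predictions.std(axis=0)     # spread due to weight uncertainty
```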

## Packages

Here we list a few of the machine learning packages that can be used to develop a probabilistic neural network.

=== "Tensorflow"
```bash
pip install --upgrade tensorflow-probability
```

=== "Pyro"
```bash
pip install pyro-ppl
```

## Module Descriptions

### Distribution and sampling

=== "Tensorflow"

=== "Pyro"
## Example

Let's consider simple linear regression as an example and compare it to its Bayesian analog.

We take a simple dataset D(x, y) and fit a linear function
y = ax + b + e, where a and b are learnable parameters and e is observation noise.

=== "Synthetic dataset"
```python linenums="1"
import numpy as np

# True parameters of the synthetic linear relation.
w0 = 0.125
b0 = 5.
x_range = [-20, 60]

def load_dataset(n=150, n_tst=150):
    np.random.seed(43)
    def s(x):
        # Heteroscedastic noise scale that grows with x.
        g = (x - x_range[0]) / (x_range[1] - x_range[0])
        return 3 * (0.25 + g**2.)
    x = (x_range[1] - x_range[0]) * np.random.rand(n) + x_range[0]
    eps = np.random.randn(n) * s(x)
    y = (w0 * x * (1. + np.sin(x)) + b0) + eps
    x = x[..., np.newaxis]
    x_tst = np.linspace(*x_range, num=n_tst).astype(np.float32)
    x_tst = x_tst[..., np.newaxis]
    return y, x, x_tst

y, x, x_tst = load_dataset()
```

=== "tensorflow_probability"

Suppose we model the data with a small Keras model whose last layer outputs a distribution rather than a point estimate:
```python linenums="1"
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

# Build model: a Dense layer followed by a layer that returns a Normal distribution.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1),
    tfp.layers.DistributionLambda(lambda t: tfd.Normal(loc=t, scale=1)),
])

# Define the loss: the negative log-likelihood of y under the predicted distribution.
negloglik = lambda y, rv_y: -rv_y.log_prob(y)

# Do inference.
model.compile(optimizer=tf.optimizers.Adam(learning_rate=0.05), loss=negloglik)
model.fit(x, y, epochs=500, verbose=False)

# Make predictions: the output is a distribution object.
yhat = model(x_tst)
```
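
Since `yhat` is a `tfd.Distribution`, the point prediction and its spread can be read off directly:

```python linenums="1"
mean = yhat.mean()    # point prediction
std = yhat.stddev()   # predictive standard deviation
```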

=== "pyro"

A sketch of the same regression in Pyro, where the posterior over the weights is approximated with stochastic variational inference (SVI):
```python linenums="1"
import torch
import torch.nn as nn

import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample
from pyro.optim import Adam
from pyro.infer import SVI, Trace_ELBO, Predictive
from pyro.infer.autoguide import AutoDiagonalNormal

# Specify model: a linear layer whose weight and bias are random variables.
class BayesianRegression(PyroModule):
    def __init__(self, in_features, out_features):
        super().__init__()
        self.linear = PyroModule[nn.Linear](in_features, out_features)
        self.linear.weight = PyroSample(dist.Normal(0., 1.).expand([out_features, in_features]).to_event(2))
        self.linear.bias = PyroSample(dist.Normal(0., 10.).expand([out_features]).to_event(1))

    def forward(self, x, y=None):
        sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
        mean = self.linear(x).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", dist.Normal(mean, sigma), obs=y)
        return mean


# Build model and a variational guide for it.
model = BayesianRegression(1, 1)
guide = AutoDiagonalNormal(model)

# Fit model given data with stochastic variational inference (SVI).
x_t = torch.tensor(x, dtype=torch.float)
y_t = torch.tensor(y, dtype=torch.float)
x_tst_t = torch.tensor(x_tst, dtype=torch.float)

pyro.clear_param_store()
svi = SVI(model, guide, Adam({"lr": 0.03}), loss=Trace_ELBO())
for step in range(1000):
    svi.step(x_t, y_t)

# Make predictions by sampling from the posterior predictive distribution.
predictive = Predictive(model, guide=guide, num_samples=500)
samples = predictive(x_tst_t)
yhat_mean = samples["obs"].mean(0)
yhat_std = samples["obs"].std(0)
```

The output of the model:

![Placeholder](../images/training/BayesianNN/lr.png)

## Variational Autoencoder

Generative models can be built using a Bayesian neural network. The variational autoencoder (VAE) is one popular way to form a generative model.

Let's consider the example of generating images.

The generating process consists of two steps:

1. Sample the latent variable z from the prior distribution ![formula](https://render.githubusercontent.com/render/math?math=p(z))
2. Draw the sample x from the stochastic process ![formula](https://render.githubusercontent.com/render/math?math=p(x|z))

The model therefore has three components:

![formula](https://render.githubusercontent.com/render/math?math=p(z)), the prior on the latent representation ![formula](https://render.githubusercontent.com/render/math?math=z),
![formula](https://render.githubusercontent.com/render/math?math=q(z|x)), the variational encoder, and
![formula](https://render.githubusercontent.com/render/math?math=p(x|z)), the decoder — how likely is the image x given the latent representation z.

### Loss

Once the generation process is defined, an objective function must be chosen for the optimization. To train the network we maximize the evidence lower bound (ELBO) objective.
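
For a single data point x the ELBO takes the standard form

$$
\mathrm{ELBO}(x) \;=\; \mathbb{E}_{q(z|x)}\big[\log p(x \mid z)\big] \;-\; \mathrm{KL}\big(q(z \mid x)\,\|\,p(z)\big),
$$

i.e. a reconstruction term minus a regularization term that keeps the variational encoder close to the prior.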

### Prior

The prior ![formula](https://render.githubusercontent.com/render/math?math=p(z)) on the latent representation z is typically chosen to be a simple distribution, e.g. a standard normal.

### Encoder and Decoder

=== "tensorflow"
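A minimal sketch of a dense encoder/decoder pair built with `tfp.layers.DistributionLambda` (the same layer used in the regression example above); the latent size, layer widths, and the flattened 28x28 input are illustrative assumptions rather than recommended values:

```python linenums="1"
import tensorflow as tf
import tensorflow_probability as tfp
tfd = tfp.distributions

latent_dim = 16      # size of the latent representation z (illustrative)
data_dim = 28 * 28   # e.g. flattened image pixels (illustrative)

# Encoder: maps x to the variational distribution q(z|x).
encoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(data_dim,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(2 * latent_dim),
    tfp.layers.DistributionLambda(
        lambda t: tfd.Independent(
            tfd.Normal(loc=t[..., :latent_dim],
                       scale=1e-3 + tf.nn.softplus(t[..., latent_dim:])),
            reinterpreted_batch_ndims=1)),
])

# Decoder: maps z to the likelihood p(x|z).
decoder = tf.keras.Sequential([
    tf.keras.layers.InputLayer(input_shape=(latent_dim,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(data_dim),
    tfp.layers.DistributionLambda(
        lambda t: tfd.Independent(tfd.Bernoulli(logits=t),
                                  reinterpreted_batch_ndims=1)),
])
```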

=== "pyro"
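A corresponding sketch in PyTorch, structured like the encoder/decoder modules of the Pyro VAE example linked in the resources; the layer sizes are again illustrative:

```python linenums="1"
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Maps x to the parameters (loc, scale) of q(z|x)."""
    def __init__(self, z_dim=16, hidden_dim=256, data_dim=28 * 28):
        super().__init__()
        self.fc1 = nn.Linear(data_dim, hidden_dim)
        self.fc_loc = nn.Linear(hidden_dim, z_dim)
        self.fc_scale = nn.Linear(hidden_dim, z_dim)

    def forward(self, x):
        hidden = torch.relu(self.fc1(x))
        z_loc = self.fc_loc(hidden)
        z_scale = torch.exp(self.fc_scale(hidden))
        return z_loc, z_scale

class Decoder(nn.Module):
    """Maps z to the Bernoulli logits of p(x|z)."""
    def __init__(self, z_dim=16, hidden_dim=256, data_dim=28 * 28):
        super().__init__()
        self.fc1 = nn.Linear(z_dim, hidden_dim)
        self.fc2 = nn.Linear(hidden_dim, data_dim)

    def forward(self, z):
        hidden = torch.relu(self.fc1(z))
        return self.fc2(hidden)
```
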

### Training

=== "tensorflow"
```python linenums="1"
```

=== "pyro"
```python linenums="1"
```

### Results

=== "tensorflow"
```python linenums="1"
```

=== "pyro"
```python linenums="1"
```

## Normalizing Flows

### Definition

=== "tensorflow"
```python linenums="1"
```

=== "pyro"
```python linenums="1"
```

### Training

=== "tensorflow"
```python linenums="1"
```

=== "pyro"
```python linenums="1"
```

### Inference

=== "tensorflow"
```python linenums="1"
```

=== "pyro"
```python linenums="1"
```

## Resources

### Bayesian NN

1. https://arxiv.org/pdf/2007.06823.pdf
2. http://krasserm.github.io/2019/03/14/bayesian-neural-networks/
3. https://arxiv.org/pdf/1807.02811.pdf

### Normalizing Flows

1. https://arxiv.org/abs/1908.09257
2. https://arxiv.org/pdf/1505.05770.pdf

### Variational AutoEncoder

1. https://arxiv.org/abs/1312.6114
2. https://pyro.ai/examples/vae.html
3. https://www.tensorflow.org/probability/examples/Probabilistic_Layers_VAE

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -164,6 +164,7 @@ nav:
- Successful integrations:
  - ParticleNet: inference/particlenet.md
- Training:
+ - Bayesian Neural Network: training/BayesianNN.md
  - Decorrelation: training/Decorrelation.md
- Training as a Service:
  - MLaaS4HEP: training/MLaaS4HEP.md
