## Network Types

- Recurrent neural network - a network with cyclic connections, so it can model temporal/sequential data
- Feed-forward neural network - a network with no cyclic connections; information flows from input to output

## Activation Functions

### Logistic - squashes any real number into the range (0, 1)

https://www.coursera.org/learn/neural-networks/lecture/wfTkN/learning-the-weights-of-a-logistic-output-neuron-4-min

### Softmax - converts a vector of real numbers into a discrete categorical distribution

https://www.coursera.org/learn/neural-networks/lecture/68Koq/another-diversion-the-softmax-output-function-7-min
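A minimal Python sketch of softmax (the function and variable names here are mine, not from the course):

```python
import math

def softmax(logits):
    """Map a list of real-valued scores to a categorical distribution."""
    m = max(logits)                          # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])             # outputs are positive and sum to 1
```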

## Avoiding Overfitting / Improving Generalisation

https://www.coursera.org/learn/neural-networks/home/week/9

- Get more data
- Weight decay - penalise large weights with an L2 (squared) or L1 (absolute value) penalty
- Limit capacity - e.g. fewer hidden units/layers
- Early stopping - stop training when validation error starts to rise
- Add noise to the inputs or weights
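As an illustration of L2 weight decay, here is a sketch of a single gradient step (the function name and hyperparameter values are my own, chosen for the example):

```python
def sgd_step_l2(weights, grads, lr=0.1, decay=0.01):
    """One SGD step with L2 weight decay: the decay term shrinks every weight toward zero."""
    return [w - lr * (g + decay * w) for w, g in zip(weights, grads)]

# with zero gradient, the penalty alone pulls the weights toward zero
w = sgd_step_l2([1.0, -2.0], [0.0, 0.0])
```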

## Hopfield Nets

https://www.coursera.org/learn/neural-networks/home/week/11

- One of the simplest kinds of energy-based models (models with a global energy function)
- Used to store memories as distributed patterns of activity
- Composed of binary threshold units (units whose state is one of two values - usually 0 and 1, or -1 and 1)

### Energy Function

- The global energy function is the sum of many *contributions*
- Each contribution comes from one connection weight and the binary states of two neurons
- Low energy is good!

`E = - sum_(i) s_i b_i - sum_(i<j) s_i s_j w_(ij)`

- w_ij - symmetric connection strength between neurons i and j
- b_i - bias term for unit i
- s_i - activity for unit i (0 or 1)
- s_j - activity for unit j (0 or 1)

The bias term acts like a connection to a unit that is always on
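The energy function above translates directly into code. A minimal sketch, assuming the states, biases, and symmetric weight matrix are stored as plain Python lists (all names are mine):

```python
def energy(s, w, b):
    """Global energy: E = -sum_i s_i b_i - sum_{i<j} s_i s_j w_ij."""
    n = len(s)
    bias_term = sum(s[i] * b[i] for i in range(n))
    pair_term = sum(s[i] * s[j] * w[i][j] for i in range(n) for j in range(i + 1, n))
    return -bias_term - pair_term

w = [[0.0, 2.0], [2.0, 0.0]]       # symmetric weights, no self-connections
b = [0.0, 0.0]
e_both_on = energy([1, 1], w, b)   # a strong positive weight rewards co-activity
e_one_on = energy([1, 0], w, b)
```

Note that both units being on gives the lower (better) energy, because the positive weight between them contributes negatively to E.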

#### Energy Gap

The difference in global energy E between unit i being off and being on

`Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + sum_j s_j w_ij`
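The energy gap can be sketched in the same list-based style (the function name is mine):

```python
def energy_gap(i, s, w, b):
    """Delta E_i = b_i + sum_{j != i} s_j w_ij.
    A positive gap means turning unit i on lowers the global energy."""
    return b[i] + sum(s[j] * w[i][j] for j in range(len(s)) if j != i)

w = [[0.0, 2.0], [2.0, 0.0]]
b = [0.0, 0.0]
gap = energy_gap(0, [0, 1], w, b)   # unit 1 is on, so unit 0 "wants" to turn on too
```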

#### Storage Rule

*When units use states 0 and 1*

`Delta w_(ij) = 4(s_i - 1/2) (s_j - 1/2)`

*When units use states 1 and -1*

`Delta w_(ij) = s_i s_j`
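For the ±1 case, storing a pattern is one Hebbian sweep over all pairs of units (the function name is mine; weights as a list-of-lists as above):

```python
def store_pattern(w, s):
    """Apply the +/-1 storage rule: w_ij += s_i * s_j (no self-connections)."""
    n = len(s)
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] += s[i] * s[j]

n = 3
w = [[0.0] * n for _ in range(n)]
store_pattern(w, [1, -1, 1])   # agreeing units get positive weights, disagreeing get negative
```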

### Binary Stochastic Units

- Binary threshold units with random noise added to their decisions
- The noise helps the net escape from spurious (local) energy minima
- Have a concept of “temperature” - high temperature makes state flips more random, low temperature makes them nearly deterministic

`P(s_i = 1) = 1 / (1 + e^(-Delta E_i / T))`

- T - temperature
- s_i - activity for unit i

Starting with a high temperature and gradually reducing it is called *simulated annealing*
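The effect of temperature on the turn-on probability can be sketched as (names mine):

```python
import math

def p_on(delta_e, T):
    """P(s_i = 1) = 1 / (1 + exp(-Delta E_i / T))."""
    return 1.0 / (1.0 + math.exp(-delta_e / T))

high_T = p_on(1.0, 10.0)   # high temperature: close to a coin flip
low_T = p_on(1.0, 0.1)     # low temperature: nearly a deterministic threshold
```

As T approaches 0 this recovers the deterministic binary threshold unit.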

## Boltzmann Machines

Boltzmann machines are stochastic Hopfield nets with hidden units

- Good at modelling binary data
- e.g. given documents represented as binary word vectors, find similar documents