## Network Types

• Recurrent neural network - has cyclic connections, so it can model temporal sequences
• Feed-forward neural network - has no cyclic connections; activity flows in one direction from input to output

## Activation Functions

### Logistic - squashes a real number into the range (0, 1)

https://www.coursera.org/learn/neural-networks/lecture/wfTkN/learning-the-weights-of-a-logistic-output-neuron-4-min
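A minimal sketch of the logistic function (the function name is just an assumed helper, not from the lecture):

```python
import math

def logistic(z):
    """Logistic (sigmoid): squash any real-valued input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

print(logistic(0.0))  # 0.5: an input of zero sits exactly at the midpoint
```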

### Softmax - converts a vector of real numbers into a categorical distribution

https://www.coursera.org/learn/neural-networks/lecture/68Koq/another-diversion-the-softmax-output-function-7-min
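A toy sketch of softmax (the max-subtraction trick is a standard numerical-stability detail, not specific to the lecture):

```python
import math

def softmax(zs):
    """Turn a list of real numbers into probabilities that sum to 1."""
    m = max(zs)                           # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
# probs sums to 1 (up to float rounding) and preserves the ordering of the inputs
```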

## Avoiding Overfitting / Improving Generalisation

https://www.coursera.org/learn/neural-networks/home/week/9

• Get more data
• Weight decay - L2 (squared) / L1 (absolute)
• Limit model capacity - fewer hidden layers / units
• Early stopping
• Noise
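The two weight-decay penalties can be written down directly; a minimal sketch (the function names and the combined-loss example are illustrative assumptions):

```python
def l2_penalty(weights, lam):
    """L2 weight decay: penalise the squared magnitude of each weight."""
    return 0.5 * lam * sum(w * w for w in weights)

def l1_penalty(weights, lam):
    """L1 weight decay: penalise the absolute magnitude of each weight."""
    return lam * sum(abs(w) for w in weights)

# The penalty is added to the data-fit loss, so gradient descent
# prefers many small weights over a few large ones
data_loss = 1.0  # hypothetical training loss
total_loss = data_loss + l2_penalty([0.5, -1.5, 2.0], lam=0.01)
```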

## Hopfield Nets

https://www.coursera.org/learn/neural-networks/home/week/11

• One of the simplest kinds of energy based models (has global energy function)
• Used to store memories as distributed activity
• Composed of binary threshold units (units which can only have one of two values - usually 0 and 1 or -1 and 1)

### Energy Function

• The global energy function is the sum of many contributions
• Each contribution comes from one connection weight and the binary states of two neurons
• Low energy is good!

E = - sum_i s_i b_i - sum_(i<j) s_i s_j w_ij

• w_ij - symmetric connection strength between neurons i and j
• b_i - bias term for unit i
• s_i - activity for unit i (0 or 1)
• s_j - activity for unit j (0 or 1)

The bias term is like a unit that is always on
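The energy sum above can be computed directly; a toy sketch with plain lists (the helper name is an assumption):

```python
def hopfield_energy(s, w, b):
    """Global energy E = -sum_i s_i b_i - sum_{i<j} s_i s_j w_ij.

    s: binary unit states (0/1), b: biases, w: symmetric weights (w[i][i] = 0).
    """
    n = len(s)
    energy = -sum(s[i] * b[i] for i in range(n))  # bias contributions
    for i in range(n):
        for j in range(i + 1, n):                 # each pair counted once
            energy -= s[i] * s[j] * w[i][j]
    return energy

# Two units joined by a positive weight: both on gives the lowest energy
w = [[0, 1], [1, 0]]
b = [0, 0]
print(hopfield_energy([1, 1], w, b))  # -1
```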

#### Energy Gap

The change in global energy when unit i goes from off to on (E with the unit off minus E with it on)

Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + sum_j s_j w_ij
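A unit lowers the energy by switching on whenever its gap is positive; a minimal sketch of the gap and the deterministic threshold decision (helper names assumed):

```python
def energy_gap(i, s, w, b):
    """Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + sum_j s_j w_ij."""
    return b[i] + sum(s[j] * w[i][j] for j in range(len(s)) if j != i)

def settle_unit(i, s, w, b):
    """Binary threshold decision: turn unit i on iff that lowers the energy."""
    s[i] = 1 if energy_gap(i, s, w, b) > 0 else 0

s = [0, 1]
w = [[0, 2], [2, 0]]
b = [0, 0]
settle_unit(0, s, w, b)  # gap = 2 > 0, so unit 0 switches on
```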

#### Storage Rule

To store a binary state vector as a memory, increment the weights. When units use states 0 and 1:

Delta w_(ij) = 4 (s_i - 1/2) (s_j - 1/2)

When units use states 1 and -1:

Delta w_(ij) = s_i s_j
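For the ±1 case, this is a single Hebbian pass over all pairs of units; a toy sketch (the function name is an assumption):

```python
def store_pattern(w, pattern):
    """Hebbian storage for +/-1 states: Delta w_ij = s_i s_j (no self-connections)."""
    n = len(pattern)
    for i in range(n):
        for j in range(n):
            if i != j:
                w[i][j] += pattern[i] * pattern[j]

n = 3
w = [[0] * n for _ in range(n)]
store_pattern(w, [1, -1, 1])
# Units that agreed get a positive weight, units that disagreed a negative one
```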

### Binary Stochastic Units

• Binary units with some random noise added in
• The noise helps to escape from spurious minima
• Have a concept of “temperature” - higher temperatures mean larger, noisier random jumps

P(s_i = 1) = 1 / (1 + e^(-Delta E_i / T))

• T - temperature
• s_i - activity for unit i

Gradually lowering the temperature over time is called simulated annealing
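The stochastic unit above can be sketched directly from the formula (helper names are assumptions; annealing would simply lower T between sweeps):

```python
import math
import random

def prob_on(gap, T):
    """P(s_i = 1) = 1 / (1 + e^(-Delta E_i / T))."""
    return 1.0 / (1.0 + math.exp(-gap / T))

def stochastic_update(i, s, w, b, T, rng=random):
    """Switch unit i on with the logistic probability; noise grows with T."""
    gap = b[i] + sum(s[j] * w[i][j] for j in range(len(s)) if j != i)
    s[i] = 1 if rng.random() < prob_on(gap, T) else 0

# High temperature: nearly a coin flip. Low temperature: nearly deterministic.
print(prob_on(2.0, 100.0))  # close to 0.5
print(prob_on(2.0, 0.01))   # close to 1.0
```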

## Boltzmann Machines

Boltzmann machines are stochastic Hopfield nets with hidden units

• Good at modelling binary data
• e.g. given a document represented as a binary word vector, find similar documents