Network Types
- Recurrent neural network - network which has cyclic connections, so it can model sequences / temporal behaviour
- Feed-forward neural network - network which has no cyclic connections; activity flows from inputs to outputs
Activation Functions
Logistic - squashes a real number into the range (0, 1); used for binary outputs
https://www.coursera.org/learn/neural-networks/lecture/wfTkN/learning-the-weights-of-a-logistic-output-neuron-4-min
Softmax - converts a vector of real numbers into a discrete categorical distribution (outputs are positive and sum to 1)
https://www.coursera.org/learn/neural-networks/lecture/68Koq/another-diversion-the-softmax-output-function-7-min
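A minimal sketch of both activation functions in plain Python (the max-subtraction trick for numerical stability is standard practice, not from the lectures):

```python
import math

def logistic(z):
    # Squash a real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def softmax(zs):
    # Convert a list of real numbers into a categorical distribution.
    # Subtracting the max avoids overflow and doesn't change the result.
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([1.0, 2.0, 3.0])
# probs are non-negative, sum to 1, and preserve the input ordering
```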
Avoiding Overfitting / Improving Generalisation
https://www.coursera.org/learn/neural-networks/home/week/9
- Get more data
- Weight decay - L2 (squared) / L1 (absolute)
- Limit capacity (fewer hidden units / layers)
- Early stopping
- Add noise (to the weights or the activities)
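A sketch of how L2 weight decay enters a gradient step (the function name and the learning-rate / decay values are mine, just for illustration):

```python
def sgd_step(weights, grads, lr=0.1, l2=0.001):
    # Gradient step with L2 weight decay: besides following the loss
    # gradient, every weight shrinks toward zero in proportion to its size.
    return [w - lr * (g + l2 * w) for w, g in zip(weights, grads)]

w = sgd_step([1.0, -2.0], [0.0, 0.0])
# with zero gradients, the decay term alone pulls the weights toward zero
```

L1 (absolute) decay would subtract `lr * l1 * sign(w)` instead, shrinking all weights by a constant amount and driving small ones exactly to zero.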
Hopfield Nets
https://www.coursera.org/learn/neural-networks/home/week/11
- One of the simplest kinds of energy-based models (has a global energy function)
- Used to store memories as distributed patterns of activity
- Composed of binary threshold units (units which can take only one of two values - usually 0 and 1, or -1 and 1)
Energy Function
- The global energy function is the sum of many contributions
- Each contribution comes from one connection weight and the binary states of two neurons
- Low energy is good!
`E = - sum_(i) s_i b_i - sum_(i<j) s_i s_j w_(ij)`
- w_ij - symmetric connection strength between neurons i and j
- b_i - bias term for unit i
- s_i - activity for unit i (0 or 1)
- s_j - activity for unit j (0 or 1)
The bias term is like a unit that is always on
Energy Gap
The difference in global energy E between unit i being off and being on
`Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + sum_j s_j w_ij`
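The energy gap only depends on unit i's bias and its neighbours, so it is cheap to compute locally (helper name is mine):

```python
def energy_gap(i, s, w, b):
    # Delta E_i = E(s_i = 0) - E(s_i = 1) = b_i + sum_j s_j w_ij
    return b[i] + sum(s[j] * w[i][j] for j in range(len(s)) if j != i)

# two units with symmetric weight 2, biases 1 and -1, both units on:
w = [[0.0, 2.0], [2.0, 0.0]]
b = [1.0, -1.0]
s = [1, 1]
gap = energy_gap(0, s, w, b)  # 1 + 1*2 = 3, so unit 0 "wants" to stay on
```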
Storage Rule
To store a binary state vector as a memory, update each weight once:
When units take the values 0 and 1
`Delta w_(ij) = 4(s_i - 1/2) (s_j - 1/2)`
When units take the values -1 and 1
`Delta w_(ij) = s_i s_j`
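A sketch of storing one pattern with the ±1 rule `Delta w_ij = s_i s_j` and recalling it with deterministic binary-threshold updates (function names are mine; biases omitted for simplicity):

```python
def store(patterns, n):
    # Hebbian storage: for each pattern, w_ij += s_i * s_j
    # (symmetric weights, no self-connections).
    w = [[0.0] * n for _ in range(n)]
    for s in patterns:
        for i in range(n):
            for j in range(n):
                if i != j:
                    w[i][j] += s[i] * s[j]
    return w

def recall(s, w, steps=5):
    # Repeatedly set each unit to +1 if its total input is
    # non-negative, -1 otherwise; this only ever lowers the energy.
    s = list(s)
    for _ in range(steps):
        for i in range(len(s)):
            total = sum(w[i][j] * s[j] for j in range(len(s)))
            s[i] = 1 if total >= 0 else -1
    return s

w = store([[1, -1, 1, -1]], 4)
restored = recall([1, -1, 1, 1], w)  # one flipped unit is repaired
```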
Binary Stochastic Units
- Binary units with some random noise added in
- The noise helps to escape from spurious minima
- Have a concept of “temperature” - controls the size of the random jumps
`P(s_i = 1) = 1 / (1 + e^(-Delta E_i / T))`
- T - temperature
- s_i - activity for unit i
Starting at a high temperature and gradually lowering it is called simulated annealing
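The turn-on probability above, sketched in Python; note how raising T flattens the probability toward 0.5 (pure noise) while lowering it approaches the deterministic threshold:

```python
import math

def prob_on(delta_e, temperature):
    # P(s_i = 1) = 1 / (1 + e^(-Delta E_i / T))
    return 1.0 / (1.0 + math.exp(-delta_e / temperature))

# same energy gap, decreasing temperature:
for T in (10.0, 1.0, 0.1):
    p = prob_on(2.0, T)  # moves from near 0.5 toward near 1.0
```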
Boltzmann Machines
Boltzmann machines are stochastic Hopfield nets with hidden units
- Good at modelling binary data
- e.g. given a document represented as a binary word vector, find similar documents