The blog post updated in December, 2017 based on feedback from @AlexSherstinsky; Thanks!
This is a simple implementation of Long short-term memory (LSTM) module on numpy from scratch. This is for learning purposes. The network is trained with stochastic gradient descent with a batch size of 1 using AdaGrad algorithm (with momentum).
You can download the jupyter notebook from http://blog.varunajayasiri.com/ml/numpy_lstm.ipynb
The model usually reaches an error of about 45 after 5000 iterations when tested with 100,000 character sample from Shakespeare. However it sometimes get stuck in a local minima; reinitialize the weights if this happens.
You need to place the input text file as `input.txt` in the same folder as the python code.