After the initial struggles with maintaining context over time, the next leap in AI's memory capabilities came with the Long Short-Term Memory (LSTM) architecture, which brought a new level of sophistication to how machines retain and process information.
Birth of the Long Short-Term Memory
In 1997, Hochreiter and Schmidhuber introduced the Long Short-Term Memory (LSTM) architecture. But to understand its genius, let's first understand the human memory system it mirrors:
Human Memory System                       LSTM Gate
-------------------                       ---------
Attention Filter (what to remember)   →   Input Gate
Working Memory (current state)        →   Memory Cell
Memory Consolidation (what to forget) →   Forget Gate
Memory Retrieval (what to use)        →   Output Gate
The mathematics of the LSTM tells this story of selective memory:
Input Gate:
i_t = σ(W_i[h_(t-1), x_t] + b_i)
Forget Gate:
f_t = σ(W_f[h_(t-1), x_t] + b_f)
Memory Cell:
c_t = f_t ⊙ c_(t-1) + i_t ⊙ tanh(W_c[h_(t-1), x_t] + b_c)
Output Gate:
o_t = σ(W_o[h_(t-1), x_t] + b_o)
Hidden State:
h_t = o_t ⊙ tanh(c_t)
Each equation represents a crucial aspect of conscious memory:
- The Input Gate decides what new information is worth remembering
- The Forget Gate determines what old memories can fade
- The Memory Cell maintains the current state of understanding
- The Output Gate chooses what memories are relevant now, releasing them through the hidden state h_t
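
To see these equations operate together, here is a minimal sketch of a single LSTM step in NumPy. The function name lstm_step, the dictionary layout for the weights, and the shapes are illustrative assumptions for this article, not code from the original 1997 paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of the LSTM equations above.

    W maps each gate ("i", "f", "o", "c") to a (hidden, hidden + input)
    weight matrix; b maps each gate to a (hidden,) bias vector.
    (Hypothetical layout chosen for readability.)
    """
    z = np.concatenate([h_prev, x_t])       # [h_(t-1), x_t]
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate: what to remember
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate: what to let fade
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate: what to use now
    c_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate new memory content
    c_t = f_t * c_prev + i_t * c_tilde      # memory cell: fade old, admit new
    h_t = o_t * np.tanh(c_t)                # hidden state: the exposed memory
    return h_t, c_t
```

Note how the cell update c_t is a weighted blend: the forget gate scales down old memories while the input gate admits new ones, which is exactly the selective retention the analogy describes.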
The Dance of Memory: How LSTMs Learn
Think of an LSTM as a master storyteller, constantly deciding:
- Which details to emphasize
- Which to let fade
- How to connect distant events
- When to recall earlier information
This mirrors how human consciousness works with memory:
Example: Reading a Mystery Novel
Human Process                 LSTM Process
-------------                 ------------
Note key clues           →    Input Gate activates for important information
Hold suspects in mind    →    Memory Cell maintains key details
Discard red herrings     →    Forget Gate removes irrelevant information
Connect final clues      →    Output Gate retrieves stored information
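
As a hedged illustration of that process, the sketch below runs the lstm_step function from earlier over a whole sequence; random numbers stand in for the novel's pages, and the shapes, seed, and data are assumptions for demonstration only:

```python
# Carry memory across an entire sequence, step by step.
rng = np.random.default_rng(0)
hidden, n_inputs, n_steps = 8, 4, 10

# Small random weights and zero biases, purely for illustration.
W = {g: rng.normal(0.0, 0.1, (hidden, hidden + n_inputs)) for g in "ifoc"}
b = {g: np.zeros(hidden) for g in "ifoc"}

h = np.zeros(hidden)  # working memory starts empty
c = np.zeros(hidden)  # long-term memory cell starts empty
for x_t in rng.normal(size=(n_steps, n_inputs)):
    h, c = lstm_step(x_t, h, c, W, b)  # state persists between steps

print(h.round(3))  # the final hidden state: what the network "recalls"
```

Because h and c are threaded through the loop, a clue seen at step one can still shape the output at step ten, which is precisely the long-range dependence that earlier architectures struggled to maintain.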