Recurrent Neural Network
Also known as: RNN, recurrent net, recurrent network
A recurrent neural network (RNN) is a neural network architecture that processes sequential data by passing a hidden state from one time step to the next, allowing the model to retain and use information from earlier inputs when making predictions.
What It Is
Most data you work with every day has an order that matters. Chat messages, stock prices, sensor readings, typed text — the meaning changes depending on what came before. Standard neural networks treat each input independently, as if every data point arrived in isolation. Recurrent neural networks solve this by building memory directly into the architecture, giving the model a way to remember what it has already seen.
Think of reading a sentence word by word. You don’t start over at each word — you carry the meaning of earlier words forward in your head. An RNN works the same way. At each step in the sequence (called a “time step”), the network receives two inputs: the current data point and a hidden state from the previous step. The hidden state functions as a running summary of everything the network has processed so far. The network combines both, produces an output, and passes an updated hidden state to the next time step.
This loop — current input plus previous hidden state in, output plus updated hidden state out — is what makes the network “recurrent.” The same set of weights (the learned parameters) is applied at every time step, so the network learns one set of rules and reuses them across the entire sequence. This weight sharing makes RNNs naturally suited for sequences of varying length, whether that’s a three-word phrase or a full paragraph.
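The loop described above can be sketched in a few lines of pure Python. This is an illustrative single-unit cell with made-up, untrained scalar weights (`w_x`, `w_h`, and `b` are assumptions for the sketch), not a production implementation:

```python
import math

# One recurrent step: h_t = tanh(w_x * x_t + w_h * h_{t-1} + b)
# The SAME weights are reused at every time step (weight sharing).
w_x, w_h, b = 0.5, 0.8, 0.0   # illustrative values, not trained

def rnn_step(x_t, h_prev):
    """Combine the current input with the previous hidden state."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_sequence(xs, h0=0.0):
    """Apply the same step function across a sequence of any length."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(x, h)   # the hidden state carries context forward
        states.append(h)
    return states

states = run_sequence([1.0, 0.0, 0.0])
```

Note that the same `rnn_step` (with the same weights) runs at every step; only the hidden state changes, which is why the first input still influences the later hidden states even though the later inputs are zero.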
Basic RNNs have a well-known weakness: they struggle with long sequences. As information passes through many time steps, the hidden state gradually loses track of earlier inputs. The underlying cause, called the vanishing gradient problem, is that error signals shrink as they are propagated back through many steps, so the network effectively stops learning from what happened dozens of steps ago. Two widely adopted solutions address this. Long Short-Term Memory (LSTM) networks add gating mechanisms: selective filters that control what the hidden state keeps, discards, or updates at each step. Gated Recurrent Units (GRUs) offer a simpler variant with fewer gates but similar benefits. Both architectures let the network carry important information across hundreds of time steps without losing it.
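The decay can be illustrated numerically. Backpropagating through T steps multiplies roughly T per-step factors together; this rough sketch assumes a constant per-step factor of 0.8 as a stand-in for the repeated Jacobian terms in real backpropagation:

```python
# Toy illustration of the vanishing gradient: with a per-step
# derivative magnitude below 1, the gradient signal reaching the
# earliest inputs decays exponentially with sequence length.
def gradient_after(steps, factor=0.8):
    grad = 1.0
    for _ in range(steps):
        grad *= factor   # one factor per time step traversed
    return grad

for t in (5, 20, 50):
    print(t, gradient_after(t))
```

By around 50 steps the factor has shrunk to roughly 1e-5, which is why a basic RNN gets almost no learning signal from inputs that far back.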
How It’s Used in Practice
If you’ve ever used predictive text on your phone, you’ve seen the result of sequence modeling — the core task RNNs were designed for. The keyboard predicts your next word based on the words you’ve already typed, treating your sentence as an ordered sequence where each word depends on what came before.
RNNs and their LSTM/GRU variants power a range of sequence-dependent tasks. In natural language processing, they handle text classification, sentiment analysis, and machine translation by reading input sequences token by token and building context through hidden states. In speech recognition, they convert audio signals (a sequence of sound samples over time) into text. Time series forecasting — predicting tomorrow’s demand based on last month’s sales patterns — is another common application where the sequential nature of the data makes RNNs a natural fit.
While transformer-based models like GPT and Claude have largely replaced RNNs for text generation tasks, RNNs remain widely used on edge devices (where model size matters), in real-time signal processing, and as building blocks inside larger systems. Understanding how hidden states carry context forward is also foundational for grasping how newer architectures like xLSTM extend these ideas.
Pro Tip: If you’re choosing between an RNN and a transformer for a sequence task, start with the sequence length. RNNs handle short-to-medium sequences efficiently with less compute. For long documents or tasks requiring attention across thousands of tokens, transformers typically perform better.
When to Use / When Not
| Scenario | Use | Avoid |
|---|---|---|
| Short text sequences like product reviews or chat messages | ✅ | |
| Documents longer than a few thousand tokens | | ❌ |
| Real-time sensor data on resource-constrained devices | ✅ | |
| Tasks requiring attention across an entire long document | | ❌ |
| Time series forecasting with clear sequential patterns | ✅ | |
| Image classification where pixel order doesn’t matter | | ❌ |
Common Misconception
Myth: RNNs are completely obsolete now that transformers exist, so there’s no reason to learn how they work. Reality: RNNs and their LSTM/GRU variants are still actively deployed in production, especially on edge devices, in real-time audio processing, and in systems where computational budget is limited. Understanding how hidden states carry context through a sequence is also foundational for grasping newer recurrent architectures like xLSTM that blend recurrent principles with modern design.
One Sentence to Remember
An RNN gives a neural network short-term memory by passing a hidden state from one step to the next — which is exactly why it works for any task where the order of data matters, and why its limitations inspired the gated architectures that followed.
FAQ
Q: What is the difference between an RNN and a standard neural network? A: A standard neural network processes each input independently. An RNN adds a feedback loop through hidden states, allowing it to carry information from earlier inputs forward and handle sequential data.
Q: Why do basic RNNs struggle with long sequences? A: The vanishing gradient problem causes hidden state updates to shrink over many time steps. Earlier information fades, making it hard for the network to connect distant parts of a sequence.
Q: How does an LSTM improve on a basic RNN? A: An LSTM adds gates — forget, input, and output — that selectively control what information flows through the hidden state. This lets the network preserve important context across much longer sequences.
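The three gates in that answer can be sketched with scalar states and illustrative weights. Everything here (the weight values, the single-unit scalar form) is an assumption made for readability, not a trained model or a library implementation:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One scalar LSTM step. Each gate is a sigmoid in (0, 1) that
    scales what the cell state keeps, adds, or exposes."""
    f = sigmoid(W["f_x"] * x + W["f_h"] * h_prev + W["f_b"])  # forget gate
    i = sigmoid(W["i_x"] * x + W["i_h"] * h_prev + W["i_b"])  # input gate
    o = sigmoid(W["o_x"] * x + W["o_h"] * h_prev + W["o_b"])  # output gate
    c_tilde = math.tanh(W["c_x"] * x + W["c_h"] * h_prev + W["c_b"])  # candidate
    c = f * c_prev + i * c_tilde   # keep some old state, add some new
    h = o * math.tanh(c)           # expose a gated view of the cell state
    return h, c

# Illustrative weights (all 0.5) just to run one step.
W = {k: 0.5 for k in ("f_x", "f_h", "f_b", "i_x", "i_h", "i_b",
                      "o_x", "o_h", "o_b", "c_x", "c_h", "c_b")}
h, c = lstm_step(1.0, 0.0, 0.0, W)
```

The key design choice is the cell-state update `c = f * c_prev + i * c_tilde`: because old state is scaled by a gate rather than squashed through a nonlinearity at every step, important information can survive many more steps than in a basic RNN.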
Expert Takes
RNNs introduced a deceptively simple idea: feed the output back into the input. That single design choice — the hidden state loop — turned a static function into something that could track temporal dependencies. Not memory in the human sense. A compressed summary that updates with every step. The vanishing gradient was not a bug to fix later. It was a fundamental constraint that shaped every recurrent architecture that followed, from LSTM to xLSTM.
When a sequence model ignores earlier context, the usual culprit is the hidden state capacity — not the training data. Basic RNNs pass one fixed-size vector through every step, and that vector silently drops information as sequences grow. The fix was gating: LSTM and GRU architectures let the model decide what to keep and what to discard at each step. If you’re building context-aware workflows, the principle applies directly — define what persists between steps and what gets cleared.
RNNs are not the headline model anymore, and that’s exactly why they matter for practical decisions. Transformers own the spotlight, but they also own the compute bill. For teams running real-time inference on constrained hardware — think mobile apps, IoT sensors, or embedded systems — RNN variants still deliver results at a fraction of the cost. You either pick the right architecture for your constraints, or you overspend on infrastructure your use case never needed.
Hidden states in an RNN compress an entire input history into a single vector. That compression is inherently lossy — and the model never tells you what it dropped. In safety-critical applications like medical monitoring or autonomous navigation, the forgotten data point could be the one that mattered most. When a recurrent model makes a decision based on a sequence, who verifies that the hidden state actually captured the relevant context?