Bidirectional RNNs

Short Definition

Bidirectional RNNs are recurrent neural networks that process sequences in both forward and backward directions to capture past and future context.

Definition

A Bidirectional Recurrent Neural Network (BiRNN) consists of two separate recurrent layers: one processes the sequence from left to right (forward pass), and the other processes it from right to left (backward pass). Their hidden states are combined at each time step, allowing the model to use information from both past and future tokens.

Context becomes two-sided.

Why It Matters

Standard RNNs only incorporate past context:

  • prediction at time t depends on inputs ≤ t

However, in many tasks:

  • future tokens are also informative
  • full-sequence context improves accuracy
  • ambiguity can be resolved by looking ahead

Bidirectionality increases representational power.

Core Mechanism

For input sequence (x_1, x_2, …, x_T):

Forward pass:

h_t^→ = f_fwd(x_t, h_{t-1}^→)

Backward pass:

h_t^← = f_bwd(x_t, h_{t+1}^←)

Final representation:

h_t = concat(h_t^→, h_t^←)

where f_fwd and f_bwd are recurrent cells of the same form but with separate parameters.

Each position sees both past and future.
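The forward sweep, backward sweep, and concatenation can be sketched in plain Python. The scalar inputs, tanh cell, and weight values below are illustrative assumptions, not a canonical implementation:

```python
import math

# A bidirectional vanilla RNN over a toy scalar sequence. The cell is
# h_t = tanh(w_x * x_t + w_h * h_prev); all weights are illustrative.

def rnn_pass(xs, w_x, w_h):
    """One recurrent sweep over xs, returning the hidden state at each step."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def birnn(xs):
    # Forward and backward directions use separate parameters.
    fwd = rnn_pass(xs, w_x=0.5, w_h=0.8)                  # h_t^→ sees x_1..x_t
    bwd = rnn_pass(list(reversed(xs)), w_x=0.4, w_h=0.7)  # sweep from x_T down
    bwd.reverse()                                         # re-align: bwd[t] = h_t^←
    # h_t = concat(h_t^→, h_t^←) at every position.
    return list(zip(fwd, bwd))

h = birnn([1.0, -1.0, 0.5, 2.0])
# h[0] already reflects the entire future of the sequence through h_1^←.
```

Note that the backward states must be reversed after the sweep so that index t of both lists refers to the same input position before concatenation.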

Minimal Conceptual Illustration

Forward:  x₁ → x₂ → x₃ → x₄
Backward: x₁ ← x₂ ← x₃ ← x₄
Combined:

[h₁^→ ; h₁^←]

[h₂^→ ; h₂^←]

[h₃^→ ; h₃^←]

[h₄^→ ; h₄^←]

Information flows in both temporal directions.

Common Implementations

Bidirectionality can be applied to:

  • Vanilla RNNs
  • LSTMs (BiLSTM)
  • GRUs (BiGRU)

BiLSTMs are particularly common in NLP.
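In PyTorch, for example, bidirectionality is a constructor flag rather than a separate module. The sizes below are illustrative; this assumes torch is available:

```python
import torch

# A BiLSTM: bidirectional=True runs two LSTMs (forward and backward)
# and concatenates their hidden states at every time step.
lstm = torch.nn.LSTM(input_size=8, hidden_size=16, bidirectional=True)

x = torch.randn(5, 3, 8)            # (seq_len, batch, input_size)
out, (h_n, c_n) = lstm(x)

# The concatenation doubles the output width: 2 * hidden_size = 32.
print(out.shape)                    # torch.Size([5, 3, 32])
```

The same flag works for `torch.nn.RNN` and `torch.nn.GRU`, giving the BiRNN and BiGRU variants listed above.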

Applications

Bidirectional RNNs have been widely used in:

  • Named entity recognition
  • Part-of-speech tagging
  • Sentiment analysis
  • Speech processing
  • Sequence labeling tasks

They excel when full sequence context is available.

Limitations

Bidirectional models:

  • cannot be used in real-time streaming (future context is unavailable)
  • require the full sequence before making predictions
  • roughly double the computational cost of a unidirectional RNN

Bidirectionality sacrifices causality.

Bidirectional RNN vs Standard RNN

Aspect             | Standard RNN | Bidirectional RNN
Context            | Past only    | Past + Future
Streaming use      | Yes          | No
Parameter count    | Lower        | Higher
Accuracy (offline) | Lower        | Higher

Future context improves representation.

Relationship to Seq2Seq

Bidirectional encoders are often used in:

  • Seq2Seq encoder stages
  • Machine translation systems
  • Text classification pipelines

The decoder remains autoregressive.

Comparison to Transformers

Transformers:

  • use full-sequence self-attention
  • achieve bidirectional context without recurrence
  • parallelize across time steps

Attention generalizes bidirectionality.

Practical Considerations

When using Bidirectional RNNs:

  • ensure full sequence availability
  • monitor memory usage
  • apply dropout for regularization
  • avoid use in strictly causal systems

Use when future context is legitimately available.

Common Pitfalls

  • applying bidirectional models to streaming tasks
  • forgetting that the hidden dimension doubles after concatenation
  • underestimating memory cost
  • assuming bidirectionality solves long-range dependency limits

Bidirectional is not omniscient.
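The doubled-dimension pitfall is purely mechanical and easy to check; hidden_size below is an illustrative value:

```python
hidden_size = 64                      # per-direction width (illustrative)

h_forward = [0.0] * hidden_size       # h_t^→
h_backward = [0.0] * hidden_size      # h_t^←
h_t = h_forward + h_backward          # concat(h_t^→, h_t^←)

# Any layer consuming h_t must be sized for 2 * hidden_size inputs;
# sizing it for hidden_size is the classic mismatch.
assert len(h_t) == 2 * hidden_size
```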

Summary Characteristics

Aspect                  | Bidirectional RNN
Architecture type       | Dual recurrent
Context                 | Past + Future
Use case                | Offline sequence tasks
Streaming compatibility | No
Modern alternative      | Transformers

Related Concepts