Markovian Architectural Bias of Recurrent Neural Networks

Peter Tino; M Cernansky; L Benuskova

doi:10.1109/TNN.2003.820839

Computer Science

Research output: Contribution to journal › Article

154 Citations (Scopus)

Abstract

In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training [1], [2]. By concentrating on activation clusters in RNNs, while not throwing away the continuous state space network dynamics, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations emerging prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models, called variable memory length Markov models (VLMMs). In order to appreciate how much information has really been induced during the training, the RNN performance should always be compared with that of VLMMs and NPMs extracted before training as the "null" base models. Our arguments are supported by experiments on a chaotic symbolic sequence and a context-free language with a deep recursive structure.

Index Terms: complex symbolic sequences, information latching problem, iterative function systems, Markov models, recurrent neural networks (RNNs).
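The claim that small-weight sigmoid RNNs organize their states by recent input history before any training can be illustrated with a short sketch. This is not the authors' NPM extraction procedure. It is a minimal demonstration, under assumed settings (4 recurrent units, 2 symbols, weights uniform in [-0.5, 0.5], one-hot inputs, no bias), that the untrained state update is a contraction, so two input sequences sharing a long suffix end in nearly the same state. All function names here are hypothetical.

```python
import math
import random


def make_rnn(n_units, n_symbols, scale, seed=0):
    """Random, untrained weights drawn uniformly from [-scale, scale] (assumed init)."""
    rng = random.Random(seed)
    W = [[rng.uniform(-scale, scale) for _ in range(n_units)] for _ in range(n_units)]
    V = [[rng.uniform(-scale, scale) for _ in range(n_symbols)] for _ in range(n_units)]
    return W, V


def step(W, V, state, symbol):
    """One recurrent update: logistic sigmoid of W @ state plus the column of V for a one-hot symbol."""
    new_state = []
    for i in range(len(W)):
        pre = V[i][symbol] + sum(W[i][j] * state[j] for j in range(len(state)))
        new_state.append(1.0 / (1.0 + math.exp(-pre)))
    return new_state


def run(W, V, symbols):
    """Drive the network from a fixed initial state and return the final state."""
    state = [0.5] * len(W)
    for s in symbols:
        state = step(W, V, state, s)
    return state


def dist(a, b):
    """Euclidean distance between two state vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


W, V = make_rnn(n_units=4, n_symbols=2, scale=0.5)
suffix = [0, 1, 1, 0, 1, 0, 0, 1, 1, 0]

# Different distant histories, identical last 10 symbols.
same_suffix = dist(run(W, V, [0] * 20 + suffix), run(W, V, [1] * 20 + suffix))
# Identical history except for the most recent symbol.
diff_last = dist(run(W, V, [0] * 20 + suffix), run(W, V, [0] * 20 + suffix[:-1] + [1]))

print(f"different history, same last 10 symbols: {same_suffix:.2e}")
print(f"same history, different last symbol:     {diff_last:.2e}")
```

With these assumed settings the sigmoid slope is at most 1/4 and the Frobenius norm of W is at most 2, so each step shrinks the distance between two states driven by the same symbol by a factor of at least 2. Clustering such states therefore groups sequences by their recent suffixes, which is the sense in which the clusters act as Markov prediction contexts.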

Original language	English
Pages (from-to)	6-15
Number of pages	10
Journal	IEEE Transactions on Neural Networks
Volume	15
Issue number	1
DOIs	https://doi.org/10.1109/TNN.2003.820839
Publication status	Published - 1 Jan 2004

Keywords

iterative function systems
Markov models
information latching problem
complex symbolic sequences
recurrent neural networks (RNNs)

Cite this

@article{bc8bd381fa5147b7be68ab9d303dee75,
  title = "Markovian Architectural Bias of Recurrent Neural Networks",
  abstract = "In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training [1], [2]. By concentrating on activation clusters in RNNs, while not throwing away the continuous state space network dynamics, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations emerging prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models, called variable memory length Markov models (VLMMs). In order to appreciate how much information has really been induced during the training, the RNN performance should always be compared with that of VLMMs and NPMs extracted before training as the {"}null{"} base models. Our arguments are supported by experiments on a chaotic symbolic sequence and a context-free language with a deep recursive structure. Index Terms-Complex symbolic sequences, information latching problem, iterative function systems, Markov models, recurrent neural networks (RNNs).",
  keywords = "iterative function systems, Markov models, information latching problem, complex symbolic sequences, recurrent neural networks (RNNs)",
  author = "Peter Tino and M Cernansky and L Benuskova",
  year = "2004",
  month = jan,
  day = "1",
  doi = "10.1109/TNN.2003.820839",
  language = "English",
  volume = "15",
  pages = "6--15",
  journal = "IEEE Transactions on Neural Networks",
  publisher = "Institute of Electrical and Electronics Engineers (IEEE)",
  number = "1",
}

TY - JOUR

T1 - Markovian Architectural Bias of Recurrent Neural Networks

AU - Tino, Peter

AU - Cernansky, M

AU - Benuskova, L

PY - 2004/1/1

Y1 - 2004/1/1

N2 - In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training [1], [2]. By concentrating on activation clusters in RNNs, while not throwing away the continuous state space network dynamics, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations emerging prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models, called variable memory length Markov models (VLMMs). In order to appreciate how much information has really been induced during the training, the RNN performance should always be compared with that of VLMMs and NPMs extracted before training as the "null" base models. Our arguments are supported by experiments on a chaotic symbolic sequence and a context-free language with a deep recursive structure. Index Terms-Complex symbolic sequences, information latching problem, iterative function systems, Markov models, recurrent neural networks (RNNs).

AB - In this paper, we elaborate upon the claim that clustering in the recurrent layer of recurrent neural networks (RNNs) reflects meaningful information processing states even prior to training [1], [2]. By concentrating on activation clusters in RNNs, while not throwing away the continuous state space network dynamics, we extract predictive models that we call neural prediction machines (NPMs). When RNNs with sigmoid activation functions are initialized with small weights (a common technique in the RNN community), the clusters of recurrent activations emerging prior to training are indeed meaningful and correspond to Markov prediction contexts. In this case, the extracted NPMs correspond to a class of Markov models, called variable memory length Markov models (VLMMs). In order to appreciate how much information has really been induced during the training, the RNN performance should always be compared with that of VLMMs and NPMs extracted before training as the "null" base models. Our arguments are supported by experiments on a chaotic symbolic sequence and a context-free language with a deep recursive structure. Index Terms-Complex symbolic sequences, information latching problem, iterative function systems, Markov models, recurrent neural networks (RNNs).

KW - iterative function systems

KW - Markov models

KW - information latching problem

KW - complex symbolic sequences

KW - recurrent neural networks (RNNs)

UR - http://www.scopus.com/inward/record.url?scp=6044234526&partnerID=8YFLogxK

U2 - 10.1109/TNN.2003.820839

DO - 10.1109/TNN.2003.820839

M3 - Article

C2 - 15387243

VL - 15

SP - 6

EP - 15

JO - IEEE Transactions on Neural Networks

JF - IEEE Transactions on Neural Networks

IS - 1

ER -
