Architectural Bias in Recurrent Neural Networks: Fractal Analysis

Peter Tino; B Hammer

doi:10.1162/08997660360675099

Computer Science

Research output: Contribution to journal › Article

21 Citations (Scopus)

Abstract

We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky, & Benuskova, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
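The Markovian bias described in the abstract can be illustrated with a small Python sketch. This is not the authors' code or analysis, and it does not estimate fractal dimensions. It only shows that an untrained tanh network with small weights acts as a contraction, so two input sequences that share their last k symbols end in nearby states regardless of their earlier history. All names (make_rnn, step, run, suffix_distances), the network size, the weight range of 0.1, and the use of uniformly random symbols in place of a finite-state source are assumptions made for illustration.

```python
import math
import random


def make_rnn(n_units, n_symbols, scale, seed=0):
    """Draw recurrent weights W, input weights V and biases b from [-scale, scale]."""
    rng = random.Random(seed)
    W = [[rng.uniform(-scale, scale) for _ in range(n_units)] for _ in range(n_units)]
    V = [[rng.uniform(-scale, scale) for _ in range(n_symbols)] for _ in range(n_units)]
    b = [rng.uniform(-scale, scale) for _ in range(n_units)]
    return W, V, b


def step(rnn, x, symbol):
    """One untrained state update: x' = tanh(W x + V[:, symbol] + b)."""
    W, V, b = rnn
    return [
        math.tanh(sum(w * xj for w, xj in zip(W[i], x)) + V[i][symbol] + b[i])
        for i in range(len(x))
    ]


def run(rnn, symbols):
    """Drive the network from the zero state with a symbol sequence."""
    x = [0.0] * len(rnn[2])
    for s in symbols:
        x = step(rnn, x, s)
    return x


def max_dist(x, y):
    """Max-norm distance between two state vectors."""
    return max(abs(a - c) for a, c in zip(x, y))


def suffix_distances(rnn, n_symbols, max_k, prefix_len=20, seed=1):
    """Distance between final states of two sequences with different random
    prefixes and a shared suffix of length k, for k = 0..max_k."""
    rng = random.Random(seed)
    pre_a = [rng.randrange(n_symbols) for _ in range(prefix_len)]
    pre_b = [rng.randrange(n_symbols) for _ in range(prefix_len)]
    suffix = [rng.randrange(n_symbols) for _ in range(max_k)]
    return [
        max_dist(run(rnn, pre_a + suffix[:k]), run(rnn, pre_b + suffix[:k]))
        for k in range(max_k + 1)
    ]


if __name__ == "__main__":
    # 4 units with weights in [-0.1, 0.1]: each row of W sums to at most 0.4
    # in absolute value, so every shared symbol shrinks the distance by >= 60%.
    rnn = make_rnn(n_units=4, n_symbols=3, scale=0.1)
    for k, d in enumerate(suffix_distances(rnn, n_symbols=3, max_k=8)):
        print(k, d)
```

Because tanh is 1-Lipschitz and each row of W has absolute sum at most 0.4 under these assumed settings, the max-norm distance after k shared symbols is at most 2 × 0.4^k. States are therefore grouped by recent input history, which is the property that makes finite-memory (Markov) models extractable before training.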

Original language	English
Pages (from-to)	1931-1957
Number of pages	27
Journal	Neural Computation
Volume	15
Issue number	8
DOIs	https://doi.org/10.1162/08997660360675099
Publication status	Published - 1 Aug 2003

Cite this

@article{c59fd09f2ad54deba2a0aa011ea73620,
  title = "Architectural Bias in Recurrent Neural Networks: Fractal Analysis",
  abstract = "We have recently shown that when initialized with {"}small{"} weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer \& Tino, 2002; Tino, Cernansky, \& Benuskova, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.",
  author = "Peter Tino and B Hammer",
  year = "2003",
  month = aug,
  day = "1",
  doi = "10.1162/08997660360675099",
  language = "English",
  volume = "15",
  pages = "1931--1957",
  journal = "Neural Computation",
  issn = "1530-888X",
  publisher = "Massachusetts Institute of Technology Press",
  number = "8",
}

TY  - JOUR
T1  - Architectural Bias in Recurrent Neural Networks: Fractal Analysis
AU  - Tino, Peter
AU  - Hammer, B
PY  - 2003/8/1
Y1  - 2003/8/1
N2  - We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky, & Benuskova, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
AB  - We have recently shown that when initialized with "small" weights, recurrent neural networks (RNNs) with standard sigmoid-type activation functions are inherently biased toward Markov models; even prior to any training, RNN dynamics can be readily used to extract finite memory machines (Hammer & Tino, 2002; Tino, Cernansky, & Benuskova, 2002a, 2002b). Following Christiansen and Chater (1999), we refer to this phenomenon as the architectural bias of RNNs. In this article, we extend our work on the architectural bias in RNNs by performing a rigorous fractal analysis of recurrent activation patterns. We assume the network is driven by sequences obtained by traversing an underlying finite-state transition diagram, a scenario that has been frequently considered in the past, for example, when studying RNN-based learning and implementation of regular grammars and finite-state transducers. We obtain lower and upper bounds on various types of fractal dimensions, such as box counting and Hausdorff dimensions. It turns out that not only can the recurrent activations inside RNNs with small initial weights be explored to build Markovian predictive models, but also the activations form fractal clusters, the dimension of which can be bounded by the scaled entropy of the underlying driving source. The scaling factors are fixed and are given by the RNN parameters.
UR  - http://www.scopus.com/inward/record.url?scp=0042827445&partnerID=8YFLogxK
U2  - 10.1162/08997660360675099
DO  - 10.1162/08997660360675099
M3  - Article
SN  - 1530-888X
VL  - 15
SP  - 1931
EP  - 1957
JO  - Neural Computation
JF  - Neural Computation
IS  - 8
ER  -
