Abstract
While memory is fundamental to intelligence, the development of neural memory architectures has lagged behind the recent flourishing of time-independent neural models. In this thesis, we contribute to advancing this field by proposing a novel neural memory model, Multigrid Neural Memory, and studying in detail three orthogonal dimensions of its construction: architectural design, domain-agnostic processing, and local operators.
First, we introduce a radically new approach to endowing neural networks with access to long-term, large-scale memory. When we architect networks with internal multigrid structure and connectivity, and distribute memory cells alongside computation throughout this topology, coherent memory subsystems emerge as a result of training. Our design differs drastically from, and is far simpler than, prior efforts such as the recently proposed Differentiable Neural Computer (DNC), which uses intricately crafted controllers to connect neural networks to external memory banks. Our hierarchical spatial organization, parameterized convolutionally, permits efficient instantiation of large-capacity memories. Our multigrid topology provides short internal routing pathways, allowing convolutional networks to efficiently approximate the behavior of fully connected networks. Such networks have an implicit capacity for internal attention; augmented with memory, they learn to read and write specific memory locations in a dynamic, data-dependent manner. We demonstrate these capabilities on synthetic exploration and mapping tasks, where our network self-organizes and retains long-term memory for trajectories of thousands of time steps, outperforming the DNC. On tasks without any notion of spatial geometry (sorting, associative recall, and question answering), our design functions as a truly generic memory and yields excellent results.
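The multigrid connectivity described above can be illustrated with a minimal sketch of how one layer gathers its inputs. It assumes a pyramid of square grids in which each level halves the resolution of the one before it, with cross-scale transfer done by 2x2 average pooling and nearest-neighbour upsampling. These operator choices and the function names (`downsample`, `upsample`, `multigrid_gather`) are illustrative assumptions, not the thesis's implementation. The convolutions and the memory cells that would consume these stacked channels are omitted.

```python
from typing import List

Grid = List[List[float]]  # square 2D grid, row-major


def downsample(g: Grid) -> Grid:
    """Halve resolution with 2x2 average pooling (assumes even side length)."""
    n = len(g) // 2
    return [[(g[2 * i][2 * j] + g[2 * i][2 * j + 1]
              + g[2 * i + 1][2 * j] + g[2 * i + 1][2 * j + 1]) / 4.0
             for j in range(n)] for i in range(n)]


def upsample(g: Grid) -> Grid:
    """Double resolution by nearest-neighbour replication."""
    n = len(g) * 2
    return [[g[i // 2][j // 2] for j in range(n)] for i in range(n)]


def multigrid_gather(pyramid: List[Grid]) -> List[List[Grid]]:
    """For each level k (0 = finest), collect the channels a multigrid layer
    would see at that level's resolution: the level itself, the next finer
    level downsampled, and the next coarser level upsampled."""
    gathered = []
    for k, g in enumerate(pyramid):
        channels = [g]
        if k > 0:  # a finer neighbour exists
            channels.append(downsample(pyramid[k - 1]))
        if k + 1 < len(pyramid):  # a coarser neighbour exists
            channels.append(upsample(pyramid[k + 1]))
        gathered.append(channels)
    return gathered


# Three-level pyramid: 4x4, 2x2, 1x1.
pyramid = [
    [[float(4 * i + j) for j in range(4)] for i in range(4)],
    [[1.0, 2.0], [3.0, 4.0]],
    [[10.0]],
]
gathered = multigrid_gather(pyramid)
```

Because the coarsest level summarizes the entire grid, information can travel between any two locations in a number of layers that grows with the number of levels (logarithmically in grid size), rather than linearly in grid width as in a single-scale convolutional stack. This is the sense in which the topology provides short internal routing pathways.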
Second, we introduce a novel processing scheme that helps enable domain-agnostic neural architectures. The domain-agnostic property is the ability to handle data regardless of its nature, whether 1D, 2D, or higher-dimensional. We achieve this property through transformations between spaces based on the Hilbert curve and positional encoding, which preserve the locality of the data in its original space. The data is then processed simultaneously and complementarily in multiple sub-spaces. Experiments on tasks involving 1D data, 2D data, or a combination of both demonstrate the effectiveness and genericity of the proposed method.
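The abstract does not spell out the exact mapping. A minimal sketch of the Hilbert-curve part shows how a 2D grid can be read into a 1D sequence (and back) while preserving locality. It assumes a square grid whose side is a power of two and uses the standard iterative index-to-coordinate conversion. Consecutive sequence elements are always adjacent grid cells, and nearby indices tend to stay spatially close. The function names are hypothetical. The positional-encoding component and the multi-sub-space processing are not shown.

```python
from typing import List, Tuple


def _rot(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve inside it is oriented correctly."""
    if ry == 0:
        if rx == 1:
            x, y = n - 1 - x, n - 1 - y
        x, y = y, x
    return x, y


def d2xy(n: int, d: int) -> Tuple[int, int]:
    """Map index d along the Hilbert curve to (x, y) on an n x n grid (n a power of 2)."""
    x = y = 0
    s, t = 1, d
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        x, y = _rot(s, x, y, rx, ry)
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Inverse of d2xy: map (x, y) to its position along the Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rot(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_flatten(grid: List[list]) -> list:
    """Read an n x n grid (indexed grid[x][y]) into a 1D sequence in Hilbert order."""
    n = len(grid)
    return [grid[x][y] for x, y in (d2xy(n, d) for d in range(n * n))]


def hilbert_unflatten(seq: list, n: int) -> List[list]:
    """Write a length n*n sequence back onto an n x n grid."""
    grid = [[None] * n for _ in range(n)]
    for d, v in enumerate(seq):
        x, y = d2xy(n, d)
        grid[x][y] = v
    return grid


# Flatten a 4x4 "image" so a 1D model could process it, then restore it.
image = [[4 * i + j for j in range(4)] for i in range(4)]
seq = hilbert_flatten(image)
restored = hilbert_unflatten(seq, 4)
```

Unlike raster (row-by-row) flattening, which separates vertically adjacent pixels by a full row, the Hilbert ordering never jumps between non-adjacent cells. This is why it suits moving data between 1D and 2D representations without destroying local structure.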
Third, we investigate the effect of several new local operators absent from the original Multigrid Neural Memory architecture. In particular, we study different variants of comparison operators: self-comparison, cross-scale comparison, split-channel comparison, and spatial comparison. Among these operators, split-channel comparison yields consistent improvements in our experiments, especially in settings with multiple sources of information.