Architectural Constraints on Feature Emergence in Neural Audio Models

2024 – Present, Yale University

Project Overview

Through their mathematics, modern neural architectures define, in advance, which internal representations they can reliably stabilize. This project investigates how architectural structure constrains the representational capacity of neural audio models.

I distinguish between a model’s latent ontology (the representational affordances implied by its architecture) and its operational ontology (the internal structures that actually stabilize under training). While training determines which features emerge, architecture determines what kinds of features are even capable of becoming independently controllable internal variables. This distinction allows us to separate limitations imposed by data from those imposed by model design.

Methodologically, this work combines mathematical analysis of operator structure with intervention-based probing of trained models. Across a family of neural audio models, I show that compression and downsampling structurally induce feature entanglement, creating a representational ceiling independent of scale. Certain oscillatory features, despite being foundational to human semantic descriptions of sound, rarely stabilize as internal variables except in sparse edge cases. In the models I have studied, increasing model or dataset size does not eliminate the entanglement imposed by these architectural constraints.
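As a toy illustration of how downsampling alone can merge features that were separate at the input, consider the minimal sketch below. It is not the analysis used in this project: it assumes a plain stride-2 subsampling with no anti-aliasing filter, and the signal length, the frequency bins, and the `downsample` helper are hypothetical choices made only for the example.

```python
import numpy as np

# Hypothetical toy setup: 64 samples, two cosine "features" at different bins.
N = 64
n = np.arange(N)
k_low, k_high = 5, 27  # assumed bins, chosen so that k_high = N/2 - k_low

x_low = np.cos(2 * np.pi * k_low * n / N)
x_high = np.cos(2 * np.pi * k_high * n / N)


def downsample(x, stride=2):
    """Keep every `stride`-th sample, with no filtering beforehand (assumption)."""
    return x[::stride]


# At full resolution the two signals are clearly different.
distinct_before = not np.allclose(x_low, x_high)

# After stride-2 subsampling they alias onto the same sequence.
identical_after = np.allclose(downsample(x_low), downsample(x_high))

print(distinct_before, identical_after)
```

Both flags evaluate to `True`: the two inputs differ before the stride is applied and coincide after it, so no amount of downstream capacity could tell them apart from the subsampled signal alone. Real architectures use learned filters before striding, so this only shows the mechanism in its simplest form.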

Taken together, this research argues that steerability and safety are not purely matters of training dynamics, data, or scale. They are also bounded by the representational affordances embedded in architectural design.

Check out my dissertation blog

Publications & Talks

*The paper and talks below are based on a very early, cursory version of this project. Since then, I have developed the work in much greater detail.

Publications

N. Cosme-Clifford, “Decomposing Audio into Timbral Features with Convolutional Neural Networks”, 2024 IEEE International Conference on Big Data (BigData), Washington D.C., 2024, pp. 3168-3173.

Talks

IEEE BigData 2024, Washington D.C. (Dec. 2024)
Acoustical Society of America Annual Meeting, Virtual (Nov. 2024)
New England Conference of Music Theorists, Boston University (Apr. 2024), keynote panel