Co-Creativity in Music, Sound, and AI
Improvisation, Interaction, Composition
Artificial intelligence is transforming contemporary music, sound, and audiovisual practices. This conference explores co-creativity as a dynamic interaction between human and computational agents, focusing on improvisation, interaction, and composition.
Conference Schedule
Keynote Lectures
Artificial intelligence systems such as Large Language Models learn systems of representation by ingesting vast corpora of human-authored texts. Through attention mechanisms in Transformer architectures, these systems evaluate relationships among tokens, generating context-aware probabilistic structures embedded in high-dimensional semantic spaces.
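The attention step mentioned above can be illustrated with a minimal sketch of single-head scaled dot-product attention, softmax(QKᵀ/√d)·V, the core operation inside Transformer layers. This is a simplified illustration in plain Python with tiny hand-made vectors. Real models add learned projection matrices, many heads, and optimized tensor libraries, and none of that is shown here.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Single-head attention: softmax(Q K^T / sqrt(d)) V, one query at a time."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # The output is the attention-weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Three toy "tokens" with 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(scaled_dot_product_attention([[1.0, 0.0]], keys, values))
```

Each output is a context-dependent mixture of the value vectors. This is the sense in which the model "evaluates relationships among tokens": keys that resemble the query receive more weight.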
This keynote examines how such processes enable AI systems to infer implicit rules governing complex representational systems. It compares human cognition with machine-based forms of cognition, asking whether AI can be considered cognitive and, if so, in what sense.
The talk explores the nature of creativity in AI systems in relation to human creativity, addressing both their potential and their limitations. It concludes by proposing strategies for engaging AI as a co-creative partner in ways that stimulate, rather than diminish, human creative capacities.
Artificial intelligence in music has a long history, extending back to early experiments by Lejaren Hiller and Leonard Isaacson prior to the Dartmouth Conference of 1956. Recent technological developments—particularly the use of GPUs and pre-trained transformer models—have sparked a new wave of AI-based music practices, marking a renewed phase of exploration in co-creative systems.
At the same time, these developments have generated critical responses. Alongside ethical and environmental concerns, a central aesthetic critique has emerged: that AI-generated music tends toward the average, often described as "mid." This concern is especially evident in commercial AI music platforms such as Suno and Udio. A primary focus of this keynote is to examine compositional strategies that move beyond this tendency, exploring how artists can engage AI in ways that produce distinctive and compelling musical outcomes.
The talk will also address the shifting relationship between academic and industry-based research in AI music. Whereas earlier developments in computer music were largely driven by academic research, recent advances have been led by private-sector initiatives with access to large datasets and significant computational resources. In contrast, emerging academic practices emphasize smaller datasets, local computation, and critical engagement with the ethical implications of AI.
Finally, these perspectives will be situated within the presenter's long-term work in algorithmic sound design. Systems such as Mushroom and SLURP will be discussed as examples of generative approaches to sound processing, along with new possibilities for integrating AI into these established compositional frameworks.
Panel Discussion
This roundtable brings together scholars and music industry professionals to examine the evolving role of artificial intelligence in musical creativity. Topics include the impact of AI on institutions, industries, and labels, as well as its influence on collaboration, genre formation, and artistic practice. By bridging theoretical and practical perspectives, the panel aims to foster an open and dynamic exchange while encouraging active audience engagement.
Kathryn Agnes Huether is a Postdoctoral Research Associate in Antisemitism Studies at UCLA. Her research examines sound as a political and cultural force, connecting Holocaust and Genocide Studies with sound studies and media theory. She investigates how sonic practices mediate trauma, violence, and collective memory, and how listening becomes a site where ideologies are encoded and contested.
Her recent work extends to AI, authenticity, ethics, and voice, exploring how algorithmic systems reshape presence and testimony. She holds a Ph.D. in Musicology from the University of Minnesota and an M.A. in Religious Studies from the University of Colorado Boulder.
Frank Duchêne is a Belgian music producer, sound designer, and lecturer whose work blends musical practice, recording technology, and critical analysis. He began his career as a recording artist with Hooverphonic (Columbia Records) and served as an in-house engineer at Galaxy Studios. He then established a long-standing freelance practice as a producer, mixer, and engineer for artists, record labels, and audiovisual media.
For more than 18 years, Duchêne has been a key faculty member at PXL University of Applied Sciences and Arts in Belgium, teaching Music Production and overseeing international projects. His academic focus includes critical listening, production methods, and the evolving relationship between musical creativity and recording technologies—spanning analog, digital, and hybrid workflows.
He handles curriculum development and assessment and contributes to research on sound production as both an analytical and a creative practice. Outside academia, Duchêne remains active in music production and audio post-production, working on records, documentaries, podcasts, and radio plays. More recently, he has created location-based audio experiences for museums. His current interests focus on how emerging technologies—particularly AI-assisted tools—affect creative authorship, aesthetic judgment, and collaboration in modern music and sound production.
Mesmi is an artist, songwriter, producer, and consultant based in Los Angeles. Over the years, she has expanded from her singer-songwriter origins into the recording studio, gaining skills in production, engineering, and mixing. After placing in competitions such as the International Songwriting Competition and GRAMMY Amplifier, Mesmi released her self-produced and self-engineered album "Slow Bloom," which the press described as "powerful, epic, yet fragile and beautiful." Independently distributed and marketed, the LP has since accumulated more than 325,000 plays on streaming platforms worldwide and led to her first television music placement.
Mesmi was recently honored to be personally chosen and mentored by producer 9th Wonder (Jay-Z, Kendrick Lamar) as part of Sophia Chang's Unlock Her Potential program, where she currently co-leads the UP Music Industry Chapter. She was also selected for the inaugural cohort of Paramount/MTV Group's First Time Composers Program. Beyond her own projects, Mesmi offers specialized services such as vocal training, production, and music consulting through her company VATOCA Studios. She also founded and runs SOUND OFFF, a digital space dedicated to highlighting Asian Americans in the modern music industry, born of the need to increase Asian American visibility and strengthen ties within the community.
Amy Skjerseth's research explores intersections of music, media, material culture, and technology. Her forthcoming book Preprogrammed: How Electronic Presets Changed Music and Media (UC Press, 2026) examines the cultural impact of technological defaults from early radio to AI systems. She is also co-editor of The Routledge Companion to Voice and Identity and Principal Investigator of the UCHRI working group "Defying Defaults in Technology and Culture."
Liz Przybylski is a scholar of hip hop and the global popular music industry. She is the author of Sonic Sovereignty: Hip Hop, Indigeneity and Shifting Popular Music Mainstreams (NYU Press) and Hybrid Ethnography. Her work addresses music, technology, labor, and identity across contemporary media environments.
A recipient of an NEH Faculty Fellowship, she serves on the Board of the Society for Ethnomusicology and teaches courses on ethnographic methods, popular music, and cultural studies.
Session 1 — Posthuman Voice and Distributed Agency
This article reconfigures current debates on artificial intelligence (AI) in opera by shifting the focus from questions of authorship and machinic creativity to the infrastructural conditions that shape operatic experience. Rather than introducing mediation into opera, AI reveals mediation as its constitutive foundation, foregrounding the distributed systems that sustain voice, presence, liveness, and authority.
Drawing on media theory, performance studies, and posthuman thought, the study proposes a four-register model of vocality—voice as informational pattern, corporeal grain, iterable trace, and objectal surplus—to analyze how digital infrastructures redistribute agency across human and computational actors. Through this framework, opera emerges as a historically composite media system in which voice is never fully anchored in a singular body but circulates across technological, institutional, and perceptual networks.
The central case study, chasing waterfalls (Semperoper Dresden, 2022), stages AI as a performing subject capable of generating text and vocal material in real time. This transforms liveness from a condition of embodied presence into one of procedural contingency, dispersing aura across a networked ecology of performers, systems, and audiences. A comparative analysis with platform-native vocal systems such as Hatsune Miku highlights divergent regimes of posthuman vocality, contrasting operatic risk and instability with infrastructural iteration and reproducibility.
The article concludes that AI opera redefines sustainability not as the preservation of stable works, but as the maintenance of executable systems and perceptual ecologies. In this context, opera becomes a laboratory for posthuman performance, where voice, agency, and presence are continuously reconfigured within evolving technological environments.
Within the Euro-American art-music tradition, creativity has largely been framed through humanist paradigms privileging individual authorship, intentionality, and stylistic innovation. The growing use of generative AI (GenAI) and machine learning (ML) in composition and performance challenges these assumptions and calls for a rethinking of creative agency.
This paper asks: how does the integration of GenAI into contemporary musical practice reshape musicological concepts of authorship, performance, and agency?
The study develops a posthumanist framework drawing on Donna Haraway's situated hybrid subjectivities, Rosi Braidotti's posthuman subject, and Deleuze and Guattari's notion of assemblage. Dominic Pettman's reflections on vocal relationality further inform the analysis, foregrounding the voice as a site where species, technology, and affective proximity intersect.
Methodologically, the paper combines philosophical inquiry with musicological analysis, focusing on vocal technique, performer-interface interaction, improvisational structures, and the role of artificial neural networks (ANNs) in shaping musical form and timbre. Particular attention is given to how agency is distributed across composers, programmers, performers, ANNs, and technological infrastructure in live contexts.
The theoretical framework is applied to the analysis of two case studies: Tomomibot, by Tomomi Adachi, Andreas Dzialocha and Marcello Lussana, and ULTRACHUNK, by Jennifer Walshe and Memo Akten. In these vocal improvisations between humans and ANNs, voice and body are diffracted through a technologically mediated space and connect rhizomatically with each other, reconstituting themselves in a socio-technical assemblage of co-creation comprising humans, technology, and the shared environment.
The analysis focuses on the co-creative interaction between humans and computers. In these improvisations, real-time vocalizations intertwine with sound outputs generated by ANNs in a distributed co-construction in which participants hold no primary or secondary roles but act as interactive nodes within an interconnected network. This hybridization of human and non-human actors raises the question of whether "artificial creativity" can be understood not as simulation but as a materially embedded process of distributed agency, with improvisational structures that take shape precisely through human-AI interaction. On one side are highly experienced performers of experimental extended vocality. On the other are artificial voices produced by ML algorithms (unsupervised training with GANs and variational autoencoders in ULTRACHUNK, and Long Short-Term Memory networks in Tomomibot), trained on pre-existing musical material and capable of operating as actants by virtue of their non-human agency.
The voice plays a specific role as a privileged bridge between biology and technology, which, through 'posthuman listening', can be reimagined not as opposing and mutually exclusive poles but as elements situated along a continuum.
The contribution of this study is to propose a posthuman redefinition of musical creativity that integrates philosophical theory with close analysis of contemporary experimental vocal practice, offering new conceptual tools for understanding AI-mediated composition and performance within musicology.
The argument put forward in this paper is that these performances represent an attempt to inhabit the extimacy constitutive of both voice and subjectivity: an inner-outer space within us that is constantly shared and traversed by others, with whom we interact to rethink and shape new ways of being together.
This paper introduces Material Synthesis Composition (MSC), a methodology for sonic co-creativity in which relational material substrates serve as primary compositional input alongside human performers and artificial intelligence and machine learning (AI/ML) systems. Material substrates include organic matter and inorganic data derived from situated cultural accumulations. MSC emerged from Speculocultural Technopoiesis (ST), a framework developed through the author's doctoral research that examines how Black speculative traditions and sonic epistemologies can guide the modification and design of digital audio technologies.
MSC belongs to a longer lineage of Black creative-technological practice. Sun Ra's Arkestra, whose self-mythology fused Afrofuturist cosmology with experimental electronics, modeled how Black artists can occupy and redefine the space of technology on their own epistemic terms. Alvin Lucier's I Am Sitting in a Room demonstrated that material environments are themselves compositional agents. Black performance artists later extended this logic on cultural grounds. Okwui Okpokwasili's on the way, undone, a processional work responding to Simone Leigh's Brick House, stages the Black body moving through public space as both archive and instrument, encoding embodied cultural memory in the act of transit. More recently, Rashaad Newsome's practice has made these stakes legible at the level of AI. From Shade Compositions, which treats Black vernacular gesture as compositional system, to Being, an AI griot trained on texts by bell hooks, Audre Lorde, and Cornel West, Newsome shows that who trains AI, and on what, is already an aesthetic and ethical question. MSC takes up that question as a compositional one.
The case study is Organic Memory (Triptych), a spatial composition currently in development. Three interactive installations generate sound through shared material transformation: substrate vibrations via piezoelectric sensors, sonic residue from dissolution in water via hydrophones, and gestural properties of hair via computer vision. Each movement uses distinct AI/ML tools for classification, corpus querying, and co-improvisation, including Somax2. These systems are trained on culturally specific corpora, including NASA data sonifications and African-American spirituals. A composer-defined motif threads through all three movements, transformed by interaction and AI elaboration. We draw on findings from early prototype testing and lay out the conceptual scaffolding guiding the triptych's completion.
We advance three contributions to co-creativity discourse. First, we show how material affordances, read through their cultural epistemic situatedness, generate compositional structure when abstracted via sensor data. We call this "material synthesis." The concept shares ground with spectral music's acoustic materialism, but where spectralism tends toward acoustic universalism, material synthesis insists on cultural specificity. Second, we show how AI systems trained on culturally grounded corpora mediate between heterogeneous material languages, translating from earth rhythm to water texture to gestural melody. Third, we examine how Julius Eastman's concept of Organic Music is realized through relational materiality and intra-actions that become computational and compositional input for distributed human-material-machine authorship.
Compositional intelligence emerges through negotiations among culture, material, and algorithm in acts of listening and transduction. Meaningful co-creativity can only exist because such entanglements do.
Session 2 — AI Systems and Co-Creative Practices
Field recordings and passive acoustic monitoring (PAM) generate large archives that provide acoustic windows into ecosystem interactions. Alongside their scientific and aesthetic value, these recordings provide a creative corpus from which to draw sound material that is intimately tied to specific locations. A key challenge is how such large recording sets can be used musically, particularly in a live improvisation setting. Here I explore two approaches to developing systems for co-creativity using algorithmic composition and AI to traverse large recording archives. The first approach is the use of algorithmic systems for recording analysis and playback that respond to real-time inputs such as performer audio or listener presence.
The second is the use of autoencoders, in either the preparation or the performance stage, to facilitate real-time interaction with the recording corpus. These approaches build on innovations in soundscape ecology (such as the use of acoustic indices and PAM) and leverage tools for real-time music creation such as RAVE to open new possibilities for both music composition and deeper listening to soundscape recordings.
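As an example of the acoustic indices mentioned above, the following sketch computes a simplified Acoustic Complexity Index (ACI), a widely used soundscape-ecology index. For each frequency bin, it sums the absolute intensity differences between adjacent time frames, divides by that bin's total intensity, and then sums across bins. It assumes a precomputed magnitude spectrogram (a list of frequency bins, each a list of intensities over time) and treats the whole clip as a single temporal window. It is an illustration only, not the index implementation used in the projects described here.

```python
def acoustic_complexity_index(spectrogram):
    """Simplified ACI over a magnitude spectrogram.

    spectrogram: list of frequency bins, each a list of non-negative
    intensities over time frames (treated as one temporal window).
    """
    total = 0.0
    for bin_intensities in spectrogram:
        energy = sum(bin_intensities)
        if energy == 0:
            continue  # silent bins contribute nothing (avoids division by zero)
        # Frame-to-frame intensity change within this frequency bin.
        variation = sum(abs(a - b) for a, b in zip(bin_intensities, bin_intensities[1:]))
        total += variation / energy
    return total


steady = [[1.0, 1.0, 1.0, 1.0], [2.0, 2.0, 2.0, 2.0]]
chirpy = [[0.0, 3.0, 0.0, 3.0], [2.0, 2.0, 2.0, 2.0]]
print(acoustic_complexity_index(steady), acoustic_complexity_index(chirpy))
```

Constant bins score zero, while rapidly fluctuating bins raise the index. For this reason ACI is often read as a rough proxy for biophonic activity such as birdsong rather than for steady background noise.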
Two projects illustrate these approaches. Resonance Ecology takes an algorithmic approach to facilitating performer interaction with large recording sets. The system mirrors the design of an ecosystem, with audible actions triggering reactions in the system (e.g., the sound of a performer prompts the system to play different recordings or to process a sound differently). The algorithmic system analyzes frequency, amplitude, and a variety of timbral descriptors to track sonic change over time and to make probabilistic assessments about the current state of the sonic ecosystem; these data points then inform the algorithm's choices as it navigates the recording archive. The performer interacts with the system through their musical performance. The score is open, giving the performer agency to respond to the sounds they hear and to influence the algorithmic system.
Sonifying the Arctic links large PAM datasets with weather data through autoencoders to extend the system beyond the original recording period. For this realization, eight months of PAM recordings from Iceland's national parks were used to create a sonification system that extends across more than two years. Autoencoders are also used during performance: a RAVE model trained on the PAM recordings is traversed through performer and data-driven instrument input, such as a motion controller.
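The descriptor-driven archive navigation described for Resonance Ecology might look roughly like the following sketch. It is hypothetical and is not the piece's actual code. The sketch computes two simple descriptors for a live audio frame: RMS amplitude and zero-crossing rate, a crude noisiness cue. It then compares them with precomputed descriptors for archive recordings and picks a recording at random, with probability weighted toward the closest match. The archive names, descriptor values, and the `temperature` parameter are invented for illustration.

```python
import math
import random


def rms(frame):
    """Root-mean-square amplitude of one audio frame."""
    return math.sqrt(sum(x * x for x in frame) / len(frame))


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (crude noisiness cue)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def describe(frame):
    """Descriptor vector for a frame: (RMS, zero-crossing rate)."""
    return (rms(frame), zero_crossing_rate(frame))


def selection_weights(live, archive, temperature=0.1):
    """Softmax over negative descriptor distances: closer recordings get more weight."""
    names = list(archive)
    scores = [-math.dist(live, archive[name]) / temperature for name in names]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return {name: e / total for name, e in zip(names, exps)}


def choose_recording(frame, archive, rng=None):
    """Probabilistically pick the next archive recording to play."""
    rng = rng or random.Random()
    weights = selection_weights(describe(frame), archive)
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical archive: recording name -> (RMS, zero-crossing rate).
archive = {"rain": (0.1, 0.6), "birds": (0.5, 0.3), "wind": (0.05, 0.05)}
live_frame = [0.4, 0.6, -0.5, 0.5, 0.45, -0.55, 0.5, 0.5]
print(choose_recording(live_frame, archive, rng=random.Random(1)))
```

Sampling from a weighted distribution, rather than always taking the nearest match, keeps the system's responses related to the performer's sound without making them fully predictable. A lower `temperature` makes the choice more deterministic.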
Together, these approaches demonstrate how AI-mediated systems can transform large environmental recording archives into interactive frameworks for improvisation and co-creative musical performance. They also show how sonic systems can aid in the processing, analysis, and understanding of large recording corpora.
This paper introduces Glitch Voice, a real-time neural effect unit and aesthetic inquiry designed to deconstruct semantic speech into a non-meaning-making "glitched" vernacular. Current research in neural audio synthesis predominantly prioritizes high-fidelity replication and semantic clarity, and these frameworks often erase the paralinguistic meaning and semiotic space inherent in human vocalization: involuntary physiological tremors, breathiness, and the raw, unpolished textures of the vocal apparatus that resist linguistic organization. Using IRCAM's RAVE architecture within Max/MSP, this aesthetic research seeks to develop a real-time effect unit that transforms semantic voice into visceral, embodied sonic output.
The system is trained on two carefully curated datasets of non-semantic vocal "outliers": "Flow" (sustained drones) and "Burst" (emotional bursts). To facilitate intuitive control over the sonic output, a custom pressure-sensitive interface was developed. By modulating their grip, the performer can easily morph between these two states, effectively translating semantic speech into a visceral, glitched vernacular.
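The grip-controlled morph between the "Flow" and "Burst" states could be approached as interpolation between two points in a model's latent space. The sketch below is hypothetical. It assumes a pressure sensor reporting raw integer readings (a 0 to 1023 range is assumed here, as from a 10-bit microcontroller ADC), smooths the normalized pressure with a one-pole filter to reduce jitter, and linearly blends two stand-in latent vectors. It does not call RAVE, and the vectors are placeholders rather than values from the project.

```python
def normalize_pressure(raw, raw_min=0, raw_max=1023):
    """Map a raw sensor reading to [0, 1], clamping out-of-range values."""
    t = (raw - raw_min) / (raw_max - raw_min)
    return min(1.0, max(0.0, t))


class GripMorph:
    """Blend between two latent vectors according to smoothed grip pressure."""

    def __init__(self, flow_latent, burst_latent, smoothing=0.8):
        if len(flow_latent) != len(burst_latent):
            raise ValueError("latent vectors must have the same length")
        self.flow = list(flow_latent)
        self.burst = list(burst_latent)
        self.smoothing = smoothing  # 0 = no smoothing; closer to 1 = slower response
        self.blend = 0.0  # 0.0 = pure "Flow", 1.0 = pure "Burst"

    def update(self, raw_pressure):
        """Take one sensor reading and return the interpolated latent vector."""
        target = normalize_pressure(raw_pressure)
        # One-pole low-pass filter on the blend amount.
        self.blend = self.smoothing * self.blend + (1.0 - self.smoothing) * target
        return [(1.0 - self.blend) * f + self.blend * b for f, b in zip(self.flow, self.burst)]


# Placeholder 4-dimensional latents; a real model's latent size would differ.
morph = GripMorph([0.0, 0.2, -0.1, 0.5], [1.0, -0.4, 0.8, 0.0])
for reading in (0, 512, 1023, 1023, 1023):
    print([round(x, 3) for x in morph.update(reading)])
```

In a real system, the returned vector would be passed to a decoder at each control step. The smoothing factor trades responsiveness to the grip against stability of the output.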
This work proposes a framework for "Neural Transcoding," where the machine functions as a neural mirror that re-interprets the performer's vocal energy through latent space. By centering the aesthetic output on the outliers of vocal expression—the gasp, the friction, the stutter, the scream, etc.—the system intends to process voice based on the subconscious layer of language. This research contributes to the field of performance studies by providing a low-latency, performative tool that bridges the gap between generative audio models and live embodied expression. Building upon a lineage of radical vocal exploration (e.g., Yoko Ono, Trevor Wishart, Pamela Z), this project also serves as an aesthetic exploration of the embodiment of machine learning tools.
For metal musician BOI WHAT, the world ends not with a bang or a whimper, but with the soundscape of SpongeBob SquarePants. BOI WHAT's best-known song, "Neon Tide," is chiefly "sung" by the SpongeBob character Plankton and is about masterminding an apocalypse. BOI WHAT is one of many online creators using AI-generated voices (Plankton, in this case) in musical contexts. While most such creations are "AI covers," simulating a well-known song sung by an equally well-known pop-culture character, "Neon Tide" shows how emerging AI technology can also be used to create original compositions that draw on these familiar media materials. BOI WHAT leverages the uncanniness of AI-assisted voice modulation and his audience's familiarity with SpongeBob's aesthetics to craft a surreal apocalyptic narrative.
Art critics often see AI as an inherent threat to creative expression and personal narratives within art. In this talk, however, I will argue that the AI elements of "Neon Tide" demonstrate one way that use of AI technology can expand the narrative and expressive complexity of textual music. I demonstrate this claim by placing audio-based AI technology within the context of "remediation," as proposed by Jay David Bolter and Richard Grusin, and use this context to illustrate AI advancements as part of a growing trend of hypermediacy within both popular music and digital interaction with art as a whole. As artist and audience become closer than ever before online, it becomes all the more relevant to address the ways in which digital musicians place the audience's knowledge of culture, context, and even the artistic process at the forefront, and use the growing accessibility of technology to achieve this.
Additionally, through analysis of the vocal performance, stylistic choices, and lyrics, I argue that this technology and its online context are a unique force in their own right, connecting the elements of the song and helping to create a cohesive narrative that would not be possible without it. I show this by laying out a conceptual integration network, as theorized by Nicholas Cook, to clarify the layers of meaning within the song. I then analyze the song within this framework, highlighting the use of breath in the vocal performance, production techniques in the instrumentation, and lyrical in-jokes, to establish that the use of AI voice changers and generators is integral to understanding the song's overall meaning. I also draw on BOI WHAT's own statements about the song's creation to show how, in his words, the use of AI informs his vocal delivery and production choices. Though it is only one layer of his process, AI technology both limits and expands the way he creates his sound.
AI in music is here to stay. Analyzing its real-world practice is essential to predicting its uses, both problematic and productive, as it develops further.
Session 3 — Cultural, Political, and Economic Implications of AI Music
Drawing on music studies, media analysis, digital ethnography, and political theory, I analyze how users circulate AI-generated songs as symbolic content. "We Are Charlie Kirk," created by the anonymous act Spalexma, encodes Christian and nationalist values that function like coordinated messaging even without top-down coordination. Its genre choices — contemporary Christian worship and country anthems — carry racialized and class-coded meanings, historically functioning as sonic markers of white, rural, and working-class identity. AI tools, trained on datasets that reflect existing cultural stereotypes, replicate these associations, constructing an imagined audience as white, Christian, and economically aggrieved. Rather than neutralizing cultural bias, AI-generated music magnifies it, producing identity-coded content.
Ironic remixes of such songs can reinforce the narratives they appear to subvert: oppositional recuts retain the original melodic hooks while underscoring the shaping of cultural identity and group thinking. Timbre and arrangement carry ideological weight, a dynamic visible in historical parallels such as the Nazi promotion of martyrdom through song and country music's role in post-draft military recruitment — cases where musical familiarity lowered resistance to political messaging.
AI music generation now produces a new form of participatory mythmaking without direct state control, paralleling how 20th-century fascist governments fused messaging with expanding radio infrastructure. AI composition tools automate the replication of genre markers that musicologists identify as community-bonding devices. As tools such as Suno, AIVA, and Udio become normalized in classrooms and culture, debates over "responsible" inclusion obscure how AI is fundamentally reshaping identity construction and our sense of reality.
This dynamic is underscored by the circulation of Iran-aligned AI-generated LEGO rap videos, in which youthful aesthetics and rap genre conventions are deployed to deliver ideologically charged messaging to American audiences. The accessibility and low cost of AI-generated music further incentivize state-adjacent and independent actors to produce compelling content, lowering the barrier to sophisticated influence operations and making the cultural landscape increasingly difficult to navigate.
In current music production pipelines, composers are often asked both to replicate styles and to quickly produce convincing, publication-ready performance sequences. Because these tasks involve largely mechanical and technical procedures, automated-music algorithms emerge as a potentially efficient solution. In film and video, for example, when the specific fit that temp clips or stock music achieve in the montage leads directors to request equivalents, the structural and stylistic possibilities tend to be limited to the musical features of the references. Such re-elaboration of musical structures under genre constraints has often occurred in commercial and popular music when musicians draw inspiration from a particular piece or author. Analogously, emerging algorithms can perform style replication from reference tracks or text captions. Private research funding for AI music is surging to fill a potential niche in the music production business.
As in other fields, human labor in music faces the impending possibility of replacement by automation. Business models for commissioned music may exploit vague definitions of authorship in current copyright law as it applies to synthetic music. Drawing on the existing legal framework, this paper explores possible scenarios for the adaptation, assimilation, and revision of the concept of music authorship in light of AI music. I begin by describing several perspectives on the reception of synthetic music that have proven commercially viable, and I examine current contractual frameworks for composers to find overlapping parameters. Next, I use legal cases to illustrate how style replication among human musicians falls into the blurry zone between copyright infringement and fair use. Bringing these perspectives together, I gather judicial and legal readings of copyright for AI-generated material and explore current and potential plagiarism scenarios to interrogate our understanding of authorship. Finally, I propose mechanisms for, and predictions about, how AI technologies may be incorporated into business practices and the legal frameworks governing authorship.
Committees & Support
Scientific / Program Committee
Organizing Committee
Institutional Support