With ecologist and engineering colleagues I am working on the development of acoustic indices as a tool for rapid biodiversity assessment, and the development of low power hardware meshworks to create acoustic biodiversity monitoring systems. This work is driven by an urgent, global need for efficient and effective ways to monitor the planet’s critically endangered species, and to understand and evidence the impact of extractive industries on the fast disappearing pristine rainforests. In the race to develop new acoustic indices, fundamental oversights are being made regarding basic properties of digital audio recording which seriously compromise the effectiveness of this potentially transformative tool. As well as the need for more technical research, ethical considerations around the use of pervasive acoustic monitoring in the wild are yet to make it onto the agenda.
Humanising Algorithmic Listening might mean exploiting (rather than overlooking) the fundamental differences between machine and human listening in order to design meaningful high level algorithms in new application domains.
Humanising Algorithmic Listening might mean considering the rights of humans even where the wellbeing of other species is the primary concern.
A similar enthusiasm for computational methods to afford ‘distant or close listening’ is shared by researchers in the Digital Humanities. The fundamentally aural nature of poetry and oral history have traditionally been overlooked in humanities research, where textual sources are most common. Interviews are transcribed; the semantics of the written word is studied but the expression, intonation and rhythm voice is seldom preserved. Just as musically-meaningful features have been built from low level audio features for application in Music Information Retrieval, there is a huge potential for humanities researchers to engage with large oral and sonic archives in new ways with algorithmic methods. Responses to an exploratory workshop in the application of machine listening in Oral History that we ran at DH2016 and for arts and humanities doctoral students suggest this is a rich future research direction.
Enthusiasm over new applications of machine listening in research is balanced by social anxiety over their increasing incorporation into everyday consumer devices. Amazon’s Echo, Google’s Home, Apple’s, Siri, Microsoft’s Cortana, and Matell’s Hello Barbie are all part of an emerging range of voice-activated products that record audio and conversations from phones, wearables, and in-home agents. Recent controversy over access to the recording archive of an Amazon ECHO in a US murder trial raises questions around privacy and the Internet of Things in general, but these ‘always-on’ listening devices are seen to be particularly problematic: They are pervasive, appearing in all aspects of our lives, and able to listen in all directions; they are persistent, with no current legislation over how long records are stored for; and they process the data they collect, seeking to understand what people are saying and acting on what they are able to understand.
This insidious surveillance contributes to widespread social anxieties around automation as it impacts both labour and recreational activities. This automation anxiety is a primary research topic for colleagues in our lab, where doctoral students are also questioning the ethical implications of these always-on listening devices in every-day consumer products through practice based research.
Mattell’s controversial Hello Barbie features speech recognition and progressive learning features to create “the first fashion doll that can have two-way conversation with girls”. The girls’ conversations are stored on a remote server. It is not clear if it will also talk with boys.
Part of this social anxiety stems from the unknown. There is no equivalent of an ingredients list for these algorithmically-enhanced consumer devices; commercial companies are not required to disclose their algorithms. From a public perspective, these are black boxes, the details of their operation is a Great Unknown. In some cases, the workings of bleeding edge machine listening and learning algorithms are equally evasive to the software developers that created them. For example, in the case of multi-layered neural networks used in Deep Learning, the maths is well understood, but in many cases we don’t know why a particular model is successful and we can’t predict it’s response to a particular input without actually trying it.
The extraordinary minds at Deep Mind are no doubt developing analytical understandings and make a concerted effort to communicate the science of deep learning. In a poetic turn, others are attempting to deepen our understanding of how deep learning for machine listening operates by listening to the learned features in a convolution neural network (check out Susurrant too). Through sonification, rather than formal analysis, we can gain an intuitive understanding of how each layer functions, in sonic terms. In a later blog post, network member David Kant will introduce his Happy Valley Band project which unpacks source separation and other machine listening algorithms by deconstructing the Great American Song Book and performing human recompositions of machine decompositions live.
A belief in the value of practice-based, creative methods in enriching our understanding of technological mediation is common to many network members. And machine listening has been extensively explored and developed in interactive music systems for decades. By imbuing computers with even simple pitch and amplitude tracking abilities we can build software instruments which we can interact with in fundamentally different ways from traditional acoustic, electronic, or even manually controlled digital instruments. This was my first introduction to machine listening. As a cello-playing PhD student I was interested in how to build software performance systems which could convincingly improvise with a human instrumentalist, the litmus test being the ability to make convincing musical responses and more ambitiously, suggestions. When performing with, rather than formally analysing software systems we gain a different, experiential, understanding of machine listening, and its role in melding human-machine agencies. This will the main focus for workshop 2 and the topic of a future blog post by Paul Stapleton.
The range of concerns and ambitions present today, resonate with positions held throughout the history of Philosophy of Technology: utopian views of technology post-enlightenment; twentieth century dystopian warnings on the need to limit the rush of technological invention (Habermas, Hans Jonas etc.) to contemporary views that we are natural born cyborgs and inevitably intertwined with technologies of all kinds (Clarke, Latour, Haraway etc.).
The aim of this network is not to reconcile these different perspectives, but to curate conversations across contemporary research communities in order that technical advances, practical ambition and creative investigation mutually inform and are nourished by critical, philosophical perspectives into how technology mediates our existence. The integration of technical, philosophical, creative and practical perspectives may be messy at first, but is necessary in order to design and manage the applications of effective, ethical and meaningful listening algorithms for the future.
Follow us on twitter to get alerts for future blog posts.
Alice Eldridge INTRODUCTIONS