The Intellistener Project, an overview


The Intellistener project revolves around annotated audio. It deals with creating annotations over audio tracks, maintaining a collection of rich audio files, creating a mesh of links between those files, and playing the audio back constrained by this mesh.

The project grew out of a listening behavior I discovered in myself. While most radio is consumed only passively, almost as a background pattern, I found a special kind of radio that invites much more intense, active listening. This type of radio invokes a higher level of interest by letting interviews last for over 20 minutes, so that they become conversations. It plays songs through to the end; it leaves in pauses to let the listener think and contemplate. It therefore also attracts a special kind of listener, far less mainstream. My favorite radio program, de avonden ("the evenings", Dutch public radio by the VPRO), deals with cultural subjects and features interviews with writers, artists and philosophers. But I believe this kind of slow radio can also be found in other subject-specific fields (e.g. science, social politics or theatre).

While listening intensely, it struck me that many topics addressed in these radio shows have close relations to each other. Different arguments are raised in different conversations with different people over a period of years. I often felt a need to quickly jump back to a previous recording, to review (or relisten to) the exact statement. And I felt the need to share this with other people: "Hey, listen to this. It might be related to our previous discussion." These observations and wishes seeded the plan for a system of annotated audio.

Annotated audio

Markers in an audio recording, or side notes "in the margin", should be bound to a specific time in the audio, so that the actual spoken material can be accurately accessed. I see these markers or notes as sitting in their own specific layer on top of the audio timeline. So you will have a layer with textual side notes, a layer with keywords indexing the subjects spoken about, or a layer of markers referencing other audio files. These synchronized layers of metadata, relevant to the bare audio file, should be bundled with that audio file in an atomic annotated-audio wrapper. This allows the now annotated audio files to be shared and edited independently. This structure of markers in layers and layers on top of files is similar to what the written medium (e.g. books, journals, newspapers) has had for a long time: title sheets, tables of contents, headings, footnotes, page numbers, a margin to scribble things in, indexes, and references.
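As a rough illustration of this layered structure, here is a minimal sketch in Python. All names (Marker, Layer, AnnotatedAudio, notes_between) are hypothetical and only illustrate the idea of time-bound markers in layers bundled with one audio file; they are not the actual Intellistener data model.

```python
from dataclasses import dataclass, field

@dataclass
class Marker:
    """A note bound to a specific time (in seconds) on the audio timeline."""
    time: float
    text: str

@dataclass
class Layer:
    """One kind of metadata (side notes, keywords, references) as its own layer."""
    name: str
    markers: list[Marker] = field(default_factory=list)

@dataclass
class AnnotatedAudio:
    """The bare audio file bundled with its synchronized annotation layers."""
    audio_path: str
    layers: list[Layer] = field(default_factory=list)

    def notes_between(self, start: float, end: float, layer_name: str) -> list[Marker]:
        """Accurately access the markers covering a stretch of spoken material."""
        for layer in self.layers:
            if layer.name == layer_name:
                return [m for m in layer.markers if start <= m.time <= end]
        return []

# Example: a side-note layer over an interview recording
doc = AnnotatedAudio("interview.mp3")
doc.layers.append(Layer("side notes", [Marker(754.0, "quote about slow radio")]))
```

Because the layers travel inside the wrapper together with the audio path, the whole bundle can be shared as one unit, as described above.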


With the use of computers, these kinds of structures could grow into much more complex systems. You could call that hyperaudio, analogous to hypertext, the system designed by Ted Nelson1. The greater goals of this project must then also be seen along these lines: to create a system so well structured, and with so many layers of abstraction, context, semantics and reference, that it functions as a speaking collective knowledge repository which, as Doug Engelbart1 would say, augments human intellect. I envision that "dreamed" system as an artificially intelligent conversation partner. Together with the machine, the user should be able to flow seamlessly from topic to topic, from thread to thread. The ultimate goal would then be a system which, in symbiosis with the user, reaches a state of singularity, of convergence, and leaves the user, after some insightful paradigm shifts, with a feeling of euphoria about how well everything fits together: "Eureka, the world is finally understood!"


Hence the name "intellistener", derived from intelligent and listener. But also because the Latin intelligens actually stems from inter (in between) and lego, legere (to collect, to gather, to read, to read out loud or speak justice, and to overhear). So intel-ligent refers to the ability to read between the lines, to get the broader context from a collection. Intellistener therefore stands not only for a system that affords "smarter" listening, but also for one that weaves multiple threads of audio together, creating significance from their common essence.

Beta version

Currently I have an application running natively on Mac OS X. The application has two modes of operation: an editing mode and a playback mode. In the editing mode, markers are set and notes are written. In the playback mode, a graph of connected audio fragments is rendered and the user can traverse the different branches coming off the currently playing audio. By switching between editing and playback, the user can hear (and see) how his or her editing decisions turn out in the graph and judge the resulting narrative effect.
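The traversal idea in playback mode can be sketched as follows. This is a deliberately simplified model, assuming fragments form a directed graph and the listener picks one branch at a time; the fragment names and the play function are invented for illustration.

```python
# Hypothetical sketch: audio fragments as graph nodes, links as the
# branches the listener can take off the currently playing fragment.
graph = {
    "interview-A": ["essay-B", "interview-C"],  # branches off fragment A
    "essay-B": ["interview-C"],
    "interview-C": [],                          # a thread ends here
}

def play(fragment: str, choose_branch) -> list:
    """Follow one thread through the graph, letting the listener pick branches."""
    path = [fragment]
    while graph[fragment]:
        # e.g. the user clicks one of the rendered branches in the graph view
        fragment = choose_branch(graph[fragment])
        path.append(fragment)
    return path

# A listener who always takes the first branch:
thread = play("interview-A", lambda branches: branches[0])
# thread == ["interview-A", "essay-B", "interview-C"]
```

Editing decisions change the adjacency lists, so switching back to playback immediately shows (and plays) the new narrative possibilities.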


To test out the beta version of Intellistener, a series of workshops has been organized, one of which takes place at the final show. The workshops are an opportunity for the beta testers to discover the application, but also to discuss the process of annotating audio and to experiment and play with a shared collection of connected audio files. Exactly which experiment the workshop participants will engage in, or what feature, effect or narrative they will try out, is yet to be seen, or heard, actually.


Workshop participants are free to bring and share their own audio files, but content is also provided for them: material from "de avonden". All the episodes of the last four years of this VPRO program, which airs every weekday between 20:00 and 23:00 on radio 747, are available online from the VPRO website. A selection of that huge archive is available for the participants to annotate.


The current Intellistener application features two distinct kinds of annotations: a text notes annotation and a links annotation. The text notes are just that: descriptive texts on the timeline. These text notes can perform a variety of functions: they can explain the context, index arguments and propositions, or simply transcribe quotes. The links annotation is a set of markers on the timeline connected by a set of links between those markers. Both the markers and the links in between can have titles. The title of a marker indicates exactly where a link is coming from or going to (e.g. "end of answer", "begin of comment", "quote starts", "quote stops"). The titles of the links themselves can be used to semantically categorize the type of linkage (e.g. "fits in bigger theme:", "an example of this:", "concluding:"). The playback graph represents the separate threads that can be followed, based on this linkage. These two annotations (i.e. TextNotes and Links) are just two very basic kinds of annotations, but they allow for easy experimentation and valuable feedback on this beta version.
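The Links annotation described above, titled markers connected by titled links, could be modelled roughly like this. The class names, file names and the threads_from helper are assumptions for the sake of the sketch, not the application's actual internals.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LinkMarker:
    """A titled point on a file's timeline, e.g. "quote starts"."""
    audio_file: str
    time: float
    title: str

@dataclass(frozen=True)
class Link:
    """A titled connection between two markers; the title categorizes
    the type of linkage, e.g. "an example of this:"."""
    source: LinkMarker
    target: LinkMarker
    title: str

# Two markers in different recordings, semantically linked:
a = LinkMarker("avonden-episode-1.mp3", 1325.0, "end of answer")
b = LinkMarker("avonden-episode-2.mp3", 310.5, "quote starts")
link = Link(a, b, "fits in bigger theme:")

def threads_from(marker: LinkMarker, links: list[Link]) -> list[Link]:
    """The branches the playback graph can offer from a given marker."""
    return [l for l in links if l.source == marker]
```

The playback graph is then just the set of markers with these links drawn between them, and a thread is any path the listener follows along them.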


Clearly, this project has a much wider breadth than the scope of an MA final project, and the design of the project and of the application takes that future into account. That is why I try to follow the principles of participatory design and include test users at an early stage of development. The application is built for growth through its plugin architecture. The functional components of each kind of metadata (e.g. the TextNotes annotation or the Links annotation) are packaged in a plugin. These plugins, which pervade the complete application (i.e. they extend the server, the editing, and the playback), can be removed, updated, refactored or extended. Third-party developers can contribute by making their own plugins, based on a publicly available Software Development Kit (SDK).
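To make the plugin idea concrete, here is a minimal sketch of what such an interface could look like, assuming each annotation kind packages its editing and playback behaviour behind one contract. The interface, method names and registry are illustrative inventions, not the real SDK.

```python
from abc import ABC, abstractmethod

class AnnotationPlugin(ABC):
    """Hypothetical contract a third-party annotation plugin would fulfil."""

    name: str

    @abstractmethod
    def edit(self, audio_file: str, time: float, payload: str) -> None:
        """Called in editing mode to attach metadata at a timeline position."""

    @abstractmethod
    def render(self, audio_file: str) -> list:
        """Called in playback mode to contribute this plugin's annotations."""

class TextNotesPlugin(AnnotationPlugin):
    """A TextNotes-style plugin: descriptive texts on the timeline."""
    name = "TextNotes"

    def __init__(self):
        self.notes = {}

    def edit(self, audio_file, time, payload):
        self.notes.setdefault(audio_file, []).append((time, payload))

    def render(self, audio_file):
        return sorted(self.notes.get(audio_file, []))

# The host application only knows the interface, so plugins can be
# removed, updated or extended independently of the core:
registry = {p.name: p for p in [TextNotesPlugin()]}
registry["TextNotes"].edit("interview.mp3", 42.0, "context: opening question")
```

Because the host only depends on the abstract interface, a third-party Links or Context plugin slots in the same way.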

As the system is adopted by developers and users (both pure listeners and authors), Intellistener will become more feature-rich. There is already a long list of interesting feature requests, ranging from special kinds of annotations to drastic user-interface transformations, the incorporation of other media, and extension to other platforms. I'll conclude with an overview of just a few of these future features: a TypedLink annotation that works with a class hierarchy of semantic relations; a Context annotation that holds a nested structure of contexts, each with an attached audio clip as indicator; complete audio browsing, in which all interface elements provide audio feedback; an AudioNotes annotation to record spoken comments during playback, similar to multitrack recording; a DNA-like mutation and recombination technique that uses an AudioAnalysis annotation to test out different compositions with the listener; web streaming of these annotated audio files; or incorporation into the iPod with its dial interface.
Quite some paths to traverse and threads to follow, I should say.

1. Theodor H. Nelson / Douglas C. Engelbart.
The great works of Nelson and Engelbart are far too often misunderstood or implemented with drastic over-simplification.
Ted Nelson invented the term hypertext and designed an elaborate system for collectively working with linked documents. His hypertext system, developed under the name Xanadu, is a far more sophisticated and smarter system than the hypertext we now know from HTML, HTTP and the WWW.
Doug Engelbart developed his oNLine System (NLS), which already allowed for collaboration on structured documents and which has many levels and constructions for working fast, marking up sections, and creating a knowledge system with the computer. However, he is mostly credited with inventing the mouse. After several years of research, parts of his NLS were used in other computer systems, but the original goals of "augmenting human intellect" (Engelbart 1962) were abandoned.