ICMC panel discussion
Date: Tuesday, August 26th, 15:15 - 17:15 at Queen's University Belfast, Peter Froggatt Lecture Room G07
As outlined in the panel description, the general idea is to set off the development of a file format to describe, store and share spatial audio scenes across 2D/3D audio applications and concert venues. The intention was not to agree on a format at this point in time, but rather to increase awareness of the lack of standardization and to create the momentum to start the development process as a collaborative effort amongst sonic artists, researchers and commercial developers.
The panelists in the order of appearance were:
- Stephen Travis Pope: CREATE, University of California, Santa Barbara, US (pre-arranged video)
- Gary Kendall: SARC, Queen's University Belfast, UK
- Eric Lyon: SARC, Queen's University Belfast, UK
- Trond Lossius: BEK - Bergen Center for Electronic Arts, Norway
- Gabriel Gatzsche: Fraunhofer IDMT, Ilmenau, Germany
- Nils Peters: McGill University / CIRMMT, Montreal, Canada
- Matthias Geier: Deutsche Telekom Laboratories / Quality and Usability Lab, TU-Berlin, Germany
Several questions and maybe a few answers came up in the panel discussion. Here are some notes Matthias Geier and Nils Peters took:
Lowest Common Denominator
There was agreement that we need to find a lowest common denominator in spatial audio scenes. Another name for that was "Auditory Spatial Gist" (Gary Kendall).
Eric Lyon: a musical score can consist of only melody and rhythm and still be interpreted as music. However, many details like phrasing and articulation can be added to the score to define the musical performance/interpretation more completely. For an interchange format this could mean that the "core" of a spatial composition is transported in a scene format and some interpretation details are hand-tuned at the concert venue.
Exact replication vs. free interpretation?
Should the goal of an exchange format be the exact replication of an event, or is it sufficient to specify the basic situation and leave the rest to manual adaptation?
Additional descriptors which do not directly correlate with audio renderer commands could be used as annotations or as a guide for the adaptation.
Source Size/Source Width
An audience member remarked that a scene-based approach needs a way to specify the source size/source width. The panelists agreed on that point. However, how to realize a plausible representation of this feature in the various sound reproduction methods/algorithms is still a subject of current research.
Scene based approach vs. diffusion practice
Eric Lyon and several people from the audience were questioning the artistic relevance and expressiveness of a scene based approach where audio source signals are processed according to a defined position in space by a rendering engine which automatically outputs the necessary loudspeaker signals.
The loudspeaker based diffusion practice however allows for a unique artistic output but has the disadvantage that it cannot be easily transferred to a different loudspeaker setup.
Members of the audience mentioned that they need a solution which works inside their DAWs (digital audio workstations), where they arrange their dry audio material. In general, people do not feel much supported by commercial developers in terms of spatial audio beyond the home cinema formats 5.1 and 7.1.
A member from the audience reported his problems with exchanging Higher Order Ambisonic audio files with another research center.
Members of the audience requested the possibility to have an easy option for a stereo mix-down.
The ability to share information with visual rendering software was requested to allow multimedia work.
Space Dinner - August 28th 2008
- GK: Gary Kendall (SARC)
- EL: Eric Lyon (SARC)
- HT: Hans Tutschku (Harvard)
- VP: Viktor Pushkar (Kiew?)
- JS: Jan Schacher (ICST)
- NP: Nils Peters (McGill)
- MG: Matthias Geier (Deutsche Telekom Laboratories/TU Berlin)
- GG: Gabriel Gatzsche (Fraunhofer IDMT)
- BR: Butch Rovan (Brown University)
- TL: Trond Lossius (BEK)
GK: surprisingly, a lot of nuts-and-bolts questions during the panel discussion
EL: the practice has been developed across so many spaces, attitudes... I don't think there is anything to standardize, there is a conversation that has to happen... about what individual people are doing
HT: ... but then we go back to the "practicality question". When you ask a composer how a future standard should be, the first response would be: how would this influence my personal working.... and then you get these practical questions... of course that's important but then...? The dilemma of this discussion (as Gary said) is that everybody in the room has a different interest in having a standard. From the point of view of a developer... that's your business -- you might need a standard. For myself, from the standpoint of a composer, I know that nobody has ever listened to what I need anyhow... I always have to adapt and to tweak the things which are out there... so why do you want to have the feedback of a composer?
GK: I think an inspiration for this, as Nils mentioned, was the SDIF standard for handling spectral data. A standard that allowed people to exchange files. So that seems to be something that works, so...
HT: ... I don't agree, I am sorry. I was trying to work with SDIF between OM and Max, etc.... there are so few things you can really exchange.
EL: It's totally frustrating.
HT: yes, it's a nightmare.
HT: Unfortunately. I mean, if you just want to store the fundamental frequency... OK, that's fine. But then you are talking about a one-dimensional breakpoint function. I don't need a standard for that. As soon as you go into chord analysis and partial tracking... it's not compatible... and there are only a few programs which implemented it. I mean, the only standard which made it was OSC.
EL: OSC is not a standard... it's a protocol and you can define anything on top of it.
NP: there is no content involved... unlike MIDI, which is very strictly defined... and has been in use for more than 25 years.
HT, EL: which is also a problem.
JS: My impression, I was in the audience, was of the different points of origin of the interests. As Gabriel (Gatzsche) summed it up, there are the professional audio producers, and then there is the artist, and then there is the scientist. So you end up with all the different feature sets which you ultimately really have to tackle - and then it is hard to agree on the point from where you start. So I think there needs to be discussion on both levels and they should inform each other. Also on the low level, the nuts-and-bolts questions, people start to work. There is actually groundwork being laid that also influences decisions on the high level. The SDIF debacle is also a result of political competition across institutions, so we should avoid that. We should try to agree on a smaller set of things and keep the bigger goals more open to discussion and not try to nail them down.
EL: what do you see are the bigger goals?
JS: well, there are these issues, as EL said, about the methodology of working with sound in space. How do you compose with space? You are asking questions in that direction and how you formalize it. A format should reflect a little bit of the attitude of how you work/compose with space. It shouldn't impose, but should show that you actually have thought about certain fundamental aspects.
BR: I mean, this could be a real-time approach, or this could be a non-real-time approach. These are two different things. One is just simply a way to transfer your information to get ready for a specific system. The other thing is actually something to use during a performance.
JS: you mean stream-based, or file-based?
HT: One question I was trying to formalize during the discussion: for myself, I can't think of such a format without thinking about how my source sounds are encoded. And there is a real difference, many treatments just generate space. If you have a delay line, where the direct sound comes out of my left channel and my delayed sound comes out of the right channel, I have a movement which is encoded within these two channels. So I can't separate the space out of this file. If I make a rotation between 8 channels and I have one 8-channel file, the movement is encoded into that file. So how can I extract that information from the sound file and deliver it as a separate data stream? I can't.
MG: but that's not the idea. The idea is that you have the original composition already in a high-level representation -- not mixed down to a fixed number of channels.
BR: aha, that's the idea, OK!
MG: for me it is.
HT: OK, that's an approach, but then: how can I, before you define this format, come up with the compositional tools which today allow me to do that?
MG: Yes, that's the main problem, there are no such tools yet. It's about designing them.
BR: For example, you take your ProTools session and export it into an OMF-file, and you can have some basic ideas - Panning data - I mean if that would somehow be useful for the other goals.
HT: I just composed a piece where I was using an 8-channel array and WFS at the same time. So it's a 24-channel piece where I decided that 8 channels are just voices - no movement data - and I send this Open Sound Control data to the WFS system and the system positions them. I composed totally differently because I know that they are just voices, and in the studio I need a simulation tool where I can, in my ProTools session, tell ProTools: OK, this is not just a mono voice I assigned to a speaker, but ultimately it will move around. There is no way of doing this.
TL: yes, that was one of my personal conclusions after the panel discussion, that once you start discussing this, you immediately see that tools to work with spatialization are more or less lacking. And the other thing is that you could imagine creating compositions that have spatial information, and there are no standards or formats for providing the sound files to be spatialized together with the metadata that tells how to spatialize them. So, I mean, once we start to discuss SpatDIF we immediately see the need for a distribution of the whole packet that is the composition, and we see the need to develop the tools for working with it during the compositional process.
HT: Absolutely! I guess what I would need is a plugin, which is an ambisonic simulator, where I can tell how many speakers I use today....
JS: I made such a plugin last year, with Pluggo for ProTools or Logic
HT: So, what's the control data?
JS: it's the automation of the host program.
HT: So you can't export the automation data?
JS: Well, you could just send OSC out, or you could export the automation in a kind .... I think technically it is possible, but the workflow is very cumbersome. A colleague of mine was doing a 70-minute radio piece in 3rd-order Ambisonics in Logic Pro. The biggest problem was that the final result was 16 GB of data. I think technically it is possible, but we have to figure out how to work with these things. We are using the host automation for movements etc. We are using the GUI to visualize and move the sources and then encode that into the automation - read and write into the automation. But the thing is that you are totally dependent on your host software. I was implementing the plugin with Pluggo, so you depend on Cycling '74, and I was using our own Ambisonics things, so there is a triple dependency. So there has been work in that direction, but it is not trivial, and I think we fight not only against technical limitations but also against the mindset of the large software companies. I mean, ProTools is nice, but it is one mindset. So on the practical level there has to be work done to find practical solutions to work with it, but on the theoretical, more abstract level, there also has to be work done to find some, let's say, more universal way of describing what you are doing. So it's not just geometrical data or space data, it is also how files are organized. Time is a very important factor, it is actually essential, because time can be non-linear - can be compressed etc. So what are the fundamental issues of the format? What does the format do? What audience does it serve? What technology does it target at the beginning? And what do we leave out because we don't have any control over it, like changing commercial products?
GK: An audio renderer, such as Ambisonics, is one tool of the toolbox. For almost all the work that people do it is important that they have a set of tools, because they do things that violate conventions, using things in an unusual way. So how can you have an overwriting (interpreting) system on top of that? For the people who are interested in e.g. formalized scene representation... we are thinking about interactive systems, maybe this is one of the components of the toolbox that can be combined with other things. Somewhat like how film sound works: there are things which are realistic and there are things which are totally not - being combined at the same time. It's functional because people know how to use it well. So it seems like one thing to do is to take the notion of some overwriting system and set it aside as a part of a toolbox.
HT: Let's say I have a sound, and attached to that sound there is the time data describing some movement. If, for some reason, I cut out 5 seconds of that sound and make a crossfade between the remaining parts, I should also be able to cut out the associated spatial information and not redo everything because...
JS: Yes, that's a matter of how the tool is structured: whether the tool just piggybacks some information on top of your stream or whether it actually relates to the edits.
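A minimal sketch of the edit HT describes, assuming the movement is stored as a list of time-stamped keyframes `(time_in_seconds, x, y)` kept separate from the audio. The function name `cut_trajectory` and the keyframe layout are hypothetical and not taken from any existing format. The crossfade itself is ignored; after the cut the position simply jumps to the next surviving keyframe.

```python
def cut_trajectory(keyframes, cut_start, cut_end):
    """Remove the interval [cut_start, cut_end) from a trajectory.

    keyframes: list of (time_in_seconds, x, y) tuples, sorted by time.
    Keyframes inside the cut are dropped; later ones are shifted earlier
    by the length of the cut so they stay aligned with the edited audio.
    """
    duration = cut_end - cut_start
    edited = []
    for t, x, y in keyframes:
        if t < cut_start:
            edited.append((t, x, y))             # before the cut: unchanged
        elif t >= cut_end:
            edited.append((t - duration, x, y))  # after the cut: moved earlier
        # keyframes inside the cut are discarded
    return edited


# Hypothetical trajectory: one keyframe every 4 seconds.
trajectory = [(0.0, -1.0, 0.0), (4.0, 0.0, 1.0), (8.0, 1.0, 0.0), (12.0, 0.0, -1.0)]

# Cut 5 seconds of sound, from 3 s to 8 s, and apply the same edit to the movement.
edited = cut_trajectory(trajectory, 3.0, 8.0)
print(edited)
```

A tool that "relates to the edits" in JS's sense would call something like this whenever the audio region is cut, rather than leaving the trajectory untouched.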
BR: Nils, with all the SDIF problems, maybe the concepts were interesting, where you have tracks for many different kinds of data, so maybe this could be a container for all sorts of things. Where you stack pre-rendered audiofiles, very simple speaker configurations -- this is one part of it -- and then could be expandable in various ways. Hopefully in a way that is more robust and dependable than SDIF.
JS: One question I asked myself is the separation between content and metadata. We implicitly always talk about metadata and sound files, but a lot of the file formats actually embed metadata. So how to do that is technically solved. You have the chunks and then you put the data in ... but on a more practical level, how do you deal with embedded information vs. separated information? Say your metadata is in a separate file and you transport the two in different ways; is it human-readable text, etc.? These are also fundamental issues with practical usage. I am not proposing either one or the other solution, there might be other solutions, but on a very practical level we are starting to think about tools we know and then trying to imagine tools we don't know.
An experience I had with an Ambisonics piece: It was bounced to B-format, but then you actually have to send along a description which tells you: the first channel is actually the omni channel and so on. It is not explicitly visible, inherently integrated.
This goes in the direction where you have sound files which embed certain information as a part of the bitstream, and you can't really extract it. And then you have this more "symbolic" information, and the question is where it is located. We've been discussing similar things just for an Ambisonics format - exchanging B-format streams, and there is some kind of standard coming up. I send my B-format file to Graz, and the question is: 1. will it work and 2. how does it sound.
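As an illustration of the "separated information" option, here is a sketch of a human-readable sidecar description that could travel with a first-order B-format file. All field names and the file name are made up for this example and do not come from any existing standard; only the traditional W, X, Y, Z channel order of first-order B-format is assumed.

```python
import json

# Hypothetical sidecar metadata; every key name here is an assumption.
sidecar = {
    "audio_file": "piece_bformat.wav",  # hypothetical file name
    "encoding": "first-order B-format",
    "channels": [
        {"index": 0, "label": "W", "role": "omnidirectional"},
        {"index": 1, "label": "X", "role": "front-back"},
        {"index": 2, "label": "Y", "role": "left-right"},
        {"index": 3, "label": "Z", "role": "up-down"},
    ],
}

# Write it as plain text that can be sent along with the sound file ...
text = json.dumps(sidecar, indent=2)

# ... and read it back at the receiving studio to learn the channel order.
restored = json.loads(text)
labels = [channel["label"] for channel in restored["channels"]]
print(labels)
```

The same information could equally be embedded in a chunk of the sound file; the sketch only shows that the description JS had to send by hand is small and easy to make explicit.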
EL (18:45): it sounds like you need multiple representations or abstractions which sit on top of the source material. One concept I had, which is not formalized at all, is that you have source material that went somewhat back of what eventually becomes what you are feeding to the speakers, such that if you go to different situations/venues, you recompile or recreate the tracks for that.
JS: we have an intermediate layer between your source material and your output algorithm - you have to have a kind of interpretation layer, whatever shape that has.
EL: the question to me is to what extent this can or should be automated. To go from the source audio that represents your piece that might be transferred in various ways, including changing the timing, if you go into a space that has a lot more reverberation than another space. I really have no idea.
JS: but wouldn't that be known to the tool that we use, what the tool can?
JS: ... need or interpret the data you feeding in or not?
EL: Well, I think the one thing you really can rely on is that we can build any tool that we want, within reason. Anything that we can specify, if we know how to do it, you can build a tool to do that.
JS: Does it have to be in a format? I mean, does your program have to be capable of time-stretching etc.?
EL: No, what I wanted to say is that in order for this to even have a possibility of succeeding, it cannot be constrained in terms of what the final tools are. It has to be something you can extend. As Hans said, OSC is a success because you can continually build on it. So there should never be a limit on what the strategies are, because we are still developing these strategies. To me the really hard question is to what extent you can actually formalize abstractions about what happens to your source material, to turn it into something you can actually feed into your speakers in a specific space.
GK: Our problem is that we often develop a piece with a specific target reproduction setting in mind. But the reality is that it would be much more probable to have software that is an interface to reproduction settings, and that we for instance keep a lot of our source tracks separate. And if you have spatialization techniques - that's just part of the synthesis. But at the end you can just rewire it and maybe apply spatial effects that are specific to the location. And in that way we don't think of our works as specific to one setting.
JS: So you are talking about applying a methodology for producing pieces.
GK: The smallest thing of methodology I can think of.
JS: I totally agree, we are all getting into situations where we stop thinking about a specific rig, because we have all encountered different situations, e.g. going from a professional studio to a home studio to a concert hall. So I think this is a common denominator. According to what Eric (Lyon) said, I think there should be a minimum set of things that have to be applied and implemented and everything else is ... nice to have. And if it can do it, it will do it, otherwise not.
VP: I think that it is useful to split the problem into three parts. The first part is the multichannel audio interchange format, like PCM... the second problem is related to the metadata. Although metadata are already used in different file formats, they are usually incompatible. It would be good if everyone would use ProTools, but not everyone likes it. (laughter)
The problem is similar to reverb effects: a lot of reverbs have different parameters, and it would be possible to define a basic set of commonly used parameters. And if you want something more complex you have to specify it. However, all reverbs sound a bit different. And probably the third part is the description of the speaker array. The only widespread standards are 2.0 and 5.1, sometimes 7.1.
EL: I think the latter one is relatively easy to solve. Today we were taking an octophonic piece [for performance at the ICMC] from someone who had numbered the channels clockwise, to our own speaker mapping. It cost us 10 minutes [to reassign the channels to speakers] where ideally we would have automated the mapping. But conceptually I think that's quite simple. I think the other extremely important point you made is that it is going to sound different. And it seems to me that there are possibly two different philosophies, maybe more. One would be to say that every space is different, just like every concert hall is different. And what you want to do is for your piece to sound wonderful in that space according to how you define wonderful. Another philosophy would be: "I have a conception of how my piece should sound and I want to realize that conception in as many spaces as possible. And I want to have the tools to do that." These are for me the two different philosophies.
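The channel reassignment EL mentions can be sketched as follows, assuming both the piece and the venue describe their eight channels with shared speaker labels. The labels, both orderings and the function names are hypothetical; the actual numbering used at the ICMC is not given in these notes.

```python
def build_routing(source_order, venue_order):
    """For each venue output channel, find the source channel with the same label."""
    return [source_order.index(label) for label in venue_order]


def remap_frame(frame, routing):
    """Reorder one frame of samples (one value per channel) for the venue."""
    return [frame[source_channel] for source_channel in routing]


# Hypothetical orderings: the piece numbers its ring clockwise from front-left,
# the venue wires its outputs in left/right pairs from front to back.
source_order = ["L1", "R1", "R2", "R3", "R4", "L4", "L3", "L2"]
venue_order = ["L1", "R1", "L2", "R2", "L3", "R3", "L4", "R4"]

routing = build_routing(source_order, venue_order)

# One frame of dummy sample values, one per source channel.
frame = [10, 11, 12, 13, 14, 15, 16, 17]
remapped = remap_frame(frame, routing)
print(routing, remapped)
```

With the orderings written down as metadata, the ten minutes of manual patching reduce to a lookup. How the result sounds in the room is a separate question.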
TL: I think that's absolutely right. I think with a format, you have to realize what is possible and where it ends. You can automate things and you can try to make standard formats that are portable. But there is always some judgment and adjustment you have to do by hand. But that process still might be easier because it helps you at least with parts of the way.
EL: Much as I am sympathetic to going with the space, I am just as sympathetic to the perfectionist point of view. For somebody who has an absolutely clear image of what they want their piece to be, they should have that ability [to implement it] as much as possible.
Not everybody has ears that are so precise, but you should have the tools to make this happen.
HT: I am also as split as you (Eric Lyon) are between being a musician or being an interpreter, saying I have certain ideas of what it is, I am in a certain space and I need to adapt my music to that space. At the same time I am also an organizer and I am receiving all these pieces and I try to make some sense out of them, without making them sound like my music. (laughter)
JS: isn't that the point of organizing ? (laughter)
HT: So I totally see that some exchange format can be of incredible value. From my perspective there are probably two scenarios: One is that I have a certain piece that is in a certain number of channels and I would like to reproduce this in another hall. I guess that's the easier scenario, where I can attach certain metadata about where the different channels come from, what qualities they should have..... The second scenario for me is much more complicated, where I deliver a certain number of streams and then movement should happen in that space. And the metadata would actually describe the movement. And I am not sure whether that is entirely possible.
BR: I was thinking of taking a real world example, like your performance today -- what would have helped you, or what would have benefited you?
HT: what I would need, coming out of the discussion, is a unit with which I can easily make a snapshot of that space, figure out where the resonances are. I have a pretty good ear, but I spent 20 minutes of my rehearsal time figuring out where they are in this room if I send a signal to this ring of speakers or to that ring of speakers etc.
JS: So some kind of Impulse response availability of that space.... so then we are talking about a Room Description Format.
JS: a RoomDIF, a complement to SpatDIF which describes the target venue and not the piece.
HT: The problem with the RoomDIF is that, let's say we have these IRs for SARC, and now you have movable wall panels, you have sound absorption for the basement room... I mean how reliable is this? Do we have 10 people in the hall while measuring, or 50.... ?
GK: unless there is some dynamic binaural way of reproducing the room, it's not an accurate representation anyway.
JS: Even between the rehearsal and the performance you have dramatic changes that you have to account for, so it's not a perfect world either way. But it would be a step in the right direction. Even with some dummy sound material you would get an idea of how the space sounds.
HT: Coming back to your question, I guess when the composer is used to diffusing his music, such a format is probably not the tool. He has his ears, he can do something.....If I send my piece to you and I don't know the space and you have to figure out what to do with that piece.....
It is for those cases where the composer knows the piece inside out but is not there, he can attach some data to it - "OK this is what I would like to sound"
JS: But are we getting into a domain of having some kind of scoring, a descriptive language for EA music? That would be metainformation about the piece, the intention of the piece, the character of the piece,... to me that sounds like a description of the piece
GK: there are always going to be differences in the quality of the work. And there will be bad interpretations and there will be good interpretations. That's why we are gaining experience, and it will always come down to an artistic judgment.
JS: There is no reason why this format couldn't actually carry this information - it's just another piece of metainformation. On a directical level the implications are more profound, but describing essential properties of the music in some kind of symbolic representation that people understand. This is opening another kind of description where you have to define the terminology.
GK: In terms of terminology for describing spatial images, there is a huge amount of work done. People were working years and years describing the qualitative properties of images. It's reducible to a finite list of elements. But I have a very hard time imagining myself sitting down and describing my piece in these terms, except maybe if there would be a special reason to do it. So it seems to me a more practical thing to suggest a fundamental set of descriptors that could be provided as metadata and can be applied when the composer thinks this is a useful thing to do.
JS: You mean, that e.g. "this piece needs movement in the space" and then you give a kind of description of the movements...?
TL: I think right now the discussion is based on the assumption of the format being used for EA music, but I think you said in your presentation, Gabriel (Gatzsche), that a format should primarily be aimed at audio engineering researchers. So if you mean that, some things we are discussing here right now might be more or less irrelevant. I mean, who is this format for?
GG: What we have to accept is that we can't manage to account for everybody's imagination in that format. So what you can do is to look at the requirements of a specific target group. I think that's what we are doing here right now for EA composers with this discussion.
JS: Yes, it is a valid use case to start from, but only one use case. The delicate issue after having started is not to block others. The constraining aspect of the format should be minimal.
HT: And also what you mentioned in your talk (Trond), that you often have to adapt the position of the speakers depending on the space, and where the painter put his pictures etc. A lot of ideas about spatial distribution of sound are actually given by the space. (Agreeing) So I guess the format can't replace interpretation, and should not. Honestly, a lot of the ideas I had today during my performance came during the rehearsal yesterday.
TL: I think, the format should facilitate interpretation, make it easier to interpret it better.
GK: I can imagine a system that matches input channels to output channels in a variety of different ways. But then when you do this work there are other kinds of things that seem to happen, like adjusting the dynamic range according to the reverberation of your venue. That is one consideration which is not spatial per se but which you might want to consider. I am wondering about other critical adjustments you might want to make, that would be useful to have when adapting to another room.
EL, JS: Timing
BR: For example a basic one: In my 5.1 piece I played here, there was no center speaker, and it was very hard to create the balance of the two speakers to make them center-speaker-like. A calibration mode where you are in the space and you could more easily re-calibrate the system. Some sort of real-time analyzer to get the basic image right. Such a simple thing took us quite some time.
NP: What did you do?
BR: I was tuning by ear, but I was thinking this is just a basic thing, this could be solved somehow.
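One common starting point for the situation BR describes is to fold the center channel into the left and right speakers at equal gain. The sketch below assumes a gain of 1/sqrt(2) (about -3 dB) so that the two copies together carry the same power as the original center signal. This is a conventional downmix coefficient used here as an assumption, not something measured or proposed in the discussion, and the function name is hypothetical. The result would still need tuning by ear in the room.

```python
import math

# Assumed starting gain: -3 dB per side keeps the summed power of the
# two copies equal to the power of the original center signal.
CENTER_GAIN = 1.0 / math.sqrt(2.0)


def fold_center(left, right, center, gain=CENTER_GAIN):
    """Distribute one center-channel sample equally to left and right."""
    return left + gain * center, right + gain * center


# A center-only sample of amplitude 1.0 becomes two equal side samples.
new_left, new_right = fold_center(0.0, 0.0, 1.0)
power = new_left ** 2 + new_right ** 2
print(new_left, new_right, power)
```

Exposing `gain` as a parameter is the "calibration mode" in miniature: the venue could adjust that one value instead of rebalancing the whole mix.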
JS: I think the whole of acoustic engineering has all the tools for this. They might be expensive, the idea being to set up a measurement microphone in the space and calibrate the speakers automatically via sine sweeps. (41:00) I think it is more a question of the infrastructure of the venue itself. Of course we could have metadata of the piece such as: this SPL at this frequency, or the loudest peak should be this dB value, I am speaking in more technical terms. What I said at the end of the panel was that we have to differentiate between the needs for room acoustics and for creating and exchanging spatial music. These are complementary and sometimes they can't be separated. I am not sure whether the format should transport all of this technical information.
HT: I am sorry being so tool-oriented, but that's what I am doing. Let's say, I would deliver for my piece three snippets of an impulse response, saying "at second 23 this is how the space should sound" and "this texture should sound like this". And then you calibrate the system with the room as close as you get to these impulse responses.
VP: but actually you can use two different reverbs and gradually switch from one to the other - in a simple case. It might become more complex - with a set of effects you are crossfading.
EL: you mean like a live convolution?
VP: That's not what I meant. Let's say we have two acoustically connected rooms, one is small and dry and the other room is reverberant with 5 seconds reverb time. We can make a transition, just varying the reverb sound. And in most cases it would be satisfactory I think.
EL: Even though it sounds like a simple case, I think the problem of [mapping from] 5.1 where all of a sudden you don't have a center speaker might be a perfect example of where you really do have to re-conceptualize the distribution to the speakers, because you're never going to get the same sound, I think, from a virtual source as you would get straight out of a loudspeaker.
TL: I think we are discussing two things in parallel here and the question is whether these should be mixed together or should be separated. One question is the intention, and the other one is the actual rendering process, in order to play back the sounds in one system. Because to me, when Nils (Peters) and I were discussing SpatDIF earlier on, the really great idea about it when he first suggested it to me was to have a very clear description of intentions, which means that you can try different ways of the actual rendering in a room, you can just author the rendering, but the intentions are independent of the room you are rendering the sound for.
BR: So in order to render for a particular room you have to go to the room to render for it? (45:15)
TL: I am more thinking like, if you have a composition and you want to describe the positioning of the different sounds - this is the intention - and then the next challenge is how do I get as close as possible to those intentions in this particular room. And that might mean that you have to test different ways of spatialization, whether you use VBAP or Ambisonics or a combination of things. But it makes it easier to try them because the format in which you describe your intentions is the same regardless of which kind of rendering algorithm you want to address. The description is independent, and also, if you want to have information about the particular acoustics of the space, your piece should beforehand be described in the way it would behave in a neutral space, if such a space exists. And then one would have to investigate the actual acoustics of this space and apply the appropriate delays or EQs or whatever you have to do, in order to bring it as close to neutral as you can. And of course, if you only have four speakers you really have a problem reproducing 5.1 material, but at least the intention is clear and then the next challenge is how do we get as close as possible to recreating this intention. To me it seems really important to separate these two things, and you can think of them in a very modular way.
JS: Who knows what techniques will come. I think we are not done yet. So what you want to say is that the information should be extracted from the actual technology as much as possible.
TL: yes, because to me, one of the benefits of this is that it would make it much easier to try different rendering approaches to see how they work in the particular venue you are working in. For example, in a gallery space, due to the reverberation, Ambisonics sounds crap, so I just want to delete the Ambisonics module and throw in another one, for example VBAP, and see how this works - "...Ah, that works much better..."
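The modular separation TL describes can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the names `Source`, `vbap_2d`, `ambisonics_first_order_2d` and `render` are hypothetical and are not SpatDIF syntax, the scene is reduced to one azimuth per sound, and both renderers are simplified 2D textbook forms (pairwise amplitude panning, and a basic first-order decode for a regular ring of loudspeakers). The point is only that the scene description stays the same while the renderer is exchanged.

```python
import math
from dataclasses import dataclass


@dataclass
class Source:
    """Intention only: where a sound should appear, independent of any room."""
    name: str
    azimuth_deg: float


def unit(azimuth_deg):
    """Unit vector for a direction given in degrees."""
    a = math.radians(azimuth_deg)
    return (math.cos(a), math.sin(a))


def vbap_2d(source, speakers_deg):
    """Simplified 2D pairwise amplitude panning (hypothetical helper)."""
    px, py = unit(source.azimuth_deg)
    gains = [0.0] * len(speakers_deg)
    # Walk around the ring and look for the adjacent pair enclosing the source.
    order = sorted(range(len(speakers_deg)), key=lambda i: speakers_deg[i] % 360)
    for k in range(len(order)):
        i, j = order[k], order[(k + 1) % len(order)]
        (ax, ay), (bx, by) = unit(speakers_deg[i]), unit(speakers_deg[j])
        det = ax * by - ay * bx
        if abs(det) < 1e-9:
            continue  # speakers are collinear, the pair cannot span a plane
        # Solve p = gi * a + gj * b for the two gains (Cramer's rule).
        gi = (px * by - py * bx) / det
        gj = (ax * py - ay * px) / det
        if gi >= -1e-9 and gj >= -1e-9:
            norm = math.hypot(gi, gj)  # constant-power normalization
            gains[i], gains[j] = gi / norm, gj / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")


def ambisonics_first_order_2d(source, speakers_deg):
    """Basic first-order 2D decode; assumes a regular ring of loudspeakers."""
    n = len(speakers_deg)
    return [
        (1 + 2 * math.cos(math.radians(source.azimuth_deg - s))) / n
        for s in speakers_deg
    ]


def render(scene, renderer, speakers_deg):
    """Apply any renderer to the same, unchanged scene description."""
    return {src.name: renderer(src, speakers_deg) for src in scene}


scene = [Source("voice", 90.0), Source("bell", 200.0)]
quad = [45.0, 135.0, 225.0, 315.0]  # assumed venue layout

for renderer in (vbap_2d, ambisonics_first_order_2d):
    print(renderer.__name__, render(scene, renderer, quad))
```

Swapping the rendering approach for a given venue then amounts to passing a different function to `render`; the `scene` list is never edited.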
JS: I am going in the direction you mentioned, the same with filtering, delay, reverberation, etc. - the changes you have to make in order to achieve the final goal should not have to be defined, but you should have a reference. So I like the idea that you send along some kind of reference point ...
HT: ...You just make a sweep or a noise burst in the studio where you are and send that along...
MG: I think that will not work. The thing is, if you make a recording with one loudspeaker and one microphone, it may be possible to recreate your intention at one point in the target room, but in all other places the sound will be distorted and will sound crap.
NP: crap? Let's say you don't know how it sounds in other places.
MG: You will equalize the room for one point. You need at least multiple microphone measurements, but even then it is not guaranteed.
NP: ... but at least you have an idea
MG: yes, an idea for one point, and that's really dangerous if you apply this measurement of one point to the entire room. In theory it may sound as you expected at that one point, but it will differ in all other areas and it will be phasy and crazily distorted.
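MG's objection can be illustrated with a toy calculation. This is a sketch under strong assumptions, not a room model: each listening position hears the direct sound plus a single reflection, and the delays, the reflection gain and the helper names `comb_response` and `worst_deviation_db` are all invented for the example. An inverse filter is derived from the measurement at point A and then applied at a second position B where the reflection arrives slightly later.

```python
import cmath
import math


def comb_response(freq_hz, delay_s, reflection_gain=0.6):
    """Direct sound plus one reflection arriving delay_s later (toy model)."""
    return 1 + reflection_gain * cmath.exp(-2j * math.pi * freq_hz * delay_s)


FREQS_HZ = [100 * k for k in range(1, 51)]  # 100 Hz to 5 kHz
DELAY_A = 0.0050  # reflection delay at the measured point A (assumed)
DELAY_B = 0.0057  # reflection delay at another seat B (assumed)


def worst_deviation_db(listen_delay_s):
    """Largest deviation from a flat response after equalizing for point A."""
    worst = 0.0
    for f in FREQS_HZ:
        eq = 1 / comb_response(f, DELAY_A)  # inverse filter from point A
        heard = comb_response(f, listen_delay_s) * eq
        worst = max(worst, abs(20 * math.log10(abs(heard))))
    return worst


print("deviation at A (dB):", worst_deviation_db(DELAY_A))
print("deviation at B (dB):", worst_deviation_db(DELAY_B))
```

In this toy model the response at A becomes flat, while at B the same correction leaves deviations of several decibels at some frequencies, which is the effect MG describes for positions away from the measurement point.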
GK: One thing that I would agree with is that the way things work in a small space, when you translate it into a large space, is a completely different game. The way panning works ... that all changes, and I think there is no real way to get around that. It is always something a little different, because as soon as you put in longer time delays all the psychoacoustics is different. That's something everybody needs to know. I don't know that physical measurements, whatever they are, completely satisfy the goal. Physical measurements at least accomplish a certain thing, but the subjective impression will be different.
EL: there is a completely different line of attack, which is to develop a practice. Hans mentioned a diffusion of a composer's piece by a student that was suboptimal, because the student might not understand how to present that piece in the space. So there might also be the need for people who know how to present these pieces in the spaces.
GK: We talked about a certain set of tools, Ambisonics being one, etc. Nils (Peters) hasn't said anything about a sort of microphone simulation, multimicrophone reproduction simulation. It seems to me that there are a lot of people who would approach this problem in terms of "Oh, I mic things this way and reproduce them that way". That's a whole other paradigm. It seems to me one of the tools in the toolkit is that other kind of tool, which looks at it from the recording point of view.
NP: Well, for me it is just another concept of creating space. Like another kind of paintbrush in the variety of other paintbrushes - a special paintbrush maybe, with a certain quality. (52:20)
TL: I think right from the start we have to absolutely recognize the fact that any kind of format will never be able to replace the need for making judgments. (strong agreement)
MG: personally, for me the idea of such a proposed format would be not to recreate the process of composition as it is done now, that is, loudspeaker-channel based, because I think the way EA-music is done today and was done like 60 years ago is always limited by the number and the position - the current settings - of the loudspeakers. There would now be the possibility to step one level up. I think as a composer it must be a relief not to have to think in loudspeakers. Loudspeakers are just a tool - silly boxes - so it would be much nicer to think on a more abstract level.
BR: but for some people the loudspeaker is an instrument, something that is actually very important.
MG: that's one category of composition. But other people have other, higher level, ideas. Or instead of loudspeakers they have virtual loudspeakers as an instrument.
JS: I think there shouldn't be an opposition between those two paradigms. It's just that one paradigm is larger than the other one. So there is no problem of sending an 8 channel file and saying 8 speakers please.
MG: The point I am trying to make is that 5.1 or 8 channel pieces are actually a restriction.
HT: OK, that's the only restriction we have, because there are no tools to compose otherwise.
MG: sure, so we are trying to think about the future and some fundamentally new paradigms.
VP: Simply increasing the number of loudspeakers is not always useful. There is a big improvement when we switch from mono to stereo, or from stereo to quad, but very little improvement when we switch from 40 speakers to 56. And it is also limited in terms of the listeners' attention. It is much more complex to follow 56 speakers, each of them having a different function.
MG: Those are two completely different things: one is the more abstract scene-based approach and the other one is just traditional loudspeaker-based composition, and no composer has to stop doing that.
GK: I think that for composers, there needs to be a concept of a different practice. If somebody came in with a piece that defines space in a high-level way, and we are all blown away, the next day we will all be trying to figure out how that was done.
MG: There is a general difference between using single speakers in a stereophonic way, like you do in 8 channel pieces or with 5.1 systems, and using many speakers for soundfield reproduction like Wave field synthesis. That's fundamentally different.
NP: Do you want to have a 900-channel soundfile that you play back on a big WFS system? Probably not.
TL: I think we have to leave soon, so the important question is where do we go from here, from this discussion?
GK: Eric (Lyon) and I talked about designing a spatial orchestration tool and we need a lot of input.
JS: You should look at Holospat and IANNIX from France; there are quite a few attempts that have been made. And during this review you will find elements of SpatDIF, even if they are not called that, because everybody developed a kind of personal description format for that.
GK: One problem of SpatDIF right now is that we haven't done enough analysis of the spatiality of EA-music. There is almost nothing published that I am aware of. I am convinced that people have a lot of conceptualization. But we can't all refer to Denis Smalley's article. So I think a lot of analysis can help us.
HT: There is quite a lot of literature out in French. There is the "l'espace du son" series, but it is not translated. (http://www.electrocd.com/en/cat/lien_1998-11/)
BR: What would be very useful would be a compendium of users' approaches. (agreement)
NP: I did an online survey about how composers are using 3D audio systems, so I have information from more than 50 composers. I am going to make a summary public once I have found the time to analyze the data.
TL: We also have the mailing list at www.spatdif.org. To me it would be useful to use the mailing list not to discuss SpatDIF as the solution but more to discuss the general problem we are talking about. Also, from the panel discussion yesterday, it is clear that there are several proposals. So there is no need to choose one proposal; it is much more important to discuss the problems. So this might be a place to keep the discussion going.
GG: I think a WIKI would be good.
TL: Yes, that's a good point. We want to set that up at spatdif.org at some point.