Beethoven Quartet Op. 18 No. 1 1st Movement – stereo bounce
Following up on yesterday's discussion, I did a case study today on spatialisation in Logic Pro. The aim was to get a sense of the workflow in DAW (Digital Audio Workstation) software, and to see whether and how the stratified approach makes sense there.
The Logic Pro 9 project is based on a MIDI file of the 1st movement of Beethoven's String Quartet Op. 18 No. 1, retrieved from Kunst der Fuge, an excellent online resource for classical music MIDI files.
The four voices are all synthesized with the Audio Unit instrument Synful Orchestra v. 2.5.2. Synful uses reconstructive phrase modeling to map gestural performance data onto its synthesis engine. Further details on Synful can be found here.
Two effects are applied as inserts on the master channel: the AudioEase Altiverb v. 6 convolution reverb (Audio Unit plug-in) and Equalizer (ships with Logic).
The resulting audio file is provided above. The Logic project can be downloaded here.
From this quick test, the layered approach does seem relevant for spatialisation in a DAW context, and the various layers are easily identified. The additional layers proposed by Spat, as discussed yesterday, are also relevant to this example:
- Authoring: Synful offers possibilities for describing the positions of the musicians on the virtual stage. In this project the virtual musicians are placed in the standard string quartet layout, centered on the stage with Violin 1, Violin 2, Viola and Cello from left to right (as seen from the audience), and with Violin 2 and Viola slightly further back than Violin 1 and Cello. Neither the musicians nor the listener move, so the scene description is static. Dynamic repositioning, if desired, would be difficult to achieve with Synful: the Synful VST plug-in does not offer any parameters for automation, and the Audio Unit version seems equally limited in this respect.
- Source pre-processing: Synful uses the positions of the musicians to emulate Interaural Time Difference (ITD) and Interaural Level Difference (ILD). Localization cues are further improved by emulating early reflections; in this test Synful was set to synthesize early reflections from four walls.
- Room modeling: processing of the room model is split between Synful (early reflections) and Altiverb (convolution reverb). The Altiverb impulse responses contain both early reflections and late reverb. The early reflection gain has been suppressed in Altiverb, so that Altiverb's early reflections are left out in preference to those generated by Synful.
- Encoding and decoding: The Logic session is stereo, and hence more limited for surround reproduction than the multi-speaker setups possible with Max, Jamoma and Spat. Still, as playback was done over headphones rather than stereo speakers, a binaural post-processing stereo plug-in was applied at the end of the insert chain on the master strip, after the post-processing discussed below.
- Post-processing of output signals: EQ is applied to the output signal. This was required to improve the balance between the 1st violin on the one hand and the viola and cello on the other, as the system seemed to emphasize low and mid-range frequencies. This was compensated partly by raising the gain of the 1st violin in the mix, and partly with shelf filters.
- Hardware abstraction layer: Logic is set to use built-in audio.
- Hardware layer: Although the mix was intended for playback over stereo speakers, the case study was carried out using headphones. The binaural post-processing stereo plug-in was assumed to make the mix interchangeable between speakers and headphones. In practice, the binaural post-processing was felt to affect not only the spatial qualities but also the EQ curve of the signal, resulting in a more mellow mix with less high-frequency content. A MultiMeter plug-in used for spectral analysis of the mastered signal seemed to confirm that the binaural plug-in reduced spectral energy above approx. 1 kHz.
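Synful's own ITD/ILD model is not documented in this post. As a rough illustration of what such cues amount to for a string quartet layout like the one described above, the sketch below computes ITD with the Woodworth spherical-head formula and a crude broadband ILD from a constant-power pan law. The head radius, speed of sound, the pan-law stand-in for ILD and the stage coordinates are all assumptions, not Synful parameters.

```python
import math

HEAD_RADIUS_M = 0.0875  # assumed average head radius in metres
SPEED_OF_SOUND = 343.0  # m/s, roughly room temperature

# Hypothetical stage positions in metres: x is left(-)/right(+),
# y is distance in front of a listener at the origin facing +y.
QUARTET = {
    "Violin 1": (-1.5, 3.0),
    "Violin 2": (-0.5, 4.0),
    "Viola": (0.5, 4.0),
    "Cello": (1.5, 3.0),
}


def azimuth(x, y):
    """Azimuth in radians; 0 is straight ahead, positive is to the right."""
    return math.atan2(x, y)


def woodworth_itd(theta):
    """Woodworth spherical-head ITD in seconds (positive: right ear leads).
    Intended for frontal sources, |theta| <= pi/2."""
    return HEAD_RADIUS_M / SPEED_OF_SOUND * (theta + math.sin(theta))


def pan_ild_db(theta):
    """Crude broadband ILD in dB from a constant-power pan law
    (positive: right ear louder). Real ILDs are strongly frequency dependent."""
    p = max(-1.0, min(1.0, theta / (math.pi / 2)))  # map azimuth to pan in [-1, 1]
    g_left = math.cos((p + 1) * math.pi / 4)
    g_right = math.sin((p + 1) * math.pi / 4)
    return 20 * math.log10(max(g_right, 1e-9) / max(g_left, 1e-9))


if __name__ == "__main__":
    for name, (x, y) in QUARTET.items():
        th = azimuth(x, y)
        print(f"{name:9s} az={math.degrees(th):+6.1f} deg  "
              f"ITD={woodworth_itd(th) * 1e3:+.3f} ms  ILD={pan_ild_db(th):+.1f} dB")
```

With these made-up coordinates, Violin 2 and Viola end up closer to the centre line than Violin 1 and Cello, so their interaural cues are smaller; distance cues are outside the scope of this sketch.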
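How Synful computes its four-wall early reflections is not described here. A common textbook approach is the image-source method: mirror the source in each wall and treat each mirror image as a delayed, attenuated copy of the direct sound. The sketch below does this for first-order reflections in a rectangular room; the room size, positions and the frequency-independent wall coefficient are hypothetical, and this is not claimed to be Synful's algorithm.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s


def first_order_images(src, room_w, room_d):
    """Mirror a source in the four walls of a rectangular room
    spanning x in [0, room_w] and y in [0, room_d]."""
    sx, sy = src
    return [(-sx, sy), (2 * room_w - sx, sy), (sx, -sy), (sx, 2 * room_d - sy)]


def early_reflections(src, listener, room_w, room_d, wall_coeff=0.8):
    """Return (delay_s, gain) pairs: the direct sound first, then one tap per wall.
    Gains use 1/r spreading and an assumed frequency-independent wall coefficient."""
    lx, ly = listener

    def dist(p):
        return math.hypot(p[0] - lx, p[1] - ly)

    direct = dist(src)
    taps = [(direct / SPEED_OF_SOUND, 1.0 / direct)]
    for img in first_order_images(src, room_w, room_d):
        r = dist(img)
        taps.append((r / SPEED_OF_SOUND, wall_coeff / r))
    return taps


if __name__ == "__main__":
    # Hypothetical 12 m x 20 m hall: cello near the front, listener further back.
    for delay, gain in early_reflections((7.5, 4.0), (6.0, 14.0), 12.0, 20.0):
        print(f"delay={delay * 1e3:6.2f} ms  gain={gain:.3f}")
```

Because each image lies outside the room, every reflection arrives later and weaker than the direct sound, which is what lets the reflection pattern carry information about source position within the space.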
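The post does not give the actual EQ settings. As a sketch of the kind of shelf filter that can tame an emphasized low end, here is a low-shelf biquad built from the coefficient formulas in Robert Bristow-Johnson's Audio EQ Cookbook. The 250 Hz corner and -3 dB gain are illustrative values only, and this is not Logic's own EQ implementation.

```python
import cmath
import math


def low_shelf(fs, f0, gain_db, slope=1.0):
    """Audio EQ Cookbook low-shelf biquad; returns (b, a) normalised so a[0] == 1."""
    A = 10 ** (gain_db / 40)
    w0 = 2 * math.pi * f0 / fs
    cw, sw = math.cos(w0), math.sin(w0)
    alpha = sw / 2 * math.sqrt((A + 1 / A) * (1 / slope - 1) + 2)
    sa = 2 * math.sqrt(A) * alpha
    b = [A * ((A + 1) - (A - 1) * cw + sa),
         2 * A * ((A - 1) - (A + 1) * cw),
         A * ((A + 1) - (A - 1) * cw - sa)]
    a = [(A + 1) + (A - 1) * cw + sa,
         -2 * ((A - 1) + (A + 1) * cw),
         (A + 1) + (A - 1) * cw - sa]
    return [x / a[0] for x in b], [x / a[0] for x in a]


def magnitude_db(b, a, f, fs):
    """Magnitude response in dB at frequency f."""
    z1 = cmath.exp(-1j * 2 * math.pi * f / fs)  # z^-1 on the unit circle
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20 * math.log10(abs(num / den))


def biquad(b, a, x):
    """Direct form I filtering of a list of samples."""
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for xn in x:
        yn = b[0] * xn + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        y.append(yn)
    return y


if __name__ == "__main__":
    fs = 44100
    b, a = low_shelf(fs, 250.0, -3.0)  # illustrative settings
    for f in (50, 250, 1000, 5000):
        print(f"{f:5d} Hz: {magnitude_db(b, a, f, fs):+.2f} dB")
```

A shelf rather than a peaking filter fits the described problem because it tilts a whole region (everything below the corner) by a fixed amount while leaving the upper range untouched.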
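The MultiMeter observation can be approximated offline by measuring how much of a signal's spectral energy lies above 1 kHz, then comparing that figure for the mix with and without the binaural plug-in. This is a minimal sketch, assuming mono float samples, a power-of-two frame length and a periodic Hann window (all hypothetical choices); MultiMeter's own analysis may well differ.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    tw = [cmath.exp(-2j * math.pi * k / n) * odd[k] for k in range(n // 2)]
    return ([even[k] + tw[k] for k in range(n // 2)]
            + [even[k] - tw[k] for k in range(n // 2)])


def energy_fraction_above(x, fs, cutoff_hz=1000.0):
    """Fraction of one-sided spectral energy above cutoff_hz in a Hann-windowed frame."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("frame length must be a power of two")
    win = [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]  # periodic Hann
    spec = fft([s * w for s, w in zip(x, win)])
    total = above = 0.0
    for k in range(n // 2 + 1):
        # Bins 1..n/2-1 also stand for their negative-frequency mirrors.
        weight = 1.0 if k in (0, n // 2) else 2.0
        e = weight * abs(spec[k]) ** 2
        total += e
        if k * fs / n > cutoff_hz:
            above += e
    return above / total if total > 0 else 0.0
```

Averaging this fraction over many frames of each bounce, with and without the binaural plug-in, gives a single number per version; a lower value for the binaural version would be consistent with the reduced energy above 1 kHz reported here.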
Compared to the layered model proposed in Peters et al. (2009), Spat details the DSP part of spatialisation further. The identification of source pre-processing (early reflections), room modeling (convolution reverb) and post-processing (EQ) all appeared relevant in this example. Separating early reflections from late reverb also makes sense in terms of how they contribute to the spatial impression: while early reflections may help localize the source within the space, the late reverb mainly provides cues about the acoustic properties of the room the sound is situated in, as well as a general coloration and blurring of the source that is often considered aesthetically pleasing. Spat seems to offer a more precise model for specifying and processing reverberation than the model we proposed at SMC 2009.
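Separating early reflections from late reverb in a measured impulse response is often done with a simple time split. The sketch below scales the two parts of an impulse response independently, with a short linear crossfade between them; setting the early gain to zero roughly corresponds to suppressing a convolution reverb's early reflections. The 80 ms split point, the crossfade length and the function name are assumptions, Altiverb's actual controls are not modelled, and in this simple version the direct-path peak falls in the early window too.

```python
def rebalance_ir(ir, fs, early_gain, late_gain=1.0, split_ms=80.0, fade_ms=10.0):
    """Scale the early and late parts of an impulse response independently.

    Samples up to (split_ms - fade_ms/2) get early_gain, samples from
    (split_ms + fade_ms/2) on get late_gain, with a linear crossfade in between.
    """
    lo = split_ms - fade_ms / 2
    hi = split_ms + fade_ms / 2
    out = []
    for i, s in enumerate(ir):
        t_ms = i * 1000.0 / fs
        if t_ms <= lo:
            g = early_gain
        elif t_ms >= hi:
            g = late_gain
        else:
            g = early_gain + (late_gain - early_gain) * (t_ms - lo) / fade_ms
        out.append(s * g)
    return out
```

Convolving a dry signal with `rebalance_ir(ir, fs, early_gain=0.0)` would leave only the late tail, so the early reflections can come from another stage of the chain, as with Synful in this project.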
The Synful plug-in simulates early reflections but provides no modeling of late reverb, assuming instead that this will be handled by a subsequent reverb unit. The fact that Eric Lindemann of Synful previously worked at Ircam might help explain why Synful separates early reflections from late reverb and includes early reflections in the plug-in.
The additional configuration options in Altiverb v. 6, compared to earlier versions, make it possible to suppress the early reflections, which in this example are replaced by output from Synful. Substituting Synful's early reflections for Altiverb's has its limitations, though: instead of only the dry signal being convolved with the impulse response, the dry signal plus the early reflections from Synful are convolved, so the resulting late reverb can be expected to be denser than if Altiverb alone were used.
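The density argument can be made concrete with sparse impulse responses: every early-reflection tap in the input re-excites the whole late tail, so the number of distinct echoes grows roughly with the number of input taps. The sketch below uses a toy random late tail and four made-up early-reflection delays; none of the numbers come from Synful or Altiverb, and the function names are hypothetical.

```python
import math
import random
from collections import defaultdict


def sparse_convolve(x, h):
    """Convolve two sparse responses given as {sample_index: amplitude} dicts."""
    y = defaultdict(float)
    for i, xi in x.items():
        for j, hj in h.items():
            y[i + j] += xi * hj
    return dict(y)


def toy_late_reverb(n_taps=200, fs=44100, start_s=0.08, end_s=2.0, decay_s=0.4, seed=1):
    """Hypothetical sparse late-reverb tail: random tap positions, exponential decay."""
    rng = random.Random(seed)
    positions = rng.sample(range(int(start_s * fs), int(end_s * fs)), n_taps)
    return {p: rng.uniform(-1.0, 1.0) * math.exp(-p / (decay_s * fs)) for p in positions}


if __name__ == "__main__":
    late = toy_late_reverb()
    dry = {0: 1.0}
    # Dry signal plus four hypothetical early reflections (sample delay: gain).
    dry_plus_er = {0: 1.0, 331: 0.5, 507: 0.45, 890: 0.4, 1203: 0.35}
    print("late taps from dry input:     ", len(sparse_convolve(dry, late)))
    print("late taps from dry + ER input:", len(sparse_convolve(dry_plus_er, late)))
```

With a single dry impulse the tail is reproduced unchanged, while the dry-plus-reflections input yields several shifted, scaled copies of the tail superimposed, which is the densification described above.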
As a final observation, this example illustrates that the layered approach is not necessarily mirrored strictly in the signal processing flow. Here, binaural encoding has to be considered part of the decoding layer, yet the binaural plug-in is inserted at the end of the signal chain, after the EQ post-processing.
Concluding this post, I have also spent time today reading up on how surround processing is handled in Logic. It is restricted to established consumer/prosumer formats (mono, stereo, quadraphonic, 5.1, 7.1) and does not offer arbitrary configurations of speakers and channels. But for the formats it does support, I have to say that I am pretty impressed by what it offers. The up-scaling of mono and stereo effects for multichannel processing closely resembles ideas we have within the Jamoma team for Jamoma Multicore effect processing.
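Logic's internal mechanism is not documented in this post. One plausible way to up-scale a mono effect to a multichannel bus is to run one independent instance per channel, each with its own state; the sketch below shows that idea with a toy one-pole lowpass. It is not necessarily what Logic or Jamoma Multicore does, and the class and function names are hypothetical.

```python
from typing import Callable, List, Sequence


class OnePoleLowpass:
    """Toy mono effect with internal state (hypothetical example effect)."""

    def __init__(self, coeff: float) -> None:
        self.coeff = coeff
        self.state = 0.0

    def __call__(self, x: float) -> float:
        self.state += self.coeff * (x - self.state)
        return self.state


def upscale(factory: Callable[[], Callable[[float], float]], n_channels: int):
    """Turn a mono effect into a multichannel one by running one
    independent instance per channel."""
    instances = [factory() for _ in range(n_channels)]

    def process(frame: Sequence[float]) -> List[float]:
        if len(frame) != n_channels:
            raise ValueError("frame size does not match channel count")
        return [fx(s) for fx, s in zip(instances, frame)]

    return process


if __name__ == "__main__":
    fx_51 = upscale(lambda: OnePoleLowpass(0.5), 6)  # e.g. a 5.1 bus
    print(fx_51([1.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
```

Keeping the instances independent means that signal in one channel never leaks into the filter state of another, which preserves the channel separation of the surround image; effects that should react to the whole bus, such as a linked compressor, would need a different design.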