Supplementary information

Detailed Materials and Methods

Subjects

The experiment included twelve subjects: ten sighted and two blind. Five of the ten sighted subjects were expert users of a visual-to-auditory sensory substitution device (SSD) called The vOICe, and the other five had learned associations between vOICe sounds and object names without interpreting the shape information in the sounds. All sighted subjects but one were female, and all were aged 18 to 25. The two blind subjects were also expert users of The vOICe and used this SSD in their daily lives. Subject A.S. is a 37-year-old male, congenitally blind due to retinopathy of prematurity. At the time of testing, he was able to see faint light but not forms, shapes or colors. P.F. is a 45-year-old female who became profoundly blind at the age of 21 secondary to a chemical accident. Her residual vision was limited to faint light perception in her right eye (the left eye had been enucleated). All subjects were right-handed as assessed by the Oldfield questionnaire, had normal hearing, and had no neurological or psychiatric conditions.

Sighted experts training procedures:

Subjects were trained for 20 consecutive weekdays. Firstly, subjects were played a vOICe sound (soundscape) and asked to select, from four choices, the picture that it corresponded to. When they chose incorrectly, they were shown the correct answer and given feedback by an investigator. These multiple-choice questions increased in difficulty as the training progressed. Lines and simple shapes were used for the easy stimuli, capital letters and more complex shapes for more difficult stimuli, and pictures of real objects for the most difficult stimuli. We created a library of hundreds of object-related soundscapes so that each stimulus was effectively novel and subjects were not able to learn by simple pair association. Instead, they had to interpret the shape information contained in the soundscape to pick the correct answer.
Secondly, subjects were blindfolded and fitted with video-camera glasses, a laptop running The vOICe software, and headphones. They were then asked to use The vOICe to see and locate various objects placed in front of them.
During training, subjects were tested with multiple-choice questions to assess their ability to interpret novel vOICe soundscapes. The same tests were used each week to monitor progress, and subjects were never offered feedback on these tests. All subjects achieved a minimum level of 50% success on these multiple-choice tests by the end of the training period; since each question offered four possible choices, this represented twice the level of success expected by chance (25%). Our subjects achieved an average level of 69.5% correct by the end of training (range 53% to 86%).

Sighted association subjects training procedures:

Each subject had 3-5 training sessions of 2 hours with the 8 associations required for the fMRI experiment (see General Experiment design). During these sessions subjects were played the vOICe soundscapes and told the names of the objects represented by the sounds. This process was repeated until the subjects were able to accurately recognize and name the 8 sounds required. The number of sessions varied because these control subjects needed different amounts of time to learn the associations. The criterion for success was 100% accurate recognition in two complete sets of the 8 stimuli (16/16 associations), repeated on two consecutive days. These subjects were not told how the sounds were created; instead, they were told that the sounds had been randomly generated. After the scanning session subjects were debriefed. They were questioned about how they had learned the associations, in order to exclude any subjects who had developed a strategy involving shape. No subjects had to be excluded on these grounds. To demonstrate a behavioral difference between our two groups, we also asked these subjects to attempt the multiple-choice testing (on novel soundscapes) that our sighted experts took. This was carried out after the scanning so that any learning that might have occurred during the testing did not affect the fMRI data.
The association subjects achieved an average of 30.8% success (range 20% to 36%), which, as expected, was significantly worse than our experts (P = 0.004, Wilcoxon rank sum test). No association subject reached the 50% success threshold that we required of vOICe expert subjects.
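The reported ranges do not overlap (experts 53% to 86%, association subjects 20% to 36%), so every expert outscored every association subject. As a consistency check only, the following sketch computes the exact rank-sum probability of such a complete separation. It assumes five subjects per group and a one-sided exact test; the text does not state which variant of the Wilcoxon rank sum test was used, and the function name is illustrative.

```python
from itertools import combinations
from math import comb


def exact_rank_sum_p(n_low: int, n_high: int) -> float:
    """One-sided exact p-value for complete separation of two groups.

    Under the null hypothesis, every assignment of the ranks 1..N to the
    lower-scoring group is equally likely. The p-value is the fraction of
    assignments whose rank sum is at most the observed one.
    """
    total = n_low + n_high
    # Complete separation: the lower group holds ranks 1..n_low.
    observed = sum(range(1, n_low + 1))
    as_extreme = sum(
        1
        for ranks in combinations(range(1, total + 1), n_low)
        if sum(ranks) <= observed
    )
    return as_extreme / comb(total, n_low)


# ASSUMPTION: five association subjects versus five sighted experts.
p_one_sided = exact_rank_sum_p(5, 5)
print(f"one-sided exact p = {p_one_sided:.4f}")      # 1/252 -> 0.0040
print(f"two-sided exact p = {2 * p_one_sided:.4f}")  # 2/252 -> 0.0079
```

Under these assumptions the one-sided value, 1/252, rounds to the reported P = 0.004.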
General Experiment design:

Plastic toys were obtained representing the following objects: Horse, Cow, Elephant, Rooster, Jeep, Saw, Hammer and Whistle. These objects were chosen because they are easily recognizable both by their shape and by the sounds they make. Photographs of the objects were taken and converted into soundscapes using The vOICe visuo-auditory conversion algorithm (Suppl. Fig. 2). Short recordings were obtained of the sounds made by these objects (e.g. the banging of a hammer).

Testing Protocol

During the scanning, subjects were first given auditory instructions through a pair of MRI-compatible headphones. During vOICeObj, AudObj and TacObj, subjects were then played two sounds of the same type (e.g. vOICe, scramble, animal/object noise etc.), each repeated 3 times. Each sound lasted 2 seconds. Each such pair lasted a total of 12 seconds (2 sounds x 3 repetitions x 2 s) and was terminated with the auditory instruction "stop". An 8-second rest period followed before the next trial. Subjects had to recognize each object and covertly name the object's identity. Data acquisition began only when subjects were able to correctly identify all 8 objects inside the scanner (overtly). During half of the runs, subjects were asked to identify the stimuli as man-made or animal, and to respond via a two-button response box. This allowed verification that the subjects were correctly identifying objects during the scanning. To control for hand movements, subjects also had to press the response buttons at random during the control conditions (AudScr, vOICeScr, SenMot) after each cue for a stimulus.

A sample trial is shown in the schematic figure below:

Instruction   "listen"   -                       -                         "stop"
Stimulus      -          Hammer soundscape x 3   Elephant soundscape x 3   -
Time (s)      1          6                       6                         8
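The trial structure described above can be laid out as a simple event list. This is a minimal sketch rather than the presentation code used in the study: the `Event` class and `build_trial` function are hypothetical names, and the 1-second instruction duration is read off the schematic rather than stated in the text.

```python
from dataclasses import dataclass


@dataclass
class Event:
    onset_s: float
    duration_s: float
    label: str


def build_trial(sounds, start_s=0.0, instruction_s=1.0,
                sound_s=2.0, repeats=3, rest_s=8.0):
    """Lay out one trial: instruction, each sound repeated, then rest."""
    # ASSUMPTION: the "listen" instruction occupies 1 s (from the schematic).
    events = [Event(start_s, instruction_s, "instruction: listen")]
    t = start_s + instruction_s
    for name in sounds:
        for rep in range(repeats):
            events.append(Event(t, sound_s, f"{name} ({rep + 1}/{repeats})"))
            t += sound_s
    # The "stop" instruction opens the 8 s rest period.
    events.append(Event(t, rest_s, "instruction: stop, then rest"))
    return events


trial = build_trial(["Hammer soundscape", "Elephant soundscape"])
for ev in trial:
    print(f"{ev.onset_s:5.1f} s  +{ev.duration_s:.0f} s  {ev.label}")

stimulus_s = sum(ev.duration_s for ev in trial[1:-1])
total_s = trial[-1].onset_s + trial[-1].duration_s
print(stimulus_s, total_s)  # 12.0 21.0
```

The stimulus portion sums to the 12 seconds given in the text, and the full cycle to the 1 + 6 + 6 + 8 = 21 seconds of the schematic.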
Data analysis and MRI acquisition.

Data analysis was performed using the BrainVoyager QX 1.8 software package (Brain Innovation, Maastricht, Netherlands). Preprocessing included head motion correction, slice scan time correction and high-pass temporal filtering in the frequency domain to remove drifts and to improve the signal-to-noise ratio. To compute statistical parametric maps we applied a general linear model (GLM) using predictors convolved with a typical hemodynamic response function (Boynton, 1996). We used a statistical threshold criterion of p < 0.05, corrected for multiple comparisons using a cluster-size threshold adjustment. This was based on the Monte Carlo simulation approach of Forman et al., extended to 3D data sets using the Cluster Threshold plug-in for BrainVoyager QX. For the original algorithm see Forman, S.D. et al. Improved assessment of significant activation in functional magnetic resonance imaging (fMRI): use of a cluster-size threshold. Magn. Reson. Med. 33, 636-647 (1995).

The Cluster Threshold plug-in implements a randomization technique to estimate a cluster-level confidence on the current overlaid 3D maps (VMR/VMP), given the current voxel-level confidence level. The intrinsic smoothness of the map is estimated (4 mm kernel) and a user-specified number of simulations (the recommended 1000 iterations in our case) is performed to assign an alpha value to each active cluster. Based on the simulations, a minimum cluster size threshold is set for the current map to achieve a corrected p of 0.05. The resulting 3D volume (VMR/VMP) maps were then projected onto the individual cortex reconstruction. For the group results (Fig. 1c, Fig. 2a,b and Suppl. Fig. 3) we used a random-effects GLM analysis with correction for multiple comparisons.

3D recording and cortex reconstruction

Separate 3D recordings were used for surface reconstruction.
High-resolution 3D anatomical volumes were collected using a 3D turbo field echo (TFE) T1-weighted sequence (equivalent to MP-RAGE). Typical parameters were: field of view (FOV) 23 cm (RL) x 23 cm (VD) x 17 cm (AP); foldover axis: RL; data matrix: 160 x 160 x 144, zero-filled to 256 in all directions (approx. 1 mm isovoxel native data); TR/TE = 9 ms/6 ms; flip angle = 8°. Acquisition was segmented x 3 in order to enhance gray/white matter contrast. NEX = 2 in separate acquisitions. The parallel imaging head coil (SENSE head) was used to reduce scan time, with a reduction factor of 2.5 (total acquisition time: 5 minutes).

fMRI recording parameters

The BOLD fMRI measurements were performed in a whole-body 3-T Philips scanner equipped with 22 mT/m field gradients with a slew rate of 120 T/m/s (Echospeed). The pulse sequence used was the gradient-echo echo planar imaging (EPI) sequence. We used 30-33 slices of 3 mm thickness, with an interslice gap of 1 mm. The in-plane data matrix size was 128 x 128, the field of view (FOV) 24 cm x 24 cm, the repetition time (TR) 3000 ms and the echo time (TE) 35 ms. Each experiment had 180 data points, with four repetitions. The first five images (during the first baseline rest condition) were excluded from the analysis because of non-steady-state magnetization.

Percent signal change analysis

For the LO/LOtv signal magnitude analysis, activation was sampled from the peak occipito-temporal voxel of the left hemisphere (Fig. 2c) and the right hemisphere (Suppl. Fig. 4) for the contrast of tactile objects versus sensory-motor control, and averaged across the four runs. The peak voxel was selected in each subject separately, in a volume smoothed by convolution with a Gaussian kernel of 4 mm full width at half maximum. The averaged percent signal change and standard errors were then calculated for each condition.
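The averaging steps described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used in the study: percent signal change is taken relative to the rest baseline, the standard error is computed across subjects, all function names are hypothetical, and the signal values are invented placeholders rather than data from the experiment.

```python
from math import sqrt
from statistics import mean, stdev


def percent_signal_change(condition_signal, baseline_signal):
    """Signal change relative to baseline, in percent (ASSUMED definition)."""
    return 100.0 * (condition_signal - baseline_signal) / baseline_signal


def subject_psc(runs):
    """Average over runs; each run is (condition_mean, baseline_mean)."""
    return mean(percent_signal_change(c, b) for c, b in runs)


def group_mean_and_sem(values):
    """Group mean and standard error of the mean across subjects."""
    return mean(values), stdev(values) / sqrt(len(values))


# HYPOTHETICAL peak-voxel signals for one condition: four runs per subject.
subjects = {
    "S1": [(1010.0, 1000.0), (1020.0, 1000.0), (1015.0, 1000.0), (1005.0, 1000.0)],
    "S2": [(1008.0, 1000.0), (1012.0, 1000.0), (1010.0, 1000.0), (1010.0, 1000.0)],
    "S3": [(1030.0, 1000.0), (1020.0, 1000.0), (1025.0, 1000.0), (1025.0, 1000.0)],
}

per_subject = [subject_psc(runs) for runs in subjects.values()]
group_mean, group_sem = group_mean_and_sem(per_subject)
print(f"mean PSC = {group_mean:.2f}%, SEM = {group_sem:.2f}%")
```

The same three steps would be repeated for each condition to obtain one mean and one standard error per condition.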