DCASE 2016 CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION


1 DCASE 2016 CONVOLUTIONAL NEURAL NETWORKS FOR ACOUSTIC SCENE CLASSIFICATION Michele Valenti 1 (valenti.michele.w@gmail.com), Aleksandr Diment 2, Giambattista Parascandolo 2, Stefano Squartini 1, Tuomas Virtanen 2 1 Università Politecnica delle Marche, Italy 2 Tampere University of Technology, Finland


3 Outline: Introduction; Our system; Training modes; Results; Challenge ranking

4-5 Introduction: What is acoustic scene classification? Audio -> scene label, for example Home, Car, or Forest path.

6 Our system: Overview. Audio -> Feature extraction -> Sequence splitting -> CNN -> Scores averaging -> Label

7 Our system: Features. Raw audio -> Log-mel spectrogram
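The feature-extraction step can be illustrated with a minimal NumPy sketch of a log-mel spectrogram. The frame length, hop size, number of mel bands, and the mel-scale formula used here are assumptions chosen for illustration; the slides do not state the values the authors used.

```python
import numpy as np

def hz_to_mel(f):
    # Common (HTK-style) mel formula; an assumption, not taken from the slides.
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_mels, n_fft, sr):
    # Triangular filters whose edges are equally spaced on the mel scale.
    edges_hz = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fft_freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)
    fb = np.zeros((n_mels, fft_freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (fft_freqs - lo) / (mid - lo)
        falling = (hi - fft_freqs) / (hi - mid)
        fb[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return fb

def log_mel_spectrogram(audio, sr, n_fft=1024, hop=512, n_mels=60, eps=1e-10):
    # n_fft, hop and n_mels are illustrative defaults (assumptions).
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2    # (n_frames, n_fft // 2 + 1)
    mel = power @ mel_filterbank(n_mels, n_fft, sr).T    # (n_frames, n_mels)
    return np.log(mel + eps).T                           # (n_mels, n_frames)
```

The result is a two-dimensional array with one row per mel band and one column per frame, which is the image-like input a CNN expects.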

8 Our system: Sequence splitting. Raw audio segment -> Log-mel spectrogram -> Sequence splitting -> Sequences
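Sequence splitting can be sketched as cutting the spectrogram along the time axis into fixed-length pieces. This sketch assumes non-overlapping sequences and drops any trailing frames that do not fill a whole sequence; the slides do not say how the authors handled either point, and `frames_per_sequence` is a hypothetical parameter name.

```python
import numpy as np

def split_into_sequences(spectrogram, frames_per_sequence):
    """Cut a (n_mels, n_frames) spectrogram into non-overlapping sequences.

    Returns an array of shape (n_sequences, n_mels, frames_per_sequence).
    Trailing frames that do not fill a full sequence are discarded (assumption).
    """
    n_mels, n_frames = spectrogram.shape
    n_sequences = n_frames // frames_per_sequence
    usable = spectrogram[:, :n_sequences * frames_per_sequence]
    # (n_mels, n_sequences, frames) -> (n_sequences, n_mels, frames)
    return usable.reshape(n_mels, n_sequences, frames_per_sequence).transpose(1, 0, 2)
```

Each returned sequence is classified independently by the CNN, so a single recording yields several predictions.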

9-17 Our system: Convolutional neural network. The figure is built up step by step: Sequence -> Feature maps -> Batch normalization -> Subsampled feature maps -> New feature maps -> Time shrinking -> Flattening -> Fully-connected softmax layer. Numeric labels in the figure: 128 (slides 10 to 15 and 17) and 256 (slide 16).
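To make the order of the CNN stages concrete, the following sketch traces tensor shapes through them. It reads the figure labels 128 and 256 as the numbers of feature maps in the first and second convolutional layers, which is an assumption. The input size, the "same" padding, and both pooling sizes are also assumptions chosen only so that the arithmetic is easy to follow. The 15 output classes match the class list in the results.

```python
def conv_same(shape, n_kernels):
    # A 'same'-padded convolution keeps height and width and sets the channel count.
    _, height, width = shape
    return (n_kernels, height, width)

def max_pool(shape, pool_h, pool_w):
    # Non-overlapping max-pooling divides height and width by the pool size.
    channels, height, width = shape
    return (channels, height // pool_h, width // pool_w)

def trace_shapes(n_mels=60, n_frames=150, n_classes=15):
    """Return the (channels, frequency, time) shape after each stage."""
    shapes = {}
    x = (1, n_mels, n_frames)
    shapes["sequence"] = x
    x = conv_same(x, 128)             # batch normalization does not change the shape
    shapes["feature maps"] = x
    x = max_pool(x, 5, 5)             # assumed 5x5 subsampling
    shapes["subsampled feature maps"] = x
    x = conv_same(x, 256)
    shapes["new feature maps"] = x
    x = max_pool(x, 4, x[2])          # time shrinking: pool over the whole time axis
    shapes["time shrinking"] = x
    shapes["flattening"] = (x[0] * x[1] * x[2],)
    shapes["softmax"] = (n_classes,)
    return shapes

for stage, shape in trace_shapes().items():
    print(f"{stage:>24}: {shape}")
```

Under these assumptions the time axis collapses to a single step before flattening, so the softmax layer sees one fixed-length vector per sequence.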

18-19 Our system: Scores averaging. The CNN outputs class prediction scores for every sequence. The scores of all sequences in a file are combined per class (Σ), and the argmax over classes gives the file's class.
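The scores-averaging step can be sketched in a few lines. The function name and the array layout are hypothetical. Taking the mean instead of the sum shown on the slide does not change the argmax, because both differ only by a constant factor.

```python
import numpy as np

def classify_file(sequence_scores):
    """Combine per-sequence scores into one file-level decision.

    sequence_scores: array of shape (n_sequences, n_classes), one row of
    class prediction scores per sequence of the same file.
    """
    file_scores = np.mean(sequence_scores, axis=0)   # average over sequences
    return int(np.argmax(file_scores)), file_scores

# Three sequences, three classes: class 1 wins although sequence 0 prefers class 0.
scores = np.array([[0.6, 0.3, 0.1],
                   [0.2, 0.5, 0.3],
                   [0.1, 0.7, 0.2]])
label, file_scores = classify_file(scores)
```

Averaging lets confident sequences outweigh ambiguous ones, so a few misleading segments do not decide the label for the whole recording.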

20 Training

21 Training: Cross-validation setup. Four folds (Fold 1 to Fold 4); each fold divides the data into a training + validation part and a test part.

22-26 Training: Non-full training. For fold n, the training + validation part is split into a training subset and a validation subset. The network is trained on the training subset only. Training and validation accuracies are plotted against epochs, and the validation curve is used to read off the convergence time.

27 Training: Non-full training versus full training. Non-full training uses the training subset and keeps the validation subset apart; full training uses the training and validation subsets together. In both cases the test part of fold n is held out.
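The two training modes can be sketched as a small planning helper. Two assumptions are made here: that the convergence time is the epoch with the highest validation accuracy, and that full training reuses that epoch count on the combined data. The second mirrors the challenge slides, where convergence at 400 epochs is followed by a final training for 400 epochs. All function and key names are hypothetical.

```python
def convergence_epoch(val_accuracies):
    """Return the 1-based epoch with the highest validation accuracy.

    Ties resolve to the earliest epoch. This criterion is an assumption.
    """
    best = max(val_accuracies)
    return val_accuracies.index(best) + 1

def training_plan(val_accuracies):
    """Describe both training modes for one fold."""
    epochs = convergence_epoch(val_accuracies)
    return {
        "non_full": {"data": "training", "epochs": epochs},
        "full": {"data": "training + validation", "epochs": epochs},
    }

# Validation accuracy per epoch from a hypothetical non-full training run.
plan = training_plan([0.50, 0.70, 0.72, 0.71])
```

Full training cannot be monitored with a validation curve, because the validation data is part of the training set, which is why the stopping point has to come from the non-full run.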

28 Results: Test data. Results are computed on the test part of each of the four folds (Fold 1 to Fold 4).

29-31 Results: Sequence length. [Plot: accuracy (%) against sequence length (s) for non-full training and full training.]

32 Results: Class accuracies

Class              Accuracy (%)
Beach              75.6
Bus                76.9
Café/Restaurant    74.4
Car                91.0
City center        93.6
Forest path        96.2
Grocery store      88.5
Home               80.8
Library            66.6
Metro station      96.2
Office             97.4
Park               59.0
Residential area   73.1
Train              46.2
Tram               78.2

33 Results: Class accuracies (same table with two callouts overlaid). One callout reads "34.6% Residential area" and sits near the Office and Park rows; the other reads "% Bus", with its number missing, and sits near the Train and Tram rows.
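As a quick check on the table, the snippet below computes the unweighted mean of the fifteen class accuracies and picks out the best and worst classes. The mean equals the overall accuracy only if every class has the same number of test files, which is an assumption here.

```python
class_accuracy = {
    "Beach": 75.6, "Bus": 76.9, "Café/Restaurant": 74.4, "Car": 91.0,
    "City center": 93.6, "Forest path": 96.2, "Grocery store": 88.5,
    "Home": 80.8, "Library": 66.6, "Metro station": 96.2, "Office": 97.4,
    "Park": 59.0, "Residential area": 73.1, "Train": 46.2, "Tram": 78.2,
}

# Unweighted mean over the 15 classes.
mean_accuracy = sum(class_accuracy.values()) / len(class_accuracy)
best_class = max(class_accuracy, key=class_accuracy.get)
worst_class = min(class_accuracy, key=class_accuracy.get)

print(f"mean: {mean_accuracy:.1f} %")   # prints "mean: 79.6 %"
print(f"best: {best_class}, worst: {worst_class}")
```

The spread between Office (97.4 %) and Train (46.2 %) is about 51 percentage points, so the mean hides large differences between classes.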

34 Results: Other classifiers
Columns: System; Sequence length (s); Accuracy (%) with non-full training; Accuracy (%) with full training.
Systems compared: Baseline GMM (MFCC); Two-layer CNN (MFCC); Two-layer MLP (log-mel); One-layer CNN (log-mel); Two-layer CNN (log-mel).

35-38 Challenge ranking: Final training. The extended training set consists of the training, validation, and test data together; the evaluation set is the secret challenge data. The extended training set is first split into a new training subset and a new validation subset, on which convergence is observed at 400 epochs. The final network is then trained on the whole extended training set for 400 epochs.

39 Challenge ranking. [Bar chart: accuracy (%) of the ranked systems on a 0 to 100 axis.] Bar values, highest to lowest: 89.7, 88.7, 87.7, 87.2, 86.4, 86.4, 86.2, 85.9, 85.6, 85.4, 84.6, 84.1, 77.2, 62.8.


41 Results: Feature comparison
Columns: System; Sequence length (s); Accuracy (%) with non-full training; Accuracy (%) with full training.
Systems compared: Two-layer CNN (MFCC); Two-layer CNN (log-mel).
