BirdCLEF_2022_8th_place

My solution for BirdCLEF 2022 - Identify bird calls in soundscapes, which earned a solo gold medal.

This repository contains my final training notebook and inference notebook.

About my journey in this competition:

kaggle link

First step

I started from a public notebook, changed the mel spectrogram hop length and the input shape to [224, 512], and added class weights at submission time. The weights array is 500 divided by the number of records of each bird species, clamped to a maximum of 10. I then multiply the model output by the weights array. This gave 0.77 on the public LB.
I tried this approach with secondary labels set to 0.3, 0.4 and 0.5. With ensembling I got 0.79 on the LB.
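
The class weighting described above can be sketched as follows. This is a minimal NumPy illustration, not the notebook code: the function name, the record counts and the probabilities are made up for the example, and every class is assumed to have at least one record.

```python
import numpy as np

def class_weights(counts, numerator=500.0, max_weight=10.0):
    """Weight each class by numerator / count, clamped to max_weight."""
    counts = np.asarray(counts, dtype=np.float64)  # assumes every count >= 1
    return np.minimum(numerator / counts, max_weight)

# Hypothetical record counts for four species
counts = [500, 250, 50, 10]
weights = class_weights(counts)  # 1.0, 2.0, 10.0, 10.0 (the last one is clamped from 50)

# Hypothetical model probabilities for one clip
probs = np.array([0.60, 0.20, 0.05, 0.01])
weighted = probs * weights  # rare species are boosted before thresholding
```

Rare species get a larger multiplier, so their scores cross the decision threshold more easily, while the clamp keeps very rare classes from dominating.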

Augmentations

For waveform:

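# Import assumed: parameter names match the audiomentations API.
from audiomentations import Compose, OneOf, AddGaussianNoise, PitchShift, Shift, Gain, Normalize

# NoiseInjection is not an audiomentations class; it is assumed to be a custom transform.
Compose([
    OneOf([
        AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.011, p=1),
        NoiseInjection(p=1, max_noise_level=0.04),
    ], p=0.4),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.1),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.1),
    Gain(min_gain_in_db=-12, max_gain_in_db=12, p=0.2),
    Normalize(p=1.),
])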

For spectrogram:

torchaudio.transforms.FrequencyMasking(24)
torchaudio.transforms.TimeMasking(96)

Model

SED with tf_efficientnet_b0_ns backbone

Training

Epochs = 30
LR = 0.001
weight_decay = 0.0001
dropout = 0.4
Loss: Focal BCE Loss
Optimizer: Adam (betas=(0.95, 0.999))
Scheduler: CosineAnnealingLR
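
The loss is named only as Focal BCE, so here is a hedged NumPy sketch of one common binary focal formulation. The function name and `gamma=2.0` are assumptions (no focal parameters are given above), and the real pipeline presumably uses a PyTorch version.

```python
import numpy as np

def focal_bce(probs, targets, gamma=2.0, eps=1e-7):
    """Binary cross-entropy scaled by (1 - p_t) ** gamma, averaged over all elements."""
    probs = np.clip(np.asarray(probs, dtype=np.float64), eps, 1.0 - eps)
    targets = np.asarray(targets, dtype=np.float64)
    bce = -(targets * np.log(probs) + (1.0 - targets) * np.log(1.0 - probs))
    # p_t is the probability assigned to the correct side of each binary decision
    p_t = targets * probs + (1.0 - targets) * (1.0 - probs)
    return float(np.mean((1.0 - p_t) ** gamma * bce))

# Hypothetical predictions and labels for three classes
probs = [0.9, 0.1, 0.6]
targets = [1, 0, 1]
plain = focal_bce(probs, targets, gamma=0.0)  # gamma = 0 reduces to ordinary BCE
focal = focal_bce(probs, targets)             # easy examples are down-weighted
```

With `gamma > 0`, well-classified elements contribute little, so training focuses on the hard ones.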

Validation

I validate models with 7-fold CV using the macro F1 metric. Thresholds for F1: [0.05, 0.1, 0.15, 0.20, 0.25, 0.30, 0.35, 0.4, 0.45, 0.5]. The best checkpoints are chosen at the 0.3 and 0.5 thresholds.

What didn't work at this step
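
A threshold sweep of this kind might look like the sketch below. It uses a grid from 0.05 to 0.5 in steps of 0.05 (an assumed reading of the threshold list), and the function names and toy data are illustrative only.

```python
import numpy as np

def f1_macro(y_true, y_pred):
    """Macro F1 over classes (columns); a class with no positives and no predictions scores 0."""
    tp = np.sum((y_true == 1) & (y_pred == 1), axis=0)
    fp = np.sum((y_true == 0) & (y_pred == 1), axis=0)
    fn = np.sum((y_true == 1) & (y_pred == 0), axis=0)
    denom = 2 * tp + fp + fn
    f1 = np.where(denom > 0, 2 * tp / np.maximum(denom, 1), 0.0)
    return float(f1.mean())

def sweep_thresholds(y_true, probs, thresholds):
    """Binarize probs at each threshold and return {threshold: macro F1}."""
    return {t: f1_macro(y_true, (probs >= t).astype(int)) for t in thresholds}

thresholds = [round(0.05 * k, 2) for k in range(1, 11)]  # 0.05, 0.1, ..., 0.5

# Toy validation data: 4 clips x 2 classes
y_true = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
probs = np.array([[0.9, 0.2], [0.4, 0.7], [0.35, 0.6], [0.1, 0.05]])
scores = sweep_thresholds(y_true, probs, thresholds)
```

On this toy data, `scores[0.3]` and `scores[0.5]` are the two values a checkpoint would be compared on.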

Noise reduction
Oversampling rare birds
Trim long silence with noise reduction
Weighted loss
3-channel input, where the 1st channel is power_to_db, the 2nd is the mel spectrogram, and the 3rd is the normalized mel spectrogram

Second step

I manually trimmed all segments without bird sounds from the audio of the scored species. For skylar and houfin I processed only 70 records per class.
Split the data into 15-second chunks for audio shorter than 1 minute and 30-second chunks for audio longer than 1 minute. This gave ~40,000 records.
Raised the weights arrays to a power between 0.25 and 0.75 to reduce their impact on inference.
Training on this data with the previous pipeline and ensembling with the previous models gave me 0.81 on the LB.
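
Two of the steps above, the chunking rule and the weight tempering, can be sketched in plain Python. The function names are hypothetical, and two details are assumptions because the text does not specify them: a recording of exactly 60 seconds is treated as long, and a shorter final chunk is kept.

```python
def chunk_bounds(duration_s, short_chunk_s=15.0, long_chunk_s=30.0, cutoff_s=60.0):
    """Return (start, end) times in seconds that cover the whole recording."""
    chunk = short_chunk_s if duration_s < cutoff_s else long_chunk_s
    bounds = []
    start = 0.0
    while start < duration_s:
        # The last chunk may be shorter than the nominal chunk length
        bounds.append((start, min(start + chunk, duration_s)))
        start += chunk
    return bounds

def temper_weights(weights, power):
    """Raise class weights to a power below 1 to pull them towards 1."""
    return [w ** power for w in weights]

short_file = chunk_bounds(40.0)  # [(0.0, 15.0), (15.0, 30.0), (30.0, 40.0)]
long_file = chunk_bounds(70.0)   # [(0.0, 30.0), (30.0, 60.0), (60.0, 70.0)]
tempered = temper_weights([1.0, 4.0, 9.0], 0.5)  # square root of each weight
```

A power of 0.5 turns a weight of 9 into 3, so rare classes are still favoured but far less aggressively.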

Data preprocessing

Same

Augmentations

Same

Model

SED with tf_efficientnet_b0_ns backbone

Training

Epochs = 60
LR = 0.0013
weight_decay = 0.0001
dropout = 0.4
Loss: Focal BCE Loss
Optimizer: Adam (betas=(0.95, 0.999))
Scheduler: CosineAnnealingLR

What didn't work at this step

PaSST model
AST model
Linear head models
PaSST preprocessing

Third step

I used preprocessing similar to AST's.
Every epoch, up to 300 records per class are randomly chosen from all the data.
Train on a random 5-second crop, validate on the first 5 seconds.
Secondary labels set to 0.4.
Class weights array clamped to a maximum of 8 and raised to the power 0.6.
Used mixup 0.4 for the first 15 epochs, mixup 0.07 for epochs 16-22, and no mixup afterwards.
Ensembling this approach with the previous ones gave me 0.82 on the LB.
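
The per-epoch subsampling could be implemented roughly as below. This is a sketch under assumptions: `sample_epoch` and the label list are hypothetical, and each record is assumed to carry a single primary label.

```python
import random
from collections import defaultdict

def sample_epoch(labels, max_per_class=300, rng=None):
    """Return shuffled record indices with at most max_per_class per label."""
    if rng is None:
        rng = random.Random()
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    chosen = []
    for indices in by_class.values():
        k = min(max_per_class, len(indices))
        chosen.extend(rng.sample(indices, k))  # sampling without replacement
    rng.shuffle(chosen)
    return chosen

# Hypothetical dataset: one very common class and one rare class
labels = ["common"] * 1000 + ["rare"] * 5
epoch_indices = sample_epoch(labels, max_per_class=300, rng=random.Random(0))
```

Calling the function again at the start of every epoch gives a fresh draw, so common classes are capped per epoch but still seen in full over the course of training.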

Data preprocessing

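import torch
import torchaudio as ta

# Fragment from the dataset class; crop_or_pad, SR and the self.* attributes are defined elsewhere.
waveform, sr = ta.load(filename)
waveform = crop_or_pad(waveform, sr=SR, mode=self.mode)
waveform = waveform - waveform.mean()  # remove DC offset
waveform = torch.tensor(self.wave_transforms(samples=waveform[0].numpy(), sample_rate=SR)).unsqueeze(0)
# Kaldi-compatible log-mel filterbank features (frame_shift is in milliseconds)
fbank = ta.compliance.kaldi.fbank(waveform, htk_compat=True, sample_frequency=SR, use_energy=False,
                                  window_type='hanning', num_mel_bins=self.melbins, dither=0.0, frame_shift=9.7)
# Normalize with dataset-level statistics
fbank = (fbank - self.norm_mean) / (self.norm_std * 2)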

Mean and std are calculated over the whole training dataset.
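
One way to obtain such statistics is to accumulate sums over all training spectrograms, as in this NumPy sketch. The function names are hypothetical, and using a single scalar mean and std (rather than per-bin values) is an assumption.

```python
import numpy as np

def dataset_mean_std(spectrograms):
    """Scalar mean and std over every element of every spectrogram."""
    total, total_sq, n = 0.0, 0.0, 0
    for spec in spectrograms:
        spec = np.asarray(spec, dtype=np.float64)
        total += spec.sum()
        total_sq += np.square(spec).sum()
        n += spec.size
    mean = total / n
    std = np.sqrt(max(total_sq / n - mean ** 2, 0.0))
    return mean, std

def normalize(spec, mean, std):
    """Same form as the fbank normalization: center, then divide by 2 * std."""
    return (spec - mean) / (std * 2)

# Toy "spectrograms" of different lengths
specs = [np.array([[1.0, 2.0], [3.0, 4.0]]), np.array([[5.0, 6.0]])]
norm_mean, norm_std = dataset_mean_std(specs)
normalized = normalize(np.arange(1.0, 7.0), norm_mean, norm_std)
```

Dividing by twice the std leaves the normalized features with a standard deviation of about 0.5 instead of 1.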

Model

SED with tf_efficientnet_b0_ns backbone

Augmentations

Same

Training

Epochs = 85
mixup 0.4 when epoch < 15
LR = 0.0008
weight_decay = 0.0001
dropout = 0.4
Loss: Focal BCE Loss
Optimizer: Adam (betas=(0.95, 0.999))
Scheduler: CosineAnnealingLR
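
The mixup schedule for this step (0.4 for the first 15 epochs, 0.07 for epochs 16-22, none afterwards) can be written as a small lookup. The function name is hypothetical, epochs are assumed to be counted from 1, and the value is returned as-is because the text does not say whether it is a probability or a Beta parameter.

```python
def mixup_value(epoch):
    """Mixup setting for a 1-indexed epoch."""
    if epoch <= 15:
        return 0.4
    if epoch <= 22:
        return 0.07
    return 0.0  # mixup disabled for the rest of training

# Values at the schedule boundaries, for a run of 85 epochs
schedule = {epoch: mixup_value(epoch) for epoch in (1, 15, 16, 22, 23, 85)}
```

In a training loop this would be queried once per epoch, and mixup skipped entirely when the value is 0.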

Postprocessing

Mean-median averaging of predictions

full_med = np.median(full_events, axis=1)
full_mean = np.mean(full_events, axis=1)
full_events = np.mean(np.array([full_med, full_mean]), axis=0)

Max adder.

logits_max = full_events.max(0)
for jk in range(full_events.shape[1]):
    if logits_max[jk] > threshold * 2.5:
        full_events[:, jk] += threshold * 0.5
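
To see both postprocessing steps on concrete numbers, here is a self-contained sketch that wraps the two snippets in functions and runs them on toy data. The function names, the input shape (segments x predictions x classes) and the threshold of 0.2 are assumptions for the example.

```python
import numpy as np

def mean_median(full_events):
    """Average of the median and the mean taken over axis 1."""
    full_med = np.median(full_events, axis=1)
    full_mean = np.mean(full_events, axis=1)
    return np.mean(np.array([full_med, full_mean]), axis=0)

def max_adder(full_events, threshold):
    """Boost a class in every segment when its maximum over segments is confident."""
    boosted = full_events.copy()
    logits_max = boosted.max(0)
    for jk in range(boosted.shape[1]):
        if logits_max[jk] > threshold * 2.5:
            boosted[:, jk] += threshold * 0.5
    return boosted

# Toy input: 2 segments x 3 predictions x 2 classes
raw = np.array([
    [[0.9, 0.1], [0.7, 0.1], [0.2, 0.4]],
    [[0.1, 0.0], [0.2, 0.1], [0.3, 0.2]],
])
threshold = 0.2
averaged = mean_median(raw)             # [[0.65, 0.15], [0.2, 0.1]]
final = max_adder(averaged, threshold)  # [[0.75, 0.15], [0.3, 0.1]]
```

Class 0 peaks at 0.65, above 2.5 x 0.2, so it gains 0.1 in every segment and the second segment moves from 0.2 to 0.3. Class 1 never gets confident enough and is left unchanged.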

Story of gold medal

With 2 days left before the end of the competition, I thought about using pseudo labeling. I filtered my data with a 0.8 threshold to remove all noisy records; only ~15,000 of ~40,000 remained. Then I trained a new model on this data. Because there was not enough time, I trained the model on all the training data and validated it on a random subset and on a random subset of only the scored birds. All training and preprocessing parameters were the same as before.
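
The filtering step is not described in detail. One plausible reading is sketched below: keep a record only if an existing model assigns its primary label a probability of at least 0.8. That interpretation, the function name and the toy data are all assumptions.

```python
import numpy as np

def keep_confident(probs, primary_idx, threshold=0.8):
    """Boolean mask of records whose primary-label probability is >= threshold."""
    probs = np.asarray(probs, dtype=np.float64)
    primary_idx = np.asarray(primary_idx)
    primary_prob = probs[np.arange(len(probs)), primary_idx]
    return primary_prob >= threshold

# Hypothetical model outputs for 3 records and 2 classes
probs = [[0.90, 0.10], [0.30, 0.60], [0.20, 0.85]]
primary_idx = [0, 1, 1]  # index of each record's labeled species
mask = keep_confident(probs, primary_idx)  # the second record is dropped as noisy
```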

It was the last day of the competition and I had only 5 submissions. I lost the first 3 because I forgot to call model.eval() :) For the 4th attempt I used the single new model and got 0.79 on the LB. The last one was the mean of my best previous attempt and the new pseudo-labeling model.

It gave me 0.79 on the private LB, while my previous best private score was 0.78.

ffs333/BirdCLEF_2022_8th_place