I am using this config: https://github.com/NVIDIA/NeMo/blob/ae5d7e81b8e446e5650082b1700eb92dd2e7c1bd/examples/asr/conf/fastconformer/hybrid_cache_aware_streaming/fastconformer_hybrid_transducer_ctc_bpe_streaming.yaml. I'd like to train or fine-tune a model for live streaming that gets no context from future frames. In other words, it must do cache-aware streaming and make predictions based only on what it has heard so far and what it has predicted so far. It seems like doing [...]
Replies: 2 comments
Also set causal_downsampling=true to make the downsampling causal. Using layernorm instead of batchnorm, or disabling normalization in the preprocessing, is also needed to make streaming easier. But if you are using that streaming config, all of these are already set, and you just need to set att_context_size to what @titu1994 suggested.
NeMo/examples/asr/conf/fastconformer/hybrid_cache_aware_streaming/fastconformer_hybrid_transducer_ctc_bpe_streaming.yaml, line 117 in ae5d7e8
NeMo/examples/asr/conf/fastconformer/hybrid_cache_aware_streaming/fastconformer_hybrid_transducer_ctc_bpe_streaming.yaml, line 131 in ae5d7e8
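To make the checklist in the reply concrete, here is a minimal sketch that checks those settings on plain dicts mirroring the `encoder` and `preprocessor` sections of the YAML. The helper `matches_streaming_advice` is hypothetical and not part of NeMo. The key names `conv_norm_type` and `normalize`, the value `"NA"` for disabled normalization, the `[left, right]` reading of `att_context_size`, and the left-context value 70 are assumptions about the config layout, so check them against the linked file. The value suggested earlier in the thread is not reproduced here; the sketch simply treats a right context of 0 as "no future frames".

```python
# Hypothetical helper (not a NeMo API): verifies the settings mentioned in the reply.
# Inputs are plain dicts assumed to mirror the `encoder` and `preprocessor` YAML sections.

def matches_streaming_advice(encoder: dict, preprocessor: dict) -> bool:
    # Assumption: att_context_size is [left, right], counted in encoder frames.
    _left, right = encoder["att_context_size"]
    return (
        right == 0                                          # no attention over future frames
        and encoder.get("causal_downsampling") is True      # causal downsampling
        and encoder.get("conv_norm_type") == "layer_norm"   # layernorm, not batchnorm
        and preprocessor.get("normalize") == "NA"           # assumed "disabled" value
    )


# Example values; 70 is an illustrative left context, not a recommendation.
encoder = {
    "att_context_size": [70, 0],
    "causal_downsampling": True,
    "conv_norm_type": "layer_norm",
}
preprocessor = {"normalize": "NA"}

streaming_ok = matches_streaming_advice(encoder, preprocessor)

# The same encoder with a non-zero right context would look at future frames.
with_lookahead = dict(encoder, att_context_size=[70, 13])
lookahead_ok = matches_streaming_advice(with_lookahead, preprocessor)
```

With these example values, `streaming_ok` is `True` and `lookahead_ok` is `False`, because the second encoder dict attends to 13 frames of right context.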