"time_frequency" sonification as introduced in mir_eval 0.5 makes it hard to hear chord changes #310
Comments
I've also noticed this while qualitatively testing chord models, and I agree that it's confusing to listen to. It's been a while since I looked at this code, but I wonder how difficult it would be to add a flag to clip time-frequency sonification at zero-crossings rather than taper to zero by interpolation? That way, we can still have crisp transitions without crackle.
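As a rough illustration of the zero-crossing idea, the sketch below finds the last zero crossing of a single sinusoid inside an interval, so that the tone could be cut there instead of being tapered. It assumes each tone starts at phase zero at the beginning of its interval; `last_zero_crossing` is a hypothetical helper for this illustration, not a mir_eval function.

```python
import math


def last_zero_crossing(start, end, frequency):
    """Latest time <= end at which sin(2*pi*frequency*(t - start)) is zero.

    Assumes the tone starts at phase zero at `start` (an assumption of this
    sketch, not something the sonification code guarantees).
    """
    half_period = 1.0 / (2.0 * frequency)
    # Number of complete half-periods that fit into the interval.
    n_half_periods = math.floor((end - start) / half_period)
    return start + n_half_periods * half_period


# A 100 Hz tone on [0 s, 0.0125 s] would be cut after two half-periods.
cut_time = last_zero_crossing(0.0, 0.0125, 100.0)
value_at_cut = math.sin(2 * math.pi * 100.0 * cut_time)
```

Cutting at `cut_time` leaves a short silent gap before the interval actually ends, which only matters when neighboring bins are expected to join seamlessly.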
I started wrapping my mind around the implementation a bit: I believe the problem here is that time_frequency should be capable of handling two rather different kinds of "grams":
I am not 100% sure, but I believe that your suggestion of clipping the individual waveforms at the last possible zero crossing could be problematic in the first scenario, since one would somehow need to ensure that no "gaps" in the wave occur between temporally adjacent time-frequency bins.

I have a different solution, though. With the current implementation, only the long intervals are problematic. So one could simply split each of those long intervals into three new ones: a very short "attack" interval at the beginning, a very short "decay" interval at the end, and the remainder in the middle. The following function implements this solution:

```python
import numpy as np
from mir_eval import util


def prepare_gram_for_time_frequency_sonification(gram, times, max_interval_len=0.2):
    # Convert a 1-D array of boundaries into an (n, 2) array of intervals.
    if times.ndim == 1:
        times = util.boundaries_to_intervals(times)
    edge_len = max_interval_len / 3  # duration of the attack and decay pieces
    mod_gram_inds = []
    mod_times = []
    for m in range(gram.shape[1]):
        start, end = times[m]
        if end - start > max_interval_len:
            # Long interval: repeat column m for attack, remainder, and decay.
            mod_gram_inds += [m, m, m]
            mod_times.append([start, start + edge_len])
            mod_times.append([start + edge_len, end - edge_len])
            mod_times.append([end - edge_len, end])
        else:
            # Short interval: keep it unchanged.
            mod_gram_inds.append(m)
            mod_times.append([start, end])
    mod_times = np.array(mod_times)
    mod_gram = gram[:, mod_gram_inds]
    return mod_gram, mod_times
```

(I am not a very experienced Python programmer, so please excuse the "non-Pythonic" style.)
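To see why splitting long intervals sharpens the transitions, here is a minimal sketch for a single frequency bin. It assumes the amplitude envelope is linearly interpolated between interval centers, which would explain the slow crossfades described in this thread; `interp_between_centers` is a hypothetical helper written for this illustration and is not part of mir_eval.

```python
def interp_between_centers(amplitudes, intervals, t):
    """Linearly interpolate per-interval amplitudes between interval centers.

    Returns 0.0 before the first center and after the last one.
    """
    centers = [(start + end) / 2 for start, end in intervals]
    if t < centers[0] or t > centers[-1]:
        return 0.0
    points = list(zip(centers, amplitudes))
    for (c0, a0), (c1, a1) in zip(points, points[1:]):
        if c0 <= t <= c1:
            return a0 + (a1 - a0) * (t - c0) / (c1 - c0)
    return amplitudes[-1]


# One frequency bin that sounds during the first 4 s chord only.
intervals = [(0.0, 4.0), (4.0, 8.0)]
amplitudes = [1.0, 0.0]

# The same content after splitting with max_interval_len = 0.2 s,
# i.e. attack and decay pieces of 0.2 / 3 s each.
d = 0.2 / 3
split_intervals = [
    (0.0, d), (d, 4.0 - d), (4.0 - d, 4.0),
    (4.0, 4.0 + d), (4.0 + d, 8.0 - d), (8.0 - d, 8.0),
]
split_amplitudes = [1.0, 1.0, 1.0, 0.0, 0.0, 0.0]

probe_times = [3.0, 3.9, 4.0]
env_original = [interp_between_centers(amplitudes, intervals, t) for t in probe_times]
env_split = [interp_between_centers(split_amplitudes, split_intervals, t) for t in probe_times]
```

Under this assumption, the unsplit envelope has already faded to 0.75 one second before the chord change, while the split version stays at full amplitude until roughly 0.03 s before the boundary.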
I didn't realize people were using it for the second use-case you had listed; in that case the interpolation doesn't really make sense. I think it makes sense to interpolate over the minimum of the interval length and some pre-defined short interval. Does that make sense?
How about interpolating over, say, two cycles at the frequency being synthesized?
@craffel The second use-case is exactly what happens when you call mir_eval.sonify.chords(...). Each column of the internally constructed gram corresponds to one interval/chord label in the original chord sequence, and therefore each column also corresponds to the full duration of a chord (which can potentially be VERY long, even for real-world examples). I think your suggestion of interpolating at a fixed, potentially even frequency-dependent rate is very good! I'll see if I can come up with something.
This seems fine unless there's an interval which is shorter than two cycles of the frequency. Then again, if the interval is that short, the user should expect it to sound clicky.
Exactly: if that's the case, then you wouldn't perceive it as a tone anyway. I guess one cycle of fade-in and one of fade-out would be sufficient. If the interval is shorter than two cycles, this reduces nicely to a triangle window whose height is proportional to the base. This would effectively blunt any impulses due to short intervals (as opposed to those due to zero-crossing alignment), which seems like a nice property.
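The frequency-dependent fade discussed above could look roughly like the following sketch. It is one possible reading of the proposal, not the fix that was eventually implemented; `cycle_fade_gain` and its `fade_cycles` parameter are hypothetical names introduced here.

```python
import math


def cycle_fade_gain(t, start, end, frequency, fade_cycles=1.0):
    """Gain in [0, 1] at time t for a tone of `frequency` Hz on [start, end].

    The gain ramps up over `fade_cycles` periods of the tone, holds at 1,
    and ramps down over the same duration. If the interval is shorter than
    two fades, the ramps meet early and the envelope becomes a triangle.
    """
    if not start <= t <= end:
        return 0.0
    fade = fade_cycles / frequency  # fade duration in seconds
    return min(1.0, (t - start) / fade, (end - t) / fade)


def faded_tone_sample(t, start, end, frequency):
    """One sample of the enveloped sinusoid at time t."""
    gain = cycle_fade_gain(t, start, end, frequency)
    return gain * math.sin(2 * math.pi * frequency * (t - start))


# Long interval at 100 Hz: the fade lasts 0.01 s, then the gain holds at 1.
gain_mid_fade = cycle_fade_gain(0.005, 0.0, 1.0, 100.0)
gain_sustain = cycle_fade_gain(0.5, 0.0, 1.0, 100.0)

# Intervals shorter than two cycles: the peak shrinks with the interval length.
peak_one_cycle = cycle_fade_gain(0.005, 0.0, 0.01, 100.0)
peak_short = cycle_fade_gain(0.002, 0.0, 0.004, 100.0)
```

In this sketch the triangle peaks at half the interval length measured in cycles, so very short intervals are attenuated rather than producing an impulse.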
As described in #255, there was an issue with crackling sound in the time_frequency sonification function, which was fixed by adding some amplitude envelope interpolation. Although the implemented fix indeed prevents any crackling from happening, it also makes it very hard to hear, for example, the timing of chord changes in the sonification due to the very smooth transitions.
In the attached example, you can hear the original audio in the left channel and the sonification of a chord-estimate in the right.
example.zip
Maybe we could add a switch parameter for being able to choose between smooth transitions (without crackling but lack of "temporal resolution") and crisp transitions (potential crackling but clear transitions)?
All the best,
Jonathan