This is not really possible, or it would take a huge amount of work for a result that could be far from usable. A YM file is just a list of register values to send to the PSG, frame after frame. It does not carry information such as “sound 1 is a beep, starts here and ends there”. You would have to decode the stream, “guess” where each note starts and ends, and recreate the instruments and tracks accordingly.
It would be hugely complex, and what the algorithm considers a “good” sound would actually be unusable by a human. I’m sorry, but this cannot really be done.
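
To give an idea of the problem, here is a minimal Python sketch of the "guessing" step, for channel A only. It assumes each frame holds 14 register values in the usual AY/YM2149 layout (R0/R1 tone period, R7 mixer, R8 volume). The helper names (`decode_channel_a`, `guess_note_starts`, `make_frame`) are made up for this illustration, the frames are written by hand rather than loaded from a real YM file, and hardware envelopes are ignored.

```python
from typing import List, NamedTuple


class ChannelState(NamedTuple):
    period: int    # 12-bit tone period, from R0 (low byte) and R1 (high 4 bits)
    volume: int    # 4-bit volume, from the low bits of R8
    tone_on: bool  # mixer R7, bit 0: 0 means the tone of channel A is enabled


def decode_channel_a(frame: List[int]) -> ChannelState:
    """Extract what channel A is doing in one frame of 14 registers."""
    period = ((frame[1] & 0x0F) << 8) | frame[0]
    volume = frame[8] & 0x0F  # simplification: envelope mode (bit 4) is ignored
    tone_on = (frame[7] & 0x01) == 0
    return ChannelState(period, volume, tone_on)


def guess_note_starts(frames: List[List[int]]) -> List[int]:
    """Naive heuristic (an assumption, not a real converter): a note starts
    when the channel becomes audible or when its period changes."""
    starts = []
    prev = None
    for index, frame in enumerate(frames):
        cur = decode_channel_a(frame)
        audible = cur.tone_on and cur.volume > 0
        was_audible = prev is not None and prev.tone_on and prev.volume > 0
        if audible and (not was_audible or cur.period != prev.period):
            starts.append(index)
        prev = cur
    return starts


def make_frame(period: int, volume: int) -> List[int]:
    """Build a hand-made frame: tone on channel A only, everything else off."""
    frame = [0] * 14
    frame[0] = period & 0xFF
    frame[1] = (period >> 8) & 0x0F
    frame[7] = 0b111110  # bit 0 cleared: tone A enabled; other tones and noise disabled
    frame[8] = volume & 0x0F
    return frame


# One single note, as a musician would see it: fading volume, slight vibrato.
vibrato_note = [
    make_frame(284, 15),
    make_frame(284, 14),
    make_frame(285, 13),
    make_frame(284, 12),
    make_frame(283, 11),
    make_frame(284, 10),
]

print(guess_note_starts(vibrato_note))
```

With these six frames, the heuristic reports note starts at frames 0, 2, 3, 4 and 5, so one note with a bit of vibrato is read as five notes. Telling a vibrato from an arpeggio, a pitch slide or a real new note is exactly the guessing that makes the result unusable.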