Would it help to identify each sound and its image as a "package" keyed by a SHA-1 hash? That way we're not tied to names being unique, and we can easily add more metadata to the samples. The backend could serve images and sounds under the same key, which might also make templating easier.
The sounds and images could be described by a JSON object keyed by the hash, for example:
{ "097d80883f7c5e0f49c1bd24e15942e9ba11f37b": {
    "wav": "sound/097d80883f7c5e0f49c1bd24e15942e9ba11f37b.wav",
    "length": "10",
    "image": "image/097d80883f7c5e0f49c1bd24e15942e9ba11f37b.png",
    "mime-type": "image/png" }
}
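
A rough sketch of how such an entry could be generated, in Python. This assumes the key is the SHA-1 of the raw sound file's bytes, which the proposal above doesn't actually specify; the helper names `package_key` and `manifest_entry` are made up for illustration, and the `sound/` and `image/` paths just mirror the example.

```python
import hashlib
import json


def package_key(data: bytes) -> str:
    """Return the SHA-1 hex digest of the given bytes (40 hex characters)."""
    return hashlib.sha1(data).hexdigest()


def manifest_entry(wav_bytes: bytes, length: str) -> dict:
    """Build one package entry shaped like the JSON example above.

    Assumption: the key is derived from the sound file's contents.
    """
    key = package_key(wav_bytes)
    return {
        key: {
            "wav": f"sound/{key}.wav",
            "length": length,
            "image": f"image/{key}.png",
            "mime-type": "image/png",
        }
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real .wav file's contents.
    entry = manifest_entry(b"example wav bytes", "10")
    print(json.dumps(entry, indent=2))
```

Since the key comes from the content, two uploads of the same file would map to the same package, and renaming a sample wouldn't change its key.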