How do you represent music in a data structure? - data-structures

How would you model a simple musical score for a single instrument written in regular standard notation? Certainly there are plenty of libraries out there that do exactly this. I'm mostly curious about different ways to represent music in a data structure. What works well and what doesn't?
Ignoring some of the trickier aspects like dynamics, the obvious way would be a literal translation of everything into objects: a Score is made of Measures, which are made of Notes. Synthesis, I suppose, would mean figuring out the start/end time of each note and blending sine waves.
Is the obvious way a good way? What are other ways to do this?
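For concreteness, a minimal sketch of that literal object translation might look like this (class and attribute names are just made up for illustration):

from dataclasses import dataclass, field
from typing import List

@dataclass
class Note:
    pitch: str        # e.g. "C4"
    duration: float   # in beats, e.g. 1.0 = quarter note in 4/4

@dataclass
class Measure:
    notes: List[Note] = field(default_factory=list)

@dataclass
class Score:
    time_signature: str = "4/4"
    tempo_bpm: int = 120
    measures: List[Measure] = field(default_factory=list)

# A one-measure score: C, E, then a longer G
score = Score(measures=[
    Measure(notes=[Note("C4", 1.0), Note("E4", 1.0), Note("G4", 2.0)]),
])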

Many people doing new common Western music notation projects use MusicXML as a starting point. It provides a complete representation of music notation that you can subset to meet your needs. There is now an XSD schema definition that projects like ProxyMusic use to create MusicXML object models. ProxyMusic creates these in Java, but you should be able to do something similar with other XML data binding tools in other languages.
As one MusicXML customer put it:
"A very important benefit of all of your hard work on MusicXML as far as I am concerned is that I use it as a clear, structured and very ‘real-world practical’ specification of what music ‘is’ in order to design and implement my application’s internal data structures."
There's much more information available - XSDs and DTDs, sample files, a tutorial, a list of supported applications, a list of publications, and more - at
http://www.makemusic.com/musicxml
MIDI is not a very good model for a simple musical score in standard notation. MIDI lacks many of the basic concepts of music notation. It was designed to be a performance format, not a notation format.
It is true that music notation is not hierarchical. Since XML is hierarchical, MusicXML uses paired start-stop elements for representing non-hierarchical information. A native data structure can represent things more directly, which is one reason that MusicXML is just a starting point for the data structure.
For a more direct way of representing music notation that captures its simultaneous horizontal and vertical structure, look at the Humdrum format, which uses more of a spreadsheet/lattice model. Humdrum is especially used in musicology and music analysis applications where its data structure works particularly well.

MIDI files would be the usual way to do this. MIDI is a standard format for storing data about musical notes, including start and end times, note volume, which instrument it's played on, and various special characteristics; you can find plenty of prewritten libraries (including some open source) for reading and writing the files and representing the data in them in terms of arrays or objects, though they don't usually do it by having an object for each note, which would add up to a lot of memory overhead.
The instruments defined in MIDI are just numbers from 1 to 128 which have symbolic names, like violin or trumpet, but MIDI itself doesn't say anything about what the instruments should actually sound like. That is the job of a synthesizer, which takes the high-level MIDI data and converts it into sound. In principle, yes, you can create any sound by superposing sine waves, but that doesn't work that well in practice because it becomes computationally intensive once you get to playing a few tracks in parallel; also, a simple Fourier spectrum (the relative intensities of the sine waves) is just not adequate when you're trying to reproduce the real sound of an instrument and the expressiveness of a human playing it. (I've written a simple synthesizer to do just that, so I know how hard it can be to produce a decent sound.) There's a lot of research being done in the science of synthesis, and more generally DSP (digital signal processing), so you should certainly be able to find plenty of books and web pages to read about it if you'd like.
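To make the sine-wave idea concrete, here is a rough sketch (in Python with NumPy, using made-up note events rather than real MIDI data) of rendering start/end/frequency tuples by summing sine waves. It is fine for a toy score, but you can see how the per-sample work grows with the number of simultaneous notes:

import numpy as np

SAMPLE_RATE = 44100

def render(events, length_s):
    """events: list of (start_s, end_s, freq_hz, amplitude) tuples."""
    out = np.zeros(int(length_s * SAMPLE_RATE))
    t = np.arange(len(out)) / SAMPLE_RATE
    for start, end, freq, amp in events:
        mask = (t >= start) & (t < end)
        out[mask] += amp * np.sin(2 * np.pi * freq * t[mask])
    # normalize to avoid clipping when notes overlap
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out

# A C major arpeggio as (start, end, frequency, amplitude)
samples = render([(0.0, 0.5, 261.63, 0.5),
                  (0.5, 1.0, 329.63, 0.5),
                  (1.0, 2.0, 392.00, 0.5)], 2.0)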
Also, this may only be tangentially related to the question, but you might be interested in an audio programming language called ChucK. It was designed by people at the crossroads of programming and music, and you can probably get a good idea of the current state of sound synthesis by playing around with it.

Music in a data structure, standard notation, ...
Sounds like you would be interested in LilyPond.
Most things about musical notation are almost purely mechanical (there are rules and guidelines even for the complex, non-trivial parts of notation), and LilyPond does a beautiful job of taking care of all those mechanical aspects. What's left is input files that are simple to write in any text editor. In addition to PDFs, LilyPond can also produce Midi files.
If you felt so inclined, you could generate the text files algorithmically with a program and call LilyPond to convert them to notation and a MIDI file for you.
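As a rough sketch of that idea (assuming LilyPond is installed and the lilypond binary is on your PATH; the four-note melody and the version number are just placeholders):

import subprocess

notes = ["c'4", "e'4", "g'4", "c''2"]   # LilyPond note names and durations

ly_source = r"""
\version "2.24.0"
\score {
  { %s }
  \layout { }
  \midi { }
}
""" % " ".join(notes)

with open("melody.ly", "w") as f:
    f.write(ly_source)

# Produces a PDF and a MIDI file alongside melody.ly (if LilyPond is installed).
subprocess.run(["lilypond", "melody.ly"], check=True)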
I doubt you could find a more complete and concise way to express music than an input file for LilyPond.
Please understand that music and musical notation is not hierarchical and can't be modelled (well) by strict adherence to hierarchical thinking. Read this for more information on that subject.
Have fun!

Hmmm, fun problem.
Actually, I'd be tempted to turn it into Command pattern along with Composite. This is kind of turning the normal OO approach on its head, as you are in a sense making the modeled objects verbs instead of nouns. It would go like this:
a Note is a class with one method, play(), and a ctor taking length and tone.
you need an Instrument which defines the behavior of the synth: timbre, attack, and so on.
You would then have a Score, which has a TimeSignature, and is a Composite pattern containing Measures; the Measures contain the Notes.
Actually playing it means interpreting some other things, like Repeats and Codas, which are other Containers. To play it, you interpret the hierarchical structure of the Composite, inserting a note into a queue; as the notes move through the queue based on the tempi, each Note has its play() method called.
Hmmm, might invert that; each Note is given as input to the Instrument, which interprets it by synthesizing the wave form as required. That comes back around to something like your original scheme.
Another approach to the decomposition is to apply Parnas' Law: you decompose in order to keep secret the places where requirements could change. But I think that ends up with a similar decomposition; you can change the time signature and the tuning, you can change the instrument --- a Note doesn't care if you play it on a violin, a piano, or a marimba.
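A rough sketch of the Command-plus-Composite idea, with the Instrument reduced to a stub that prints instead of synthesizing (all names are illustrative):

class Instrument:
    """Stands in for the synth: timbre, attack, and so on would live here."""
    def sound(self, tone: str, length: float):
        print(f"play {tone} for {length} beats")

class Note:
    def __init__(self, tone: str, length: float):
        self.tone, self.length = tone, length
    def play(self, instrument: Instrument):
        instrument.sound(self.tone, self.length)

class Container:
    """Composite node: Measures, Repeats, Codas, the Score itself."""
    def __init__(self, children=None):
        self.children = list(children or [])
    def play(self, instrument: Instrument):
        for child in self.children:
            child.play(instrument)

class Repeat(Container):
    def __init__(self, children=None, times=2):
        super().__init__(children)
        self.times = times
    def play(self, instrument: Instrument):
        for _ in range(self.times):
            super().play(instrument)

measure = Container([Note("C4", 1), Note("E4", 1), Note("G4", 2)])
score = Container([measure, Repeat([measure], times=2)])
score.play(Instrument())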
Interesting problem.

My music composition software (see my profile for the link) uses Notes as the primary unit (with properties like starting position, length, volume, balance, release duration etc.). Notes are grouped into Patterns (which have their own starting positions and repetition properties) which are grouped into Tracks (which have their own instrument or instruments).
Blending sine waves is one method of synthesizing sounds, but it's pretty rare (it's expensive and doesn't sound very good). Wavetable synthesis (which my software uses) is computationally inexpensive and relatively easy to code, and is essentially unlimited in the variety of sounds it can produce.
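For what it's worth, one common form of wavetable synthesis is a table-lookup oscillator, which fits in a few lines; a minimal sketch (the table contents and sizes below are arbitrary):

import numpy as np

SAMPLE_RATE = 44100
TABLE_SIZE = 2048

# One cycle of the waveform; a real wavetable would hold a sampled or
# hand-drawn shape, here it's just a few harmonics to keep the sketch short.
harmonics = [(1, 1.0), (2, 0.5), (3, 0.25)]
phase = np.linspace(0, 2 * np.pi, TABLE_SIZE, endpoint=False)
table = sum(amp * np.sin(n * phase) for n, amp in harmonics)
table /= np.max(np.abs(table))

def osc(freq_hz, seconds):
    """Read through the table at a rate proportional to the frequency."""
    n = int(seconds * SAMPLE_RATE)
    idx = (np.arange(n) * freq_hz * TABLE_SIZE / SAMPLE_RATE) % TABLE_SIZE
    return table[idx.astype(int)]   # nearest-neighbour; real code interpolates

tone = osc(440.0, 1.0)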

The usefulness of a model can only be evaluated within a given context. What is it you are trying to do with this model?
Many respondents have said that music is non-hierarchical. I sort of agree with this, but instead suggest that music can be viewed hierarchically from many different points of view, each giving rise to a different hierarchy. We may want to view it as a list of voices, each of which has notes with on/off/velocity/etc attributes. Or we may want to view it as vertical sonorities for the purpose of harmonic analysis. Or we may want to view it in a way suitable for contrapuntal analysis. Or many other possibilities. Worse still, we may want to see it from these different points of view for a single purpose.
Having made several attempts to model music for the purposes of generating species counterpoint, analysing harmony and tonal centers, and many other things, I have been continuously frustrated by music's reluctance to yield to my modelling skills. I'm beginning to think that the best model may be relational, simply because to a large extent, models based on the relational model of data strive not to take a point of view about the context of use. However, that may simply be pushing the problem somewhere else.
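As a rough sketch of what the relational angle might look like (the schema is invented for illustration), a single note table can be queried from several of the "points of view" mentioned above - horizontally by voice, or vertically by onset for harmonic analysis:

import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE note (
    voice     INTEGER,
    onset     INTEGER,   -- in ticks
    duration  INTEGER,   -- in ticks
    pitch     INTEGER    -- e.g. a MIDI note number
)""")
db.executemany("INSERT INTO note VALUES (?, ?, ?, ?)", [
    (1, 0, 480, 72), (1, 480, 480, 74),   # upper voice
    (2, 0, 960, 60),                      # lower voice
])

# Horizontal view: one voice as a melodic line.
melody = db.execute(
    "SELECT onset, pitch FROM note WHERE voice = 1 ORDER BY onset").fetchall()

# Vertical view: everything sounding at tick 0, for harmonic analysis.
sonority = db.execute(
    "SELECT voice, pitch FROM note "
    "WHERE onset <= 0 AND onset + duration > 0").fetchall()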

Related

Search space data

I was wondering if anyone knew of a source which provides 2D model search spaces to test a GA against. I believe I read a while ago that there are a bunch of standard search spaces which are typically used when evaluating these types of algorithms.
If not, is it just a case of randomly generating this data yourself each time?
The search space is completely dependent on your problem. The idea of a genetic algorithm is that you modify the "genome" of a population of individuals to create the next generation, measure the fitness of the new generation, and modify the genomes again with some randomness thrown in to try to prevent getting stuck in local minima. The search space, however, is completely determined by what you have in your genome, which in turn is completely determined by what the problem is.
There might be standard search spaces (i.e. genomes) that have been found to work well for particular problems (I haven't heard of any) but usually the hardest part in using GAs is defining what you have in your genome and how it is allowed to mutate. The usefulness comes from the fact that you don't have to explicitly declare all the values for the different variables for the model, but you can find good values (not necessarily the best ones though) using a more or less blind search.
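To make that split concrete: the GA framework itself really is small. A minimal sketch, with a toy genome and fitness function standing in for the hard, problem-specific parts:

import random

GENOME_LEN, POP_SIZE, GENERATIONS = 20, 50, 100

def fitness(genome):
    # Toy objective: maximise the number of 1s ("OneMax").
    return sum(genome)

def mutate(genome, rate=0.02):
    return [1 - g if random.random() < rate else g for g in genome]

def crossover(a, b):
    cut = random.randrange(1, GENOME_LEN)
    return a[:cut] + b[cut:]

population = [[random.randint(0, 1) for _ in range(GENOME_LEN)]
              for _ in range(POP_SIZE)]

for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    parents = population[:POP_SIZE // 2]          # truncation selection
    children = [mutate(crossover(random.choice(parents), random.choice(parents)))
                for _ in range(POP_SIZE - len(parents))]
    population = parents + children

print(max(fitness(g) for g in population))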
EXAMPLE
One example used quite heavily is the evolved radio antenna (Wikipedia). The aim is to find a configuration for a radio antenna such that the antenna itself is as small and lightweight as possible, with the restriction that it has to respond to certain frequencies, have low noise, and so on.
So you would build your genome specifying
the number of wires to use
the number of bends in each wire
the angle of each bend
maybe the distance of each bend from the base
(something else, I don't know what)
run your GA, see what comes out the other end, and analyse why it didn't work. GAs have a habit of producing results you didn't expect because of bugs in the simulation. Anyhow, you discover that maybe the genome has to encode the number of bends individually for each of the wires in the antenna, meaning that the antenna isn't going to be symmetric. So you put that in your genome and run the thing again. Simulating stuff that needs to work in the physical world is usually the most expensive, because at some point you have to test the individual(s) in the real world.
There's a reasonable tutorial on genetic algorithms here, with some useful examples of different encoding schemes for the genome.
One final point: when people say that GAs are simple and easy to implement, they mean that the framework around the GA (generating a new population, evaluating fitness etc.) is simple. What usually is not said is that setting up a GA for a real problem is very difficult and usually requires a lot of trial and error, because coming up with an encoding scheme that works well is not simple for complex problems. The best way to start is to start simple and make things more complex as you go along. You can of course make another GA to come up with the encoding for the first GA :).
There are several standard benchmark problems out there.
BBOB (Black Box Optimization Benchmarks) -- have been used in recent years as part of a continuous optimization competition
DeJong functions -- pretty old, and really too easy for most practical purposes these days. Useful for debugging perhaps (the first two are sketched after this list).
ZDT/DTLZ multiobjective functions -- multi-objective optimization problems, but you could scalarize them yourself I suppose.
Many others
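For reference, the first two De Jong functions are simple enough to write down directly; a sketch (both have a known global minimum of 0):

def sphere(x):
    """De Jong F1: global minimum 0 at x = (0, ..., 0)."""
    return sum(xi * xi for xi in x)

def rosenbrock(x):
    """De Jong F2 (Rosenbrock): global minimum 0 at x = (1, ..., 1)."""
    return sum(100 * (x[i + 1] - x[i] ** 2) ** 2 + (1 - x[i]) ** 2
               for i in range(len(x) - 1))

assert sphere([0, 0, 0]) == 0
assert rosenbrock([1, 1, 1]) == 0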

Perceptual similarity between two audio sequences

I would like to get some sort of distance measure between two pieces of audio. For example, I want to compare the sound of an animal to the sound of a human mimicking that animal, and then return a score of how similar the sounds were.
It seems like a difficult problem. What would be the best way to approach it? I was thinking of extracting a couple of features from the audio signals and then doing a Euclidean distance or cosine similarity (or something like that) on those features. What kind of features would be easy to extract and useful to determine the perceptual difference between sounds?
(I saw somewhere that Shazam uses hashing, but that's a different problem because there the two pieces of audio being compared are fundamentally the same, but one has more noise. Here, the two pieces of audio are not the same, they are just perceptually similar.)
The process for comparing a set of sounds for similarities is called Content Based Audio Indexing, Retrieval, and Fingerprinting in computer science research.
One method of doing this is to:
Run several bits of signal processing on each audio file to extract features, such as pitch over time, frequency spectrum, autocorrelation, dynamic range, transients, etc. (a rough sketch follows this list)
Put all the features for each audio file into a multi-dimensional array and dump each multi-dimensional array into a database
Use optimization techniques (such as gradient descent) to find the best match for a given audio file in your database of multi-dimensional data.
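A rough sketch of the feature-extraction and distance part, with a few deliberately simple features (RMS energy, zero-crossing rate, spectral centroid) and a plain cosine distance; a real system would use richer features such as MFCCs and a proper nearest-neighbour search. The toy signals stand in for the animal sound and the imitation:

import numpy as np

def features(signal, sample_rate=44100):
    """A tiny, hand-picked feature vector for one audio signal."""
    rms = np.sqrt(np.mean(signal ** 2))
    zero_crossings = np.mean(np.abs(np.diff(np.sign(signal)))) / 2
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), 1 / sample_rate)
    centroid = np.sum(freqs * spectrum) / (np.sum(spectrum) + 1e-12)
    # divide the centroid by the sample rate so all features are roughly unit scale
    return np.array([rms, zero_crossings, centroid / sample_rate])

def cosine_distance(a, b):
    return 1 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

t = np.linspace(0, 1, 44100, endpoint=False)
animal = np.sin(2 * np.pi * 800 * t) * np.exp(-3 * t)
human  = np.sin(2 * np.pi * 760 * t) * np.exp(-2.5 * t)

print(cosine_distance(features(animal), features(human)))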
The trick to making this work well is which features to pick. Doing this automatically and getting good results can be tricky. The guys at Pandora do this really well, and in my opinion they have the best similarity matching around. They encode their vectors by hand though, by having people listen to music and rate them in many different ways. See their Music Genome Project and List of Music Genome Project attributes for more info.
For automatic distance measurements, there are several projects that do stuff like this, including Marsyas, MusicBrainz, and EchoNest.
Echonest has one of the simplest APIs I've seen in this space. Very easy to get started.
I'd suggest looking into spectrum analysis. Whilst this isn't as straightforward as you're most likely wanting, I'd expect that decomposing the audio into its underlying frequencies would provide some very useful data to analyse. Check out this link.
Your first step will definitely be taking a Fourier Transform (FT) of the sound waves. If you perform an FT on the data with respect to frequency over time [1], you'll be able to compare how often certain key frequencies are hit over the course of the noise (a rough sketch follows the footnotes).
Perhaps you could also subtract one wave from the other, to get a sort of stepwise difference function. Assuming the mock-noise follows the same frequency and pitch trends [2] as the original noise, you could calculate the line of best fit to the points of the difference function. Comparing that best-fit line against a line of best fit taken of the original sound wave, you could average out a trend line to use as the basis of comparison. Granted, this would be a very loose comparison method.
- 1. Hz/ms, perhaps? I'm not familiar with the unit magnitude being worked with here; I generally work in the femto- to nano- range.
- 2. So long as ∀ΔT, ΔPitch/ΔT & ΔFrequency/ΔT are within some tolerance x.
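A rough sketch of the FT-based comparison: take short-time magnitude spectra of both recordings and measure how far apart they are frame by frame. The toy signals below stand in for the two recordings; real code would window each frame and align the signals first:

import numpy as np

def spectrogram(signal, frame=1024, hop=512):
    """Magnitude of the FFT of successive frames (no windowing, for brevity)."""
    frames = [signal[i:i + frame]
              for i in range(0, len(signal) - frame, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))

def spectral_distance(a, b):
    sa, sb = spectrogram(a), spectrogram(b)
    n = min(len(sa), len(sb))                     # crude alignment: truncate
    return np.mean(np.linalg.norm(sa[:n] - sb[:n], axis=1))

t = np.linspace(0, 1, 44100, endpoint=False)
original = np.sin(2 * np.pi * 440 * t)
imitation = np.sin(2 * np.pi * 466 * t)           # slightly off pitch
print(spectral_distance(original, imitation))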

Generate musical instrument sounds algorithmically

Is it possible to generate a musical instrument's sounds using only algorithms? or can it only be done via pre-recorded sound samples?
Wavetable synthesis (PDF) is the most realistic method of real-instrument synthesis, as it takes samples and alters them slightly (for example adding vibrato, expression etc).
The waveforms generated by most musical instruments (especially wind and brass instruments) are so complex that pure algorithmic synthesis is not yet optimised enough to run on current hardware - even if it were, the technical complexities of writing such an algorithm are huge.
Interesting site here.
It is completely possible - that is one of the things synthesizers do.
It being possible doesn't mean it is simple. Synthesizers are usually expensive, and the algorithms used are complex - the Wikipedia page I linked before has links to some of them.
Pre-recorded sounds are simpler and cheaper to use, but they also have their limitations - they sound more "repetitive" for example.
Several years back, Sound on Sound magazine ran an excellent series called "Synth Secrets" which can now be viewed online for free. They give a good introduction to the types of techniques used in hardware synthesizers (both analogue and digital), and includes some articles discussing the difficulties of replicating certain real-world instrument sounds such as plucked and bowed strings, brass, snare drums, acoustic pianos etc.
After several days of hunting, this is the best resource I have found: https://ccrma.stanford.edu/~jos/
This is a treasure trove for the subject of synthesising sounds.
STK
For example, this page links to a C example of synthesising a string, and also to a sound toolkit, STK, written in C++ for assisting this work.
This is going to keep me quiet for a few weeks while I dig through it.
It certainly is, and there are many approaches. Wolfram recently released WolframTones, which (unsurprisingly, if you know Wolfram) uses cellular automata. A detailed description of how it functions is here.
The Karplus-Strong algorithm gives a very good synthesis of a plucked string. It can also be coded in a few lines of C. You create a circular buffer of floats (length proportional to the wavelength, i.e. to 1/f) and fill it with random noise between -1 and 1.
Then you cycle through: each cycle, you replace the value at your current index with the average of the previous two values, and emit this new value.
index = (index + 1) % bufSize;
/* wrap the look-back indices so they never run off the start of the buffer */
outVal = buf[index] = decay * 0.5f * (buf[(index + bufSize - 1) % bufSize] + buf[(index + bufSize - 2) % bufSize]);
The resulting sample stream gives you your sound. Of course, this can be heavily optimised.
To make your soundwave damp to 0.15 of its original strength after one second, you could set decay thus:
#define DECAY_1S 0.15f
Float32 decay = powf(DECAY_1S, 1.0f / freq);
Note: you need to size the original buffer so that it contains one complete waveform. So if you wish to generate a 441 Hz sound and your sampling rate is 44.1 kHz, then you will need to allocate 100 elements in your buffer.
You can think of this as a resonance chamber, whose fundamental frequency is 441Hz, initially energised, with energy dissipating outwards from every point in the ring simultaneously. Magically it seems to organise itself into overtones of a fundamental frequency.
Could anyone post more algorithms? How about an algorithm for a continuous tone?
In addition to the answers provided here, there are also analysis-synthesis frameworks that construct mathematical models (often based on capturing the trajectories of sinusoidal or noise components) of an input sound, allowing transformation and resynthesis. A few well-known frameworks are SMS (available through the CLAM C++ project) and Loris.
Physical models of instruments also are an option - they model the physical properties of an instrument such as reed stiffness, blowhole aperture, key clicking, and often produce realistic effects by incorporating non-linear effects such as overblowing. STK is one of these frameworks in C++.
These frameworks are generally heavier than the wavetable synthesis option, but can provide more parameters for manipulation.
ChucK
This PDF details how Smule created Ocarina for the iPhone (I'm sure everyone has seen the advert). They did it by porting ChucK - the strongly-timed, concurrent, and on-the-fly audio programming language.
It is available for Mac OS X, Windows, and Linux.
I've had a lot of success and fun with FM synthesis. It is a computationally light method, and is the source of a huge number of pop synth sounds from the 1980's (being the basis of the ubiquitous Yamaha DX-7).
I do use it for generating sound on the fly rather than using recordings. A simple example of a string synth sound generated on the fly can be had (free download, Win64) from http://philfrei.itch.io/referencenotekeyboard. The organ synth on that same program is a simple additive synthesis algo.
I would have to dig around for some good tutorials on this. I know we discussed it a bunch at java-gaming.org, when I was trying to figure it out, and then when helping nsigma work out a kink in his FM algo. The main thing is to use phase rather than frequency modulation if you want to chain more than a single modulator to a carrier.
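A rough sketch of a two-operator FM voice done as phase modulation (the classic sine-inside-sine form; the ratio, index, and envelope constants are arbitrary):

import numpy as np

SAMPLE_RATE = 44100

def fm_tone(carrier_hz, ratio, index, seconds):
    """Phase modulation: the modulator is added to the carrier's phase."""
    t = np.arange(int(seconds * SAMPLE_RATE)) / SAMPLE_RATE
    modulator = np.sin(2 * np.pi * carrier_hz * ratio * t)
    # A decaying modulation index brightens the attack, then mellows the tone.
    envelope = index * np.exp(-3 * t)
    return np.sin(2 * np.pi * carrier_hz * t + envelope * modulator)

tone = fm_tone(carrier_hz=220.0, ratio=2.0, index=4.0, seconds=1.0)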
Which reminds me! A kind of amazing java-based sound generator to check out, allows the generation of a number of different forms of synthesis, with real time control:
PraxisLIVE

Optimizing Data Translation

Our business deals with houses and over the years we have created several business objects to represent them. We also receive lots of data from outside sources, and send data to external consumers. Every one of these represents the house in a different way and we spend a lot of time and energy translating one format into another. I'm looking for some general patterns or best practices on how to deal with this situation. How can I write a universal data translator that is flexible, extensible, and fast?
Background: A house generally has 30-40 attributes such as size, number of bedrooms, roof type, construction material, siding material, etc. These are typically represented as key/value pairs. A typical translation problem is that one vendor will represent the number of bedrooms as a single key/value pair: NumBedrooms=3, while a different vendor will have a key/value pair per bedroom: Bedroom=master, Bedroom=small, Bedroom=small.
There's nothing particularly hard about the translation, but we spend a lot of time and energy writing and testing translations. How can I optimize this?
Thanks
(My environment is .Net)
The best place to start is by creating an "internal representation", which is the representation that your processing will always use. Then create translators from and to "external representations" as needed. I'd imagine that this is what you are already doing, but it should be mentioned for completeness. The optimization comes from being able to selectively write importers and exporters only when you need them.
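The question's environment is .NET, but the shape of the idea is easy to show in a few lines of Python (the vendor formats and field names are invented, based on the bedroom example in the question):

# Canonical internal representation: one dict per house.
def from_vendor_a(pairs):
    """Vendor A style: NumBedrooms=3"""
    house = dict(pairs)
    house["bedrooms"] = int(house.pop("NumBedrooms", 0))
    return house

def from_vendor_b(pairs):
    """Vendor B style: Bedroom=master, Bedroom=small, Bedroom=small"""
    house, bedrooms = {}, []
    for key, value in pairs:
        if key == "Bedroom":
            bedrooms.append(value)
        else:
            house[key] = value
    house["bedrooms"] = len(bedrooms)
    house["bedroom_types"] = bedrooms
    return house

def to_vendor_a(house):
    return [("NumBedrooms", str(house["bedrooms"]))]

# Every importer/exporter only ever talks to the canonical form,
# so adding a vendor means writing one translator, not N.
print(from_vendor_b([("Bedroom", "master"), ("Bedroom", "small"),
                     ("Bedroom", "small"), ("RoofType", "tile")]))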
A good implementation strategy is to externalize the transformation if you can. If you can get your inputs and outputs into XML documents, then you can write XSLT transforms between your internal and external representations. The goal is to be able to set up a pipeline of transformations from an input XML document to your internal representation. If everything is represented in XML and using a common protocol (say... hmm... HTTP), then the process can be controlled using configuration. BTW - this is essentially the Pipes and Filters design pattern.
Take a look at Yahoo pipes, Apache Cocoon, XML pipeline, and NetKernel for inspiration.
My employer back in the 90s faced this problem. We had a standard format we converted the customers' data to and from, as D.Shawley suggests.
I went further and designed a simple format-description language; we described our standard format in that language and then, for a new dataset, we'd write up its format too. Then a program would take both descriptions and convert the data from one format to the other, with automatic type conversions, safety checks, etc. (This came in handy for some other operations as well, not just these initial/final conversions.)
The particulars probably won't help you -- chances are you deal with completely different kinds of data. You can likely profit from the general principle, though. The "data definition language" needn't necessarily be a fancy thing with a parser and scanner; you might define it directly with a data structure in IronPython, say.

Determine the differences between two nearly identical photographs

This is a fairly broad question; what tools/libraries exist to take two photographs that are not identical, but extremely similar, and identify the specific differences between them?
An example would be to take a picture of my couch on Friday after my girlfriend is done cleaning and before a long weekend of having friends over, drinking, and playing Rock Band. Two days later I take a second photo of the couch; the lighting is identical, the couch hasn't moved a millimeter, and I use a tripod in a fixed location.
What tools could I use to generate a diff of the images, or a third heatmap image of the differences? Are there any tools for .NET?
This depends largely on the image format and compression. But, at the end of the day, you are probably taking two rasters and comparing them pixel by pixel.
Take a look at the Perceptual Image Difference Utility.
The most obvious way to see every tiny, normally nigh-imperceptible difference, would be to XOR the pixel data. If the lighting is even slightly different, though, it might be too much. Differencing (subtracting) the pixel data might be more what you're looking for, depending on how subtle the differences are.
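A rough sketch of the subtract-and-threshold idea using Pillow and NumPy (the file names and the noise threshold are placeholders); it writes a grayscale "heat map" where brighter pixels changed more:

import numpy as np
from PIL import Image

before = np.asarray(Image.open("couch_friday.png").convert("RGB"), dtype=np.int16)
after  = np.asarray(Image.open("couch_sunday.png").convert("RGB"), dtype=np.int16)

# Per-pixel absolute difference, averaged over the RGB channels.
diff = np.abs(before - after).mean(axis=2)

# Ignore tiny differences caused by sensor noise / JPEG artifacts.
diff[diff < 10] = 0

heatmap = (255 * diff / max(diff.max(), 1)).astype(np.uint8)
Image.fromarray(heatmap, mode="L").save("couch_diff.png")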
One place to start is with a rich image processing library such as IM. You can dabble with its operators interactively with the IMlab tool, call it directly from C or C++, or use its really decent Lua binding to drive it from Lua. It supports a wide array of operations on bitmaps, as well as an extensible library of file formats.
Even if you haven't deliberately moved anything, you might want to use an algorithm such as SIFT to get good sub-pixel quality alignment between the frames. Unless you want to treat the camera as fixed and detect motion of the couch as well.
I wrote this free .NET application using the toolkit my company makes (DotImage). It has a very simple algorithm, but the code is open source if you want to play with it -- you could adapt the algorithm to .NET Image classes if you don't want to buy a copy of DotImage.
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/13/image-difference-utility.aspx
Check out Andrew Kirillov's article on CodeProject. He wrote a C# application using the AForge.NET computer vision library to detect motion. On the AForge.NET website, there's a discussion of two-frame differencing for motion detection.
It's an interesting question. I can't refer you to any specific libraries, but the process you're asking about is basically a minimal case of motion compensation. This is the way that MPEG (MP4, DIVX, whatever) video manages to compress video so extremely well; you might look into MPEG for some information about the way those motion compensation algorithms are implemented.
One other thing to keep in mind: JPEG compression is block-based, and much of the benefit MPEG brings comes from doing block comparisons. If most of your image (say the background) is the same from one image to the next, those blocks will be unchanged. It's a quick way to reduce the amount of data that needs to be compared.
Just use the .NET imaging classes: create two new Bitmap objects and compare the R, G, and B values of each pixel; you can also look at the A (alpha/transparency) values if you want when determining the difference.
Also, a note: the GetPixel(x, y) method can be vastly slow. There is another (less elegant) way to get at the entire image and loop through it yourself - Bitmap.LockBits, which gives you direct access to the pixel data. Look in the imaging/bitmap classes and read some tutorials; they really are all you need and aren't that difficult to use. Don't go third party unless you have to.

Resources