Under review as a workshop contribution at ICLR 2015
recognition tasks. Eck & Schmidhuber (2002) were the first to employ an LSTM-based RNN to learn
and compose blues music. Franklin (2006) later used LSTM networks to model jazz music.
We believe LSTM recurrent neural networks can learn the global structure of music pieces well,
but that the efficiency and efficacy of the training phase could be further improved with an adaptive
learning algorithm.
In this paper, we propose a framework for music composition that uses resilient propagation
(RPROP) (Riedmiller & Braun, 1993; Riedmiller & Rprop, 1994) in place of standard back-propagation
to train the recurrent neural network. We also use long short-term memory cells in the
network, as they are better able to learn songs, long song phrases, and structures precisely.
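To make the contrast with standard back-propagation concrete, the following is a minimal sketch of a sign-based RPROP update, assuming the simplified variant without weight backtracking. The function name `rprop_step` and the flat list-of-floats parameter layout are hypothetical and chosen only for illustration; the constants are the defaults commonly quoted for RPROP, not values confirmed for the system in this paper.

```python
def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """One RPROP update (variant without weight backtracking; an assumption).

    Each weight has its own step size. Only the sign of the gradient is
    used; the gradient magnitude never scales the update.
    """
    new_weights, new_prev_grads, new_steps = [], [], []
    for w, g, pg, s in zip(weights, grads, prev_grads, steps):
        agreement = g * pg
        if agreement > 0:
            # Same sign as last time: accelerate.
            s = min(s * eta_plus, step_max)
        elif agreement < 0:
            # Sign flipped: we jumped over a minimum, so shrink the step
            # and skip the update for this weight in this iteration.
            s = max(s * eta_minus, step_min)
            g = 0.0
        if g > 0:
            w -= s
        elif g < 0:
            w += s
        new_weights.append(w)
        new_prev_grads.append(g)
        new_steps.append(s)
    return new_weights, new_prev_grads, new_steps
```

Because the step size adapts per weight and ignores the gradient magnitude, an update of this kind is insensitive to how small the gradient becomes, which is one reason to pair it with recurrent networks.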
The remainder of the paper is organized as follows. Section 2 reviews prior work that uses
RNNs for computer-aided music generation. Section 3 introduces our system framework.
Section 4 evaluates the system experimentally.
2 RELATED WORK
Todd (1989) was one of the earliest papers to use an RNN for note-by-note music generation,
employing a Jordan recurrent neural network. A Jordan recurrent network is a simple RNN that has a
recurrent connection from the output layer to the input layer, and a self-recurrent link at the input
layer. The network is trained with back-propagation through time (BPTT), and recurrence is managed
by teacher forcing. In the training phase, Todd trained the network on monophonic melodies. The
trained network could then be used to generate music either by mixing and varying the original
training data or by introducing a new "seed melody" as the input, with the subsequent network output
recorded as the generated music.
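As an illustration of the feedback path described above, the sketch below computes the context values that a Jordan-style network would see at its input layer, with and without teacher forcing. It is a simplified sketch under our own assumptions, not Todd's implementation: the names `next_context` and `context_trace` and the `decay` weight on the self-recurrent link are hypothetical.

```python
def next_context(prev_context, feedback, decay=0.5):
    """Context update: self-recurrent link (decay) plus output feedback."""
    return [decay * c + f for c, f in zip(prev_context, feedback)]


def context_trace(outputs, targets=None, decay=0.5):
    """Return the context vector presented at the input layer at each step.

    With teacher forcing (targets given), the known target of the previous
    step is fed back instead of the network's own previous output.
    """
    feedback_seq = targets if targets is not None else outputs
    context = [0.0] * len(feedback_seq[0])
    trace = [context]
    for feedback in feedback_seq[:-1]:
        context = next_context(context, feedback, decay)
        trace.append(context)
    return trace
```

During generation no targets are available, so the network's own outputs are fed back; this is the mode in which a seed melody can be continued.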
In Mozer (1994), a fully connected RNN was trained via back-propagation through time (BPTT) by
minimizing a log-likelihood function of the L2 norm of the difference between the predicted and
actual outputs. The outputs of the final layer are treated as the probabilities of whether each note
should be on or off. In addition, to better model the harmonic relationships among musical notes,
Mozer proposed a Gray-code-like representation that encodes notes by their location on the chromatic
circle, the circle of fifths, and pitch height, a psychologically based representation derived from
Shepard (1982). To compensate for the difficulty BPTT-trained RNNs have in learning long-term
dependencies, Mozer also used a similar encoding scheme to represent durations based on three
fraction scales.
Franklin (2001) adopted Todd's network and added a second training phase in which the network
was further trained via reinforcement learning. In the reinforcement learning phase, a scalar value
was calculated from a set of "music rules" to rate the quality of the output, and this value then
replaced the explicit error information.
To deal with the vanishing gradient problem, Eck & Schmidhuber (2002) used two long short-term
memory (LSTM) networks, one for learning melody and one for chords, to compose blues music.
The output of the chord network is connected to the input of the melody network. The system was
able to learn the standard 12-bar blues chord sequences and generate notes that follow the
chords. Franklin (2006) also used LSTM networks to learn jazz music. That work developed a pitch
representation scheme based on major and minor thirds, the circle-of-thirds representation, inspired
by the circle-of-fifths pitch representation of Mozer (1994). It also extended Mozer's duration
representation by dividing note durations into 96 subdivisions, each corresponding to a "tick" in the
Musical Instrument Digital Interface (MIDI) (Messick, 1988) standard digital protocol.
To describe the correlated patterns among multiple notes in music, Boulanger-Lewandowski et al. (2012)
developed an RNN-based model using the restricted Boltzmann machine (RBM) (Smolensky, 1986)
and the recurrent temporal RBM (RTRBM) (Sutskever et al., 2009). The model, RNN-RBM, allows
freedom in describing the temporal dependencies of the notes, and is believed to be able to model
unconstrained polyphonic music in a piano-roll representation without any dimension reduction.
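For readers unfamiliar with the piano-roll representation mentioned above, the following sketch shows one common way to build it: a binary matrix with one row per time step and one column per pitch. The function name `to_piano_roll`, the `(pitch, start, end)` note format, and the default pitch range of MIDI notes 21 to 108 (the 88 keys of a piano) are assumptions made for illustration, not details taken from the cited work.

```python
def to_piano_roll(notes, n_steps, low=21, high=108):
    """Build a binary piano roll from (pitch, start_step, end_step) notes.

    roll[t][p - low] is 1 when MIDI pitch p sounds at time step t.
    end_step is exclusive. Notes outside [low, high] are ignored.
    """
    n_pitches = high - low + 1
    roll = [[0] * n_pitches for _ in range(n_steps)]
    for pitch, start, end in notes:
        if not low <= pitch <= high:
            continue
        for t in range(max(start, 0), min(end, n_steps)):
            roll[t][pitch - low] = 1
    return roll


# A C major triad in which G enters one step late and rings one step longer.
chord = [(60, 0, 2), (64, 0, 2), (67, 1, 3)]
roll = to_piano_roll(chord, n_steps=3)
```

Several columns can be active in the same row, which is what allows this representation to express polyphony directly.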
3 METHOD
This section goes over each individual component involved in building the whole system.