{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Extracting information from MIDI files\n", "\n", "In our first lesson we want to get to know how to work with datasets so we can\n", "\n", "* parse a dataset\n", "* analyse the dataset and make sure it matches our expectations\n", "* check if errors occurred during parsing\n", "\n", "After this is done we want to find out how we can generate new drum patterns from the existing ones.\n", "But in order to do this we need to take a look at the quantisation of our patterns.\n", "\n", "In such experiments, cleaning the dataset and setting the data up properly takes most of the time. But if we make mistakes here, those mistakes will propagate through our system, so it's a good idea to spend some time on this task." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import glob\n", "\n", "import pandas as pd\n", "import numpy as np\n", "import matplotlib.pyplot as plt\n", "\n", "np.random.seed(42) # makes the randomness deterministic\n", "\n", "%matplotlib inline\n", "# todo: try %matplotlib widget\n", "plt.rcParams['figure.figsize'] = (15, 5)\n", "plt.rcParams['axes.grid'] = True" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Getting the dataset\n", "\n", "As our methods in this machine learning based workshop rely on data, we need a way to obtain such data. This will be one of our first endeavours.\n", "Thankfully there are search engines which help us to find data on the internet easily.\n", "When searching for *\"midi dataset\"*, one search result is [https://colinraffel.com/projects/lmd/](https://colinraffel.com/projects/lmd/).\n", "\n", "> The Lakh MIDI dataset is a collection of 176,581 unique MIDI files, 45,129 of which have been matched and aligned to entries in the Million Song Dataset. 
[...]\n", "\n", "Of course the cultural bias of such a dataset is something we need to be aware of.\n", "What kind of music is transcribable into the MIDI format and what kind of music is transcribed as MIDI at all?\n", "Maybe we can shed some light on the last question by inspecting the dataset.\n", "But before we can do this, we need to download and understand the data.\n", "\n", "We use a function which checks whether the files are already downloaded and, if not, downloads and extracts the archive." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "midi already downloaded to ../datasets/lmd/lmd_full.tap.gz\n", "json already downloaded to ../datasets/lmd/md5_to_path.json\n" ] } ], "source": [ "import urllib.request\n", "import subprocess\n", "\n", "def download_dataset(download_path: str = \"../datasets/lmd\"):\n", "    os.makedirs(download_path, exist_ok=True)\n", "    archive_dir = os.path.join(download_path, \"lmd_full\")\n", "\n", "    dl_files = {\n", "        \"midi\": {\n", "            \"path\": os.path.join(download_path, \"lmd_full.tap.gz\"),\n", "            \"url\": \"http://hog.ee.columbia.edu/craffel/lmd/lmd_full.tar.gz\"\n", "        },\n", "        \"json\": {\n", "            \"path\": os.path.join(download_path, \"md5_to_path.json\"),\n", "            \"url\": \"http://hog.ee.columbia.edu/craffel/lmd/md5_to_paths.json\"\n", "        },\n", "    }\n", "\n", "    for dl_name, dl in dl_files.items():\n", "        # the archive is deleted after extraction, so an extracted folder also counts as downloaded\n", "        if os.path.isfile(dl[\"path\"]) or (dl_name == \"midi\" and os.path.isdir(archive_dir)):\n", "            print(f\"{dl_name} already downloaded to {dl['path']}\")\n", "            continue\n", "        print(f\"Start downloading {dl_name} to {dl['path']} - this can take multiple minutes!\")\n", "        urllib.request.urlretrieve(dl['url'], dl['path'])\n", "        print(\"Finished downloading\")\n", "\n", "    if not os.path.isdir(archive_dir):\n", "        print(\"Start extracting the files of archive - this will take some minutes\")\n", "        # todo: windows has no tar\n", "        subprocess.check_output([\n", "            'tar', '-xzf', dl_files[\"midi\"][\"path\"],\n", "            '-C', download_path\n", "        ])\n", "        print(\"Finished extracting\")\n", "\n", "    if os.path.isfile(dl_files[\"midi\"][\"path\"]):\n", "        print(\"Remove archive\")\n", "        os.remove(dl_files[\"midi\"][\"path\"])\n", "\n", "download_dataset()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Parsing the dataset\n", "\n", "When working with large sets of files the unix utility [glob](https://en.wikipedia.org/wiki/Glob_(programming)) comes in handy as we can describe the pattern of the file paths we want to match instead of listing all files.\n", "\n", "When we take a quick look at the file paths, they all seem to follow a structure like\n", "\n", "```\n", "../datasets/lmd/lmd_full/2/4a0cbb3f083d14d57858c87b26f85873.mid\n", "```\n", "\n", "Soon we will understand why the filename has this cryptic format, but for now we simply want to collect the paths of all available files in a list." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Found 178561 midi files in dataset\n" ] } ], "source": [ "midi_files = glob.glob('../datasets/lmd/lmd_full/*/*.mid')\n", "print(f'Found {len(midi_files)} midi files in dataset')" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "tags": [] }, "outputs": [ { "data": { "text/plain": [ "array(['../datasets/lmd/lmd_full/4/4a69b1a10e4dbe409ac922de4f6256d8.mid',\n", " '../datasets/lmd/lmd_full/b/b102261b4c27ea58bad4777b5df5be5e.mid',\n", " '../datasets/lmd/lmd_full/3/3a8ccaab480919c35f37fdf08c238e29.mid',\n", " '../datasets/lmd/lmd_full/d/dc6416232e05b3d44fb62007ca84b474.mid',\n", " '../datasets/lmd/lmd_full/4/4fb744a04c1afbb0309b581410b33363.mid'],\n", " dtype='" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import pretty_midi as pm\n", "\n", "midi_stream = pm.PrettyMIDI(example_midi_file)\n", "midi_stream" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we have loaded a MIDI file with *pretty_midi*, we take a look at how we can access its data.\n", "The [documentation of pretty_midi](https://craffel.github.io/pretty-midi/) tells us how to access the MIDI information." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "189.431068" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_stream.get_end_time()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "gives us the length of the MIDI file in seconds, i.e. the time of its last event" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "array([0.0000000e+00, 1.2500000e-02, 4.4107100e-01, 8.6964200e-01,\n", " 1.2982130e+00, 1.7267840e+00, 2.1553550e+00, 2.5839260e+00,\n", " 3.0124970e+00, 3.4410680e+00, 3.8410680e+00, 4.2410680e+00,\n", " 4.6410680e+00, 5.0410680e+00, 5.4410680e+00, 5.8410680e+00,\n", " 6.2410680e+00, 6.6410680e+00, 7.0410680e+00, 7.4410680e+00,\n", " 7.8410680e+00, 8.2410680e+00, 8.6410680e+00, 9.0410680e+00,\n", " 9.4410680e+00, 9.8410680e+00, 1.0241068e+01, 1.0641068e+01,\n", " 1.1041068e+01, 1.1441068e+01, 1.1841068e+01, 1.2241068e+01,\n", " 1.2641068e+01, 1.3041068e+01, 1.3441068e+01, 1.3841068e+01,\n", " 1.4241068e+01, 1.4641068e+01, 1.5041068e+01, 1.5441068e+01,\n", " 1.5841068e+01, 1.6241068e+01, 1.6641068e+01, 1.7041068e+01,\n", " 1.7441068e+01, 1.7841068e+01, 1.8241068e+01, 1.8641068e+01,\n", " 1.9041068e+01, 1.9441068e+01])" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_stream.get_beats()[0:50] # limit to 50 entries for printing" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "gives us the position in seconds of every beat in the MIDI file. These positions are derived from the tempo changes, they are not the positions of the notes." ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [ { "data": { "text/plain": [ "(array([0. , 0.0125 , 3.441068]),\n", " array([120. , 140.00014, 150. ]))" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_stream.get_tempo_changes()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "will tell us any changes in the tempo in bpm.\n", "Keep in mind that we do not have two tempo changes here but that the function returns two arrays - the first one with the mark in seconds where the tempo gets changed and another one to what bpm it changes - for now we are only interested in the second array.\n", "\n", "We should also prepare for the case that the MIDI file does not provide any tempo information." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[TimeSignature(numerator=4, denominator=4, time=0.0125),\n", " TimeSignature(numerator=2, denominator=4, time=21.041068000000003),\n", " TimeSignature(numerator=4, denominator=4, time=21.841068000000003),\n", " TimeSignature(numerator=2, denominator=4, time=36.241068000000006),\n", " TimeSignature(numerator=4, denominator=4, time=37.041068),\n", " TimeSignature(numerator=2, denominator=4, time=51.441068),\n", " TimeSignature(numerator=4, denominator=4, time=52.241068000000006),\n", " TimeSignature(numerator=2, denominator=4, time=66.641068),\n", " TimeSignature(numerator=4, denominator=4, time=67.441068),\n", " TimeSignature(numerator=2, denominator=4, time=107.441068),\n", " TimeSignature(numerator=4, denominator=4, time=108.24106800000001),\n", " TimeSignature(numerator=2, denominator=4, time=122.641068),\n", " TimeSignature(numerator=4, denominator=4, time=123.44106800000002)]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_stream.time_signature_changes" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "tells us the time signatures of the MIDI file." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "4/4\n" ] } ], "source": [ "time_signature = midi_stream.time_signature_changes[0]\n", "print(f'{time_signature.numerator}/{time_signature.denominator}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As with the tempo, we should also prepare for the case that the MIDI file contains no time signature information." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Instrument(program=62, is_drum=False, name=\"\"),\n", " Instrument(program=34, is_drum=False, name=\"\"),\n", " Instrument(program=25, is_drum=False, name=\"\"),\n", " Instrument(program=28, is_drum=False, name=\"\"),\n", " Instrument(program=25, is_drum=False, name=\"\"),\n", " Instrument(program=30, is_drum=False, name=\"\"),\n", " Instrument(program=26, is_drum=False, name=\"\"),\n", " Instrument(program=18, is_drum=False, name=\"\"),\n", " Instrument(program=0, is_drum=True, name=\"\"),\n", " Instrument(program=0, is_drum=True, name=\"\"),\n", " Instrument(program=0, is_drum=True, name=\"\"),\n", " Instrument(program=0, is_drum=True, name=\"\"),\n", " Instrument(program=0, is_drum=True, name=\"\")]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_stream.instruments" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "gives us access to all instruments, including their names and whether each one is a drum track." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also take a look at the notes of each instrument" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "[Note(start=9.834401, end=9.937735, pitch=62, velocity=109),\n", " Note(start=10.037735, end=10.311068, pitch=62, velocity=107),\n", " Note(start=10.437735, end=10.847735, pitch=62, velocity=113),\n", " Note(start=10.834401, end=11.057735, pitch=60, velocity=109),\n", " Note(start=11.034401, end=11.237735, pitch=59, velocity=106),\n", " Note(start=11.237735, end=11.594401, pitch=62, velocity=108),\n", " Note(start=11.627735, end=11.834401, pitch=60, velocity=101),\n", " Note(start=11.827735, end=12.247735, pitch=59, velocity=108),\n", " Note(start=12.224401, end=12.544401, pitch=62, velocity=106),\n", " Note(start=13.024401, end=13.131068, pitch=64, velocity=102)]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_stream.instruments[0].notes[0:10]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Performance\n", "\n", "Now we know how we can use *pretty_midi* to load a single MIDI file and access its information.\n", "But loading a MIDI file is pretty slow in Python, so we will do some measurements now and look for a quicker way to access the information we are interested in.\n", "\n", "We can use `%%timeit`, a built-in tool of *Jupyter*, which runs a cell multiple times and measures the time each run takes." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "186 ms ± 6.59 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "\n", "pm.PrettyMIDI(example_midi_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Of course the time needed to parse a MIDI file depends on its complexity. But now it's time to do some maths.\n", "For the sake of simplicity we will assume that it takes around 100 ms to load a MIDI file.\n", "\n", "$$ 0.1 \\frac{\\text{sec}}{\\text{file}} * 178000 \\text{ files} = 17800 \\text{ sec} \\approx 297 \\text{ minutes} \\approx 5 \\text{ hours}$$\n", "\n", "Note that this assumes that we do not process multiple files in parallel, which is possible but also tricky.\n", "Also, this only covers loading the data; we have not accessed any of it yet.\n", "\n", "Maybe there is a file format in which we can store the MIDI files and load them quicker whenever we need them.\n", "[*note_seq*](https://github.com/magenta/note-seq) from the [Magenta project](https://magenta.tensorflow.org/) provides such a format called *note sequence* which interacts nicely with *pretty_midi*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import note_seq" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "14.4 ms ± 381 µs per loop (mean ± std. dev. 
of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit\n", "\n", "notes = note_seq.midi_to_note_sequence(midi_stream)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So the conversion of this file is fast compared to the parsing of our MIDI file.\n", "The problem is that we still need the MIDI file parsed and we have not yet taken a look at the saving and loading of a note sequence.\n", "\n", "*note_seq* uses [protobuf](https://developers.google.com/protocol-buffers/docs/pythontutorial) for this, a binary format for storing and exchanging structured data developed by Google.\n", "The advantage is that it is fast and efficient because it is a binary format, and we can exchange the files with other programming languages. Think of it as a binary version of JSON with type safety included.\n", "\n", "We start by saving the note sequence as a protobuf file and then try loading it again from our hard drive." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "notes = note_seq.midi_to_note_sequence(midi_stream)\n", "\n", "with open('note_seq_test.protobuf', 'wb') as f:\n", "    f.write(notes.SerializeToString())" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "3.47 ms ± 97.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)\n" ] } ], "source": [ "%%timeit\n", "\n", "with open('note_seq_test.protobuf', 'rb') as f:\n", "    notes_loaded = note_seq.NoteSequence.FromString(f.read())" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's say it takes about 4 ms to load a protobuf file. Now we do the calculation from before again.\n", "\n", "$$ 0.004 \\frac{\\text{sec}}{\\text{file}} * 178000 \\text{ files} = 712 \\text{ sec} \\approx 12 \\text{ minutes}$$\n", "\n", "which is much more acceptable.\n", "And we have not yet considered any parallelisation of the code." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "But before we convert everything to note sequences stored as protobuf, we should compare the file sizes of the two formats." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "File size of ../datasets/lmd/lmd_full/e/ed05335f8f6c273f506a997137b2805b.mid: 46.6513671875 kbytes\n", "File size of note_seq_test.protobuf: 195.958984375 kbytes\n" ] } ], "source": [ "for f in [example_midi_file, 'note_seq_test.protobuf']:\n", "    print(f'File size of {f}: {os.path.getsize(f)/1024} kbytes')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It turns out that we are trading storage for computation time.\n", "\n", "Before we can decide if we really want to make this trade, we need to take a look at the size of our MIDI dataset.\n", "For this we can use a feature of Jupyter that executes shell commands from within the notebook by prepending a `!` to the command.\n", "In order to get the size of a directory we will use the unix tool [du](https://linux.die.net/man/1/du) with the arguments `s` (sum everything within the path) and `h` (convert number of bytes to a human readable format)." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "5.9G\t../datasets/lmd/lmd_full\n" ] } ], "source": [ "!du -sh ../datasets/lmd/lmd_full" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have around 6 gigabytes of files here. When we convert them to protobuf we will blow them up by a factor of $\\approx 4$, so we will have about 24 GB of protobuf files. This is still acceptable." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Processing all files\n", "\n", "Now that we know how to load, access and convert the files to a nicer format, we still need to take a look at how we can do this for our 180k files and maybe save some time with a little bit of programming effort.\n", "\n", "First we write a function that combines everything we did before and returns all necessary information as a dictionary." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "PROTO_SAVE_PATH = '../datasets/lmd/proto/'\n", "\n", "# make sure that the folder we want to save to actually exists\n", "os.makedirs(PROTO_SAVE_PATH, exist_ok=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from typing import Dict\n", "\n", "from mido import KeySignatureError\n", "\n", "def parse_midi_file(midi_file_path: str, proto_save_path: str=PROTO_SAVE_PATH) -> Dict:\n", "    r = {\n", "        'midi_path': midi_file_path,\n", "        'midi_error': True,\n", "    }\n", "    midi_name = os.path.splitext(midi_file_path.split(os.sep)[-1])[0]\n", "    # use the function argument here, not the global constant\n", "    proto_path = os.path.join(proto_save_path, f'{midi_name}.protobuf')\n", "\n", "    r['original_names'] = md5_filenames[midi_name]\n", "\n", "    try:\n", "        stream = pm.PrettyMIDI(midi_file_path)\n", "        r['midi_error'] = False\n", "    except (OSError, ValueError, IndexError, KeySignatureError, EOFError, ZeroDivisionError):\n", "        # the file could not be parsed, return early with midi_error=True\n", "        return r\n", "\n", "    try:\n", "        # r['beat_start'] = stream.estimate_beat_start() # omitted b/c adds 200ms to parsing\n", "        r['estimate_tempo'] = stream.estimate_tempo()\n", "        r['tempi_sec'], r['tempi'] = stream.get_tempo_changes()\n", "        r['end_time'] = stream.get_end_time()\n", "        r['drums'] = any([i.is_drum for i in stream.instruments])\n", "        r['resolution'] = stream.resolution\n", "        r['instrument_names'] = [i.name.strip() for i in stream.instruments]\n", "        r['num_time_signature_changes'] = len(stream.time_signature_changes)\n", "    except ValueError as e:\n", "        # ValueError: Can't estimate beat start when there are no notes.\n", "        # ValueError: Can't provide a global tempo estimate when there are fewer than two notes.\n", "        print(f\"Could not parse MIDI file {midi_file_path}: {e}\")\n", "\n", "    # convert to a note sequence and store it as protobuf for faster loading later\n", "    notes = note_seq.midi_to_note_sequence(stream)\n", "    with open(proto_path, 'wb') as f:\n", "        f.write(notes.SerializeToString())\n", "    r['proto_path'] = proto_path\n", "\n", "    return r" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we want to try out the function on a single MIDI file." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'midi_path': '../datasets/lmd/lmd_full/e/ed05335f8f6c273f506a997137b2805b.mid',\n", " 'midi_error': False,\n", " 'original_names': ['QUALITY MIDI/EClapton-Promises.mid'],\n", " 'estimate_tempo': 159.6553471870248,\n", " 'tempi_sec': array([0. , 0.0125 , 3.441068]),\n", " 'tempi': array([120. , 140.00014, 150. ]),\n", " 'end_time': 189.431068,\n", " 'drums': True,\n", " 'resolution': 120,\n", " 'instrument_names': ['', '', '', '', '', '', '', '', '', '', '', '', ''],\n", " 'num_time_signature_changes': 13,\n", " 'proto_path': '../datasets/lmd/proto/ed05335f8f6c273f506a997137b2805b.protobuf'}" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "parse_midi_file(example_midi_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "So this works as expected. Let's measure how long it takes as well." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "233 ms ± 10.1 ms per loop (mean ± std. dev. 
of 7 runs, 1 loop each)\n" ] } ], "source": [ "%%timeit\n", "\n", "parse_midi_file(example_midi_file)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It seems that accessing the MIDI data via *pretty_midi* does not come for free.\n", "Repeating the calculation from before, now with roughly 200 ms per file:\n", "\n", "$$ 0.2 \\frac{\\text{sec}}{\\text{file}} * 178000 \\text{ files} = 35600 \\text{ sec} \\approx 593 \\text{ minutes} \\approx 10 \\text{ hours}$$\n", "\n", "This is not unheard of, but we can reduce the time by parallelizing the work. This requires some tricks which we will not go into in detail.\n", "On a recent i9 processor the next step will take around 90 minutes if executed for the first time." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Use cached parquet file midi_files_full.parquet\n" ] }, { "data": { "text/html": [ "
" ], "text/plain": [ " midi_path midi_error \\\n", "81923 ../datasets/lmd/lmd_full/f/f4b72836981feedb9e1... False \n", "171833 ../datasets/lmd/lmd_full/5/5f3eaf75cbc1b2554df... False \n", "161501 ../datasets/lmd/lmd_full/2/2f02103cfb1764e661d... False \n", "39942 ../datasets/lmd/lmd_full/6/609b19dde84d2374b7d... False \n", "100593 ../datasets/lmd/lmd_full/d/de4dde0f1e960e41360... False \n", "44272 ../datasets/lmd/lmd_full/6/6f74186f4c163863630... False \n", "15131 ../datasets/lmd/lmd_full/0/0d8e9c4eb761a3e632b... False \n", "34735 ../datasets/lmd/lmd_full/6/6d3149b41d088c63627... False \n", "45905 ../datasets/lmd/lmd_full/1/1a931072e21340f256b... False \n", "12844 ../datasets/lmd/lmd_full/0/0a7c38887ba4882921e... False \n", "114123 ../datasets/lmd/lmd_full/4/455fcb4f4c5ad4d10da... False \n", "14339 ../datasets/lmd/lmd_full/0/029c4baa3089eca3817... False \n", "131099 ../datasets/lmd/lmd_full/3/353116ea8d2eb4041c9... False \n", "124606 ../datasets/lmd/lmd_full/3/3574baf9f9cdfa4c3e1... False \n", "136078 ../datasets/lmd/lmd_full/e/eb59c9f6854d897a03a... False \n", "\n", " estimate_tempo tempi_sec \\\n", "81923 175.232953 [0.0] \n", "171833 213.043443 [0.0, 297.04290665, 297.1153704, 297.188899816... \n", "161501 170.691458 [0.0, 0.004166666666666667, 2.0015625] \n", "39942 237.529957 [0.0, 0.0026041666666666665] \n", "100593 170.384180 [0.0, 3.0, 183.33574275, 183.66386775] \n", "44272 242.119770 [0.0] \n", "15131 245.724188 [0.0, 3.9344240000000004, 14.843497999999999, ... \n", "34735 84.000025 [0.0, 89.2856875, 89.88351320833333, 90.410473... \n", "45905 207.446994 [0.0] \n", "12844 224.582071 [0.0] \n", "114123 153.020774 [0.0, 7.15475625, 7.3302190000000005, 7.506771... \n", "14339 199.999987 [0.0, 6.461539, 7.961539, 9.790110733333334, 9... \n", "131099 260.000260 [0.0] \n", "124606 207.262310 [0.0, 0.16129033333333334, 0.9425403333333333,... 
\n", "136078 193.775528 [0.0, 3.5416666666666665] \n", "\n", " tempi end_time drums \\\n", "81923 [82.00003553334874] 128.780432 True \n", "171833 [69.99998833333528, 69.00001725000433, 67.9999... 302.411045 True \n", "161501 [75.0, 120.0, 75.0] 174.718229 True \n", "39942 [120.0, 110.00091667430561] 254.776049 True \n", "100593 [120.0, 67.00002903334591, 60.0, 49.0000318500... 190.137078 False \n", "44272 [122.00006913337249] 228.161756 True \n", "15131 [122.00006913337249, 121.00018755029072, 123.0... 251.814031 True \n", "34735 [42.00001260000378, 46.00002913335179, 51.0000... 340.282369 False \n", "45905 [120.0] 280.114583 True \n", "12844 [127.96996971377385] 193.497545 True \n", "114123 [85.95680670463092, 85.48823040787859, 84.9608... 100.750126 False \n", "14339 [64.99999458333379, 40.0, 69.99998833333528, 6... 330.377789 True \n", "131099 [130.00013000013] 51.692256 True \n", "124606 [30.999997933333468, 32.0, 32.99999670000033, ... 273.626126 True \n", "136078 [120.0, 109.99990833340973] 217.860027 True \n", "\n", " resolution instrument_names \\\n", "81923 384.0 [harmnica, accordn2, drum, acoubass, pianogrd,... \n", "171833 120.0 [YO TE AMO;CHAYANE, YO TE AMO;CHAYANE, YO TE A... \n", "161501 384.0 [, , , , , , , , , , , , , , , ] \n", "39942 384.0 [S-File, S-File, S-File, S-File, S-File, S-Fil... \n", "100593 192.0 [, , , , , ] \n", "44272 480.0 [drummix, bass, ac piano, Vocal, back voc, ac ... \n", "15131 96.0 [TAKE ME HOME Bextor/Aller/... \n", "34735 96.0 [Voice (Alto), Voice (Alto), Voice (Alto), Pia... \n", "45905 384.0 [A.PIANO 1, FINGERDBAS, A.PIANO 1, PAN FLUTE, ... \n", "12844 480.0 [CANDOMBE, PARA, GARDEL, http://fberni.tripod.... \n", "114123 1024.0 [] \n", "14339 120.0 [, Bass, Bass, Strings, Melodia, Slowstrings, ... \n", "131099 96.0 [MIDI Ch. 1, MIDI Ch. 2, MIDI Drums] \n", "124606 120.0 [Rhodes Piano, Synth Bass H, Synth Bass L, Aco... \n", "136078 240.0 [Bass / Acoustic bass, Piano / Rhodes, Melody ... 
\n", "\n", " num_time_signature_changes \\\n", "81923 1.0 \n", "171833 1.0 \n", "161501 1.0 \n", "39942 1.0 \n", "100593 0.0 \n", "44272 1.0 \n", "15131 1.0 \n", "34735 6.0 \n", "45905 1.0 \n", "12844 1.0 \n", "114123 1.0 \n", "14339 5.0 \n", "131099 0.0 \n", "124606 1.0 \n", "136078 1.0 \n", "\n", " proto_path \n", "81923 ../datasets/lmd/proto/f4b72836981feedb9e1046f1... \n", "171833 ../datasets/lmd/proto/5f3eaf75cbc1b2554dfa60bd... \n", "161501 ../datasets/lmd/proto/2f02103cfb1764e661d54151... \n", "39942 ../datasets/lmd/proto/609b19dde84d2374b7d19124... \n", "100593 ../datasets/lmd/proto/de4dde0f1e960e413602c493... \n", "44272 ../datasets/lmd/proto/6f74186f4c163863630e8baa... \n", "15131 ../datasets/lmd/proto/0d8e9c4eb761a3e632ba4f10... \n", "34735 ../datasets/lmd/proto/6d3149b41d088c6362792faf... \n", "45905 ../datasets/lmd/proto/1a931072e21340f256b96104... \n", "12844 ../datasets/lmd/proto/0a7c38887ba4882921ee2c82... \n", "114123 ../datasets/lmd/proto/455fcb4f4c5ad4d10da384b6... \n", "14339 ../datasets/lmd/proto/029c4baa3089eca381772e2b... \n", "131099 ../datasets/lmd/proto/353116ea8d2eb4041c9f4622... \n", "124606 ../datasets/lmd/proto/3574baf9f9cdfa4c3e1da456... \n", "136078 ../datasets/lmd/proto/eb59c9f6854d897a03a3ed78... 
" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import math\n", "import multiprocessing\n", "from typing import Optional\n", "\n", "from helpers.helpers import *\n", "\n", "def load_midi_files(parquet_path: str = 'midi_files_{}.parquet', limit: Optional[int] = None) -> pd.DataFrame:\n", " parquet_path = parquet_path.format(limit if limit else 'full')\n", " \n", " if os.path.isfile(parquet_path):\n", " print(f'Use cached parquet file {parquet_path}')\n", " return pd.read_parquet(parquet_path)\n", " \n", " midi_files = glob.glob('../datasets/lmd/lmd_full/*/*.mid')\n", " if limit:\n", " print(f'Limit to {limit} files')\n", " # sample without replacement so that no file is parsed twice\n", " midi_files = np.random.choice(midi_files, limit, replace=False)\n", " print(f'Parse {len(midi_files)} midi files')\n", " \n", " cpu_count = math.ceil(multiprocessing.cpu_count()*3/4)\n", " with multiprocessing.Pool(cpu_count, maxtasksperchild=5) as p:\n", " midi_meta = p.map(parse_midi_file_async, midi_files, chunksize=255)\n", " print('Finished parsing midi files')\n", " \n", " midi_meta = pd.DataFrame(midi_meta)\n", " midi_meta.to_parquet(parquet_path)\n", " \n", " return midi_meta\n", "\n", "# to work with a random subset, set the limit argument to the number of files to sample\n", "midi_df = load_midi_files(limit=None)\n", "\n", "midi_df.sample(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Next, we add the original file names from the MD5 JSON file to our dataframe." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
midi_pathmidi_errorestimate_tempotempi_sectempiend_timedrumsresolutioninstrument_namesnum_time_signature_changesproto_pathoriginal_files
31900../datasets/lmd/lmd_full/7/745f66980de9ca27fe0...False169.981663[0.0][151.00037750094376]141.733917True96.0[Acc Bass, Piano, Melody, Sax, Solo, E.Guitar,...1.0../datasets/lmd/proto/745f66980de9ca27fe055cc6...[BonJovi/Runaway3.mid, MikeDoyle/MyLittleRunAw...
33585../datasets/lmd/lmd_full/6/68b145d7ffdf4c43e1f...False204.472843[0.0][99.99999999999999]163.656250True96.0[Remelexo, Remelexo, Remelexo, Remelexo, Remel...2.0../datasets/lmd/proto/68b145d7ffdf4c43e1f5f59c...[C/Cesar E Paulinho - Remelexo.mid, Midis Româ...
67436../datasets/lmd/lmd_full/a/a373ed7cd11fb80fac3...False159.432272[0.0][159.0002067002687]154.150743True192.0[Crash Cymbal, Ride Cymbal, Open HiHat, Closed...1.0../datasets/lmd/proto/a373ed7cd11fb80fac3dca06...[t/trust_me.mid, T/trust_me.mid]
142153../datasets/lmd/lmd_full/e/ec2fff2c295943aedf4...False194.483086[0.0][88.70005632453577]259.977287True120.0[You Can Leave Your Hat On, You Can Leave Your...1.0../datasets/lmd/proto/ec2fff2c295943aedf4797c3...[Y/You Can Leave Your Hat on Pm L.mid, Y/You C...
127493../datasets/lmd/lmd_full/3/3fae950c18f739286ec...False160.190137[0.0][160.0]100.425000True480.0[Sax, Sax, French Horn, Piano, Bass, Drums]1.0../datasets/lmd/proto/3fae950c18f739286ec42107...[2009 MIDI/barbra_Ann2-Regents-F160.mid]
138301../datasets/lmd/lmd_full/e/ec16ec22a846e4d4d91...False164.382517[0.0, 186.66648][162.000162000162, 120.0]186.666480True120.0[Vocal, ElecGtr 1, ElecGtr 2, PickedBass, Orga...1.0../datasets/lmd/proto/ec16ec22a846e4d4d91959cc...[c/chaingm.mid, C/chaingm.mid]
83006../datasets/lmd/lmd_full/f/fd8a6fcf912a7ddc602...False160.171123[0.0][80.0]274.525000True120.0[Corazon Partio, Corazon Partio, Corazon Parti...7.0../datasets/lmd/proto/fd8a6fcf912a7ddc60242146...[C/Corazon Partio Pm L.mid, Midis Latinas/Cora...
13575../datasets/lmd/lmd_full/0/0dff276c070ccc16e2b...False203.014003[0.0][190.0002850004275]48.976900True96.0[Bop on the Rocks, Bop on the Rocks, Bop on th...3.0../datasets/lmd/proto/0dff276c070ccc16e2bd6fa1...[Jazz/Bop on.mid, b/bop-on.mid, Jazz/BOP_ON.MI...
134536../datasets/lmd/lmd_full/e/ed793dadf58b913ea24...False192.000192[0.0][140.00014000014]6.861600False96.0[z3ta+ (MIDI)]0.0../datasets/lmd/proto/ed793dadf58b913ea2460ad1...[M/Machinehead - Headwave (Zatox Mix).mid, M/M...
116286../datasets/lmd/lmd_full/4/4c4d48e046e6f5a2fc2...False255.499940[0.0][145.9999659333413]6.575344False96.0[MIDI out]0.0../datasets/lmd/proto/4c4d48e046e6f5a2fc2f89cb...[J/jan_johnston__flesh__tilt_remix__bamford.mi...
126553../datasets/lmd/lmd_full/3/312b4e4b6f9bbd3abe3...False192.617363[0.0][96.0]87.500000True384.0[ACOU BASS, A.PIANO 1, A.PIANO 1, JAZZ GTR, SL...2.0../datasets/lmd/proto/312b4e4b6f9bbd3abe3af2bb...[Diversen/Strike-Up-The-Band.mid, DIVERSEN/STR...
150997../datasets/lmd/lmd_full/b/bf3f61d4bf0c7a99c7c...False196.007331[0.0][98.0036653370836]364.744011True384.0[SHOUT, SHOUT, SHOUT, SHOUT, SHOUT, SHOUT, SHO...1.0../datasets/lmd/proto/bf3f61d4bf0c7a99c7c5dded...[080/Shout.mid, 080/Shout.mid]
66108../datasets/lmd/lmd_full/8/80fd25fb1ff70896ed8...False180.717774[0.0, 0.008333333333333333, 11.500865666666666...[60.0, 67.00002903334591, 65.99999340000066, 6...109.561609False120.0[English (open lyrics window), French (open ly...16.0../datasets/lmd/proto/80fd25fb1ff70896ed8b1115...[R/Ravel Maurice Chanson Des Cueilleuses De Le...
5427../datasets/lmd/lmd_full/9/91100d1888462c59031...False246.623850[0.0][120.0]32.005208False480.0[Nylon-Str. Gt., Bass, *Piano, Warm Pad, Strings]1.0../datasets/lmd/proto/91100d1888462c5903132be8...[COREL COLLECTION/MOOD_06.MID]
78360../datasets/lmd/lmd_full/f/f577f833d678672ef3b...False194.594595[0.0][120.0]200.916667True384.0[Snare / Tom/cymbal, Snare / Tom/cymbal, Snare...3.0../datasets/lmd/proto/f577f833d678672ef3b2787c...[D/Double Vision.mid, D/Double Vision.mid, D/D...
\n", "
" ], "text/plain": [ " midi_path midi_error \\\n", "31900 ../datasets/lmd/lmd_full/7/745f66980de9ca27fe0... False \n", "33585 ../datasets/lmd/lmd_full/6/68b145d7ffdf4c43e1f... False \n", "67436 ../datasets/lmd/lmd_full/a/a373ed7cd11fb80fac3... False \n", "142153 ../datasets/lmd/lmd_full/e/ec2fff2c295943aedf4... False \n", "127493 ../datasets/lmd/lmd_full/3/3fae950c18f739286ec... False \n", "138301 ../datasets/lmd/lmd_full/e/ec16ec22a846e4d4d91... False \n", "83006 ../datasets/lmd/lmd_full/f/fd8a6fcf912a7ddc602... False \n", "13575 ../datasets/lmd/lmd_full/0/0dff276c070ccc16e2b... False \n", "134536 ../datasets/lmd/lmd_full/e/ed793dadf58b913ea24... False \n", "116286 ../datasets/lmd/lmd_full/4/4c4d48e046e6f5a2fc2... False \n", "126553 ../datasets/lmd/lmd_full/3/312b4e4b6f9bbd3abe3... False \n", "150997 ../datasets/lmd/lmd_full/b/bf3f61d4bf0c7a99c7c... False \n", "66108 ../datasets/lmd/lmd_full/8/80fd25fb1ff70896ed8... False \n", "5427 ../datasets/lmd/lmd_full/9/91100d1888462c59031... False \n", "78360 ../datasets/lmd/lmd_full/f/f577f833d678672ef3b... False \n", "\n", " estimate_tempo tempi_sec \\\n", "31900 169.981663 [0.0] \n", "33585 204.472843 [0.0] \n", "67436 159.432272 [0.0] \n", "142153 194.483086 [0.0] \n", "127493 160.190137 [0.0] \n", "138301 164.382517 [0.0, 186.66648] \n", "83006 160.171123 [0.0] \n", "13575 203.014003 [0.0] \n", "134536 192.000192 [0.0] \n", "116286 255.499940 [0.0] \n", "126553 192.617363 [0.0] \n", "150997 196.007331 [0.0] \n", "66108 180.717774 [0.0, 0.008333333333333333, 11.500865666666666... 
\n", "5427 246.623850 [0.0] \n", "78360 194.594595 [0.0] \n", "\n", " tempi end_time drums \\\n", "31900 [151.00037750094376] 141.733917 True \n", "33585 [99.99999999999999] 163.656250 True \n", "67436 [159.0002067002687] 154.150743 True \n", "142153 [88.70005632453577] 259.977287 True \n", "127493 [160.0] 100.425000 True \n", "138301 [162.000162000162, 120.0] 186.666480 True \n", "83006 [80.0] 274.525000 True \n", "13575 [190.0002850004275] 48.976900 True \n", "134536 [140.00014000014] 6.861600 False \n", "116286 [145.9999659333413] 6.575344 False \n", "126553 [96.0] 87.500000 True \n", "150997 [98.0036653370836] 364.744011 True \n", "66108 [60.0, 67.00002903334591, 65.99999340000066, 6... 109.561609 False \n", "5427 [120.0] 32.005208 False \n", "78360 [120.0] 200.916667 True \n", "\n", " resolution instrument_names \\\n", "31900 96.0 [Acc Bass, Piano, Melody, Sax, Solo, E.Guitar,... \n", "33585 96.0 [Remelexo, Remelexo, Remelexo, Remelexo, Remel... \n", "67436 192.0 [Crash Cymbal, Ride Cymbal, Open HiHat, Closed... \n", "142153 120.0 [You Can Leave Your Hat On, You Can Leave Your... \n", "127493 480.0 [Sax, Sax, French Horn, Piano, Bass, Drums] \n", "138301 120.0 [Vocal, ElecGtr 1, ElecGtr 2, PickedBass, Orga... \n", "83006 120.0 [Corazon Partio, Corazon Partio, Corazon Parti... \n", "13575 96.0 [Bop on the Rocks, Bop on the Rocks, Bop on th... \n", "134536 96.0 [z3ta+ (MIDI)] \n", "116286 96.0 [MIDI out] \n", "126553 384.0 [ACOU BASS, A.PIANO 1, A.PIANO 1, JAZZ GTR, SL... \n", "150997 384.0 [SHOUT, SHOUT, SHOUT, SHOUT, SHOUT, SHOUT, SHO... \n", "66108 120.0 [English (open lyrics window), French (open ly... \n", "5427 480.0 [Nylon-Str. Gt., Bass, *Piano, Warm Pad, Strings] \n", "78360 384.0 [Snare / Tom/cymbal, Snare / Tom/cymbal, Snare... 
\n", "\n", " num_time_signature_changes \\\n", "31900 1.0 \n", "33585 2.0 \n", "67436 1.0 \n", "142153 1.0 \n", "127493 1.0 \n", "138301 1.0 \n", "83006 7.0 \n", "13575 3.0 \n", "134536 0.0 \n", "116286 0.0 \n", "126553 2.0 \n", "150997 1.0 \n", "66108 16.0 \n", "5427 1.0 \n", "78360 3.0 \n", "\n", " proto_path \\\n", "31900 ../datasets/lmd/proto/745f66980de9ca27fe055cc6... \n", "33585 ../datasets/lmd/proto/68b145d7ffdf4c43e1f5f59c... \n", "67436 ../datasets/lmd/proto/a373ed7cd11fb80fac3dca06... \n", "142153 ../datasets/lmd/proto/ec2fff2c295943aedf4797c3... \n", "127493 ../datasets/lmd/proto/3fae950c18f739286ec42107... \n", "138301 ../datasets/lmd/proto/ec16ec22a846e4d4d91959cc... \n", "83006 ../datasets/lmd/proto/fd8a6fcf912a7ddc60242146... \n", "13575 ../datasets/lmd/proto/0dff276c070ccc16e2bd6fa1... \n", "134536 ../datasets/lmd/proto/ed793dadf58b913ea2460ad1... \n", "116286 ../datasets/lmd/proto/4c4d48e046e6f5a2fc2f89cb... \n", "126553 ../datasets/lmd/proto/312b4e4b6f9bbd3abe3af2bb... \n", "150997 ../datasets/lmd/proto/bf3f61d4bf0c7a99c7c5dded... \n", "66108 ../datasets/lmd/proto/80fd25fb1ff70896ed8b1115... \n", "5427 ../datasets/lmd/proto/91100d1888462c5903132be8... \n", "78360 ../datasets/lmd/proto/f577f833d678672ef3b2787c... \n", "\n", " original_files \n", "31900 [BonJovi/Runaway3.mid, MikeDoyle/MyLittleRunAw... \n", "33585 [C/Cesar E Paulinho - Remelexo.mid, Midis Româ... \n", "67436 [t/trust_me.mid, T/trust_me.mid] \n", "142153 [Y/You Can Leave Your Hat on Pm L.mid, Y/You C... \n", "127493 [2009 MIDI/barbra_Ann2-Regents-F160.mid] \n", "138301 [c/chaingm.mid, C/chaingm.mid] \n", "83006 [C/Corazon Partio Pm L.mid, Midis Latinas/Cora... \n", "13575 [Jazz/Bop on.mid, b/bop-on.mid, Jazz/BOP_ON.MI... \n", "134536 [M/Machinehead - Headwave (Zatox Mix).mid, M/M... \n", "116286 [J/jan_johnston__flesh__tilt_remix__bamford.mi... \n", "126553 [Diversen/Strike-Up-The-Band.mid, DIVERSEN/STR... 
\n", "150997 [080/Shout.mid, 080/Shout.mid] \n", "66108 [R/Ravel Maurice Chanson Des Cueilleuses De Le... \n", "5427 [COREL COLLECTION/MOOD_06.MID] \n", "78360 [D/Double Vision.mid, D/Double Vision.mid, D/D... " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "original_files = []\n", "for _, midi_path in midi_df['midi_path'].items():\n", " # the file name without its extension is the MD5 hash of the file\n", " midi_name = os.path.splitext(os.path.basename(midi_path))[0]\n", " original_files.append(md5_filenames[midi_name])\n", "\n", "midi_df['original_files'] = original_files\n", "\n", "midi_df.sample(15)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Analysis of metadata\n", "\n", "Let's examine the extracted metadata of those files.\n", "Before we start plotting, it is always a good idea to take a look at the dataframe and its description." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
midi_pathmidi_errorestimate_tempotempi_sectempiend_timedrumsresolutioninstrument_namesnum_time_signature_changesproto_pathoriginal_files
160813../datasets/lmd/lmd_full/2/2b03576c2eac6b20c55...False220.504202[0.0][99.99999999999999]200.200000True384.0[Lead, Bass, Guitar, Rhythm guitar, Voice oohs...1.0../datasets/lmd/proto/2b03576c2eac6b20c5537514...[M/Monday Monday.mid]
83189../datasets/lmd/lmd_full/f/fc7280752baa136165c...False149.380190[0.0][160.01024065540196]150.326316True48.0[SYNTH, SYNTH, SYNTH, SYNTH, SYNTH, SYNTH, SYN...1.0../datasets/lmd/proto/fc7280752baa136165c911e0...[e/entreaty.mid]
96858../datasets/lmd/lmd_full/c/cb226a46053507ae8a4...False170.000085[0.0][170.0000850000425]11.294112False96.0[Neo Cortex - Elements (Styles & Breeze Remix)]0.0../datasets/lmd/proto/cb226a46053507ae8a47a16c...[N/Neo Cortex - Elements (Breeze & Styles Remi...
34038../datasets/lmd/lmd_full/6/6d12ce4d832f9c39ca6...False231.914925[0.0, 2.368422, 3.493422, 11.1907935, 11.19735...[151.99993920002433, 80.0, 151.99993920002433,...503.284151False120.0[Primo RH, Primo LH G, Primo LH F, Secondo RH ...1.0../datasets/lmd/proto/6d12ce4d832f9c39ca683d53...[Mendelsonn/Allegro Brillant for Piano 4-hands...
171000../datasets/lmd/lmd_full/5/586c46e8f1c65beff59...False236.159582[0.0][130.0023833770286]91.378132True384.0[ACCORDION, A.PIANO 1, FINGERDBAS, CRYSTAL, DR...1.0../datasets/lmd/proto/586c46e8f1c65beff59afe0b...[Byrd, Charlie/The-Girl-From-Ipanema.mid, BYRD...
103720../datasets/lmd/lmd_full/d/dabbeaf50e6bad0b54d...TrueNaNNoneNoneNaNNoneNaNNoneNaNNone[Sure.Polyphone.Midi/Entertainer.mid, Various ...
71588../datasets/lmd/lmd_full/a/aaf2e0398a707b34bbe...False174.928646[0.0][124.000248000496]131.612640True384.0[ESTCE PR HASARD, ESTCE PR HASARD, ESTCE PR HA...1.0../datasets/lmd/proto/aaf2e0398a707b34bbeaefb9...[E/Est Ce Par Hasard Dave.mid]
139917../datasets/lmd/lmd_full/e/ebc5edd3c533092d324...False196.431974[0.0][85.00028333427778]276.574446True384.0[TORN, TORN, TORN, TORN, TORN, TORN, TORN, TOR...1.0../datasets/lmd/proto/ebc5edd3c533092d32461eee...[X/Xgtorn.mid, 097/XGtorn.mid, 097/XGtorn.mid]
45209../datasets/lmd/lmd_full/1/1450ab4b1e52e25438d...False195.096618[0.0][107.9999136000691]224.166846False96.0[Soprano, Soprano, Soprano, Soprano, Soprano, ...38.0../datasets/lmd/proto/1450ab4b1e52e25438de8e7e...[civilwar2/61sfttobfa.mid]
74319../datasets/lmd/lmd_full/a/aae666057efdbc7268d...False240.000000[0.0][120.0]66.010417False96.0[3xOsc (MIDI), 3xOsc #2 (MIDI), 3xOsc #3 (MIDI...1.0../datasets/lmd/proto/aae666057efdbc7268d62b8f...[K/Kara_Sun_-_Into_The_Sun_(Airbase_Dub_Mix)__...
90199../datasets/lmd/lmd_full/c/c750e5a017155f06e63...False256.000000[0.0][128.0]15.117188False96.0[Sylenth1 4, Sylenth1 3, Sylenth1 1]1.0../datasets/lmd/proto/c750e5a017155f06e63ae69c...[C/Cascada_-_BadBoy__DjMixdOut_20130211014527....
99470../datasets/lmd/lmd_full/c/cc6b412aeeb834e8fec...False275.408685[0.0, 182.14267500000003][140.00014000014, 108.99994368336242]185.995888True120.0[, , , , , , , , , , ]1.0../datasets/lmd/proto/cc6b412aeeb834e8fecfd308...[VerucaSalt/Seether.mid, VerucaSalt/Seether.mi...
175912../datasets/lmd/lmd_full/5/500a1212bae787f7ef8...False240.000000[0.0][120.0]98.494792False384.0[The Flowers, Of The Heath, Sequenced By, Barr...1.0../datasets/lmd/proto/500a1212bae787f7ef8fb099...[h/heath.mid, H/heath.mid]
30403../datasets/lmd/lmd_full/7/7dc2e8000f630ab4e57...False200.644292[0.0, 9.69696, 38.49696, 48.193920000000006, 5...[198.000198000198, 199.99999999999997, 198.000...144.326112True384.0[Drums - Travis, Bass - Mark, Guitar 1 - Tom, ...1.0../datasets/lmd/proto/7dc2e8000f630ab4e57eebd5...[B/blink_182-dumpweed.mid]
108557../datasets/lmd/lmd_full/d/d864a2adc13ad8b9797...False251.560269[0.0][126.99998518333504]205.980339True120.0[The Corrs, \"Breathless\", , ****************, ...1.0../datasets/lmd/proto/d864a2adc13ad8b9797d6be2...[Corrs/The Corrs - Breathless.mid, Various/The...
\n", "
" ], "text/plain": [ " midi_path midi_error \\\n", "160813 ../datasets/lmd/lmd_full/2/2b03576c2eac6b20c55... False \n", "83189 ../datasets/lmd/lmd_full/f/fc7280752baa136165c... False \n", "96858 ../datasets/lmd/lmd_full/c/cb226a46053507ae8a4... False \n", "34038 ../datasets/lmd/lmd_full/6/6d12ce4d832f9c39ca6... False \n", "171000 ../datasets/lmd/lmd_full/5/586c46e8f1c65beff59... False \n", "103720 ../datasets/lmd/lmd_full/d/dabbeaf50e6bad0b54d... True \n", "71588 ../datasets/lmd/lmd_full/a/aaf2e0398a707b34bbe... False \n", "139917 ../datasets/lmd/lmd_full/e/ebc5edd3c533092d324... False \n", "45209 ../datasets/lmd/lmd_full/1/1450ab4b1e52e25438d... False \n", "74319 ../datasets/lmd/lmd_full/a/aae666057efdbc7268d... False \n", "90199 ../datasets/lmd/lmd_full/c/c750e5a017155f06e63... False \n", "99470 ../datasets/lmd/lmd_full/c/cc6b412aeeb834e8fec... False \n", "175912 ../datasets/lmd/lmd_full/5/500a1212bae787f7ef8... False \n", "30403 ../datasets/lmd/lmd_full/7/7dc2e8000f630ab4e57... False \n", "108557 ../datasets/lmd/lmd_full/d/d864a2adc13ad8b9797... False \n", "\n", " estimate_tempo tempi_sec \\\n", "160813 220.504202 [0.0] \n", "83189 149.380190 [0.0] \n", "96858 170.000085 [0.0] \n", "34038 231.914925 [0.0, 2.368422, 3.493422, 11.1907935, 11.19735... \n", "171000 236.159582 [0.0] \n", "103720 NaN None \n", "71588 174.928646 [0.0] \n", "139917 196.431974 [0.0] \n", "45209 195.096618 [0.0] \n", "74319 240.000000 [0.0] \n", "90199 256.000000 [0.0] \n", "99470 275.408685 [0.0, 182.14267500000003] \n", "175912 240.000000 [0.0] \n", "30403 200.644292 [0.0, 9.69696, 38.49696, 48.193920000000006, 5... \n", "108557 251.560269 [0.0] \n", "\n", " tempi end_time drums \\\n", "160813 [99.99999999999999] 200.200000 True \n", "83189 [160.01024065540196] 150.326316 True \n", "96858 [170.0000850000425] 11.294112 False \n", "34038 [151.99993920002433, 80.0, 151.99993920002433,... 
503.284151 False \n", "171000 [130.0023833770286] 91.378132 True \n", "103720 None NaN None \n", "71588 [124.000248000496] 131.612640 True \n", "139917 [85.00028333427778] 276.574446 True \n", "45209 [107.9999136000691] 224.166846 False \n", "74319 [120.0] 66.010417 False \n", "90199 [128.0] 15.117188 False \n", "99470 [140.00014000014, 108.99994368336242] 185.995888 True \n", "175912 [120.0] 98.494792 False \n", "30403 [198.000198000198, 199.99999999999997, 198.000... 144.326112 True \n", "108557 [126.99998518333504] 205.980339 True \n", "\n", " resolution instrument_names \\\n", "160813 384.0 [Lead, Bass, Guitar, Rhythm guitar, Voice oohs... \n", "83189 48.0 [SYNTH, SYNTH, SYNTH, SYNTH, SYNTH, SYNTH, SYN... \n", "96858 96.0 [Neo Cortex - Elements (Styles & Breeze Remix)] \n", "34038 120.0 [Primo RH, Primo LH G, Primo LH F, Secondo RH ... \n", "171000 384.0 [ACCORDION, A.PIANO 1, FINGERDBAS, CRYSTAL, DR... \n", "103720 NaN None \n", "71588 384.0 [ESTCE PR HASARD, ESTCE PR HASARD, ESTCE PR HA... \n", "139917 384.0 [TORN, TORN, TORN, TORN, TORN, TORN, TORN, TOR... \n", "45209 96.0 [Soprano, Soprano, Soprano, Soprano, Soprano, ... \n", "74319 96.0 [3xOsc (MIDI), 3xOsc #2 (MIDI), 3xOsc #3 (MIDI... \n", "90199 96.0 [Sylenth1 4, Sylenth1 3, Sylenth1 1] \n", "99470 120.0 [, , , , , , , , , , ] \n", "175912 384.0 [The Flowers, Of The Heath, Sequenced By, Barr... \n", "30403 384.0 [Drums - Travis, Bass - Mark, Guitar 1 - Tom, ... \n", "108557 120.0 [The Corrs, \"Breathless\", , ****************, ... \n", "\n", " num_time_signature_changes \\\n", "160813 1.0 \n", "83189 1.0 \n", "96858 0.0 \n", "34038 1.0 \n", "171000 1.0 \n", "103720 NaN \n", "71588 1.0 \n", "139917 1.0 \n", "45209 38.0 \n", "74319 1.0 \n", "90199 1.0 \n", "99470 1.0 \n", "175912 1.0 \n", "30403 1.0 \n", "108557 1.0 \n", "\n", " proto_path \\\n", "160813 ../datasets/lmd/proto/2b03576c2eac6b20c5537514... \n", "83189 ../datasets/lmd/proto/fc7280752baa136165c911e0... 
\n", "96858 ../datasets/lmd/proto/cb226a46053507ae8a47a16c... \n", "34038 ../datasets/lmd/proto/6d12ce4d832f9c39ca683d53... \n", "171000 ../datasets/lmd/proto/586c46e8f1c65beff59afe0b... \n", "103720 None \n", "71588 ../datasets/lmd/proto/aaf2e0398a707b34bbeaefb9... \n", "139917 ../datasets/lmd/proto/ebc5edd3c533092d32461eee... \n", "45209 ../datasets/lmd/proto/1450ab4b1e52e25438de8e7e... \n", "74319 ../datasets/lmd/proto/aae666057efdbc7268d62b8f... \n", "90199 ../datasets/lmd/proto/c750e5a017155f06e63ae69c... \n", "99470 ../datasets/lmd/proto/cc6b412aeeb834e8fecfd308... \n", "175912 ../datasets/lmd/proto/500a1212bae787f7ef8fb099... \n", "30403 ../datasets/lmd/proto/7dc2e8000f630ab4e57eebd5... \n", "108557 ../datasets/lmd/proto/d864a2adc13ad8b9797d6be2... \n", "\n", " original_files \n", "160813 [M/Monday Monday.mid] \n", "83189 [e/entreaty.mid] \n", "96858 [N/Neo Cortex - Elements (Breeze & Styles Remi... \n", "34038 [Mendelsonn/Allegro Brillant for Piano 4-hands... \n", "171000 [Byrd, Charlie/The-Girl-From-Ipanema.mid, BYRD... \n", "103720 [Sure.Polyphone.Midi/Entertainer.mid, Various ... \n", "71588 [E/Est Ce Par Hasard Dave.mid] \n", "139917 [X/Xgtorn.mid, 097/XGtorn.mid, 097/XGtorn.mid] \n", "45209 [civilwar2/61sfttobfa.mid] \n", "74319 [K/Kara_Sun_-_Into_The_Sun_(Airbase_Dub_Mix)__... \n", "90199 [C/Cascada_-_BadBoy__DjMixdOut_20130211014527.... \n", "99470 [VerucaSalt/Seether.mid, VerucaSalt/Seether.mi... \n", "175912 [h/heath.mid, H/heath.mid] \n", "30403 [B/blink_182-dumpweed.mid] \n", "108557 [Corrs/The Corrs - Breathless.mid, Various/The... 
" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df.sample(15)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\n", "RangeIndex: 178561 entries, 0 to 178560\n", "Data columns (total 12 columns):\n", " # Column Non-Null Count Dtype \n", "--- ------ -------------- ----- \n", " 0 midi_path 178561 non-null object \n", " 1 midi_error 178561 non-null bool \n", " 2 estimate_tempo 173984 non-null float64\n", " 3 tempi_sec 173984 non-null object \n", " 4 tempi 173984 non-null object \n", " 5 end_time 173984 non-null float64\n", " 6 drums 173984 non-null object \n", " 7 resolution 173984 non-null float64\n", " 8 instrument_names 173984 non-null object \n", " 9 num_time_signature_changes 173984 non-null float64\n", " 10 proto_path 174476 non-null object \n", " 11 original_files 178561 non-null object \n", "dtypes: bool(1), float64(4), object(7)\n", "memory usage: 15.2+ MB\n" ] } ], "source": [ "midi_df.info()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "With a pandas dataframe we can easily aggregate the data and plot it in different ways.\n", "Let's start by taking a look at our success rate of parsing the MIDI files." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "midi_df['midi_error'].value_counts().plot.pie()\n", "plt.title('Faulty MIDI files');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "midi_df.midi_error.value_counts().plot.bar()\n", "plt.title(\"Number of faulty MIDI files\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now that we know the number of failed MIDI files, we could inspect them further to see whether they are indeed corrupted or whether our library has a bug." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
midi_pathmidi_errorestimate_tempotempi_sectempiend_timedrumsresolutioninstrument_namesnum_time_signature_changesproto_pathoriginal_files
3../datasets/lmd/lmd_full/9/911cd08fa1fae36e5e0...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[s/sou.mid, S/sou.mid]
24../datasets/lmd/lmd_full/9/94862530febd2b295b9...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[e/ELVIS_PRESLEY__Dont_Be_Cruel.mid, ElvisPres...
25../datasets/lmd/lmd_full/9/906e72809900e01f919...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[dream theater/uagm1.mid]
98../datasets/lmd/lmd_full/9/99e40264f321a4bfc5d...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[V/vong_tay_cau_hon.mid]
106../datasets/lmd/lmd_full/9/9f22ccab9572cafafce...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[W/Wippenberg - Pong (Tocadisco Remix).mid, W/...
.......................................
178315../datasets/lmd/lmd_full/5/5063a2b400597a372e7...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[f/faded.mid, 2009 MIDI/faded_love1-D120patsy_...
178377../datasets/lmd/lmd_full/5/523ce5adc656c872a69...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[l/laputa.mid, L/laputa.mid]
178419../datasets/lmd/lmd_full/5/546df09d78a32141369...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[h/heraldic.mid, H/heraldic.mid]
178484../datasets/lmd/lmd_full/5/5a4e49112c6cf832341...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[w/WESTERN.MID, 2009 MIDI/good_bad_and_ugly4-C...
178525../datasets/lmd/lmd_full/5/5cae276ae00fb1fd464...TrueNaNNaNNaNNaNNaNNaNNaNNaNNaN[soad/Chic_N_Stu.mid]
\n", "

4084 rows × 12 columns

\n", "
" ], "text/plain": [ " midi_path midi_error \\\n", "3 ../datasets/lmd/lmd_full/9/911cd08fa1fae36e5e0... True \n", "24 ../datasets/lmd/lmd_full/9/94862530febd2b295b9... True \n", "25 ../datasets/lmd/lmd_full/9/906e72809900e01f919... True \n", "98 ../datasets/lmd/lmd_full/9/99e40264f321a4bfc5d... True \n", "106 ../datasets/lmd/lmd_full/9/9f22ccab9572cafafce... True \n", "... ... ... \n", "178315 ../datasets/lmd/lmd_full/5/5063a2b400597a372e7... True \n", "178377 ../datasets/lmd/lmd_full/5/523ce5adc656c872a69... True \n", "178419 ../datasets/lmd/lmd_full/5/546df09d78a32141369... True \n", "178484 ../datasets/lmd/lmd_full/5/5a4e49112c6cf832341... True \n", "178525 ../datasets/lmd/lmd_full/5/5cae276ae00fb1fd464... True \n", "\n", " estimate_tempo tempi_sec tempi end_time drums resolution \\\n", "3 NaN NaN NaN NaN NaN NaN \n", "24 NaN NaN NaN NaN NaN NaN \n", "25 NaN NaN NaN NaN NaN NaN \n", "98 NaN NaN NaN NaN NaN NaN \n", "106 NaN NaN NaN NaN NaN NaN \n", "... ... ... ... ... ... ... \n", "178315 NaN NaN NaN NaN NaN NaN \n", "178377 NaN NaN NaN NaN NaN NaN \n", "178419 NaN NaN NaN NaN NaN NaN \n", "178484 NaN NaN NaN NaN NaN NaN \n", "178525 NaN NaN NaN NaN NaN NaN \n", "\n", " instrument_names num_time_signature_changes proto_path \\\n", "3 NaN NaN NaN \n", "24 NaN NaN NaN \n", "25 NaN NaN NaN \n", "98 NaN NaN NaN \n", "106 NaN NaN NaN \n", "... ... ... ... \n", "178315 NaN NaN NaN \n", "178377 NaN NaN NaN \n", "178419 NaN NaN NaN \n", "178484 NaN NaN NaN \n", "178525 NaN NaN NaN \n", "\n", " original_files \n", "3 [s/sou.mid, S/sou.mid] \n", "24 [e/ELVIS_PRESLEY__Dont_Be_Cruel.mid, ElvisPres... \n", "25 [dream theater/uagm1.mid] \n", "98 [V/vong_tay_cau_hon.mid] \n", "106 [W/Wippenberg - Pong (Tocadisco Remix).mid, W/... \n", "... ... \n", "178315 [f/faded.mid, 2009 MIDI/faded_love1-D120patsy_... \n", "178377 [l/laputa.mid, L/laputa.mid] \n", "178419 [h/heraldic.mid, H/heraldic.mid] \n", "178484 [w/WESTERN.MID, 2009 MIDI/good_bad_and_ugly4-C... 
\n", "178525 [soad/Chic_N_Stu.mid] \n", "\n", "[4084 rows x 12 columns]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df[midi_df['midi_error'] == True]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "For now, we will simply ignore those files and focus on the ones that we could successfully parse.\n", "\n", "For our experiments it is important that the MIDI files contain a drum track - only files that we could parse successfully and that contain a drum track can be used to train our model." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "midi_df.drums.value_counts().plot.pie()\n", "plt.title(\"MIDI files that contain drums\");" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA4AAAAFOCAYAAADXSXO6AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Z1A+gAAAACXBIWXMAAAsTAAALEwEAmpwYAAAiBklEQVR4nO3dfdRlZX0f/O9PRhRfQU3mUQaFKE2KkrQ6QRKTdiqJgkmDa9UYLAYwVJoVjXlaUoNNGp74kmhTSvSJ+iwaUEiNaIyJtJAgVe/Y2KLiS0SkxgmCDL6gzKCi8WX09/xxromH8Z63+57hzMz+fNY6a/a+9rX3/p1z7jnn/t7XPtep7g4AAAAHv3stugAAAADuGQIgAADARAiAAAAAEyEAAgAATIQACAAAMBECIAAAwEQIgAATUFWvr6qXLujcVVWvq6otVfW+RdSwI1X1pKr6RFXdVVVPr6o/r6ozx7azquqv7oEaFvbc3FOq6ser6uN76VhHV1VX1Zq9cTyAqREAARagqm6uqtur6v5zbf+qqpYWWNa+8mNJfjLJuu4+YfuNI2h1VV24Xfupo/31Y/1uv/iP4PSNqvryuH20qn6nqh683bF3FuJenOT3u/sB3f1n3X1Kd1+6F+7zsvZ2qKyq/6eq/uveOt6+Om93/8/u/v59WRMAu0cABFicQ5L8yqKL2FNVdcge7vKoJDd391d20udvkzxzu1GdM5P8zS6O/R+7+4FJvifJc5KcmOQ988F6N2q7YTf7sp8bo81+twHYCS+SAIvzu0l+taoO337Dcpe5VdVSVf2rsXxWVb2nqi6sqjur6qaq+tHRfusYXTxzu8M+rKquGaNlf1lVj5o79g+MbZur6uNV9cy5ba+vqtdW1VVV9ZUk/2yZeh9RVVeM/TdW1XNH+9lJ/iDJj4zLLH9rB4/FZ5Ncn+SpY7+HJPnRJFfsxuOY7v5ad78/yc8keWhmYXCnqupvk3xfkv82arvP/GO8TP+dPUZPq6qPjcf2tqr61WX2/4dJ/r9857G4c27zEVV15dj/vVX16Ln9Xjme0y9V1Qeq6sdH+8lJ/n2SnxvH++sd1H1UVb21qj5fVXdU1e+P9ntV1W9U1S3j5+WybaOncz9/Z1bVp6rqC1X16zs7b1U9p6puHPfhpqr613M1bKiqTXPrN1fVr1bVR6rqi1X1pqq67w7qP6Sq/tOo4aYkP7Xd9qWqellVvSfJV5N83zj+T8z1+fsRy7n79pzxuG6pql+sqh8e9dy57TEa/R8z/r98cdTwpuXqBDhQCIAAi3NdkqUk3xUWdtMTk3wks8DzR0kuT/LDSR6T5NlJfr+qHjDX//QkL0nysCQfTvKGJKnZaNk14xjfm+S0JK+pquPm9v2XSV6W5IFJlruE8fIkm5I8Iskzkvx2VT25uy9O8otJ/ve4zPL8ndyfy5KcMZZPS/K2JF/f5aMwp7u/PO7Lj+9G30cn+VSSfz5q2+G5duMxujjJvx6jkY9L8s5lzndj7v5YHD63+bQkv5XkiCQbM3ust3l/kn+U5CHj/H9cVfft7r9I8ttJ3jSO90PL1H1Ikv+e5JYkRyc5MrPnKknOGrd/llkQfkCS39/uED+W5PuTnJTkN6vqH+7kvLcn+ekkD8osgF9YVY/fvqY5z0xycpJjkvzgqGU5zx3H/cdJ1mf287W9n09yTmY/n7fs5Jzznpjk2CQ/l+T3kvx6kp9I8tjMRqP/6ej3k
iRvz+y5WZfk/93N4wPslwRAgMX6zSS/XFXfs4J9P9ndr+vubyV5U5Kjkry4u7/e3W9P8o3MwuA2V3b3u0fQ+fXMRqKOyuyX65vHsbZ294eS/EmSn53b923d/Z7u/nZ3f22+iHGMJyX5tTES9+HMRv3OyJ750yQbxijUGZkFwpX4dGZhaW/a1WP0zSTHVdWDuntLd39wD4//p939vu7emlkw/0fbNnT3f+3uO8Z5L0hyn8xC2e44IbNQ/u+6+yvj+dkW4E9P8p+7+6buvivJi5KcVne/DPe3uvvvuvuvk/x1ku8KmXN1Xtndf9szf5lZaNpZEH9Vd3+6uzcn+W/z93k7z0zye9196+j7O8v0eX133zAeo2/u5JzzXjIej7cn+UqSN3b37d19W5L/mVngTGbP7aOSPGK7xw/ggCQAAixQd380sxGa81aw++fmlv9uHG/7tvkRwFvnzntXks2ZhYNHJXniuPTtznFp4ulJ/q/l9l3GI5JsHqNv29yS2WjTbuvuv0tyZZLfSPLQ7n7Pnuw/58jM7tvetKvH6F8keVqSW8blgj+yh8f/7NzyVzP3vI1LJW8clyDemeTBmY3i7o6jktwyguX2HpG7j5bdkmRNkrW7U9f2quqUqrp2XCJ7Z2aPx87q3N1jPyJ3//lbboRvZz+fO7L9/5Ud/d95YZJK8r6quqGqfmEF5wLYb5hCGWDxzk/ywSQXzLVtmzDlfkm+NJbnA9lKHLVtYVwa+pDMRstuTfKX3f2TO9m3d7Lt00keUlUPnAuBj0xy2wpqvCyzyyd39FnBnRr36ydy90so94adPkbj84enVtW9kzw/yZsz93jPd92Tk47P+70ws0swb+jub1fVlswCye4c79Ykj6yqNcuEwE9nFmy3eWSSrZkFoXW7OO7dzltV98lsRPSMzEaLv1lVfzZX52p8Jnd/LB+5q3oy+/9zv7n1Ff/f6e7PZnYZaqrqx5L8j6p6d3dvXOkxARbJCCDAgo1fJN+U5AVzbZ/PLEA9e0yC8QtJHr2DQ+yup1XVj1XVoZl9runa7r41sxHIf1BVP19V9x63H67ZpCW7U/+tSf5Xkt+pqvtW1Q8mOTvJSr6e4C8z+8qIPfqcVc0mcHlCkj9LsiXJ61Zw7p3Z4WNUVYdW1elV9eBx+eGXknx7B8f5XJJ14znYHQ/MLJR9PsmaqvrNzD5jN3+8o2vHM1++L7MA9fKquv94fp40tr0xyb+pqmNGcN72ub7lRguXux/z5z00s0tTP59ka1WdkuQpu3kfd+XNSV5QVeuq6ojs3mj5hzO7nPXeVbWjzw3ulqr62araFoi3ZBY2d/T8Auz3BECA/cOLk2z/1QXPTfLvktyR2cQU/2uV5/ijzEYbNyd5QmYTxWybOOUpmU1E8unMLs17RWa/0O+uZ2U2ycinM/ss3/nd/T/2tMDx+bF3jM967Y4XVtWXM3uMLkvygSQ/uouvnNhju/EY/XySm6vqS5lN9HL6Dg71zsy+duKzVfWF3Tj11Un+IrOvw7glyddy98sd/3j8e0dVfdfnDsfnQ/95Zp8F/VRmE/X83Nh8SZI/TPLuJJ8cx/7l3ajpu847Hp8XZBbWtmQ2adBuzeC6G/5LZo/DX2c2Uv7W3djnP2T2B5MtmY0m/9Eqzv/DSd5bVXdldp9+pbtvWsXxABaquvfoahQAAAAOUEYAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACbioPsi+Ic97GF99NFHL7oMOGB95Stfyf3vv/23EQDAPcd7EazOBz7wgS909/cst+2gC4BHH310rrvuukWXAQespaWlbNiwYdFlADBh3otgdarqlh1tcwkoAADARAiAAAAAEyEAAgAATMQuA2BVXVJVt1fVR+fafreq/k9VfaSq/rSqDp/b9qKq2lhVH6+qp861nzzaNlbVeXPtx1TVe0f7m6rq0NF+n7G+cWw/em/daQAAgCnanRHA1yc5ebu2a5I8rrt/MMnfJHlRk
lTVcUlOS/LYsc9rquqQqjokyauTnJLkuCTPGn2T5BVJLuzuxyTZkuTs0X52ki2j/cLRDwAAgBXaZQDs7ncn2bxd29u7e+tYvTbJurF8apLLu/vr3f3JJBuTnDBuG7v7pu7+RpLLk5xaVZXkyUneMva/NMnT54516Vh+S5KTRn8AAABWYG98DcQvJHnTWD4ys0C4zabRliS3btf+xCQPTXLnXJic73/ktn26e2tVfXH0/8L2BVTVOUnOSZK1a9dmaWlpdfcIJuyuu+7yfwiAhfJeBPvOqgJgVf16kq1J3rB3ylmZ7r4oyUVJsn79+va9MbByvnsJgEXzXgT7zooDYFWdleSnk5zU3T2ab0ty1Fy3daMtO2i/I8nhVbVmjALO9992rE1VtSbJg0d/AAAAVmBFXwNRVScneWGSn+nur85tuiLJaWMGz2OSHJvkfUnen+TYMePnoZlNFHPFCI7vSvKMsf+ZSd42d6wzx/IzkrxzLmgCAACwh3Y5AlhVb0yyIcnDqmpTkvMzm/XzPkmuGfOyXNvdv9jdN1TVm5N8LLNLQ5/X3d8ax3l+kquTHJLkku6+YZzi15JcXlUvTfKhJBeP9ouT/GFVbcxsEprT9sL9BQAAmKw62AbV1q9f39ddd92iy2Anjj7vykWXwE6ce/zWXHD93pgfin3l5pf/1KJLANinfAYQVqeqPtDd65fbtqJLQAEAADjwCIAAAAATIQACAABMhAAIAAAwEQIgAADARAiAAAAAEyEAAgAATIQACAAAMBECIAAAwEQIgAAAABMhAAIAAEyEAAgAADARAiAAAMBECIAAAAATIQACAABMhAAIAAAwEQIgAADARAiAAAAAEyEAAgAATIQACAAAMBECIAAAwEQIgAAAABMhAAIAAEyEAAgAADARAiAAAMBECIAAAAATIQACAABMhAAIAAAwEQIgAADARAiAAAAAEyEAAgAATIQACAAAMBECIAAAwEQIgAAAABMhAAIAAEzELgNgVV1SVbdX1Ufn2h5SVddU1SfGv0eM9qqqV1XVxqr6SFU9fm6fM0f/T1TVmXPtT6iq68c+r6qq2tk5AAAAWJndGQF8fZKTt2s7L8k7uvvYJO8Y60lySpJjx+2cJK9NZmEuyflJnpjkhCTnzwW61yZ57tx+J+/iHAAAAKzALgNgd787yebtmk9NculYvjTJ0+faL+uZa5McXlUPT/LUJNd09+bu3pLkmiQnj20P6u5ru7uTXLbdsZY7BwAAACuwZoX7re3uz4zlzyZZO5aPTHLrXL9No21n7ZuWad/ZOb5LVZ2T2Yhj1q5dm6WlpT28O9yTzj1+66JLYCfWHuY52t95jQMOdnfddZfXOthHVhoA/153d1X13ihmpefo7ouSXJQk69ev7w0bNuzLclils867ctElsBPnHr81F1y/6pcG9qGbT9+w6BIA9qmlpaX4fQ72jZXOAvq5cflmxr+3j/bbkhw112/daNtZ+7pl2nd2DgAAAFZgpQHwiiTbZvI8M8nb5trPGLOBnpjki+MyzquTPKWqjhiTvzwlydVj25eq6sQx++cZ2x1ruXMAAACwAru8zquq3phkQ5KHVdWmzGbzfHmSN1fV2UluSfLM0f2qJE9LsjHJV5M8J0m6e3NVvSTJ+0e/F3f3tollfimzmUYPS/Ln45adnAMAAIAV2GUA7O5n7WDTScv07STP28FxLklyyTLt1yV53DLtdyx3DgAAAFZmpZeAAgAAcIARAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgI
gRAAACAiRAAAQAAJmJVAbCq/k1V3VBVH62qN1bVfavqmKp6b1VtrKo3VdWho+99xvrGsf3oueO8aLR/vKqeOtd+8mjbWFXnraZWAACAqVtxAKyqI5O8IMn67n5ckkOSnJbkFUku7O7HJNmS5Oyxy9lJtoz2C0e/VNVxY7/HJjk5yWuq6pCqOiTJq5OckuS4JM8afQEAAFiB1V4CuibJYVW1Jsn9knwmyZOTvGVsvzTJ08fyqWM9Y/tJVVWj/fLu/np3fzLJxiQnjNvG7r6pu7+R5PLRFwAAgBVYcQDs7tuS/Kckn8os+H0xyQeS3NndW0e3TUmOHMtHJrl17Lt19H/ofPt2++yoHQAAgBVYs9Idq+qIzEbkjklyZ5I/zuwSzntcVZ2T5JwkWbt2bZaWlhZRBrvp3OO37roTC7P2MM/R/s5rHHCwu+uuu7zWwT6y4gCY5CeSfLK7P58kVfXWJE9KcnhVrRmjfOuS3Db635bkqCSbxiWjD05yx1z7NvP77Kj9brr7oiQXJcn69et7w4YNq7hb7GtnnXfloktgJ849fmsuuH41Lw3sazefvmHRJQDsU0tLS/H7HOwbq/kM4KeSnFhV9xuf5TspyceSvCvJM0afM5O8bSxfMdYztr+zu3u0nzZmCT0mybFJ3pfk/UmOHbOKHprZRDFXrKJeAACASVvxn/m7+71V9ZYkH0yyNcmHMhuFuzLJ5VX10tF28djl4iR/WFUbk2zOLNClu2+oqjdnFh63Jnled38rSarq+UmuzmyG0Uu6+4aV1gsAADB1q7rOq7vPT3L+ds03ZTaD5/Z9v5bkZ3dwnJcledky7VcluWo1NQIAADCz2q+BAAAA4AAhAAIAAEyEAAgAADARAiAAAMBECIAAAAATIQACAABMhAAIAAAwEQIgAADARAiAAAAAEyEAAgAATIQACAAAMBECIAAAwEQIgAAAABMhAAIAAEyEAAgAADARAiAAAMBECIAAAAATIQACAABMhAAIAAAwEQIgAADARAiAAAAAEyEAAgAATIQACAAAMBECIAAAwEQIgAAAABMhAAIAAEyEAAgAADARAiAAAMBECIAAAAATIQACAABMhAAIAAAwEQIgAADARAiAAAAAEyEAAgAATIQACAAAMBGrCoBVdXhVvaWq/k9V3VhVP1JVD6mqa6rqE+PfI0bfqqpXVdXGqvpIVT1+7jhnjv6fqKoz59qfUFXXj31eVVW1mnoBAACmbLUjgK9M8hfd/QNJfijJjUnOS/KO7j42yTvGepKckuTYcTsnyWuTpKoekuT8JE9MckKS87eFxtHnuXP7nbzKegEAACZrxQGwqh6c5J8kuThJuvsb3X1nklOTXDq6XZrk6WP51CSX9cy1SQ6vqocneWqSa7p7c3dvSXJNkpPHtgd197Xd3UkumzsWAAAAe2g1I4DHJPl8ktdV1Yeq6g+q6v5J1nb3Z0afzyZZO5aPTHLr3P6bRtvO2jct0w4AAMAKrFnlvo9P8svd/d6qemW+c7lnkqS7u6p6NQXujqo6J7PLSrN27dosLS3t61OyCucev3XRJbATaw/zHO3vvMYBB7u77rrLax3sI6sJgJuSbOru9471t2QWAD9XVQ/v7s+MyzhvH9tvS3LU3P7rRtttSTZs17402tct0/+7dPdFSS5KkvXr1/eGDRuW68Z+4qzzrlx0CezEucdvzQXXr+algX3t5tM3LLoEgH1qaWkpfp+DfWPFl4B292eT3FpV3z+aTkrysSRXJNk2k+eZSd42lq9IcsaYDfTEJF8cl4peneQpVXXEmPzlKUmuHtu+VFUnjtk/z5g7FgAAAHtotX/m/+Ukb6iqQ5PclOQ5mYXKN1fV2UluSfLM0feqJE9LsjHJV0ffdPfmqnpJkvePfi/u7s1j+ZeSvD7JYUn+fNwAAABYgVUFwO7+cJL1y2w6aZm+neR5OzjOJUkuWab9uiSPW02NAAAAzKz2ewABAAA4QAiAAAAAEyEAAgAATIQAC
AAAMBG+7AsAmJSjfR/tfu/c47f63uD92M0v/6lFl8AqGAEEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCJWHQCr6pCq+lBV/fexfkxVvbeqNlbVm6rq0NF+n7G+cWw/eu4YLxrtH6+qp861nzzaNlbVeautFQAAYMr2xgjgryS5cW79FUku7O7HJNmS5OzRfnaSLaP9wtEvVXVcktOSPDbJyUleM0LlIUleneSUJMcledboCwAAwAqsKgBW1bokP5XkD8Z6JXlykreMLpcmefpYPnWsZ2w/afQ/Ncnl3f317v5kko1JThi3jd19U3d/I8nloy8AAAArsNoRwN9L8sIk3x7rD01yZ3dvHeubkhw5lo9McmuSjO1fHP3/vn27fXbUDgAAwAqsWemOVfXTSW7v7g9U1Ya9VtHKajknyTlJsnbt2iwtLS2yHHbh3OO37roTC7P2MM/R/s5rHKyO17j9n/ei/Zv3oQPbigNgkicl+ZmqelqS+yZ5UJJXJjm8qtaMUb51SW4b/W9LclSSTVW1JsmDk9wx177N/D47ar+b7r4oyUVJsn79+t6wYcMq7hb72lnnXbnoEtiJc4/fmguuX81LA/vazadvWHQJcEDzPrT/8160f/M+dGBb8SWg3f2i7l7X3UdnNonLO7v79CTvSvKM0e3MJG8by1eM9Yzt7+zuHu2njVlCj0lybJL3JXl/kmPHrKKHjnNcsdJ6AQAApm5f/Gnl15JcXlUvTfKhJBeP9ouT/GFVbUyyObNAl+6+oarenORjSbYmeV53fytJqur5Sa5OckiSS7r7hn1QLwAAwCTslQDY3UtJlsbyTZnN4Ll9n68l+dkd7P+yJC9bpv2qJFftjRoBAACmbm98DyAAAAAHAAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJWHEArKqjqupdVfWxqrqhqn5ltD+kqq6pqk+Mf48Y7VVVr6qqjVX1kap6/Nyxzhz9P1FVZ861P6Gqrh/7vKqqajV3FgAAYMpWMwK4Ncm53X1ckhOTPK+qjktyXpJ3dPexSd4x1pPklCTHjts5SV6bzAJjkvOTPDHJCUnO3xYaR5/nzu138irqBQAAmLQVB8Du/kx3f3AsfznJjUmOTHJqkktHt0uTPH0sn5rksp65NsnhVfXwJE9Nck13b+7uLUmuSXLy2Pag7r62uzvJZXPHAgAAYA/tlc8AVtXRSf5xkvcmWdvdnxmbPptk7Vg+Msmtc7ttGm07a9+0TDsAAAArsGa1B6iqByT5kyT/d3d/af5jet3dVdWrPcdu1HBOZpeVZu3atVlaWtrXp2QVzj1+66JLYCfWHuY52t95jYPV8Rq3//NetH/zPnRgW1UArKp7Zxb+3tDdbx3Nn6uqh3f3Z8ZlnLeP9tuSHDW3+7rRdluSDdu1L432dcv0/y7dfVGSi5Jk/fr1vWHDhuW6sZ8467wrF10CO3Hu8VtzwfWr/tsQ+9DNp29YdAlwQPM+tP/zXrR/8z50YFvNLKCV5OIkN3b3f57bdEWSbTN5npnkbXPtZ
4zZQE9M8sVxqejVSZ5SVUeMyV+ekuTqse1LVXXiONcZc8cCAABgD63mTytPSvLzSa6vqg+Ptn+f5OVJ3lxVZye5Jckzx7arkjwtycYkX03ynCTp7s1V9ZIk7x/9Xtzdm8fyLyV5fZLDkvz5uAEAALACKw6A3f1XSXb0vXwnLdO/kzxvB8e6JMkly7Rfl+RxK60RAACA79grs4ACAACw/xMAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYCAEQAABgIgRAAACAiRAAAQAAJkIABAAAmAgBEAAAYCIEQAAAgIkQAAEAACZCAAQAAJgIARAAAGAiBEAAAICJEAABAAAmQgAEAACYiP0+AFbVyVX18araWFXnLboeAACAA9V+HQCr6pAkr05ySpLjkjyrqo5bbFUAAAAHpv06ACY5IcnG7r6pu7+R5PIkpy64JgAAgAPS/h4Aj0xy69z6ptEGAADAHlqz6AL2hqo6J8k5Y/Wuqvr4IuuBA9kLkocl+cKi62DH6hWLrgBg3/JetH/zPnRAeNSONuzvAfC2JEfNra8bbXfT3RclueieKgoOZlV1XXevX3QdAEyX9yLYd/b3S0Dfn+TYqjqmqg5NclqSKxZcEwAAwAFpvx4B7O6tVfX8JFcnOSTJJd19w4LLAgAAOCDt1wEwSbr7qiRXLboOmBCXUwOwaN6LYB+p7l50DQAAANwD9vfPAAIAALCXCIAAAAATIQDCxNXMs6vqN8f6I6vqhEXXBQDA3icAAq9J8iNJnjXWv5zk1YsrB4Apqqr7VdV/qKr/MtaPraqfXnRdcLARAIEndvfzknwtSbp7S5JDF1sSABP0uiRfz+yPkklyW5KXLq4cODgJgMA3q+qQJJ0kVfU9Sb692JIAmKBHd/d/TPLNJOnuryapxZYEBx8BEHhVkj9N8r1V9bIkf5XktxdbEgAT9I2qOizf+YPkozMbEQT2It8DCKSqfiDJSZn9pfUd3X3jgksCYGKq6ieT/EaS45K8PcmTkpzV3UuLrAsONgIgTFxVPXK59u7+1D1dCwDTVlUPTXJiZn+QvLa7v7DgkuCgIwDCxFXV9ZldblNJ7pvkmCQf7+7HLrQwACalqp6U5MPd/ZWqenaSxyd5ZXffsuDS4KDiM4Awcd19fHf/4Pj32CQnJPnfi64LgMl5bZKvVtUPJfm3Sf42yWWLLQkOPgIgcDfd/cEkT1x0HQBMztaeXZp2apJXd/erkzxwwTXBQWfNogsAFquq/u3c6r0yu+Tm0wsqB4Dp+nJVvSjJs5P8k6q6V5J7L7gmOOgYAQQeOHe7T5IrM/vrKwDck34us699OLu7P5tkXZLfXWxJcPAxCQxM2PgC+Fd0968uuhYAAPY9l4DCRFXVmu7eOmZdA4CFqKovZ3z5+/abknR3P+geLgkOakYAYaKq6oPd/fiqem2SI5P8cZKvbNve3W9dWHEAAOwTRgCB+ya5I8mT853vA+wkAiAA97iq+t7M3puSJN39qQWWAwcdARCm63vHDKAfzXeC3zYuDQDgHlVVP5PkgiSPSHJ7kkcluTHJYxdZFxxszAIK03VIkgeM2wPnlrfdAOCe9JIkJyb5m+4+JslJSa5dbElw8DECCNP1me5+8aKLAIDhm919R1Xdq6ru1d3vqqrfW3RRcLARAGG6atddAOAec2dVPSDJu5O8oapuz9zkZMDeYRZQmKiqekh3b150HQBMW1U9srs/VVX3T/J3mX1E6fQkD07yhu6+Y6EFwkFGAAQAYGG2fS3RWP6T7v4Xi64JDmYmgQEAYJHmP5LwfQurAiZCAAQAYJF6B8vAPuASUAAAFqaqvpXZZC+V5LAkX922KUl394MWVRscj
ARAAACAiXAJKAAAwEQIgAAAABMhAAIAAEyEAAgAADARAiAAAMBE/P8c/QLybsAmUAAAAABJRU5ErkJggg==\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "midi_df.drums.value_counts().plot.bar();\n", "plt.title(\"Number of MIDI files that contain drums\");" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We should verify that the files in which we did not detect any drums have indeed no drums by listening to a random subset of them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
midi_pathmidi_errorestimate_tempotempi_sectempiend_timedrumsresolutioninstrument_namesnum_time_signature_changesproto_pathoriginal_files
170937../datasets/lmd/lmd_full/5/5c7f00cfbfb1c7a4c8d...False219.517230[0.0, 216.0][80.0, 82.00003553334875]422.539545False480.0[Piano]1.0../datasets/lmd/proto/5c7f00cfbfb1c7a4c8df5cc0...[C/Casablan.MID, C/CASABLAN.MID, C/CASABLAN.MID]
173537../datasets/lmd/lmd_full/5/5ccabab69a29eb49782...False134.749985[0.0, 0.4225350000000001, 2.9225340000000006, ...[71.00003550001773, 72.0000288000115, 68.00007...271.779287False192.0[Flute, Celeste, Harp, Vibes]10.0../datasets/lmd/proto/5ccabab69a29eb49782f7ac2...[A/AS_OP19.MID, A/AS_OP19.MID]
160054../datasets/lmd/lmd_full/2/248de65be7464e72e13...False288.075739[0.0][143.99988480009216]13.333344False96.0[MIDI out]0.0../datasets/lmd/proto/248de65be7464e72e13026ee...[H/Heaven's Cry - I Dont Need This No More.mid...
104986../datasets/lmd/lmd_full/d/d62040c47cc8ccc1256...False230.565249[0.0, 1.1940279999999999, 1.6062414999999999, ...[201.00031155048293, 131.00007641671124, 18.00...418.198189False240.0[Diskant]1.0../datasets/lmd/proto/d62040c47cc8ccc12566150b...[r/raffsonatacl.mid, R/raffsonatacl.mid]
177415../datasets/lmd/lmd_full/5/5d3a4dcffc36dc21507...False183.485497[0.0, 26.052641999999995, 26.252641999999994, ...[75.99996960001218, 75.0, 73.99998273333736, 7...355.954855False96.0[PRIMI MANDOLINI, SECONDI MANDOLINI, MANDOLE, ...3.0../datasets/lmd/proto/5d3a4dcffc36dc21507ef683...[p/primiracconti.mid, P/primiracconti.mid]
25978../datasets/lmd/lmd_full/7/7a6107b2ebe71033dd6...False133.333333[0.0][100.0]46.500000False120.0[, , , ]1.0../datasets/lmd/proto/7a6107b2ebe71033dd6d7a5b...[h/himno442.mid, H/himno442.mid]
17961../datasets/lmd/lmd_full/0/09634cc18ee4b3e3d8e...False224.868701[0.0][132.000132000132]198.167415False192.0[Clarinet, Trumpet, Tuba, Strings, Strings]1.0../datasets/lmd/proto/09634cc18ee4b3e3d8e84072...[a/Airship_Remix.mid, A/Airship_Remix.mid]
34639../datasets/lmd/lmd_full/6/603e957bc609b539948...False100.000000[0.0, 62.5, 65.5, 86.5, 91.5, 112.5, 113.5, 11...[60.0, 30.0, 60.0, 30.0, 60.0, 30.0, 60.0, 30....191.763885False96.0[Soprano, Alto, Tenor, Bass, Piano (hi), Piano...5.0../datasets/lmd/proto/603e957bc609b5399481f26f...[bliss/ppb74mhamt.mid]
161340../datasets/lmd/lmd_full/2/24c247fe99ec2cfb909...False141.750142[0.0][126.00012600012599]63.805492False240.0[Piano, Piano]1.0../datasets/lmd/proto/24c247fe99ec2cfb909c4aa4...[W/waltz_05.mid, brahms/waltz_05.mid]
29071../datasets/lmd/lmd_full/7/7d3cac11d3db19406bb...False106.219130[0.0][91.04496862742124]42.176960False96.0[, , , ]1.0../datasets/lmd/proto/7d3cac11d3db19406bb568a5...[Christian/Inmyhart.mid, Various Artists/inmyh...
\n", "
" ], "text/plain": [ " midi_path midi_error \\\n", "170937 ../datasets/lmd/lmd_full/5/5c7f00cfbfb1c7a4c8d... False \n", "173537 ../datasets/lmd/lmd_full/5/5ccabab69a29eb49782... False \n", "160054 ../datasets/lmd/lmd_full/2/248de65be7464e72e13... False \n", "104986 ../datasets/lmd/lmd_full/d/d62040c47cc8ccc1256... False \n", "177415 ../datasets/lmd/lmd_full/5/5d3a4dcffc36dc21507... False \n", "25978 ../datasets/lmd/lmd_full/7/7a6107b2ebe71033dd6... False \n", "17961 ../datasets/lmd/lmd_full/0/09634cc18ee4b3e3d8e... False \n", "34639 ../datasets/lmd/lmd_full/6/603e957bc609b539948... False \n", "161340 ../datasets/lmd/lmd_full/2/24c247fe99ec2cfb909... False \n", "29071 ../datasets/lmd/lmd_full/7/7d3cac11d3db19406bb... False \n", "\n", " estimate_tempo tempi_sec \\\n", "170937 219.517230 [0.0, 216.0] \n", "173537 134.749985 [0.0, 0.4225350000000001, 2.9225340000000006, ... \n", "160054 288.075739 [0.0] \n", "104986 230.565249 [0.0, 1.1940279999999999, 1.6062414999999999, ... \n", "177415 183.485497 [0.0, 26.052641999999995, 26.252641999999994, ... \n", "25978 133.333333 [0.0] \n", "17961 224.868701 [0.0] \n", "34639 100.000000 [0.0, 62.5, 65.5, 86.5, 91.5, 112.5, 113.5, 11... \n", "161340 141.750142 [0.0] \n", "29071 106.219130 [0.0] \n", "\n", " tempi end_time drums \\\n", "170937 [80.0, 82.00003553334875] 422.539545 False \n", "173537 [71.00003550001773, 72.0000288000115, 68.00007... 271.779287 False \n", "160054 [143.99988480009216] 13.333344 False \n", "104986 [201.00031155048293, 131.00007641671124, 18.00... 418.198189 False \n", "177415 [75.99996960001218, 75.0, 73.99998273333736, 7... 355.954855 False \n", "25978 [100.0] 46.500000 False \n", "17961 [132.000132000132] 198.167415 False \n", "34639 [60.0, 30.0, 60.0, 30.0, 60.0, 30.0, 60.0, 30.... 
191.763885 False \n", "161340 [126.00012600012599] 63.805492 False \n", "29071 [91.04496862742124] 42.176960 False \n", "\n", " resolution instrument_names \\\n", "170937 480.0 [Piano] \n", "173537 192.0 [Flute, Celeste, Harp, Vibes] \n", "160054 96.0 [MIDI out] \n", "104986 240.0 [Diskant] \n", "177415 96.0 [PRIMI MANDOLINI, SECONDI MANDOLINI, MANDOLE, ... \n", "25978 120.0 [, , , ] \n", "17961 192.0 [Clarinet, Trumpet, Tuba, Strings, Strings] \n", "34639 96.0 [Soprano, Alto, Tenor, Bass, Piano (hi), Piano... \n", "161340 240.0 [Piano, Piano] \n", "29071 96.0 [, , , ] \n", "\n", " num_time_signature_changes \\\n", "170937 1.0 \n", "173537 10.0 \n", "160054 0.0 \n", "104986 1.0 \n", "177415 3.0 \n", "25978 1.0 \n", "17961 1.0 \n", "34639 5.0 \n", "161340 1.0 \n", "29071 1.0 \n", "\n", " proto_path \\\n", "170937 ../datasets/lmd/proto/5c7f00cfbfb1c7a4c8df5cc0... \n", "173537 ../datasets/lmd/proto/5ccabab69a29eb49782f7ac2... \n", "160054 ../datasets/lmd/proto/248de65be7464e72e13026ee... \n", "104986 ../datasets/lmd/proto/d62040c47cc8ccc12566150b... \n", "177415 ../datasets/lmd/proto/5d3a4dcffc36dc21507ef683... \n", "25978 ../datasets/lmd/proto/7a6107b2ebe71033dd6d7a5b... \n", "17961 ../datasets/lmd/proto/09634cc18ee4b3e3d8e84072... \n", "34639 ../datasets/lmd/proto/603e957bc609b5399481f26f... \n", "161340 ../datasets/lmd/proto/24c247fe99ec2cfb909c4aa4... \n", "29071 ../datasets/lmd/proto/7d3cac11d3db19406bb568a5... \n", "\n", " original_files \n", "170937 [C/Casablan.MID, C/CASABLAN.MID, C/CASABLAN.MID] \n", "173537 [A/AS_OP19.MID, A/AS_OP19.MID] \n", "160054 [H/Heaven's Cry - I Dont Need This No More.mid... \n", "104986 [r/raffsonatacl.mid, R/raffsonatacl.mid] \n", "177415 [p/primiracconti.mid, P/primiracconti.mid] \n", "25978 [h/himno442.mid, H/himno442.mid] \n", "17961 [a/Airship_Remix.mid, A/Airship_Remix.mid] \n", "34639 [bliss/ppb74mhamt.mid] \n", "161340 [W/waltz_05.mid, brahms/waltz_05.mid] \n", "29071 [Christian/Inmyhart.mid, Various Artists/inmyh... 
" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df[midi_df['drums']==False].sample(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Lets take also a look at the other metadata we have extracted." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAAA3cAAAE/CAYAAADlpzo+AAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjQuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/Z1A+gAAAACXBIWXMAAAsTAAALEwEAmpwYAAAVuElEQVR4nO3de/DldX3f8dfbBcQAw0UMEbAuRmtWGCXCODqhltVEI5jgZJwpJCkm0tjW1qnNtAm6qZfJbILNrZbp1KAkhlbRhCZqYZKIuptMWm+giOBGRbNEvICIIjBg6PrpH+e7y2HZC3s9Z9/7eMx853fO95zzPd/febPn8NzvOWdrjBEAAAAObI9Z9A4AAACw58QdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AaK+qRlU9dT/cT1XVH1bVt6vqE/v6/nZVVd1cVWcvej8A2DfEHQD7TVVtrKr7q+reKYCuqaonLXq/NquqX6iqv9mDTZyV5CeSnDzGeM4+2P4eGWOcOsZYv6j7B2DfEncA7G8/NcY4MskTk9ye5NIF78/e9OQkG8cY9y16RwA4+Ig7ABZijPFAkquSPGPzuqo6uqquqKpvVtWtVfVrVfWYqjquqm6rqp+arndkVd1SVRdO599ZVW+rqmur6p6q+quqevK27ncH97EqyduSPG86svid7dz+xKr6QFXdNe3DL03rL0ryjrnbv3mr221z+1X12Kr67ar6+6q6ffo9Hjdddvb0e/9KVd1RVV+vqpdV1TlV9YVpH14/dx9vqqqrquq90+Pwqap61tzlG6vqx3dtUgAcKMQdAAtRVT+Q5J8l+djc6kuTHJ3kKUn+aZILk/ziGOOuJK9M8vaq+sEkv5fkhjHGFXO3/bkkv57k+CQ3JHnXdu56e/exIcm/SvLRMcaRY4xjtnP79yS5LcmJSV6e5Deq6gVjjMu3uv0b52+0g+1fkuQfJzk9yVOTnJTkDXM3/aEkh8+tf3uSn09yRpJ/kuQ/VdUpc9c/L8mfJDkuybuTvK+qDt3O7wJAI+IOgP3tfdNRq7sz+3zabyVJVa1Icn6S140x7hljbEzyO0n+eZKMMT6YWbR8OMk5Sf7lVtu9Zozx12OM7yVZk9kRsod9nm9n97Ez0/Z+LMmvjjEeGGPckNnRugt35QGY214leVWSfz/GuGuMcU+S35j2cbMHk6wdYzyYWVgen+St0/7fnORzSZ41d/3rxxhXTdf/3czC8Lm7s38AHFjEHQD728umo1aHJ/m3Sf6qqn4os2g5NMmtc9e9NbMjVptdluS0JO8cY3xrq+1+ZfOJMca9Se7K7OjavEdzHztyYpLNEbY7t9/aE5L8QJLrq+o7U/T+xbR+s2+NMTZNp++fft4+d/n9SY6cOz//OHw/Dx1lBKA5cQfAQowxNo0x/jTJpsy+ZfLOzI5SzX9W7h8l+Wqy5ajbZUmuSPLqbfzTBluO0lXVkZm9LfFrW11nh/eRZOxkt7+W5LiqOmo7t9+Zrbd/Z2ZxduoY45hpOXr6wpndNf84PCbJyXnk4wBAQ+IOgIWY/k2485Icm2TDdHTqj5Osraqjpi9E+eUk/3O6yeszi6NXZvZWzium4NvsnKo6q6oOy+yz
dx8bY3xl7vI8ivu4PcnJ0zYeYdre/03ym1V1eFU9M8lFc7ffmYdtfzqy9vYkvzd9ljBVdVJVvfhRbm9bzqiqn6mqQ5K8Nsn38vDPNQLQlLgDYH/731V1b5LvJlmb5BXTZ8eS5DVJ7kvy5SR/k9kXgvxBVZ2RWYRdOAXaWzILvYvntvvuJG/M7O2YZ2T2pSPbss37mC77SJKbk3yjqu7czu0vSLIys6Nhf5bkjWOMDz3K331b2//VJLck+VhVfTfJh5I8/VFub1ven9kX1Xw7s88S/sz0+TsAmqsxdvYOFABYblX1ziS3jTF+bdH7skhV9aYkTx1jbC9sAWjMkTsAAIAGxB0AAEAD3pYJAADQgCN3AAAADYg7AACABg5Z9A7siuOPP36sXLly0bvxCPfdd1+OOOKIRe8G22E+y818lpv5LD8zWm7ms9zMZ7mZz7Zdf/31d44xnrCtyw6ouFu5cmWuu+66Re/GI6xfvz5nn332oneD7TCf5WY+y818lp8ZLTfzWW7ms9zMZ9uq6tbtXeZtmQAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGDln0DhwMnvXmD+bu+x9Mkhy16uLcs+GSJMnRjzs0n3njixa5awAAQBOO3O0Hd9//YDZecm42XnJukmw5vTn4AAAA9pS4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLjbC1avXr3Ptl1V+2zbAABAH+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAa2GHcVdUxVfXqfbkDVXViVV21L+8DAACgu50duTsmyT6NuzHG18YYL9+X93Egqqot/wzC5tOW3VtWr1693cuA/efKK6/MaaedlhUrVuS0007LlVdeuehdAoBHOJBfr3YWd5ck+eGquqGqfquq/mNVfbKqbqyqNydJVa2sqr+tqndW1Req6l1V9eNV9X+q6otV9Zzpem+qqv9RVR+d1v/S3O1v2re/5oFFdOxbV1999ZbTHmvYP6688sqsWbMml156aR544IFceumlWbNmzQH1gglAfwf669XO4u7iJF8aY5ye5NokT0vynCSnJzmjqp4/Xe+pSX4nyY9My88mOSvJf0jy+rntPTPJC5I8
L8kbqurEvfJbwKN06qmn5txzz80YY9G7AgeVtWvX5vLLL8/q1atz6KGHZvXq1bn88suzdu3aRe8aAGxxoL9eHbIL133RtHx6On9kZrH390n+bozx2SSpqpuTfHiMMarqs0lWzm3j/WOM+5PcX1XrMgvFG3Z0p1X1qiSvSpITTjgh69ev34Vd3n9WXnzNDi+f3+/50zu7HXvXhg0btjz+a9euzZo1a5b2v6mDxb333msGS2xvzWfDhg3ZtGnTw7a1adOmh/2ZZPf4M7TczGe5mc9yW8R8DvTXq12Ju0rym2OM33/YyqqVSb43t+r7c+e/v9V9bH24ZKeHT8YYlyW5LEnOPPPMcfbZZ+/CLu8/Gy85d7uXrbz4mmzZ7z/KQ6f/4ppt3q7esvf3j5lVq1ZtefxXr16dZG4eLMT69evNYIntrfmsWrUqK1aseNi21q1b97A/k+wef4aWm/ksN/NZbouYz4H+erWzt2Xek+So6fRfJnllVR2ZJFV1UlX94C7e33lVdXhVPT7J2Uk+uYu3hz1y880355prrvFZO9jP1qxZk4suuijr1q3Lgw8+mHXr1uWiiy7KmjVrFr1rALDFgf56tcMjd2OMb01fjHJTkj9P8u4kH53+x/jeJD+fZNMu3N+NSdYlOT7Jr48xvjYd+WPOGEN87EMvfelLt5z22TvYPy644IIkyWte85ps2LAhq1atytq1a7esB4BlcKC/Xu30bZljjJ/datVbt3G10+au/wtzpzfOX5bkxjHGhVttf+vrkIeio6oEyB7ylgtYDhdccMEB8+IIwMHrQH692tnbMgEAADgA7MoXquyRMcab9td9AQAAHGwcuQMAAGhA3AEAADQg7gAAABoQdwAAAA2Iu71g3bp1+2zb/hkEAADg0RB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaOGTRO3CwWHnxNUmSo1Y9dProxx26yF0CAAAaEXf7wcZLzp07d+52rwcAALC7vC0TAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0IC4AwAAaEDcAQAANCDuAAAAGhB3AAAADYg7AACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAANiDsAAIAGxB0AAEAD4g4AAKABcQcAANCAuAMAAGhA3AEAADQg7gAAABoQdwAAAA2IOwAAgAbEHQAAQAPiDgAAoAFxBwAA0ECNMRa9D49aVX0zya2L3o9tOD7JnYveCbbLfJab+Sw381l+ZrTczGe5mc9yM59te/IY4wnbuuCAirtlVVXXjTHO
XPR+sG3ms9zMZ7mZz/Izo+VmPsvNfJab+ew6b8sEAABoQNwBAAA0IO72jssWvQPskPksN/NZbuaz/MxouZnPcjOf5WY+u8hn7gAAABpw5A4AAKABcbcHquonq+rzVXVLVV286P05WFTVH1TVHVV109y646rq2qr64vTz2Gl9VdV/nWZ0Y1U9e+42r5iu/8WqesUifpeOqupJVbWuqj5XVTdX1b+b1pvRkqiqw6vqE1X1mWlGb57Wn1JVH59m8d6qOmxa/9jp/C3T5SvntvW6af3nq+rFC/qV2qmqFVX16aq6ejpvNkukqjZW1Wer6oaqum5a5zluSVTVMVV1VVX9bVVtqKrnmc9yqKqnT39uNi/frarXms9eNMaw7MaSZEWSLyV5SpLDknwmyTMWvV8Hw5Lk+UmeneSmuXX/OcnF0+mLk7xlOn1Okj9PUkmem+Tj0/rjknx5+nnsdPrYRf9uHZYkT0zy7On0UUm+kOQZZrQ8y/RYHzmdPjTJx6fH/o+TnD+tf1uSfz2dfnWSt02nz0/y3un0M6bnvscmOWV6Tlyx6N+vw5Lkl5O8O8nV03mzWaIlycYkx2+1znPckixJ/ijJv5hOH5bkGPNZviWz/5f+RpInm8/eWxy5233PSXLLGOPLY4x/SPKeJOcteJ8OCmOMv05y11arz8vsyTzTz5fNrb9izHwsyTFV9cQkL05y7RjjrjHGt5Ncm+Qn9/nOHwTGGF8fY3xqOn1Pkg1JTooZLY3psb53OnvotIwkL0hy1bR+6xltnt1VSV5YVTWtf88Y43tjjL9Lcktmz43sgao6Ocm5Sd4xna+YzYHAc9wSqKqjM/tL4MuTZIzxD2OM78R8ltELk3xpjHFrzGevEXe776QkX5k7f9u0jsU4YYzx9en0N5KcMJ3e3pzMbz+Y3iL2o5kdGTKjJTK97e+GJHdk9qL4pSTfGWP8v+kq84/3lllMl9+d5PExo33lvyT5lSTfn84/PmazbEaSD1bV9VX1qmmd57jlcEqSbyb5w+mtze+oqiNiPsvo/CRXTqfNZy8Rd7QzxhiZvfCyQFV1ZJL/leS1Y4zvzl9mRos3xtg0xjg9ycmZHdH5kcXuEUlSVS9NcscY4/pF7ws7dNYY49lJXpLk31TV8+cv9By3UIdk9tGN/z7G+NEk92X2Nr8tzGfxps8N/3SSP9n6MvPZM+Ju9301yZPmzp88rWMxbp8O02f6ece0fntzMr99qKoOzSzs3jXG+NNptRktoentSuuSPC+zt7scMl00/3hvmcV0+dFJvhUz2hd+LMlPV9XGzN7u/4Ikb43ZLJUxxlenn3ck+bPM/oLEc9xyuC3JbWOMj0/nr8os9sxnubwkyafGGLdP581nLxF3u++TSZ42fYPZYZkdWv7AgvfpYPaBJJu/KekVSd4/t/7C6duWnpvk7umw/18meVFVHTt9I9OLpnXsoenzPpcn2TDG+N25i8xoSVTVE6rqmOn045L8RGafjVyX5OXT1bae0ebZvTzJR6a/Wf1AkvNr9o2NpyR5WpJP7JdfoqkxxuvGGCePMVZm9rrykTHGz8VslkZVHVFVR20+ndlz003xHLcUxhjfSPKVqnr6tOqFST4X81k2F+Sht2Qm5rP3LOJbXLosmX2Dzxcy+6zKmkXvz8GyZPZk8PUkD2b2N3QXZfYZkw8n+WKSDyU5brpuJflv04w+m+TMue28MrMvGbglyS8u+vfqsiQ5K7O3U9yY5IZpOceMlmdJ8swkn55mdFOSN0zrn5JZANyS2VtlHjutP3w6f8t0+VPmtrVmmt3nk7xk0b9bpyXJ2Xno2zLNZkmWaRafmZabN7/+e45bniXJ6Umum57j3pfZtymaz5IsSY7I7B0GR8+tM5+9tNT04AAAAHAA87ZMAACABsQdAABAA+IOAACgAXEHAADQgLgDAABoQNwBAAA0IO4AAAAaEHcAAAAN/H/q3fMKTWpqQwAAAABJRU5ErkJggg==\n", 
"text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "midi_df.explode('tempi')['tempi'].plot.box(vert=False);\n", "plt.title('Boxplot of tempi');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "midi_df.explode('tempi')['tempi'].plot.hist(bins=5000, xlim=(0, 300));\n", "plt.title('Histogram of tempi');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We notice a big spork at around 120 bpm which is rather interesting and should be inspected what the exact casue of this spike is.\n", "\n", "It is important to get an understanding and feeling for the data.\n", "We have an expectation of our data which can help us to see if it makes sense and if we parsed it correctly. But that can also be problematic because we project certain simplifying expectations on our data.\n", "\n", "Let's count the occurences of tempo changes in each song." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "midi_df.explode('tempi').groupby('midi_path').size().value_counts().plot.box(vert=False);\n", "plt.title('Boxplot of number of time chages per song');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "midi_path\n", "../datasets/lmd/lmd_full/9/95cf26881a51f376a54e4e3f049f2d48.mid 6597\n", "../datasets/lmd/lmd_full/3/34c04cd56b1087851f79780360f48225.mid 6096\n", "../datasets/lmd/lmd_full/2/2677cf785791e1cfca13a0566aafbc85.mid 6096\n", "../datasets/lmd/lmd_full/3/34a3df2a3a1e2267cf63657984f608cf.mid 5912\n", "../datasets/lmd/lmd_full/c/c615436a609bf4e82c102503ea01d855.mid 5758\n", "../datasets/lmd/lmd_full/1/1e40fd0edc293c8733a9c1b66517890b.mid 5758\n", "../datasets/lmd/lmd_full/6/6662d98153d0c84d93e794d0c2b3940f.mid 5758\n", "../datasets/lmd/lmd_full/8/8d956a0e5ee409a19adda9db287f7108.mid 4301\n", "../datasets/lmd/lmd_full/0/001549d8bc6ba6dc62ade443cf01a51e.mid 4058\n", "../datasets/lmd/lmd_full/0/0b0fdfe28eb0318ea3ac86046053724b.mid 3922\n", "dtype: int64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df.explode('tempi').groupby('midi_path').size().sort_values(ascending=False).head(10)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Its worth to inspect those files in e.g. [MuseScore](https://musescore.org/de) and try to figure out what happens in them." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "midi_df['end_time'].plot.box(vert=False);\n", "plt.title('Boxplot of end_time');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We see that the `end_time` has some extreme values that are probably wrong - let's inspect those examples closer to see if we can see a pattern." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " midi_path midi_error \\\n", "84737 ../datasets/lmd/lmd_full/f/f7b41341a20201f860c... False \n", "28204 ../datasets/lmd/lmd_full/7/7abab3566b77073f827... False \n", "69901 ../datasets/lmd/lmd_full/a/af6b689cfbb13c302bd... False \n", "26973 ../datasets/lmd/lmd_full/7/73f4f536d3d42293bd7... False \n", "24515 ../datasets/lmd/lmd_full/7/7486b07d9a4060fa614... False \n", "\n", " estimate_tempo tempi_sec \\\n", "84737 225.041860 [0.0] \n", "28204 220.109820 [0.0, 1.411764, 67.095876, 129.0154569375, 129... \n", "69901 229.041171 [0.0, 5.052624, 40.420992, 94.420992, 118.4209... \n", "26973 144.311366 [0.0] \n", "24515 181.059483 [0.0, 0.4829544166666666, 1.9996389166666664, ... \n", "\n", " tempi end_time drums \\\n", "84737 [128.0] 31055.625000 False \n", "28204 [170.0000850000425, 95.00014250021376, 70.0000... 25714.187790 True \n", "69901 [95.00014250021376, 190.0002850004275, 80.0, 1... 20384.791244 True \n", "26973 [122.00006913337249] 19652.447880 True \n", "24515 [88.00002346667293, 89.00994241056726, 90.0099... 19623.803343 True \n", "\n", " resolution instrument_names \\\n", "84737 120.0 [Melody, On the Sun Road, Vivian Lai, Joseph L... \n", "28204 96.0 [, , , , , , , ] \n", "69901 96.0 [Acoustic Bass, Brass, Grand piano, Rock Organ... \n", "26973 192.0 [tk1, tk2, tk3, tk4, tk10, tk11] \n", "24515 192.0 [Steel Str.Guitar, Fretless Bass, Piano, Violi... \n", "\n", " num_time_signature_changes \\\n", "84737 2.0 \n", "28204 9.0 \n", "69901 7.0 \n", "26973 1.0 \n", "24515 13.0 \n", "\n", " proto_path \\\n", "84737 ../datasets/lmd/proto/f7b41341a20201f860cb375e... \n", "28204 ../datasets/lmd/proto/7abab3566b77073f8274c097... \n", "69901 ../datasets/lmd/proto/af6b689cfbb13c302bd41b86... \n", "26973 ../datasets/lmd/proto/73f4f536d3d42293bd789b34... \n", "24515 ../datasets/lmd/proto/7486b07d9a4060fa6142ac46... 
\n", "\n", " original_files \n", "84737 [s/sunroad.mid, S/sunroad.mid] \n", "28204 [FErnszt/Olala.mid] \n", "69901 [FErnszt/Swingin.mid] \n", "26973 [Clapton Eric/Bellbottom Blues.mid, Various Ar... \n", "24515 [K/Kevin Parent Les Doigts 2.mid] " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df.sort_values('end_time', ascending=False).head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Based on this, we should filter out those very long pieces, as they are most probably faulty files.\n", "\n", "The errors could be caused by our parsing or by the MIDI files themselves, but as our corpus is big enough, excluding those outliers is justifiable." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Resolution corresponds to the [PPQN](https://en.wikipedia.org/wiki/Pulses_per_quarter_note) (pulses per quarter note) of the MIDI file.\n", "Values above 1000 are suspicious, so let's take a look at those examples." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " midi_path midi_error \\\n", "33871 ../datasets/lmd/lmd_full/6/610987ebaf0a326db40... False \n", "45187 ../datasets/lmd/lmd_full/1/1283eddbc4ab9042dcc... False \n", "67351 ../datasets/lmd/lmd_full/a/a5c6eb45c94d84474cb... False \n", "159033 ../datasets/lmd/lmd_full/2/2076f0fa330be484c1c... False \n", "15472 ../datasets/lmd/lmd_full/0/0c3f038f4eaed7d7f8c... False \n", "\n", " estimate_tempo tempi_sec tempi end_time drums \\\n", "33871 239.221606 [0.0] [120.0] 135.368840 False \n", "45187 225.608072 [0.0] [112.00005973336519] 79.280092 True \n", "67351 235.213491 [0.0] [120.0] 26.987488 False \n", "159033 236.923314 [0.0] [140.00014000014] 26.999973 False \n", "15472 272.000290 [0.0] [136.0001450668214] 28.014676 False \n", "\n", " resolution instrument_names num_time_signature_changes \\\n", "33871 25000.0 [, , , ] 1.0 \n", "45187 24576.0 [, , ] 1.0 \n", "67351 16384.0 [, , ] 2.0 \n", "159033 15360.0 [Main Lead] 1.0 \n", "15472 15360.0 [Melody] 1.0 \n", "\n", " proto_path \\\n", "33871 ../datasets/lmd/proto/610987ebaf0a326db4089867... \n", "45187 ../datasets/lmd/proto/1283eddbc4ab9042dccaea09... \n", "67351 ../datasets/lmd/proto/a5c6eb45c94d84474cbe00a3... \n", "159033 ../datasets/lmd/proto/2076f0fa330be484c1c60242... \n", "15472 ../datasets/lmd/proto/0c3f038f4eaed7d7f8c90478... \n", "\n", " original_files \n", "33871 [b/beethovenalladanza.mid, B/beethovenalladanz... \n", "45187 [Pop_and_Top40/2 Pac - Changes.mid, Pop_and_To... \n", "67351 [a/amigo3.mid, A/amigo3.mid] \n", "159033 [M/M.I.D.O.R. - Far East.mid, M/m.i.d.o.r.__fa... \n", "15472 [W/whiteroom__someday_intro__ambia.mid, W/Whit... 
" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df.sort_values('resolution', ascending=False).head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Another important metric is the number of time signatures.\n", "\n", "If we use a step-sequencer like approach, we will need our data to have one time signature only." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": { "needs_background": "light" }, "output_type": "display_data" } ], "source": [ "midi_df['num_time_signature_changes'].plot.box(vert=False);" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " midi_path midi_error \\\n", "72284 ../datasets/lmd/lmd_full/a/a85612fb0992100bded... False \n", "26776 ../datasets/lmd/lmd_full/7/7d42a95293cc47bcacf... False \n", "168045 ../datasets/lmd/lmd_full/5/5693291bc82548fea37... False \n", "60293 ../datasets/lmd/lmd_full/8/85b2d4e8637008f6b4e... False \n", "44034 ../datasets/lmd/lmd_full/6/69df67cd25dd33c1217... False \n", "\n", " estimate_tempo tempi_sec \\\n", "72284 222.143725 [0.0, 1.54624, 1.6513706666666668, 1.779370666... \n", "26776 199.481296 [0.0, 1.6240640000000002, 2.071552, 2.22173866... \n", "168045 242.258977 [0.0] \n", "60293 218.486679 [0.0, 2.5, 73.5, 75.586952, 89.884816, 131.884... \n", "44034 228.828269 [0.0, 1.25, 3.65, 102.78041, 105.1804100000000... \n", "\n", " tempi end_time drums \\\n", "72284 [194.0190397350993, 190.23944805194805, 195.31... 204.310165 True \n", "26776 [160.0922131147541, 134.08180778032036, 133.16... 229.441685 True \n", "168045 [136.0001450668214] 255.266272 True \n", "60293 [96.0, 240.0, 230.00049833441304, 235.00013708... 283.476487 False \n", "44034 [96.0, 100.0, 115.0000287500072, 100.0, 115.00... 407.079384 False \n", "\n", " resolution instrument_names \\\n", "72284 48.0 [STRAUSS, STRAUSS, STRAUSS, STRAUSS, STRAUSS, ... \n", "26776 48.0 [NACHT, NACHT, NACHT, NACHT, NACHT, NACHT, NACHT] \n", "168045 96.0 [VIBRA SLAP, VIBRA SLAP, VIBRA SLAP, VIBRA SLA... \n", "60293 1024.0 [, ] \n", "44034 1024.0 [, ] \n", "\n", " num_time_signature_changes \\\n", "72284 959.0 \n", "26776 782.0 \n", "168045 479.0 \n", "60293 424.0 \n", "44034 382.0 \n", "\n", " proto_path \\\n", "72284 ../datasets/lmd/proto/a85612fb0992100bded76798... \n", "26776 ../datasets/lmd/proto/7d42a95293cc47bcacfaa568... \n", "168045 ../datasets/lmd/proto/5693291bc82548fea376ec42... \n", "60293 ../datasets/lmd/proto/85b2d4e8637008f6b4e54e5c... \n", "44034 ../datasets/lmd/proto/69df67cd25dd33c12177e159... 
\n", "\n", " original_files \n", "72284 [S/STRAUSS.MID, S/STRAUSS.MID] \n", "26776 [Hollands/NACHT.MID] \n", "168045 [S/Striving.MID, Midis Diversas/STRIVING.MID, ... \n", "60293 [B/beevar2.mid] \n", "44034 [Schumann/Schuman Toccata op7.mid, Schumann/Sc... " ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df.sort_values('num_time_signature_changes', ascending=False).head(5)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Let's check how many files we would lose if we limited our dataset to files with a single time signature." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "image/png": "\n", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "(midi_df.num_time_signature_changes <= 1).value_counts().plot.pie();\n", "plt.title('Tracks without time signature changes');" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can also take a look at the most common instrument names in our dataset." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 354676\n", "Bass 29532\n", "Drums 19438\n", "untitled 18471\n", "Piano 16344\n", "Strings 10922\n", "Soprano 10478\n", "Guitar 10126\n", "Alto 9368\n", "Tenor 8987\n", "DRUMS 8013\n", "Melody 6693\n", "WinJammer Demo 6488\n", "Voice 6270\n", "Piano (hi) 4905\n", "Piano (lo) 4900\n", "Italian 4679\n", "STRINGS 4003\n", "bass 3785\n", "MELODY 3775\n", "Name: instrument_names, dtype: int64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df.explode('instrument_names').instrument_names.value_counts().head(20)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "And also take a look at the most common names of our original filenames." 
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ " 1091324\n", "mid 570786\n", "midi 56765\n", "various 42479\n", "artists 36610\n", "a 34883\n", "the 34148\n", "l 29181\n", "s 28479\n", "polyphone 24771\n", "sure 23089\n", "i 22765\n", "m 20958\n", "e 20792\n", "t 19164\n", "midis 19014\n", "c 18162\n", "b 17950\n", "d 17712\n", "n 17012\n", "h 15452\n", "o 15425\n", "g 13303\n", "you 13228\n", "p 12267\n", "k 12032\n", "of 11890\n", "poly 11836\n", "in 11073\n", "f 10433\n", "beatles 9696\n", "love 9246\n", "r 8911\n", "me 8901\n", "j 8816\n", "diversen 8643\n", "my 8172\n", "to 7585\n", "divers 7025\n", "w 6894\n", "analisadas 6320\n", "no 6073\n", "de 5601\n", "on 5373\n", "and 5349\n", "midirip 5017\n", "it 4853\n", "bach 4692\n", "classical 4542\n", "unsorted 4419\n", "Name: original_files, dtype: int64" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "midi_df.explode('original_files').original_files.str.replace('[^a-zA-Z]', ' ', regex=True).str.lower().str.split(' ').explode().value_counts().head(50)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "An interesting aspect of this analysis is that the most common words in a text corpus often carry little information about what the text is about. This is because words that belong to the grammatical structure of the text introduce noise into the dataset.\n", "\n", "There are basic algorithms such as [tf-idf](https://en.wikipedia.org/wiki/Tf%E2%80%93idf) which allow us to filter out the noise by assuming that a word which is common in many documents carries little to no meaning (e.g. *the* or, in our case, *midi*) and therefore giving it a low score.\n", "If, however, a significant subset of our file names contains a word that is not so common in the other file names (e.g. *beatles*), this *token* receives a high score.\n", "\n", "For now we don't want to inspect this further, but it's always good to know such algorithms, as they help to filter out the relevant data." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It's also good to run some sanity checks on our data.\n", "One assumption would be that no file in which we did not detect a drum track contains an instrument named *drums*." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " midi_path midi_error \\\n", "0 ../datasets/lmd/lmd_full/0/0093ffc81428aabfa14... False \n", "1 ../datasets/lmd/lmd_full/0/032e298d9560fdbe954... False \n", "2 ../datasets/lmd/lmd_full/0/043e1c01e58a4bb4499... False \n", "3 ../datasets/lmd/lmd_full/0/06f86377b1dbad0ebf0... False \n", "4 ../datasets/lmd/lmd_full/0/078e2cc7c46eb5de726... False \n", ".. ... ... \n", "148 ../datasets/lmd/lmd_full/f/f2a97c0f7fa26244a77... False \n", "149 ../datasets/lmd/lmd_full/f/f6702b2445a6ce9aa8e... False \n", "150 ../datasets/lmd/lmd_full/f/f8a56c3ebc00bfef70d... False \n", "151 ../datasets/lmd/lmd_full/f/fa06799e6825c6a4095... False \n", "152 ../datasets/lmd/lmd_full/f/fd37e5d1258ba97fc0e... False \n", "\n", " estimate_tempo tempi_sec \\\n", "0 190.000285 [0.0] \n", "1 120.000180 [0.0] \n", "2 139.999977 [0.0, 0.4285715, 7.071422000000001, 39.071422] \n", "3 226.482465 [0.0, 1.0, 15.608691999999998] \n", "4 204.052854 [0.0] \n", ".. ... ... \n", "148 218.459582 [0.0] \n", "149 211.143695 [0.0] \n", "150 157.793103 [0.0] \n", "151 284.928064 [0.0] \n", "152 235.685752 [0.0] \n", "\n", " tempi end_time drums \\\n", "0 [190.0002850004275] 137.151110 False \n", "1 [190.0002850004275] 114.947196 False \n", "2 [69.99998833333528, 140.00014000014, 60.0, 69.... 196.778591 False \n", "3 [120.0, 115.0000287500072, 120.0] 59.483692 False \n", "4 [98.00014373354414] 281.211322 False \n", ".. ... ... ... \n", "148 [70.00007000007] 213.553358 False \n", "149 [120.0] 21.241667 False \n", "150 [80.0] 263.847656 False \n", "151 [148.99983858350816] 383.315852 False \n", "152 [100.0] 191.795000 False \n", "\n", " resolution instrument_names num_time_signature_changes \\\n", "0 192.0 Drums 1.0 \n", "1 120.0 Drums 1.0 \n", "2 120.0 Drums 1.0 \n", "3 192.0 DRUMS 1.0 \n", "4 192.0 Drums 1.0 \n", ".. ... ... ... 
\n", "148 192.0 drums 1.0 \n", "149 480.0 Drums 1.0 \n", "150 192.0 Drums 1.0 \n", "151 240.0 drums 16.0 \n", "152 120.0 Drums 1.0 \n", "\n", " proto_path \\\n", "0 ../datasets/lmd/proto/0093ffc81428aabfa14aaa13... \n", "1 ../datasets/lmd/proto/032e298d9560fdbe954d7551... \n", "2 ../datasets/lmd/proto/043e1c01e58a4bb44995ee38... \n", "3 ../datasets/lmd/proto/06f86377b1dbad0ebf08fdcd... \n", "4 ../datasets/lmd/proto/078e2cc7c46eb5de72621329... \n", ".. ... \n", "148 ../datasets/lmd/proto/f2a97c0f7fa26244a77a8d0f... \n", "149 ../datasets/lmd/proto/f6702b2445a6ce9aa8e6cad8... \n", "150 ../datasets/lmd/proto/f8a56c3ebc00bfef70d033a8... \n", "151 ../datasets/lmd/proto/fa06799e6825c6a4095e2934... \n", "152 ../datasets/lmd/proto/fd37e5d1258ba97fc0ef9779... \n", "\n", " original_files \n", "0 [M/Martha My Dear 3.mid, Beatles +GeorgeJohnPa... \n", "1 [s/smetmoth.mid, S/smetmoth.mid] \n", "2 [Songs+300/Tunnel_o.mid] \n", "3 [s/seaquest2.mid, S/seaquest2.mid] \n", "4 [Pop/SLEGDEHM.MID, PeterGabriel/Sledgehammer6.... \n", ".. ... \n", "148 [u/unchainedmelody3.mid, U/UnchainedMelody3.mid] \n", "149 [e/ElectroJam.Mid, E/ElectroJam.Mid] \n", "150 [t/tearinhand.mid, T/tearinhand.mid] \n", "151 [l/louder.mid, L/louder.mid] \n", "152 [Christian/Easyway.mid, Various Artists/easywa... \n", "\n", "[153 rows x 12 columns]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "explodeded_df = midi_df.explode('instrument_names')\n", "explodeded_df[\n", " (explodeded_df['instrument_names'].str.lower().isin(['drums', 'drum']))&\n", " (explodeded_df['drums'] == False)\n", "].groupby('midi_path').first().reset_index()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Listening to those examples reveals that some of them use an instrument other than drums for their drum track, either by artistic choice or by mistake.\n", "As this applies to *only* 153 files, we can live with this error and ignore the files where we did not detect the drum track."
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Filter and extract the data\n", "\n", "After inspecting all available files we should filter out *noisy* files that are either not interesting to us (they lack a drum track) or for which the parsing of the MIDI file did not work as expected.\n", "With the analysis above we can justify the bounds of the data we want to allow.\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
" ], "text/plain": [ " midi_path midi_error \\\n", "5 ../datasets/lmd/lmd_full/9/9d30679480d0e55a07b... False \n", "9 ../datasets/lmd/lmd_full/9/974f5b2466a5d5670df... False \n", "10 ../datasets/lmd/lmd_full/9/952df587b9fff5adbeb... False \n", "11 ../datasets/lmd/lmd_full/9/93e2cb3a0c36af361bb... False \n", "12 ../datasets/lmd/lmd_full/9/97c1f795ffb676dfaa5... False \n", "... ... ... \n", "178555 ../datasets/lmd/lmd_full/5/5bd52f314b616bf50c4... False \n", "178556 ../datasets/lmd/lmd_full/5/5eea68ee369af3ee5cc... False \n", "178557 ../datasets/lmd/lmd_full/5/5309fb3f62bf73780e6... False \n", "178559 ../datasets/lmd/lmd_full/5/5538b0174111bcef760... False \n", "178560 ../datasets/lmd/lmd_full/5/51e756897765aed30af... False \n", "\n", " estimate_tempo tempi_sec \\\n", "5 224.502048 [0.0] \n", "9 184.213982 [0.0, 2.714283, 170.86248999999998, 171.960050... \n", "10 265.636110 [0.0] \n", "11 165.000165 [0.0] \n", "12 212.877687 [0.0] \n", "... ... ... \n", "178555 141.332152 [0.0, 183.428388, 186.85695800000002, 241.9283... \n", "178556 184.252715 [0.0, 210.73867815, 210.75516165, 210.77763355... \n", "178557 183.172080 [0.0] \n", "178559 274.353398 [0.0] \n", "178560 229.821409 [0.0] \n", "\n", " tempi end_time drums \\\n", "5 [115.00002875000717] 296.347752 True \n", "9 [210.00021000021, 80.99997165000993, 82.000035... 225.896259 True \n", "10 [144.0023040368646] 216.460946 True \n", "11 [165.000165000165] 310.029993 True \n", "12 [110.00011000011] 222.766823 True \n", "... ... ... ... \n", "178555 [70.00007000007, 35.00001458333941, 70.0000700... 250.221166 True \n", "178556 [90.00009000009, 91.00009100009099, 89.0000400... 299.296908 True \n", "178557 [81.01003309259853] 139.390142 True \n", "178559 [142.00007100003546] 172.389879 True \n", "178560 [64.99999458333379] 293.317332 True \n", "\n", " resolution instrument_names \\\n", "5 120.0 [Piano, Organ, Big Brass, Horns, Horns2, Horns... 
\n", "9 120.0 [, , , , , , , , , ] \n", "10 480.0 [Track 2, Track 3, Track 4, Track 5, Track 6, ... \n", "11 480.0 [Chorus Guitar, Overdrive Guita, Overdrive Gui... \n", "12 192.0 [BASS, E.PIANO, GUITAR, STRINGS, PIANO, New Tr... \n", "... ... ... \n", "178555 480.0 [Track 1, Track 2, Track 3, Track 4, Track 6, ... \n", "178556 240.0 [You've Made Me So Very Happy, You've Made Me ... \n", "178557 120.0 [, , , , , , , , , ] \n", "178559 96.0 [Std Drums, Jazz Gtr, Muted Gtr (Lead Dbl), Mu... \n", "178560 96.0 [, , , , , ] \n", "\n", " num_time_signature_changes \\\n", "5 1.0 \n", "9 1.0 \n", "10 1.0 \n", "11 1.0 \n", "12 1.0 \n", "... ... \n", "178555 1.0 \n", "178556 1.0 \n", "178557 1.0 \n", "178559 1.0 \n", "178560 1.0 \n", "\n", " proto_path \\\n", "5 ../datasets/lmd/proto/9d30679480d0e55a07be1414... \n", "9 ../datasets/lmd/proto/974f5b2466a5d5670dfddee0... \n", "10 ../datasets/lmd/proto/952df587b9fff5adbeb5f40b... \n", "11 ../datasets/lmd/proto/93e2cb3a0c36af361bb49c5e... \n", "12 ../datasets/lmd/proto/97c1f795ffb676dfaa5a0202... \n", "... ... \n", "178555 ../datasets/lmd/proto/5bd52f314b616bf50c40104c... \n", "178556 ../datasets/lmd/proto/5eea68ee369af3ee5cc577ba... \n", "178557 ../datasets/lmd/proto/5309fb3f62bf73780e61f85b... \n", "178559 ../datasets/lmd/proto/5538b0174111bcef760a5f29... \n", "178560 ../datasets/lmd/proto/51e756897765aed30af232df... \n", "\n", " original_files \n", "5 [S/Smooth .mid] \n", "9 [Sure.Polyphone.Midi/A Winter's Tale.mid, w/wi... \n", "10 [E/Enola Gay 63.mid, divers midi 3/OMD_-_Enola... \n", "11 [z/zombie05.mid, Z/zombie05.mid] \n", "12 [J/J. Rivers Poor Side of Town.mid, divers mid... \n", "... ... \n", "178555 [T/taylor_swift-tim_mcgraw.mid] \n", "178556 [Y/Youvemad L.mid, Y/YOUVEMAD L.mid, Y/YOUVEMA... \n", "178557 [R/Roberto Carlos - Propuesta L.mid, Midis Jov... \n", "178559 [B/Boys 2.mid, b/boys_2.mid, Beatles +GeorgeJo... 
\n", "178560 [G/Gerry Boulet Deadline.mid] \n", "\n", "[97848 rows x 12 columns]" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "filtered_midi_df = midi_df[\n", " (midi_df.drums == True)\n", " & (midi_df.midi_error == False)\n", " & (midi_df.end_time.between(30, 800))\n", " & (midi_df.num_time_signature_changes<=1)\n", "]\n", "filtered_midi_df" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "0.5479696014247232" ] }, "execution_count": null, "metadata": {}, "output_type": "execute_result" } ], "source": [ "len(filtered_midi_df)/len(midi_df)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We have now filtered out a little under half of our available examples (about 55% remain), just by looking at the metadata of the MIDI files.\n", "The next step is to transform the drum track of each MIDI file into a mathematical representation.\n", "\n", "The notation within a MIDI file is bound to time, but in music we often express time relative to a tempo, which remaps the time measured in seconds.\n", "A common unit for this is *beats per minute* (bpm), where 60 bpm maps each second to a *beat*. If we want to shorten the time between two beats (commonly referred to as *faster*), we increase the *bpm*, and vice versa.\n", "\n", "Another way of segmenting events in time is the *time signature* of a track, which groups beats into bars.\n", "\n", "One representation that simplifies this notation is the *step sequencer*, in which we snap each note to the nearest step on a grid.\n", "\n", "For this we can use *note_seq* with the function [midi_file_to_drum_track](https://github.com/magenta/note-seq/blob/6631a4299092cd52201e2f4a8e36c1223d76561f/note_seq/drums_lib.py#L269)."
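, "\n", "\n", "As a rough sketch of how seconds, beats and grid steps relate: the symbols $T_\\text{beat}$, $s$ and $k$ are introduced here only for illustration, $s$ steps per beat is an assumed grid resolution, and the tempo is assumed to be constant. At a tempo of $\\text{bpm}$ beats per minute, one beat lasts\n", "\n", "$$T_\\text{beat} = \\frac{60}{\\text{bpm}} \\text{ seconds,}$$\n", "\n", "so a note starting at $t$ seconds would snap to the grid step\n", "\n", "$$k = \\operatorname{round}\\left(\\frac{t \\cdot s}{T_\\text{beat}}\\right) = \\operatorname{round}\\left(\\frac{t \\cdot \\text{bpm} \\cdot s}{60}\\right).$$\n", "\n", "For example, at 120 bpm with $s = 4$ steps per beat, a note at $t = 0.26$ seconds lies at $0.26 \\cdot 120 \\cdot 4 / 60 = 2.08$ steps and is snapped to step $k = 2$."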
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Conclusion\n", "\n", "We filtered out all noisy examples of our dataset and transformed the drum track into a mathematical, simplified notation.\n", "\n", "In the next step we will start to analyse the drum patterns that we extracted." ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3" } }, "nbformat": 4, "nbformat_minor": 4 }