

Standards for Music Information

darkness

Version 0.7.1
Last revised: September 13, 2006
Generated: September 13, 2006



Introduction

This document specifies standards for acquisition and formatting of music information. ``Music information'' includes album titles and artists' names, among other pieces of information that might be associated with music. The primary goal for this standard is to provide for consistency across catalogs of music information.

Assumptions and limitations

This document was written with rules for American English in mind.

The motivation for writing this document was to provide a standard for music information that would then be embedded within digitally encoded music; in particular, this digitally encoded music would be in the form of compressed files, usually split up by track. Additionally, only contemporary methods of music composition and distribution were seriously considered. Thus, for genres such as Classical music, or for music not distributed in a typical album-with-tracks setup, this document may prove highly inadequate.

Definitions

This section defines terms that will be used throughout this document with special meaning. Other terms may be introduced as they are used, at which time their definitions will be made clear.

Catalog

A catalog is a collection of music information. Often this might be called a ``collection'' (as in ``a collection of music''), but use of this term would be in conflict with the definition found in section 2.2. For the purposes of this document, ``catalog'' will always be used to refer to the music information for all tracks in a person's library.


Sets, Collections, and Tracks

A set
is a group of collections that are distributed together. For example, an ``album'' containing two CDs is a two CD set. It is possible for a set to contain only one collection.
A collection
is a group of tracks (see below). The most common type of collection is a single piece of physical media, such as a single CD. It is possible for a collection to contain only one track.
A track
is a single element of a collection. In most cases, it is safe to think of a track as a single song from a collection.

Fields

Information about a track can be split up into small, isolated units called fields. A field is a single, specific piece of information about a track. Fields have a full name, and are also given ``short field identifiers'' which are usually a three to four letter abbreviation for a field. Either of these names can be used in this or other documents to refer to a specific field and its accompanying formatting as defined in this document.

Fields are said to be unset before they have been assigned a value, or set after information has been acquired, formatted, and assigned to the field. Fields may also be formatted fields or unformatted fields. A formatted field is one that contains some sort of text formatting (for example, all lowercase) that is clearly intentional and artistically important. An unformatted field is a field that has no intentional or meaningful text formatting.


Standard fields

This section defines a set of commonly used fields. Information about the acquisition and formatting of these fields can be found in sections 4 and 5.

Artist name

ARTT

Artist name is the name of the primary performing artist for a track. This field must be unique in a catalog.

Extra artists

XART

The extra artists field is used for the name(s) of performing artist(s) not listed in the artist name field. For example, a featured artist would have their name listed in the extra artists field.

Modifying artists

MART

Modifying artists holds the name of any artist(s) who modified the track from its original form. The artist(s) named in this field do not qualify for the extra artists field, since they did not contribute to the original performance but modified it after recording. For example, if an artist were credited as remixing a track, their name would be placed in this field.

Compilation producer

CMPT

Compilation producer holds a list of one or more people involved in putting together a compilation set. For example, the name of the DJ that mixed a compilation set would be put in this field.

Set title

SETT

Set title is the title of the set that includes the track. The pair of fields set title and set subtitle must be unique among all other set title/subtitle pairs within an artist name.

Set subtitle

SETS

Set subtitle is the subtitle of the set that includes the track.

Country of release

CNTR

Country of release describes the country in which the track's set was released.

Media type

MEDA

Media type specifies the original type of media a particular set (or copy of a set) was released on.

Release date

RELD

Release date is the date this track's set was released to the general public (date of main/major distribution).

Collection title

COLT

Collection title is the title of the collection on which the track appears. The pair of fields collection title and collection subtitle must be unique among all other collection title/subtitle pairs within a set title/subtitle pair.

Collection subtitle

COLS

Collection subtitle is the subtitle for the collection the track appears on.

Collection position and collection total

COLP,COLZ

Collection position is the position within a set occupied by this collection. Collection total is the total number of collections in the set. Both of these numbers must always be greater than or equal to one, and both must be integers.

For example, for a two CD set the first CD (collection) in the set would have a collection position of 1, and both discs would have collection total set to 2. The second CD in the set would, of course, have a collection position of 2.
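The constraints on position and total fields can be checked mechanically. The following is a minimal sketch, not part of the standard; the function name validate_position is hypothetical. It checks only what the text states (both values are integers greater than or equal to one) and deliberately does not require the position to be less than or equal to the total, since this document does not state that rule.

```python
def validate_position(position, total):
    """Raise ValueError unless both values are integers >= 1.

    Hypothetical helper: applies equally to COLP/COLZ and TRKP/TRKZ.
    """
    for name, value in (("position", position), ("total", total)):
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            raise ValueError(f"{name} must be an integer, got {value!r}")
        if value < 1:
            raise ValueError(f"{name} must be >= 1, got {value!r}")


# The two CD set from the example: positions 1 and 2, total 2 for both.
validate_position(1, 2)
validate_position(2, 2)
```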

Genre

GENR

Genre is the style of music that this track belongs to.

Track title

TRKT

Track title is the name of the track.

Track subtitle

TRKS

Track subtitle is the subtitle of the track.

Modification subtitle

MODS

Modification subtitle holds an additional subtitle specifically for a subtitle added by an artist after modifying a track from its original recorded version. For example, a remix subtitle would be placed in this field.

Track position and tracks total

TRKP,TRKZ

Track position and tracks total are analogous to collection position and collection total. Track position is the track's position within the collection on which the track is found. Tracks total is the number of tracks in the collection. Both of these numbers must be greater than or equal to one, and both must be integers.


Acquisition

This section deals with some issues regarding the acquisition of data for fields.

Note that much of music information acquisition requires proper judgment on the part of the acquisitor. There are two general rules to help guide this judgment, in order of descending importance:

  1. Keep information common between different tracks consistent wherever possible. (Note that ``consistent'' does not necessarily mean ``the same''; for example, see the rules on artist names for compilation sets such as DJ mixes.)
  2. Preserve the original artist's intent.
Any other concerns may be considered only after these two rules do not result in a clear conclusion.

Authoritative sources of information

This section deals with sources for music information that are considered trustworthy and thus authoritative. Sources not listed in this section of the document must be judged relative to the quality and consistency of these sources, and must only be used as a last resort. If no trustworthy source of information can be found, the acquisition guidelines for unknown fields in section 4.11 must be followed with all fields initially unknown.

The following sources are listed in order of descending preference. The first source from this list to offer information must be selected as the primary information source. Missing information must be acquired from sources of lower preference than the primary information source, but only after the primary information source has been checked. More details on the proper acquisition of music information for specific fields follow this section.

Collection/Set packaging

Original collection or set packaging is usually straightforward to read. However, crosschecking other sources of lower preference is recommended, and is required when the packaging is of questionable quality or origin, or when a mistake (such as a printing error) is suspected. In cases of questionable packaging, another source of lower preference can be selected as the primary information source. Missing information must be filled in from another source.

The following are some additional guidelines and tips for reading collection or set packaging. In many cases, if any of the below items are encountered the information must be crosschecked as described above.

Web sites

A common source of information for many albums, especially those that were obtained without original packaging, would be web sites. These are subcategorized, in order of descending preference:

Artist's or label's websites.
The authenticity of these sites must be ensured before acquiring information from them.
Trusted cataloging sites.
See appendix A for a list of sites currently considered trusted at the time of this document's publication.
Trusted retail sites.
See appendix B for a list of sites currently considered trusted at the time of this document's publication.
Fan sites, or other sites.
These sites must possess longevity, have good design, current content, and accurate information (as verified using any known good information from other sources); in other words, high-quality sites. This category of sites must only be used as a last resort.

Checking multiple sites for information is suggested. Also beware of cataloging nuances of some web sites, especially those in the first three categories. For example, some sites might elide articles or leave off featured artists.

Electronically distributed or bootleg music

In cases of electronic-only (i.e., Internet), bootleg, or other unauthorized distribution of music, the data files (the packaging) distributed with the music must often be used as the primary source of information. If simple spelling, grammar, or punctuation errors are suspected with reasonable certainty, these errors must be corrected.


Compilation sets

Any set containing original works from multiple artists, modified from their original form or not, is considered a compilation set. For example, this might include soundtracks, ``best of'' albums, remixes, or DJ recordings.

Artist names

In order of descending preference, one of the following methods must be used to determine the artist name for a track that is part of a compilation set:

  1. The artist name from the original version of the track. For example, if the track is a remix, the artist name will be the original artist's name rather than the artist who remixed the track. This method may only be used for sets where up to 10% of artist names are unknown, with the rules in section 4.11 being used for the unknown artist name fields. If more than 10% of artist names are unknown, another method must be chosen.
  2. The compilation performer, editor, or producer if credited prominently on the album. For example, for a DJ recording where the track's original artist is not known, the DJ's name must be used.
  3. The string ``Various Artists'' may be used for the artist name.
Exactly one of these methods must be used for an entire compilation set.
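The selection among these three methods can be sketched as follows. This is an illustration under stated assumptions, not a normative algorithm: the function name choose_compilation_method is hypothetical, unknown original artists are represented by None, and the ``credited prominently'' judgment is assumed to have been made already by the caller, who passes the credited name or nothing.

```python
def choose_compilation_method(original_artists, credited_compiler=None):
    """Return (method_number, artist_names) for one whole compilation set.

    original_artists: one entry per track; None marks an unknown artist.
    credited_compiler: prominently credited performer/editor/producer, if any.
    """
    total = len(original_artists)
    unknown = sum(1 for artist in original_artists if artist is None)
    # Method 1 is allowed only when at most 10% of artist names are unknown.
    # Integer arithmetic avoids floating-point rounding at the boundary.
    if unknown * 10 <= total:
        names = [a if a is not None else "Unknown Artist" for a in original_artists]
        return 1, names
    # Method 2: the credited compilation performer, editor, or producer.
    if credited_compiler:
        return 2, [credited_compiler] * total
    # Method 3: the fixed string from the list above.
    return 3, ["Various Artists"] * total
```

Because the function returns names for every track at once, a single method is applied to the entire set, as required.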

Obtaining information from other sets

Since compilation sets are composed of tracks that can probably be found elsewhere, another set containing the musically identical track can be used to acquire missing information. However, extra care must be taken to ensure the tracks are musically identical.

Pre-release sets

Pre-release sets are sets that are released before the final public distribution of a set has commenced. Pre-release sets must have their music information corrected after the official release where misspellings, punctuation errors, or similar mistakes were included in the pre-release packaging. Track ordering and widely different track names must remain unchanged.

Bootleg sets

For bootleg sets without set titles, the set title must be set as follows:

Bootleg, <Location>
Where ``<Location>'' is the geographic location where the bootleg was performed. If the location is not known, that portion of the set title must be left off, leaving only ``Bootleg''.

All other fields for a bootleg set follow normal rules.
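As a small illustration of the set title rule above, the following sketch builds the title for an untitled bootleg set. The function name bootleg_set_title is hypothetical, and the location string is assumed to have been formatted already according to the rules for geographic locations.

```python
def bootleg_set_title(location=None):
    """Build the set title for a bootleg set that has no title of its own."""
    if location:
        return f"Bootleg, {location}"
    # Unknown location: that portion of the title is left off.
    return "Bootleg"
```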

Bonus tracks

Some sets will contain tracks that are not documented on the packaging. These are frequently called ``hidden tracks'' or ``bonus tracks.'' If no information on these tracks is available from any authoritative sources, the following rules must be used to derive certain field values for the tracks. These rules are to be applied in combination with, but take precedence over, the rules for unknown fields (see section 4.11).

Track title

If no value for the track title field is available, the value ``Bonus track'' should be used. Note that this value should also be used for those tracks which are separate, but which you would consider ``hidden tracks.''

Track position and tracks total

The track position assigned to the track depends on where the track occurs in its collection:

Sequences of bonus tracks should be kept in order, and track positions of other tracks (i.e., documented tracks) should be increased as necessary. For example, a collection where the first documented track has track position one, but is preceded by three bonus tracks, would have its three bonus tracks numbered one, two, and three in the order they occur, followed by the first documented track taking on track position four.

The tracks total field must reflect all documented tracks as well as bonus tracks.
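The renumbering described above can be sketched as follows, assuming the order in which tracks occur in the collection is already known. The function name number_tracks is hypothetical, and undocumented tracks with no available title are represented by None.

```python
def number_tracks(titles_in_order):
    """Assign TRKT, TRKP, and TRKZ to every track of one collection.

    titles_in_order: track titles in the order they occur; None marks a
    bonus track for which no title is available.
    """
    total = len(titles_in_order)  # documented tracks plus bonus tracks
    return [
        {
            "TRKT": title if title is not None else "Bonus track",
            "TRKP": position,
            "TRKZ": total,
        }
        for position, title in enumerate(titles_in_order, start=1)
    ]


# Three bonus tracks precede the first documented track.
tracks = number_tracks([None, None, None, "First Song"])
```

In this example the documented track ends up with track position four, and every track carries a tracks total of four.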

Censored fields

Sometimes field contents are obviously modified to remove objectionable content. For example, a track title such as Bulls!@# has most likely been censored. In such cases the censored content will remain.2

Set titles and collection titles

In cases where a distinction between set title and collection title is not clear, the two fields are filled as follows:

Set subtitles and collection subtitles

Beware of cases where a title is listed with a parenthetical name afterwards. For example, if the title listed on an album cover were ``Foxes Running (Kill the Guys)'', ``Kill the Guys'' would be the set subtitle.

Genre

Genre can be very subjective. To increase the chance of consistency of genres between different catalogs, very broad genres should be used. Wherever possible, the genre field should be filled with data acquired from an authoritative source of information, or derived from information acquired from an authoritative source of information. For example, use ``Alternative'' when an online retail site categorizes the music as ``Alternative Rock''.

Also note that the genre field must be consistent for all tracks within a collection, as per section 4.12. Additionally, if possible and appropriate, the genre field should have the same value for all tracks within a set.

Release date for music transferred to a new medium

Sometimes an older work is transferred to a new medium without any change beyond track ordering. In this case, the release date must be the release date of the first release of the set. Note that the transferred set must contain exactly the same tracks as the original work; otherwise the release date must be set to the release date of the new version of the set.


Unknown information

In some cases proper information for a field cannot be acquired, but a value is needed for the field regardless. The following guidelines cover what values a few fields must take on if no information can be acquired for the field. Unset (unknown) fields not covered below must remain unset.

Artist name
must be set to ``Unknown Artist''.
Track title
must be set to ``Unknown''.
Media type
must be set to ``Collection''.
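These defaults lend themselves to a simple lookup. The sketch below is illustrative only: the dictionary representation of a track, the use of None for an unset field, and the function name fill_unknown are all assumptions, and the second entry is assumed to refer to the track title field (TRKT). The bonus track rules, which take precedence over these defaults, are not handled here.

```python
# Required values for fields that remain unknown after acquisition.
UNKNOWN_DEFAULTS = {
    "ARTT": "Unknown Artist",
    "TRKT": "Unknown",
    "MEDA": "Collection",
}


def fill_unknown(fields):
    """Return a copy of fields with the required defaults filled in.

    Fields not listed in UNKNOWN_DEFAULTS are left unset.
    """
    filled = dict(fields)
    for key, default in UNKNOWN_DEFAULTS.items():
        if filled.get(key) is None:
            filled[key] = default
    return filled
```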


Consistency of information within a set

Some fields for tracks within a set must be consistent across all tracks in that set. Additionally, some fields for tracks within a collection must be consistent across all tracks in the collection. Both of these categories are covered in table 1.



Table 1: Consistent fields

Field                  Consistent for tracks within
---------------------  ----------------------------
Artist name            Collection [1]
Set title              Set
Collection title       Collection
Collection subtitle    Collection
Collection position    Collection
Collection total       Set
Tracks total           Collection
Release date           Set
Genre                  Collection
Country of release     Set
Media type             Collection

[1] For compilation sets, ``consistency'' means following the rules in section 4.2.
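A catalog tool could check the requirements of table 1 along the following lines. This is a sketch under assumptions: tracks are dictionaries keyed by short field identifier, the keys set_id and collection_id are hypothetical grouping keys that identify a track's set and collection, and the special meaning of artist name consistency for compilation sets is not handled.

```python
from collections import defaultdict

# Scope within which each field must be consistent, transcribed from table 1.
CONSISTENT_WITHIN = {
    "ARTT": "collection", "SETT": "set", "COLT": "collection",
    "COLS": "collection", "COLP": "collection", "COLZ": "set",
    "TRKZ": "collection", "RELD": "set", "GENR": "collection",
    "CNTR": "set", "MEDA": "collection",
}


def find_inconsistent_fields(tracks):
    """Return the sorted identifiers of fields that vary within their scope."""
    values_seen = defaultdict(set)
    for track in tracks:
        for field, scope in CONSISTENT_WITHIN.items():
            # Group by set, and additionally by collection where required.
            group = (track["set_id"],)
            if scope == "collection":
                group += (track["collection_id"],)
            values_seen[(field, group)].add(track.get(field))
    return sorted(
        {field for (field, _), values in values_seen.items() if len(values) > 1}
    )
```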



Formatting

After the data has been acquired, some fields may require additional formatting. Wherever applicable, the rules in this section must be followed to ensure consistent presentation of field data.

Note that formatted fields cannot have their formatting altered or removed by the rules below. Adding tags to formatted fields is permitted.


Tags

A tag is a specific method for appending information to a field. Tag data is not part of field data, but is extra information added to qualify or otherwise augment acquired field data. Tags are used, for example, to satisfy uniqueness for two fields that would otherwise be identical.

A tag is separated from the end of the actual field data by a single space. All tag data for a field is enclosed in one pair of square brackets. The tag data cannot contain leading or trailing spaces. Any character is valid in tag data except for a semi-colon (``;''). A semi-colon is used to separate multiple pieces of tag data. For a field with no tag, the square brackets are placed as described and then the desired tag data is placed between the square brackets. For fields with existing tags, a semi-colon followed by a space is placed at the end of the existing tag data, and then the new tag data is appended.

For example, assume there is a field with the following contents:

Some Field Data
If you were to add the tag ``Some Tag Data'' to this field, the field would now have the following contents:
Some Field Data [Some Tag Data]
If you were to add another tag containing ``Some Other Tag Data'' to this field, the field would have the following contents:
Some Field Data [Some Tag Data; Some Other Tag Data]
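The procedure above can be sketched in a few lines. The function name add_tag is hypothetical, and the sketch makes one simplifying assumption that the text does not guarantee: any field ending in a square-bracketed group preceded by a space is treated as already carrying a tag. Leading and trailing spaces in the tag data are stripped rather than rejected.

```python
def add_tag(field, tag):
    """Append tag data to a field, following the bracket and semi-colon rules."""
    tag = tag.strip()  # tag data cannot have leading or trailing spaces
    if not tag or ";" in tag:
        raise ValueError("tag data must be non-empty and contain no semi-colon")
    # Assumption: a trailing "]" with an earlier " [" marks an existing tag.
    if field.endswith("]") and " [" in field:
        return f"{field[:-1]}; {tag}]"
    return f"{field} [{tag}]"


once = add_tag("Some Field Data", "Some Tag Data")
twice = add_tag(once, "Some Other Tag Data")
```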


General formatting

This section presents general formatting guidelines. These rules must be followed for all fields, unless superseded by a rule given for a specific field later in the formatting section, or unless overridden by the presence of a formatted field.

Fields with multiple values

When a field has multiple values, those values must be separated by a comma followed by a space. For example, for a track which has ``The Foos'' and ``Bar Brady'' as values for the extra artists field, the final content of the field should be ``The Foos, Bar Brady''.

Capitalization

Capitalization for unformatted fields is to follow normal American English rules for capitalization wherever possible. A field will be treated like a title for purposes of capitalization.3 One noted exception to the rules of capitalization is any value or part of a value that is explicitly supplied in this document, such as the content of a tag; such content must appear as supplied.

For reference, a few major capitalization rules from American English are:

There are a few special cases which may not be clearly covered in American English reference works:

Punctuation and other symbols

The ampersand character (&) when used as a conjunction must be replaced with the word ``and''. The exception to this rule is for ampersands in acronyms, in which an ampersand must not be changed.

Any series of two or more dots that is thought to represent an ellipsis must be replaced with a proper ellipsis, ``...''. There must be no space before an ellipsis, and a single space after an ellipsis if text follows the ellipsis.

A single hyphen must be used for hyphenation, ranges, and limits. Two hyphens must be used for an em dash (``--'').
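The ampersand and ellipsis rules can be approximated with regular expressions. The sketch below relies on heuristics that are assumptions, not part of this standard: an ampersand surrounded by whitespace is taken to be a conjunction, an ampersand with no surrounding whitespace (as in ``R&B'') is taken to be part of an acronym, and every run of two or more dots is taken to represent an ellipsis. Human judgment is still required for cases these heuristics misread. The function name normalize_punctuation is hypothetical.

```python
import re


def normalize_punctuation(text):
    """Apply the ampersand and ellipsis rules using simple heuristics."""
    # Conjunction heuristic: "&" with whitespace on both sides becomes "and".
    text = re.sub(r"\s+&\s+", " and ", text)
    # Two or more dots become "...", with no space before and one space after.
    text = re.sub(r"\s*\.{2,}\s*", "... ", text)
    # Drop the trailing space when no text follows the ellipsis.
    return text.rstrip()
```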


Dates

Dates must be in ISO 8601 date format. This format is:

YYYY-MM-DD

Where YYYY is the four-digit year, MM is the two-digit zero-padded month, and DD is the two-digit zero-padded day of the month. For example, February 14, 2002 would be written as 2002-02-14. The fields in a date may be removed from right to left if the value is not known. This means that YYYY-MM and YYYY are also valid date formats, but YYYY-DD and MM-DD are not valid. Unless otherwise noted, as much of a date as is known must be recorded.
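A validity check for the three permitted forms might look like the following sketch; the function name is_valid_date is hypothetical. Note that a value such as 2002-05 cannot be distinguished from a mistaken YYYY-DD by inspection alone; the sketch simply reads the second component as a month.

```python
import re
from datetime import date

# YYYY, YYYY-MM, or YYYY-MM-DD: components may only be dropped from the right.
_DATE_RE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def is_valid_date(value):
    """Return True if value is a full or right-truncated YYYY-MM-DD date."""
    match = _DATE_RE.fullmatch(value)
    if match is None:
        return False
    year, month, day = match.groups()
    try:
        # Missing components default to 1 purely for range checking.
        date(int(year), int(month or 1), int(day or 1))
    except ValueError:
        return False
    return True
```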


Geographic locations

Geographic locations are given as:

Specific Information, CC
Where CC is the ISO 3166-1 Alpha-2 country code and Specific Information is any information about state, province, or other designator that indicates a subset of the country given by CC.

Unless otherwise noted, geographic locations must be given in the broadest form possible. In most cases, for example, the Specific Information can be left off. In this case the field appears as just ``CC''.
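The location format can be produced as in the sketch below. The function name format_location is hypothetical, and the check is only on the shape of the country code (two ASCII letters); it does not verify that the code is actually assigned in ISO 3166-1.

```python
def format_location(country_code, specific=None):
    """Format a geographic location as "Specific Information, CC" or "CC"."""
    cc = country_code.strip().upper()
    # Shape check only; membership in ISO 3166-1 is not verified here.
    if len(cc) != 2 or not (cc.isascii() and cc.isalpha()):
        raise ValueError(f"not a two-letter country code: {country_code!r}")
    if specific and specific.strip():
        return f"{specific.strip()}, {cc}"
    # Broadest form: the country code alone.
    return cc
```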

Pre-release sets

A pre-release set that cannot conclusively be proven to be musically identical to the final released version of the set must have the tag ``Pre-release'' added to the set title.

Censored (``Clean'') sets

Some albums are released in two versions: an ``explicit'' version which is the unmodified work by the artist, and a ``clean'' version which has any profanity or unacceptable content removed. For example, a few large retail chains in the US sell only the clean version of an album. In the event that an album is found to be the clean version, the set title must include the tag ``Clean''. Note that if an album is censored (clean) but there is no explicit version in distribution this section does not apply.

Artist name

The artist name must always be in ``natural format'', meaning first name first (as opposed to last name first). For example, ``Fiona Apple'' not ``Apple, Fiona''.

Leading articles in an artist's name must be relocated to the end, separated from the rest of the name by a comma. For example, ``The Beatles'' must be changed to ``Beatles, The''.
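Article relocation can be sketched as follows. The set of articles (``the'', ``a'', ``an'') is an assumption based on this document's American English scope, and the function name relocate_article is hypothetical. Artist names that are formatted fields are outside the scope of this sketch.

```python
# Assumed article list for American English; not enumerated by this document.
ARTICLES = {"the", "a", "an"}


def relocate_article(name):
    """Move a leading article to the end of an artist name."""
    first, separator, rest = name.partition(" ")
    if separator and rest and first.lower() in ARTICLES:
        return f"{rest}, {first}"
    return name
```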

Collection subtitle

When collection total is greater than one, a tag containing the media type field, exactly one space, and the collection position must be added to the collection subtitle. For example, for the second CD in a three CD set, a tag containing ``CD 2'' must be added to the collection subtitle. Note that this requirement ensures the uniqueness of collection title/subtitle, as required in their definition.
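The tag data required here can be derived directly from three fields. The sketch below only builds the tag data; adding it to the collection subtitle follows the rules given for tags. The function name collection_subtitle_tag is hypothetical.

```python
def collection_subtitle_tag(media_type, position, total):
    """Return the tag data for the collection subtitle, or None if not needed."""
    if total > 1:
        # Media type, exactly one space, then the collection position.
        return f"{media_type} {position}"
    return None
```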

Satisfying uniqueness requirements

The fields artist name and set title/subtitle have uniqueness requirements. This section covers all permitted methods for resolving uniqueness conflicts in these fields.

Where multiple methods for resolution are listed, these methods must be applied in order, and their tags added in the order listed, unless otherwise specified. The fewest methods needed to satisfy a field's uniqueness requirements must be used. Any methods applied must be applied to the field or fields for all tracks involved in the conflict.

Note that, when entering music information for a track, the utmost care for future uniqueness of data must be taken. Sources of data should be checked for other artist names or versions of albums, for example, and the precautions below should be taken when a duplicate is possible, even when such a duplicate does not exist in your catalog.

Artist name

Artist names that are non-unique within a catalog must apply one or more of the below methods:

In the extreme case where none of the above methods suffice, the only tag added by this section must be an upper case Roman numeral, added to each conflicting artist name in whatever order the cataloger sees fit.

Set title

If both set title and set subtitle are duplicated for an artist name, one or more of the following methods must be used to make the set title unique:

If none of the above methods are sufficient to make the set title/subtitle unique, an upper case Roman numeral tag must be added to the set subtitle field of each track involved in the conflict. The Roman numerals must be in order of set release date. The above methods must be used first to make the smallest number of sets in conflict, only after which must the Roman numeral tag be added.
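The Roman numeral fallback can be sketched as follows. The names to_roman and roman_tags_by_release are hypothetical, as is the mapping from a set key to its release date. Release dates are assumed to be strings in the date format of this document; plain string comparison orders such dates chronologically, with a truncated date sorting before any fuller date that shares its prefix.

```python
_ROMAN = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"), (90, "XC"),
    (50, "L"), (40, "XL"), (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman(number):
    """Convert a positive integer to an upper case Roman numeral."""
    if number < 1:
        raise ValueError("Roman numerals start at one")
    parts = []
    for value, symbol in _ROMAN:
        count, number = divmod(number, value)
        parts.append(symbol * count)
    return "".join(parts)


def roman_tags_by_release(release_dates):
    """Map each conflicting set key to its numeral, earliest release first."""
    ordered = sorted(release_dates, key=lambda key: release_dates[key])
    return {key: to_roman(index) for index, key in enumerate(ordered, start=1)}
```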

Acknowledgments and comments

Acknowledgments

Thanks to...

To-Do

Contacting the author

The author of this document can be contacted at darkness@codefu.org. Please contact me with any grammar or spelling issues, omissions, or parts that are not clear. Interpretation questions would also be welcome. If you simply don't agree with something in this document, and you're sure you understand it correctly, you can contact me; I just don't guarantee that I'll care, unless you present me with a well-reasoned argument.


Trusted cataloging sites

This is a list of acknowledged trusted cataloging web sites as of the revision date of this document. These sites seem to store reliable, consistent information; they make no overt commercial measures, other than perhaps ads and even links to retail sites; and they often include information specifically formatted for programmatic or otherwise standard music cataloging data.


http://www.allmusic.com/


Trusted retail sites

This is a list of acknowledged trusted retail web sites as of the revision date of this document. These are large, trusted retail sites that have been selling music for quite some time on the Internet. They seem largely consistent, have a large amount of musical information (much of it probably gleaned directly from the label), and are usable sites.


http://www.amazon.com/

Revision history

Version 0.1:
- Initial release.

Version 0.2:
- Rephrased several things.
- Moved sections around to make things more logical.
- Added tags description.
- Added revision date to cover page.
- Added bootleg set titles to acquisition.
- Added set subtitle and modified subtitle fields.
- Completely rewrote rules for non-unique artist name and set
  title resolution.
- Removed article relocation for titles.
- Made sure to mention that genres should be broad.
- Probably a few other things I forgot.

Version 0.3:
- Removed stray ``Tags'' section under definitions.
- Renamed ``Modified subtitle'' to ``Modification subtitle''.
- Added Roman numerals as last resort to set title/subtitle
  for uniqueness.
- Changed special edition tag from ``Bonus'' to ``Special''.
- Changed title from ``Standards for Music Information
  Formatting'' to just ``Standards for Music Information''.
- Fixed several typos, bad phrasings, etc.
- Changed a bunch of ``should'' to ``must''.
- Added contact section.
- Added natural ordering requirement for artist name field
  to formatting section.
- Moved collection title/subtitle uniqueness directly under
  formatting: now it is a requirement to add ``CD 1'' and
  such whenever collection total is greater than one.
- Tags added for uniqueness of set title/subtitle have been
  moved from set title to set subtitle.
- Added section about set and collection subtitles enclosed
  in parentheses.

Version 0.4:
- Changed unknown field values from an option (``may'') to a
  requirement (``must'').
- Fixed a grammar/spelling error or two.

Version 0.5:
- Added compilation producer field.

Version 0.6:
- Added ``Clean'' tag.
- Added section on censored fields.
- Added capitalization rules for composite and unknown words.
- Added prepositions to the list of examples of words not to be
  capitalized.
- Described comma-separated fields briefly.
- Added acquisition rules for bonus tracks.
- Updated to-do.
- Changed contact information.
- Removed CDNow from trusted retail sites, since it's now totally
  Amazon.
- Changed capitalization rules to follow that of titles instead
  of sentences.

Version 0.7:
- Added requirement on ordering by release date to roman
  numerals for ensuring set title uniqueness.
- Added article relocation for artist name.
- Changed unknown information section to indicate that the
  given values ``must'' be used for an unset field, instead of
  ``may'' be used.
- Added rule for a release date on an album which is merely
  a transference of an older work to a new media.
- Added some texttt in examples, like ``Fiona Apple''.

Version 0.7.1:
- Andy corrected my grammar: parenthesis->parentheses in many
  places.

$Revision: 1.4 $


Footnotes

... than1
This provides for some sort of obscure case where the track position numbering may have gaps in the numbering.
... remain.2
Ideally the censorship could be reversed where the original content was clear and the artist was unhappy with the censorship. Obviously determining the intent of the artist, and in some cases even determining the original content of the field, could become very subjective and lead to many divergent versions of the field among copies of the same work. This is unacceptable.
... capitalization.3
This is an important distinction to make so that the rule for capitalizing the first word in the field is within the rules of American English capitalization.
