darkness
Version 0.7.1
Last revised: September 13, 2006
Generated: September 13, 2006
This document specifies standards for acquisition and formatting of music information. ``Music information'' includes album titles and artists' names, among other pieces of information that might be associated with music. The primary goal for this standard is to provide for consistency across catalogs of music information.
This document was written with rules for American English in mind.
The motivation for writing this document was to provide a standard for music information that would then be embedded within digitally encoded music; in particular, this digitally encoded music would be in the form of compressed files, usually split up by track. Additionally, only contemporary music methods of composition and distribution were seriously considered. Thus, for genres such as Classical music or music not distributed in a typical album-with-tracks set-up, this document may prove highly inadequate.
This section defines terms that will be used throughout this document with special meaning. Other terms may be introduced as they are used, at which time their definitions will be made clear.
A catalog is a collection of music information. Often this might be called a ``collection'' (as in ``a collection of music''), but use of this term would be in conflict with the definition found in section 2.2. For the purposes of this document catalog will always be used to refer to the music information for all tracks in a person's library.
Fields are said to be unset before they have been assigned a value, or set after information has been acquired, formatted, and assigned to the field. Fields may also be formatted fields or unformatted fields. A formatted field is one that contains some sort of text formatting (for example, all lowercase) that is clearly intentional and artistically important. An unformatted field is a field that has no intentional or meaningful text formatting.
This section defines a set of commonly used fields. Information about the acquisition and formatting of these fields can be found in sections 4 and 5.
Artist name is the name of the primary performing artist for a track. This field must be unique in a catalog.
The extra artists field is used for the name(s) of performing artist(s) not listed in the artist name field. For example, a featured artist would have their name listed in the extra artists field.
Modifying artists holds the name of any artist(s) who modified the track from its original form. The artist(s) named in this field do not qualify for the extra artists field since they did not contribute to the original performance version, but did modify it after recording. For example, if an artist were credited as remixing a track, their name would be placed in this field.
Compilation producer holds a list of one or more people involved in putting together a compilation set. For example, the name of the DJ that mixed a compilation set would be put in this field.
Set title is the title of the set that includes the track. The pair of fields set title and set subtitle must be unique among all other set title/subtitle pairs within an artist name.
Set subtitle is the subtitle of the set that includes the track.
Country of release describes the country in which the track's set was released.
Media type specifies the original type of media a particular set (or copy of a set) was released on.
Release date is the date this track's set was released to the general public (date of main/major distribution).
Collection title is the title of the collection on which the track appears. The pair of fields collection title and collection subtitle must be unique among all other collection title/subtitle pairs within a set title/subtitle pair.
Collection subtitle is the subtitle for the collection the track appears on.
Collection position is the position within a set occupied by this collection. Collection total is the total number of collections in the set. Both of these numbers must always be greater than or equal to one, and both must be integers.
For example, for a two CD set the first CD (collection) in the set would have a collection position of 1, and both discs would have collection total set to 2. The second CD in the set would, of course, have a collection position of 2.
Genre is the style of music that this track belongs to.
Track title is the name of the track.
Track subtitle is the subtitle of the track.
Modification subtitle holds an additional subtitle specifically for a subtitle added by an artist after modifying a track from its original recorded version. For example, a remix subtitle would be placed in this field.
Track position and tracks total are analogous to collection position and collection total. Track position is the track's position within the collection this track is found on. Tracks total is the number of tracks in the collection. Both of these numbers must be greater than or equal to one, and both must be integers.
This section deals with some issues regarding the acquisition of data for fields.
Note that much of music information acquisition requires proper judgment on the part of the acquisitor. There are two general rules to help guide this judgment, in order of descending importance:
This section deals with sources for music information that are considered trustworthy and thus authoritative. Sources not listed in this section of the document must be judged relative to the quality and consistency of these sources, and must only be used as a last resort. If no trustworthy source of information can be found, the acquisition guidelines for unknown fields in section 4.11 must be followed with all fields initially unknown.
The following sources are listed in order of descending preference. The first source from this list to offer information must be selected as the primary information source. Missing information must be acquired from sources of lower preference than the primary information source, but only after the primary information source has been checked. More details on the proper acquisition of music information for specific fields follow this section.
Original collection or set packaging is usually straightforward to read. However, crosschecking other sources of lower preference is recommended, and is required when the packaging is of questionable quality or origin, or when a mistake (such as a printing error) is suspected. In cases of questionable packaging, other sources of lower preference can be selected as the primary information source. Missing information must be filled in from another source.
The following are some additional guidelines and tips for reading collection or set packaging. In many cases, if any of the below items are encountered the information must be crosschecked as described above.
A common source of information for many albums, especially those that were obtained without original packaging, would be web sites. These are subcategorized, in order of descending preference:
Checking multiple sites for information is suggested. Also beware of cataloging nuances of some web sites, especially those in the first three categories. For example, some sites might elide articles or leave off featured artists.
In cases of electronic-only (i.e., Internet), bootleg, or other unauthorized distribution of music, the data files (the packaging) distributed with the music must often be used as the primary source of information. If simple spelling, grammar, or punctuation errors are suspected with reasonable certainty, these errors must be corrected.
Any set containing original works from multiple artists, modified from their original form or not, is considered a compilation set. For example, this might include soundtracks, ``best of'' albums, remixes, or DJ recordings.
In order of descending preference, one of the following methods must be used to determine the artist name for a track that is part of a compilation set:
Since compilation sets are composed of tracks probably found elsewhere, if you can find the musically identical track on another set this other set can be used to acquire missing information. However, extra care must be taken to ensure the tracks are musically identical.
Pre-release sets are sets that are released before the final public distribution of a set has commenced. Pre-release sets must have their music information corrected after the official release where misspellings, punctuation errors, or similar mistakes were included in the pre-release packaging. Track ordering or widely different track names must remain unchanged.
For bootleg sets without set titles, the set title must be set as follows:
All other fields for a bootleg set follow normal rules.
Some sets will contain tracks that are not documented on the packaging. These are frequently called ``hidden tracks'' or ``bonus tracks.'' If no information on these tracks is available from any authoritative sources, the following rules must be used to derive certain field values for the tracks. These rules are to be applied in combination with, but take precedence over, the rules for unknown fields (see section 4.11).
If no value for the track title field is available, the value ``Bonus track'' should be used. Note that this value should also be used for those tracks which are separate, but which you would consider ``hidden tracks.''
The track position assigned to the track depends on where the track occurs in its collection:
Sequences of bonus tracks should be kept in order, and track positions of other tracks (i.e., documented tracks) should be increased as necessary. For example, a collection where the first documented track has track position one, but is preceded by three bonus tracks, would have its three bonus tracks numbered one, two, and three in the order they occur, followed by the first documented track taking on track position four.
The tracks total field must reflect all documented tracks as well as bonus tracks.
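The renumbering in the example above can be sketched in Python. This is only a minimal illustration, not part of the standard: the function name `number_tracks` and the dictionary keys are hypothetical, and it assumes the tracks are supplied in the order they occur in the collection, with an undocumented track that has no known title given as `None`. It covers only sequential renumbering and the tracks total.

```python
def number_tracks(titles):
    """Assign track position and tracks total to a collection's tracks.

    `titles` lists the tracks in the order they occur in the collection;
    a bonus track with no known title is given as None (an assumption of
    this sketch, not something the standard prescribes).
    """
    total = len(titles)  # tracks total counts documented and bonus tracks
    numbered = []
    for position, title in enumerate(titles, start=1):
        numbered.append({
            "track_title": title if title is not None else "Bonus track",
            "track_position": position,
            "tracks_total": total,
        })
    return numbered
```

For three untitled bonus tracks followed by one documented track, the documented track ends up with track position four and every track has a tracks total of four.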
Sometimes field contents are obviously modified to remove objectionable content. For example, a track title such as Bulls!@# has most likely been censored. In such cases the censored content will remain.
In cases where a distinction between set title and collection title is not clear, the two fields are filled as follows:
Beware of cases where a title is listed with a parenthetical name afterwards. For example, if a title listed on an album cover was ``Foxes Running (Kill the Guys)'', ``Kill the Guys'' would be the set subtitle.
Genre can be very subjective. To increase the chance of consistency of genres between different catalogs, very broad genres should be used. Wherever possible, the genre field should be filled with data acquired from an authoritative source of information, or derived from information acquired from an authoritative source of information. For example, use ``Alternative'' when an online retail site categorizes the music as ``Alternative Rock''.
Also note that the genre field must be consistent for all tracks within a collection, as per section 4.12. Additionally, if possible and appropriate, the genre field should have the same value for all tracks within a set.
Sometimes an older work is transferred to a new medium without any change beyond track ordering. In this case, the release date must be the release date of the first release of the set. Note that the transferred set must contain the exact same tracks that the original work did, or else the release date must be set to the release date of the version of the set on the new medium.
In some cases proper information for a field cannot be acquired, but a value is needed for the field regardless. The following guidelines cover what values a few fields must take on if no information can be acquired for the field. Unset (unknown) fields not covered below must remain unset.
Some fields for tracks within a set must be consistent across all tracks in that set. Additionally, some fields for tracks within a collection must be consistent across all tracks in the collection. Both of these categories are covered in table 1.
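[Table 1: fields that must be consistent across all tracks within a set, and fields that must be consistent across all tracks within a collection. Table contents not preserved.]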
After the data has been acquired, some fields may require additional formatting. Wherever applicable, the rules in this section must be followed to ensure consistent presentation of field data.
Note that formatted fields cannot have their formatting altered or removed by the rules below. Adding tags to formatted fields is permitted.
A tag is a specific method for appending information to a field. Tag data is not part of field data, but is extra information added to qualify or otherwise augment acquired field data. Tags are used, for example, to satisfy uniqueness for two fields that would otherwise be identical.
A tag is separated from the end of the actual field data by a single space. All tag data for a field is enclosed in one pair of square brackets. The tag data cannot contain leading or trailing spaces. Any character is valid in tag data except for a semi-colon (``;''). A semi-colon is used to separate multiple pieces of tag data. For a field with no tag, the square brackets are placed as described and then the desired tag data is placed between the square brackets. For fields with existing tags, a semi-colon followed by a space is placed at the end of the existing tag data, and then the new tag data is appended.
For example, assume there is a field with the following contents:
Some Field Data
If you were to add the tag ``Some Tag Data'' to this field, the field would now have the following contents:
Some Field Data [Some Tag Data]
If you were to add another tag containing ``Some Other Tag Data'' to this field, the field would have the following contents:
Some Field Data [Some Tag Data; Some Other Tag Data]
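The tagging rules can be sketched in Python as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: the function name `add_tag` is hypothetical, empty tag data is rejected as an assumption of the sketch, and a field is treated as already tagged whenever it ends in ``]'' and contains `` ['', which would misjudge untagged field data that happens to end in a bracket.

```python
def add_tag(field, tag):
    """Append tag data to a field, following the bracket and semi-colon rules."""
    tag = tag.strip()  # tag data cannot contain leading or trailing spaces
    # Rejecting empty tag data is an assumption of this sketch.
    if not tag or ";" in tag:
        raise ValueError("tag data must be non-empty and contain no semi-colon")
    # Assumption: a field ending in "]" that contains " [" already has a tag.
    if field.endswith("]") and " [" in field:
        # Existing tag: append after a semi-colon and a space.
        return field[:-1] + "; " + tag + "]"
    # No tag yet: one space, then the tag data in square brackets.
    return field + " [" + tag + "]"
```

Calling `add_tag("Some Field Data", "Some Tag Data")` and then adding ``Some Other Tag Data'' to the result reproduces the two tagged fields shown above.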
This section presents general formatting guidelines. These rules must be followed for all fields, unless superseded by a rule given for a specific field later in the formatting section, or unless overridden by the presence of a formatted field.
When a field has multiple values, those values must be separated by a comma followed by a space. For example, for a track which has ``The Foos'' and ``Bar Brady'' as values for the extra artists field, the final content of the field should be ``The Foos, Bar Brady''.
Capitalization for unformatted fields is to follow normal American English rules for capitalization wherever possible. A field will be treated like a title for purposes of capitalization. One noted exception to the rules of capitalization is values or parts of values that are explicitly supplied in this document, such as the content of a tag; such content must appear as supplied.
For reference, a few major capitalization rules from American English are:
There are a few special cases which may not be clearly covered in American English reference works:
The ampersand character (&) when used as a conjunction must be replaced with the word ``and''. The exception to this rule is for ampersands in acronyms, in which an ampersand must not be changed.
Any series of two or more dots that is thought to represent an ellipsis must be replaced with a proper ellipsis, ``...''. There must be no space before an ellipsis, and a single space after an ellipsis if text follows the ellipsis.
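The ellipsis rule lends itself to a small regular-expression sketch in Python. The function name `normalize_ellipses` is hypothetical, and the sketch assumes that every run of two or more dots is meant as an ellipsis; deciding whether a run really ``is thought to represent an ellipsis'' remains the cataloger's judgment. It also treats any non-space character after the ellipsis as following text.

```python
import re

def normalize_ellipses(text):
    """Rewrite runs of two or more dots as "..." with the required spacing."""
    # Drop spaces before the run, collapse the run, and drop spaces after it.
    text = re.sub(r" *\.{2,} *", "...", text)
    # Put exactly one space after an ellipsis when something follows it.
    return re.sub(r"\.\.\.(?=\S)", "... ", text)
```

Under these assumptions, ``Wait .. for it'' becomes ``Wait... for it'', and a trailing run of dots is collapsed with no space added after it.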
A single hyphen must be used for hyphenation, ranges, and limits. Two hyphens must be used for an em dash (``--'').
Dates must be in ISO 8601 date format. This format is:
YYYY-MM-DD
Where YYYY is the four-digit year, MM is the two-digit zero-padded month, and DD is the two-digit zero-padded day of the month. For example, February 14, 2002 would be written as 2002-02-14. The fields in a date may be removed from right to left if the value is not known. This means that YYYY-MM and YYYY are also valid date formats, but YYYY-DD and MM-DD are not valid. Unless otherwise noted, as much of a date as is known must be recorded.
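A check for the permitted date forms might look like the following Python sketch. The names `DATE_PATTERN` and `is_valid_date` are hypothetical. The pattern only checks the shape of the value (a year, optionally followed by a month, optionally followed by a day); it does not verify that the day exists in the given month, so ``2002-02-31'' would pass.

```python
import re

# YYYY, YYYY-MM, or YYYY-MM-DD: parts may only be dropped from the right.
DATE_PATTERN = re.compile(r"\d{4}(-(0[1-9]|1[0-2])(-(0[1-9]|[12]\d|3[01]))?)?")

def is_valid_date(value):
    """Return True if `value` is a full or right-truncated YYYY-MM-DD date."""
    return DATE_PATTERN.fullmatch(value) is not None
```

Because the day group is nested inside the month group, a value can carry a day only when it also carries a month, which is what rules out forms such as YYYY-DD.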
Geographic locations are given as:
Unless otherwise noted, geographic locations must be given in the broadest form possible. In most cases, for example, the Specific Information can be left off. In this case the field appears as just ``CC''.
A pre-release set that cannot conclusively be proven to be musically identical to the final released version of the set must have the tag ``Pre-release'' added to the set title.
Some albums are released in two versions: an ``explicit'' version which is the unmodified work by the artist, and a ``clean'' version which has any profanity or unacceptable content removed. For example, a few large retail chains in the US sell only the clean version of an album. In the event that an album is found to be the clean version, the set title must include the tag ``Clean''. Note that if an album is censored (clean) but there is no explicit version in distribution this section does not apply.
The artist name must always be in ``natural format'', meaning first name first (as opposed to last name first). For example, ``Fiona Apple'' not ``Apple, Fiona''.
Leading articles in an artist's name must be relocated to the end, separated from the rest of the name by a comma. For example, ``The Beatles'' must be changed to ``Beatles, The''.
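Article relocation can be sketched in Python as below. The function name `relocate_leading_article` and the list of articles are assumptions of this sketch; the standard does not enumerate which words count as articles, and formatted fields are exempt from this kind of rewriting.

```python
LEADING_ARTICLES = ("The", "A", "An")  # assumed list of English articles

def relocate_leading_article(artist_name):
    """Move a leading article to the end: "The Beatles" -> "Beatles, The"."""
    first, _, rest = artist_name.partition(" ")
    # Only relocate when something follows the article.
    if rest and first.capitalize() in LEADING_ARTICLES:
        return rest + ", " + first
    return artist_name
```

Names without a leading article, such as ``Fiona Apple'', are returned unchanged.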
When collection total is greater than one, a tag containing the media type field, exactly one space, and the collection position must be added to the collection subtitle. For example, for the second CD in a three CD set, a tag containing ``CD 2'' must be added to the collection subtitle. Note that this requirement ensures the uniqueness of collection title/subtitle, as required in their definition.
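As a rough Python sketch of this rule, the tag data for the collection subtitle can be derived from the media type, collection position, and collection total. The function name `collection_tag` is hypothetical, and returning `None` when no tag is required is a choice made for this sketch only.

```python
def collection_tag(media_type, collection_position, collection_total):
    """Return the tag data for a collection subtitle, or None if not needed."""
    # Both numbers must be integers greater than or equal to one.
    if collection_position < 1 or collection_total < 1:
        raise ValueError("collection position and total must be at least 1")
    if collection_total > 1:
        # Media type, exactly one space, then the collection position.
        return f"{media_type} {collection_position}"
    return None
```

For the second CD in a three CD set, `collection_tag("CD", 2, 3)` yields the tag data ``CD 2''.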
The fields artist name and set title/subtitle have uniqueness requirements. This section covers all permitted methods for resolving uniqueness conflicts in these fields.
Where multiple methods for resolution are listed, these methods must be applied in order, and their tags added in the order listed, unless otherwise specified. The fewest methods needed to satisfy a field's uniqueness requirements must be used. Any methods applied must be applied to the field or fields for all tracks involved in the conflict.
Note that, when entering music information for a track, the utmost care for future uniqueness of data must be taken. Sources of data should be checked for other artist names or versions of albums, for example, and the precautions below should be taken when a duplicate is possible, even when such a duplicate does not exist in your catalog.
Artist names that are non-unique within a catalog must apply one or more of the below methods:
In the extreme case where none of the above methods suffice, the only tag added by this section must be an upper case Roman numeral to each conflicting artist name in whatever order the cataloger sees fit.
If both set title and set subtitle are duplicated for an artist name, one or more of the following methods must be used to make the set title unique:
If none of the above methods are sufficient to make the set title/subtitle unique, an upper case Roman numeral tag must be added to the set subtitle field of each track involved in the conflict. The Roman numerals must be in order of set release date. The above methods must be applied first to reduce the number of sets in conflict as far as possible; only after that must the Roman numeral tag be added.
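Assigning Roman numeral tags in release date order could be sketched in Python as follows. The function names and the use of (identifier, release date) pairs are assumptions made for illustration. The sketch relies on release dates being in the ISO 8601 form required by this document, which sorts chronologically as plain text; sets with identical dates keep their input order.

```python
def to_roman(number):
    """Convert a positive integer to an upper case Roman numeral."""
    numerals = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
                (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
                (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I")]
    result = ""
    for value, symbol in numerals:
        while number >= value:
            result += symbol
            number -= value
    return result

def roman_tags_by_release_date(sets):
    """Map each conflicting set's identifier to its Roman numeral tag.

    `sets` is a list of (identifier, release_date) pairs, with release
    dates given as ISO 8601 strings.
    """
    ordered = sorted(sets, key=lambda pair: pair[1])  # stable sort by date
    return {ident: to_roman(i) for i, (ident, _) in enumerate(ordered, start=1)}
```

The earliest released set receives ``I'', the next ``II'', and so on; the resulting numeral would then be added as a tag to the set subtitle of every track in that set.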
Thanks to...
The author of this document can be contacted at darkness@codefu.org. Please contact me with any grammar or spelling issues, omissions, or parts that are not clear. Interpretation questions would also be welcome. If you simply don't agree with something in this document, and you're sure you understand it correctly, you can contact me; I just don't guarantee that I'll care, unless you present me with a well-reasoned argument.
This is a list of acknowledged trusted cataloging web sites as of the revision date of this document. These sites seem to store reliable, consistent information; they take no overt commercial measures, other than perhaps ads and even links to retail sites; and they often include information specifically formatted for programmatic or otherwise standard music cataloging data.
This is a list of acknowledged trusted retail web sites as of the revision date of this document. These are large, trusted retail sites that have been selling music for quite some time on the Internet. They seem largely consistent, have a large amount of musical information (much of it probably gleaned directly from the label), and are usable sites.
Version 0.1:
- Initial release.
Version 0.2:
- Rephrased several things.
- Moved sections around to make things more logical.
- Added tags description.
- Added revision date to cover page.
- Added bootleg set titles to acquisition.
- Added set subtitle and modified subtitle fields.
- Completely rewrote rules for non-unique artist name and set title resolution.
- Removed article relocation for titles.
- Made sure to mention that genres should be broad.
- Probably a few other things I forgot.
Version 0.3:
- Removed stray ``Tags'' section under definitions.
- Renamed ``Modified subtitle'' to ``Modification subtitle''.
- Added Roman numerals as last resort to set title/subtitle for uniqueness.
- Changed special edition tag from ``Bonus'' to ``Special''.
- Changed title from ``Standards for Music Information Formatting'' to just ``Standards for Music Information''.
- Fixed several typos, bad phrasings, etc.
- Changed a bunch of ``should'' to ``must''.
- Added contact section.
- Added natural ordering requirement for artist name field to formatting section.
- Moved collection title/subtitle uniqueness directly under formatting: now it is a requirement to add ``CD 1'' and such whenever collection total is greater than one.
- Tags added for uniqueness of set title/subtitle have been moved from set title to set subtitle.
- Added section about set and collection subtitles enclosed in parentheses.
Version 0.4:
- Changed unknown field values from an option (``may'') to a requirement (``must'').
- Fixed a grammar/spelling error or two.
Version 0.5:
- Added compilation producer field.
Version 0.6:
- Added ``Clean'' tag.
- Added section on censored fields.
- Added capitalization rules for composite and unknown words.
- Added prepositions to the list of examples of words not to be capitalized.
- Described comma-separated fields briefly.
- Added acquisition rules for bonus tracks.
- Updated to-do.
- Changed contact information.
- Removed CDNow from trusted retail sites, since it's now totally Amazon.
- Changed capitalization rules to follow that of titles instead of sentences.
Version 0.7:
- Added requirement on ordering by release date to roman numerals for ensuring set title uniqueness.
- Added article relocation for artist name.
- Changed unknown information section to indicate that the given values ``must'' be used for an unset field, instead of ``may'' be used.
- Added rule for a release date on an album which is merely a transference of an older work to a new media.
- Added some texttt in examples, like ``Fiona Apple''.
Version 0.7.1:
- Andy corrected my grammar: parenthesis->parentheses in many places.
$Revision: 1.4 $
This document was generated using the LaTeX2HTML translator Version 2002-2-1 (1.70)
The command line arguments were:
latex2html -split 0 smi.tex
The translation was initiated by darkness on 2006-09-13