Digital Humanities @ uOttawa

Data Curation Process


To decide on the composition, collection and cleaning of our dataset, we worked through some of the datasheet questions proposed by Gebru et al. (2020). The data is self-contained: in total there are 1,100 records of processed text, composed of both nominal and numerical attributes. Some artists can be identified from the data by their real or stage name. The augmented dataset also contains data that might be considered sensitive, as it reveals the racial and ethnic origins, as well as the sexual orientation, of the artists. It is worth mentioning that this data is accessible only to radio professionals with a paid license. Additionally, the dataset identifies subpopulations by distinguishing the gender of the artists and their membership in Two-Spirit, lesbian, gay, bisexual, transgender and queer (2SLGBTQ+) communities and in racial and ethnic communities.


As a team, we decided to augment the original dataset, starting with the attribute of ensemble type, which captures whether artists perform solo or as part of a duo or group. While cleaning the data, we decided to include trios under the "group" category because there were only a few occurrences and we deemed it appropriate to merge the two values. We carefully chose six attributes that could call attention to the gender and ethno-racial inequality present in the white and male-dominated rock genre (Schaap, 2019). To unveil the gender representation contained in the dataset (or lack thereof), we added the gender identity of the artists to define them individually as men, women or non-binary or, in the case of bands, as a mixed-gender or non-binary ensemble. We then chose to indicate whether the artists are members of the 2SLGBTQ+ community by adding two attributes: their sexual orientation and whether they are transgender (recorded as yes/no). Moreover, the race-ethnicity attribute was included to classify the artists in a racial group (white, Black, person of colour, ensemble of various races and ethnicities). To provide more detail, another attribute was added to specify the artist’s precise race and ethnicity. Lastly, we included the artists' country of origin, which refers to where the artist was born or grew up, rather than the country in which the artist is based or working. This decision was made to reflect the country where the artist holds citizenship and the country that claims the artist as its own. Many artists, particularly those who are not American, spend their careers outside of their home country, so we wanted to distinguish country of origin from where an artist resides. These additional attributes allow us to examine trends in gender representation and the intersectionality of the marginalized artists.

To be efficient, we divided the data collection for the directory between the two of us. We both used Wikipedia and MusicBrainz to collect our information. There were some instances where the race and ethnicity of an artist were not explicitly specified. This occurred specifically with artists and group members who by appearance seemed white. This is why we entered the value “white” in square brackets, to mark that we assumed their race and ethnicity. It was a calculated guess, as whiteness is implicit in western culture (Criado-Perez, 2019) and race is often only mentioned when an artist is non-white (Schaap, 2015). Furthermore, the sexual orientation of the artists was rarely mentioned on either website, so we resorted to a Google search combining the name of the artist with the acronym “LGBTQ+” to find any piece of reliable information specifying their orientation, such as an interview segment or a social media post. Finally, we collected data on the one lead singer who came out as transgender during her career, and we decided to differentiate her group's entries to show that one of their songs charted while she was not “out” and one after she had transitioned.
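The square-bracket convention described above can be made machine-readable at analysis time. The following is a minimal sketch, not the procedure we used: it assumes each coded cell is a plain string, and the names `CodedValue` and `parse_code` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple


class CodedValue(NamedTuple):
    code: str      # the code itself, e.g. "white"
    assumed: bool  # True when the cell was wrapped in square brackets


def parse_code(raw: str) -> CodedValue:
    """Split a cell such as "[white]" into its code and an 'assumed' flag.

    Hypothetical helper: the bracket convention comes from the text,
    but this function and its name are illustrative assumptions.
    """
    text = raw.strip()
    if len(text) > 2 and text.startswith("[") and text.endswith("]"):
        # Bracketed value: the coders inferred it rather than found it stated.
        return CodedValue(code=text[1:-1].strip().lower(), assumed=True)
    return CodedValue(code=text.lower(), assumed=False)
```

Keeping the flag separate from the code would let an analysis count assumed and documented values together or apart, depending on the question being asked.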


The table below sets out the coding system used to augment the biographic records of the artists contained within the dataset. We endeavoured to develop codes that reveal the complexity of identity. Our methodology was inclusive of a non-binary view of gender and sexuality, but the end result is still a binary: the artists played on Canadian active rock radio are described in gender and sexuality binaries (with one exception). From our research, all of the artists in the dataset use the terms woman or man to describe their gender, except one musician in the band Crown Lands who is non-binary and Two-Spirit. This points to larger issues in the active rock music industry that will be further explored under Discussion.

Code          Meaning

Ensemble Types
collab        Special collaboration of artists or ensembles.
duo           Ensemble comprised of two artists.
group         Ensemble comprised of three or more artists.
solo          Solo artist.
solo-ens      Solo artist with an ensemble band.

Gender
fm-ens        Ensemble comprised of both male and female artists with female as lead singer.
men           Refers to a person who internally identifies and/or publicly expresses as a man.
mf-ens        Ensemble comprised of both male and female artists with male as lead or equal partnership.
nf-ens        Ensemble comprised of non-binary and female artists with non-binary artist as lead singer.
nfm-ens       Ensemble comprised of male, female and/or non-binary artists with non-binary artist as lead singer.
nm-ens        Ensemble comprised of non-binary and male artists with non-binary artist as lead singer.
non-binary    Refers to a person who internally identifies and/or publicly expresses as non-binary.
women         Refers to a person who internally identifies and/or publicly expresses as a woman.

Race or Ethnicity
[black]       Artist is assumed Black.
[me-ens]      Ensemble with members of assumed various races or ethnicities.
[white]       Artist is assumed white.
asian         Artist is Asian.
black         Artist is Black.
hispanic      Artist is Hispanic.
indigenous    Artist is Indigenous.
latinx        Artist is Latinx.
me-ens        Ensemble with members of various races or ethnicities.
white         Artist is white.

Sexual Orientation
[straight]    Artist is assumed heterosexual.
bisexual      Artist is bisexual.
gay           Artist is gay.
lesbian       Artist is a lesbian.
ms-ens        Ensemble with members of various sexualities.
straight      Artist is heterosexual.
two-spirit    Artist is Two-Spirit.

Country of Origin
mn-ens        Ensemble with members of various nationalities.
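The coding scheme in the table above could also be kept as a small lookup structure so that mistyped codes are caught during data entry. This is a sketch under stated assumptions rather than our actual workflow: the field names (`ensemble_type`, `gender`, `race_ethnicity`, `sexual_orientation`) are hypothetical column names, and the grouping of codes under gender and sexual orientation is inferred from their meanings. The country field is left out because it mostly holds country names rather than codes.

```python
# Allowed codes per field, copied from the coding table.
# The dictionary keys are hypothetical column names, not the dataset's own.
CODEBOOK = {
    "ensemble_type": {"collab", "duo", "group", "solo", "solo-ens"},
    "gender": {
        "fm-ens", "men", "mf-ens", "nf-ens", "nfm-ens",
        "nm-ens", "non-binary", "women",
    },
    "race_ethnicity": {
        "[black]", "[me-ens]", "[white]", "asian", "black",
        "hispanic", "indigenous", "latinx", "me-ens", "white",
    },
    "sexual_orientation": {
        "[straight]", "bisexual", "gay", "lesbian",
        "ms-ens", "straight", "two-spirit",
    },
}


def invalid_codes(record: dict) -> dict:
    """Return {field: value} for each coded field whose value is not in the codebook.

    Fields that are absent from the record are skipped, not reported.
    """
    return {
        field: record[field]
        for field, allowed in CODEBOOK.items()
        if field in record and record[field] not in allowed
    }
```

For example, a record still carrying the value "trio" in the ensemble type field would be flagged, which mirrors our decision to fold trios into "group".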


The cleaning process was completed together and consisted of eliminating acronyms in the artists’ names and countries by spelling them out, as well as listing the individual countries from within the United Kingdom (e.g., England, Ireland, Scotland) to be more specific. Eventually, we also made the decision to add details to the two fields named “Year”. One of them referred to the year a song ranked while the other referred to the year a song was released. Having two fields with the same name was confusing, and we determined that this would need to be changed in preparation for the upcoming data analysis.
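The two cleaning steps described above can be sketched in a few lines. This is an illustration only, with several assumptions: the replacement names "Year Ranked" and "Year Released" are hypothetical, the order of the two "Year" columns (ranked first, released second) is assumed, and the `COUNTRY_NAMES` mapping is an invented example of spelling out an acronym.

```python
def rename_duplicate_years(header, new_names=("Year Ranked", "Year Released")):
    """Give the two columns named "Year" distinct names, in order of appearance.

    Assumption: the first "Year" column is the ranking year and the second
    is the release year; swap new_names if the file is laid out differently.
    """
    positions = [i for i, name in enumerate(header) if name.strip() == "Year"]
    if len(positions) != 2:
        raise ValueError(f"expected exactly two 'Year' columns, found {len(positions)}")
    renamed = list(header)  # copy, so the caller's header is left untouched
    for pos, new in zip(positions, new_names):
        renamed[pos] = new
    return renamed


# Illustrative mapping only; the real list of acronyms is not given in the text.
COUNTRY_NAMES = {"USA": "United States", "US": "United States"}


def expand_country(value):
    """Spell out a known country acronym; leave any other value unchanged."""
    cleaned = value.strip()
    return COUNTRY_NAMES.get(cleaned, cleaned)
```

Splitting "United Kingdom" into its individual countries cannot be automated this way, since it depends on looking up each artist, which is why that step was done by hand.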


An example of the augmented dataset