Method of Research
Data Curation, Augmentation, and Cleaning
The original dataset was sourced from the Wikipedia entry on the prize, Palme d’Or (2021), and from the corresponding pages for each year of the competition from 1939 to 2021.
I used the following fields as a starting point for curating my final dataset:
My data curation goal for this project was to collect data on artists involved in the Palme d’Or competition to explore how the different facets of their identity show up and intersect. With this aim in mind, in addition to the Director(s) from my starting dataset, I chose to add Cast Lead(s) and Character Lead(s) to the artist roles that my data features. For each of these artist roles, I also augmented my data with different aspects of the artist’s identity, including the associated country, race, nationality, gender and transgender identity, sexual orientation, and disability.
In curating the data, I first asked how to categorize the director, cast, and character ensembles and chose to label them as either individuals or groups of more than one person. When determining the type of lead ensemble for a film, I looked for the individual or set of individuals who are responsible for, or essential to, the story. This could be challenging. For instance, in the case of Cast Leads, films sometimes give top billing to a celebrity featuring as a minor character, while the protagonist is played by an artist who is brand new to the industry and in some cases a non-professional. I did my best to identify a primary story or theme for each film and the character or characters who were essential to telling it.
Next, I defined how to collect and organize data on the different facets of identity of each film’s director(s), lead cast, and lead character(s). For the facets of country, race/ethnicity, gender identity, transgender identity, sexual orientation, and disability, I created definitions for the fields and codes in my controlled vocabulary. I drew on the Government of Canada (n.d.) categories of self-identification, the APA STYLE (n.d.) guide for Racial and Ethnic Identity, and the Government of Canada - Legislative Services Branch (2019) – Accessible Canada Act so that my categorizations were both widely recognized and appropriately descriptive.
For the facet of gender identity, I chose to categorize group ensembles as all men, all women, men and women, or nonbinary, so that the gender identity of group ensembles could be compared easily with that of individual ensembles.
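The grouping rule above can be sketched in a few lines of Python. This is only an illustration, not the procedure used to build the dataset. The function name `gender_ensemble_label` and the individual codes `man`, `woman`, and `nonbinary` are hypothetical. The rule that any nonbinary member places the whole group in the nonbinary category is an assumption, since the text does not say how mixed groups with nonbinary members were coded.

```python
from typing import Iterable


def gender_ensemble_label(genders: Iterable[str]) -> str:
    """Collapse the gender identities of an ensemble into one group label.

    The input codes ("man", "woman", "nonbinary") are hypothetical; the
    output labels are the four categories named in the text.
    """
    present = {g.strip().lower() for g in genders}
    if not present:
        raise ValueError("an ensemble needs at least one member")
    # Assumption: a group with any nonbinary member is coded as nonbinary.
    if "nonbinary" in present:
        return "nonbinary"
    if present == {"man"}:
        return "all men"
    if present == {"woman"}:
        return "all women"
    if present == {"man", "woman"}:
        return "men and women"
    raise ValueError(f"unrecognised gender codes: {sorted(present)}")
```

For example, `gender_ensemble_label(["woman", "man", "man"])` returns `"men and women"`, whatever the order or number of members.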
To categorize people by race, I used the labels:
- white – for individuals with European heritage,
- black – for individuals with Sub-Saharan African heritage, or
- person of colour (poc) – for individuals of Indigenous North/South American or Australian/Oceanic, Asian, Middle Eastern, or North African heritage.
I chose the labels white and black because my data is based on people from all over the world and, regardless of country, someone who has white (European) or black (Sub-Saharan African) heritage is likely to identify or be identified as white or black respectively. The third category, person of colour, is used to identify individuals who have a cultural or ethnic identity that is based in Indigenous North/South America or Australia/Oceania, Asia, the Middle East, or North Africa. For people who are mixed race, I used black to describe anyone with partial black heritage, and person of colour to describe anybody with primarily Indigenous, Asian, Middle Eastern, or North African heritage. Instead of ethnicity, I chose to identify people by nationality, including Indigenous nations if applicable, because globally individuals are likely to identify as being from at least one nation.
For the facets of identity other than gender (race, sexual orientation, and disability), I wanted to be able to show groups of people who did not all share the same value for a particular facet of identity. I used [,] to separate the names of individuals in groups as well as the corresponding facets of identity, and used specific label orders to identify group ensembles with similar compositions. For instance, [white],[black],[person of colour] denotes a group of artists of more than one race.
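One way to keep the label order consistent across groups is to generate the combined code from a fixed ordering. The sketch below is a hypothetical helper, not the tooling used for this dataset. `RACE_ORDER` and `encode_group_facet` are illustrative names. The sketch assumes each label appears once per group code, and it mirrors the bracketed form of the example above. If the spreadsheet stores bare labels, the brackets can be dropped.

```python
from typing import Sequence

# Fixed label order taken from the example in the text.
RACE_ORDER = ("white", "black", "person of colour")


def encode_group_facet(labels: Sequence[str],
                       order: Sequence[str] = RACE_ORDER) -> str:
    """Build one comma-separated code for a group from its members' labels.

    Labels are emitted in the fixed `order`, so two groups with the same
    composition always receive the same code.
    """
    unknown = sorted(set(labels) - set(order))
    if unknown:
        raise ValueError(f"labels not in controlled vocabulary: {unknown}")
    present = set(labels)
    # Assumption: each label is listed once, however many members share it.
    return ",".join(f"[{label}]" for label in order if label in present)
```

Because the output depends only on which labels are present, `encode_group_facet(["black", "white"])` and `encode_group_facet(["white", "black", "white"])` both yield `[white],[black]`, which makes groups with similar compositions easy to filter and count.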
Below is an excerpt from my Controlled Vocabulary spreadsheet showing the final set of codes used in my dataset.