Tidy Tuesday for July 22nd, 2025

MTA Permanent Art Catalog

Irene Morse

Setup and Introduction

In [1]:
# for dataframe wrangling
import pandas as pd
import numpy as np
In [25]:
mta_art = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-22/mta_art.csv')
station_lines = pd.read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-07-22/station_lines.csv')
In [26]:
mta_art.head()
Out[26]:
  | agency | station_name | line | artist | art_title | art_date | art_material | art_description | art_image_link
0 | NYCT | Clark St | 2,3 | Ray Ring | Clark Street Passage | 1987 | Terrazzo floor tile | The first model that Brooklyn-born artist Ray ... | https://new.mta.info/agency/arts-design/collec...
1 | NYCT | 125 St | 4,5,6 | Houston Conwill | The Open Secret | 1986 | Bronze - polychromed | The Open Secret, in the 125th Street and Lexin... | https://new.mta.info/agency/arts-design/collec...
2 | NYCT | Astor Pl | 6 | Milton Glaser | Untitled | 1986 | Porcelain enamel murals | Milton Glaser, best known for his work in grap... | https://new.mta.info/agency/arts-design/collec...
3 | NYCT | Kings Hwy | B,Q | Rhoda Andors | Kings Highway Hieroglyphs | 1987 | Porcelain Enamel Murals on Steel | The artist discusses her work: “If public art... | https://new.mta.info/agency/arts-design/collec...
4 | NYCT | Newkirk Av | B,Q | David Wilson | Transit Skylight | 1988 | Zinc-glazed Apolycarbonate skylight | The artist recalls, “About the same time that ... | https://new.mta.info/agency/arts-design/collec...
In [27]:
station_lines.head()
Out[27]:
agency station_name line
0 NYCT Clark St 2
1 NYCT Clark St 3
2 NYCT 125 St 4
3 NYCT 125 St 5
4 NYCT 125 St 6

This week's Tidy Tuesday dataset documents the art displayed within the MTA subway system in New York City. The primary dataset lists each art piece, the artist who completed it, the year it was completed, the material/medium, a description of the art, and a URL to an image of the art, as well as details about where within the subway system it is displayed. The second dataset spells out the station and line details in a tidier fashion, with one row per station and line. (For my purposes today, I will not utilize the second dataset.)
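To make the relationship between the two tables concrete, here is a minimal sketch on made-up rows rather than the real CSVs. It assumes the `line` column holds comma-separated line names, as the `head()` output suggests, and shows one way a one-row-per-line layout like that of `station_lines` could be derived; the names `toy_art` and `toy_lines` are illustrative only.

```python
import pandas as pd

# Toy rows mimicking the first two records of mta_art (not the full dataset)
toy_art = pd.DataFrame({
    "agency": ["NYCT", "NYCT"],
    "station_name": ["Clark St", "125 St"],
    "line": ["2,3", "4,5,6"],
})

# Split the comma-separated string into a list, then give each line its own row
toy_lines = (
    toy_art.assign(line=toy_art["line"].str.split(","))
           .explode("line")
           .reset_index(drop=True)
)
print(toy_lines)
```

Each station now appears once per line that serves it, which is the shape shown in the `station_lines.head()` output.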

Looking through this very interesting data, I'm intrigued by the idea of gender representation within the art displayed by the MTA. The arts were historically dominated by men with a more recent shift towards female and gender diverse artists. I want to investigate whether or not the art displayed by the MTA tends to be dominated by male artists or whether it is more-or-less equal between the genders.

Step 1: Guess Gender from Artists' Names

In [28]:
#!pip install gender-guesser
import gender_guesser.detector as gg

To investigate the artists' genders, I first need to determine (or guess!) what their genders actually are. The dataset contains only their names, but these can provide a reasonable guess as to what their genders are. To accomplish this, I have installed the gender-guesser package.
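Because the guess rests entirely on the first token of the `artist` string, it can help to check what that token is before passing it to the detector. The sketch below is plain Python and makes no calls to gender-guesser; the helper `first_name` is hypothetical, and stripping parenthetical notes is my own assumption about how collaboration entries might be handled.

```python
import re

def first_name(artist):
    """Return the first whitespace-separated token of an artist string.

    Hypothetical helper: drops any parenthetical note (such as
    "(in collaboration with ...)") before splitting, and returns an
    empty string when nothing usable is left.
    """
    if not isinstance(artist, str):
        return ""
    # Remove complete "(...)" groups
    cleaned = re.sub(r"\(.*?\)", "", artist).strip()
    # Also cut at an unclosed "(" left over from a truncated string
    cleaned = cleaned.split("(")[0].strip()
    parts = cleaned.split()
    return parts[0] if parts else ""

for name in ["Ray Ring", "Martha Jackson-Jarvis", "Michele Oka Doner"]:
    print(name, "->", first_name(name))
```

The returned token is what would then be handed to `get_gender`. Missing or empty names come back as an empty string instead of raising an error.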

In [29]:
d = gg.Detector()
# iterate through df, split artist name and extract first name only, then guess gender based on first name
mta_art['gender'] = [d.get_gender(mta_art['artist'].iloc[i].split()[0]) for i in range(0, len(mta_art))]
In [30]:
mta_art[['artist','gender']].head(20)
Out[30]:
artist gender
0 Ray Ring mostly_male
1 Houston Conwill male
2 Milton Glaser male
3 Rhoda Andors female
4 David Wilson male
5 Steve Wood male
6 Valerie Jaudon female
7 Matt Mullican male
8 Nitza Tufi–o (in collaboration with Grosvenor ... unknown
9 Arthur Gonzalez male
10 Arthur Gonzalez male
11 Dan Sinclair male
12 Harry Roseman male
13 Kathleen McCarthy female
14 Kathleen McCarthy female
15 Kathleen McCarthy female
16 Nitza Tufi–o unknown
17 Alison Saar female
18 Martha Jackson-Jarvis female
19 Michele Oka Doner female

It looks like the gender-guesser package has done a reasonably good job guessing the artists' genders, though it is unsure about certain less common names, such as "Nitza." Just for simplicity's sake, I'd like to replace "mostly_female" and "mostly_male" entries with just "female" and "male." Additionally, the gender-guesser package uses the label "andy" for androgynous names, and I'd like to replace that with "unknown."
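The relabeling described here can also be written as a single dictionary passed to `Series.replace`. This is only a sketch on a toy Series; the names `labels`, `collapse`, and `simplified` are illustrative.

```python
import pandas as pd

# Toy labels covering the values discussed in the text
labels = pd.Series(
    ["male", "mostly_male", "female", "mostly_female", "andy", "unknown"]
)

# One mapping from each detailed label to its simplified label
collapse = {
    "mostly_male": "male",
    "mostly_female": "female",
    "andy": "unknown",
}

simplified = labels.replace(collapse)
print(simplified.value_counts())
```

Keeping the mapping in one dictionary makes it easy to see at a glance which detailed labels were merged.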

In [31]:
mta_art['gender'] = mta_art['gender'].replace('mostly_female', 'female')
mta_art['gender'] = mta_art['gender'].replace('mostly_male', 'male')
mta_art['gender'] = mta_art['gender'].replace('andy', 'unknown')
mta_art['gender'].value_counts()
Out[31]:
count
gender
male 178
female 153
unknown 50

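So there are a total of 178 pieces by artists guessed to be male and 153 by artists guessed to be female. (These are counts of rows, i.e. art pieces, so an artist with several pieces, such as Kathleen McCarthy, is counted more than once.) Given that there are also 50 pieces whose artist's gender is unknown, it's hard to say anything definitive about gender representation based on the analysis so far.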
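To put those counts in proportion, here is a small worked calculation in plain Python. The three numbers are copied from the `value_counts()` output above; the variable names are my own.

```python
# Counts copied from the value_counts() output above
counts = {"male": 178, "female": 153, "unknown": 50}

total = sum(counts.values())
known = counts["male"] + counts["female"]

# Share of every row, and share among rows with a guessed gender
share_of_all = {g: n / total for g, n in counts.items()}
share_of_known = {g: counts[g] / known for g in ("male", "female")}

print(f"total rows: {total}")
for g, s in share_of_all.items():
    print(f"{g:>8}: {s:.1%} of all rows")
for g, s in share_of_known.items():
    print(f"{g:>8}: {s:.1%} of rows with a guessed gender")
```

The male-female gap is 25 rows, half the size of the unknown group, so the ordering could in principle flip depending on how the unknowns resolve.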

Step 2: Visualize Gender Distribution Over Time

In [32]:
import matplotlib.pyplot as plt
import matplotlib as mpl
from cycler import cycler
In [33]:
# convert gender to category type variable
mta_art['gender'] = mta_art['gender'].astype("category")
gender_by_year = mta_art.groupby(mta_art['art_date'])['gender'].value_counts()
gender_by_year = gender_by_year.reset_index()
In [34]:
gender_by_year.head(9)
Out[34]:
art_date gender count
0 1980 male 1
1 1980 female 0
2 1980 unknown 0
3 1986 male 3
4 1986 female 0
5 1986 unknown 0
6 1987 female 2
7 1987 male 1
8 1987 unknown 0

I would now like to visualize the artists' gender distribution by year. I have a hunch that male artists will be overrepresented in the earlier years of the MTA program (e.g. the 1980s) but that this may shift in favor of female artists over time. To create this visualization, I have first aggregated the gender counts by year.
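As an aside, the same year-by-gender aggregation can be produced in a wide layout, with one column per gender. The sketch below uses toy rows chosen to reproduce the 1980, 1986, and 1987 counts shown above; `toy` and `wide` are illustrative names, and this is an alternative shape rather than the approach used in this notebook.

```python
import pandas as pd

# Toy data standing in for mta_art[['art_date', 'gender']]
toy = pd.DataFrame({
    "art_date": [1980, 1986, 1986, 1986, 1987, 1987, 1987],
    "gender": ["male", "male", "male", "male", "female", "female", "male"],
})

# Rows are years, columns are genders; missing combinations become 0
wide = toy.groupby(["art_date", "gender"]).size().unstack(fill_value=0)
print(wide)
```

A wide table like this has one line's worth of data per column, which can be convenient for plotting each gender as its own series.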

In [35]:
# adjust colors of plots
mpl.rcParams['axes.prop_cycle'] = cycler(color=['pink', 'lightblue', 'lightgray'])
In [36]:
plt.plot(gender_by_year.loc[gender_by_year['gender']=='female', 'art_date'],
         gender_by_year.loc[gender_by_year['gender']=='female', 'count'],
         label='Female Artists')
plt.plot(gender_by_year.loc[gender_by_year['gender']=='male', 'art_date'],
         gender_by_year.loc[gender_by_year['gender']=='male', 'count'],
         label='Male Artists')
plt.plot(gender_by_year.loc[gender_by_year['gender']=='unknown', 'art_date'],
         gender_by_year.loc[gender_by_year['gender']=='unknown', 'count'],
         label='Gender Unknown')
plt.legend()
plt.title("MTA Artists' Gender Distribution Over Time")
Out[36]:
Text(0.5, 1.0, "MTA Artists' Gender Distribution Over Time")
[Figure: line plot of yearly counts for female, male, and gender-unknown artists, titled "MTA Artists' Gender Distribution Over Time"]

The over-time plot is noisy but informative. It is true that in the very early years of the MTA program, male artists dominate. However, as early as 1990 we can see a much more equitable distribution of both male and female artists, including a few years where there are actually more female artists put on display. There also appears to be a short period in the early 2000s when male artists again pretty noticeably dominate the MTA displays.

In [37]:
# start with an object-dtype column so string labels can be assigned without a dtype warning
mta_art['art_decade'] = pd.Series(np.nan, index=mta_art.index, dtype=object)
# label each piece with the decade its art_date falls in
mta_art.loc[(mta_art['art_date']>=1980) & (mta_art['art_date']<1990), 'art_decade'] = "1980s"
mta_art.loc[(mta_art['art_date']>=1990) & (mta_art['art_date']<2000), 'art_decade'] = "1990s"
mta_art.loc[(mta_art['art_date']>=2000) & (mta_art['art_date']<2010), 'art_decade'] = "2000s"
mta_art.loc[(mta_art['art_date']>=2010) & (mta_art['art_date']<2020), 'art_decade'] = "2010s"
mta_art.loc[(mta_art['art_date']>=2020) & (mta_art['art_date']<2030), 'art_decade'] = "2020s"
In [38]:
gender_by_decade = mta_art.groupby(mta_art['art_decade'])['gender'].value_counts()
gender_by_decade = gender_by_decade.reset_index()

I'm curious to see if my conclusions change if I visualize this data slightly differently. Instead of by year, I have now aggregated the data by decade, and I will utilize a bar graph instead of a line graph.
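The decade label can also be computed arithmetically instead of with one range test per decade. A minimal sketch on toy years follows; the helper `decade_label` is hypothetical, and it assumes years are numeric with missing values left as missing.

```python
import pandas as pd

def decade_label(year):
    """Map a year such as 1987 to the label '1980s'; missing years stay missing."""
    if pd.isna(year):
        return None
    return f"{int(year) // 10 * 10}s"

# Toy years, including one missing value
toy_years = pd.Series([1980, 1989, 1990, 2004, 2019, 2021, None])
toy_decades = toy_years.map(decade_label)
print(toy_decades.tolist())
```

Integer division by 10 drops the final digit of the year, so this needs no extra line when a new decade appears in the data.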

In [39]:
# Code taken and modified from:
# https://python-graph-gallery.com/grouped-barplot-with-the-total-of-each-group-represented-as-a-grey-rectangle/

pivot_df = gender_by_decade.pivot(index='art_decade', columns='gender', values='count')

fig, ax = plt.subplots(figsize=(7, 5))

bar_width = 0.25
x = np.arange(len(pivot_df.index))
for i, sub_cat in enumerate(pivot_df.columns):
    ax.bar(x + i * bar_width, pivot_df[sub_cat],
           width=bar_width, label=sub_cat)

ax.set_xlabel("Decade")
ax.set_ylabel("Count")
ax.set_title("Gender of MTA Artists by Decade", loc="left")
ax.set_xticks(x + bar_width / 2)
ax.set_xticklabels(pivot_df.index)
ax.legend(title="Gender of Artist")

plt.show()
[Figure: grouped bar chart titled "Gender of MTA Artists by Decade"; x-axis "Decade", y-axis "Count"]

The bar graph is slightly clearer, in my opinion, than the line graph. It also tells a less optimistic story, as it shows male artists dominating the MTA displays through the 1990s and 2000s. A slight shift toward female artists occurs in the 2010s and 2020s, but I would hesitate to say that female artists "dominate" the MTA displays in those years; instead it is simply a more equitable distribution.
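Since decades differ in how many pieces were installed, it may be easier to judge "dominance" from within-decade shares than from raw counts. The sketch below uses made-up rows (not the real counts) and `pd.crosstab` with `normalize="index"`; dropping the "unknown" rows first is my own assumption about how one might want to compare the two guessed genders.

```python
import pandas as pd

# Made-up rows for illustration only; these are NOT the real MTA counts
toy = pd.DataFrame({
    "art_decade": ["1990s"] * 4 + ["2010s"] * 6,
    "gender": ["male", "male", "male", "female",
               "female", "female", "female", "male", "male", "unknown"],
})

# Keep only rows with a guessed gender, then compute row-wise shares
known = toy[toy["gender"] != "unknown"]
shares = pd.crosstab(known["art_decade"], known["gender"], normalize="index")
print(shares)
```

Each row of `shares` sums to 1, so a value near 0.5 indicates a roughly even split in that decade.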

Step 3: Visualize Gender Distribution Over Space

Next I am curious to see if particular subway stations contain more male or female artists on display. I would first like to exclude subway stations that don't have very much art on display. This is partly because there are a LOT of subway stations, and it will be hard to include all of them on a readable visualization. Therefore I will calculate the average number of art pieces per subway station and will use this to decide on a cut point for which stations I want to include in my visualization.

In [40]:
station_totals = mta_art['station_name'].value_counts()
station_totals = station_totals.reset_index()
print("The average number of art pieces per station is", station_totals['count'].mean())
The average number of art pieces per station is 1.233009708737864
In [41]:
station_totals.head(20)
Out[41]:
station_name count
0 Times Sq-42 St 7
1 86 St 7
2 34 St-Herald Sq 4
3 34 St-Penn Station 4
4 125 St 4
5 Grand Central-42 St 4
6 Grand Central Terminal 3
7 72 St 3
8 Bay Pkwy 3
9 23 St 3
10 50 St 3
11 18 Av 3
12 Avenue U 3
13 Harlem-125 St 3
14 96 St 3
15 Bellmore 2
16 5 Av/53 St 2
17 33 St 2
18 Broadway 2
19 Canal St 2
In [42]:
stations_to_drop = list(station_totals.loc[station_totals['count']<3, "station_name"])
mta_art = mta_art.set_index('station_name')
mta_art_subset = mta_art.drop(stations_to_drop)
mta_art_subset = mta_art_subset.reset_index()

Based on the average of 1.2 pieces of art per station and a brief peek at the aggregated data, I have decided to exclude any stations with fewer than 3 pieces of art. I will now aggregate the data by station and generate a new bar graph.
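The same cutoff can be expressed with `isin`, which filters rows without changing the index of the original dataframe. This is a sketch on toy data; the station names, `min_pieces`, and the other variable names are illustrative.

```python
import pandas as pd

# Toy data: "A St" has 3 pieces, "B St" has 2, "C St" has 1
toy = pd.DataFrame({
    "station_name": ["A St", "A St", "A St", "B St", "B St", "C St"],
    "artist": ["a1", "a2", "a3", "b1", "b2", "c1"],
})

min_pieces = 3  # keep stations with at least this many pieces

counts = toy["station_name"].value_counts()
busy_stations = counts[counts >= min_pieces].index

# Boolean mask keeps only rows whose station passed the cutoff
toy_subset = toy[toy["station_name"].isin(busy_stations)].reset_index(drop=True)
print(toy_subset)
```

Unlike the `set_index` approach in the cell above, `toy` itself is left unmodified here, so `station_name` stays available as a regular column in later cells.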

In [46]:
gender_by_station = mta_art_subset.groupby(mta_art_subset['station_name'])['gender'].value_counts()
gender_by_station = gender_by_station.reset_index()
In [47]:
# Code taken and modified from:
# https://python-graph-gallery.com/grouped-barplot-with-the-total-of-each-group-represented-as-a-grey-rectangle/

pivot_df = gender_by_station.pivot(index='station_name', columns='gender', values='count')

fig, ax = plt.subplots(figsize=(7, 5))

bar_width = 0.25
x = np.arange(len(pivot_df.index))
for i, sub_cat in enumerate(pivot_df.columns):
    ax.barh(x + i * bar_width, pivot_df[sub_cat],
           height=bar_width, label=sub_cat)

ax.set_xlabel("Count")
ax.set_ylabel("Subway Station")
ax.set_title("Gender of MTA Artists by Subway Station", loc="left")
ax.set_yticks(x + bar_width / 2)
ax.set_yticklabels(pivot_df.index)
#ax.tick_params(labelsize=8)
ax.legend(title="Gender of Artist")

plt.show()
[Figure: horizontal grouped bar chart titled "Gender of MTA Artists by Subway Station"; x-axis "Count", y-axis "Subway Station"]

This final data visualization shows an additional aspect of gender representation within the MTA art displays. Of the 15 stations that have the most art, 7 are dominated by male artists, 6 are dominated by female artists, and 2 have an equal number of male and female artists. This seems like a fairly equitable distribution! However, it is worth noting that the Times Sq-42 St station is heavily dominated by male artists. I suspect that this is one of the MTA's most heavily utilized stations, and a quick look at the MTA's ridership data confirms this suspicion. This suggests that MTA passengers may look at art created by men somewhat more frequently than at art created by women, and this may be especially true for tourists and others spending most of their time in the Manhattan area.
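A tally like the one above can be computed instead of read off the chart. The sketch below runs on a made-up three-station pivot table (not the real station counts); the helper `leader` and the station names are hypothetical.

```python
import pandas as pd

# Made-up pivot table for illustration; NOT the real station counts
toy_pivot = pd.DataFrame(
    {"female": [1, 3, 2], "male": [5, 0, 2], "unknown": [1, 0, 0]},
    index=["Station X", "Station Y", "Station Z"],
)

def leader(row):
    """Label a station by whichever guessed gender has more pieces."""
    if row["male"] > row["female"]:
        return "male"
    if row["female"] > row["male"]:
        return "female"
    return "tie"

leaders = toy_pivot.apply(leader, axis=1)
tally = leaders.value_counts()
print(leaders)
print(tally)
```

Note that this comparison ignores the "unknown" column, so a station with many unknown-gender pieces could still be labeled as led by one gender.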

MTA_ridership.png

Conclusion

Gender representation among artists displayed within the MTA system is a nuanced topic. Though male artists dominated the early years of the program, the MTA has course-corrected over time, with more and more female artists included. Male artists dominate certain subway stations, while female artists dominate others. However, it is worth noting that the top 3 most utilized subway stations (Times Sq-42 St, Grand Central-42 St, and 34 St-Herald Sq) are all dominated by male artists, indicating that the average MTA rider is still slightly more likely to be exposed to art created by men than by women.