How I calculated the recent Blue Key demographics
I invite Blue Key leadership to take a look at their applicants and admits, and see for themselves what the breakdown is
Over the summer, a friend from high school asked me how she could get into Blue Key. This was far from the first time. Every year, an entirely new class of students wants to know the best way to get into what is referred to as the most prestigious organization at UF.
The following will attempt to explain how I went about answering that question from a statistical perspective. Unfortunately, there is almost no public information about Blue Key admissions: no criteria, no count of applicants, no average GPA. The only information Blue Key makes public is the names of admitted individuals. Beyond that, there are various articles containing anecdotes of qualified individuals supposedly pushed out of admission, and a defamation lawsuit in which Blue Key was involved in falsely making a student body presidential candidate out to be a child molester.
I have never given much weight to anecdotes, and I was disappointed by the lack of transparency about the process and its results, so I wanted to answer that original question as well as I could.
The breakdown of the data describing Blue Key demographics at the University of Florida falls into two parts: first, how the dataset was created; second, how the analysis was done in R.
Over the summer, eight UF students (from nearly every large organization on campus, from HSA, to BSU, to Cicerones, to Blue Key, to FLC) who have personal relationships with and knowledge of Blue Key admits collaborated to create a dataset with five main variables: Name, Year Admitted, Greek Affiliation, Race, and Gender. The Name and Year Admitted columns were created using the web scraping capabilities of the programming language R, drawing from the Blue Key website, which lists the name and year of each entrant. To account for multiracial entrants, the dataset includes one boolean column for each race option. From there, we went through the list and filled in the remaining fields. For a number of the Greek Affiliation values, LinkedIn was used (e.g., if someone was the president of a fraternity, they must also be a member).
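To make the scraping step concrete, here is a minimal sketch of the idea in Python (one of the languages I offer to translate the R script into). The HTML sample and the markup pattern are hypothetical, since the real page structure is not shown here; the actual script does this with R's scraping tools.

```python
import re

# Hypothetical sample of the markup on a membership page; the real
# Blue Key page structure is an assumption, and this only illustrates
# the approach used in the R script.
sample_html = """
<ul class="members">
  <li>Jane Doe (2021)</li>
  <li>John Smith (2020)</li>
</ul>
"""

# Illustrative race categories, not the dataset's actual list.
RACES = ["White", "Black", "Hispanic", "Asian", "Other"]

def scrape_members(html: str) -> list[dict]:
    """Extract (name, year) pairs and attach one boolean column per race."""
    records = []
    for name, year in re.findall(r"<li>([^(<]+)\((\d{4})\)</li>", html):
        row = {"Name": name.strip(), "YearAdmitted": int(year)}
        # Boolean race columns accommodate multiracial entrants:
        # contributors later flip one or more of these to True per person.
        row.update({race: False for race in RACES})
        records.append(row)
    return records

members = scrape_members(sample_html)
print(members[0])
```

The remaining columns (Greek Affiliation, Race, Gender) start out empty and are filled in by hand, as described above.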
Around one percent of the information is missing, due to gaps in the contributors' knowledge and an unwillingness to rely solely on individuals' social media profiles for demographic information.
At the moment, all eight students wish to remain anonymous, so they will not be identified in any capacity.
Because the analysis is written in R, it is fully reproducible, and the open methodology makes checking the dataset extraordinarily easy: if the dataset is wrong, all one would have to do is run the script below on whatever is believed to be the correct dataset. I have included notes and instructions for reference, and I am more than willing to walk interested individuals through the code itself, or to translate it into Python or SQL if desired. The code is also housed online. For readers who do not know R, Python, or SQL, I will attempt to explain the process below.
The first issue that came up was what to do with Ph.D., medical, and law school students. Because most are not allowed to participate in Greek life, including them in the analysis would be inaccurate, so they were excluded. Beyond that, basic functions get us to any analysis we desire: each group's percentage of Blue Key admissions is compared to its percentage of the general student population.
As I described before, better data would compare admits to applicants, but that is not available. Either way, the claim that people of color and women are less likely to apply, or that they are less qualified, raises far more serious questions than it answers. The same argument is repeated heavily in discussions of diversity in STEM fields ("women aren't discriminated against in STEM, they just don't apply as much"), and it clearly raises larger problems than it solves.
Confidence in the data is highest for the gender breakdown: knowledge errors there are highly unlikely and would arise only if an individual identifies as a gender other than the one known to the contributors. Confidence in Greek affiliation is lower but still high, since individuals at UF are extraordinarily public with this information via shirts, Instagram bios, and profile pictures. Confidence is lowest for the racial breakdown, as individuals are less likely to know a friend's race than any other variable. However, the disparities tend to run in the double digits, and while there may be errors, it is highly unlikely that they would exceed 10 percentage points; for that to happen, nearly 50 entries would have to be incorrect.
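The arithmetic behind that error bound is simple. The dataset size of roughly 500 entries is my reading of the "nearly 50 entries for a 10-point swing" figure above, not a number stated directly:

```python
# Sanity check of the error-bound claim. The ~500-entry dataset size is
# an assumption inferred from "nearly 50 entries for a 10-point swing".
n_entries = 500
disparity_points = 0.10  # a 10-percentage-point shift in a group's share

# Each misclassified entry moves a group's share by 1/n_entries, so the
# number of errors needed to move it by disparity_points is:
errors_needed = disparity_points * n_entries
print(errors_needed)  # 50.0
```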
I am aware that these claims will not be taken lightly. That is exactly why I have made my steps as open as possible: if I am wrong, it has never been easier to check.
I freely invite Blue Key leadership to take a look at their applicants and admits, and see for themselves what the breakdown is. Feel free to reach out, and I will host a programming workshop so problems like these can be resolved more easily.