“How do I get started?” While not asked explicitly, this question summarizes the highest ranked inquiries we received from attendees of the “R in Sport Analytics” discussion we hosted in mid-June. My contribution to RStudio’s ongoing Enterprise Community Meetups highlighted the importance of data-driven frameworks and how they should influence decisions on the field, court, ice or pitch in the same way they impact decisions in the boardroom. Like many of the attendees, I was also unsure of the path to an NFL team as an analytics professional. This post aims to answer questions from the meetup while also providing recommendations for aspiring sports analytics professionals.
As a brief introduction, the following bullet points summarize roughly the past 20 years of my academic and professional experiences.
- Completed undergraduate degrees in Mathematics and Spanish from Monmouth College and an MBA from the University of Iowa
- Played and coached football collegiately, chased a professional football playing career (think minor-league baseball equivalent but in football)
- Held analyst, product management and leadership positions for multiple sports analytics organizations
- Founded and led analytics departments for the Chicago Bears and Denver Broncos (member of Super Bowl 50 World Championship team)
- Recently started in Customer Success at RStudio and am extremely fortunate to collaborate with data science teams across the globe
That is more than enough about me. Now, about those questions…
Is TeamWork Online really the primary avenue into professional sports jobs, particularly the NFL? It seems like a black hole. Better suggestions?
First, for those who are not familiar, TeamWork Online is a recruiting platform/job board for sports teams, entities and leagues. I addressed this briefly during the meetup, but the best advice I can provide is the following: “Do something.” Outside of playing and coaching for several years after completing my undergraduate degree, my foray into sports analytics started as an independent study in graduate school. I had no real research experience, but my professors did. Thanks to Jeff Ohlmann at the University of Iowa, I had a forum that allowed me to combine years of experience in sports with the problem-solving and technical skills I had developed in the classroom. There is simply no substitute for identifying a problem that warrants further investigation, framing questions, developing hypotheses, collecting/wrangling/analyzing/modeling/visualizing data and then presenting your work. As someone who has hired for technical roles and assembled a staff, one of the first questions I would almost always ask was “What have you done?” Do something.
What are the best opportunities to widen options (job prospects, career) and network if based in a foreign country with few opportunities in sports analytics?
Time zone differences create scheduling challenges, but there are essentially no boundaries thanks to modern technology. Look no further than the 2020 NFL Big Data Bowl, where Philipp Singer and Dmitry Gordeev, two data scientists based in Austria, joined forces to win the Open Kaggle Competition. As a former small college football player who competed for roster spots and playing time with players from Power 5 schools, I always used to tell myself “If you can play, then you can play. Where you come from does not and should not matter.” The same concept applies here. If you are willing to enter open competitions and talented enough as a data scientist to perform well in them, where you come from does not and should not matter.
Additionally, as I mentioned previously, I am extremely fortunate to work with data science teams across the globe. I am approaching the end of month 4 at RStudio, but I have already connected with data science teams on 4 continents. One of my early takeaways from my time at RStudio is that good data science work is good data science work regardless of the industry. Therefore, do not be discouraged if your day job finds you analyzing data that is completely unrelated to sports. Leverage the data skills you develop outside of sports to solve meaningful problems in sports.
The priority at the start should not be memorizing the exact keystrokes of a particular language to simply pass a final exam or advance to the next class in the sequence of a program. Rather, the focus at the start should be learning fundamental programming concepts and data structures. The scripting language provides a syntax for applying higher-order thought. Conditional statements, loops and an appreciation for how rows and columns fit together are core tenets that apply to data analysis in either R or Python. I understood columns and lookups well before I understood vectors and joins, respectively. I also learned R first, and if someone asked me to complete a data analysis project in an hour, hello `library(tidyverse)`. However, when I have needed to work in Python, I adapted quickly because I understand the fundamentals outlined above.
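To make the point about fundamentals concrete, here is a minimal sketch in plain Python (standard library only) showing that a spreadsheet-style lookup and a left join are the same idea: loop over rows, look up a key, and use a conditional to handle rows with no match. The `plays` and `players` tables and the `left_join` helper are hypothetical, invented for illustration; in practice you would reach for `dplyr::left_join()` in R or `pandas.merge()` in Python.

```python
# Hypothetical example data: each row is a dict, each key is a column.
plays = [
    {"player_id": 1, "yards": 12},
    {"player_id": 2, "yards": -3},
    {"player_id": 3, "yards": 7},
]
players = [
    {"player_id": 1, "name": "Player A", "position": "RB"},
    {"player_id": 2, "name": "Player B", "position": "WR"},
]


def left_join(left, right, key):
    """Hypothetical helper: keep every row of `left` and attach the
    columns of the matching `right` row by `key`.

    Rows in `left` with no match get None for the right-hand columns,
    mirroring a left join. Assumes every row in `right` has the same columns.
    """
    # Build the lookup table (the spreadsheet "lookup" step): key value -> row.
    lookup = {row[key]: row for row in right}
    right_cols = [c for c in (right[0] if right else {}) if c != key]

    joined = []
    for row in left:                    # loop over the rows of the left table
        match = lookup.get(row[key])    # look up the key; None if absent
        new_row = dict(row)             # copy so the input is not modified
        for col in right_cols:
            # Conditional: take the matched value, or None when nothing matched.
            new_row[col] = match[col] if match is not None else None
        joined.append(new_row)
    return joined


result = left_join(plays, players, "player_id")
for row in result:
    print(row)
```

Player 3 has no entry in `players`, so that row keeps its yards but gets `None` for name and position, which is exactly the behavior you would expect from a left join in either language.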
Furthermore, my experiences learning a second language have heavily influenced my thoughts toward learning additional programming languages. My knowledge of Portuguese and Italian is virtually non-existent, but I have a feeling my experiences studying, reading and writing Spanish would greatly benefit my ability to learn Portuguese or Italian because of their similarities as Romance languages. The same concept applies to learning R or Python first: a solid grasp of one makes the other easier to pick up.
In summary, emphasize fundamentals and structure over memorizing code at the start; a solid foundation in either R or Python will improve future learning.
In alphabetical order so as to avoid any hint of prioritization…
- Computer Science
To be very clear, proficiency in every subject is not a prerequisite for early-career employment opportunities. Continued growth in these areas summarizes my own professional development. I highlighted them here because each discipline on the list above contributes significantly to the success or failure of data science teams within organizations. Technical expertise with poor communication creates confusion and potentially even doubt in the minds of business leaders who are tasked with organizational decision-making. Conversely, persuasive campaigns that lack technical substance create high expectations that fall flat. Overpromising and underdelivering is a recipe for disaster, and it occurs when data science teams fail to clearly articulate the performance of a model or the limitations of a research study.
“Why science?” Two reasons come to mind. First, influenced heavily by the work of David Epstein, I believe that individuals who think critically and understand complex problems such as colliding particles bring a unique perspective to the investigation of player movements and interactions in sports. Second, I strongly agree with Adam Grant about the value of thinking like a scientist.
Thanks for your time and thoughtful review if you have reached this section of the post. This is admittedly my first post in a public forum after years of working for teams, but I don’t think it will be my last. Please share your comments and feedback, and here are a few final items that also warrant your attention.
- You can find the full recording of the meetup here.
- To connect with other attendees or ask follow-up questions, please join the #chat-sports_analytics Slack channel on the R4DS Online Learning Community Slack.
- Follow me on Twitter where I post thoughts on sports, data science, business and the intersection of those topics. If you have any questions, feel free to mention @MitchTanney. Increasing my activity on Twitter is a goal that I expressed during the meetup, and I look forward to hearing from you.