On this episode of The Artists of Data Science, we get a chance to hear from Brandeis Marshall, a computer scientist who excels at breaking down difficult concepts into easily digestible pieces. She is passionate about educating people on data, as well as understanding the impact data has on race, gender, and socio-economic disparities. She is the CEO of DataEdx, a company that focuses on making data science accessible to all professionals.
She shares her perspective on how data impacts communities, how to promote diversity and inclusion in the data science space, and the importance of documenting your process. It was an absolute pleasure to hear her perspective, and I believe her message will help broaden the data science field.
Some notable segments from the show
[8:29] How data impacts marginalized communities
[13:29] From Brandeis’s perspective, what separates great data scientists from good ones
[14:48] Understanding how data is packaged, and ways to break it down into bite-size portions
[19:30] The impact of live tweeting on social movements
[30:09] Discussing inclusiveness in the data workspace
[39:46] How to be gritty and break away from negative thoughts
Brandeis Marshall’s journey
Brandeis began her data science journey when she entered graduate school. She had always been interested in user experience (UX), but when she found out that the well-known professor who taught UX was retiring, she decided to switch paths.
At the time, she happened to be taking a course that involved data, which sparked her interest in the field. She then took a databases course that dovetailed into information retrieval, which became the focus of her PhD dissertation.
[5:10] “Oh, wow. Where do I start? Okay, I'll start easily with entering graduate school. And when I entered graduate school, I actually was very interested in UX.
When I got there, the person who was the UX professor, well, the well-known individual, was actually retiring. And I was like, well, I need to find something else to do. And then by happenstance I fell into -
I'm looking at some data that was part of a course. And I was like, this is interesting. This is really interesting. I like the structure of the organization. And then I started thinking, well, everything needs the structure and organization. So I've actually been a data head since 2000. I've been thinking data is cool from way back then. Took a data science, databases course that dovetailed into information retrieval, and that wound up being what I concentrated my PhD dissertation on. And so for me, data has been part of my entire career. And in fact, applying data, and how data is applied in different spaces, has been something I've been doing since I can remember as part of my graduate work. So what I see as far as getting into the field, it's a matter of where do you know the origins of the data? Are you interested in that part? And of course, moving forward and trying to figure out what the dataset is, figuring out, how do you, you know, clean up the data? How do you figure out how to analyze the data? So all of those parts are interesting to me. And that's kind of how I got into it. It was happenstance by luck, by interest, by passion. But I already was a computer scientist, so I always say I'm computer science first, data scientist second.”
Where is the field headed in 2-5 years?
In the next five years, Brandeis feels that we’re going to look at the gender, race, and class disparities that happen inside data. There is a concern about who is able to participate and have access to data, as well as who is represented in the data.
Over and over again, we are seeing marginalized communities that are disproportionately excluded from, or overrepresented in, certain datasets.
We need to assess the impact this has on our communities, and then develop and enforce policies in order to make sure that data becomes part of every facet of our society, not just STEM.
[8:29] “Yeah. So the next sort of five years is going to be one where, and I actually tweeted about this early, at the top of 2020, we're going to be looking a lot at the gender, race, and class disparities that happen inside of data and how data is used. We're going to be concerned about who is participating, who has access, how inclusion strategies are working or not working, as well as who's represented in the data.
We're seeing it over and over again where marginalized communities are disproportionately not included or over saturated inside of certain datasets.
And how do we shift the conversation so that all people are included in the data conversation? So the next two to five years is going to be about bringing aboard the understanding of the importance and the power of data and how that impacts communities differently. And, of course, developing policies, enforcing those policies, whatever regulations at the local, federal, national level in order to make sure that data becomes part of our known fabric inside of every facet, from curriculum at K through 12 through those that are currently in the workforce, in all workforces, not just STEM.
Don't get me wrong, I definitely love my STEM people, but it's in all workforces, across the board. Everyone needs to know more about data.”
What will separate great data scientists from the rest of them?
What will really set great data scientists apart is an open mind paired with very good documentation of their processes. They consistently learn from quality sources and can discern which sources are of quality and which are not.
Great data scientists are those that do not try to take on all the responsibilities for the whole process. They understand the importance of teamwork, and know when to delegate to someone who is a better fit to answer certain questions. They understand their expertise.
[13:29] “What's really going to set those apart is going to be those that have open minds with very good documentation.
Those that are consistently learning from sources of quality. And that means you're going to hit some bumpy road, you're going to hit some you know, you might get some disinformation, you might get some misinformation, but then you're going to learn from it.
And then you're going to now be able to discern what is quality and what isn't quality. You're then going to be able to talk about, oh, I know this individual is working in this space. That's not my expertise. So don't ask me. Ask this expert. And I think that's going to be extremely important for data scientists to not try to take on all the responsibility for the whole process. This is one where it's teamwork. So you have to be able to share out where other people are better talent and a better fit to answer those questions.”
Key takeaways from the episode
[40:09] Gritty people are those that choose to look at difficult situations with a particular lens. They understand the positive, the negative, and realize where work needs to be done. Then, they do the work.
Diversity and inclusion
[31:07] To be inclusive doesn't mean that you are pushing anybody away. It means that you are seeking out those who have an open mind. It means being someone who is willing to listen to others and does not suppress marginalized groups. It’s about safety. Ask, “Can I share with people, and can people share with me?”
Impact of data on communities
[10:59] You need to connect with people you have not connected with before. Open up the conversation about how the data that you're currently using now impacts communities that you're not necessarily a part of. Get yourself out of a comfort zone. That is the key. How you do this is by following someone you've never followed before.
Data across the board
[7:38] Data is a part of every industry. Everyone is concerned with how their data is being used. Data is being created and used at rapid rates, and the challenge is to be able to understand and harvest data in a meaningful way.
Documenting your process
[11:03] It’s very important to document your process, whether it’s scientific or non-scientific, and make sure you push your team to do the same. This is crucial because your data may be used in a way that you did not intend. But if you document your process and start having conversations about it, misinformation will be less likely to spread.
[7:57] “I'm trying to do my best to be... that beacon to talk about data in sizable, understandable nuggets, because it's not just a science thing. It is our everyday life.”
[11:45] “...if you stay within your own lane in your own expertise, only talking to people who have your particular background, you're losing the whole story... and with data, there's always a story”
[29:34] “...I want...other people to know that they can talk about their particular ethnicities, content in a research space, in the tech space, and still be successful.”
The one thing that Brandeis wants you to learn from her story
[43:12] My story is not done yet. If you feel like you're done, then that's not data science. The story is never complete.
From the lightning round
Best advice Brandeis has ever received
Don’t take any wooden nickels.
Data Science superpower
The ability to explain things easily to people.
Advice that Brandeis would give to her 20-year-old self
I would say it's going to be OK. Your time to shine isn't quite yet.
Topic outside of data science we should study
Sociology. You have to understand social context. If you don't understand social context, you don't understand data.
Books that Brandeis recommends you read
“Algorithms of Oppression” by Safiya Noble
“Who Gets What and Why: The New Economics of Matchmaking and Market Design” by Alvin E. Roth
Books and other media mentioned in this episode
Race After Technology: Abolitionist Tools for the New Jim Code by Ruha Benjamin
Anything by Andre Brock
Data Feminism by Catherine D'Ignazio and Lauren F. Klein
Podcast: #causeascene by Kim Crayton
How you can connect with Brandeis online