As the lead researcher on the Penn Media Accountability Project (PennMAP), Homa Hosseinmardi tackles questions of online political radicalization and misbehavior using large-scale data. In this month’s researcher spotlight, she talks about her experience navigating the field of CSS, joining the Lab, and spearheading PennMAP.
Q: Tell me a bit about yourself and your research.
A: I started my bachelor’s degree in Iran in electrical engineering. There were a lot of fields that I was not familiar with, so I used my bachelor’s as a time to figure out what I liked. When I came to the U.S., I realized that electrical engineering is not what I like—I just enjoyed the computational part of it.
I talked to some faculty, and one of them introduced me to a project looking at online social media and the kinds of misbehavior and antisocial behavior that happen on certain platforms. It all started when I worked with them on a related project to detect gestures on social media pages, which linked my computational skills with pattern recognition. They told me to just stay on the theme of pattern recognition, and I realized then that I didn’t really like anything to do with sensors, so we worked together to apply my skills to questions of human behavior.
So I started working with online social networking data. This first project started as finding misbehavior, aggression, and bullying, which at the time was a topic that I had minimal familiarity with. I didn’t know if it was happening rarely, or if it was even important. I started collecting data myself, because it didn’t seem to be a really hot topic and there was not a lot of data on it. I had to learn the definitions of those behaviors, how I could quantify them, how I could measure them.
After two years of struggle, I started actually getting the data—learning how to find it and how to sample along the way—and it was like finding a needle in a haystack. Then, when I found it, I realized that this is one of the most important problems online and that I wanted to keep working on it. You actually see people get hurt. You see harassment, hate towards specific groups, even harassment based on things such as health or illnesses.
So that was the first half of my academic career—a lot of working with data. I kept learning more computational techniques, which opened up working with other types of human behavioral data. My research has been very interdisciplinary; I worked with behavioral scientists on the cyberbullying project, and with biologists and psychologists when I was working on passively collected sensor data from hospital employees.
Q: How did you become involved with the CSSLab?
A: Well, it was around that time that I started paying a lot of attention to issues like hidden biases, generalizability, and interpretability, all of which can be easily ignored if you apply “black box” techniques. I could see those issues with the types of data that we worked with; we’d fit some model to a small set of users, and we’d make big claims.
Duncan [Watts] was one of the big advocates for generalizability and replication. I got the opportunity to start working with Duncan on a lot of different types of studies—news platform misinformation, user behavior, just a big mix of all of my interests. Plus, there was a big demand for computational skills because of all the different data types and the scales of the data that we were working with—going up to tens of terabytes. So that was the start of my past two years of collaboration.
Q: Tell me about your current work at the CSSLab on PennMAP.
A: PennMAP is all about studying the information ecosystem and all sorts of problematic content and behavior related to democracy and misinformation—which, when I started working on it, I realized is very close to all the projects I worked on and skills that I gained in the past.
As I started researching algorithmic bias on YouTube, it bugged me that a lot of the field was based on examples that were then generalized by the media and journalists. I had already seen all these issues come up when you don’t work with the data scientifically, so I started pursuing answers to questions like these without generalizing one or two examples as systematic problems.
We initiated this project because we had this really unique opportunity to access the browsing behavior of thousands of panelists from a representative panel. That was a major factor that could help us look at this problem systematically, rather than with anecdotes from some data points that we don’t really know the context of. That was how we could see that, in fact, although there are individual users who show increasing consumption of extreme political content, it’s not a systematic fault of the algorithm or some kind of flow where users get directed from benign content to somehow extreme content.
That was where I got to see how polarization affects people’s understanding of their world. I could see how harassment, hate, and polarization are connected—they reinforce each other. People start to get biased information, and then they get misinformation, and then they get these wrong perceptions of different groups, and the harassment circulates as more misinformation is produced. It can all start from a source of bias outside of the online information ecosystem, from somewhere out in society, but when misinformation is produced around it online it can reinforce a lot of hate for certain minority groups.
All of that is part of this one big problem that I am interested in: the roles of users, society, and platforms as they all interact. It’s this interaction that makes the problem visible to us, since for the first time, I can look at a lot of human behavioral data all at once by using data from the platforms. But I think that right now, a lot of people think that since YouTube and Instagram and all these platforms came into the picture, there has suddenly been suicidal ideation, victimization, polarization. But just because these platforms have made these problems visible to us and have given us the tools to study them, that doesn’t mean that there’s causality.
Q: What have been some of the major challenges and most rewarding experiences you’ve had while working on this at the CSSLab?
A: I think that at first, or from the outside, it seems like we have this luxury of having all these big data sources. But if you want to do something correctly, it will bring a lot of challenges in terms of how to deal with the data, which will require a lot of investigation. As we start investigating these data, we realize that, yes, high-quality data is different from large-scale data. So when you see just the scale, you can’t get excited immediately; you need to validate first. Validating all the qualities for big data is way harder, computationally, and comes with a lot of resource issues. It’s something we’ve definitely needed to spend a good amount of time on over the past two years now, so that we can be sure that anything we’re inferring about society is correct. But when you pass that phase, then you can actually answer a lot of fundamental questions from these massive multi-source datasets.
Q: Tell me about the future of PennMAP, or about ways you’re hoping to expand upon your research.
A: I think that the future will be way, way more exciting than what we’re seeing right now. It’s something that keeps us working harder and harder: getting to that future where we build a lab or a data clinic where we really carefully and continuously monitor the health of our data. We don’t want to just get some data, freeze it, publish a paper, and move on. We want to actually have a solution-oriented research agenda and approach.
For all these social issues and platform issues, there are constantly new policies that come out and affect the information ecosystem, and we want to analyze these changes. We need people to constantly monitor and confirm whether policies are working and how behaviors shift. We need to keep these data alive so that we can come back and check for replication as time passes, so we can see how things change, how things improve after we make decisions, and how things scale to the future. When you work with human behavior, social issues, and technological issues, things change really quickly, so you have to have the resources to be able to keep your data and models up to date.
Learn more about Homa’s current research by visiting our PennMAP project page.