As of August 2022, the CSSLab is excited to welcome Coen Needell to the team as a pre-doctoral researcher. In this Researcher Spotlight, he talks about his path through the field of CSS, his role in the Penn Media Accountability Project (PennMAP), and how he’s poised to contribute in the year ahead.
A: I have sort of a mixed background. I started out doing physics and economics in my undergrad at Washington University in St. Louis. I minored in the philosophy of science, which I think has informed a lot of my work today—a lot more than my majors did.
I then went to the University of Chicago to do a master’s in computational social science. The program was organized so that I could be a part of different social science departments while staying under the CSS umbrella. I started out in the anthro-sociology space, but bounced around a lot in the first year trying to find a good fit for my interests. That’s how I met my advisor, who works in the computational neuroscience of vision and researches visual memory. In that vein, my master’s thesis ended up being a deep-learning model that predicts the memorability of images.
Q: How did you become involved with the CSSLab?
A: After getting my master’s, I took a pre-doctoral position at Microsoft Research, where I was involved in a bunch of disparate projects with David [Rothschild]. They all revolved around understanding how mainstream news on the internet behaves, from how journalists frame the same event from different angles to how the news cycle has sped up over time. The biggest one had to do with understanding economic trends based on Bing search data.
When I got to the end of my pre-doc there, I knew I wanted to continue in academia. While I was applying to new pre-doc programs, David was already talking to Duncan [Watts] about opening a similar position at the Lab, so it ended up working out great for me.
Q: Tell me about the projects you’re involved with—or are going to be involved with—at the Lab.
A: Along with helping out with research, my main project here will be to build up the Lab’s data infrastructure. The biggest thing that everyone here needs is a way to connect the various sources we’re pulling from in our research. I’m working on a prototype that pulls from non-protected sources as a proof of concept; I’m currently writing algorithms for unifying open-source data on publishers. And if I end up inventing anything in the process, it’ll turn into a paper!
On the research side of things, I’m helping out with creating the Living Journal—a project in the works to turn the Lab’s research into real-time dashboards—and continuing some papers David and I were working on at Microsoft. We have three main projects in progress, all under the PennMAP [Penn Media Accountability Project] umbrella:
1. Our work on the speed of news is rooted in two big ideas.
One is that there is a certain amount of time that it takes for a news publisher to present something as “news.” Historically, there’s been a lot of talk about a 24-hour news cycle, but the internet has sped that up dramatically. So one question is: right now, how long does it actually take for a major publisher to completely turn over its content?
The second idea is based on the fact that the median American now consumes zero news articles per week. If that’s true—and we have good reason to believe it is—then who is all this turnover for? Why all this churn? Who is the market that’s buying all these quick-turnover, low-quality articles?
2. The quotes and framing project focuses on the ways that journalists report on, for example, something that someone has said and use it to frame their argument. Their core argument may not even be representative of reality, but because they frame it as coverage of a simple quote, they’re not technically lying.
We’ve found in preliminary studies that this type of reporting often focuses on someone who’s in the “enemy camp”: you find the craziest person in the opposition camp and report on the craziest thing they’ve said in public as if it’s the group’s entire opinion. The other side of this is that a certain camp might present a more hardline opinion from their group, and will use that as a way to defend or distance themselves from their more radical stances.
3. Finally, our third PennMAP paper looks at narratives about the economy in the news over the last four years. We’ll be looking for economic keywords in article text and headlines, with the goal of better understanding how narratives about the economy are presented by these publishers, especially in partisan ways and ways that are intended to scare people.
This has been particularly interesting because, for the past two years or so, the news has been pushing narratives of the economy that are kind of destructive. One of the things that interests me so much about the economy is that it’s so dependent on what we think about it. So when the news pushes people to be scared of inflation, that increases the impact of inflation, essentially creating an information feedback loop. We want to examine news publishers’ role in that more closely.
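As a rough illustration of the keyword search described for the third project, the sketch below counts mentions of economic terms per publisher across headlines and article text. It is only a minimal sketch under stated assumptions: the keyword list, the record fields (`publisher`, `headline`, `text`), and the sample articles are all hypothetical and are not taken from the PennMAP work itself.

```python
import re
from collections import Counter

# Hypothetical keyword list; the interview does not say which terms are used.
ECONOMIC_KEYWORDS = ["inflation", "recession", "unemployment", "interest rates"]

# One case-insensitive pattern with word boundaries, so "inflation" does not
# match inside a longer word such as "inflationary".
KEYWORD_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in ECONOMIC_KEYWORDS) + r")\b",
    re.IGNORECASE,
)


def count_keywords(articles):
    """Return {publisher: Counter(keyword -> mentions)} over headline and body."""
    counts = {}
    for article in articles:
        text = article["headline"] + " " + article["text"]
        hits = (m.group(1).lower() for m in KEYWORD_PATTERN.finditer(text))
        counts.setdefault(article["publisher"], Counter()).update(hits)
    return counts


# Made-up sample records, for illustration only.
sample = [
    {
        "publisher": "Example Times",
        "headline": "Inflation fears grow",
        "text": "Economists debate whether a recession is coming.",
    },
    {
        "publisher": "Example Post",
        "headline": "Local bakery opens",
        "text": "Interest rates were not discussed.",
    },
]

result = count_keywords(sample)
for publisher, keyword_counts in result.items():
    print(publisher, dict(keyword_counts))
```

Per-publisher counts like these could then be compared over time or across outlets; anything about partisanship or tone would need additional labels that this sketch does not attempt.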
Q: What’s in store for you? Tell me about the type of research you’d like to pursue in your PhD and your plans for the coming year.
A: I have a few main sets of interests which have determined the sorts of labs I’m applying to. First, there’s the possibility of working on causal inference and how that relates to programming. There are a select couple of labs that I’m looking at for that type of track, along with a few that specialize in HCI—human-computer interaction.
However, the majority of the labs I’m applying to are bigger labs like ours, which are generally more interdisciplinary and have more freedom to explore the space. In a situation like that, I’d want my focus to be on building tools for computational social science and documenting the process. The goal of that would be to explore different applications of machine learning [ML] that are non-standard; ML research is largely driven by trying to best each other at very particular tasks, so there are many other applications of ML techniques that we haven’t even begun to explore because they’re not very prestigious.
As an example of this, in the past I took on a project that looked at BandCamp album art, broke each cover down into its constituent colors, and sorted them into a colorgram. I then ran Latent Dirichlet Allocation to get a set of color topics that would be popular among BandCamp artists. You could then compare those color topics across genres or tags to see patterns; for example, certain genres tended to feature more skin tones in their album art, which meant that they were more likely to include pictures of people, whereas genres like emo rock tended to use muted colors. It was a fun, non-standard application of an NLP technique, and I’d want to explore those types of applications further.
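To make the "colors as words" idea concrete, here is a minimal sketch of the preprocessing step only: each image becomes a bag of coarse color tokens, which is the same document-term shape a topic model such as Latent Dirichlet Allocation expects. The quantization scheme, the function names, and the pixel values are assumptions made for illustration; the interview does not describe how the original project encoded colors, and the LDA step itself is left out.

```python
from collections import Counter


def color_token(rgb, levels=4):
    """Quantize an (r, g, b) pixel into a coarse bin and name it as a token."""
    step = 256 // levels
    # Clamp so every channel lands in one of exactly `levels` bins.
    r, g, b = (min(channel // step, levels - 1) for channel in rgb)
    return f"c{r}_{g}_{b}"


def image_to_document(pixels, levels=4):
    """Turn an iterable of RGB pixels into a bag of color tokens."""
    return Counter(color_token(p, levels) for p in pixels)


# Made-up pixels standing in for one album cover.
cover = [(250, 250, 250), (255, 240, 245), (10, 20, 30), (200, 30, 40)]
doc = image_to_document(cover)
print(doc)
```

A collection of such counters, one per cover, could be stacked into a document-term matrix and passed to any LDA implementation; the resulting topics would be distributions over color bins rather than over words.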
Until then, I’ll be working with the Lab on our various projects. Ideally, we’ll have proof-of-concept prototypes of our major ones done by the time I’m applying for PhDs. This will mainly be to show that we can take data from disparate sources and leverage even very simple techniques to make a unified database. This could be applied well beyond news; the approach just needs some sort of unifying object for it to work.
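The "unifying object" idea can be sketched with very little machinery. In the example below, the unifying object is assumed to be a publisher's web domain, normalized so that differently formatted entries from different sources collapse onto one key. The field names (`site`, `bias_label`, `monthly_visits`), the helper functions, and the sample records are hypothetical; this is not the Lab's actual pipeline.

```python
from urllib.parse import urlsplit


def publisher_key(url_or_domain):
    """Reduce a URL or bare domain to a lowercase host without a leading 'www.'."""
    value = url_or_domain.strip().lower()
    if "//" not in value:
        value = "//" + value  # lets urlsplit treat a bare domain as a host
    host = urlsplit(value).hostname or ""
    return host[4:] if host.startswith("www.") else host


def unify(*sources):
    """Merge records from several sources into one dict keyed by publisher domain."""
    merged = {}
    for source in sources:
        for record in source:
            key = publisher_key(record["site"])
            fields = {k: v for k, v in record.items() if k != "site"}
            merged.setdefault(key, {}).update(fields)
    return merged


# Made-up records from two hypothetical open sources.
ratings = [{"site": "https://www.example.com/", "bias_label": "center"}]
traffic = [{"site": "Example.com", "monthly_visits": 1200}]

unified = unify(ratings, traffic)
print(unified)
```

Later sources overwrite earlier ones when they share a field name, which is a deliberate simplification; a real system would need a policy for conflicting values and for publishers that operate several domains.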
I’m also looking forward to working on the Living Journal project. I’ve always been really interested in user experience and interaction, so having a chance to build a large system based on those principles (and document the process from an academic point of view) is very exciting.