(The Culture Digitally community hails from a number of disciplines, Communication being one of them. We were excited to come across Rodrigo Zamith’s analysis of the presentations at ICA, the most prominent of the scholarly conferences in the Communication field. We’re thrilled to share it here. Even if your work draws on other fields, many of the issues interesting to this community are being addressed at ICA… Tarleton and Hector)
This week, I’m in London with my colleague and adviser, Seth Lewis, attending the 2013 ICA Annual Conference. I’ll be on a panel about the challenges of ‘Big Data’ for communication research and will present recent work on how news innovators are re-imagining dynamic spaces for online news discussion. (Do stop by if either topic interests you. The ‘Big Data’ panel is on Friday, from 1:30-2:45 in the Lancaster room; the online news discussion paper is immediately beforehand, from 12:00-1:15 in the Belgrave room.)
This will be my first time at ICA, and I understand that it has a reputation for being (a) quite large and (b) very diverse. With this in mind, I was curious about where people would be coming from and decided to gather some data about the authors of the works being presented. (Note: there will surely be attendees who are not authors on accepted papers.) Since these data are not, to my knowledge, publicly available in a structured format, I had to scrape the online conference program and work with that data (which was not always pretty). See the “Data and Source Code” section below for more information.

To the first point, a whopping 2,338 different submissions were accepted for presentation at ICA (including some pre-conference sessions). Counting all authors, there are 3,328 unique authors, representing 804 unique institutions from 61 different countries. Counting only first authors, there are 2,001 unique authors, representing 649 unique institutions from 56 different countries. The distinction between all authors and first authors is made repeatedly throughout this post; papers can have multiple authors, who are generally ordered by the size of their contributions. The first author, therefore, is in most instances the one who led the work and is presumably more likely to attend the conference to present it.
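As an aside, the all-authors versus first-authors tallies boil down to a simple set-counting exercise. Here is a minimal Python sketch with invented records (the real counts, of course, came from the scraped program data):

```python
# Hypothetical submissions: each lists its authors in contribution order
# as (author, institution, country) tuples.
submissions = [
    [("A. Smith", "U. of Texas", "USA"), ("B. Jones", "U. of Amsterdam", "Netherlands")],
    [("B. Jones", "U. of Amsterdam", "Netherlands"), ("D. Kim", "Yonsei U.", "Korea")],
]

def unique_counts(author_lists):
    """Return the number of unique authors, institutions, and countries."""
    authors = {a for sub in author_lists for (a, inst, c) in sub}
    institutions = {inst for sub in author_lists for (a, inst, c) in sub}
    countries = {c for sub in author_lists for (a, inst, c) in sub}
    return len(authors), len(institutions), len(countries)

print(unique_counts(submissions))                        # all authors: (3, 3, 3)
print(unique_counts([[sub[0]] for sub in submissions]))  # first authors only: (2, 2, 2)
```

The first-author figures are naturally smaller, since only the lead author of each submission is counted.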
Below are some of my observations from looking at the data. However, there’s a lot that one can dig through in these data, so I also decided to include an interactive feature near the bottom of the post that allows readers to browse through previous years of the conference and segment by division, type of submission, and the statistic of interest.
Where are authors coming from?
The first thing I decided to do was to plot the data about all authors on a map (click images to zoom). Unsurprisingly, North American and European institutions are quite well-represented. Countries outside these two continents—except, perhaps, for Australia, China, Israel, Japan, Korea, Singapore, and Taiwan—are not very well-represented in terms of numbers. Indeed, South America and Africa stand out as having a very limited number of authors.
Building on this, even though a total of 61 countries will be represented at ICA, the vast majority of authors come from just a handful of them. Indeed, 71.8% of all authors come from the top four countries: the USA (54.3%), Germany (6.8%), Great Britain (5.4%), and the Netherlands (5.2%). Among first authors, that figure ticks up to 71.9%. The disparity is quite staggering, with the U.S. single-handedly accounting for more than half of the unique authors among both all authors and first authors. With the U.S. being so prominent (and also where I study), I decided to create a map just for U.S.-based institutions (see figure to the right).
In terms of the number of schools, the East Coast, and especially the Northeast, appears to be very well-represented; however, when we look at the schools with the highest numbers of authors, the Midwest leads, accounting for half of the top 10 schools (and slightly more than half when we consider only first authors).
Another piece of information I was interested in was which schools were best-represented at ICA. In terms of first authors, U.S.-based institutions accounted for nine of the top 10 institutions, with the University of Amsterdam being the lone exception (see figure to the right). Among first authors, the University of Texas has the greatest number of unique authors (48), followed by the University of Southern California (45), University of Amsterdam (38), University of Pennsylvania (36), and University of Wisconsin (33). Among all authors, U.S.-based institutions accounted for eight of the top 10 institutions, with Michigan State (73) serving as the leader, followed by the University of Texas (71), University of Amsterdam (67), University of Southern California (67), and Ohio State University (61). I should caution readers not to use these figures as a proxy for assessing program quality; the size of the schools and the relevant departments is not uniform. Additionally, even if these data were proportional (e.g., per capita), several other variables would come into play, such as the amount of travel funding provided to students and faculty (travel to and lodging in London is not cheap).
One would expect to find a fairly similar picture in terms of the number of submissions for each school, and we do, to a certain extent. In terms of first authors, U.S.-based institutions accounted for eight of the top nine institutions worldwide, with the University of Amsterdam being the lone exception (see figure to the right); four schools tied for tenth place, two of which are not from the U.S. Among first authors, the University of Texas had the greatest number of unique submissions (64), followed by the University of Pennsylvania (51), University of Southern California (51), University of Amsterdam (46), and Michigan State and the University of Wisconsin (33 apiece). Among all authors, U.S.-based institutions once again accounted for eight of the top nine institutions worldwide. The University of Texas (76) again topped the chart, followed by the University of Southern California (60), University of Amsterdam (56), University of Pennsylvania (56), and Michigan State University (51). Again, readers are cautioned against using these figures as a proxy for assessing program quality.
In addition to the institutions, I also wondered who the most prolific scholars were, in terms of the number of accepted submissions. Among first authors, that honor goes to Liesbet van Zoonen from
the University of Amsterdam (Update: Prof. van Zoonen now teaches at Loughborough University), who led five studies that were accepted for presentation. Following Liesbet were five authors who each had four submissions accepted: Claudia Mellado (University of Santiago), Colin Sparks (Hong Kong Baptist University), John Hartley (Curtin University), Matt Matsagani (University at Albany-SUNY), and Xiaoli Nan (University of Maryland). Among all authors, the honor is shared by Carolyn Lin (University of Connecticut) and Michael Xenos (University of Wisconsin), each of whom had some part in six accepted submissions. Eighteen other authors followed with five accepted submissions each. It should be noted that these figures include both full papers and session papers; when only full papers are considered, the picture is quite a bit different. Among first authors, Xiaoli Nan (University of Maryland) is the most prolific scholar; among all authors, that honor belongs solely to Carolyn Lin (University of Connecticut).
Submissions by Division
The unit with the most submissions was ‘Sponsored Sessions,’ which appears to comprise, among other things, all of the pre-conference sessions; 184 submissions were accepted by that unit. Among the divisions, Communication and Technology (173) accepted the most submissions, followed by Mass Communication (157), Political Communication (153), Journalism Studies (150), and Popular Communication (133). It bears noting that higher figures don’t always mean higher acceptance rates; indeed, using published AEJMC acceptance figures as an example, last year the three divisions (out of 18) that accepted the most submissions had acceptance rates of 53.4% (CTEC), 48.9% (MCS), and 44.5% (PR), and ranked 14th, 7th, and 2nd, respectively, in terms of difficulty (with rank 1 being the lowest acceptance rate). I’ll be interested to see the acceptance rates for ICA, when (if?) those figures are published.
What Kind of Work will be Presented?
After accounting for punctuation, stemming the words, and removing stopwords, the two most common words in accepted submissions’ titles were, unsurprisingly, “media” (423 occurrences) and “communic” (283). After removing these terms (in order to reduce visual pollution), I created a word cloud of the most common keywords in the titles. The top term was “social” (265), which I suspect highlights the vast growth of interest in social media, though it could also be used to refer to things like social relationships, social movements, and social theory. The second most-frequent term was “effect” (198), which may suggest a focus on media effects-oriented research, though it could certainly allude to non-media variables as well (or the effectiveness of interventions). The third most-frequent term was “news” (191), which I find to be heartening given my own personal interest in journalism; this would likely include a mixture of content analyses as well as phenomena related to news production, selection, and consumption. The fourth most-used term was “polit” (184), which would include variants like “political” and “politics” and helps to highlight the close ties between the fields of Mass Communication and Political Science. The fifth most-frequent term was “onlin” (158), which suggests that scholars focused a considerable amount of effort on studying new media (if one can still call it that). The remainder of the top ten most-frequent terms were “public” (149), “studi” (117), “network” (113), “analysi” (98), and “role” (95).
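For readers curious about the mechanics: the actual processing was done in R with the tm package, but the pipeline (lowercase, strip punctuation, drop stopwords, stem, count) can be sketched in a few lines of Python. The stopword list and the suffix-stripping “stemmer” below are deliberately crude stand-ins for tm’s stopword corpus and Porter stemmer, and the titles are invented:

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "in", "and", "on", "for", "to"}  # tiny stand-in list

def crude_stem(word):
    # Very rough stand-in for a real Porter stemmer: strip a few common
    # suffixes so that, e.g., "effects" and "effect" collapse together.
    for suffix in ("ation", "ics", "ies", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def term_frequencies(titles):
    counts = Counter()
    for title in titles:
        words = re.findall(r"[a-z]+", title.lower())  # lowercase, drop punctuation
        counts.update(crude_stem(w) for w in words if w not in STOPWORDS)
    return counts

# Invented titles, for illustration only
freq = term_frequencies([
    "Media Effects of Online News",
    "Social Media and Political News",
])
print(freq.most_common(3))  # "media" and the stem "new" top the list with 2 each
```

A proper stemmer would produce stems like “communic,” “studi,” and “onlin,” which is why those truncated forms appear in the frequency list above.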
Interactive Conference Explorer
For an interactive map of these data, please see: http://www.rodrigozamith.com/2013/06/13/interactive-map-of-the-2013-ica-conference/
Data and Source Code
All of the data were obtained by scraping ICA’s online conference program with a custom Python script that may be obtained here. The final data files used to make these plots may be downloaded here. The R code used to generate the plots may be obtained here. The code for the HTML/JS interactive feature may be found here. (If you wish to use any of this, I just ask for attribution, with an optional link back to my website.)
In the vast majority of instances, the spellings and institutional affiliations reflect what was found on the ICA program. Because author and institution data were grouped together as a single string, I had to split them up using commas (Author, Institution) and semicolons (to separate the authors of multi-authored submissions) as delimiters. This generally worked well, although it did result in stripping suffixes like “, Jr.” and “, III”.
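That comma-and-semicolon split can be illustrated with a short Python sketch (the program strings here are invented; the second example shows how a “, Jr.” suffix gets shunted into the institution field, which is exactly the caveat above):

```python
def parse_authors(field):
    """Split a program string like 'A. Smith, U. of Texas; B. Jones, U. of Amsterdam'
    into (author, institution) pairs: semicolons separate authors, and the first
    comma in each entry separates the author from the institution."""
    pairs = []
    for entry in field.split(";"):
        author, _, institution = entry.partition(",")  # split on the FIRST comma only
        pairs.append((author.strip(), institution.strip()))
    return pairs

print(parse_authors("A. Smith, U. of Texas; B. Jones, U. of Amsterdam"))
# [('A. Smith', 'U. of Texas'), ('B. Jones', 'U. of Amsterdam')]

# The caveat: a suffixed name loses its suffix to the institution field.
print(parse_authors("K. Smith, Jr., U. of X"))
# [('K. Smith', 'Jr., U. of X')]
```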
I did perform some small additional clean-up for the year 2013, especially for institutions; specifically, I reviewed the list of affiliated institutions and combined different variations (e.g., misspellings) of the same institutions. Additionally, no geocoding information was available on the online program for ICA, and simply plugging an institution’s name into a batch geocoder often led to funky results. I thus manually looked up a reference for each institution (e.g., an actual address, or the name with a city and country appended) and batch geocoded them using the Google Maps API through the ggmap R package. Due to a small issue with the package, countries were later associated by reverse-geocoding the coordinates using GeoNames and the GeoNames R package. In a few instances, the associated institution was something generic like “Communication School” or simply an institution that I could not readily locate with a quick web search; those cases were removed when creating the map (they are included in the other graphs, however).
All analysis was performed using R (and RStudio) and the following key packages: ggplot2 (plots), tm (text mining), and wordcloud (for the word clouds). The scraping was performed using Python and leveraging the wonderful BeautifulSoup library. The figure browser above uses jQuery to switch between images.
This analysis was originally posted on Rodrigo Zamith’s personal blog.