Wikipedia access and contribution: Language choice in multilingual communities. A case study *

This paper presents a study on language use in both accessing and contributing to Wikipedia in a context where users were expected to be able to read and write in at least three languages (Catalan, Spanish and English). Seventy-seven first-year audiovisual communication students made contributions to Wikipedia as part of the assessed work in the first-year course titled “Digital Culture.” Before and after writing Wikipedia articles, the students responded to two questionnaires that enquired about their language-related habits when using the site and about their language choice for contributing to it. The results reveal patterns linking the languages the students know with those they use for editing. Students favor the English edition of Wikipedia when consulting it, despite the fact that this is the language in which they rate their own reading proficiency lowest. More generally, our research shows that multilingual Wikipedia users move seamlessly from one language edition to another, thus refuting the cliché that relates minority languages with exclusively local and self-referential topics. In relation to this, it brings to light some correlations between the students’ identification with either one or two main languages and both their choice of language in editing Wikipedia articles and the specific topics they decided to write about.
Additionally, the study offers relevant insight into what drives students to engage with such a task to the point of making extra contributions so that their work might reach a larger audience.

* Funding Project 1: “Interactive content and creation in multimedia information communication: audiences, design, systems and styles”. CSO2015-64955-C4-2-R (MINECO/FEDER). Spanish Ministry of Economy and Competitiveness. Funding Project 2: 68-Plaquid, 2015-2016; Creación y ampliación de artículos en la Wikipedia como herramienta docente y de evaluación. Pompeu Fabra University (Spain).
Anàlisi 57, 2017. Joan Soler-Adillon; Pere Freixa


Introduction
In recent years, Wikipedia has been used extensively in education. Teachers use it at different levels and in different contexts as a tool for their students to acquire a range of skills that include collaborative and academic writing, digital literacy and familiarization with the open culture of the Internet. In turn, the students' contributions become part of this massively used project and can help expand the domains covered by the courses where this experience takes place. This is, indeed, one of the motivations for incorporating these activities in the classroom (Jemielniak and Aibar, 2016).
One of the indicators of how much activity these practices involve is the Wikimedia Outreach Education Program website (Wikimedia, 2016). Along these lines, Lerga and Aibar (2015) recently published an extensive and detailed report of examples of such use, which is at the same time a very valuable guide to understanding the types of activities that can be implemented, ranging from the creation of articles to the critical analysis of existing ones. Their report is part of a wider research effort, which aims at accounting for the benefits of these activities as a way of fostering both collaboration and the development of media literacy skills among students, while participating in the open culture of the project (Lerga and Aibar, 2015; Brailas et al., 2015; Ricuarte-Quijano and Álvarez, 2016; Dawe and Robinson, 2017). While most of these accounts openly advocate the formal use of Wikipedia in academic practices (see, e.g., Hafner et al., 2015; Walker and Li, 2016; Freire and Li, 2016; Meseguer-Artola et al., 2016; Di Lauro and Johinke, 2017), it is acknowledged that, in the university context, there are still important concerns among scholars regarding the use of this online encyclopedia (Llados et al., 2013; Aibar et al., 2015; Konieczny, 2014, 2016), even though such concerns have diminished over the last few years (Shachaf, 2009; Soules, 2015). As regards students, other studies suggest that there is also some resistance to openly using Wikipedia, especially because they do not consider it sufficiently useful or credible (Meseguer-Artola, 2014; Selwyn and Gorard, 2016; Huang et al., 2016). It has been argued that this is due to a lack of knowledge of how the actual editing process works (Menchen-Trevino and Hargittai, 2011).
There is, however, one important aspect that usually remains unaccounted for in these studies: the uses of the different language editions of Wikipedia by users who can read more than one language. This is arguably due to one of two very practical reasons: either the study focuses on something for which language use is not relevant, or it takes place in a largely monolingual community. Nevertheless, there are contexts in which it becomes relevant that the users have access to more than one language edition of Wikipedia, and thus move from one to another depending on motivation, trust, topic and length of the articles in each edition. In these contexts, contrasting and comparing the different versions becomes a relevant part of the experience. For a variety of reasons, there are significant differences in coverage, approaches and even internal policies among the different editions, which affect how one can participate in the writing of articles. Additionally, there are possible biases, which have been identified in the literature as being more likely to occur in smaller language communities, since these are expected to have a smaller group of people involved in the curation process (Pfeil et al., 2006; Massa and Scrinzi, 2013; Eom et al., 2015).
In addition to this, the use of Wikipedia in the classroom in certain contexts affords the possibility to reflect on the role of the academic institution in promoting scientific literature in minority languages. In the design of the academic task, the teacher can incorporate the coverage of local or universal themes, or the need to address certain topics in which the local edition of Wikipedia needs to grow. The debate on the role of institutions in promoting Wikipedia can bring to light certain agendas, both ideological and cultural, which might or might not collide with Wikipedia's neutrality and with the institution's own promotion and international visibility (Hale, 2015; Lages et al., 2016; Miquel-Ribé and Laniado, 2016).
The work presented here builds on previous research on the innovative pedagogic uses of information technologies and open platforms. Some of these previous studies have addressed the acquisition and evaluation of generic and transversal skills by higher education students, such as socialization processes and critical capacity (Freixa and Sora, 2008), while others have aimed at creating a more generic account of the competencies needed for the practice of interactive communication (Soler-Adillon et al., 2016). Finally, the use of online multilingual tools has also been explored by the authors both through the implementation of pedagogic tools in the university and through the dissemination of their design processes (Freixa et al., 2013).

Methods
With the help and guidance of the volunteers at Amical Wikimedia, an organization dedicated to promoting the Catalan version of Wikipedia, we implemented our Wikipedia study with a group of first-year audiovisual communication students at Universitat Pompeu Fabra (UPF) in April-June of 2016, as an assessed exercise in the Digital Culture course. We asked students to create a Wikipedia article (or improve an existing one) of their own choosing within the scope of the course. The students spent six weeks researching, drafting and finally publishing their articles in Wikipedia. They had to get their article published as part of the task, which meant adhering to the Wikipedia community's guidelines along with the requirements of the course. They were free to decide the language in which they would write the article. Since the university is based in Barcelona, we expected the students to be able to do so in Catalan, Spanish or English. As a general rule, students at UPF are expected to be fully competent in both Catalan and Spanish and to have at least the ability to read English at an advanced level.
The possibility of performing this study with bilingual and trilingual students allowed us to look into the quality of the contributions by the kind of users that, according to Hale, are particularly relevant because they can "play a unique role in diffusing content between different language editions" (2014: 100). In doing so, these users help to make the different editions more uniform. In addition, this ability to compare versions allows them to check the biases that, according to some authors, are more likely to occur in minority language editions. Hale suggests that users writing primarily in smaller-sized language editions "will be more likely to cross-language boundaries than users writing primarily in larger-sized language editions" (Hale, 2014: 101).
We prepared two questionnaires for the students to complete before and after the experience, each of them containing language-related questions. First, we enquired about language identification and knowledge: mother or main language and reading comprehension of Catalan, Spanish and English. Second, we asked the students about their general use of the Wikipedia editions in these three languages, using a frequency Likert scale (Very frequently/Frequently/Occasionally/Rarely/Never) and an open question for each: "In which situations do you use the Catalan/Spanish/English version of Wikipedia?" In the post-experience questionnaire, we enquired again about the students' main language and then asked in which language they edited or created the article and why. We also added an open question to address an issue that became relevant after reading the answers to the first questionnaire. In many of the open comments on the differences between language editions, the students suggested that the English version was better, more complete or more reliable. To enquire further into this particular issue, we decided to add the following question to Q2: "In the previous survey, some answers suggested that the English version of WP is the most reliable (i.e., it is more reliable than others). Do you agree? Why?".
Finally, a last question was included in order to enquire whether the multilingual users drew on text from other language editions when writing their articles: "What types of information sources have you used?" By doing so, we were attempting to confirm or refute Hale's hypothesis about the role of this type of user in the dissemination of content between different language editions of the encyclopedia. Table 1 shows the questions specific to language in both the questionnaire that the students completed before editing Wikipedia (Q1) and the one they completed after the experience (Q2).

Table 1 (final row): "What types of information sources have you used?" Answer type: open answer. Source: The authors.

Results and discussion
As shown in figures 1 and 2, despite a lower participation in Q2, the gender proportion remained largely the same in both questionnaires, as did the age distribution, with the notable exception of those students who were exactly 18 years old. In detail, 77 students responded to Q1 and 50 to Q2. In Q1, 55 were female, 20 were male and 2 preferred not to respond to the gender question. The respondents' ages ranged from 17 to 34, with the following

Language knowledge and linguistic use of Wikipedia
In Q1, we were able to gather some interesting data about the students' uses of the three language editions of Wikipedia under consideration. In this phase, we asked them about their main language and language knowledge (reading comprehension) before delving into the uses of the different language versions of Wikipedia.
The first language-related question addressed to the students was: "Which of the following is your main language?" Of the students who responded that they had only one main language, 34 (44.2%) identified Catalan as being their main language, while 18 (23.4%) stated that it was Spanish. A total of 21 students (27.3%) responded that they had a combination of Catalan and Spanish, 3 students added English to these two languages, and one student responded that his or her main languages were English and Spanish. Thus, 58 (75.3%) of the students had Catalan as or among their main languages, 43 (55.8%) had Spanish and 4 (5.2%) had English among their main languages.
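The overlapping totals in the previous paragraph follow from the five mutually exclusive answer categories. A minimal sketch of that tally, using only the counts reported above (the dictionary layout and the helper name `students_with` are illustrative assumptions, not part of the study's materials):

```python
# Exclusive answer categories for "Which of the following is your main language?" (Q1).
# The counts come from the text; the data structure itself is a hypothetical encoding.
main_language_counts = {
    ("Catalan",): 34,
    ("Spanish",): 18,
    ("Catalan", "Spanish"): 21,
    ("Catalan", "Spanish", "English"): 3,
    ("English", "Spanish"): 1,
}

total_respondents = sum(main_language_counts.values())


def students_with(language):
    """Count students who named `language` as, or among, their main languages."""
    return sum(n for langs, n in main_language_counts.items() if language in langs)


for language in ("Catalan", "Spanish", "English"):
    count = students_with(language)
    share = 100 * count / total_respondents
    print(f"{language}: {count} of {total_respondents} ({share:.1f}%)")
```

Because a student can appear under more than one language, the three shares need not add up to 100%.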
When asking about reading comprehension, we used the Common European Framework of Reference for Languages, but specified to the students that we were not asking about language certificates but for a self-assessment of comprehension. The possible answers for each of the languages were: A1 Beginner; A2 Elementary; B1 Intermediate; B2 Upper intermediate; C1 Advanced; C2 Native or Proficient. Following this, the questions addressed the frequency of use of the different versions of Wikipedia, and we ended with a question on choice of language in the event that an article existed in all three of the language versions. Table 2 shows the answers to these questions in detail. What follows is a descriptive analysis of some of their most relevant aspects.
As expected, the results in terms of reading comprehension were very high both in Catalan and Spanish, with 68 (88.3%) and 69 (89.6%) of the students judging themselves to be either native or proficient in these two languages, respectively, while only 4 (5.2%) of them said the same about English. It is in this last language that the answers showed a wider spread, with 32 students (41.6%) answering that they had an advanced level and 31 (40.3%) that they had an upper intermediate level.
In contrast, the use of language versions shows a clear tendency towards Spanish and English. While the numbers for Catalan are spread across the options and clearly tend to the middle point, the other two languages tend strongly towards frequent and very frequent use. We enquired about language choice with the following question: "If an article is available in Catalan, Spanish and English, which version are you most likely to read first?" Here, the largest group of respondents stated that they were most likely to read articles in English. Specifically, 37 students (48.7%) chose English, while 26 (34.2%) responded Spanish and 13 (17.1%) Catalan.
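The three counts for this question add up to 76 rather than 77, and the reported percentages are consistent with that denominator. A short check, assuming the three options were the only possible answers (the variable names are illustrative):

```python
# First-choice language when an article exists in all three editions (Q1).
# Counts are taken from the text; treating them as exhaustive is an assumption.
first_choice_counts = {"English": 37, "Spanish": 26, "Catalan": 13}

answered = sum(first_choice_counts.values())

# Share of each language among the students who answered this question.
shares = {
    language: round(100 * count / answered, 1)
    for language, count in first_choice_counts.items()
}

print(answered)
print(shares)
```

Under this reading, the percentages refer to the students who answered the question, not to the full Q1 cohort.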
To enquire further into the uses of language for information access, we asked about the situations in which they go to each version of the encyclopedia. A large number of answers pointed to the use of the Catalan version for topics specifically related to Catalan culture or geography. The second main reason was related to completeness and complementarity. That is, they stated that they used the Catalan version to compare it with the other language versions, and then chose the language version that had a longer article on the topic. According to the students' responses, the Spanish version was used for more general purposes. Although some answers did point to cultural specificity, in general the students acknowledged that the Spanish version provided longer and more complete articles, to which they went either straight away or when they needed to complement what they had already read in Catalan. Finally, the English version was seen by many of the students as the main reference page, and they stated that they used it 'by default'. A large number of answers referred to the fact that this version usually had longer articles on the topics they were reading about, and that this was why they would usually go to this version first.
Finally, in Q2 we added a question to specifically address this trend, which we had identified in the first survey. To this end, we asked the following question: "In the previous survey, some answers suggested that the English version of WP is the most reliable (i.e., it is more reliable than others). Do you agree? Why?" Keeping in mind that the students responded to this question after having written a Wikipedia article and undergone the process of publishing it (and thus of the strict peer review curation of the Wikipedia community of volunteers), it is interesting to see how most of the answers (31 out of 50) were positive and thus identified the English version as being more reliable than the others. Roughly half that number (16) stated the contrary by answering negatively, and on some occasions they did so stressing how strongly they disagreed. The other three answers did not really respond to the question, although two of the three did state that it was relevant that English had more information than the rest.

Language choice for editing
In response to the question "In which language did you edit/create your Wikipedia article?", 33 students stated Catalan, 10 Spanish, 3 Catalan and Spanish and 3 Catalan and English. Only 1 answered Catalan, Spanish and English. These results show that 18% of the students worked with more than one language, while 82% preferred to edit articles in only one.
It is interesting to observe the changes from the main language to the language chosen for the editing of articles. In the open question regarding language choice, a large percentage of students who identified Catalan as their main language and who wrote the article in Catalan explained that they did so because there were no Catalan articles on the chosen topic. Only one student stated that he or she had written in Catalan "in order to promote the culture and language". Two other students explained that they weighed the fact that they were writing about someone or something that was especially relevant to the Catalan community. Table 3 shows more detailed information on these language choices.
All of the articles in this exercise were written by groups of students. Naturally, this forced them to agree on the language used in writing. There is a substantial group of students who identified their main languages to be both Catalan and Spanish, and who wrote the article in Catalan. Most of them stated that they did so because the article already existed in English and Spanish, but not in Catalan. Only two students, one who went from Catalan as the main language to Spanish as the editing choice, and one who did the exact opposite, stated that they did so because it was a group decision to which they had to adhere.
Finally, it is worth mentioning that editing in more than one language is, according to the students' comments, linked to a clear will on their part both to reach a larger audience and to allow English-speaking audiences to read about local topics of either Catalan or Spanish culture. It is important to note here that this was not something that would be weighted in their favor when being marked on their assignments. Rather the opposite, since it demanded an extra effort on their part. There was even a case of a student who wrote two additional short articles to complement the main article that she had been working on. Thus, we can read this as an indicator of their engagement with the task and hence of its success regarding the desired objectives.
As shown in Table 3, there is a clear tendency among the students who identify themselves as having one main language to use that language to edit or create an article. There are 14 cases of Catalan to Catalan and six of Spanish to Spanish, which account for a combined 40% of all the cases. As can be observed, there is a very strong tendency to edit in Catalan among those students who state that both Catalan and Spanish are their main languages (17 out of 21), and not a single case of these students editing in both languages.
Source (Table 3): The authors.
The overall numbers show that the students are very flexible and feel very free to move from one language to another. With three languages in the mix, and only two as the main language of the students, there are eleven different combinations of main language to editing language. Arguably, this is a reflection of how they move freely from one language edition to another depending on the context in which they are working.
Finally, as mentioned above, we enquired about the types of sources used in light of Hale's idea of the role of multilingual students in disseminating content from one edition to another. The answers we got from the students were generally rather generic, with students stating that they had used "books, news and web pages" or "information in e-books and newspaper articles". A small number (6%) cited Google Scholar as a source to find academic references. However, despite this generic tone, 12% of the students explicitly stated that they used Wikipedia pages in other languages in order to create their own article (note that these were never cases of simple translations, which had been explicitly ruled out as a possibility in the assignment). Among the explicitly multilingual students, or those who wrote in more than one language, there is no specific mention that shows the transmission of content from one language edition to another. The trilingual reading ability of virtually the whole cohort of students who participated in the task facilitates the consultation of sources in any of the three languages, allowing the criterion to be quality of content rather than language availability. However, in light of our results, we cannot conclude that Hale's hypothesis on the dissemination of content across language editions is confirmed.

Edited articles and academic guidance
In order to centralize the exercise, and with the help of the Catalan Wikimedia volunteers, we created a Wikiproject page with basic instructions on how to edit articles and links to all those that the students created (Viquipèdia, 2016). The context of the course, Digital Culture, and more generically Media Studies (Comunicació Audiovisual), the degree that the students were enrolled in, provided a frame of reference for choosing the theme. A list of topics was offered to students, but they were free either to choose from those or to propose new ones, and most choices were closely related to the core contents of the course.
Table 4 shows the 22 new articles that the students created and the language in which they were written (column 2). Column 3 shows whether the article exists in other languages. Finally, column 4 shows the relationship between general and local. The first of the two terms refers to whether the article's topic is of local interest (an author, work or topic related to Catalonia or Spain) or general (historic figure, global concept or idea, authors or works of a universal value). The second term refers to whether the article is aimed at a local audience or at an international public. We exclude from the table those smaller contributions to articles that had already been published. As the table shows, in general terms the students decided to create articles in Catalan about topics in what is becoming their area of specialization and which were not covered by this particular version of Wikipedia. In a smaller percentage, we find those who decided to cover local topics in the local language. Six contributions are completely new in any edition of Wikipedia, in the sense that they cover topics or authors (mostly local) for which no article existed.
Therefore, the dissemination aspect of the project, from this point of view, consisted mostly in expanding the Catalan language coverage of the topics. This was indeed one of the main goals of the academic activity and is also in accordance with the foundational goals of Amical Wikimedia. Thus, while we did not specifically steer the students towards working in this direction, they did contribute to expanding Wikipedia's coverage of relevant topics for their studies in their own language.

Conclusions
Our research shows that bilingual and trilingual users of Wikipedia move comfortably among the language editions they can read, but they tend to favor stronger languages when reading about general topics, while going to the editions in the smaller language for more local topics. This is in part due to the very straightforward fact that stronger language editions tend to have a larger number of articles, and that these articles tend to be longer. For example, the English Wikipedia has, to date, 5.5 million articles, while the Spanish version has over 1.3 million and the Catalan edition has 555,000 (Wikimedia, 2017). However, by accessing the English edition 'by default' for general topics, as many students stated was their usual attitude, these students are implicitly, and most likely inadvertently, accepting the biases¹ of this particular view, which is already culturally dominant in many other domains. In any case, the general trend is that students regularly access any of the language editions they can read in search of better coverage of the topics they are researching. This use of Wikipedia shows that the multilingual nature of the project not only provides access to different language communities, but also works as a whole for multilingual users, offering different levels of coverage of topics that appear in the different languages and, occasionally, different points of view on the same topics.
1. This is an assumption among the Wikipedia community. It is easy to verify by going to articles related to history, revolts, wars, etc., in which different countries are involved. An example of this was presented by the Amical Wikimedia team to the students: in the Catalan Wikipedia article on the "Guerra dels Segadors" (Reapers' War), a revolt which took place in 1640-1659, there is a link to its equivalent Spanish Wikipedia article on "Sublevación de Cataluña" (Insurgency or revolt of Catalonia). Similar cases can be found in articles on cultural manifestations, such as film, music, etc. However, while this is an interesting discussion, it falls beyond the scope of our investigation.
In relation to this, we can also observe that, while the reading knowledge of Catalan and Spanish is very strong, and that of English is significantly lower, the trends of language use shift in favor of English despite this fact. The open question in Q2 confirms that this is due to the perceived stronger reliability of the English edition, although some answers did explicitly disagree with this after students had undergone the process of editing in Catalan or Spanish. The fact that, as stated above, the English Wikipedia has more articles and these tend to be longer, together with the common perception of English as a universal language, arguably affects the perception of this particular edition as being the best both in terms of reliability and coverage of topics. Despite this preference for English, there is no indication in our results that this is due to perceiving the smaller language editions as more likely to be biased, as suggested by some literature (Pfeil et al., 2006; Massa and Scrinzi, 2013; Eom et al., 2015); rather, it seems to be related to the greater length and completeness of articles in the more widely used language.
When analyzing the post-experience language-related questions, we can observe a trend towards favoring Catalan as a language of choice for editing the articles among the students who identified themselves as Catalan-Spanish bilinguals, while those who identify only one of these languages as their main one tend to use it as their editing choice. Thus, we can infer a correlation between the students' own language and the language of choice for writing, and a preference for Catalan among the bilingual students. We believe that the success in creating articles in Catalan is mostly due to the will to make new contributions, that is, to create articles that did not exist in any language in Wikipedia or, at least, not in Catalan.
The editing in English or Spanish is mostly based on smaller contributions to already existing articles. It is, from this point of view, a type of contribution that is common among more experienced and committed users of the site. However, there are some cases in which students used these two languages out of a desire to reach a larger audience. In most cases, this was done in parallel to the Catalan article, by translating their own contribution. As this was not assessed, it clearly shows how the students engaged in the task beyond the strictly academic motivation of fulfilling the assessed task. This motivational element is very much in line with the dissemination goals that Jemielniak and Aibar (2016) argue for when advocating this type of activity in the classroom.
The agenda of the institutions promoting the use of Wikipedia and how this might either align or collide with Wikipedia's goals has also been discussed in the literature (Hale, 2015; Lages et al., 2016; Miquel-Ribé and Laniado, 2016). In our case, there was a synergy between Amical Wikimedia and the Universitat Pompeu Fabra. Promoting the Catalan language and culture is among the university's general goals. This explicit aim of protecting Catalan cultural heritage finds its institutional formalization through the Pompeu Fabra Chair, which focuses on this goal. Similarly, Amical Wikimedia is an organization devoted to promoting Wikipedia's Catalan edition. In line with this, pedagogic and research efforts like the one presented in this paper afford an interesting level of impact in promoting the language in areas where the specific terminology is often underdeveloped. This was one of the goals of this project, although not one that was made explicit to the students, as they were always given freedom of choice in terms of the language they would use in their contribution. However, 66% of the articles created were written in Catalan and thus the impact on this language edition is relevant within the topics covered. As regards the students, this contribution to the Catalan version of Wikipedia was made in a more implicit than explicit way, as the exercise never focused, from their point of view, on the actual use of one language or the other, but on creating content on their topics of interest and, quite explicitly in this case, on improving the coverage of the core topics of the course in which the experience took place.
Future work includes two main vectors. The first is to perform a further analysis of the language tendencies both in use and in content creation and to incorporate data from other multilingual contexts in order to draw comparisons. Importantly, this should be done without neglecting the fact that specific linguistic contexts all have very strong and unique characteristics, and thus drawing these comparisons is particularly challenging. The second vector would be to factor in precisely those political elements that affect the uses of the different languages, which can be more or less explicit depending on the context.

Table 1. Common Q1 and Q2 questions and Q1 and Q2 specific multilanguage questionnaire

Table 2. Number of responses and percentages on reading knowledge and frequency of use. Source: The authors.

Table 3. Edition language and main language vs. edition language

Table 4. Articles created, theme content and language