Short CV – March 2024

Pascual Pérez-Paredes

Catedrático de Universidad, Universidad de Murcia

My own references in this CV can be found in this URL.

Research profile

I’ve devoted a substantial part of my research career to exploring the interplay between corpus linguistics and language education. My main contributions lie in the development of corpus-based applications and analytical frameworks for corpus-based language education, and in the analysis of language use by means of corpus methods. Given my expertise in these areas, I’ve contributed state-of-the-art chapters to a variety of specialized handbooks such as The Routledge Handbook of Corpus Linguistics, 2nd Edition, The Routledge Handbook of Corpora and English Language Teaching and Learning, The Handbook of Specialized Communication (De Gruyter) and The Bloomsbury Handbook of Language Learning and Technology. I’m Professor in Applied Linguistics and Linguistics at the University of Murcia (Catedrático de Lingüística Aplicada y Lingüística), where I teach corpus linguistics and specialised languages research to graduate students (MALTA MA programme) and corpus-driven morphosyntax and discourse analysis to undergraduates. I also teach computer-assisted language learning (CALL) research and corpus linguistics research in a range of international MA programmes. Formerly, I was Lecturer in Research in Second Language Education at the University of Cambridge. I am currently Assistant Editor of the Cambridge University Press journal ReCALL (4.5 IF 2022; ranked 7th in Linguistics and 3rd in Language & Linguistics) and incoming Editor-in-Chief of the journal from 2025. I have been granted four official positive research evaluations (sexenios). My Google Scholar h-index is 28 and my i10-index is 50.

Scientific contribution to the generation of knowledge

One of my main contributions to applied linguistics research lies in the generation of new knowledge about the use of corpora in language education in Europe. The other is the use of corpus methods to research variation in specialized languages and Corpus-Assisted Discourse Studies (CADS). Both areas are central to this research proposal. As PI, I’ve attracted some 1 million € over the years to fund research that has explored these two areas. I am a leading researcher in the field of data-driven learning (DDL) in Spain, the UK and the EU, having contributed to a critical evaluation of both language data and theoretical frameworks in DDL research. In this area I’ve pioneered the development of advanced learner tracking methods through computer logs (Pérez-Paredes et al., 2011, 2012), as well as multimedia pedagogic corpora of young adults across different EU languages through the EU-Minerva project System-Aided Compilation and Open Distribution of European Youth Language (SACODEYL) and of English as a Lingua Franca (ELF) through Corpora for Content and Language Integrated Learning (BACKBONE), where I was overall PI and national PI, respectively. My work with computer logs and the analysis of searches has contributed to reconceptualizing how learner interaction with WWW resources, corpus data, and computer and mobile interfaces can complement each other, by paying attention to the cognitive processes involved during DDL. I provided evidence that the use of undefined searches by university students of General English (GE) is indicative of learners’ struggle to use highly structured data sets such as language corpora. I’ve paid attention to the first stage of corpus consultation in Sinclair (2003), initiate, which is often overlooked by researchers, and I’ve returned to this stage often in my research, problematizing it as a computational thinking problem (Zapata-Ros & Pérez-Paredes, 2018).
In 2014 I co-edited the ReCALL special issue (Boulton & Pérez-Paredes, 2014) on researching uses of corpora for language teaching and learning, an opportunity for researchers to revitalize robust experimental research in DDL. Between 2014 and 2016, I led, as EU PI, the EU-funded project “Transforming European learner language into learning opportunities” (TELL-OP), a milestone in developing awareness and knowledge around the use of natural language processing (NLP) tools for the analysis of language use and the use of mobile apps in the context of language education (Pérez-Paredes et al., 2019). The project fostered a DDL approach to language learning that highlights learners’ activation of domain-general cognitive processes and computational thinking methods in working with language data. Both in my plenary talk at the 2020 Teaching and Language Corpora conference at the University of Perpignan and in Pérez-Paredes (2024), I’ve advocated for a new DDL research ecology that embraces what I have called Broad Data-driven language learning (BDDL), which makes use of existing resources such as online dictionaries, text analysis and text processing tools, and artificial intelligence (AI) for language learning across a variety of contexts, including specialized languages and self-directed uses. A second group of contributions to research can be found in the remit of corpus linguistics methods, where I’ve led the corpus compilation, processing and analyses in a variety of research projects. I’m currently Co-PI in the Spanish Research Agency-funded project on the longitudinal analysis of disciplinary literacy in English-Medium Education (to end on 31 May 2024), where I’m looking at a learners’ disciplinary corpus in the area of Business studies (Dafouz, López-Serrano & Pérez-Paredes, 2023). I’m also the PI of an EU project that is looking at the discourses of EU citizens around freedom of movement and internal migration through multilingual corpora.
In this project (“Freedom of movement at play: EU citizens’ identity and transnational discourses”) I’m working with multilingual annotation of discourse and multimedia corpora, hoping to contribute to expanding the methods available to research multilingual corpora. I’ve been a researcher (miembro del equipo investigador) in different projects funded by the Spanish Research Agency, including “System for detection, tracking, monitoring and analysis of terrorist discourse on the Internet” (NUTCRACKER), and “The language of the Public Administration in immigration law: multilingual study and cultural implications” (LADEX), where I led the collection, computational processing and analysis of multilingual corpora used in the research. Some of these contributions have been published in Q1 journals and edited volumes such as Discourse & Society and John Benjamins collections on migration and polarization (Pérez-Paredes et al., 2017; Pérez-Paredes & McEnery, 2024). I’ve co-edited with Prof Vijay Bhatia and Dr P. Sánchez “Researching Specialized Languages” for John Benjamins, a volume focusing on the contribution of linguistic corpora to the analysis of specialised languages. This book won the “Enrique Alcaraz” research prize awarded by the European Association of Languages for Specific Purposes (AELFE) in 2013. I was the Overall Coordinator of the MEd Research Methods Strand at the Faculty of Education, University of Cambridge. I’ve been the corpus linguistics strand coordinator for the American Association for Applied Linguistics (AAAL) as well as for the European Association of Languages for Specific Purposes (AELFE), and a member of the ProQuest Corpus Linguistics Database Product Editorial Board (2019-2020).


Contributions to society, technological development and innovation activities
I’ve developed an array of applications for language analysis, including the adaptation of the Text Encoding Initiative (TEI) SGML markup for the annotation of spoken corpora in the context of EU projects (SACODEYL and BACKBONE), and software for the transcription and annotation of specialised corpora in projects funded by the Spanish Ministry of Research (LADEX) and the EU (co-author of five registered proprietary software applications). I’ve taken part in corpus linguistics dissemination activities for Cambridge University Press and the 2017 University of Cambridge Festival of Ideas, and have published in the Cambridge University Press Education Papers. I also took part in the first video cast on corpus linguistics, recorded and distributed by the Department of English, Languages and Applied Linguistics at Aston University, UK, as well as in the University of Exeter Language & Education Network Research Seminar and the main podcast for ELT in Chile, among others. I’ve led three editions (2021, 2022, 2023) of the international event Corpus Linguistics and Applied Linguistics Research, with over 4,000 registered researchers. My research group’s YouTube channel (www.youtube.com/@corporaappliedlinguistics8358/) has over 600 subscribers and over 7,000 views of these talks, which feature some of the finest corpus linguistics researchers.


Contributions to the training of young researchers
This is an area where I’ve tried to help as many younger researchers as possible, and one in which I feel contributions are essential. I’ve taken part in dozens of PhD schools across Europe and Asia, including the Inter-University Doctoral Programme in Advanced English Studies: Linguistics, Literature and Culture (Universities of A Coruña (UDC), Santiago de Compostela (USC) and Vigo (U. Vigo)) and the University of Oxford 2018 PhD students’ exchange seminar. I was one of the organizers of the 2019 Language Sciences Early Career Researcher (ECR) symposium at U. Cambridge. At the Doctoral Summer School in Applied Linguistics at the University of Malta (2019 and 2023) I was a member of a world-leading team of applied linguistics researchers including Prof Lourdes Ortega, Prof Sarah Mercer and Prof Shelley Staples. Recently I set up the first edition of the international research webinar on DDL for young and early career researchers. Some 60 PhD students took part in the event, which was coordinated by a former PhD student, Dr Ordoñana, and a PhD student from the University of Lorraine on a research stay in my research group. Some of my former PhD students are developing exciting research careers worldwide: Dr Belén Diez-Bedmar, U. Jaén; Dr Joyce Lim, U. Aston; Dr Danyang Zhang, Shenzhen University; Dr Adam Holden, Instituto Alberto Einstein Panamá; Dr Wai Mu Lim, U. Plymouth; Dr Geraldine Mark, U. Cardiff; Dr Carlos Ordoñana, CUD San Javier; and Dr Yolanda Noguera, UPCT. I’ve been an external evaluator of some of the best Linguistics and Language Education programmes in the UK, including for the Research Excellence Framework at the University of Lancaster and the University of Sussex. I’ve served on Professor and Senior Lecturer promotion committees for world-leading experts in corpus linguistics and language education at the University of Lancaster, the University of Oxford, the University of Bath and Edge Hill, UK.
I’ve been an external examiner in PhD vivas in the US, Canada and the UK, including the Universities of Lancaster, Surrey, Sussex, Limerick, Leeds, Vigo, Complutense and Sheffield, and an internal examiner in dozens of PhD vivas, mostly at the University of Cambridge. I’ve been an assessor/evaluator for research grants in the UK, Spain, Belgium and France.


C.1. Publications (last 5 years)
Corpus linguistics research: technology and applications of corpora

Pérez-Paredes, P. (2022). A systematic review of the uses and spread of corpora and data-driven learning in CALL research during 2011–2015. Computer Assisted Language Learning, 35(1-2), 36-61. This is a key systematic review of the field. It has already been cited in 90 publications.


Pérez-Paredes, P. (2024). Data-driven learning in informal contexts? Embracing Broad Data-driven learning (BDDL) research. In P. Crosthwaite (Ed.), Corpora for Language Learning: Bridging the Research-Practice Divide. Routledge. This book offers an extensive, state-of-the-art exploration of the complexities of corpus-based language pedagogy and data-driven learning (DDL).


Pérez-Paredes, P. (2022). How learners use corpora. In R. R. Jablonkai & E. Csomay (Eds.), The Routledge Handbook of Corpora and English Language Teaching and Learning (pp. 390-405). Routledge. A conceptual framework to situate DDL across different research ecologies.


Pérez-Paredes, P. & Mark, G. (Eds.) (2021). Beyond concordance lines: applications of corpora in language education. John Benjamins. The latest title in the JB Corpus Linguistics Studies series examining DDL in language education.


Noguera, Y. & Pérez-Paredes, P. (2018). Register analysis and English for Specific Purposes (ESP) pedagogy: noun-phrase modification in a corpus of English for Military Navy submariners. English for Specific Purposes, 53, 118-130. A breakthrough paper that brings together corpus-based pedagogy and ESP. Impact Factor: 2.417. Q1 (2023).


Corpus linguistics research: corpus methods to research language use and specialized languages

Curry, N. & Pérez-Paredes, P. (2021). Stance nouns in COVID-19 related blog posts. A contrastive analysis of blog posts published in The Conversation in Spain and the UK. International Journal of Corpus Linguistics. IF 1.139. Q1 (2019). A novel approach to the multilingual analysis of hybrid genres of scientific writing. The International Journal of Corpus Linguistics is the world-leading journal in the field of corpus linguistics. Out of over 100 submission proposals, this was one of the five papers published in the special issue.


Pérez-Paredes, P. (2020). Corpus linguistics for education. A guide for research. Routledge. A research monograph in this important Routledge corpus linguistics series.


C.2. Conferences & invited talks (Only 5 entries, last 5 years)
Invited talk: The scope of DDL in the 21st century. DDL in informal contexts. International Perspectives on Corpus Linguistics for Education 2022 Seminar Series. The University of Queensland, Australia. 3 March, 2022.


Invited talk: Developing a critical agenda for learning-driven DDL. 2020. Sundsvall, Sweden. Mittuniversitetet / Mid-Sweden University Symposium on Incorporating Corpora in Teaching.


Keynote: Rethinking learning in Data-driven learning. 2020. Teaching and Language Corpora Conference. University of Perpignan, 16-17 July 2020.


Keynote: Examining internal validity in corpus research: Two case studies. 2019. The 4th Learner Corpus Studies in Asia and the World (LCSAW4) Conference. Kobe University, Japan, ESRC-AHRC (PI Prof. Tony McEnery).


Keynote: Pensamiento computacional y aprendizaje de lenguas. Computational thinking and language learning. 2018. Congreso Internacional De Tendencias En Innovación Educativa CITIE II. Universidad Nacional San Agustín De Arequipa, Perú.


C.3. Research projects, indicating your personal contribution.
Co-PI. Understanding internationalisation in Higher Education from the student perspective: a longitudinal analysis of disciplinary literacy in English-Medium Education (SHIFT). Proyectos de I+D+i PID2019-103862RB-100. 2020-2024 (May). 56k €. www.ucm.es/shift


PI. Freedom of movement at play: EU citizens’ identity and transnational discourses. European Commission. ERASMUS+ 2022-1-ES01-KA220-HED-000086521. 400k €. www.fomatplay.eu


PI. BACKBONE: Corpora for Content and Language Integrated Learning. Funding agency: European Commission. Lifelong Learning Programme. 2009–2011. 300k €. www.um.es/backbone


PI. SACODEYL: System-Aided Compilation and Open Distribution of European Youth Language. Funding agency: European Commission. Minerva Programme. 2005–2008. 250k €.


PI. Transforming European Learner Language into Learning Opportunities (TELL-OP). European Commission. 2014‐1‐ES01‐KA203-004782. 2014-2016. 285k €. www.tellop.eu


PI. Adverbs in spoken language: A corpus-based analysis of learner and native-speakers. Cambridge Humanities Research Grants Scheme 2016. University of Cambridge. 2016-2017. 10k £


Collecting a Dialogue Corpus for Language Learning. Cambridge language sciences incubator fund. University of Cambridge. 2018-2020. 10k


C.4. Contracts, technological or transfer merits etc.
Over 20 contracts to provide linguistic analysis to companies and citizens, including collaboration with universities. Last contract, with U. Jaume I: computational processing of quantitative data using scripts from the trilingual (Spanish/Catalan/English) GENTT informed consent corpus. A hundred contracts to produce official translations (official translator EN-SP). Software co-developer: SACODEYL TEI ANNOTATOR, SACODEYL TRANSCRIPTOR, BACKBONE COLLABORATIVE MANAGEMENT TOOL, BACKBONE MULTIMEDIA TRANSCRIPTOR, BACKBONE COLLABORATIVE ANNOTATOR, LADEX TRANSCRIPTOR, LADEX COLLABORATIVE ANNOTATOR.