Introduction
1 Structure of the course
1.1 What are the steps of a corpus study?
A typical corpus-based study generally follows the workflow shown in the image below.
This course follows the same workflow. I will show you how to conduct a corpus investigation step by step, so that you will be able to carry one out by yourself!
1.2 Syllabus of the course
The weekly organization of this course is shown in the table below, as well as in the Table of Contents on the left.
Week | Content |
---|---|
Week 1 | Introductory week |
Part 1 | Principles of corpus studies |
Week 2 | What is a corpus, and what role do corpora play in understanding language and literature? |
Week 3 | Types of corpora: Selecting or building a corpus? |
Part 2 | Building a corpus from the Internet with modern tools |
Week 4 | Introduction to R (1/2): Installation and first steps |
Week 5 | Introduction to R (2/2): Basic functions and how to create reproducible data |
Week 6 | Example 1: Building a corpus from media platforms (1/2) |
Week 7 | Example 1: Building a corpus from media platforms (2/2) |
Week 8 | Example 2: Building a corpus from literature sources (1/2) |
Week 9 | Example 2: Building a corpus from literature sources (2/2) |
Week 10 | Midterm presentations, group and individual meetings in the classroom |
Part 3 | Simple preprocessing and analysis of corpus data |
Week 11 | Data preprocessing: Tidying data |
Week 12 | Data analysis |
Week 13 | Data visualization |
Week 14 | Presenting a corpus-based study |
Week 15 | Group and individual meetings in the classroom (assistance for final projects) |
Week 16 | Student presentations of individual/group projects |
1.3 Required textbooks and additional references
For this course
We will only use some parts of these two books:
1- Paquot, M., & Gries, S. T. (2020). A practical handbook of corpus linguistics. Cham: Springer. You should normally be able to download this book for free via this link: Link to the book
2- O’Keeffe, A., & McCarthy, M. J. (2022). The Routledge handbook of corpus linguistics (2nd edition). London/New York: Routledge. Link to the book


Online resources:
1- R website: Link to the R website (freely downloadable)
2- RStudio interface: Link to the RStudio website (freely downloadable)
3- R for data science handbook: Link to the handbook (freely downloadable)
Additional reference
This course is task-oriented, so there are many aspects of corpus linguistics that will not be covered. If you are interested in knowing more, you can read the book published by Anatol Stefanowitsch in 2020, available for free here.
2 How to get through the exercises
The best (if not the only) way to learn a programming language is through practice. There are two ways to practice throughout the tutorial. In either case, I highly recommend using markdown scripts (to learn more about markdown scripts, refer to Week 5, Part 2).
- At the beginning of each section, I will provide a markdown file prefilled with the structure of the section. You can download this file, open it with R, and paste the code into the relevant sections of the script.
- Alternatively, you may wish to write the script yourself, working from the code in each section. This is also a great way to learn how to structure a markdown file on your own!