Introduction

1 Structure of the course

1.1 What are the steps of a corpus study?

The workflow of a typical corpus-based study generally follows the steps shown in the image below.

This course follows this workflow. I will show you how to conduct a corpus investigation step by step, so that you will be able to conduct one by yourself!

1.2 Syllabus of the course

The weekly organization of this course is given in the Table of Contents on the left and in the table below.

Week Content
Week 1 Introductory week
Part 1 Principles of corpus studies
Week 2 What is a corpus? The role of corpora in understanding language and literature
Week 3 Types of corpora: Selecting or building a corpus?
Part 2 Building a corpus from the Internet with modern tools
Week 4 Introduction to R (1/2): Installation and first steps
Week 5 Introduction to R (2/2): Basic functions and how to create reproducible data
Week 6 Example 1: Building a corpus from media platforms (1/2)
Week 7 Example 1: Building a corpus from media platforms (2/2)
Week 8 Example 2: Building a corpus from literature sources (1/2)
Week 9 Example 2: Building a corpus from literature sources (2/2)
Week 10 Midterm presentations, group and individual meetings in the classroom
Part 3 Simple preprocessing and analyses of corpus data
Week 11 Data preprocessing: Tidying data
Week 12 Data analysis
Week 13 Data visualization
Week 14 Presenting a corpus-based study
Week 15 Group and individual meetings in the classroom (assistance for final projects)
Week 16 Students’ presentations of individual/group projects

1.3 Required textbooks and additional references

For this course

We will only use some parts of these two books:

1- Paquot, M., & Gries, S. T. (2020). A practical handbook of corpus linguistics. Cham: Springer. You should normally be able to download this book for free via this link: Link to the book

2- O’Keeffe, A., & McCarthy, M. J. (2022). The Routledge handbook of corpus linguistics (2nd edition). London/New York: Routledge. Link to the book

Figure 1: Required textbooks. (a) Paquot & Gries (2020); (b) O’Keeffe & McCarthy (2022)

Online resources:

1- R website: Link to the R website (freely downloadable)

2- RStudio interface: Link to the RStudio website (freely downloadable)

3- R for data science handbook: Link to the handbook (freely downloadable)

Additional reference

This course is task-oriented, so there are many aspects of corpus linguistics that will not be covered. If you are interested in knowing more, you can read the book published by Anatol Stefanowitsch in 2020, available for free here.

2 How to get through the exercises

The best (if not the only) way to learn a programming language is through practice. There are two ways to practice throughout the tutorial. In either case, I highly recommend using markdown scripts (to learn more about markdown scripts, refer to Week 5, Part 2).

  • At the beginning of each section, I will provide a markdown file prefilled with the structure of the section. You can download this file, open it with R, and paste the code into the relevant sections of the script.
  • Alternatively, you may wish to write the script yourself, working from the code given in each section. This is also a great way to learn how to structure a markdown file on your own!