Introduction

Slides of this section

The slides corresponding to this section can be found here: Chalkboard

1 Structure and workflow of the tutorial

The workflow of a typical corpus-based study is generally as in the image below.

The sections and their links can be found in the Table of Contents on the left in addition to the table below.

Chapter and content Link
Chapter 1: Getting started
Download and install R/RStudio link
Presentation of R/RStudio link
Working with markdown files link
Chapter 2: Let’s get our data
Understanding where the data are on the Internet link
Scraping the data from the web with R link
Chapter 3: Cleaning and analyzing the data
Preprocess the data: Cleaning and transformations link
Analyzing the data link

The picture and table above show what this tutorial will cover. In other words, this tutorial will not discuss the fundamental concepts of corpus linguistics, how to ask a research question and how to motivate it. If you are interested in it, you can read the book published by Anatol Stefanowitsch in 2020, available for free here. This tutorial will also not discuss advanced statistical methods in corpus linguistics, such as classification analyses or vector space representations.

2 How to get through the exercises

The best (if not only) way to learn how to use programming language is just with practice. There are two ways to do so throughout the tutorial. In either case, I highly recommend to use markdown scripts (to know more about markdown scripts, you can refer to Section 1.3.

  • I will provide at the beginning of each section a markdown file prefilled with the structure of the section. You can download this file and open it with R, and just paste the codes at the relevant sections of the script.
  • Alternatively, you may wish to write the script yourself, and to work based on the codes in each section. This is also a great way to learn how to structure a markdown file alone!