Introduction

Overview of the tutorial

Aymeric Collart

Institute of Linguistics, Academia Sinica

2024-11-26

Structure and workflow of the tutorial


Chapters and sections of the tutorial

Chapter 1: Getting started
  • Download and install R/RStudio
  • Presentation of R/RStudio
  • Working with markdown files
Chapter 2: Let’s get our data
  • Understanding where the data are on the Internet
  • Scraping the data from the web with R
Chapter 3: Cleaning and analyzing the data
  • Preprocessing the data: cleaning and transformations
  • Analyzing the data

What this tutorial does NOT cover

  • Fundamental concepts of corpus linguistics
  • How to ask a research question and how to motivate it
  • Advanced statistical methods in corpus linguistics (classification analyses, vector space representations)

If you are interested in these topics, you can read Anatol Stefanowitsch's book Corpus Linguistics: A Guide to the Methodology (2020), which is available for free here.

How to get through the exercises

The best (if not the only) way to learn a programming language is through practice. There are two ways to practice throughout the tutorial. In either case, I highly recommend using markdown scripts.

  • At the beginning of each section, I will provide a markdown file prefilled with the structure of that section. You can download this file, open it in R/RStudio, and paste the code into the relevant parts of the script.
  • Alternatively, you may prefer to write the script yourself, working from the code given in each section. This is a great way to learn how to structure a markdown file on your own!