Summary and Schedule
This lesson teaches people who are just starting out in research how to manage their data and files.
After completing this lesson, learners should be able to:
- Define research data and distinguish between different data types
- Structure research materials using clear file naming conventions and a logical folder hierarchy
- Describe methods of data collection that make data cleaner and easier to analyse
- Detect inconsistencies and errors in a tabular dataset (“dirty data”)
- Use a set of basic techniques to remove or correct errors and inconsistencies in tabular data (“cleaning data”)
- Use version control to track different versions of files and switch between them
Prerequisite knowledge
Before coming to this training, learners should have:
- Basic spreadsheet skills (e.g., opening and saving tables)
- Ability to create, delete, and move files on a computer (Windows, Mac or Linux)
- A research project in progress or data to work with
Setup Instructions | Download files required for the lesson | |
Duration: 00h 00m | 1. What is Research Data? | What is research data, and why is it important in academic and scientific research? What are the different types of research data? Where can research data come from? What are the key components of research data management (RDM)? |
Duration: 01h 00m | 2. Structuring Research Materials | How can you structure data using a standard folder system for better organisation? What are the benefits of using a consistent file naming convention in research data management? Why is version control important, and how can it be incorporated into file naming practices? In what ways can version control tools like Git and GitHub be useful for managing data? |
Duration: 02h 00m | 3. Tabular Data Collection | What types of variables are commonly found in tabular data? What kinds of data inconsistencies can affect the quality of a dataset? What are some common causes of inconsistent or messy data? What practices can help ensure clean, consistent data during collection and entry? Why is it important to provide clear instructions or rules when collecting data? What is a data dictionary, and why is it useful? |
Duration: 03h 00m | 4. How to clean a tabular dataset | What is ‘clean’ data? How can we find inconsistencies in tabular data? How can we correct inconsistencies in tabular data? |
Duration: 03h 50m | 5. Introduction to R | What is…. |
Duration: 04h 50m | Finish |
The actual schedule may vary slightly depending on the topics and exercises chosen by the instructor.
Before joining the workshop, please complete the data and software setup described on this page.
Data Sets
Download the txt file of example data and save it somewhere easily accessible on your computer.
This file is a shortened and modified version of the Metropolitan Museum of Art’s Open Access CSV, released under a CC0 license on GitHub.
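If you would like to confirm that the file downloaded intact before the workshop, one optional check is to read its first few rows with a short script. The sketch below is only an illustration: the function name `preview_table` and the file name `example_data.txt` are hypothetical, and it assumes the file is comma-delimited and UTF-8 encoded, which this page does not state. Opening the file in your spreadsheet program works just as well.

```python
import csv
from pathlib import Path


def preview_table(path, n_rows=5, delimiter=","):
    """Return the header row and up to n_rows data rows from a delimited text file.

    Assumes a comma-delimited, UTF-8 file; pass a different delimiter if needed.
    """
    path = Path(path)
    # "utf-8-sig" drops a byte-order mark if the file happens to start with one
    with path.open(newline="", encoding="utf-8-sig") as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        header = next(reader, [])  # empty list if the file has no lines at all
        rows = [row for _, row in zip(range(n_rows), reader)]
    return header, rows


# Example call (hypothetical file name; use the name of the file you saved):
# header, rows = preview_table("example_data.txt")
# print(len(header), "columns:", header)
```

If the header comes back as a single long string, the file probably uses a different delimiter (for example a tab), and the `delimiter` argument can be changed accordingly.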
Software Setup
Details
This workshop requires access to a spreadsheet program, such as Microsoft Excel, LibreOffice, Apple Numbers, Gnumeric, ONLYOFFICE, or WPS Office.
We encourage you to use Microsoft Excel or LibreOffice (a free, open-source alternative). Installation instructions for LibreOffice are provided below:
Windows
- Download the Installer: Go to the installation page. The version for Windows should be selected automatically. Click Download. You will be taken to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice: Once the installer has downloaded, double-click it and LibreOffice should install.
macOS
- Download the Installer: Go to the installation page. The version for macOS should be selected automatically. Click Download. You will be taken to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice: The file LibreOffice_X.X.X_MacOS_x86-64 (where X.X.X is the version of LibreOffice you selected) should have been downloaded. Double-click this file, and LibreOffice will be installed.
Linux
- Download the Installer: Go to the installation page. The version for Linux should be selected automatically. Click Download. You will be taken to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically.
- Install LibreOffice: Once the installer has downloaded, double-click it and LibreOffice should install.