Key Points

What is Research Data?


  • Use .md files for episodes when you want static content
  • Use .Rmd files for episodes when you need to generate output
  • Run sandpaper::check_lesson() to identify any issues with your lesson
  • Run sandpaper::build_lesson() to preview your lesson locally

Structuring Research Materials


  • Organise files into a hierarchical folder structure: start broad and drill down into specific areas, using a sensible number of folders with meaningful names
  • Use a consistent file naming convention across a project so that you and your colleagues can easily find and identify files
  • Avoid spaces and special characters in file names; use hyphens or underscores to separate parts of the name instead
  • Use the ISO 8601 date format (YYYYMMDD or YYYY-MM-DD) in file names so that files sort in chronological order
  • Use version numbers (e.g. v1.0, v1.1) or date-based suffixes to track document versions, and keep earlier versions in a clearly named subfolder
  • Version control tools such as Git, together with hosting platforms such as GitHub, are particularly useful for text-based files (code, scripts, documentation) that change frequently or are worked on collaboratively
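
The naming points above can be sketched in a few lines of Python. This is only an illustration, not a prescribed convention: the function name `make_file_name`, the order of the name parts, and the example project and description are all assumptions made for this sketch.

```python
import re
from datetime import date


def make_file_name(project: str, description: str, when: date,
                   version: str, extension: str) -> str:
    """Build a name such as 'survey_raw-data_20240315_v1.0.csv' (hypothetical pattern)."""

    def clean(part: str) -> str:
        # Lower-case the text and replace runs of whitespace with a hyphen,
        # then drop any remaining special characters.
        part = re.sub(r"\s+", "-", part.strip().lower())
        return re.sub(r"[^a-z0-9-]", "", part)

    # YYYYMMDD keeps names in chronological order when sorted as plain text.
    stamp = when.strftime("%Y%m%d")
    return f"{clean(project)}_{clean(description)}_{stamp}_v{version}.{extension}"


names = [
    make_file_name("Survey", "Raw Data", date(2024, 3, 15), "1.0", "csv"),
    make_file_name("Survey", "Raw Data", date(2023, 11, 2), "1.1", "csv"),
]

# Sorting the names alphabetically also sorts them by date.
for name in sorted(names):
    print(name)
```

Because the date is written year first, an ordinary alphabetical sort in a file browser lists the 2023 file before the 2024 one without any extra effort.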

Tabular Data Collection


  • Variables in tabular data can be numeric, string, categorical, or date/time, and a single variable may have both a conceptual type (how it is used) and a technical format (how it is stored).
  • Data inconsistencies such as mixed cases, varying formats, and invalid values can cause errors during analysis and should be identified before working with a dataset.
  • Inconsistencies are easier to prevent than to fix, so enforcing formats, using drop-down menus, and adding validation rules during data collection reduces the need for cleaning later.
  • Documenting data collection guidelines before collecting data ensures consistency and supports reproducibility for collaborators and future users.
  • A data dictionary describes the variables in a dataset, including their names, types, possible values, and units, making the dataset easier to understand and use correctly.
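
As a rough illustration of how a data dictionary can double as a set of validation rules, the sketch below checks rows against a small dictionary. Everything in it is hypothetical: the variable names, allowed values, ranges, and the `validate_row` helper are assumptions chosen for the example, and a real project would define its own.

```python
from datetime import datetime

# Hypothetical data dictionary: one entry per variable, with type, units and allowed values.
DATA_DICTIONARY = {
    "participant_id": {"type": "string", "description": "Unique participant code"},
    "age": {"type": "numeric", "unit": "years", "min": 0, "max": 120},
    "site": {"type": "categorical", "values": ["north", "south"]},
    "visit_date": {"type": "date", "format": "%Y-%m-%d"},
}


def validate_row(row: dict) -> list[str]:
    """Return a list of problems found in one row (an empty list means no problems)."""
    problems = []
    for name, rule in DATA_DICTIONARY.items():
        value = row.get(name, "")
        if value == "":
            problems.append(f"{name}: missing value")
            continue
        if rule["type"] == "numeric":
            try:
                number = float(value)
            except ValueError:
                problems.append(f"{name}: '{value}' is not a number")
                continue
            if not rule["min"] <= number <= rule["max"]:
                problems.append(f"{name}: {value} is outside {rule['min']}-{rule['max']}")
        elif rule["type"] == "categorical":
            # Exact match only, so mixed case such as 'North' is reported.
            if value not in rule["values"]:
                problems.append(f"{name}: '{value}' is not one of {rule['values']}")
        elif rule["type"] == "date":
            try:
                datetime.strptime(value, rule["format"])
            except ValueError:
                problems.append(f"{name}: '{value}' does not match {rule['format']}")
    return problems


good = {"participant_id": "P001", "age": "34", "site": "north", "visit_date": "2024-03-15"}
bad = {"participant_id": "P002", "age": "thirty", "site": "North", "visit_date": "15/03/2024"}

print(validate_row(good))
for problem in validate_row(bad):
    print(problem)
```

The second row shows the kinds of inconsistency listed above: a number written as text, a category in mixed case, and a date in a different format. Running checks like these at collection time is one way to catch such values before they reach analysis.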

How to clean a tabular dataset



Introduction to R