Key Points

What is Research Data?

Organise files into a hierarchical folder structure: start broad and drill down into specific areas, using a sensible number of folders with meaningful names
Use a consistent file naming convention across a project so that you and your colleagues can easily find and identify files
Avoid spaces and special characters in file names; use hyphens or underscores to separate parts of the name instead
Use the YYYYMMDD (ISO 8601) date format in file names to ensure files sort correctly in date order
Use version numbers (e.g. v1.0, v1.1) or date-based suffixes to track document versions, and keep earlier versions in a clearly named subfolder
Version control tools such as Git and GitHub are particularly useful for text-based files (code, scripts, documentation) that change frequently or are worked on collaboratively

Variables in tabular data can be numeric, string, categorical, or date/time, and a single variable may have both a conceptual type (how it is used) and a technical format (how it is stored).
Data inconsistencies such as mixed cases, varying formats, and invalid values can cause errors during analysis and should be identified before working with a dataset.
Inconsistencies are easier to prevent than to fix, so enforcing formats, using drop-down menus, and adding validation rules during data collection reduces the need for cleaning later.
Documenting data collection guidelines before collecting data ensures consistency and supports reproducibility for collaborators and future users.
A data dictionary describes the variables in a dataset, including their names, types, possible values, and units, making the dataset easier to understand and use correctly.