Data Cleaning with OpenRefine (An Introduction)
Course code: DCU118
In collaboration with DCU Library
Target Audience:
This course is a hands-on workshop in OpenRefine, a powerful open-source tool for working with and cleaning data. It is aimed primarily at researchers who work with ‘messy’ datasets or who want to develop their data skills, but all are welcome. No prior knowledge of OpenRefine is assumed.
Course Description:
OpenRefine is a tool for working with ‘messy’ data. Simply put, messy data is data that contains inconsistencies or structural issues, such as duplicate records, empty values, and inconsistent spelling or formatting. If left unaddressed, these issues can cause problems in any subsequent data analysis, visualisation or research output. Cleaning data is therefore an essential preparatory step for anyone working with data in their research.
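To make these three kinds of issue concrete, here is a minimal sketch in plain Python that flags each of them in a tiny table. The sample records and the helper names (normalise, find_empty, find_exact_duplicates, spelling_variants) are invented for illustration and are not part of the workshop material. This is not how OpenRefine is implemented; OpenRefine offers comparable checks through its own interface, so no programming is needed on the course.

```python
# Hypothetical sample records (invented for illustration; not the workshop dataset).
records = [
    {"name": "Dublin City University", "county": "Dublin"},
    {"name": "dublin city university ", "county": "Dublin"},
    {"name": "Trinity College Dublin", "county": ""},
    {"name": "Dublin City University", "county": "Dublin"},
]


def normalise(value):
    """Trim surrounding whitespace, collapse runs of spaces, and lowercase."""
    return " ".join(value.split()).lower()


def find_empty(rows, field):
    """Return the indexes of rows whose field is missing or blank."""
    return [i for i, row in enumerate(rows) if not row.get(field, "").strip()]


def find_exact_duplicates(rows):
    """Return the indexes of rows that exactly repeat an earlier row."""
    seen = set()
    duplicates = []
    for i, row in enumerate(rows):
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates.append(i)
        else:
            seen.add(key)
    return duplicates


def spelling_variants(rows, field):
    """Group the distinct raw values of a field by their normalised form."""
    groups = {}
    for row in rows:
        raw = row.get(field, "")
        groups.setdefault(normalise(raw), set()).add(raw)
    # Keep only the groups that contain more than one distinct raw spelling.
    return {key: raws for key, raws in groups.items() if len(raws) > 1}


print("Rows with an empty county:", find_empty(records, "county"))
print("Rows duplicating an earlier row:", find_exact_duplicates(records))
print("Inconsistent spellings of name:", spelling_variants(records, "name"))
```

Note that the second record is not caught as an exact duplicate because its name differs in case and trailing whitespace; it only surfaces once values are normalised. That is why inconsistent formatting is usually worth resolving before looking for duplicates.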
“It is often said that 80% of data analysis is spent on the process of cleaning and preparing the data.” (Hadley Wickham, Tidy Data)
OpenRefine allows you to quickly diagnose and improve the quality of your data. This course will show you how to:
- Get an overview of a dataset
- Find and resolve inconsistencies or other issues
- Enhance a dataset with data from other sources
Working with a sample dataset, you will learn and practise these functions of OpenRefine during the workshop.
Facilitated by Liam O’Dwyer
HOW TO REGISTER FOR THIS COURSE
1. Log in to your Core HR portal
2. Click the Learning and Development tab
3. Type DCU118 into the Keywords search field and click Search