Summarising tables

An approach to streamlining the workflow when summarising tables

Introduction: The end goal of the data science process is to communicate findings, typically to a non-technical audience. This is the most important deliverable of the process, even if it is not the first thing that springs to mind when considering data science. Fantastic insights are of no use if the intended audience doesn’t understand or trust them. It is therefore vital to take care when presenting findings. Summarising data in tables involves typical, often repeated actions. [Read More]

Low-cost housing in South Africa

Reporting state changes of large-scale programmes over time

Introduction: The purpose of this case study is to explore aspects of reporting state changes of large-scale programmes over time. A state change in this context refers to the shift in statuses of multiple activities performed during the delivery of a project, where the project forms part of a more extensive programme of works (a concentrated portfolio of project activities). We could attempt this using Excel, and perhaps we’ll be successful as the current dataset only contains c. [Read More]

Simulating data and file-based ETL

Introduction: Data scientists spend a lot of time importing, cleaning, tidying and transforming data before any decent analysis can start. Like many others, the industry that I work in typically emails files to communicate data and reports. I follow a consistent approach to ETL and subsequent data concentration to better manage the accumulation of multiple, disparate files from a variety of sources and in different formats. This tutorial demonstrates a simplified version of this process. [Read More]