Simulating data and file-based ETL

Introduction: Data scientists spend a lot of time importing, cleaning, tidying and transforming data before any decent analysis can start. Like many others, the industry I work in typically relies on emailed files to communicate data and reports. I follow a consistent approach to ETL and subsequent data concentration to better manage the accumulation of multiple, disparate files from a variety of sources and in different formats. This tutorial demonstrates a simplified version of this process. [Read More]

Hello World

Last year Louis Columbus of Forbes stated that Machine Learning Engineers, Data Scientists, and Big Data Engineers rank among the top emerging jobs on LinkedIn. David Robinson’s Stack Overflow article, The Incredible Growth of Python, supports this observation. Data science in the UK is still emerging, whereas in the US it appears to have taken off in a big way. It is encouraging to read about mainstream adoption in the UK, even if it is still early days. [Read More]