Generating a Synthetic Dataset for Machine Learning and Software Testing

Using Python to generate statistically similar dummy datasets for code development and robustness testing

Daniel Ellis Research
Towards Data Science
3 min read · May 20, 2022

Photo by henry perks on Unsplash

Whence

The problem of generating a realistic dataset for testing is a prominent one. This is often encountered in one of two…

Research Software Engineer specialising in High-Performance Computing and Data Visualisation. PhD in Atmospheric Chemistry and Master's in Theoretical Physics.