YAML for data split

Published: December 19, 2022

Why do we need YAML for data split?

Normally I use a fixed random seed to split a dataset. However, I found that my dataset has been updated slightly and gradually as the project has developed.

For instance, in later experiments I found that some patients should be excluded. Should I then re-run all the previous experiments? If not, how can I ensure that the following experiments use the same training/validation/testing data as the previous ones? (With the same seed, patient lists of different lengths can lead to very different data splits.)
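To see the problem concretely, here is a small sketch of a seed-based split. The patient IDs, the 70/10/20 ratios and the helper names (`seeded_split`, `subset_of`) are made up for illustration and are not taken from my project. It removes one patient and counts how many of the remaining patients end up in a different subset.

```python
import random


def seeded_split(patient_ids, seed=0, train_frac=0.7, valid_frac=0.1):
    """Shuffle with a fixed seed, then cut into train/valid/test."""
    ids = sorted(patient_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_frac * len(ids))
    n_valid = round(valid_frac * len(ids))
    return {
        "train": ids[:n_train],
        "valid": ids[n_train:n_train + n_valid],
        "test": ids[n_train + n_valid:],
    }


def subset_of(split):
    """Map each patient ID to the name of the subset it belongs to."""
    return {p: name for name, ids in split.items() for p in ids}


# Hypothetical patient list; "P03" is excluded later in the project.
patients = [f"P{i:02d}" for i in range(10)]
before = subset_of(seeded_split(patients))
after = subset_of(seeded_split([p for p in patients if p != "P03"]))

# Remaining patients whose subset changed although the seed did not.
moved = sorted(p for p in after if after[p] != before[p])
print(f"{len(moved)} of {len(after)} remaining patients changed subset: {moved}")
```

The seed fixes the permutation for a list of a given length, not the assignment of each individual patient, so dropping a single ID can reshuffle the others.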

So let’s store the split in a YAML file, so that the training/validation/testing data stays almost the same from one experiment to the next.
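Below is a minimal sketch of how this could look in Python. It is one possible approach, not a definitive implementation. The file layout (three keys `train`, `valid`, `test`, each holding a list of patient IDs), the function names and the default ratios are all assumptions for illustration. Reading and writing uses PyYAML (`pip install pyyaml`), which is imported lazily so that the split logic itself has no dependencies.

```python
import random
from pathlib import Path

SUBSETS = ("train", "valid", "test")  # assumed key names in the YAML file


def load_split(path):
    """Read an existing split from YAML; return {} if the file is missing."""
    import yaml  # PyYAML, third-party

    path = Path(path)
    if not path.exists():
        return {}
    with path.open() as f:
        return yaml.safe_load(f) or {}


def save_split(split, path):
    """Write the split back to YAML so later experiments can reuse it."""
    import yaml  # PyYAML, third-party

    with Path(path).open("w") as f:
        yaml.safe_dump(split, f, sort_keys=False)


def update_split(split, patient_ids, ratios=(0.7, 0.1, 0.2), seed=0):
    """Keep earlier assignments, drop excluded patients, place new ones."""
    current = set(patient_ids)
    targets = dict(zip(SUBSETS, ratios))

    # Patients that are still in the dataset stay where they were.
    new_split = {k: [p for p in split.get(k, []) if p in current] for k in SUBSETS}
    assigned = {p for ids in new_split.values() for p in ids}

    # Only patients without an assignment are shuffled and distributed.
    new_patients = sorted(current - assigned)
    random.Random(seed).shuffle(new_patients)
    for p in new_patients:
        total = sum(len(ids) for ids in new_split.values()) + 1
        # Put the patient into the subset that is furthest below its target size.
        key = max(SUBSETS, key=lambda k: targets[k] * total - len(new_split[k]))
        new_split[key].append(p)
    return new_split
```

A typical call would be `save_split(update_split(load_split("split.yaml"), patient_ids), "split.yaml")`, where `split.yaml` is a hypothetical file name. Excluding a patient then only removes that ID from its subset, and everyone else stays put, which is why the split is "almost" rather than exactly the same.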

A complete YAML tutorial can be found at Real Python.

The difference between YAML, JSON and XML is here

Jingnan Jia
