Taking the Data-Centric route to Machine Learning 🚴🏻‍♀️

Hey folks 👋🏻 For those who missed @AndrewYNg's talk on the #DataCentric approach to #MachineLearning, which aligns with our mission at @DataForML, here is a quick recap 🧵👇🏻

AI = CODE + DATA 🧠 So far, the field has focused on improving the code (i.e., making model architectures more sophisticated), so much so that good open-source code is now easy to find for most #MachineLearning problems.

BETTER DATA = BETTER AI 🔢 Along the way, the data has largely been held fixed and has not received its due share of the limelight. To make further progress in #MachineLearning, we should now give importance to the data (along with the code/model) and adopt a #DataCentric approach.

SOPHISTICATED DATA ENGINEERING 🛠 So far, data collection & labeling have been driven largely by intuition. The focus should now be on collectively evolving these processes & building relevant frameworks. This will eventually yield mature tools that streamline the data collection & labeling workflow.

ANALYSE DATA SLICES 🍕 The #DataCentric paradigm modifies the #ML workflow. After training the model and performing error analysis, instead of improving the model architecture, we look for the slice of the data that needs attention to further improve model performance.
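To make the slice-finding step concrete, here is a minimal #Python sketch. It is not taken from the talk. It assumes each validation example has already been tagged with a slice name and simply ranks slices by error rate. The helper names `error_rate_by_slice` and `rank_slices`, the dict format, and the slice tags ("clean", "car_noise", "cafe_noise", loosely modeled on a speech-recognition setting) are all hypothetical.

```python
from collections import defaultdict

def error_rate_by_slice(examples):
    """Map each slice tag to (error_rate, num_examples).

    Hypothetical input format: each example is a dict with a "slice" tag,
    the true "label", and the model's "prediction".
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for ex in examples:
        totals[ex["slice"]] += 1
        if ex["prediction"] != ex["label"]:
            errors[ex["slice"]] += 1
    return {s: (errors[s] / totals[s], totals[s]) for s in totals}

def rank_slices(examples):
    """Return slice tags sorted from highest to lowest error rate."""
    rates = error_rate_by_slice(examples)
    return sorted(rates, key=lambda s: rates[s][0], reverse=True)

# Hypothetical validation results from a speech-recognition model.
val = [
    {"slice": "clean", "label": "yes", "prediction": "yes"},
    {"slice": "clean", "label": "no", "prediction": "no"},
    {"slice": "car_noise", "label": "yes", "prediction": "no"},
    {"slice": "car_noise", "label": "no", "prediction": "no"},
    {"slice": "cafe_noise", "label": "yes", "prediction": "no"},
    {"slice": "cafe_noise", "label": "no", "prediction": "yes"},
]

print(rank_slices(val))  # ['cafe_noise', 'car_noise', 'clean']
```

In this toy run, the cafe_noise slice would be the first candidate for collecting more or better-labeled data, rather than for changing the model. In practice you would probably also weigh how often each slice occurs, because a modest error rate on a large slice can matter more than a high one on a rare slice.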

PRIORITIZE DATA 🎖 Improving the data should not be confused with data pre-processing. In fact, in the #DataCentric paradigm, enhancements to the data should continue even after deployment and through the monitoring phase.
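Monitoring can feed back into the data in several ways. One illustrative option is to compare how often each slice shows up in production versus in the training set. The sketch below assumes slice tags are available for both. The function names, the tags, and the 1.5× threshold are hypothetical choices, not something prescribed in the talk. It flags slices that are noticeably rarer in training than in live traffic.

```python
from collections import Counter

def slice_shares(slice_tags):
    """Fraction of examples belonging to each slice tag."""
    counts = Counter(slice_tags)
    total = sum(counts.values())
    return {s: c / total for s, c in counts.items()}

def underrepresented_slices(train_tags, prod_tags, ratio=1.5):
    """Slices whose production share is at least `ratio` times their
    training share, or that never appear in training at all.
    The 1.5 default is an arbitrary, hypothetical threshold."""
    train = slice_shares(train_tags)
    prod = slice_shares(prod_tags)
    flagged = []
    for s, p_share in prod.items():
        t_share = train.get(s, 0.0)
        if t_share == 0.0 or p_share / t_share >= ratio:
            flagged.append(s)
    return sorted(flagged)

# Hypothetical slice tags for training data and post-deployment traffic.
train_tags = ["clean"] * 8 + ["car_noise"] * 2
prod_tags = ["clean"] * 5 + ["car_noise"] * 4 + ["cafe_noise"] * 1

print(underrepresented_slices(train_tags, prod_tags))  # ['cafe_noise', 'car_noise']
```

Flagged slices are candidates for collecting and labeling new examples before the next retraining round. That is one concrete shape the post-deployment data work can take.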

PRO TIP 📖 The #DataCentric paradigm calls for learning how to build quality datasets to solve #MachineLearning problems effectively. Grab a copy of @DataForML today and learn how to sculpt data the right way using #Python and other #OpenSource tools!

To watch the full talk by @AndrewYNg and the sessions from other experts, check out www.youtube.com/watch?v=Yqj7Kyjznh4 ⏯

Written on August 11, 2021