Taking the Data-Centric route to Machine Learning 🚴🏻‍♀️
AI = CODE + DATA 🧠 So far, we have focused almost entirely on improving the code (i.e. making model architectures more sophisticated), so much so that open-source code is now easily accessible for most #MachineLearning problems.
BETTER DATA = BETTER AI 🔢 Along the way, data has been held fixed and has not received the attention it is due. To make further progress in #MachineLearning, we should now give data the same importance as the code/model and adopt a #DataCentric approach.
SOPHISTICATED DATA ENGINEERING 🛠 Data collection & labeling have been largely intuitive processes so far. The focus should now be on collectively evolving these processes & building relevant frameworks. This will eventually yield mature tools that streamline data collection & labeling workflows.
ANALYSE DATA SLICES 🍕 The #DataCentric paradigm modifies the #ML workflow. After training the model and performing error analysis, instead of improving the model architecture, we focus on finding the slice of data that needs attention to further improve model performance.
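To make the slice idea concrete, here is a minimal sketch in plain #Python. It assumes each evaluated example is a dict with a "label", a "prediction" and some metadata field to slice on. The field names, the toy records and the helpers `error_rate_by_slice` and `worst_slice` are hypothetical illustrations, not part of any particular tool.

```python
from collections import defaultdict


def error_rate_by_slice(records, slice_key):
    """Return {slice_value: (error_rate, example_count)} for one metadata field."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for rec in records:
        value = rec[slice_key]
        totals[value] += 1
        if rec["label"] != rec["prediction"]:
            errors[value] += 1
    return {value: (errors[value] / totals[value], totals[value]) for value in totals}


def worst_slice(records, slice_key, min_count=2):
    """Return the slice with the highest error rate, skipping very small slices."""
    rates = error_rate_by_slice(records, slice_key)
    candidates = {v: rate for v, (rate, count) in rates.items() if count >= min_count}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Toy evaluation results (made up for illustration only).
records = [
    {"source": "studio", "label": "cat", "prediction": "cat"},
    {"source": "studio", "label": "dog", "prediction": "dog"},
    {"source": "studio", "label": "cat", "prediction": "cat"},
    {"source": "phone", "label": "cat", "prediction": "dog"},
    {"source": "phone", "label": "dog", "prediction": "dog"},
    {"source": "phone", "label": "dog", "prediction": "cat"},
]

print(error_rate_by_slice(records, "source"))
print(worst_slice(records, "source"))
```

In this toy data the "phone" slice has the higher error rate, so that is where extra collection or relabeling effort would go first. The `min_count` threshold is an arbitrary guard against drawing conclusions from tiny slices.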
PRIORITIZE DATA 🎖 Improvements in data should not be confused with data pre-processing. In the #DataCentric paradigm, enhancements to the data should continue even after the deployment and monitoring phase.
PRO TIP 📖 The #DataCentric paradigm calls for learning how to build quality datasets that effectively solve #MachineLearning problems. Grab a copy of @DataForML today and learn how to sculpt data the right way using #Python and other #OpenSource tools!