Why do I prefer to build a model as a Proof of Concept (POC) instead of following an established methodology such as CRISP-DM, SEMMA, AIE, or MAD Skills?
Most data scientists will say that these are two different things used for different purposes: a methodology is a step-by-step approach to delivering a workable model, while a POC tests your idea to confirm that the model is feasible. Yes, they are different. However, both approaches share the same goal: a good working model.
Here are the reasons why I prefer to create a working model as a Proof of Concept (POC).
- Flexibility – It gives you the flexibility you need: you can incorporate feedback loops, validation, third-party knowledge, and subject-matter expert (SME) input into any step of the process.
- Benchmark – It gives you a baseline model that you can use as a reference point for comparing your other models. You can also demo your baseline model at any time during your engagement.
- Support – Not all data scientists are data engineers. Working on a POC as a collaborative task helps ensure that most of the heavy lifting, such as ETL, parallel computation, and data preprocessing, is done on the server side, which lets you concentrate on the modeling.
- Holistic View – By creating a POC, you see how your model fits within the scheme of things; you see the fruits of your labor and not just an isolated model. You will also know exactly what tools, resources, and time are required to create a working prototype, because you will have a complete view of your creation.
- Challenges – You will identify your challenges early in development and be able to address them as they arise.
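To make the Benchmark point concrete, here is a minimal sketch in plain Python of what comparing a later model against a POC baseline might look like. Everything in it is an assumption made for illustration: the toy churn data, the `support_tickets` feature, the majority-class baseline, and the hand-written `candidate` rule are all hypothetical and not taken from any real engagement.

```python
from collections import Counter

def accuracy(predict, rows):
    """Fraction of (features, label) rows that the predict function gets right."""
    correct = sum(1 for features, label in rows if predict(features) == label)
    return correct / len(rows)

def make_baseline(train_rows):
    """Baseline POC model: always predict the most common training label."""
    majority_label = Counter(label for _, label in train_rows).most_common(1)[0][0]
    return lambda features: majority_label

def candidate(features):
    """Hypothetical later model: a single hand-written threshold rule."""
    return "churn" if features["support_tickets"] >= 3 else "stay"

# Toy, made-up data purely for illustration.
train = [
    ({"support_tickets": 0}, "stay"),
    ({"support_tickets": 1}, "stay"),
    ({"support_tickets": 4}, "churn"),
    ({"support_tickets": 2}, "stay"),
    ({"support_tickets": 5}, "churn"),
]
test = [
    ({"support_tickets": 0}, "stay"),
    ({"support_tickets": 3}, "churn"),
    ({"support_tickets": 1}, "stay"),
    ({"support_tickets": 6}, "churn"),
]

baseline = make_baseline(train)
baseline_acc = accuracy(baseline, test)
candidate_acc = accuracy(candidate, test)

# Report both numbers side by side so the baseline stays the reference point.
print(f"baseline={baseline_acc:.2f} candidate={candidate_acc:.2f}")
```

On this toy data the baseline scores 0.50 and the candidate 1.00. The numbers themselves mean nothing; the point is that every later model is scored by the same function on the same held-out rows as the baseline.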
These are my reasons for creating a model using the Proof of Concept approach; everyone is different. However, I urge you to try it. I strongly believe that you will be much more satisfied with the model you create.