5 Explaining models’ output
When a data-based product is built, questions about how it works often arise. A Data Scientist has to explain complex machine learning models in a way that is clear to people with no technical background.
People ask what the model consists of, which data matter most and how they affect the result, how strong the relationships are, and how the model's correctness is verified. A Data Scientist should be able to explain the model, how it was validated, and which of the data used in the model played a crucial role in the given project.
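One common way to answer the question "which data mattered most?" is permutation importance: shuffle one input column at a time and see how much the model's error grows. The sketch below is only an illustration under stated assumptions. The `predict` function is a hypothetical stand-in for a trained model, the data are synthetic, and the feature names are made up for the example.

```python
import random


def predict(row):
    # Hypothetical stand-in for a trained model (an assumption for this sketch):
    # it depends strongly on feature 0, weakly on feature 1, not at all on feature 2.
    return 3.0 * row[0] + 0.5 * row[1]


def mean_squared_error(rows, targets):
    # Average squared difference between model predictions and targets.
    return sum((predict(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(rows, targets, n_features, n_repeats=20, seed=0):
    # For each feature, shuffle its column and record the average error increase.
    rng = random.Random(seed)
    baseline = mean_squared_error(rows, targets)
    importances = []
    for j in range(n_features):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)
            # Rebuild the rows with only column j replaced by its shuffled values.
            shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mean_squared_error(shuffled, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic, noise-free data (an assumption made to keep the example small).
data_rng = random.Random(42)
rows = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
targets = [predict(r) for r in rows]

feature_names = ["feature_0", "feature_1", "feature_2"]  # hypothetical names
importances = permutation_importance(rows, targets, n_features=3)

# Print features from most to least important.
for name, score in sorted(zip(feature_names, importances), key=lambda p: -p[1]):
    print(f"{name}: {score:.3f}")
```

A ranking like this can be shown to a non-technical audience as a simple bar chart: the larger the error increase after shuffling a column, the more the model relies on it. In this sketch the unused feature scores zero, because shuffling it cannot change the predictions.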
6 The simpler, the better
Sometimes, our interlocutors are surprised to hear this. Even though we have all the intellectual achievements of humankind at our disposal, it is often preferable to use data analysis solutions based on simple, explainable rules that run fast and use as little computing power as possible.
There is a growing urge to use the most complex and computationally hungry solutions. Yet a Data Scientist should always strive to minimize the product's computing time, reduce memory use, simplify the models, and reduce the amount of data required. A simple model is easier to manage, and its operation is easier to understand.
Needless to say, there are cases where effectiveness is all that counts and the heaviest guns from the machine learning arsenal are deployed, but many solutions are valued for their explainability and simplicity of operation.
7 Searching for synergy
This skill may be new to less experienced Data Scientists. It most often comes into play at later career stages, e.g. at the managerial or executive level. It refers to the ability to find connections among machine learning solutions, for example by trying to deploy a tool created by one team in projects that another team is working on.
Quite frequently, the goal is to find solutions and applications that kill two birds with one stone. Sometimes, a Data Scientist focuses exclusively on improving the single tool they have been working on. In some cases, however, it is necessary to look at data-based products from a wider perspective and seek connections between projects and tools, so that the potential of existing solutions is used to the fullest.