
Marcin Kosiński working as a consultant in emagine's Warsaw office.

Category: Expert stories

Key skill set for a Data Scientist

From his office in Warsaw, emagine’s in-house Data Scientist, Marcin Kosiński, relays some expert advice to beginner and experienced Data Scientists alike, taking us through the most important aspects of his work.

Welcome to the Consultant’s Corner series, a blog for independent IT consultants. Here you can find out what fellow IT experts are up to in their current or recent projects. Read about trending technologies, and get inspiration from the freelance journey of other like-minded IT professionals.

Marcin Kosiński, Warsaw

There has been a growing demand for Data Scientists, i.e., people who analyze data in order to develop machine learning models.

The challenges Data Scientists face every day help them develop the necessary skills, such as creative problem solving. The work is far from boring, yet it requires specific skills and experience.

Which competences are the most important? Which technologies should one learn? Do you need to have a scientific background and a good head for figures? Or is it a better idea to focus on soft skills? What are the present-day requirements for a Data Scientist, and how will they change in a few or a dozen years?

In this article, I would like to point out some universal elements of a Data Scientist's work. Following current trends is, of course, crucial for keeping up with how the most widely used technologies develop and how useful they are.

However, you need to remember that most of the currently used technologies may undergo changes or become obsolete in just a few years.

Therefore, in this post, I would like to put particular emphasis on the timeless credo of a Data Scientist, set out below in the order in which its elements appear in the data analysis process and in the process of creating products based on data and machine learning models.

Marcin Kosiński is an expert Data Scientist and part of emagine’s Polish team in Warsaw.

In my experience, it does not matter which technology you use for a project, what data structure you are working with, or what machine learning problem you are trying to solve: you will always face the following challenges. I believe that the ability to handle them is an essential skill set for Data Scientists, both now and in the near future.

Most likely, the majority of the work performed by a Data Scientist will become automated in the future. For this reason, it is worth focusing on the elements where the human factor is invaluable!

Key competencies of a Data Scientist of the future

1 Asking the right questions

There is no research without a hypothesis and there is no project without a goal. Sometimes, formulating your hypothesis or your goal precisely requires many questions.

Customers have needs in terms of work optimization and automation, and they believe that Data Science can help them strengthen their market position; yet they are often unable to express those needs in the language of machine learning. Therefore, a key aspect of a Data Scientist's work is the ability to ask the right questions: questions that translate business needs into data-based solutions and that help adapt existing solutions as closely as possible to the needs of an individual project.

Asking effective questions is also useful at the data quality and usability assessment stage, and, as the section on iterative problem solving shows, the need to ask questions never really ends!

2 Data quality and usability assessment

No machine can replace this skill. Despite immense data volumes and petabytes of stored information, in some cases most of these sources have no real potential for use in Data Science solutions. The quality of the data is often substandard due to failed migrations, human error, or logical errors within data structures, and some types of information are, ultimately, of little use in machine learning models.

Thus, the ability to evaluate data usability and validate the quality of the data one is working with should be an indispensable element in a Data Scientist’s toolkit.
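
Some of the most basic checks can be scripted. The sketch below is a minimal illustration in plain Python, assuming a small hypothetical table (the records, field names and validity rules are invented for the example): it counts missing values, finds keys that should be unique but are not, and flags logically impossible values, three typical symptoms of the problems described above.

```python
from collections import Counter

# Hypothetical records, e.g. a table exported after a data migration.
# Field names and validity rules are invented for this illustration.
records = [
    {"id": 1, "age": 34, "signup": "2021-03-01", "last_login": "2021-05-10"},
    {"id": 2, "age": None, "signup": "2021-04-12", "last_login": "2021-04-20"},
    {"id": 2, "age": 41, "signup": "2021-04-12", "last_login": "2021-04-20"},
    {"id": 3, "age": -5, "signup": "2021-06-01", "last_login": "2021-05-30"},
]


def quality_report(rows, key="id"):
    """Flag missing values, duplicate keys and violations of simple logical rules."""
    report = {"missing": Counter(), "duplicate_keys": [], "violations": []}

    # Missing values per field (e.g. lost during a migration or never entered).
    for row in rows:
        for field, value in row.items():
            if value is None:
                report["missing"][field] += 1

    # Keys that should be unique but are not.
    key_counts = Counter(row[key] for row in rows)
    report["duplicate_keys"] = sorted(k for k, n in key_counts.items() if n > 1)

    # Logical errors: impossible values or impossible orderings of events.
    for row in rows:
        if row["age"] is not None and row["age"] < 0:
            report["violations"].append((row[key], "negative age"))
        # ISO-8601 date strings (YYYY-MM-DD) compare correctly as plain strings.
        if row["last_login"] < row["signup"]:
            report["violations"].append((row[key], "login before signup"))

    return report


report = quality_report(records)
print(report["missing"])         # Counter({'age': 1})
print(report["duplicate_keys"])  # [2]
print(report["violations"])      # [(3, 'negative age'), (3, 'login before signup')]
```

A script like this only flags suspicious records. Deciding whether a duplicated ID is a migration error or a legitimate case, and whether the remaining data is useful for a model at all, is still the Data Scientist's call.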

3 Iterative problem solving

This is the pivotal skill of a Data Scientist. It is particularly important because of the nature of both the data analysis process and the process of creating data-based products.

The entire work process of a Data Scientist is iterative: it resembles a loop that one goes around again and again. In each pass, one moves through the same successive steps but uses the experience gained in the previous iteration, so the product becomes increasingly refined. Repeating the process many times, with each cycle improved by the knowledge gained in the previous one, leads to highly effective, customized solutions.

This is an important lesson for every Data Scientist: it is a good idea to start from the simplest possible model and improve it over many iterations. The advantage of doing so is that we create an initial model to which future, improved solutions can be compared. With this initial model in place, we can often develop a satisfactory solution in a relatively short time, without having to deploy the heavy guns of machine learning.
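
To make the idea of an initial (baseline) model concrete, here is a minimal Python sketch on a small, made-up dataset. The data, the noise pattern and the threshold rule are assumptions for illustration, not taken from a real project. Iteration 0 is the simplest possible model, which always predicts the most frequent class; iteration 1 adds a single threshold rule. Both are scored on held-out data, and the more complex model is kept only if it beats the baseline.

```python
# Hypothetical, deterministic toy data: one feature x in [0, 1) and a binary label
# that is 1 when x > 0.6, with every 17th label flipped to mimic noise.
xs = [i / 100 for i in range(100)]
ys = [1 - int(x > 0.6) if i % 17 == 0 else int(x > 0.6) for i, x in enumerate(xs)]

# Simple hold-out split: even positions for training, odd positions for testing.
train_x, train_y = xs[::2], ys[::2]
test_x, test_y = xs[1::2], ys[1::2]


def accuracy(predict, x_data, y_data):
    """Share of examples for which the model's prediction matches the label."""
    return sum(predict(x) == y for x, y in zip(x_data, y_data)) / len(y_data)


# Iteration 0: the simplest possible model, always predict the most frequent class.
majority = max(set(train_y), key=train_y.count)


def baseline(x):
    return majority


# Iteration 1: a one-rule model, a threshold on x chosen on the training data only.
def fit_threshold(x_data, y_data):
    candidates = [i / 100 for i in range(101)]
    return max(candidates, key=lambda t: accuracy(lambda x: int(x > t), x_data, y_data))


threshold = fit_threshold(train_x, train_y)


def threshold_model(x):
    return int(x > threshold)


base_acc = accuracy(baseline, test_x, test_y)
rule_acc = accuracy(threshold_model, test_x, test_y)
print(f"baseline: {base_acc:.2f}, threshold rule (t={threshold:.2f}): {rule_acc:.2f}")
# prints: baseline: 0.58, threshold rule (t=0.60): 0.94

# Keep the more complex model only if it beats the baseline on held-out data.
best = threshold_model if rule_acc > base_acc else baseline
```

The next iteration would start from `best` and try something slightly richer, while the comparison against the baseline stays exactly the same.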

4 Autoverification

The work of a Data Scientist often results in a system of decision-making rules that enables many automated, smart actions. Such a system is called a machine learning model. It is equally important to be able to assess whether a given model is effective and accurate enough.

When developing machine learning models, one must always keep an initial model (the simplest in terms of design) to which subsequent, improved solutions can be compared. The developed model should also be benchmarked against competing solutions described in the literature.

Marcin stresses the importance of iteration, in which successful solutions are built on top of simply designed models. “The entire work process of a Data Scientist is iterative.”

5 Explaining models’ output

When a data-based product is created, questions about how it works often arise. A Data Scientist has to explain complex machine learning models in a way that is clear to people with no technical background.

People ask what elements the model consists of, which data it treats as most important, what effect that data has, how strong the relationships are, and how the model's correctness was verified. A Data Scientist should be able to explain the model, how it was validated, and which of the data used in the model played a crucial role in the project.
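
One common, model-agnostic way to show which data played a crucial role is permutation importance: shuffle one input column at a time and measure how much the model's accuracy drops. The sketch below, in plain Python, uses a hypothetical one-rule model and invented data; it illustrates the technique and is not a description of the tools used in the author's projects.

```python
import random


def accuracy(model, rows, labels):
    """Share of rows for which the model's output equals the label."""
    return sum(model(row) == y for row, y in zip(rows, labels)) / len(labels)


def permutation_importance(model, rows, labels, n_features, seed=0):
    """For each feature, measure the accuracy lost when that column is shuffled.

    Shuffling breaks the link between one feature and the label while keeping the
    feature's distribution, so a large drop means the model relies on that feature.
    """
    rng = random.Random(seed)
    base = accuracy(model, rows, labels)
    importances = []
    for j in range(n_features):
        column = [row[j] for row in rows]
        rng.shuffle(column)
        shuffled = [row[:j] + (value,) + row[j + 1:] for row, value in zip(rows, column)]
        importances.append(base - accuracy(model, shuffled, labels))
    return importances


# Hypothetical model: approve (1) when income, feature 0, exceeds 50.
# Feature 1 (an ID parity flag) is present in the data but ignored by the model.
def approve(row):
    return int(row[0] > 50)


rows = [(income, i % 2) for i, income in enumerate(range(0, 100, 5))]
# To keep the toy example small, labels equal the model's outputs;
# in practice they would be the observed outcomes.
labels = [approve(row) for row in rows]

importances = permutation_importance(approve, rows, labels, n_features=2)
print(importances)  # feature 1 scores exactly 0.0 because the model never reads it
```

A result such as "shuffling income destroys the model's accuracy, shuffling the ID flag changes nothing" is the kind of statement a non-technical audience can follow without knowing how the model works internally.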

6 The simpler, the better

Sometimes, the people we talk to are surprised to hear this. Even though we have all the intellectual achievements of humankind at our disposal, it is often preferable to use data analysis solutions that are based on simple, explainable rules, run fast, and use as little computing power as possible.

There is a growing temptation to use the most complex, compute-hungry solutions, yet a Data Scientist should always strive to minimize computing time and memory use within the product, simplify the models, and reduce the amount of data required. A simple model is easier to manage, and its operation is easier to understand.

Of course, there are cases where effectiveness is all that counts and where the heaviest guns in the machine learning arsenal are deployed, but many solutions are valued precisely for their explainability and simplicity of operation.

7 Searching for synergy

This skill may be new to less experienced Data Scientists, as it usually comes into play at higher levels of one's career, e.g. at the managerial or executive level. It refers to the ability to find connections among machine learning solutions, for example by deploying a tool created by one team in projects that another team is working on.

Quite frequently, the goal is to find solutions and applications which allow killing two birds with one stone. Sometimes, a Data Scientist focuses exclusively on improving the single tool they have been working on. In some cases, however, it is necessary to look at data-based products from a wider perspective, seeking connections between projects and the tools developed for them, so that the potential of existing solutions is used to the fullest.

“Quite frequently, the goal is to find solutions and applications which allow killing two birds with one stone,” says Marcin Kosiński, referencing the aim to look at data-based products from a wider perspective.

8 Reproducibility

This is an important yet often neglected skill. Machine learning models should work not only today but also in the future. Sometimes, due to a heavy workload, we move straight on to a new project after developing a model. However, once a project has been completed, it is worth setting aside some time to supply the solution with a reproducible environment that can be easily transferred to, and launched on, many machines. This way, we can make sure that the solution remains operational regardless of which versions of libraries and programming languages are in use elsewhere.
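
A first, modest step toward such an environment is recording exactly which interpreter and package versions a model was built with. The Python sketch below uses only the standard library (`importlib.metadata`, `platform`); the output layout is an assumption for illustration, and a fully transferable environment usually goes further, for example by packaging the pinned versions into a container image.

```python
import json
import platform
from importlib.metadata import distributions


def snapshot_environment():
    """Collect the interpreter version, OS and installed package versions."""
    packages = {}
    for dist in distributions():
        name = dist.metadata.get("Name")
        if name:  # skip broken installations that lack a package name
            packages[name] = dist.version
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "packages": dict(sorted(packages.items(), key=lambda item: item[0].lower())),
    }


def pinned_requirements(snapshot):
    """Render the packages in pip's 'name==version' requirements format."""
    return "\n".join(f"{name}=={version}" for name, version in snapshot["packages"].items())


snapshot = snapshot_environment()
# Interpreter and OS details, e.g. to store next to the model artifacts.
print(json.dumps({k: v for k, v in snapshot.items() if k != "packages"}, indent=2))
# Pinned package list, e.g. to save as requirements.txt and reinstall with
# `pip install -r requirements.txt` on another machine.
print(pinned_requirements(snapshot))
```

Saving this output alongside the model costs a few seconds at the end of a project and answers the most common question when an old model stops working: what exactly was it running on?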

“Experience will help you ask the right questions, which will lead you toward the most optimal solution.”

If we re-examine the list, we may conclude that the work of a Data Scientist consists of continuously asking questions.

This is precisely the case!

Generating applications, questioning solutions, constant improvement, and verification: all of them involve asking questions. Curiosity and conscientiousness are certainly desirable qualities, but in reality, it is experience that helps you ask the right questions, the ones that lead you toward the best solution.

As you pursue a career as a Data Scientist, keep the points made in this article in mind; they will help you achieve your goals faster.

Consultant Marcin Kosiński

Marcin Kosiński is a Data Scientist with 8 years of professional experience as an IT consultant.

A graduate of Warsaw University, with a bachelor's degree in mathematics (2015) and a master's degree in data analysis (2017), he is known in the Polish Data Science community for organizing numerous conferences and delivering hundreds of presentations, including in other European countries.
