“R, Python, SQL, and Machine Learning” has long been the standard job description for a Data Scientist. But as the industry develops, these skills are no longer enough to stay competitive in the labor market.
Upgrade your skills to work in the data market in 2020!
Data science is a very competitive field in which people are rapidly accumulating skills and experience. This has led to a boom in positions for machine learning engineers. So my advice for 2020 is that Data Scientists should also become developers.
To stay competitive, prepare yourself for new ways of working with new tools.
1. Agile
Agile is a way of organizing work that development teams have used widely for a long time. Data science positions are increasingly filled by people with software development skills, which is raising the profile of the machine learning engineer role.
Post-it and Agile seem to go hand in hand
More and more data scientists and machine learning engineers work like classic developers: they continuously improve machine learning components in an existing codebase.
For this role, a data scientist must know the Agile way of working, in particular the Scrum method. Scrum defines several roles for different specialists, and this distribution of roles ensures smooth, continuous improvement of the product being built.
2. Git and GitHub
Git and GitHub are developer tools that help you manage the different versions of the software you create. Git is a version control system that keeps track of all changes made to the codebase, and GitHub is a hosting platform for Git repositories. Together they make it easier for multiple developers to work on the same project.
GitHub is the way to go
As the role of a Data Scientist becomes more demanding, getting the hang of these tools becomes key. Git is becoming a major requirement, and it takes time to get good at it. It’s easy to get started with Git when you work alone or when all your colleagues are beginners. But if you join a team of Git experts while you are still new to it, you can run into big problems.
Git is a real skill to know for GitHub
3. Industrialization
The way we think about Data Science projects is also changing. A data scientist is still a person who answers business questions with machine learning. But Data Science projects are increasingly developed for production systems, for example as a microservice within a larger piece of software.
AWS is the largest cloud service provider
At the same time, advanced model types require more and more CPU and RAM resources, especially when working with neural networks and Deep Learning.
In terms of job descriptions, it is becoming increasingly important for a Data Scientist to think not only about the accuracy of the model, but also about code execution time and other aspects of project industrialization.
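To make this concrete, here is a minimal sketch of what looking at accuracy and execution time side by side could look like. The `evaluate` helper and the toy `threshold_model` are hypothetical names invented for illustration; a real project would plug in its own model and data.

```python
import time


def evaluate(predict, inputs, labels):
    """Return (accuracy, seconds per prediction) for a prediction function."""
    start = time.perf_counter()
    predictions = [predict(x) for x in inputs]
    elapsed = time.perf_counter() - start

    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels), elapsed / len(inputs)


# Hypothetical toy "model": predicts 1 when the input is above a threshold.
def threshold_model(x, threshold=0.5):
    return 1 if x > threshold else 0


# Tiny made-up dataset, only here to make the sketch runnable.
inputs = [0.1, 0.4, 0.6, 0.9]
labels = [0, 0, 1, 1]

accuracy, latency = evaluate(threshold_model, inputs, labels)
print(f"accuracy={accuracy:.2f}, latency={latency * 1e6:.1f} microseconds per prediction")
```

Reporting both numbers together makes the trade-off visible: a slightly more accurate model that is many times slower may not be the right choice for a production system.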
Google also has a cloud service, as does Microsoft (Azure)
4. Cloud and big data
While the industrialization of machine learning is becoming more of a requirement for data scientists, it is also becoming a major challenge for data engineers and IT in general.
While the Data Scientist works to reduce the execution time of the model code, IT professionals are doing their part by switching to faster computing services, which are usually obtained in one or both of the following ways:
- Cloud. Moving computing resources to external providers such as AWS, Microsoft Azure, or Google Cloud makes it easy to set up a very fast machine learning environment that can be accessed remotely. This requires Data Scientists to have a basic understanding of how the cloud works, for example working with remote servers rather than a local computer, or working with Linux rather than Windows or Mac.
PySpark is Python for parallel (Big Data) systems
- Big data. The second aspect of a faster IT infrastructure is the use of Hadoop and Spark, tools that let you parallelize tasks across many computers (worker nodes) at the same time. This requires a different approach to implementing Data Science models, as your code must be written so that it can run in parallel.
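The idea of code that can run in parallel can be sketched without a cluster. The example below is not Spark: it uses threads from Python's standard library as a stand-in for worker nodes, and the helper names (`split`, `partial_stats`, `parallel_mean`) are hypothetical. What it shows is the general pattern: cut the data into partitions, compute an independent partial result on each one, then combine the partial results.

```python
from concurrent.futures import ThreadPoolExecutor


def split(data, n_parts):
    """Cut the data into at most n_parts partitions (assumes non-empty data)."""
    size = -(-len(data) // n_parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_stats(partition):
    """Work done independently on one partition: return (sum, count)."""
    return sum(partition), len(partition)


def parallel_mean(data, n_workers=4):
    """Compute a mean by combining per-partition partial results."""
    partitions = split(data, n_workers)
    # Threads stand in for worker nodes here; a real cluster would
    # send each partition to a different machine.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(partial_stats, partitions))
    total = sum(s for s, _ in results)
    count = sum(c for _, c in results)
    return total / count


print(parallel_mean(list(range(1, 101))))  # 50.5
```

Note that the mean is not computed by averaging the partition means, which would be wrong for partitions of unequal size. Each worker returns a sum and a count, and only the final step combines them. Thinking in terms of such combinable partial results is the main change of approach.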
5. NLP, neural networks and deep learning
Until recently, Data Scientists commonly assumed that NLP and pattern recognition were just specializations of Data Science that not everyone needed to master.
You need to master Deep Learning: machine learning based on the idea of the human brain.
But use cases for image classification and NLP are becoming more frequent, even in “normal” businesses. Nowadays, it is no longer acceptable to lack at least a basic knowledge of such models.
Even if you don’t apply these models directly in your work, a hands-on study project will help you understand the basics of Computer Vision and NLP.
I wish you good luck in improving your skills! Stay up to date with new trends!