Some interesting facts about Data Science

Data science and related disciplines, including data analytics, remain a top priority for many organizations. These fields are interesting and in great demand today.

If you’ve read my previous blog post, you know that there is a lot to prepare for a job in data science. Besides the many interesting things about data science, there are several others that might surprise you.

1. Trends in the job market

With all the optimism (and big potential salaries listed in news articles) around data science, it’s easy to see why it presents attractive career opportunities, particularly as the range and scope of data science job titles continue to expand. But as a new entrant to the field, it’s important to have a realistic, nuanced view of where the data science market is heading over the next couple of years and to adjust accordingly.

Trends impact the data science field

Given these trends, it’s important to understand that at first it may be hard to make your resume stand out from the others in the pile and reach the final round of interviews. You need strategies that help you stand out in this new, competitive data science environment.

2. Can you be a data scientist without programming or a PhD?

a. Can you be a data scientist without programming?

It’s possible to do a lot of data work using only Excel, Tableau, or other business intelligence (BI) tools that have graphical interfaces. Although you’re not writing code, these tools claim to offer much of the same functionality as languages such as R or Python, and many data scientists do use them at times. But can they be a complete data science toolkit? We say no. In practice, very few companies have a data science team where you wouldn’t need to program. Without programming, you can deliver only a limited range of solutions to customers. For example, you cannot deploy a machine learning model that runs continuously.

Some advantages of programming over using BI tools

b. Do I need a PhD to get a data science job?

PhDs are degrees that take many years to obtain and that focus on training students to become professors. You have to spend years doing research to find a new method or solution to a problem that’s only very slightly better than the previous one. You publish in academic journals and move state-of-the-art research forward in an extremely specific area. But very little of a data scientist’s work resembles academic research.

A data scientist cares much less about finding an elegant, state-of-the-art solution and much more about quickly finding something that’s good enough.

A fair number of data science job posts require a PhD. But the skills acquired in a PhD program are rarely what’s needed for the job; usually, the PhD requirement is a signal from the company that the position is considered to be prestigious. The material you can learn from a master’s or undergraduate degree program will suit you fine for the vast majority of data science jobs.

You probably don't need a PhD to get a data science job

Also, getting a PhD incurs a huge opportunity cost. If it takes seven years to graduate, you could have spent those seven years working at a company instead, getting better at data science and making far more money.

Ultimately, the decision is yours. You could get a PhD and then become a data scientist, but don’t let anyone tell you that you need this degree.

3. Which skills are the most valuable for a Data Scientist?

A data scientist’s main responsibility is to try to imagine all of the possibilities, address the ones that matter, and reevaluate them all as successes and failures happen. That is why—no matter how much code I write—awareness and familiarity with uncertainty are the most valuable things I can offer as a data scientist.

Like an explorer, a modern data scientist typically must survey the landscape, take careful note of surroundings, wander around a bit, and dive into some unfamiliar territory to see what happens. When they find something interesting, they must examine it, figure out what it can do, learn from it, and be able to apply that knowledge in the future. Although analyzing data isn’t a new field, the existence of data everywhere—often regardless of whether anyone is making use of it—enables us to apply the scientific method to discovery and analysis of a preexisting world of data. This is the differentiator between data science and all of its predecessors.

4. Will Data Science disappear?

Underlying the question about whether data science will be around in a decade or two are two main concerns: that the job will become automated and that data science is overhyped and the job-market bubble will pop.

It’s true that certain parts of the data science pipeline can be automated. Automated Machine Learning (AutoML) tools can compare the performance of different models and handle certain parts of data preparation (such as scaling variables). But these tasks are only a small part of the data science process. For example, you’ll often need to create the data yourself; it’s very rare to find perfectly clean data waiting for you. Creating the data also usually involves talking with other people, such as user experience researchers or engineers, who will conduct the survey or log the user actions that drive your analysis.
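To make the automatable part concrete, here is a minimal sketch of the kind of loop an AutoML tool runs, assuming a toy regression problem with a single feature: scale the feature, fit a couple of candidate models, and keep the one with the lowest validation error. The function names, the two candidate models, and the sample numbers are hypothetical illustrations, not the API of any real AutoML library. Notice what the sketch takes for granted: the clean, labeled data it starts from.

```python
from statistics import mean


def min_max_scale(values, lo, hi):
    """Scale values to [0, 1] using bounds learned from the training data."""
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def fit_mean_model(xs, ys):
    """Baseline candidate: always predict the training mean of y."""
    m = mean(ys)
    return lambda x: m


def fit_linear_model(xs, ys):
    """Candidate: ordinary least squares for y = a + b * x (one feature)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    a = my - b * mx
    return lambda x: a + b * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return mean([(model(x) - y) ** 2 for x, y in zip(xs, ys)])


def auto_select(train_x, train_y, valid_x, valid_y):
    """Scale the feature, fit every candidate, return the best by validation MSE."""
    lo, hi = min(train_x), max(train_x)  # scaling bounds come from training data only
    tx = min_max_scale(train_x, lo, hi)
    vx = min_max_scale(valid_x, lo, hi)
    candidates = {"mean": fit_mean_model, "linear": fit_linear_model}
    scores = {name: mse(fit(tx, train_y), vx, valid_y) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores


# Hypothetical toy data where y is roughly 2 * x.
best, scores = auto_select([1, 2, 3, 4, 5], [2.1, 3.9, 6.2, 8.0, 9.9], [6, 7], [12.1, 14.0])
print(best, scores)
```

On this toy data the linear candidate wins by a wide margin. Everything the loop needs as input (which feature to use, where the labels came from, whether the values are trustworthy) still has to be decided by a person.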

Regarding the possibility of the job-market bubble popping, a good comparison is software engineering in the 1980s. As computers grew cheaper, faster, and more common, there were concerns that soon a computer could do everything and that there would be no need for programmers. But the opposite happened, and now there are more than 1.2 million software engineers in the United States (http://mng.bz/MOPo). Although titles such as webmaster did disappear, more people than ever are working on website development, maintenance, and improvement.

We believe that there will be more specialization within data science, which may lead to the disappearance of the general title "data scientist," but many companies are still in the early stages of learning how to leverage data science, and there’s plenty of work left to do. Moreover, as data scientists become more senior, they should focus on tasks that are hard to automate and on interpersonal skills, such as preparing the right data, communicating effectively, and interpreting results with customers.

5. What do junior data scientists frequently get wrong?

I think junior data scientists assume people are going to automatically recognize the value of their work. This is especially common among data scientists who come from academic backgrounds. We tend to get really wrapped up in making everything super-thorough and following the scientific method. While this is important in academia, working hard alone is not enough in industry. The way you communicate is what gets people to recognize the value of your work.

6. Do you always try to explain the technical part of data science?

It depends on how much the stakeholder wants to be involved. Some project managers do not want to be involved in anything technical; if you just said “This is not working right now,” they would take that at face value. Other project managers want to know every detail, although they tend to get a little overwhelmed once they have it. Some people want you to check in regularly and tell them what’s happening; even if they know they won’t understand everything, they just want to feel in the loop. So I make sure that they feel in the loop.

7. Data science competitions and real life projects are different

Succeeding in a data science competition (e.g., on an online platform like Kaggle) can boost your confidence so much that you start thinking about a data science career. But there is quite a lot of difference between a competition and a real-life scenario.

Data science competition websites such as Kaggle or DrivenData are great places to practice machine learning, but machine learning is not all of data science.

Compare some characteristics of Data Science competitions with real-life projects

So it is safe to say that competitions provide fair practice for data science, but they are not enough. You need to get your hands dirty on live, real-world projects to understand what data science really involves.

8. More data does not always mean more accuracy

Suppose we have a dataset containing exactly the minimum amount of data needed to make a correct analysis. This would be an ideal dataset. If we now add more data, the entire dataset has to be rebuilt to incorporate it. While rebuilding it, we need to clean the new data and spend time understanding how, if at all, it deviates from the existing set.

Even after the new data is cleaned and merged into the existing ideal dataset, it is possible that some new records are still dirty but go unnoticed. This leads to an overall degradation of the final result or analysis. In that case, less data would clearly have been better than more data. Hence, more data doesn’t necessarily mean more insight or more added value. Using smart data is the key.
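Here is a minimal sketch of how that happens, using hypothetical numbers. Assume we track session durations in minutes, and a new batch arrives with one missing value and one value that was logged in seconds by mistake. The `basic_clean` helper is a hypothetical cleaning step: it catches the missing value, but the unit error looks like a perfectly valid positive number, so it slips through.

```python
from statistics import mean


def basic_clean(values):
    """A simple cleaning step: drop missing and negative durations."""
    return [v for v in values if v is not None and v >= 0]


# Hypothetical session durations in minutes; the original set is already clean.
original = [12, 15, 11, 14, 13]

# Hypothetical new batch: one value is missing, and 900 was logged in seconds
# (15 minutes), an error that basic_clean cannot detect.
new_batch = [12, None, 14, 900]

before = mean(original)
after = mean(original + basic_clean(new_batch))
print(f"mean before: {before:.2f} min, after adding data: {after:.2f} min")
# prints "mean before: 13.00 min, after adding data: 123.88 min"
```

Adding three "cleaned" records moved the average session from 13 minutes to almost 124 minutes. The extra data made the analysis worse, not better, because one dirty value went unidentified.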