What Makes Someone a Data Scientist? Skills for a Data Scientist

"Everybody loves a data scientist," wrote Simon Rogers (2012) in the Guardian. Mr. Rogers also traced the newfound love for number crunching to a quote by Google's Hal Varian, who declared that "the sexy job in the next ten years will be statisticians."

Whereas Hal Varian named statisticians sexy, it is believed that what he really meant were data scientists. This raises several important questions:
  1. What is data science?
  1. How does it differ from statistics?
  1. What makes someone a data scientist?

In the era of big data, a question as simple as "What is data science?" can yield many answers. In some cases, the diversity of opinion on these answers borders on hostility.

I define a data scientist as someone who finds solutions to problems by analyzing big or small data using appropriate tools and then tells stories to communicate her findings to the relevant stakeholders.

I define data science as something that data scientists do. Data science is a multi-disciplinary field that uses scientific methods, processes, algorithms, and systems to extract knowledge and insights from structured and unstructured data.

I have listed below the skills required to become a data scientist:
  1. Fundamentals
  1. Statistics
  1. Programming
  1. Machine Learning and Advanced Machine Learning (Deep Learning)
  1. Data Visualization
  1. Big Data
  1. Data Ingestion
  1. Data Munging
  1. Tool Box
  1. Data-Driven Problem Solving

Fundamentals:
  • Matrices and Linear Algebra Functions
  • Hash Functions and Binary Trees
  • Relational Algebra, Database Basics
  • ETL (Extract, Transform, Load)
  • Reporting vs. BI (Business Intelligence) vs. Analytics
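
To make the ETL and database items above a little more concrete, here is a minimal sketch in Python using only the standard library. The sample data, the table name `sales`, and the column names are made up for illustration; a real pipeline would read from actual source systems.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a source file or export.
RAW_CSV = """region,amount
north, 120.50
south,80
north,
east,42.25
"""

def extract(text):
    """Extract: read rows from CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def transform(rows):
    """Transform: drop rows with a missing amount and convert types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append((row["region"].strip(), float(amount)))
    return cleaned

def load(rows, conn):
    """Load: write the cleaned rows into a relational table."""
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)

# A simple report on top of the loaded data: total amount per region.
totals = dict(conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region"
).fetchall())
print(totals)
```

The final query is a small example of reporting: it summarizes what is already in the database, whereas analytics would go on to ask why the numbers look the way they do.
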
Statistics:

This includes:


  • Descriptive Statistics (Mean, Median, Range, Standard Deviation, Variance)
  • Exploratory Data Analysis
  • Percentiles and Outliers
  • Probability Theory
  • Bayes' Theorem
  • Random Variables
  • Cumulative Distribution Function (CDF)
  • Skewness
  • Other Statistics fundamentals
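
Several of the items above can be tried out with Python's built-in `statistics` module. The following is a small sketch, assuming a made-up sample of daily order counts; the 1.5 × IQR rule used for outliers is one common rule of thumb, not the only option.

```python
import statistics

# Hypothetical sample: daily order counts with one unusually large value.
data = [12, 15, 11, 14, 13, 15, 90]

# Descriptive statistics
mean = statistics.mean(data)
median = statistics.median(data)
value_range = max(data) - min(data)
variance = statistics.variance(data)  # sample variance
std_dev = statistics.stdev(data)      # sample standard deviation

# Percentiles: the quartiles split the sorted data into four parts.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Outliers by the 1.5 * IQR rule of thumb.
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(f"mean={mean:.2f} median={median} range={value_range}")
print(f"variance={variance:.2f} std_dev={std_dev:.2f}")
print(f"quartiles={q1}, {q2}, {q3} outliers={outliers}")
```

Here the mean sits well above the median because of the single large value, which is a quick hint that the data is skewed to the right.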

Programming:


You need expertise in at least one programming language. I would suggest R or Python.



Machine Learning and Advanced Machine Learning (Deep Learning):



You should understand what machine learning is and how it works. Understand the different types of machine learning techniques:
  • Supervised Learning
  • Unsupervised Learning
  • Reinforcement Learning
Good knowledge of various supervised and unsupervised learning algorithms is required, such as:
  • Linear Regression
  • Logistic Regression
  • Decision Tree
  • Random Forest
  • K-Nearest Neighbors
  • Clustering (for example K-means)
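
To give a feel for how a supervised algorithm learns from labeled examples, here is a minimal sketch of K-Nearest Neighbors in plain Python. The training data, labels, and the function name `knn_predict` are invented for illustration; in practice you would normally use a library such as scikit-learn rather than writing this by hand.

```python
import math
from collections import Counter

# Hypothetical labeled training data: (height_cm, weight_kg) -> label
training = [
    ((150, 50), "small"),
    ((155, 55), "small"),
    ((160, 58), "small"),
    ((175, 80), "large"),
    ((180, 85), "large"),
    ((185, 90), "large"),
]

def knn_predict(point, examples, k=3):
    """Predict a label by majority vote among the k closest examples."""
    # Sort the labeled examples by Euclidean distance to the new point.
    by_distance = sorted(examples, key=lambda ex: math.dist(point, ex[0]))
    # Count the labels of the k nearest examples and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]

print(knn_predict((158, 54), training))  # small
print(knn_predict((178, 82), training))  # large
```

The algorithm is supervised because every training example carries a known label. An unsupervised method such as K-means would receive only the points and have to find the groups on its own.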

Data Visualization:



Data visualization is a very important part of the data life cycle. Good hands-on knowledge of various visualization tools is required. You can even use a programming language for this purpose. Below are a few visualization tools:

  • Tableau: Tableau is one of the most popular data visualization tools used by data science and business intelligence professionals today. It enables you to create insightful and impactful visualizations in an interactive and colorful way.
  • Kibana: Kibana is an open source data visualization plugin for Elasticsearch. It provides visualization capabilities on top of the content indexed on an Elasticsearch cluster. Users can create bar, line, and scatter plots, or pie charts and maps, on top of large volumes of data.
  • Google Charts: Google Charts is an interactive web service that creates graphical charts from user-supplied information. The user supplies data and a formatting specification expressed in JavaScript embedded in a web page, and the chart is rendered in that page.
  • Datawrapper: Datawrapper is a web-based tool for creating charts, maps, and tables from data you upload or paste in, without writing code.

Big Data:




Big data is everywhere, and there is an almost urgent need to collect and preserve whatever data is being generated, for fear of missing out on something important. There is a huge amount of data floating around, and what we do with it is what matters. This is why big data analytics is at the frontier of IT. It has become crucial because it helps improve business and decision making and can provide an edge over competitors. This applies to organizations as well as to professionals in the analytics domain.

Data Ingestion:



The process of importing, transferring, loading, and processing data for later use or storage in a database is called data ingestion. This involves loading data from a variety of sources. Below are a few data ingestion tools:
  • Apache Flume
  • Apache Sqoop

Data Munging:



If you have ever performed data analysis, you have probably come across feature selection before applying your analytical model to the data. In general, all the work you do on the raw data to make it “clean” enough to feed into your analytical algorithm is data munging. You can use R and Python packages for that. It is one of the most important parts of the data life cycle.
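
As a small illustration of what munging can look like, here is a sketch in plain Python. The records, field names, and the helper names `clean` and `munge` are hypothetical; for real work, packages such as pandas are the more usual choice.

```python
# Hypothetical raw records with typical problems: stray whitespace,
# inconsistent capitalization, missing or invalid values, and duplicates.
raw_records = [
    {"name": " Alice ", "age": "34", "city": "london"},
    {"name": "Bob", "age": "", "city": "Paris"},
    {"name": "alice", "age": "34", "city": "London"},
    {"name": "Carol", "age": "n/a", "city": " berlin"},
]

def clean(record):
    """Normalize text fields; convert age to int, or None if missing or invalid."""
    age_text = record["age"].strip()
    return {
        "name": record["name"].strip().title(),
        "age": int(age_text) if age_text.isdigit() else None,
        "city": record["city"].strip().title(),
    }

def munge(records):
    """Clean every record and drop duplicates, keeping the first occurrence."""
    seen = set()
    result = []
    for record in map(clean, records):
        key = (record["name"], record["age"], record["city"])
        if key not in seen:
            seen.add(key)
            result.append(record)
    return result

cleaned = munge(raw_records)
for row in cleaned:
    print(row)
```

After cleaning, the two "Alice" rows turn out to be the same record, so only three rows remain, and the missing or invalid ages are marked as `None` instead of being silently kept as text.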

Tool Box:



You might find this section somewhat redundant, but I think it is very important to have a good knowledge of certain tools, such as:


  • MS Excel
  • Python or R
  • Hadoop
  • Spark 
  • Tableau

Data-Driven Problem Solving:


Everything we have discussed so far involves tools and technologies that you can learn. A data-driven approach to problem solving, however, is something you need to develop, and it only comes with experience.



