You’ve been taking MOOCs and reading a bunch of textbooks, but now what do you do? Getting a job in data science can seem daunting. The best way to showcase your skills is with a portfolio. This shows employers that you can use the skills you’ve been learning.
To showcase these skills, here are 5 types of data science projects for your portfolio:
- Data Cleaning
Data scientists can expect to spend up to 80% of their time on a new project cleaning data. This is a huge pain point for teams. If you can show that you’re experienced at cleaning data, you’ll immediately be more valuable. To create a data cleaning project, find some messy data sets and start cleaning. Your project should demonstrate these skills:
- Importing data
- Joining multiple datasets
- Detecting missing values
- Detecting anomalies
- Imputing missing values
- Data quality assurance
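
The skills above can be sketched in a few lines, assuming Python with pandas. This is only an illustration: the `orders` and `customers` tables, their column names, and the 1.5 × IQR outlier rule are hypothetical choices made for this example, not something the article prescribes.

```python
import io

import pandas as pd

# Hypothetical raw data; in a real project these would be messy files on disk.
ORDERS_CSV = """order_id,customer_id,amount
1,101,25.0
2,102,
3,101,30.0
4,103,27.5
5,104,9999.0
6,102,22.0
"""
CUSTOMERS_CSV = """customer_id,region
101,North
102,South
103,North
"""

# Importing data
orders = pd.read_csv(io.StringIO(ORDERS_CSV))
customers = pd.read_csv(io.StringIO(CUSTOMERS_CSV))

# Joining multiple datasets; indicator=True records which rows found a match
merged = orders.merge(customers, on="customer_id", how="left", indicator=True)
unmatched = merged[merged["_merge"] == "left_only"]

# Detecting missing values
missing_counts = merged[["amount", "region"]].isna().sum()

# Detecting anomalies with the 1.5 * IQR rule (one simple convention among many)
q1, q3 = merged["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
is_outlier = (merged["amount"] < q1 - 1.5 * iqr) | (merged["amount"] > q3 + 1.5 * iqr)

# Imputing missing values with the median of the non-outlier amounts
median_amount = merged.loc[~is_outlier, "amount"].median()
cleaned = merged.assign(amount=merged["amount"].fillna(median_amount))

# Data quality assurance: simple checks that fail loudly
assert cleaned["order_id"].is_unique
assert cleaned["amount"].notna().all()
```

Note that the sketch only flags the suspicious amount in `is_outlier`; whether to drop, cap, or keep such a row is a judgment call worth explaining in the project write-up.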
- Exploratory Data Analysis
Another important aspect of data science is exploratory data analysis (EDA). This is the process of generating questions and investigating them with visualizations. EDA allows an analyst to draw conclusions from data to drive business impact. It might include interesting insights based on customer segments, or sales trends based on seasonal effects. Often you will make interesting discoveries that you weren’t initially looking for. An EDA project should demonstrate these skills:
- Ability to formulate relevant questions for investigation
- Identifying trends
- Identifying covariation between variables
- Communicating results effectively using visualizations (scatterplots, histograms, box-and-whisker plots, etc.)
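
As a rough sketch of what this can look like in practice, assuming Python with pandas and matplotlib: the example below asks two questions of a small, made-up sales table (is revenue seasonal, and does it move with ad spend?) and answers each with a summary and a plot. The data, column names, and chart choices are all hypothetical.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: two years of monthly revenue and advertising spend
sales = pd.DataFrame({
    "month": pd.date_range("2022-01-01", periods=24, freq="MS"),
    "ad_spend": [10, 11, 12, 12, 13, 14, 14, 15, 15, 16, 18, 22,
                 11, 12, 13, 13, 14, 15, 15, 16, 16, 17, 19, 24],
    "revenue": [100, 104, 110, 112, 118, 121, 125, 128, 131, 140, 165, 210,
                108, 113, 119, 121, 127, 131, 134, 139, 142, 151, 178, 228],
})

# Question 1: is there a seasonal pattern? Average revenue by calendar month.
by_month = sales.groupby(sales["month"].dt.month)["revenue"].mean()
peak_month = int(by_month.idxmax())

# Question 2: do revenue and ad spend move together? Pearson correlation.
corr = sales["ad_spend"].corr(sales["revenue"])

# Communicate both findings with simple visualizations
fig, (ax_trend, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))
ax_trend.plot(sales["month"], sales["revenue"])
ax_trend.set_title("Monthly revenue")
ax_scatter.scatter(sales["ad_spend"], sales["revenue"])
ax_scatter.set_title(f"Revenue vs. ad spend (r = {corr:.2f})")
ax_scatter.set_xlabel("Ad spend")
ax_scatter.set_ylabel("Revenue")
fig.tight_layout()
```

A strong correlation here does not show that ad spend causes revenue; both series peak in the same months, so a portfolio write-up should say so rather than overclaim.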
- Interactive Data Visualizations
Interactive data visualizations include tools such as dashboards. These tools are useful both for data science teams and for more business-oriented end users. Dashboards allow data science teams to collaborate and draw insights together. Even more important, they provide an interactive tool for business-oriented customers, who focus on strategic goals rather than technical details. Often the deliverable for a data science project to a client will be in the form of a dashboard.
For Python users, the Bokeh and Plotly libraries are great for creating dashboards. For R users, be sure to check out RStudio’s Shiny package. Your dashboard project should highlight these important skills:
- Including metrics relevant to your customer’s needs
- Creating useful features
- A logical layout (“F-pattern” for easy scanning)
- Setting an optimal refresh rate
- Generating reports or other automated actions
- Machine Learning
A machine learning project is another important piece of your data science portfolio. Now, before you run off and start building some deep learning project, take a step back for a minute. Rather than building a complex machine learning model, stick with the basics. Linear regression and logistic regression are great to start with. These models are easier to interpret and to communicate to upper-level management. I’d also recommend focusing on a project that has a business impact, such as predicting customer churn, fraud detection, or loan default. These are more real-world than predicting flower type. Your machine learning project should demonstrate these skills:
- Explaining why you chose a specific machine learning model
- Splitting data into training/test sets (or using k-fold cross-validation) to detect overfitting
- Selecting the right evaluation metrics (AUC, adjusted R^2, confusion matrix, etc.)
- Feature engineering and selection
- Hyperparameter tuning
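
To make the workflow concrete, here is a minimal sketch assuming Python with scikit-learn. It uses `make_classification` to generate synthetic data as a stand-in for a churn dataset (a real project would use real customer data), and the grid of `C` values is an arbitrary choice for illustration. It covers the train/test split, cross-validated hyperparameter tuning, and evaluation; feature engineering is left out because it depends on the dataset.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a churn dataset (1 = churned, 0 = stayed)
X, y = make_classification(
    n_samples=1000, n_features=8, n_informative=4,
    n_clusters_per_class=1, weights=[0.8, 0.2], random_state=0,
)

# Hold out a test set that is never touched during tuning
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Logistic regression: simple, and its coefficients are easy to explain
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Hyperparameter tuning with 5-fold cross-validation on the training set only
search = GridSearchCV(
    pipeline,
    param_grid={"logisticregression__C": [0.01, 0.1, 1.0, 10.0]},
    scoring="roc_auc",
    cv=5,
)
search.fit(X_train, y_train)

# Evaluate once on the held-out test set
test_probs = search.predict_proba(X_test)[:, 1]
test_auc = roc_auc_score(y_test, test_probs)
matrix = confusion_matrix(y_test, search.predict(X_test))

print(f"Best C: {search.best_params_['logisticregression__C']}")
print(f"Test AUC: {test_auc:.3f}")
print(matrix)
```

Putting the scaler inside the pipeline means it is refit on each cross-validation fold, so no information from the validation folds or the test set leaks into training. AUC is used here because the classes are imbalanced, which makes plain accuracy misleading.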
Click here to continue reading John Sullivan’s article.