Hustle Hub #1
How to Build Your Data Science Portfolio, My Startup Mistakes, & More
Hey friends,
Hope you're having a great week so far. Today, I'm excited for 2 reasons:
I finally wrote my first newsletter!
I'm thrilled to share this newsletter with you, because it's like my weekly journal, where I share 1 tip, 1 mistake, 1 learning, 1 book, and 1 quote that I've picked up on my data science and startup journey.
To give you a bit of background, I graduated with a Physics degree in 2014.
After finding my passion in data science, I started my career as a data scientist, working in fields ranging from the online gambling industry (story for another time) to semiconductors, before becoming a data science instructor.
Finally, I quit my job in March 2021 and started building my current startup - Staq - the #1 business banking API platform for Southeast Asia.
Looking back on my journey, I made tons of mistakes, but I also learned valuable lessons from them. This newsletter is a letter to my past self, and I hope you'll find some takeaways in it.
Let's get started!
What's in the hub today?
Tip: How to build a data science portfolio
Mistake: I caused tech debt
Learning: Balance between urgency & importance
Book: Zero to One
Quote: Finding your purpose - the Ikigai way
1 Tip:
⭐️ How to Build a Data Science Portfolio?
People often ask me, "How do I build a data science portfolio?"
I recently talked about the 6 steps I'd take to build my portfolio if I were starting from zero. The benefits of following these 6 steps are:
You'll learn how to build an end-to-end data science project.
You'll attract the attention of recruiters and employers.
You can easily differentiate yourself from the rest.
It becomes easier to land job interviews and offers for DS roles.
Here's how to build your data science portfolio, step by step:
Step 1: Find a social problem to solve
In a real working environment, most problems are not well-defined. They are vague. That's why companies prefer to hire data scientists who have dealt with real-world problems before.
Kaggle problems are well-defined, so solving them teaches you little about dealing with real-world problems. In my opinion, tackling social problems is the best way to build this real working experience.
Here's how to find a social problem to solve:
The best social problems can be found around you. Think about what social problems you (or your friends) are facing in your daily life.
For example, say I'm renting a house, and every month I want to forecast next month's electricity bill for budgeting purposes.
Voilà! I've found a social problem to solve. It's time to get some data.
Step 2: Get the data using web scraping
Getting data from Kaggle is easy, because it's handed to you. Unfortunately, in the real world you won't have this luxury. Most of the time, you have to go and get the data yourself.
To get my historical electricity bill data, I'd need to write a simple web scraper for my online bill account.
Here are the tools that I'd use for web scraping:
Scrapy
Selenium
BeautifulSoup
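To make Step 2 concrete, here is a minimal sketch of the parsing half of a scraper. It assumes the bill account page lists past bills in an HTML table whose first two cells are the month and an amount like "$1,234.50"; that layout, the sample HTML, and the names `BillTableParser` and `parse_bills` are all hypothetical. To stay dependency-free it uses Python's standard-library `html.parser` in place of BeautifulSoup, and it leaves out logging in and fetching the page, which is where Selenium or Scrapy would come in.

```python
from html.parser import HTMLParser


class BillTableParser(HTMLParser):
    """Collects the text of each <td> cell, grouped into one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows, each a list of cell strings
        self._row = None      # cells of the row currently being parsed
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:  # header rows use <th>, so they stay empty and are skipped
                self.rows.append(self._row)
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_bills(html):
    """Turn the bill history table into [{"month": ..., "amount": ...}, ...]."""
    parser = BillTableParser()
    parser.feed(html)
    parser.close()
    records = []
    for row in parser.rows:
        if len(row) < 2:
            continue
        month, amount = row[0], row[1]
        # Assumed amount format on the page: "$1,234.50"
        records.append({"month": month, "amount": float(amount.replace("$", "").replace(",", ""))})
    return records


if __name__ == "__main__":
    sample = """
    <table>
      <tr><th>Month</th><th>Amount</th></tr>
      <tr><td>2023-01</td><td>$85.20</td></tr>
      <tr><td>2023-02</td><td>$1,002.75</td></tr>
    </table>
    """
    print(parse_bills(sample))
    # [{'month': '2023-01', 'amount': 85.2}, {'month': '2023-02', 'amount': 1002.75}]
```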
Step 3: Store the data in the cloud on AWS (free tier)
Once I've scraped the data, I'll output it as a JSON file and store it in Amazon S3, since the AWS free tier includes up to 5 GB of S3 storage.
Why store the data in the cloud? Two reasons:
I'll need to retrieve the data later for analysis and ML model training.
Most companies store their data in the cloud, so I want to build my cloud computing skills. Again, that's the whole point of building a portfolio: getting real work experience.
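A minimal sketch of the upload in Step 3, assuming the scraped bills are a list of dicts. It serialises them to JSON and stores them with boto3's S3 `put_object` call; the bucket name, the date-partitioned key layout, and the helper names `build_s3_key` and `upload_records` are hypothetical. The S3 client is passed in as a parameter, so the logic can be checked without AWS credentials.

```python
import json
from datetime import date


def build_s3_key(run_date, prefix="electricity-bills/raw"):
    """Date-partitioned object key, e.g. electricity-bills/raw/2024/05/bills.json (layout is an assumption)."""
    return f"{prefix}/{run_date:%Y}/{run_date:%m}/bills.json"


def upload_records(s3_client, bucket, records, run_date=None):
    """Serialise records to JSON and store them as one S3 object.

    s3_client is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the object key so later steps know where to read from.
    """
    run_date = run_date or date.today()
    key = build_s3_key(run_date)
    body = json.dumps(records, indent=2).encode("utf-8")
    s3_client.put_object(Bucket=bucket, Key=key, Body=body, ContentType="application/json")
    return key


# Usage with real AWS credentials configured (bucket name is hypothetical):
# import boto3
# upload_records(boto3.client("s3"), "my-portfolio-bills", [{"month": "2024-05", "amount": 85.2}])
```

Keying objects by year and month keeps each monthly run's output separate, which makes it easy for a later analysis step to list and load the full history.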
Step 4: Extract the data, clean & analyse it, get insights
Finally, it's time to pull the JSON file from S3 for data cleaning and analysis. Here are my typical steps for analysing the data:
Data cleaning - Convert the data from JSON into a dataframe, remove unwanted data fields, etc.
Exploratory Data Analysis (EDA) - Understand the data distribution, identify outliers using boxplots, engineer features.
Data visualisation
Get insights - Spot interesting trends, identify relevant features, remove unwanted features.
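A small standard-library sketch of the cleaning and outlier steps, assuming the raw JSON is a list of bill records with hypothetical `month` ("YYYY-MM") and `amount` fields plus extra fields we don't need. In practice you'd likely reach for pandas; here the outlier check is the 1.5 × IQR rule that a standard boxplot uses to decide which points to draw as outliers, and `month_of_year` is one example of a simple engineered feature (useful for seasonal bills).

```python
import json
from statistics import quantiles

KEEP_FIELDS = ("month", "amount")  # hypothetical fields worth keeping


def clean_records(raw_json):
    """Parse raw JSON, keep only the needed fields, coerce types, sort by month."""
    rows = []
    for rec in json.loads(raw_json):
        if not rec.get("month") or rec.get("amount") in (None, ""):
            continue  # drop records missing the fields we need
        row = {field: rec[field] for field in KEEP_FIELDS}
        row["amount"] = float(row["amount"])
        row["month_of_year"] = int(row["month"][5:7])  # engineered feature, assumes "YYYY-MM"
        rows.append(row)
    return sorted(rows, key=lambda r: r["month"])


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR], the usual boxplot rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]
```

The `inclusive` quantile method is used because, with only a handful of monthly bills, the default `exclusive` method stretches the quartiles towards the extremes and can hide an obvious outlier.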
Step 5: Build an ML model, wrap it into an API to output predictions
After doing all the groundwork, it's time to build an ML model. Once it's trained, I'll deploy the model and wrap it in an API that predicts next month's electricity bill.
Here are the steps I'd take:
Build 2-3 different ML models and compare them with a baseline model.
Pick the best-performing ML model based on the chosen metric (e.g. prediction accuracy).
Deploy the trained ML model in Amazon SageMaker.
Wrap the ML model in a REST API to output predictions.
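A minimal sketch of the compare-and-pick loop and the REST wrapper in Step 5, under loud assumptions: the two candidate "models" are deliberately simple stand-ins (a 3-month moving average and a least-squares linear trend) rather than real ML models, the baseline is "next month equals this month", the chosen metric is mean absolute error (MAE) on walk-forward predictions, and the API is a local standard-library `http.server` endpoint at a hypothetical `/predict` path instead of a SageMaker endpoint. All function names are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def naive_baseline(history):
    """Baseline: next month's bill equals this month's."""
    return history[-1]


def moving_average_3(history):
    """Average of the last 3 months."""
    last = history[-3:]
    return sum(last) / len(last)


def linear_trend(history):
    """Least-squares line through (month index, bill), extrapolated one month ahead."""
    n = len(history)
    x_mean = (n - 1) / 2
    y_mean = sum(history) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(history))
    slope = sxy / sxx
    return y_mean + slope * (n - x_mean)  # value of the fitted line at x = n


MODELS = {"naive_baseline": naive_baseline, "moving_average_3": moving_average_3, "linear_trend": linear_trend}


def walk_forward_mae(model, series, min_history=3):
    """Predict each month from only the months before it, then average the absolute errors."""
    errors = [abs(model(series[:t]) - series[t]) for t in range(min_history, len(series))]
    return sum(errors) / len(errors)


def pick_best(series):
    """Score every model (baseline included) and return the name with the lowest MAE."""
    scores = {name: walk_forward_mae(fn, series) for name, fn in MODELS.items()}
    return min(scores, key=scores.get), scores


def predict_payload(history):
    """JSON-ready response: chosen model, next month's prediction, and all MAE scores."""
    best, scores = pick_best(history)
    return {"model": best, "prediction": round(MODELS[best](history), 2), "mae": scores}


class PredictHandler(BaseHTTPRequestHandler):
    """POST /predict with {"history": [monthly bills]} returns next month's prediction."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        try:
            history = [float(v) for v in json.loads(self.rfile.read(length))["history"]]
            body = json.dumps(predict_payload(history)).encode("utf-8")
        except (ValueError, KeyError, TypeError, ZeroDivisionError):
            self.send_error(400, explain='expected JSON like {"history": [at least 4 monthly bills]}')
            return
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8000), PredictHandler).serve_forever()
```

With the server running, something like `curl -X POST localhost:8000/predict -d '{"history": [80, 85, 90, 88, 95]}'` should return JSON with the chosen model and its prediction. For brevity this re-runs the comparison on every request; in the workflow described above you'd pick the model once at training time and only serve predictions.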
Step 6: Build an end-to-end data & ML pipeline
Once Step 5 is done, I can automate the full workflow to run every month (web scraping, data cleaning and analysis, ML training, and updating my ML model), so that I always get an updated prediction of next month's electricity bill.
You can use Amazon EventBridge to trigger your web scraper in a Lambda function, and AWS Step Functions to orchestrate the full workflow (Steps 2-5).
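A sketch of the Step 6 wiring, assuming one Lambda function per stage (the stage names, function ARNs, region and account ID below are placeholders). It builds an Amazon States Language definition for a Step Functions state machine that runs the stages in order, each retried a couple of times on failure, plus an EventBridge schedule expression that fires at 00:00 UTC on the 1st of every month. Actually creating the resources (console, CloudFormation, or an SDK) is left out.

```python
import json

# Placeholder ARNs: one hypothetical Lambda function per stage of Steps 2-5.
STAGES = [
    ("ScrapeBills", "arn:aws:lambda:REGION:ACCOUNT_ID:function:scrape-bills"),
    ("CleanAndAnalyse", "arn:aws:lambda:REGION:ACCOUNT_ID:function:clean-bills"),
    ("TrainModel", "arn:aws:lambda:REGION:ACCOUNT_ID:function:train-model"),
    ("UpdateEndpoint", "arn:aws:lambda:REGION:ACCOUNT_ID:function:update-endpoint"),
]

# EventBridge cron fields: minutes hours day-of-month month day-of-week year
MONTHLY_SCHEDULE = "cron(0 0 1 * ? *)"


def build_state_machine(stages):
    """Chain the stages as sequential Task states in Amazon States Language."""
    states = {}
    for i, (name, arn) in enumerate(stages):
        state = {
            "Type": "Task",
            "Resource": arn,
            # Retry a failed stage up to 2 more times, waiting 30s then 60s.
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 30,
                "MaxAttempts": 2,
                "BackoffRate": 2.0,
            }],
        }
        if i + 1 < len(stages):
            state["Next"] = stages[i + 1][0]
        else:
            state["End"] = True
        states[name] = state
    return {
        "Comment": "Monthly electricity bill forecasting pipeline",
        "StartAt": stages[0][0],
        "States": states,
    }


if __name__ == "__main__":
    print(json.dumps(build_state_machine(STAGES), indent=2))
```

Generating the definition from a list keeps the stage order in one place, so adding a stage (say, a separate evaluation step) is a one-line change.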
TL;DR
Find a social problem to solve
Get the data using web scraping
Store the data in the cloud on AWS (free tier)
Extract the data, clean & analyse it, get insights
Build an ML model, wrap it into an API to output predictions
Build an end-to-end data & ML pipeline
By following this strategy, you'll be ahead of most aspiring data scientists who only have certificates or Titanic projects under their belts.
As you can see, building a fully end-to-end data science portfolio with these 6 steps takes time, but trust me, it's worth it.
One great portfolio project is 10x better than 5 toy projects that don't mirror any real-world problems.
1 Mistake:
When you're a startup founder with limited resources, speed of execution is everything. Because of that, I wanted to build things fast in our early stage, so I took shortcuts.
What did I do? I:
Did a lot of hard-coding instead of making the code robust.
Ignored some minor issues and put them in the backlog instead of fixing them early on.
Didn't plan the architecture well before building, which made the system hard to maintain and scale.
Over time, bugs arose and tech debt compounded. I ended up spending more time fixing things than actually building. Not good.
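To make the hard-coding shortcut concrete, here's a hypothetical before/after in Python (the URL, setting names and environment variables are made up for illustration and are not Staq's code). The shortcut bakes the values into the function; the sturdier version reads them once from environment variables with defaults, so switching from a sandbox to production means changing configuration rather than editing code.

```python
import os
from dataclasses import dataclass


# Shortcut: values baked into the code, so every change means a code edit and redeploy.
def fetch_url_hardcoded(account_id):
    return f"https://sandbox.example-bank.test/v1/accounts/{account_id}?timeout=5"


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings, read once from environment variables."""

    api_base_url: str
    timeout_seconds: int

    @classmethod
    def from_env(cls, env=os.environ):
        return cls(
            api_base_url=env.get("BANK_API_BASE_URL", "https://sandbox.example-bank.test/v1"),
            timeout_seconds=int(env.get("BANK_API_TIMEOUT_SECONDS", "5")),
        )


def fetch_url(settings, account_id):
    """Same URL as the shortcut, but built from configuration instead of literals."""
    return f"{settings.api_base_url}/accounts/{account_id}?timeout={settings.timeout_seconds}"
```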
1 Learning:
Paying the price for all that compounded tech debt was painful.
Here is what I've learned:
Find the balance between urgency and importance. You can build things the right way, and still be fast enough.
Every week, set aside some time to fix issues and reduce tech debt. You can't eliminate tech debt completely, but reducing it regularly makes your life easier.
1 Book:
Zero to One by Peter Thiel is a must-read if you want to learn how to build a startup that lasts.
Here are a few of my takeaways from the book:
Create new technology and build new things that make the future not just different, but better. That's how you go from 0 to 1.
The future won't happen on its own. Have "definite optimism" about the future: make plans and work to make the future better, instead of waiting for it to happen naturally.
Every great business is built around a secret that's hidden from the outside. Find the secret and execute on it, and you'll win.
This book has changed how I approach building Staq, with a long-term view.
Whenever I'm in doubt, I come back to these reminders to make sure we're building for the future, not for short-term gains.
Have you read this book? What are your thoughts on it?
1 Quote:
Do what you love.
Do what you're good at.
Do what the world needs.
Do what you can be rewarded for.
From How to Ikigai by Tim Tamashiro.
Ikigai is the reason you get out of bed every morning. It's your purpose.
I was lost when I was in school. I studied Physics, but had no clue what I wanted to do in my life.
These steps helped me find my passion and purpose in data science. Here are 4 questions to help you find your purpose:
What do you love?
What are you good at?
What does the world need?
What do you get paid for?
Ask yourself these 4 questions today and let me know how it goes!
Whenever you're ready, there are 4 ways I can help you:
1. Book a coaching call with me if you need help with any of the following:
โข How To Get Into Data Science
โข LinkedIn Growth, Content Strategy & Personal Branding
โข 1:1 Mentorship & Career Guidance
โข Resume Review
2. Promote your brand to ~1000 subscribers in the data/tech space by sponsoring this newsletter.
3. Watch my YouTube videos, where I talk about data science tips, programming, and my tech life (P.S. Don't forget to like and subscribe).
4. Follow me on LinkedIn and Twitter for more data science career insights, my mistakes and lessons learned from building a startup.
That's all for today
Thanks for reading. I hope you enjoyed today's issue. More than that, I hope it has helped you in some way and brought you some peace of mind.
You can always write to me by simply replying to this newsletter and we can chat.
See you again next week.
- Admond