Exploring Hacker News
Another relatively simple exercise from Dataquest.io to add to my portfolio
Garrett Mayock posted 2019-05-07 19:46:48 UTC
I'm currently earning the "Data Analyst – Python" certification through Dataquest.io, and each course section includes a guided project. Yesterday I completed the guided project for the "Python for Data Science: Fundamentals" course, and today I completed the guided project for the "Python for Data Science: Intermediate" course.
This project has four parts:
- I set a goal for the project
- I collected and sorted the data
- I reformatted and cleaned the data to prepare it for analysis
- I analyzed the data to achieve the goal I set at the start
Note: in traditional data science projects, there'd be a model creation and evaluation phase at the end (for an example, see the blog post on my last project, created independently of Dataquest, or the project itself, My Vacation Planner). However, this course is focused on building the intermediate Python skills needed for data analytics and data science. Because of that, this project omits any data science.
I compare two types of Hacker News posts, Ask HN and Show HN, to determine the following:
- Do Ask HN or Show HN posts receive more comments on average?
- Do posts posted at a certain time of day receive more comments on average?
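The two questions above can be sketched in plain Python. This is only a minimal illustration, not the code from the notebook: the sample rows, the `post_type` and `average_comments` helpers, and the `month/day/year hour:minute` timestamp format are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample rows: (title, num_comments, created_at).
# The real dataset is much larger; these values are made up.
SAMPLE_POSTS = [
    ("Ask HN: How do you learn Python?", 10, "8/4/2016 15:05"),
    ("Ask HN: Best laptop for data work?", 4, "8/4/2016 9:30"),
    ("Show HN: My vacation planner", 6, "8/5/2016 15:45"),
    ("Show HN: A tiny CSV viewer", 2, "8/5/2016 11:10"),
    ("Some unrelated story", 50, "8/6/2016 15:00"),
]


def post_type(title):
    """Return 'ask', 'show', or None based on the title prefix."""
    lowered = title.lower()
    if lowered.startswith("ask hn"):
        return "ask"
    if lowered.startswith("show hn"):
        return "show"
    return None


def average_comments(posts):
    """Average comments per post type, and per (post type, hour)."""
    type_totals = defaultdict(lambda: [0, 0])  # key -> [comments, posts]
    hour_totals = defaultdict(lambda: [0, 0])
    for title, num_comments, created_at in posts:
        kind = post_type(title)
        if kind is None:
            continue  # skip posts that are neither Ask HN nor Show HN
        # Assumed timestamp format, e.g. "8/4/2016 15:05".
        hour = datetime.strptime(created_at, "%m/%d/%Y %H:%M").hour
        for totals, key in ((type_totals, kind), (hour_totals, (kind, hour))):
            totals[key][0] += num_comments
            totals[key][1] += 1
    by_type = {k: total / count for k, (total, count) in type_totals.items()}
    by_type_hour = {k: total / count for k, (total, count) in hour_totals.items()}
    return by_type, by_type_hour


by_type, by_type_hour = average_comments(SAMPLE_POSTS)
print(by_type)
print(by_type_hour)
```

Keeping the per-type and per-hour averages in separate dictionaries makes each question a single lookup, for example `by_type["ask"]` or `by_type_hour[("show", 15)]`.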
The 3pm hour in the US Eastern Time Zone receives the most posts. Ask HN posts receive the most comments if posted in the 3pm hour, but Show HN posts receive more comments if posted earlier in the day.
My Kaggle Notebook (see code and thought process):