Webscraping Job Postings from Glassdoor

Introduction

This week we are scraping data science job postings from Glassdoor for cities across the country and analyzing them to find factors that are strongly correlated with salary.

Goals

Scrape job postings from Glassdoor using the requests module and parse them with BeautifulSoup.
Build a Logistic Regression model to predict whether a job will have a high salary.
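
As a rough illustration of the first goal, here is a minimal sketch of fetching a page with requests and pulling fields out of it with BeautifulSoup. The HTML snippet, the tag and class names (job, title, company, location, salary), and the helper names are all made up for the example; Glassdoor's real markup is different and changes over time, so the selectors would need to be adapted after inspecting the live page.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical markup: these tags and class names are placeholders,
# not Glassdoor's actual page structure.
SAMPLE_HTML = """
<ul>
  <li class="job">
    <a class="title">Data Scientist</a>
    <span class="company">Acme Corp</span>
    <span class="location">Boston, MA</span>
    <span class="salary">$85k-$120k</span>
  </li>
  <li class="job">
    <a class="title">Data Analyst</a>
    <span class="company">Initech</span>
    <span class="location">Austin, TX</span>
  </li>
</ul>
"""


def fetch_html(url):
    """Download a page and return its HTML (not called in this sketch)."""
    # Some sites reject the default requests User-Agent, so we send our own.
    response = requests.get(url, headers={"User-Agent": "Mozilla/5.0"}, timeout=10)
    response.raise_for_status()
    return response.text


def text_or_none(tag):
    """Return the stripped text of a tag, or None if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else None


def parse_postings(html):
    """Turn one results page into a list of dicts, one per job card."""
    soup = BeautifulSoup(html, "html.parser")
    postings = []
    for card in soup.find_all("li", class_="job"):
        postings.append({
            "title": text_or_none(card.find("a", class_="title")),
            "company": text_or_none(card.find("span", class_="company")),
            "location": text_or_none(card.find("span", class_="location")),
            "salary": text_or_none(card.find("span", class_="salary")),
        })
    return postings


postings = parse_postings(SAMPLE_HTML)
```

Missing fields come back as None rather than raising an error, which matters here because not every posting includes a salary estimate.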

Risks and Assumptions

Glassdoor is only one of many job search sites, but we are assuming that most advertised positions will be posted here. We are also assuming that Glassdoor’s estimated salary range, included with most postings, gives us a good idea of the expected salary for each position.
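
To make the salary assumption concrete, here is one possible way to turn an estimated range into a single number and a high/low label for the classifier. This is only a sketch: the text format ("$85k-$120k (Glassdoor Est.)"), the use of the range midpoint, and the choice of the median as the cutoff for "high" are our assumptions for illustration, and the sample values are invented.

```python
import re
from statistics import median

# Assumed format of the estimate text, e.g. "$85k-$120k (Glassdoor Est.)".
SALARY_RANGE = re.compile(r"\$(\d+)[kK]\s*-\s*\$(\d+)[kK]")


def salary_midpoint(text):
    """Return the midpoint of a "$85k-$120k" style range in dollars, or None."""
    if not text:
        return None
    match = SALARY_RANGE.search(text)
    if match is None:
        return None
    low, high = (int(group) * 1000 for group in match.groups())
    return (low + high) / 2


# Invented sample values; None stands for a posting without an estimate.
estimates = [
    "$85k-$120k (Glassdoor Est.)",
    "$60k-$80k (Glassdoor Est.)",
    "$110k-$150k (Glassdoor Est.)",
    None,
]

# Drop postings without a usable estimate, then label each remaining one
# 1 if its midpoint is strictly above the median and 0 otherwise.
midpoints = [m for m in map(salary_midpoint, estimates) if m is not None]
threshold = median(midpoints)
high_salary = [int(m > threshold) for m in midpoints]
```

A binary label like high_salary is the kind of target a logistic regression can be trained on; a different cutoff (a fixed dollar amount, or a per-city median) would work the same way.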

Scraping

Exploring the data / Munging

Link to the Jupyter notebook.

Written on March 1, 2017