Splinter is a Python library for interacting with and gathering information from live websites. In this project, I am using it to scrape data from job postings on LinkedIn, and attempting to sidestep the wrath of their TOS enforcers. I believe that having all this info to play with on my local computer will yield much more insight than sorting and filtering through the LinkedIn interface.

First, install Splinter. Under the hood it drives a real browser through Selenium, so depending on your Selenium version you may also need the matching webdriver (geckodriver for Firefox, the default) on your PATH.

python3 -m pip install splinter

Next, open up the text editor or notebook of your choice, and import ‘Browser’ from splinter. Also import ‘time’ and ‘random’, which we’ll use later to add human-like delays between actions.

from splinter import Browser
import time, random
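The ‘time’ and ‘random’ imports come in handy for pacing: firing off actions at machine speed is an easy way to get a session flagged. Here is a minimal sketch of a human-like pause helper (the function name and delay bounds are my own choices, not part of Splinter):

```python
import random
import time

def human_pause(lo=2.0, hi=6.0):
    """Sleep for a random interval between lo and hi seconds.

    Returns the chosen delay so callers can log or inspect it.
    """
    delay = random.uniform(lo, hi)
    time.sleep(delay)
    return delay
```

Call it between actions, for example after each page visit or click.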

Everything related to a single session should happen inside a ‘with’ statement, which guarantees the browser is closed cleanly even if an error interrupts the script.

with Browser() as browser:
    ...
    ...

To visit a website, use the ‘visit’ method of the ‘browser’ object.

url = "https://www.linkedin.com/jobs/search/?geoId=103644278&keywords=data%20scientist"
browser.visit(url)
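The query string above can also be assembled with the standard library instead of hand-encoding the ‘%20’ (the geoId and keywords parameter names are taken directly from the example URL):

```python
from urllib.parse import quote, urlencode

params = {"geoId": "103644278", "keywords": "data scientist"}
# quote_via=quote encodes the space as %20 rather than +, matching the URL above
query = urlencode(params, quote_via=quote)
url = "https://www.linkedin.com/jobs/search/?" + query
```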

Next, select an element or collection of elements using one of several methods of the ‘browser’ object.

# A few of the available selection methods
text_elements = browser.find_by_text('element text')
id_elements = browser.find_by_id('particularId')
class_elements = browser.find_by_css('.someclass')

Each ‘find_by_*’ method returns an ‘ElementList’ containing every match on the page; index into it (or use its ‘first’ attribute) to get a single element. Conveniently, calling a method such as ‘click’ directly on the list delegates to the first match. To click an element on the page, use its ‘click’ method.

text_elements[0].click()
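An ‘ElementList’ also supports iteration and len(), so harvesting text from every match is a one-line comprehension. A sketch of a small helper (it relies only on the ‘.text’ attribute Splinter elements expose, so it is shown here without a live browser; the CSS class in the comment is a placeholder, not a real LinkedIn selector):

```python
def collect_texts(elements):
    """Return the visible text of each element in an iterable,
    such as a Splinter ElementList."""
    return [el.text for el in elements]

# In a live session this would look like:
#   titles = collect_texts(browser.find_by_css('.job-card'))
```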

Here is the code involved in signing into LinkedIn via Splinter, including a prompt for the phone verification code LinkedIn sends out for each unfamiliar login.

# Find and click the sign-in button
button = browser.find_by_text('Sign in')
button.click()

# Accept LinkedIn credential input
print('Enter LinkedIn credentials')
email = input('Email: ')
password = input('Password: ')

browser.find_by_id('username').fill(email)
browser.find_by_id('password').fill(password)
browser.find_by_text('Sign in').click()

# Enter the phone verification code when LinkedIn asks for it
verification_code = input('Phone Verification Code: ')
browser.find_by_id('input__phone_verification_pin').fill(verification_code)
browser.find_by_id('two-step-submit-button').click()

At this point, we are signed in, and the page is ours to interact with and scrape just as a human user would. The next article will get into gathering job links and navigating LinkedIn's pagination.