Fun Tasks to Automate with Python

This is a fun little project I worked on at MongoDB University.

From our terminal...

curl https://www.reddit.com/r/technology/.json

The response is a collection of JSON objects, and it's hard to read in the terminal. We want to collect the top Reddit stories each day and import them into a MongoDB stories collection so we can run common database queries against the Reddit data.
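To get a feel for the shape of that response before writing the importer, here is a minimal sketch that walks a hand-made sample shaped like a Reddit listing. The sample values and the helper name `story_titles` are made up for illustration; only the `data` / `children` / `data` nesting is taken from the script in this post.

```python
# A tiny, hand-written stand-in for the listing JSON (values are invented).
sample_listing = {
    "kind": "Listing",
    "data": {
        "children": [
            {"kind": "t3", "data": {"title": "Introduction to Python", "score": 120}},
            {"kind": "t3", "data": {"title": "New phone released", "score": 45}},
        ]
    },
}


def story_titles(listing):
    """Return the title of every story in a listing-shaped dict."""
    return [child["data"]["title"] for child in listing["data"]["children"]]


if __name__ == "__main__":
    for title in story_titles(sample_listing):
        print(title)
```

Each entry under `children` wraps one story, and the story's own fields live one level further down under `data`.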

Create a new .py file.

getredditjson.py

import pymongo  
import requests  
import json

# connect to MongoDB
connection = pymongo.MongoClient('localhost', 27017)  
db = connection.reddit

# set collection
stories = db.stories

# We append to this collection daily, so we do not drop it
# before inserting. Uncomment the next line to start from
# an empty collection instead:
# stories.drop()

# set a descriptive User-Agent so Reddit does not respond with 429 (Too Many Requests)
hdr = {'User-Agent' : 'superman bot by /u/$'}

# put our url into a variable
url = 'https://www.reddit.com/r/technology/.json'
# pass the headers by keyword; the second positional argument is params
r = requests.get(url, headers=hdr)

# parse the response body as JSON
parsed = r.json()

# iterate through the subreddit listing and insert each story
for item in parsed['data']['children']:
    stories.insert_one(item['data'])
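Because the script appends every day, a story that stays on the listing for more than one run would be stored more than once. One possible way around that, sketched here as an assumption rather than as part of the original project, is to upsert on the story's Reddit `id` field. The helper name `upsert_stories` is hypothetical, and `collection` is expected to be a pymongo collection such as `stories` above.

```python
def upsert_stories(collection, listing):
    """Insert new stories and refresh existing ones, keyed on Reddit's `id`.

    `collection` is assumed to be a pymongo Collection (for example the
    `stories` collection above); `listing` is the parsed JSON response.
    Returns the number of stories processed.
    """
    processed = 0
    for child in listing["data"]["children"]:
        story = child["data"]
        # update_one with upsert=True inserts the document when no match exists
        collection.update_one({"id": story["id"]}, {"$set": story}, upsert=True)
        processed += 1
    return processed
```

In the script, this would take the place of the final loop: `upsert_stories(stories, parsed)`.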

Now, in the MongoDB shell, we can write some useful queries to sort through all of this Reddit data:

connecting to: test
> use reddit
switched to db reddit
> db.stories.findOne({"title" : {$regex : "python", $options : "i"}});
{"_id": ObjectId('564D13...'), "title": "Introduction to Python", "url": "http://reddit.com/r/python"....}

We could also count how many Reddit posts contain the word "python", or filter by date or any other JSON field.
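The same ideas can be expressed from Python. The sketch below only builds the filter documents, so it runs without a database. The helper names are hypothetical, and it assumes each stored story keeps Reddit's `created_utc` field (seconds since the epoch).

```python
import re
from datetime import datetime, timezone


def title_contains(word):
    """Filter for stories whose title contains `word`, ignoring case."""
    return {"title": {"$regex": re.escape(word), "$options": "i"}}


def created_since(moment):
    """Filter for stories created at or after `moment` (an aware datetime).

    Assumes documents carry Reddit's `created_utc` epoch-seconds field.
    """
    return {"created_utc": {"$gte": moment.timestamp()}}


if __name__ == "__main__":
    print(title_contains("python"))
    print(created_since(datetime(2015, 11, 1, tzinfo=timezone.utc)))
```

With the `stories` collection from the script, `stories.count_documents(title_contains("python"))` would give the number of matching posts, and `stories.find(created_since(...))` would return the recent ones.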

Craig Derington

Veteran full stack web dev focused on deploying high-performance, modern applications using Python, Go and Node; featuring industry leading frameworks, Django & Flask, and backends MySQL and MongoDB.
