
As its name suggests, PRAW (the Python Reddit API Wrapper) is a Python wrapper for the Reddit API, which enables you to scrape data from subreddits, create bots and much more.
In this article, we will learn how to use PRAW to scrape posts from different subreddits as well as how to get comments from a specific post.
Getting Started
PRAW can be installed using pip or conda:
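For example, either of the following should work (the conda-forge channel is assumed to carry the package):

pip install praw
conda install -c conda-forge praw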
Now PRAW can be imported by writing:
import praw
Before it can be used to scrape data, we need to authenticate ourselves. For this we need to create a Reddit instance and provide it with a client_id, client_secret and a user_agent.
To get the authentication information, we need to create a Reddit app by navigating to this page and clicking "create app" or "create another app".

This will open a form where you need to fill in a name, description and redirect uri. For the redirect uri you should choose http://localhost:8080, as described in the excellent PRAW documentation.

After pressing "create app", a new application will appear. Here you can find the authentication information needed to create the praw.Reddit instance.
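A minimal sketch of creating the instance looks like this; the credential strings are placeholders that you need to replace with the values shown for your app:

import praw

# Placeholder credentials; replace with the values from your Reddit app
reddit = praw.Reddit(client_id="my_client_id",
                     client_secret="my_client_secret",
                     user_agent="my_user_agent")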

Get subreddit data
Now that we have a praw.Reddit instance, we can access all available functions and use it to, for example, get the 10 "hottest" posts from the Machine Learning subreddit.
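A sketch of how this might look, assuming reddit is the authenticated instance created above:

# Get the 10 "hottest" posts from r/MachineLearning
ml_subreddit = reddit.subreddit("MachineLearning")
for post in ml_subreddit.hot(limit=10):
    print(post.title)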
Output:
[D] What is the best ML paper you read in 2018 and why?
[D] Machine Learning - WAYR (What Are You Reading) - Week 53
[R] A Geometric Theory of Higher-Order Automatic Differentiation
UC Berkeley and Berkeley AI Research published all materials of CS 188: Introduction to Artificial Intelligence, Fall 2018
[Research] Accurate, Data-Efficient, Unconstrained Text Recognition with Convolutional Neural Networks
...
We can also get the 10 "hottest" posts of all subreddits combined by specifying "all" as the subreddit name.
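A possible sketch:

# Get the 10 "hottest" posts across all of Reddit
hot_posts = reddit.subreddit("all").hot(limit=10)
for post in hot_posts:
    print(post.title)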
Output:
I've been lying to my wife about film plots for years.
I don't care if this gets downvoted into oblivion! I DID IT REDDIT!!
I've had enough of your shit, Karen
Stranger Things 3: Coming July 4th, 2019
...
This variable can be iterated over, and features including the post title, id and url can be extracted and saved into a .csv file.
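One way to do this is with pandas; the use of pandas and the chosen columns are my assumptions, not necessarily the exact original code:

import pandas as pd

# Collect the post features into a list of rows, then save them as a .csv file
posts = []
ml_subreddit = reddit.subreddit("MachineLearning")
for post in ml_subreddit.hot(limit=10):
    posts.append([post.title, post.score, post.id, post.subreddit,
                  post.url, post.num_comments, post.selftext, post.created])
posts = pd.DataFrame(posts, columns=["title", "score", "id", "subreddit",
                                     "url", "num_comments", "body", "created"])
posts.to_csv("posts.csv", index=False)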

General information about the subreddit can be obtained using the .description attribute of the subreddit object.
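For example:

# Print the description of the r/MachineLearning subreddit
print(reddit.subreddit("MachineLearning").description)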
Output:
**[Rules For Posts](https://www.reddit.com/r/MachineLearning/about/rules/)**
--------
+[Research](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AResearch)
--------
+[Discussion](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ADiscussion)
--------
+[Project](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3AProject)
--------
+[News](https://www.reddit.com/r/MachineLearning/search?sort=new&restrict_sr=on&q=flair%3ANews)
--------
...
Get comments from a specific post
You can get the comments for a post/submission by creating/obtaining a Submission object and looping through the comments attribute. To get a post/submission, we can either iterate through the submissions of a subreddit or specify a particular submission using reddit.submission, passing it the submission url or id.
To get the top-level comments, we only need to iterate over submission.comments.
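A sketch, using a placeholder submission id (substitute the id or url of the post you are interested in):

# The id below is a placeholder; pass the id or url of a real submission
submission = reddit.submission(id="<submission_id>")
for top_level_comment in submission.comments:
    print(top_level_comment.body)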
This will work for some submissions, but for others that have more comments this code will throw an AttributeError saying:
AttributeError: 'MoreComments' object has no attribute 'body'
These MoreComments objects represent the "load more comments" and "continue this thread" links encountered on the website, as described in more detail in the comment documentation. To get rid of the MoreComments objects, we can check the datatype of each comment before printing the body.
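A sketch of that check:

from praw.models import MoreComments

# Skip everything that is not a regular comment
for top_level_comment in submission.comments:
    if isinstance(top_level_comment, MoreComments):
        continue
    print(top_level_comment.body)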
But PRAW already provides a method called replace_more, which replaces or removes the MoreComments objects. The method takes an argument called limit, which, when set to 0, will remove all MoreComments.
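Using it could look like this:

# Remove all MoreComments objects, then iterate over the top-level comments
submission.comments.replace_more(limit=0)
for top_level_comment in submission.comments:
    print(top_level_comment.body)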
Both of the above code blocks successfully iterate over all the top-level comments and print their body. The output can be seen below.
Source: [https://www.facebook.com/VoyageursWolfProject/](https://www.facebook.com/VoyageursWolfProject/)
I thought this was a shit post made in paint before I read the title
Wow, that's very cool. To think how keen their senses must be to recognize and avoid each other and their territories. Plus, I like to think that there's one from the white colored clan who just goes way into the other territories because, well, he's a badass.
That's really cool. The edges are surprisingly defined.
...
However, the comment section can be arbitrarily deep, and most of the time we also want to get the comments of the comments. CommentForest provides the .list method, which can be used for getting all comments inside the comment section.
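A sketch:

# replace_more is still needed so that .list() returns only Comment objects
submission.comments.replace_more(limit=0)
for comment in submission.comments.list():
    print(comment.body)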
The above code will first output all the top-level comments, followed by the second-level comments and so on, until there are no comments left.
Conclusion
PRAW is a Python wrapper for the Reddit API, which enables us to use the Reddit API with a clean Python interface. The API can be used for web scraping, creating bots and much more.
This article covered authentication, getting posts from a subreddit and getting comments. To learn more about the API, I suggest taking a look at its excellent documentation.
If you liked this article, consider subscribing to my YouTube channel and following me on social media.
The code covered in this article is available as a GitHub repository.
If you have any questions, recommendations or critiques, I can be reached via Twitter or the comment section.