Reddit Post Classifier

To learn how neural networks are implemented and to get familiar with the PyTorch framework, I started a text classification project using data from Reddit.

In the subreddit r/AmItheAsshole, users describe interpersonal conflicts they were involved in and ask other users to judge whether they were the "asshole" in the situation. Eighteen hours after submission, each post is given a flair with one of four judgements:

  • Asshole
  • Not the asshole
  • Everyone sucks
  • No assholes here

Users cast their judgements by commenting on the post with the corresponding code:

  • YTA (you're the asshole)
  • NTA (not the asshole)
  • ESH (everyone sucks here)
  • NAH (no assholes here)
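
As a rough illustration of how these codes might be turned into training labels, here is a minimal sketch that pulls a judgement code out of a comment and maps it to its flair. The names `CODE_TO_LABEL`, `CODE_PATTERN`, and `extract_code` are hypothetical, and matching only upper-case, standalone codes is an assumption; the project's own parsing may differ.

```python
import re
from typing import Optional

# Map each comment code to the flair it corresponds to (taken from the two lists above).
CODE_TO_LABEL = {
    "YTA": "Asshole",
    "NTA": "Not the asshole",
    "ESH": "Everyone sucks",
    "NAH": "No assholes here",
}

# Assumption: a code only counts when it appears as a standalone, upper-case token.
CODE_PATTERN = re.compile(r"\b(YTA|NTA|ESH|NAH)\b")


def extract_code(comment_body: str) -> Optional[str]:
    """Return the first judgement code found in a comment, or None if there is none."""
    match = CODE_PATTERN.search(comment_body)
    return match.group(1) if match else None


if __name__ == "__main__":
    code = extract_code("NTA, your roommate was out of line.")
    print(code, "->", CODE_TO_LABEL[code])
```

Comments that contain no code return `None`, so they can be skipped when looking for a post's judgement.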

The winning judgement is the one given in the top comment (the comment with the most votes) after 18 hours. I collected submissions and their judgements using the Reddit and PushShift APIs and did the data wrangling with Pandas. To predict the final judgement, I used PyTorch to develop, train, and test a bidirectional long short-term memory (LSTM) neural network on the submission text and its corresponding judgement.
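
To make the model description more concrete, here is a minimal sketch of a bidirectional LSTM classifier in PyTorch. It is not the project's actual code: the class name `BiLSTMClassifier`, the layer sizes, the vocabulary size, and the random input batch are all assumptions chosen for illustration.

```python
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    """Hypothetical bidirectional LSTM that maps token ids to four judgement classes."""

    def __init__(self, vocab_size, embed_dim=100, hidden_dim=128, num_classes=4, pad_idx=0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Forward and backward final states are concatenated, hence 2 * hidden_dim.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        embedded = self.embedding(token_ids)      # (batch, seq_len, embed_dim)
        _, (hidden, _) = self.lstm(embedded)      # hidden: (2, batch, hidden_dim)
        # hidden[0] is the final forward state, hidden[1] the final backward state.
        # Simplification: padded positions are not masked or packed here.
        features = torch.cat([hidden[0], hidden[1]], dim=1)
        return self.fc(features)                  # unnormalised class scores


# Assumed sizes: a 1,000-token vocabulary and a batch of 8 posts, 50 tokens each.
model = BiLSTMClassifier(vocab_size=1000)
batch = torch.randint(1, 1000, (8, 50))
labels = torch.randint(0, 4, (8,))

logits = model(batch)                             # shape (8, 4)
loss = nn.CrossEntropyLoss()(logits, labels)
loss.backward()                                   # gradients for one training step
```

Reading the post in both directions lets the final representation draw on context from the start and the end of the text. In a real training loop, the random tensors would be replaced by tokenised submissions and their judgement labels.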

Project Code