Reddit Post Classifier
To learn how neural networks are implemented and how to use the PyTorch framework, I started a text classification project using data from Reddit.
In the subreddit r/AmItheAsshole, users describe interpersonal conflicts they were involved in and ask other users to judge whether they were the "asshole" in the situation. Eighteen hours after a post is submitted, it is given a flair with one of four judgements:
- Asshole
- Not the asshole
- Everyone sucks
- No assholes here
Users give their judgements by leaving a comment on the post that contains the corresponding code:
- YTA (you're the asshole)
- NTA (not the asshole)
- ESH (everyone sucks here)
- NAH (no assholes here)
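The mapping from comment codes to flairs, together with the rule that the top-voted comment decides the outcome, can be sketched in plain Python. This is a minimal sketch, not the project's code: the function names `extract_code` and `winning_flair` are hypothetical, comments are assumed to arrive as (body, score) pairs, and a comment's judgement is assumed to be the first uppercase code that appears in it as a whole word.

```python
import re
from typing import List, Optional, Tuple

# Judgement codes used in comments, mapped to the four post flairs.
CODE_TO_FLAIR = {
    "YTA": "Asshole",
    "NTA": "Not the asshole",
    "ESH": "Everyone sucks",
    "NAH": "No assholes here",
}

# Assumption: codes count only as whole, uppercase words.
CODE_PATTERN = re.compile(r"\b(YTA|NTA|ESH|NAH)\b")


def extract_code(comment_body: str) -> Optional[str]:
    """Return the first judgement code in a comment, or None if it has none."""
    match = CODE_PATTERN.search(comment_body)
    return match.group(1) if match else None


def winning_flair(comments: List[Tuple[str, int]]) -> Optional[str]:
    """Given hypothetical (body, score) pairs, return the flair implied by the
    highest-scoring comment, or None if there are no comments or it has no code."""
    if not comments:
        return None
    top_body, _ = max(comments, key=lambda c: c[1])
    code = extract_code(top_body)
    return CODE_TO_FLAIR[code] if code else None


comments = [
    ("NTA, your sister overreacted.", 412),
    ("YTA for skipping the wedding.", 97),
]
print(winning_flair(comments))  # Not the asshole
```

Matching only uppercase codes is a deliberate choice in this sketch: it avoids counting a casual "nah" in ordinary prose as NAH, at the cost of missing commenters who write their code in lowercase.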
The winning judgement is the one in the top comment (the comment with the most votes) after 18 hours. I collected submissions and their judgements using the Reddit and PushShift APIs and wrangled the data with Pandas. To try to predict the final judgement, I used PyTorch to develop, train, and test a bidirectional long short-term memory (LSTM) neural network that takes the submission text as input and the corresponding judgement as its label.
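A bidirectional LSTM classifier of this kind might look like the following PyTorch sketch. It assumes each post has already been tokenized and mapped to integer IDs, with 0 reserved for padding; the class name `BiLSTMClassifier`, the label order, and every hyperparameter (vocabulary size, embedding size, hidden size, learning rate) are illustrative assumptions rather than the project's actual settings. The model reads the text in both directions and classifies from the concatenated final forward and backward hidden states.

```python
import torch
import torch.nn as nn


class BiLSTMClassifier(nn.Module):
    """Bidirectional LSTM over token IDs with a linear head over four judgements."""

    def __init__(self, vocab_size: int, embed_dim: int = 100,
                 hidden_dim: int = 128, num_classes: int = 4, pad_idx: int = 0):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        # Final forward and backward states are concatenated, hence 2 * hidden_dim.
        self.fc = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids: torch.Tensor, lengths: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)          # (batch, seq_len, embed_dim)
        # Pack so the LSTM skips padding positions in each post.
        packed = nn.utils.rnn.pack_padded_sequence(
            embedded, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, (h_n, _) = self.lstm(packed)               # h_n: (2, batch, hidden_dim)
        # h_n[-2] is the last forward state, h_n[-1] the last backward state.
        final = torch.cat([h_n[-2], h_n[-1]], dim=1)  # (batch, 2 * hidden_dim)
        return self.fc(final)                         # unnormalized class scores


# Hypothetical label order: 0 = Asshole, 1 = Not the asshole,
# 2 = Everyone sucks, 3 = No assholes here.
torch.manual_seed(0)
model = BiLSTMClassifier(vocab_size=5000)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# A toy batch of 3 random "posts" padded to 20 tokens with ID 0.
lengths_list = [20, 12, 7]
token_ids = torch.randint(1, 5000, (3, 20))
for i, n in enumerate(lengths_list):
    token_ids[i, n:] = 0
lengths = torch.tensor(lengths_list)
labels = torch.tensor([1, 0, 3])

# One training step.
model.train()
optimizer.zero_grad()
logits = model(token_ids, lengths)
loss = loss_fn(logits, labels)
loss.backward()
optimizer.step()

# Prediction: the highest-scoring class for each post.
model.eval()
with torch.no_grad():
    preds = model(token_ids, lengths).argmax(dim=1)
```

Using the final hidden states is only one way to summarize a post; mean- or max-pooling over all LSTM outputs is a common alternative.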