Measuring Horniness with Topic Modelling
How I used Goodreads reviews and Topic Modelling to find the spiciest literature around.
What is the spiciest erotic fiction novel money can buy?
This was the question posed to me by the good folk at PropellerNet a few months ago, and it got me grinning immediately. I love projects which are a little bit saucy - I’ve run several spicy campaigns before, and they’re always a lot of fun and do pretty well. Sex sells, as they say.
Even Gaston loves Smut - as long as there are pictures
Sourcing Book Reviews
As most readers will know, the most popular place for book reviews online is Goodreads - but that doesn’t mean the website is well built. Its searching and filtering are hugely limited, and although it used to have an API through which you could access its data, it was shut down because it wasn’t profitable. It’s all pretty disappointing, considering it’s owned by Amazon, which was literally made for books.
We needed a shortlist of books to make the project manageable, and because of the disappointing search capabilities of Goodreads, we decided to use some popular user-made lists. We chose “Best Romance”, “BookTok Erotic Fiction”, and “Best Romantasy” (Romantasy being the blend of the romance and fantasy genres), and ended up with a list of a few thousand user-recommended steamy books.
After a bit of wrestling with the website, I managed to scrape the top 50 reviews for each book to create a database of extremely horny reviews from thousands of smut fans.
Lord Jesus, mama needs a cold shower
Topic Modelling
So, I had a huge list of reviews and ratings given to these erotic fiction novels, but how to rank them not just on the quality of the book, but on how fired up they made the readers? The solution I went for uses a technique called Topic Modelling.
To run Topic Modelling on a collection of text you need a dictionary of phrases related to the topic you’re looking for. You also need to assign each phrase a score for how confident you are that, if the phrase appears in a piece of text, the text is about the topic. In our case, the topic is feeling horny, so the first few rows of my dictionary looked something like this:
Sexy Phrase | Confidence Score |
---|---|
sexy | 0.8 |
steamy | 0.6 |
🥵 | 0.8 |
sensual | 0.8 |
got me going | 0.8 |
turned me on | 0.9 |
erotic | 1 |
Then, for each review, every word or phrase that appears in the dictionary contributes its score, and you simply add these up to tell how confident you can be that the review is horny.
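As a rough illustration, the additive scoring could be sketched in Python like this. The dictionary only contains the example rows from the table above, and the names `HORNY_PHRASES` and `raw_score` are made up for this sketch. It also assumes plain case-insensitive substring matching, which is cruder than a real implementation would want (it would count "sexy" inside "unsexy", for instance).

```python
# Hypothetical phrase dictionary: just the example rows shown in the table above.
HORNY_PHRASES = {
    "sexy": 0.8,
    "steamy": 0.6,
    "🥵": 0.8,
    "sensual": 0.8,
    "got me going": 0.8,
    "turned me on": 0.9,
    "erotic": 1.0,
}


def raw_score(review: str, phrases: dict = HORNY_PHRASES) -> float:
    """Add up the confidence score of every dictionary phrase found in the review.

    Assumption: simple case-insensitive substring counting, so a phrase
    that appears twice contributes its score twice.
    """
    text = review.lower()
    return sum(weight * text.count(phrase) for phrase, weight in phrases.items())


if __name__ == "__main__":
    print(raw_score("So steamy and sexy, it really turned me on 🥵"))
```

A review that mentions none of the phrases scores zero, and every extra mention pushes the total up.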
Fine Tuning
Adding up the confidence scores in a review gives a confidence level that horniness is discussed in the review, but it doesn’t tell you whether the reviewer was horny themselves, nor whether it was the primary emotion of the review.
To improve the score, first, I combined it with the rating they gave the book - this demoted a lot of the negative reviews, such as this one:
ABSOLUTELY NOT. Nope. Stopped reading at 69 % (ironically) because I couldn't take it anymore. I was pushing myself through the book, terribly wishing for it to get better, but it only gets worse. Oh, God. I can't. I absolutely hate Hunter. He's not hot, he's not sexy, he's rather annoying, disgusting and has absolutely no personality, whatsoever.
This review talks about horny topics (“hot”, “sexy”), but is very much not horny, so it, and reviews like it, get a significantly reduced score.
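The post doesn't spell out exactly how the rating and the phrase score are combined, so the following is only one plausible sketch: it assumes a 1 to 5 star rating and simply scales the phrase score by the fraction of stars given. The function name `rating_weighted` and the scaling rule are both assumptions.

```python
def rating_weighted(raw: float, stars: int, max_stars: int = 5) -> float:
    """Scale a review's phrase score by the star rating the reviewer gave.

    Assumption: a linear weight of stars / max_stars, so a 1-star rant that
    mentions "hot" and "sexy" keeps only a fifth of its phrase score.
    """
    if not 1 <= stars <= max_stars:
        raise ValueError(f"stars must be between 1 and {max_stars}")
    return raw * stars / max_stars


if __name__ == "__main__":
    # Same phrase score, very different verdicts.
    print(rating_weighted(1.6, stars=1))
    print(rating_weighted(1.6, stars=5))
```

Other weightings would work too, such as zeroing out anything below three stars. The point is only that negative reviews get pushed down the ranking.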
Next, I changed the score calculations to be dependent on the length of the review. This essentially turns the score into a rating of what percentage of the review is horny, removing the advantage of extremely long reviews which happen to mention our horny phrases multiple times.
Twins!!!! Shit, sign me up! I love it! Hmmmm sexy, steamy, Rude, yummmy, the best of both worlds.
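The length adjustment could be sketched as below. It assumes length is measured in whitespace-separated words, which the post doesn't specify, and the helper name `length_normalised` is invented for the example.

```python
def length_normalised(score: float, review: str) -> float:
    """Divide a review's score by its length in words.

    Assumption: length is the number of whitespace-separated tokens.
    An empty review is treated as one word to avoid dividing by zero.
    """
    word_count = max(len(review.split()), 1)
    return score / word_count


if __name__ == "__main__":
    short_review = "Twins!!!! Shit, sign me up! I love it!"
    long_review = " ".join(["plot"] * 500)
    # The same phrase score counts for far more in the short review.
    print(length_normalised(1.4, short_review))
    print(length_normalised(1.4, long_review))
```

With this in place, a short, breathless review can outrank a 2,000-word essay that happens to use the same phrases a handful of times.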
The final result is a dirty little smut score you can be proud of.
Finally, I dropped the books which had too few reviews, combined multiple editions, and ended up with the ultimate smut-tier list of erotic fiction.
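For the curious, that last clean-up step might look something like the sketch below. Everything specific in it is an assumption: it treats editions as the same book when the title and author match after lowercasing, averages the review scores per book, and uses a hypothetical cut-off of 10 reviews. The post doesn't give the real matching rule or threshold.

```python
from collections import defaultdict
from statistics import mean


def rank_books(scored_reviews, min_reviews=10):
    """Rank books by their mean review score.

    scored_reviews: iterable of (title, author, score) tuples.
    Assumptions: editions are merged on a normalised (title, author) key,
    and books with fewer than min_reviews reviews are dropped.
    """
    by_book = defaultdict(list)
    for title, author, score in scored_reviews:
        key = (title.strip().lower(), author.strip().lower())
        by_book[key].append(score)

    ranked = [
        (key, mean(scores))
        for key, scores in by_book.items()
        if len(scores) >= min_reviews
    ]
    # Highest average score first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    reviews = [
        ("Book A", "Author X", 0.25),
        ("book a ", "author x", 0.75),  # a second edition of the same book
        ("Book B", "Author Y", 0.9),    # only one review, so it gets dropped
    ]
    print(rank_books(reviews, min_reviews=2))
```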
You can read more about the results (and choose your next book) here.