Category Archives: Programming

100 Toasty Tofu(s) 2018 Edition

It’s that time of the year again. Last year I made a foray into predicting Triple J’s Hottest 100, and it was fun, so this year I’ve given it another go with some key differences. I completely rewrote the script that does the legwork and went one step further with some demographic weighting and analysis.

The New Script

Last year I was using Tesseract, one of the leading open source OCR projects. This year I decided to test out some cloud-based OCR to see if it was any better. I tried Amazon Rekognition and Google Cloud Vision. After testing both, it became clear that Google Cloud Vision is miles ahead of Rekognition in both text detection and paragraph detection, so I went with that. I’ve also hooked up all the data to a Metabase instance, which is great for easily displaying data.

100 Warm Tunas is now scraping Twitter and Instagram. I considered whether my script should do the same but decided against it for a few reasons. Last year, ZestfullyGreen did a Twitter scraper but it failed to predict the #1 song. This led me to believe that the sample of people on Twitter is not representative of the Hottest 100 voting population and would not improve the prediction, while Instagram has a strong history of accurate predictions.

The Results

Without further ado, these are the raw counts.

Interestingly it wasn’t always like this. If you look at the day by day counts, former bookie favourite This Is America won the first day and it took a few more days for Ocean Alley’s total to catch up. We could be in for a close Hottest 100.

Looking a little deeper

Every year, Triple J loves to wheel out the stats on the Hottest 100 while refusing to release counts. This makes seeing how far off previous predictions are difficult. I did some research and found several interesting articles.

The article “This year’s Hottest 100 has set a new voting record!” gave us a breakdown by state, gender and age bracket (kinda) of who voted:

  • More women than men voted this year, 51% female compared to 48% male (rounded out by 1% for ‘Other’ and ‘No answer’)
  • New South Wales took the lion’s share of votes (29%), followed by Victoria (23%), backed up by QLD (20%), and in order after that, WA (11%), SA (8%), ACT and TAS (3%), Overseas voters (2%), and NT (1%).
  • The most common age of voters was 21 years old. About half of voters were aged 18-24 and around 80% of voters were under 30.

“Did guys and gals vote differently in the Hottest 100? Let’s find out” showed us the gender divide in music tastes, and “Hottest 100: What songs were most popular with each state and territory?” did the same for states/territories.

Instagram doesn’t list a location for people or their gender, but I figured gender could be approximated by running people’s names through gender_guesser, a library for Python that uses a name dataset to guess gender. This decreases our sample size, as not everyone has their name on Instagram, but it is an interesting experiment. Here you can see the differences in votes.

The divide is clear. Everybody loves Ocean Alley and Gambino, but people with masculine names seem to have an aversion to Wafia and Amy Shark (Could this be why she has never gotten a #1?). Masculine names also enjoy Ruby Fields – Dinosaur more than people with feminine names.

For location, I used a different approximation. Sometimes people tag their photos with a location, and it’s probable that that location is where they live. So the script tries to find the last tagged location and puts them into that state. It’s not perfect but provides some interesting results.

When we put this all together, we can produce a weighted prediction of the Hottest 100 based on either gender, state or both.
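As a rough sketch of how this kind of weighting can work (not the actual script): each vote is scaled by how under- or over-represented its group is in the sample compared with the published voter breakdown. The function name, the data layout and the sample votes are all made up for illustration; the 51%/48% split is the one from the Triple J article above.

```python
from collections import Counter, defaultdict


def weighted_counts(votes, population_share):
    """Re-weight song tallies so each group counts in proportion to its
    share of the real voting population.

    votes            -- list of (group, song) pairs, one per scraped vote
    population_share -- dict mapping group -> share of all voters
    Votes whose group is not in population_share are ignored.
    """
    known = [(g, s) for g, s in votes if g in population_share]
    group_totals = Counter(g for g, _ in known)
    total = sum(group_totals.values())

    # weight = share in the population / share in the sample
    weights = {
        g: population_share[g] / (count / total)
        for g, count in group_totals.items()
    }

    tally = defaultdict(float)
    for group, song in known:
        tally[song] += weights[group]
    return dict(tally)


# Hypothetical scraped votes: (guessed gender, song)
sample_votes = [
    ("female", "Confidence"),
    ("male", "Confidence"),
    ("male", "This Is America"),
    ("male", "This Is America"),
]
shares = {"female": 0.51, "male": 0.48}  # from the Triple J breakdown

for song, score in sorted(weighted_counts(sample_votes, shares).items(),
                          key=lambda item: -item[1]):
    print(f"{song}: {score:.2f}")
```

This weights individual votes rather than individual voters, which is a simplification. Weighting by state works the same way with a different `population_share` dict.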

This doesn’t affect the top songs, but you can see that songs with a particular bias (e.g. Mallrat, which is popular with feminine names) shoot up.

This year’s Hottest 100 is set to be a close one. If you think you’re better at predicting these things, submit your prediction here and then watch it count down here.

Hottest 100 Predictions – A Comparison

This Hottest 100 I made a program to scrape Instagram for Hottest 100 votes. I then collated the predictions from other programs (100 Warm Tunas and ZestfullyGreen’s Twitter scraper) and scored them based on performance. You can see the results here (I also opened this up to manual entries, one of which outscored all the predictors).

I also decided to combine the results of the Twitter scraper and my Instagram scraper, which turned out to be a better predictor than either of them alone. Next year I will have to incorporate a Twitter scraper into my predictor.

Below is a summary of some interesting stats about the three automated prediction methods, plus the combination of 100 Toasty Tofu(s) and ZestfullyGreen’s Twitter scraper (I added the Twitter scrape results to my own to see if this would be any better). I also had a look at my predictions that included duplicate votes. However, these performed worse than everything except the Twitter prediction, so I have excluded them. This seems to confirm my hypothesis that duplicate votes make the prediction less accurate.

The final question that remains is, who truly is the internet’s most accurate Hottest 100 predictor? As you can see below, there isn’t really an answer for this. By my (somewhat arbitrary) scoring system, 100 Warm Tunas and I have very similar accuracy. I think we will have to wait until next year to really test them.

Metric | JG | 100 Tunas | ZG | JG + ZG
Points | 7289/10000 | 7288/10000 | 5679/10000 | 7297/10000
Number of Songs in Correct Position | 7/100 | 4/100 | 1/100 | 3/100
Number of Correct Songs in any Position | 83/100 | 83/100 | 70/100 | 83/100
Number of Correct Top 5 Songs in Correct Position | 2/5 | 2/5 | 1/5 | 2/5
Number of Correct Top 5 Songs in any Top 5 Position | 4/5 | 4/5 | 4/5 | 4/5
Number of Correct Top 10 Songs in Correct Position | 2/10 | 2/10 | 1/10 | 2/10
Number of Correct Top 10 Songs in any Top 10 Position | 8/10 | 8/10 | 5/10 | 8/10
Number of Correct Top 20 Songs in Correct Position | 2/20 | 3/20 | 1/20 | 2/20
Number of Correct Top 20 Songs in any Top 20 Position | 16/20 | 16/20 | 11/20 | 16/20
Number of best predictions (see below) | 45 | 50 | 34 | 45
Number of worst predictions (see below) | 16 | 20 | 65 | 15
Guessed #1? | Yes | Yes | No | Yes

Song-by-song comparison of predictors

# | JG | 100 Tunas | ZG | JG + ZG | Title | Artist
1 | 1 | 1 | 2 | 1 | HUMBLE. | Kendrick Lamar
2 | 2 | 3 | 4 | 2 | Let Me Down Easy | Gang Of Youths
3 | 6 | 6 | 25 | 6 | Chateau | Angus & Julia Stone
4 | 3 | 4 | 3 | 3 | Ubu | Methyl Ethel
5 | 4 | 2 | 5 | 4 | The Deepest Sighs, The Frankest Shadows | Gang Of Youths
6 | 10 | 8 | 1 | 10 | Green Light | Lorde
7 | 5 | 5 | 13 | 5 | Go Bang | PNAU
8 | 11 | 10 | 43 | 11 | Sally {Ft. Mataya} | Thundamentals
9 | 16 | 15 | 33 | 16 | Lay It On Me | Vance Joy
10 | 9 | 13 | 14 | 9 | What Can I Do If The Fire Goes Out? | Gang Of Youths
11 | 7 | 7 | 29 | 7 | SWEET | BROCKHAMPTON
12 | 15 | 16 | 39 | 15 | Fake Magic | Peking Duk & AlunaGeorge
13 | 23 | 24 | 30 | 23 | Young Dumb & Broke | Khalid
14 | 29 | 30 | 6 | 29 | Homemade Dynamite | Lorde
15 | 12 | 11 | 24 | 12 | Regular Touch | Vera Blue
16 | 30 | 32 | 36 | 30 | Feel The Way I Do | Jungle Giants, The
17 | 13 | 12 | 20 | 13 | Marryuna {Ft. Yirrmal} | Baker Boy
18 | 14 | 14 | 9 | 14 | Exactly How You Are | Ball Park Music
19 | 17 | 19 | 15 | 17 | The Man | Killers, The
20 | 35 | 38 | 59 | 35 | Let You Down {Ft. Icona Pop} | Peking Duk
21 | 8 | 9 | 22 | 8 | Birthdays | Smith Street Band, The
22 | 26 | 26 | 27 | 26 | Lemon To A Knife Fight | Wombats, The
23 | 19 | 18 | 10 | 19 | Not Worth Hiding | Alex The Astronaut
24 | 78 | 86 | N/A | 77 | rockstar {Ft. 21 Savage} | Post Malone
25 | 34 | 31 | 18 | 33 | Weekends | Amy Shark
26 | 39 | 39 | 23 | 39 | Feel It Still | Portugal. The Man
27 | 43 | 41 | N/A | 43 | Be About You | Winston Surfshirt
28 | 47 | 51 | 76 | 47 | Mystik | Tash Sultana
29 | 28 | 27 | 37 | 28 | Mended | Vera Blue
30 | 36 | 35 | 26 | 36 | Low Blows | Meg Mac
31 | 25 | 25 | 48 | 25 | Lay Down | Touch Sensitive
32 | 27 | 28 | 91 | 27 | NUMB {Ft. GRAACE} | Hayden James
33 | 22 | 23 | 58 | 22 | Slow Mover | Angie McMahon
34 | 37 | 37 | 19 | 37 | DNA. | Kendrick Lamar
35 | 51 | 46 | 31 | 51 | Passionfruit | Drake
36 | 18 | 17 | 12 | 18 | I Haven’t Been Taking Care Of Myself | Alex Lahey
37 | 63 | 70 | 52 | 62 | Slide {Ft. Frank Ocean/Migos} | Calvin Harris
38 | 46 | 48 | 34 | 46 | Bellyache | Billie Eilish
39 | 53 | 49 | N/A | 52 | Got On My Skateboard | Skegss
40 | 24 | 21 | 44 | 24 | True Lovers | Holy Holy
41 | 41 | 40 | 35 | 41 | Blood {triple j Like A Version 2017} | Gang Of Youths
42 | 59 | 56 | N/A | 59 | Cola | CamelPhat & Elderbrook
43 | 91 | 74 | 74 | 91 | Murder To The Mind | Tash Sultana
44 | 49 | 50 | 42 | 49 | In Motion {Ft. Japanese Wallpaper} | Allday
45 | 21 | 20 | 7 | 21 | Every Day’s The Weekend | Alex Lahey
46 | 57 | 54 | 17 | 57 | Better | Mallrat
47 | 45 | 52 | 16 | 45 | Want You Back | HAIM
48 | 54 | 47 | N/A | 53 | The Comedown | Ocean Alley
49 | 33 | 34 | 82 | 34 | Passiona | Smith Street Band, The
50 | 77 | 84 | 84 | 74 | On Your Way Down | Jungle Giants, The
51 | N/A | N/A | 56 | N/A | Man’s Not Hot | Big Shaq
52 | N/A | N/A | N/A | N/A | Glorious {Ft. Skylar Grey} | Macklemore
53 | 62 | 68 | 87 | 63 | Moments {Ft. Gavin James} | Bliss N Eso
54 | 50 | 57 | N/A | 50 | Homely Feeling | Hockey Dad
55 | 42 | 44 | N/A | 42 | 6 Pack | Dune Rats
56 | 32 | 29 | 72 | 32 | Watch Me Read You | Odette
57 | 67 | 67 | N/A | 67 | Bad Dream | Jungle Giants, The
58 | 20 | 22 | 11 | 20 | The Opener | Camp Cope
59 | 80 | 79 | N/A | 80 | Used To Be In Love | Jungle Giants, The
60 | 69 | 66 | 8 | 69 | Boys | Charli XCX
61 | 73 | 77 | N/A | 73 | 21 Grams {Ft. Hilltop Hoods} | Thundamentals
62 | 92 | 89 | N/A | 92 | Saved | Khalid
63 | 40 | 43 | 28 | 40 | Life Goes On | E^ST
64 | 60 | 58 | 45 | 60 | Fool’s Gold | Jack River
65 | 65 | 62 | 38 | 64 | Everything Now | Arcade Fire
66 | 66 | 65 | 93 | 65 | Lemon | N.E.R.D. & Rihanna
67 | 38 | 36 | N/A | 38 | Shred For Summer | DZ Deathrays
68 | 48 | 45 | 80 | 48 | Golden | Kingswood
69 | 44 | 42 | 96 | 44 | I Love You, Will You Marry Me | Yungblud
70 | 31 | 33 | 54 | 31 | Amsterdam | Nothing But Thieves
71 | N/A | N/A | 21 | N/A | Perfect Places | Lorde
72 | 88 | 85 | 71 | 88 | In Cold Blood | alt-J
73 | 83 | 64 | N/A | 82 | Nuclear Fusion | King Gizzard & The Lizard Wizard
74 | N/A | N/A | 98 | N/A | XO TOUR Llif3 | Lil Uzi Vert
75 | 61 | 60 | N/A | 61 | Braindead | Dune Rats
76 | 76 | 76 | N/A | 75 | Cloud 9 {Ft. Kian} | Baker Boy
77 | N/A | 100 | 66 | N/A | Million Man | Rubens, The
78 | N/A | N/A | N/A | N/A | Electric Feel {triple j Like A Version 2017} | Tash Sultana
79 | N/A | N/A | 69 | N/A | Hey, Did I Do You Wrong? | San Cisco
80 | 90 | 90 | 61 | 90 | Say Something Loving | xx, The
81 | N/A | N/A | 32 | N/A | Liability | Lorde
82 | N/A | N/A | 46 | N/A | 1-800-273-8255 {Ft. Alessia Cara/Khalid} | Logic
83 | 74 | 72 | 60 | 76 | Blood Brothers | Amy Shark
84 | 84 | 73 | N/A | 85 | Oceans | Vallis Alps
85 | 58 | 59 | N/A | 58 | Does This Last | Boo Seeka
86 | 94 | 91 | 95 | 94 | Maybe It’s My First Time | Meg Mac
87 | 72 | 63 | 78 | 71 | The Way You Used To Do | Queens Of The Stone Age
88 | 56 | 61 | N/A | 56 | Edge Of Town {triple j Like A Version 2017} | Paul Dempsey
89 | N/A | N/A | N/A | N/A | Dawning | DMA’s
90 | N/A | N/A | N/A | N/A | Hyperreal {Ft. Kučka} | Flume
91 | N/A | N/A | N/A | N/A | Big For Your Boots | Stormzy
92 | N/A | N/A | N/A | N/A | LOVE. {Ft. ZACARI} | Kendrick Lamar
93 | 95 | 95 | 85 | 96 | Do What You Want | Presets, The
94 | 99 | 93 | N/A | 98 | Second Hand Car | Kim Churchill
95 | N/A | N/A | N/A | N/A | Mask Off | Future
96 | 100 | 97 | 55 | 100 | Chasin’ | Cub Sport
97 | N/A | N/A | N/A | N/A | LOYALTY. {Ft. RIHANNA} | Kendrick Lamar
98 | N/A | N/A | N/A | N/A | Snow | Angus & Julia Stone
99 | 64 | N/A | N/A | 66 | Arty Boy {Ft. Emma Louise} | Flight Facilities
100 | N/A | N/A | N/A | N/A | Don’t Leave | Snakehips & MØ

100 Toasty Tofu(s) – Another Triple J Hottest 100 Predictor

Update: Think you can do better than my prediction? Prove it by filling out your prediction here: Triple J Hottest 100 Prediction tracker submission. Also, you can look at the leaderboard of predictions over here.

100 Toasty Tofu(s) is another Triple J Hottest 100 Predictor, made for your entertainment with no guarantees whatsoever.

Since 2012, various people have been predicting the Hottest 100 using social media scrapes and OCR. This started with The Warmest 100 and was continued by 100 Warm Tunas. I’ve long thought it’s an awesome experiment because the conditions are good for using social media as a predictor. Two factors make this a good experiment – the average person is willing to share their Hottest 100 votes, and the stakes are so low, unlike political elections, that there aren’t hordes of true believers/trolls/Russian government agents trying to manipulate public sentiment.

I use instagram-scraper to scrape the hashtags (the same as 100 Warm Tunas) and then a Python script that uses Tesseract OCR to convert the images to text. The text is then matched against the Triple J song list (PDF) and saved. I removed any duplicate votes I found, that is, images with the same songs in the same order when there are more than 3 songs in the image (a very unlikely occurrence by chance). I figure these are probably the same person uploading the same image twice.
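The duplicate rule can be sketched roughly like this. It is an illustration only, not the actual script: the function name and the idea of a "ballot" as an ordered list of song titles per image are assumptions made for the example.

```python
def drop_duplicate_ballots(ballots, min_songs=4):
    """Keep the first copy of each ballot and drop later identical ones.

    ballots   -- list of ballots, each an ordered list of song titles
                 read from one image
    min_songs -- only ballots with at least this many songs are checked
                 (more than 3 songs, as described above); shorter
                 ballots are always kept
    """
    seen = set()
    kept = []
    for ballot in ballots:
        key = tuple(ballot)  # same songs in the same order
        if len(key) >= min_songs:
            if key in seen:
                continue  # probably the same image uploaded twice
            seen.add(key)
        kept.append(ballot)
    return kept


# Hypothetical ballots from three images
ballots = [
    ["HUMBLE.", "Green Light", "Go Bang", "Ubu"],
    ["HUMBLE.", "Green Light", "Go Bang", "Ubu"],  # exact repeat, dropped
    ["Ubu", "Go Bang", "Green Light", "HUMBLE."],  # different order, kept
]
print(len(drop_duplicate_ballots(ballots)))
```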

This is an initial cut; there’s still some extra work to do, including:

  • Manually add songs that would be in the hottest 100 to the song list
  • Tune the OCR, including doing some pre-processing to images if needed
  • Tune the matching algorithm – currently using Levenshtein distance
  • Do more analysis on voting combinations (e.g. are there factions who vote for particular songs together, and what can we learn from this?)
  • Make the table pretty like the other ones.
  • Make a form for people to upload their own predictions and show a leaderboard as they come in on the 27th.
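For the matching step, here is a minimal sketch of Levenshtein-based matching of an OCR line against a song list. The distance function is the standard dynamic-programming version; the `best_match` helper, the 30% cut-off and the "Title - Artist" strings are assumptions for illustration, not what the script actually uses.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = cur
    return prev[-1]


def best_match(ocr_line, songs, max_ratio=0.3):
    """Return the closest song to an OCR'd line, or None if even the
    closest one differs by more than max_ratio of the longer string."""
    text = ocr_line.strip().lower()
    best = min(songs, key=lambda s: levenshtein(text, s.lower()))
    distance = levenshtein(text, best.lower())
    if distance <= max_ratio * max(len(text), len(best)):
        return best
    return None


songs = ["HUMBLE. - Kendrick Lamar", "Green Light - Lorde"]
print(best_match("HUMBLE - Kendrlck Lamar", songs))  # OCR dropped "." and misread "i"
```

The cut-off is the main knob to tune: too strict and noisy OCR lines get dropped, too loose and captions or usernames start matching songs.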

The results are quite different to 100 Warm Tunas – I seem to be picking up more votes. I’m not sure if this is due to some sort of filtering I’m not doing or just algorithm differences, but we will see if 100 Warm Tunas still is the internet’s most accurate prediction of Triple J’s Hottest 100 for 2017 on January 27!

This table is updated automatically every few hours.

# Title Artist Votes % Votes Inc dupes %

University Portfolio

I think one of the most important parts of studying computer science or software engineering at university is that it gives you the ability to slowly build a portfolio of small pieces of code which can demonstrate what you are capable of. I have embarked on a project over the last month to collate all of my significant university programming assignments. This is a general snapshot of what you learn in a CS degree these days. If you are a student – don’t plagiarise my code for your assignments, you’ll get caught and lose your marks. It also violates the license on the code (GPL v2.0), under which you must reference the author. If you’re a lecturer – I hope this doesn’t bother you, you should really be changing the assignments every semester anyway to allow for things like this :).

The projects are listed below. I haven’t included all projects, as many were fairly trivial and not all computing units assess with programming assignments (much to my annoyance). The code has not been updated since it was first written, so please note that my style of coding has evolved since I first learned to program 4 years ago:

First Year

Second Year

Third Year

Youtube Song Downloader

A little project I worked on for the last day was a program to make downloading a list of songs off YouTube easier. Initially it was just going to be a command line tool that imports from a CSV file. But this only works when you know the first hit will be the correct song, so I decided to flesh it out into a GUI.

This was inspired by looking at the sexy lists on the Triple J Hottest 100 Wikipedia page and deciding there should be a way to grab all those songs easily.

Unfortunately, the Google API restricts this kind of thing. But I’m sick of this project now, so here it is. You’re limited to downloading about ten songs at a time, otherwise you get service abuse messages. Hopefully in the future I can find a better method of searching for songs.

Youtube Downloader

Blackboard Scraper now has stand-alone download option.

I have finally gotten around to making the Blackboard Scraper stand-alone so you no longer need to install lots of different things to get it working.

Hopefully this makes it easier for non-computing students to access and use. It still only works on Curtin’s Blackboard system, but a UWA one is in the works.

Head to the Blackboard Scraper page to give it a whirl.

The Mystery Box Solver

Uther Party is a custom mini-game map for the game Warcraft 3. Our local university LAN club has a weird obsession with it. Basically there is a box with a number on it, and the game is to not be holding the box when it lands on 0. You pass the box to the person next to you by pressing either Q, which decrements it by 1 or W which decrements it by 2. If the box is on 1 and you decrement it by 2, it explodes before you pass it.


The game is pretty frustrating, so I decided to make a script to tell me the optimal choice to make. The script works by generating a minimax tree of all possible choices. Each leaf has an array with one entry per player, holding a 1 if that player doesn’t die in that playthrough or a 0 if they do. It then goes up the tree, with each level corresponding to a different player’s turn; that player picks the choice that maximises their own entry in the array. If the two choices give that player the same chance of death, it takes the average of both children’s arrays.
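Here is a small sketch of that idea, written from the description above rather than taken from the actual solver. It assumes one reading of the rules: after a key press the box goes to the next player, who dies if it now reads 0, and pressing W on a 1 kills the presser. The function name and the memoised recursion (instead of an explicit tree) are choices made for the example.

```python
from functools import lru_cache


def solve_box(count, num_players):
    """Return (survival, best_keys) for player 0 holding a box on `count`.

    survival  -- tuple with each player's chance of surviving
    best_keys -- the key(s) that maximise player 0's own chance
    """
    def leaf(victim):
        # 1 for everyone who survives this playthrough, 0 for the victim
        return tuple(0 if p == victim else 1 for p in range(num_players))

    @lru_cache(maxsize=None)
    def best(count, player):
        nxt = (player + 1) % num_players
        outcomes = {}
        for key, step in (("Q", 1), ("W", 2)):
            remaining = count - step
            if remaining < 0:
                outcomes[key] = leaf(player)  # explodes before the pass
            elif remaining == 0:
                outcomes[key] = leaf(nxt)     # next player holds it on 0
            else:
                outcomes[key] = best(remaining, nxt)[0]

        # The player on turn maximises their own entry in the array
        top = max(vec[player] for vec in outcomes.values())
        keys = tuple(k for k, vec in outcomes.items() if vec[player] == top)

        # On a tie, average the children's arrays
        chosen = [outcomes[k] for k in keys]
        averaged = tuple(sum(col) / len(chosen) for col in zip(*chosen))
        return averaged, keys

    return best(count, 0)


survival, keys = solve_box(5, 3)
print("press", " or ".join(keys), "- survival chances:", survival)
```

Caching on `(count, player)` keeps this fast even for large starting numbers, since the same position is reached through many different key sequences.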

The solver works pretty well. However, it assumes that the players aren’t colluding and are only looking to maximise their own score, which is a fair assumption but may not be the case 100% of the time.

Mystery Box Solver

You can download the solver here (it has prompt, command and GUI modes). It only requires Python, but can optionally use ete2 to print out a graph of the tree if you are really sadistic.

If you want to try it out, I’ve also made a module for the ComSSA IRC bot to facilitate a text based game of mystery box.