Published Apr 8, 2023 by Angeni Bai
The Easter Show is back and like the good enterprising bargain-hunter that I am, the biggest question I have before I go is: which showbag can I buy that will get me the most value for my money?
Suppose this question occurs to you right before your ethics tutorial. Suppose your friend¹ says “I bet you could find out with some code”. And crucially, suppose you weren’t really planning on paying that much attention in ethics anyways.
Given those conditions, it appears logical that you would then spend the next couple hours on a journey towards discovering the most cost-effective showbag at the 2023 Royal Sydney Easter Show.
So whether those are your current conditions or not, join me on the steps I took on that journey! And you too can go into the Easter Show confident that you will know exactly how to get the most value for your hard-earned cash.
(but if you’re impatient just read step 5)
Showbags come in all shapes and sizes, but they generally have a certain theme, and a bunch of stuff in them that fits that theme.
For example, there’s Barbie Dreamtopia, which contains a bunch of Barbie items - including a backpack, trinket boxes, and a wand.
Very magical.
There’s also Allen’s Lollies, which contains - you guessed it - Allen’s lollies.
Most important for us, though, are the retail value and the price.
Retail value is the amount I would expect to pay if I bought all of the items inside the showbag separately. For Barbie Dreamtopia that’s $70.60.
Price is the actual cost of the showbag being sold. So for Barbie Dreamtopia that’s $32.
Now to properly compare the value for money of the showbags, we want to look at the value to price ratio. We calculate that by dividing retail value by price. This means that showbags with a high retail value and a low price will have a high value to price ratio. This is what we want!
Conversely, a low value to price ratio means that we probably might as well buy the items from the bag that we want on their own, and it wouldn’t make much of a difference.
Barbie Dreamtopia has a value to price ratio of 70.60 / 32 = 2.20625.
Allen’s Lollies has a value to price ratio of 15 / 15 = 1.
So with Barbie Dreamtopia we’re more than doubling the value of stuff we get compared to the actual price we pay. Whereas with Allen’s Lollies there is literally no difference if we were to instead grab the Party Mix, Frogs Alive, and Drumstick from Woolies ourselves. Evidently, the value to price ratio is telling us Barbie Dreamtopia is the way to go.
A table to summarise:
| | Low price | High price |
|---|---|---|
| Low retail value | Low value to price ratio | Very low value to price ratio (< 1) |
| High retail value | High value to price ratio | Low value to price ratio |
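As a quick sanity check, the ratio arithmetic above can be sketched in a few lines of Python. The function name `value_to_price_ratio` is just an illustrative choice; the figures are the ones quoted above for the two example showbags.

```python
def value_to_price_ratio(retail_value, price):
    """Return how many dollars of retail value you get per dollar spent."""
    return retail_value / price


# Figures quoted above for the two example showbags
barbie = value_to_price_ratio(70.60, 32)
allens = value_to_price_ratio(15, 15)

print(f"Barbie Dreamtopia: {barbie:.5f}")
print(f"Allen's Lollies: {allens:.5f}")
```

Anything above 1 means the bag is nominally worth more than you pay for it.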
OK, time to scope out how we might use some code to get the information that we want.
All of the showbags are accessible at this URL: https://www.eastershow.com.au/explore/showbags/
If we click on a showbag, the section expands and we can see all of its info, including price, what’s included, and total retail value!
Since we can access this extra info without navigating to a different page, I’m guessing that all the HTML is loaded with the page, and it’s just some JavaScript and CSS in the background that’s showing and hiding sections as necessary.
Using the Inspect Element tool confirms this theory!
The info for each showbag is inside a `showbagsCard` div.
The showbag details including total retail value is then inside a `showbagsCard-description` div.
We can see this `showbagsCard-description` div also exists under the Avengers Bag `showbagsCard`, but is currently hidden because I haven’t clicked on it yet.
This is good news! It looks like the HTML is very consistently structured, so we just need to figure out which divs contain the info important to us in each `showbagsCard`.
Showbag name is in a `showbagsCard-product--name` heading.
Price is in a `showbagsCard-product--price` span.
The list of included items is within a `showbagsCard-description-copy--included` div.
Retail value is between `strong` tags in the last paragraph of the `showbagsCard-description-copy--included` div.
So all the info we need has been marked up nicely for us in the HTML, but there are 51 pages of showbags and going through them manually will be a pain. To help us, we’ll be using the `requests` and `BeautifulSoup` libraries in Python to fetch each page, and extract the right info for each showbag.
Fetching the web page is a simple GET request. With the `requests` library, we just need to pass the URL into the `get()` method.
import requests

SHOWBAGS_URL = "https://www.eastershow.com.au/explore/showbags/"
n = 1  # page number to fetch
# SHOWBAGS_URL already ends in "/", so the query string is appended directly
page = requests.get(f"{SHOWBAGS_URL}?page={n}")
Right now, we’ll just fetch the first page. However, we will eventually start looping through values of `n` from 1 to 51.
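Here is a rough sketch of what generating those page URLs could look like, without making any requests yet. It assumes the `?page=` query parameter from the snippet above keeps working for every page, and `page_urls` is a hypothetical helper name.

```python
from urllib.parse import urlencode

SHOWBAGS_URL = "https://www.eastershow.com.au/explore/showbags/"
NUM_PAGES = 51  # page count mentioned above; an assumption that may change


def page_urls(base_url=SHOWBAGS_URL, num_pages=NUM_PAGES):
    """Build the URL for every results page, from page 1 to num_pages."""
    return [f"{base_url}?{urlencode({'page': n})}" for n in range(1, num_pages + 1)]


urls = page_urls()
print(urls[0])
print(len(urls))
```

Building the list up front makes it easy to eyeball the URLs before hitting the site 51 times.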
Once we have the HTML for the page, we have to extract the info from the relevant tags that we picked out from before. The `BeautifulSoup` library lets you do that easily without having to do some kind of painful regex matching.
You start by initialising the HTML page as a `BeautifulSoup` object.
from bs4 import BeautifulSoup
soup = BeautifulSoup(page.content, "html.parser")
Now, you can run methods on `soup` to search within its structure. Since we know exactly what kinds of tags and classes we need to extract our target information, we will be able to do everything using the `find()` and `find_all()` methods.
For example, to get all instances of the `showbagsCard` div, we pass in `"div"` as the tag type we’re targeting, and `showbagsCard` as the class we’re targeting.
SHOWBAG_DIV = "showbagsCard"
showbags = soup.find_all("div", class_=SHOWBAG_DIV)
I’m also defining the name of the showbag class as a constant at the top of the code just so it’s easy to update if something about the webpage changes.
Using this idea, given a showbag `BeautifulSoup` object, this is how we grab the name, price, and total retail value.
SHOWBAG_NAME_HEADING = "showbagsCard-product--name"
SHOWBAG_PRICE_SPAN = "showbagsCard-product--price"
SHOWBAG_VALUE_DIV = "showbagsCard-description-copy--included" # retail value is in the last paragraph inside <strong> tags
# showbag is a BeautifulSoup object with the content for a particular showbag
# grab name
name_heading = showbag.find("h3", class_=SHOWBAG_NAME_HEADING)
name = name_heading.text.strip()
# get price
price_span = showbag.find("span", class_=SHOWBAG_PRICE_SPAN)
# get list of all included items
value_div = showbag.find("div", class_=SHOWBAG_VALUE_DIV)
item_paragraphs = showbag.find_all("p")[:-2]
all_items = [el.text for el in item_paragraphs]
# get total retail value
total_value_strong = value_div.find("strong")
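To make the lookups above concrete, here is a small sketch run against a made-up HTML fragment. The markup is an assumption: it only mirrors the class names described earlier, and the real page will have a lot more going on.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment that mimics the structure described above
SAMPLE_HTML = """
<div class="showbagsCard">
  <h3 class="showbagsCard-product--name"> Barbie Dreamtopia </h3>
  <span class="showbagsCard-product--price"> $32 </span>
  <div class="showbagsCard-description">
    <div class="showbagsCard-description-copy--included">
      <p>Backpack</p>
      <p>Trinket boxes</p>
      <p>Wand</p>
      <p>Total retail value: <strong>$70.60</strong></p>
    </div>
  </div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# class_ matches whole class names, so showbagsCard-description is not picked up here
cards = soup.find_all("div", class_="showbagsCard")
card = cards[0]

name = card.find("h3", class_="showbagsCard-product--name").text.strip()
price_text = card.find("span", class_="showbagsCard-product--price").text.strip()
included = card.find("div", class_="showbagsCard-description-copy--included")
value_text = included.find("strong").text.strip()

print(name, price_text, value_text)
```

Note that the price and retail value are still strings with a dollar sign at this point.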
We run into a bit of a hurdle when we’re trying to get price values for both the showbag price and the total retail value.
The content from a `showbagsCard-product--price` span could look something like `" $32 "`. How do we get the numerical value on its own as a float?
We know that a numerical price value will be made up of one or more numerical digits, and then potentially followed by a decimal point and two more numerical digits. The regex pattern that matches these requirements looks like this: `"\d+(\.\d{2})?"`.
If we use the `re` regex library in Python, a function to extract the float value of a price from a given string would look like this:
import re

PRICE_PATTERN = r"\d+(\.\d{2})?"

def extract_price(full_string):
    # find the first run of digits, optionally followed by "." and two digits
    matched = re.search(PRICE_PATTERN, full_string)
    if matched:
        return float(matched.group())
    return None
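To see how the pattern behaves, here is a sketch that runs the same function over a few strings. The sample strings are invented for illustration, and the function is repeated only so the block runs on its own.

```python
import re

PRICE_PATTERN = r"\d+(\.\d{2})?"


def extract_price(full_string):
    """Return the first price-like number in the string as a float, or None."""
    matched = re.search(PRICE_PATTERN, full_string)
    if matched:
        return float(matched.group())
    return None


# Invented sample strings, not taken from the real page
samples = [" $32 ", "Total retail value: $70.60", "Price on request", "$1,234.50"]
for text in samples:
    print(repr(text), "->", extract_price(text))
```

One thing to watch: the pattern stops at a thousands separator, so `"$1,234.50"` comes back as `1.0`. That doesn't matter for showbag-sized prices, but it would for anything over $999.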
Once we’ve extracted the name, price, retail value and items from a showbag, the only actual calculation we have to do is the all-important value to price ratio.
Here’s what that looks like in code:
# total_value is total retail value
# price is price of the showbag
if not total_value or not price:
    value_to_price_ratio = 0
else:
    value_to_price_ratio = total_value / price
I want to avoid zero division errors, so if for some reason I couldn’t properly extract `total_value` or `price`, then I just set the ratio to zero by default.
After this calculation, we have all the data we need! We just save this as a dictionary object, and append it to the list of all showbag data.
# showbags_data is the list of all showbag data
showbag_data = {
    "name": name,
    "price": price,
    "retail_value": total_value,
    "value_ratio": value_to_price_ratio,
    "items": "\n".join(all_items),
    "html": showbag,
}
showbags_data.append(showbag_data)
And so once we grab the showbag data for all the showbags on a page, we repeat this for all the pages available. That’s just a loop up to the total number of pages.
The code snippets so far have been a bit all over the place, but this is what the `fetch_data()` function looks like once it can extract info for every showbag on every page.
import requests
from bs4 import BeautifulSoup

SHOWBAGS_URL = "https://www.eastershow.com.au/explore/showbags/"
NUM_PAGES = 51
SHOWBAG_DIV = "showbagsCard"
SHOWBAG_NAME_HEADING = "showbagsCard-product--name"
SHOWBAG_PRICE_SPAN = "showbagsCard-product--price"
SHOWBAG_VALUE_DIV = "showbagsCard-description-copy--included"  # retail value is in the last paragraph inside <strong> tags
PRICE_PATTERN = r"\d+(\.\d{2})?"

# extract_price() is the helper defined earlier
def fetch_data():
    showbags_data = []
    unique_showbags = set()
    for n in range(1, NUM_PAGES + 1):
        # SHOWBAGS_URL already ends in "/", so the query string is appended directly
        page = requests.get(f"{SHOWBAGS_URL}?page={n}")
        soup = BeautifulSoup(page.content, "html.parser")
        showbags = soup.find_all("div", class_=SHOWBAG_DIV)
        for showbag in showbags:
            # grab name
            name_heading = showbag.find("h3", class_=SHOWBAG_NAME_HEADING)
            name = name_heading.text.strip()
            # skip duplicates
            if name in unique_showbags:
                print(f"duplicate: {name}")
                continue
            unique_showbags.add(name)
            print(f"processing: {name}")
            # get price
            price_span = showbag.find("span", class_=SHOWBAG_PRICE_SPAN)
            price = extract_price(price_span.text)
            # get list of all included items
            value_div = showbag.find("div", class_=SHOWBAG_VALUE_DIV)
            item_paragraphs = showbag.find_all("p")[:-2]
            all_items = [el.text for el in item_paragraphs]
            # drop a trailing footnote line, guarding against an empty list
            if all_items and all_items[-1].startswith("*"):
                all_items.pop()
            # get total retail value
            total_value_strong = value_div.find("strong")
            total_value = extract_price(total_value_strong.text)
            # calculate value to price ratio
            if not total_value or not price:
                value_to_price_ratio = 0
            else:
                value_to_price_ratio = total_value / price
            showbag_data = {
                "name": name,
                "price": price,
                "retail_value": total_value,
                "value_ratio": value_to_price_ratio,
                "items": "\n".join(all_items),
                "html": showbag,
            }
            showbags_data.append(showbag_data)
    # hand the collected rows back to the caller for sorting and output
    return showbags_data
Now that all the data is in a single list, we want to sort the list in order by value to price ratio.
To do this, we just need to specify a sorting key. And if we use a lambda function, the code looks like this:
showbags_data.sort(key=lambda x: x["value_ratio"], reverse=True)
I’ve also set the sort to be reversed because I want highest value ratio first, and lowest value ratio last.
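For comparison, here is a sketch of the same sort on a tiny hand-made list, using `operator.itemgetter` in place of the lambda and `sorted()` so the original list is left alone. The three rows are built from figures quoted in this post; the real list has many more keys per row.

```python
from operator import itemgetter

# Hand-made rows; ratios computed from figures quoted in this post
showbags_data = [
    {"name": "Allen's Lollies", "value_ratio": 15 / 15},
    {"name": "The Australian Women's Weekly", "value_ratio": 490.61 / 22},
    {"name": "Barbie Dreamtopia", "value_ratio": 70.60 / 32},
]

# sorted() returns a new list and leaves showbags_data untouched
ranked = sorted(showbags_data, key=itemgetter("value_ratio"), reverse=True)
top_two = ranked[:2]

for row in top_two:
    print(f"{row['name']}: {row['value_ratio']:.2f}")
```

Slicing the sorted list is also the easiest way to pull out a top ten later on.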
So we could just go `print(showbags_data)` and call it a day. But scrolling through hundreds of lines of data is no one’s idea of fun so we’ll be outputting it into a CSV file.
Every single time I need to read and write CSV files I pretty much just follow the Real Python tutorial. Since the data for each showbag is already conveniently stored as a dictionary, I’ll be using the `csv.DictWriter` class.
import csv

def write_to_csv(showbags_data, csvfile="showbags.csv"):
    """
    Takes in a list of dictionaries containing showbag data and outputs it to a
    CSV file
    """
    fieldnames = ["name", "items", "retail_value", "price", "value_ratio"]
    # newline="" lets the csv module control line endings;
    # the with block closes the file even if a row fails to write
    with open(csvfile, "w", newline="") as csv_output:
        writer = csv.DictWriter(csv_output, fieldnames=fieldnames)
        writer.writeheader()
        for showbag_row in showbags_data:
            writer.writerow(
                {
                    "name": showbag_row["name"],
                    "items": showbag_row["items"],
                    "retail_value": showbag_row["retail_value"],
                    "price": showbag_row["price"],
                    "value_ratio": showbag_row["value_ratio"],
                }
            )
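If you want to load the CSV back into Python later, a round trip might look roughly like this. It is only a sketch: it writes to an in-memory buffer instead of a real file, and it uses a single row with fewer columns than the real output.

```python
import csv
import io

fieldnames = ["name", "retail_value", "price", "value_ratio"]
rows = [
    {"name": "Barbie Dreamtopia", "retail_value": 70.60, "price": 32.0, "value_ratio": 2.20625},
]

# An in-memory stand-in for the CSV file
buffer = io.StringIO(newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)

# Rewind and read the rows back as dictionaries
buffer.seek(0)
loaded = list(csv.DictReader(buffer))

# every value comes back as a string, so convert before doing any maths
ratio = float(loaded[0]["value_ratio"])
print(loaded[0]["name"], ratio)
```

The main gotcha is that `DictReader` gives you strings for everything, including the numbers.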
Aaaand by now you’re more than ready to see the results! The best deal you can get at the Easter Show is with….
The Australian Women’s Weekly showbag!
For a very reasonable $22, you can get $490.61 worth of stuff. And that stuff is made up of 34 individual items.
It’s not the only bag with a >20 value to price ratio though, so check out the rest of the top ten most valuable showbags!
Price: $30, Total retail value: $605.62, Value to price ratio: 20.19
Price: $30, Total retail value: $418.64, Value to price ratio: 13.95
Price: $32, Total retail value: $433.48, Value to price ratio: 13.55
Price: $35, Total retail value: $433.48, Value to price ratio: 12.39
Price: $32, Total retail value: $391.89, Value to price ratio: 12.25
Price: $32, Total retail value: $385.33, Value to price ratio: 12.04
Price: $18, Total retail value: $180.63, Value to price ratio: 10.04
Price: $30, Total retail value: $297.00, Value to price ratio: 9.90
Price: $32, Total retail value: $299.86, Value to price ratio: 9.37
Since we have everything in a CSV file, it’ll be easy to pull it up in Google Sheets/Excel and see if we can ask some interesting questions about the showbags.
We can make a histogram showing us the distribution of value to price ratio in the showbags.
This graph makes it very clear that our top 10 showbags are outliers compared to the other showbags. The vast majority seem to have a value to price ratio between 1.32 and 2.63.
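If you would rather stay in Python than open a spreadsheet, a rough text histogram can be sketched with the standard library. The ratios below are sample values for illustration, not the scraped data, and the bin width of 1 is an arbitrary choice, not what the chart above used.

```python
from collections import Counter
import math

# Sample ratios for illustration only, not the scraped data
ratios = [1.0, 1.4, 1.8, 2.2, 2.5, 2.9, 3.1, 13.95, 20.19]
BIN_WIDTH = 1.0


def bin_counts(values, width=BIN_WIDTH):
    """Count how many values fall in each [k*width, (k+1)*width) bin."""
    return Counter(math.floor(v / width) * width for v in values)


counts = bin_counts(ratios)
for start in sorted(counts):
    print(f"{start:5.1f} to {start + BIN_WIDTH:5.1f} | {'#' * counts[start]}")
```

Even with a handful of values, the big gap between the main cluster and the top bags shows up straight away.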
Ignoring the showbags with missing total retail values, there is a single showbag with a value to price ratio less than 1.
So basically, if you’re in the mood for Smarties, you’re better off buying them yourself instead of the BIG Smarties showbag.
There are also only three showbags with a value to price ratio of exactly 1.
Buying the Allen’s Lollies, Snakes Alive and Allen’s Bites showbags is exactly equivalent to buying the individual items literally anywhere else.
Luckily, it’s only those four showbags which would offer you either a negative or no return on your investment.
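The same questions can be asked in Python with a couple of list comprehensions. This is a sketch on hand-made rows: only the Allen’s Lollies and Barbie Dreamtopia ratios come from figures in this post, and the other two rows are placeholders.

```python
# Hand-made rows: a ratio of 0 marks a showbag whose value could not be extracted
showbags_data = [
    {"name": "Bag with missing value", "value_ratio": 0},
    {"name": "Allen's Lollies", "value_ratio": 15 / 15},
    {"name": "Barbie Dreamtopia", "value_ratio": 70.60 / 32},
    {"name": "Made-up poor deal", "value_ratio": 0.8},
]

# skip the zero placeholders before looking for poor deals
known = [row for row in showbags_data if row["value_ratio"] > 0]
below_one = [row["name"] for row in known if row["value_ratio"] < 1]
exactly_one = [row["name"] for row in known if row["value_ratio"] == 1]

print("worse than buying separately:", below_one)
print("no better than buying separately:", exactly_one)
```

Filtering out the zeros first matters, otherwise every bag with a missing retail value would look like the worst deal at the show.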
For most showbags, there’s a pretty uniform pattern of having a value to price ratio somewhere between 1 and 5. However, the high value showbags tend to be the showbags that are more on the expensive end anyways. You’re not going to catch a bargain on a 15 dollar showbag, but if you’re willing to pay 30 dollars for a showbag, then you will have quite a few options for getting a good deal.
So that’s it! Go forth to the 2023 Royal Sydney Easter Show knowing you have the power to maximise your returns on the all-important showbags 💪🤑
If you’re curious, all of the code is accessible on GitHub for you to run yourself. And if you want to avoid code entirely, the sorted showbag CSV is on Drive.
Footnotes:
¹ Thanks Truc :)