ITD104 Building IT Systems
TP3, 2020
Portfolio 2 Assignment: Online Shopping Application
(10%, due Week 12, Sunday 24 January 11:59pm)
Overview
This is the first part of the second Portfolio ‐ worth 10% of your final grade for ITD104. The
Portfolio 2 Test is the second part of the Portfolio, and will be worth a further 15%. The Test
will test you for the same skills required to complete this Assignment, under time pressure.
Motivation
The Internet has dramatically changed the way we conduct commerce. Online shopping is
increasingly displacing traditional forms of retailing. Here you will develop an application that
provides an online shopping experience, using data downloaded from the World Wide Web.
The program will have a Graphical User Interface that allows its user to choose which products
they want to buy from different categories. Having done so they will then be able to create
an illustrated invoice summarising their purchases, which will be opened in a standard web
browser. Most importantly, the online shopping application will aggregate both archived and
current data sourced from online “feeds” that are updated on a regular basis.
This “capstone” assignment is designed to incorporate most of the concepts taught in ITD104.
To complete it you will need to: (a) use Tkinter to create an interactive Graphical User
Interface; (b) download web documents using a Python script and use pattern matching to
extract particular elements from them; and (c) generate an HTML document containing the
extracted elements, presented in an attractive, easy‐to‐read format.
Goal
Your aim in this assignment is to develop an “online shopping app” which allows its users to
select items from one or more online retailers to add to a “shopping cart” of purchases. There
must be at least two distinct categories of items for sale. The items on offer are extracted
from the product lists offered by actual online shops. One category is derived from previously‐
downloaded web documents, and a further one (or more) categories must be based on “live”
web documents currently online. The online web documents must be ones that are updated
on a regular basis. When users are happy with their selections they must be able to produce
an invoice detailing their purchases and the total cost (in Australian dollars).
For the purposes of this assignment you have a free choice of what products your online
shopping application will offer. Your application must offer (at least) two clearly‐distinct
categories of item. Categories could be:
clothing,
electrical goods,
books and magazines,
motor vehicles,
furniture,
toys and games,
jewellery and cosmetics,
real estate,
etc.
However, whatever categories of products you choose, you must be able to find online web
documents that contain regularly‐updated lists of such products. The categories could all
come from the same online retailer or from two or more different ones.
For each product category the corresponding online web document must include at least ten
items for sale at any time. For each item the document must include a product name, a
photograph of the product, and a price. A good source for such data is Rich Site Summary
(RSS) web‐feed documents. Appendix A below lists many such sites suitable for this
assignment, but you are encouraged to find your own of personal interest. For the “live”
product category the chosen web document must be updated regularly, preferably at least
once a day.
Using this data source you are required to build an IT system with the following general
architecture.
Your online shopping application will be a Python program with a Graphical User Interface.
Under the user’s control, it extracts individual elements from two distinct sets of web
documents to display the products for sale. One document is static and is stored as an
“archive” HTML document. The other source of web documents is the “live” Internet. Your
application must offer at least one category of product from an archive, and at least one
category of products from the internet. For each category of product there must be at least
ten items for sale at any time. Once the user has made their selections via the GUI, your
application must generate an HTML “invoice” for the purchases, which is opened
automatically in the user’s default web browser.

Illustrative example
To demonstrate the idea, below is my own online shopping application, which uses data
extracted from several different web sites. My demonstration application allows users to
select from two product categories: ‘Beaver’ sportswear and Computer accessories. The
application allows users to see which items are available and pick those they want to buy from
each category. The program accesses all the necessary data from the various web sites (either
“live” or using a previously‐downloaded document depending on the category) and uses it to
produce an illustrated invoice in the form of an HTML document which is then opened in the
user’s default browser.
The screenshot below shows an example solution’s GUI when it first starts. I called my online
shop the “Online Shopper” and have included a suitably evocative image to serve as the shop’s
logo.
The GUI offers the user two distinct categories of products. One category, “‘Beaver’
sportswear”, is static (it is never updated) and is listed under “Archive Sales” on the GUI. The
other category, “Computer accessories”, is “live” and represents current items for sale online.
This category will likely contain different items each time we check them!
[Notice in this GUI that the “Add to cart” button is disabled until a category is selected.]
The user can select a category of items by clicking on the corresponding radio button. For
instance, the user may choose to look firstly at the Archive Sales section of ‘Beaver’
sportswear. Notice now (in the following screenshot) that the “Add to cart” button has been
enabled:

When the user selects a category of products they must then be presented with a list of items
for sale, together with each item’s price. In my demonstration solution I have done this by
popping up a new window:
Ten products are listed. This information was sourced from a web document downloaded via
the URL shown at the bottom of the window and stored in a local folder. My program uses
pattern matching to extract the necessary elements from the web document so that if a
different document is inserted in the archive the data displayed will update accordingly.
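The idea can be illustrated with a minimal sketch. It assumes a hypothetical archived document whose items use <title> and <price> tags; the sample text and the file path mentioned in the comment are invented for illustration, and your own documents will almost certainly be marked up differently.

```python
from re import findall

# Invented sample standing in for the contents of an archived document.
# In a real solution this text would be read from the archive folder,
# e.g. open('archive/sportswear.xml', encoding='utf-8').read()
# (a hypothetical path).
archived_text = """
<item><title>Running Shorts</title><price>19.99</price></item>
<item><title>Training Jacket</title><price>54.50</price></item>
"""

# Non-greedy groups capture only the text between each pair of tags
names = findall(r'<title>(.*?)</title>', archived_text)
prices = findall(r'<price>(.*?)</price>', archived_text)

# Pair each name with its price, in document order
for number, (name, price) in enumerate(zip(names, prices), start=1):
    print(f'{number}. {name} - ${price}')
```

Because nothing in the patterns depends on a particular product, replacing the sample text with another document in the same format changes the listing without any change to the code.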

In the example GUI, the user can then select a particular item to buy using a spinbox widget.
As per the screenshot below, the user has selected item number 4 and pressed the “Add to
cart” button to add the purchase to their shopping cart.
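The interaction described above could be wired up roughly as follows. This is only a sketch of one possible design, not the demonstration program’s code: the function names add_to_cart and build_gui, and the dictionary of products per category, are assumptions made for illustration.

```python
def add_to_cart(cart, products, item_number):
    """Append the chosen product (numbered from 1) to the cart.

    Out-of-range item numbers are ignored rather than raising an error.
    """
    if 1 <= item_number <= len(products):
        cart.append(products[item_number - 1])
    return cart


def build_gui(products_by_category):
    """Create the main window; the caller is expected to run mainloop()."""
    from tkinter import (Tk, Button, Radiobutton, Spinbox, StringVar,
                         DISABLED, NORMAL)

    window = Tk()
    window.title('Online Shopper')
    cart = []
    # No radio button has the value '', so none is selected at start-up
    chosen_category = StringVar(value='')

    def category_chosen():
        # Match the spinbox range to the category and enable the button
        products = products_by_category[chosen_category.get()]
        item_picker.config(from_=1, to=max(1, len(products)))
        add_button.config(state=NORMAL)

    def add_pressed():
        products = products_by_category[chosen_category.get()]
        add_to_cart(cart, products, int(item_picker.get()))

    for category in products_by_category:
        Radiobutton(window, text=category, value=category,
                    variable=chosen_category,
                    command=category_chosen).pack(anchor='w')
    # A read-only spinbox can only hold numbers within its range
    item_picker = Spinbox(window, from_=1, to=10, state='readonly', width=4)
    item_picker.pack()
    add_button = Button(window, text='Add to cart', state=DISABLED,
                        command=add_pressed)
    add_button.pack()
    return window

# Typical use (not run here):
# build_gui({'Archive Sales': [...], 'Online Sales': [...]}).mainloop()
```

Keeping add_to_cart separate from the widgets means the cart logic can be checked without opening a window.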
The shopper’s attention then turns to the “Online Sales” option, in the knowledge that this
category is “live” and currently online. The user selects the “Computer accessories” category:
Again, a list of products for sale is produced, but this time from a live website (at the URL
indicated at the bottom of the window). In this illustrative example, the list of products for
sale was produced some months ago:

This particular list was based on live data and would be different if the program was run today.
The user then selects items 1 and 7 and adds both to their cart:
At this point the user is satisfied with their purchases and presses the “Print invoice” button.
This causes the application to generate an HTML file containing a summary of the user’s
purchases.
In this illustrative example, during the generation of the HTML file, the status text box displays
a series of messages informing the user of progress i.e.:
Generating your invoice …
Listing your purchases…
Calculating total price…
Finishing your invoice…
and finally:
DONE!…
Screen shots are not included here to save space, but this feature will be demonstrated in
class. It is NOT necessary to output such messages in your own solution. They are simply
included in the demonstration program to help explain what processes are happening in the
program.

The generated HTML file is opened automatically in the user’s
default browser. This document details the purchases in the order they were made, including
each product’s name, photo, and price (in the original web site’s currency), the total price for
all the purchases (converted to Australian dollars) and the web pages from which the product
details were sourced:

In summary, therefore, the demonstration online shopping application has the following
features:
o Widgets that identify the application, including a name and a logo.
o Widgets that allow the user to select a product category and an item within that
category that they wish to purchase, choosing from (at least) one archived category
and (at least) one live category.
o Widgets that allow the user to see the names and prices of (at least) ten products for
sale in each category.
o A widget that allows the user to declare that they have completed their shopping
and want to receive their invoice.
The generated invoice has the following features:
o A name identifying the imaginary shop.
o For each item purchased:
o The product’s name;
o A photo of the product; and
o The product’s price (as shown on the original web site, even if it is in a foreign
currency).
o The total price for all of the purchases, converted to Australian dollars (all of my source
web sites listed their prices in US dollars).
o Links to the original web sites for all of the product categories, which can be used to
inspect the site’s current contents.
Importantly, the product image files do not reside on the local computer. They are all links to
image files online and are downloaded “live” when the invoice is viewed in a browser. (The
quality of the images varies, depending on the source. You will find that some images are
small “thumbnails” on the original site, so will be noticeably pixellated when enlarged to fit
your invoice’s layout.)
You are not required to follow the details of the demonstration GUI or invoice layout. You are
strongly encouraged to use your own skills and initiative to devise your own solution, provided
it has all the functionality listed above. For instance, different widgets could be used for the
GUI, such as pull‐down menus for selecting items. Similarly, obvious improvements to this
example would be to add a widget to the GUI that displays the current contents of the user’s
“shopping cart”; and to allow for existing cart items to be deleted.
To produce the item listings and invoice, the demonstration online shopping application used
regular expressions to extract elements from the relevant web documents, whether they
were stored in the static archive or downloaded when the program is run. For instance, I
found by inspecting the HTML code from the Beaver sportswear shop that its photos were
thumbnail images that appeared between <isc:thumb> </isc:thumb> XML tags,
which made them easy to find with a regular expression. Similarly, I found the prices of
Computer accessories between <price> </price> tags.
Care was also taken to ensure that no HTML/XML tags or other HTML entities appeared in the
extracted text. In some cases it was necessary to (programmatically) delete or replace such
mark‐ups in the text after they were extracted from the web document. The generated
invoice must not contain any extraneous tags or unusual characters that would interfere with
the document’s appearance when viewed.
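As a rough illustration of this clean-up step, the sketch below strips CDATA wrappers, decodes HTML entities and deletes leftover tags. The function name clean_text is hypothetical, and the sketch assumes the html module is available to you; check which modules the provided template imports before relying on it.

```python
from html import unescape
from re import sub


def clean_text(raw_text):
    """Remove markup from extracted text and decode HTML entities."""
    # RSS feeds often wrap text in CDATA markers; drop them if present
    text = raw_text.replace('<![CDATA[', '').replace(']]>', '')
    text = unescape(text)             # e.g. '&amp;' becomes '&'
    text = sub(r'<[^>]+>', '', text)  # delete any remaining tags
    return ' '.join(text.split())     # collapse runs of whitespace


print(clean_text('<![CDATA[<b>Mouse &amp; Pad</b>]]>'))
```

Entities are decoded before tags are deleted so that markup which was escaped in the feed (for instance &lt;b&gt;) is also removed.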

A small part of the HTML code generated by the Python program is shown below. Although
not intended for human consumption, the generated HTML code is nonetheless laid out
neatly, and with comments indicating the purpose of each part.
Where the product data comes from
A significant challenge for this assignment is that web servers deliver different documents to
different web browsers, RSS readers and other clients. This means the web document you
see in a browser may be very different from the web page downloaded by your Python
application.
Your Python program “sees” these documents as a single character string. For instance, the
source code for the Computer accessories page appears as follows when viewed as “raw” XML
text in a web browser.
The HTML/XML source code of the web documents is the format your Python program must
work with. From it you will need to find the textual elements needed to construct the product
descriptions in your GUI and your invoice document. To do so in a general way, that allows
for the document being updated, you will have to use pattern matching techniques. This is
most efficiently done using regular expressions. You can see above, for instance, part of the
name of the first product on offer in the Computer accessories site, a “2020 Santa Gell Mouse
Pad”. This name is enclosed within <title> </title> HTML tags, which helps us find
it. Similarly, you can see there are a number of URLs for images of different sizes. The image
I have chosen to extract for each item is in the <media:content…> tag.
Whereas the “archived” document never changes, the “live” shopping category must be
downloaded “fresh” from the Internet whenever the user runs the Python application. The
chosen web page/s must be ones that are updated on a regular basis, so you cannot be sure
of their precise contents when the program runs. In any event, it must work for any product
lists adhering to the general source code format of the web sites, to allow for updates.
Obviously working with such complex code is challenging. You should begin with your static,
“archived” document to get some practice at pattern matching before trying dynamically
changeable web documents downloaded from online.
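One way to keep each product’s name, image and price together is to first cut the document into per-item blocks and then search inside each block. The sketch below assumes an invented feed in which every product sits in an <item> element with <title>, <media:content url="…"> and <price> children; real feeds may use different tags or attributes, so treat the patterns as placeholders.

```python
from re import findall, search, DOTALL

# Invented sample feed; note the channel-level <title> that must be ignored
sample_feed = """
<rss><channel><title>Example Shop</title>
<item>
  <title>Wireless Mouse</title>
  <media:content url="https://example.com/images/mouse.jpg" medium="image"/>
  <price>12.95</price>
</item>
<item>
  <title>USB Desk Fan</title>
  <media:content url="https://example.com/images/fan.jpg" medium="image"/>
  <price>8.50</price>
</item>
</channel></rss>
"""


def extract_products(feed_text):
    """Return a list of (name, image URL, price) tuples, one per item."""
    products = []
    # DOTALL lets '.' span the line breaks inside each <item> block
    for item in findall(r'<item>(.*?)</item>', feed_text, flags=DOTALL):
        name = search(r'<title>(.*?)</title>', item, flags=DOTALL)
        image = search(r'<media:content[^>]*?url="(.*?)"', item)
        price = search(r'<price>(.*?)</price>', item)
        # Skip items missing any element rather than mis-pairing them
        if name and image and price:
            products.append((name.group(1).strip(),
                             image.group(1),
                             price.group(1).strip()))
    return products


for product in extract_products(sample_feed):
    print(product)
```

Searching within one block at a time means an item that lacks a price cannot cause later prices to be paired with the wrong names, which can happen when three independent findall results are zipped together.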
Robustness
Another important aspect of the system is that it must be resilient to user error. This depends,
of course, on your choice of GUI widgets and how they interact. Whatever the design of your
GUI, you must ensure that the user cannot cause it to “crash”.
For instance, in my demonstration solution:
the user must select a category before adding to the cart; and
the user is expected to select some items before printing an invoice. However, if the user
presses the “Print invoice” button before selecting any items a sensible result is still
produced:
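The same defensive attitude applies to the code behind the buttons. As a small sketch (the function names parse_price and cart_total are hypothetical), the helpers below give a sensible total for an empty cart and tolerate price text that contains no number:

```python
from re import search


def parse_price(price_text):
    """Return the first number in the text as a float, or None if absent."""
    match = search(r'\d+(?:,\d{3})*(?:\.\d+)?', price_text)
    if match is None:
        return None
    return float(match.group().replace(',', ''))


def cart_total(cart):
    """Sum the prices in a cart of (name, price text) pairs.

    Unreadable prices are skipped; an empty cart gives a total of 0.
    """
    prices = [parse_price(price_text) for _, price_text in cart]
    return sum(price for price in prices if price is not None)


print(cart_total([]))                          # empty cart
print(cart_total([('Mouse pad', '$12.95')]))   # one purchase
```

With helpers like these the “Print invoice” handler needs no special case for an empty cart: it simply lists no items and reports a zero total.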
Specific requirements and marking guide
To complete this task you are required to produce an application in Python similar to that
above, using the provided online_shopping_app.py template file as your starting
point. In addition you must provide a folder containing (at least) one previously‐downloaded
web document as your “archive sales” items for sale. (The invoice’s images must be online
files and must not be included in your submission.)

Your complete solution must support at least the following features.
An intuitive Graphical User Interface (1%). Your application must provide an
attractive, easy‐to‐use GUI. You have a free choice of which Tkinter widgets to do the
job, as long as they are effective and clear for the user. This interface must have the
following features:
o A name identifying your online shop.
o An image (locally stored) which serves as the shop’s logo. GIFs are the easiest
image files to work with in Python. Other image file formats require additional
modules, so they are best avoided. The image file should be included in the
same folder as your Python application.
o A widget or widgets that allow the user to see all the items available for sale.
It must be possible to distinguish the archived sales category from the current
“live” ones. The items could all be displayed at once, or when the user selects
specific categories, as in the demonstration solution.
o The URLs from which the data was sourced must be displayed.
o A widget or widgets that allow the user to select items to buy.
o A widget or widgets that allow the user to declare that they have finished
shopping and wish to receive their invoice.
Displaying “archived” products for sale in the GUI (2%). Your GUI must be capable of
displaying (at least) one static category of products for sale, with (at least) ten items
in each category. For each item (at least) the product name and price must be shown,
in such a way that the user can select specific items to buy. The product data must be
extracted from HTML/XML files previously downloaded from online and stored in your
“archive” folder. The documents must be stored in exactly the form they were
downloaded from the web server; they cannot be edited or modified in any way.
Pattern matching must be used to extract the relevant elements from the documents
so that the code would still work if the documents were replaced with others in the
same format. To keep the size of the archive folder manageable only single HTML/XML
source files can be stored. No image files may be stored in the archive.
Displaying “live” products, downloaded from the web, in the GUI (3%). Your GUI
must be capable of displaying (at least) one distinct “live” category of products for
sale, with (at least) ten items in each category, as currently available online at the time
the program is run. For each item (at least) the product name and price must be
shown, in such a way that the user can select specific items to buy. The product data
must be extracted from HTML/XML files downloaded from online when the program
is run. Pattern matching must be used to extract the relevant elements from the
documents so that the code still works even after the online documents are updated.
The chosen web sites must be ones that are updated on a regular basis.
Generating an HTML invoice describing the items selected by the user (2%). Your
program must be able to generate an invoice listing the specific items selected by the
user. It must be created as an HTML document in the same local folder as your Python
program and must be called invoice.html. This document must be automatically
opened in the user’s default browser. It must incorporate HTML markups that meet
the HTML5 standard and make its contents viewable in any standard web browser.
When viewed, the invoice must contain (at least) the following features:
o A name identifying your online shop.
o A list of all the items selected by the GUI’s user (and no others). For each
item (at least) the following data must be displayed:
The name of the product.
An image of the product.
The product’s price, in the same currency as shown on the original web
page. (To make it easy to tell if you have extracted the correct price by
checking it against the source web page, you should not convert the
prices of individual purchases to Australian dollars. Only the total price
should be converted to AUD.)
All of this data must be extracted from web documents downloaded from the
Internet via pattern matching. Most importantly, each of these sets of items
must all belong together, e.g., you can’t have the image of one product paired
with the price of another. Each of the elements must be extracted from the
original document separately. It is not acceptable to simply copy large chunks
of the original document’s HTML/XML source code.
o The total price of the user’s purchases, converted to Australian dollars, if
necessary. (You can use a fixed exchange rate for this purpose. In the
demonstration solution I assumed an exchange rate for converting USD to AUD
of 1.33.)
o URLs linking your invoice to the original web pages from which your shopping
data was extracted, for both your “archived” and “live” shopping categories. It
must be easily possible to follow these links to see the original web pages
online.
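The total-price calculation can be sketched in a few lines. The constant uses the fixed rate of 1.33 quoted above; the function name format_total and the output wording are assumptions for illustration only.

```python
# Fixed exchange rate, as assumed in the demonstration solution
USD_TO_AUD = 1.33


def format_total(prices_in_usd):
    """Return the cart total converted to Australian dollars as text."""
    total_aud = sum(prices_in_usd) * USD_TO_AUD
    return f'AUD ${total_aud:.2f}'


print(format_total([10.00, 20.00]))
```

Naming the rate as a constant, rather than writing 1.33 inside the calculation, also avoids a “magic number” in the code.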
When viewed in a browser the invoice must be neatly laid out and appear well‐
presented regardless of the browser window’s dimensions. The textual parts
extracted from the original documents must not contain any visible HTML/XML tags
or entities or any other spurious characters. The images must all be links to images
found online, not in local files, and should be of a size consistent with that of the rest
of the document, regardless of their original dimensions.
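A minimal sketch of generating, saving and opening such an invoice follows. The function names, the page layout and the 40% image width are all assumptions chosen for illustration, and the sketch uses the html, pathlib and webbrowser modules, which may or may not be among those imported by the provided template.

```python
from html import escape
from pathlib import Path


def build_invoice(shop_name, purchases, total_aud, source_urls):
    """Return an HTML5 invoice for (name, image URL, price text) purchases."""
    items = []
    for name, image_url, price in purchases:
        items.append(
            '    <!-- One purchased item -->\n'
            '    <section>\n'
            f'      <h2>{escape(name)}</h2>\n'
            f'      <img src="{escape(image_url)}" alt="{escape(name)}"'
            ' style="max-width: 40%">\n'
            f'      <p>Price: {escape(price)}</p>\n'
            '    </section>\n')
    links = ''.join(
        f'      <li><a href="{escape(url)}">{escape(url)}</a></li>\n'
        for url in source_urls)
    return (
        '<!DOCTYPE html>\n'
        '<html lang="en">\n'
        '  <head>\n'
        '    <meta charset="utf-8">\n'
        f'    <title>{escape(shop_name)} invoice</title>\n'
        '  </head>\n'
        '  <body>\n'
        f'    <h1>{escape(shop_name)}</h1>\n'
        + ''.join(items) +
        '    <!-- Total cost, already converted to Australian dollars -->\n'
        f'    <p>Total: AUD ${total_aud:.2f}</p>\n'
        '    <!-- Links back to the source documents -->\n'
        '    <ul>\n' + links + '    </ul>\n'
        '  </body>\n'
        '</html>\n')


def write_invoice(invoice_html, folder='.'):
    """Save the invoice as invoice.html in the given folder."""
    path = Path(folder) / 'invoice.html'
    path.write_text(invoice_html, encoding='utf-8')
    return path


def open_invoice(path):
    """Open a saved invoice in the user's default web browser."""
    from webbrowser import open as open_in_browser
    open_in_browser(Path(path).resolve().as_uri())

# Typical use (not run here):
# open_invoice(write_invoice(build_invoice('Online Shopper', cart, 39.90, urls)))
```

The image tags point at the online image URLs, so no image files are stored locally, and escape re-encodes characters such as & so that product names cannot break the generated markup.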
Good Python and HTML code quality and presentation (2%). Your Python program
code and the generated HTML code must be presented in a professional manner. See
the coding guidelines in the ITD104 Code Presentation Guide (on Blackboard under
Assessment) for suggestions on how to achieve this for Python. In particular: each
significant code segment must be clearly commented to say what it does, e.g., “Extract
the link to the photo”, “Show the price”, etc. Other important code quality and
presentation aspects relate to the naming of variables, the absence of magic numbers and
duplicated code, etc. The HTML document must also be commented and well
formatted, so that it is easily read in a text editor. (Full marks for this criterion are not
possible with only partial attempts.)
You can add other features if you wish, as long as you meet these basic requirements.
You must complete the task using only basic Python features and the modules already imported
into the provided template.
Support tools
To get started on this task you need to download various web pages of your choice and work
out how to extract the necessary elements for at least ten products for sale:

The product’s name.
A link to a photograph of the product.
The product’s price.
You also need to allow for the fact that the contents of the web documents from which you
get your data will change regularly, so you cannot hardwire the locations of the elements in
your program. Instead you must use Python’s regular expression function findall to
extract the necessary elements, no matter where they appear in the HTML/XML source code.
To help you develop your solution, I have included two small Python programs with these
instructions.
1. web_doc_downloader.py is a Python program containing a function called
download that downloads and saves the source code of a web document as a
Unicode file as well as returning the document’s contents to the caller as a string. You
can use it to save copies of your chosen web documents for storage in your “archive
sales” stock.
2. regex_tester.py is an interactive program introduced in class which makes it
easy to experiment with different regular expressions on small text segments. You can
use this together with downloaded text from the web to help perfect your regular
expressions. (There are also many online tools that do the same job.)
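To give a feel for what a downloader does, here is a simplified stand-in written with urllib. It is not the supplied download function, whose exact parameters are not described here; the name fetch_document and its save_as parameter are hypothetical. For your actual archive you should use the supplied function, so that the stored files are exactly what the web server delivered.

```python
from urllib.error import URLError
from urllib.request import urlopen


def fetch_document(url, save_as=None):
    """Return a web document's source as a string, or None on failure.

    If save_as is given, a copy of the text is also written to that file.
    """
    try:
        with urlopen(url, timeout=10) as response:
            text = response.read().decode('utf-8', errors='replace')
    except (URLError, ValueError, OSError):
        # Covers unreachable servers, malformed URLs and time-outs
        return None
    if save_as is not None:
        with open(save_as, 'w', encoding='utf-8') as archive_file:
            archive_file.write(text)
    return text


# A malformed address is reported as a failure instead of crashing
print(fetch_document('not a url'))
```

Returning None on failure lets the calling code show a message in the GUI rather than crash when the network is unavailable.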
Internet ethics: Responsible scraping
The process of automatically extracting data from web documents is sometimes called
“scraping”. The RSS feeds recommended for use for this assignment are specifically intended
to be easily “scrapable”. However, in order to protect their intellectual property and their
computational resources, owners of some other web sites may not want their data exploited
in this way. They will therefore deny access to their web documents by anything other than
recognised web browser software such as Chrome, Firefox, Internet Explorer, etc. Typically in
this situation the web server will return a short “access denied” document to your Python
script instead of the expected web document (Appendix B).
In this situation it’s possible to trick the web server into delivering the desired document by
having your Python script impersonate a standard web browser. To do this you need to
change the “user agent” identity enclosed in the request sent to the web server. Instructions
for doing so can be found online. I leave it to your own conscience whether or not you wish
to do this, but note that this assignment can be completed successfully without resorting to
such subterfuge.
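For completeness, the mechanics are simple: the user agent is just a header on the request. The sketch below only constructs a request and sends nothing; the function name build_request, the URL and the user-agent string are placeholders, not recommendations.

```python
from urllib.request import Request


def build_request(url, user_agent):
    """Return a request that announces the given user-agent string."""
    return Request(url, headers={'User-Agent': user_agent})


request = build_request('https://example.com/feed.xml',
                        'Mozilla/5.0 (illustrative value)')
# urllib stores header names with only the first letter capitalised
print(request.get_header('User-agent'))
```

Such a request object can be passed to urlopen in place of a plain URL string.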
Development hints
This “capstone” assignment is a substantial task, so you should not attempt to do it all at once.
In particular, you should approach it methodically and attempt one part at a time.
o Developing a way of extracting relevant elements from web documents is a challenge
in its own right. Having selected the shopping web sites of interest you should
download copies of web documents for each of your product categories so that you
can study their structure (for both the “archived” and “live” pages). You should
download the documents using the supplied download function, rather than saving
them from a web browser, to ensure that they have the same structure that will be
seen by your Python program. Examine the HTML/XML source code of the documents
to determine how the elements you want to extract are marked up. This is best done
using a plain text editor rather than a web browser because a browser will attempt to
interpret the document’s contents and may not display all of the HTML/XML tags. For
instance, if you ask a browser to show you an RSS page’s source it may show you an
“XML tree” which does not include all the annotations in the source file. Typically you
will want to identify the markup tags, and perhaps other unchanging parts of the
document, that uniquely identify the beginning and end of the text and image
addresses you want to extract. Using the provided regex_tester.py application,
you can then devise regular expressions which extract just the necessary elements
from the relevant parts of the web documents. Having perfected the regular
expressions you can then develop a simple prototype of your “back end” function(s)
that just extracts the required elements from a web document.
o In the demonstration solution I used two entirely different sources of web documents,
but you are not required to follow this example. You can choose to work with two
different product categories from the same web site, as long as they are clearly distinct
lists of products. Doing so can save you some effort because the same pattern matching
solution is likely to be applicable to all categories.
o The assignment has been designed so that you have to work with previously‐
downloaded, unchanging web documents, and “live” web documents downloaded at
run time. Obviously it is easiest to work with the static “archived” documents, so you
should develop your code for that part of the assignment first, before tackling “live”
web pages.
o Similarly, having worked out how to extract relevant elements from the web
documents, you have to display them in two different ways. Firstly, you have to display
the product names and prices in the GUI, so that the user can choose items to buy.
Secondly, you need to design and generate an HTML invoice containing the product
names, photos, and prices for the items bought. The first of these two tasks is the
simplest, so I recommend tackling it before developing your code for the invoice.
o Finally, although it’s tempting to start work by developing the Tkinter GUI first, this
can be a time‐consuming task, and can be left until after you have the “back‐end”
features described above completed. Decide whether you want to use push buttons,
radio buttons, menus, lists or some other mechanism for choosing and displaying
product categories and selecting items, and producing the invoice. Developing the GUI
is the “messiest” step, and is best left to the end.
If you are unable to complete the whole task, just submit those stages you can get working.
You will receive partial marks for incomplete solutions. If your solution is only partially
working it will help if you explain the limitations of your submission, either by adding
comments when you upload it to Blackboard or in the submission itself.
Deliverables
You should develop your solution by completing and submitting the provided Python
template file online_shopping_app.py. Submit this in a “zip” archive containing all the
files needed to support your application as follows:
1. Your online_shopping_app.py solution. Make sure you have completed the
“statement” at the beginning of the Python file to confirm that this is your own
individual work by inserting your name and student number in the places indicated.
I will assume that submissions without a completed statement are not your own
work, and will give you a mark of 0.
2. One or more small GIF files needed to support your GUI, but no other image files.
3. A folder containing the previously‐downloaded web document/s used for your static
“archive” sales item/s. Again, this folder may contain HTML/XML source code files
only. It must not contain any image files. All images needed for your invoice must be
sourced from online when the invoice is viewed in a web browser.
Example file structure for submission:
Once you have completed your solution and have zipped up these items submit them to
Blackboard as a single file. (NB it must be a ZIP file, i.e. with a “.zip” extension, not any other
archive format.)
Plagiarism
This is an individual assessment item. All files submitted will be subjected to software
plagiarism analysis using the MoSS system (http://theory.stanford.edu/~aiken/moss/).
Serious violations of the university’s policies regarding plagiarism will be forwarded to the
Science and Engineering Faculty’s Academic Misconduct Committee for formal prosecution.
How to submit your solution
A link is available on Blackboard under Assessment for uploading your solution before the
deadline. Note that you will be able to submit as many drafts of your solution as you like. You
are strongly encouraged to submit draft solutions before the deadline as insurance against
computer and network failures. If you are unsure whether or not you have successfully
uploaded your file, upload it again!
Students who encounter problems uploading their Python files to Blackboard should contact
the IT Helpdesk ([email protected]; 3138 4000) for assistance and advice.
Author: Colin Fidge – revised Donna Kingsbury 2020.

Appendix A: Some RSS feeds that may prove helpful
For this assignment you need to find two (or more) categories of regularly‐updated items for
sale, containing a product name, link to an image, and a price. You can choose any web site
that has these features, but to simplify your task you should seek one that has a simple source‐
code format. This appendix suggests some such sites, but you are strongly encouraged to find
your own of personal interest.
The following links point to Rich Site Summary, a.k.a. Really Simple
Syndication, web feed documents. RSS documents are written in XML
and are used for publishing information that is updated frequently in
a format that can be displayed by RSS reader software. Such
documents have a simple standardised format, so we can rely on
them always presenting their contents in the same way, making it
relatively easy to extract specific elements from the document’s
source code via pattern matching.
Another important advantage of RSS feeds for our purposes is that such documents are
specifically intended to serve as sources of online information for RSS readers and other such
software, so they are unlikely to block Python scripts from accessing their contents (see
Appendix B).
However, a disadvantage of using RSS feeds is that they can be hard to find! Often you can
discover them only by looking for the RSS symbol at the bottom of web pages. Many web sites
have associated RSS feeds but don’t advertise them at all. Also, because RSS feeds are not
intended for human consumption, they don’t usually feature prominently in the results of
web searches using standard search engines such as Google, DuckDuckGo, Bing, etc. You will
need to do some exploration online to find suitable feeds for your solution.
Note that you are not limited to using RSS sites for this assignment, but you may find other,
more complex, web documents harder to work with. Most importantly, you are not required
to use any of the sources below for this task. You are strongly encouraged to find online
documents of your own, that contain material of personal interest.
For the example solution above I used two different sites, but there are many others that may
be suitable for this assignment. Some examples of RSS sites (or pages that point to such sites)
containing items for sale include the following. I have not confirmed that these are all
well‐suited to the assignment. You will need to work that out for yourself!
In particular, for the “live” category of products for sale you must choose a site that is updated
regularly. Although RSS feeds typically have a “publication date” associated with each item,
it’s not always obvious if the site is updated regularly merely by checking these dates. Usually
the “publication date” is the time the item was added to the RSS feed, but sometimes it is
merely the time the web document was downloaded. In the latter case it’s possible that the
feed has not been updated for a long time even though its items have the current date on
them.
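One rough way to spot the second situation is to compare every item's publication date with the time you downloaded the feed. The sketch below assumes the feed uses <pubDate> elements in the usual RSS date format. The function names, the five-minute tolerance and the sample fragments are illustrative assumptions, not part of any supplied code.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def publication_dates(rss_text):
    """Parse every <pubDate> in the feed text into a datetime object."""
    raw_dates = re.findall(r'<pubDate>(.*?)</pubDate>', rss_text)
    return [parsedate_to_datetime(raw.strip()) for raw in raw_dates]

def stamped_at_download(rss_text, downloaded_at, tolerance=timedelta(minutes=5)):
    """True if every item's date is within `tolerance` of the download time,
    which suggests the dates reflect the download rather than real updates."""
    dates = publication_dates(rss_text)
    return bool(dates) and all(abs(downloaded_at - d) <= tolerance for d in dates)

# Made-up feed fragments for illustration only
FRESH_LOOKING = ("<item><pubDate>Sun, 24 Jan 2021 10:00:00 +0000</pubDate></item>"
                 "<item><pubDate>Sun, 24 Jan 2021 10:00:01 +0000</pubDate></item>")
GENUINE = ("<item><pubDate>Sun, 24 Jan 2021 09:15:00 +0000</pubDate></item>"
           "<item><pubDate>Fri, 22 Jan 2021 17:40:00 +0000</pubDate></item>")

download_time = datetime(2021, 1, 24, 10, 0, 2, tzinfo=timezone.utc)
print(stamped_at_download(FRESH_LOOKING, download_time))  # all dates match the download
print(stamped_at_download(GENUINE, download_time))        # dates are spread out
```

In a real script the download time would be something like datetime.now(timezone.utc), recorded when the feed is fetched. This check is only a hint. The more reliable test is to download the feed on different days and see whether the items themselves change.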
The following sites are all ones that provide online “department stores” or “marketplaces”
with many individual categories, and are thus likely to be suitable for this assignment.
DealNews, as its name suggests, is a site that advertises “good deals” for products in
a wide range of categories. Its RSS feeds at https://www.dealnews.com/pages/rss.html
include lists of recent, popular and “editor’s choice” deals. It also has lists in specific
categories. For instance,
https://www.dealnews.com/rss/c182/ is a list of “office supplies”. Other lists can be
found by changing the category number in this URL, e.g., c186 is toys and games.
A little‐known feature of eBay is that any search for items can be retrieved as an RSS
feed. The simplest way of doing this is to substitute the search term of interest for X
in the following URL: https://www.ebay.com.au/sch/i.html?&_nkw=X&_rss=1. For
instance, to find items for sale related to TV comedy series “The Goodies”, the RSS
feed is https://www.ebay.com.au/sch/i.html?&_nkw=The+Goodies&_rss=1 and for
items related to science fiction series “Lost in Space”, the RSS feed is
https://www.ebay.com.au/sch/i.html?&_nkw=Lost+In+Space&_rss=1, and so on.
Zazzle is an online “marketplace” with a wide variety of product categories for sale
such as shoes, lamps, gift cards, hats, etc. Its default RSS feed is
https://feed.zazzle.com/rss but specific categories of goods can be selected by
including a query in the URL such as https://feed.zazzle.com/rss?qs=wine+glasses
for wine glasses, https://feed.zazzle.com/rss?qs=Tshirts for T‐shirts, and so on. NB:
If the search term doesn’t match any of Zazzle’s known categories the default page
of products is returned, so you can’t assume the search was successful just because
a list of products comes back. Check carefully that the items listed are really in the
category you’ve chosen!
Etsy is another online marketplace, this time consisting of dozens of separate
handicraft shops. The shops available can be searched at
https://www.etsy.com/shop. Having found the name X of a shop, the equivalent RSS
feed is then constructed as https://www.etsy.com/au/shop/X/rss or
https://www.etsy.com/shop/X/rss. For instance, some Etsy shops are
https://www.etsy.com/au/shop/SapphireDesignStudio/rss,
https://www.etsy.com/shop/biancabeers/rss, and many, many more. The amount of
detail in each feed varies from shop to shop. Also be warned that some shops have
not updated their feeds in a long time and therefore may not be suitable as the
“live” feeds in this assignment.
Oodle is an online classified ad site that creates product listings in a huge number
of categories: https://www.oodle.com/info/feed. For instance, cats can be found at
https://cats.oodle.com/ and cars at https://cars.oodle.com/. Note that although
the Oodle web pages are XML documents they are not RSS feeds per se, so their
structure is more complicated.
Spendfish is a web site that summarises product lists from Amazon. It has RSS feeds
at http://spendfish.com/feeds/ and also has a “feed builder” which allows you to
create your own RSS filters.
Myer online provides a multitude of shopping category options including:
https://www.myer.com.au/c/offers/women-739051-1/footwear-737707-1 for
women’s shoes on sale.
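Several of the feeds described above are obtained by substituting a search term, shop name or category number into a fixed URL pattern. The following sketch builds such URLs using only the patterns quoted in this appendix. The function names are my own, and I have not checked that the sites still honour these patterns, so treat the results as starting points to verify in a browser.

```python
from urllib.parse import quote, quote_plus

def dealnews_feed(category_number):
    """DealNews category feed, e.g. 182 for office supplies."""
    return f"https://www.dealnews.com/rss/c{category_number}/"

def ebay_feed(search_term):
    """eBay search results as RSS; spaces in the term become '+' signs."""
    return "https://www.ebay.com.au/sch/i.html?&_nkw=" + quote_plus(search_term) + "&_rss=1"

def zazzle_feed(search_term):
    """Zazzle category feed selected by a query string."""
    return "https://feed.zazzle.com/rss?qs=" + quote_plus(search_term)

def etsy_feed(shop_name):
    """Etsy feed for a single named shop."""
    return "https://www.etsy.com/shop/" + quote(shop_name, safe="") + "/rss"

print(dealnews_feed(182))
print(ebay_feed("The Goodies"))
print(zazzle_feed("wine glasses"))
print(etsy_feed("biancabeers"))
```

Using quote_plus rather than joining strings by hand means search terms containing spaces or punctuation are still encoded into a valid URL.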
Apart from these aggregated sources, many individual shops have basic RSS feeds, typically
with three categories such as “new”, “popular” and “featured” products. Some such sites are
as follows. Warning: I have not checked all these sites to ensure that they are updated
regularly. Some sites, such as the following, have not been updated in months or even years,
so would be unsuitable for the “live” categories in this assignment.
Zombie Unlimited: http://zombieunlimited.com/rss-syndication/
ShoeEver clothes: https://www.shoeever.com/rss-syndication/
Frankie’s Auto Electrics: https://frankiesautoelectrics.com.au/rss
The Fishing Tackle shop: https://www.fishingtackleshop.com.au/pages/RSS.html
The following similar sites may be suitable for the assignment but I have not checked them all
carefully to see if they are updated regularly.
Executive Accessories: http://www.executiveaccessories.com.au/rss-syndication/
House of Tinks fashion: http://houseoftinks.com/rss-syndication/
Seicane car accessories: https://www.seicane.com/rss
Persian rugs: https://www.persianrugs.com.au/rss-syndication
Huds and Toke pet treats: https://hudsandtoke.com.au/rss-syndication
Wine Factor: http://www.winefactor.com/pages/RSS-Feeds.html
Gibson Acoustic Guitars: https://www.gibson.com/Guitars/Collection/Original%20Acoustic
Vintage Machinery classifieds (but some photo links were broken when I visited):
http://www.vintagemachinery.org/classifieds/rss.aspx
Tech Bargains: https://www.techbargains.com/rss.xml
Unfortunately, not all web sites are friendly to Python programs (see Appendix B). Below is
one such web site, which I thought would be suitable for this assignment but which denies
access to Python scripts, so it is not suitable.
A huge number of classified ads: https://www.classifiedads.com/. At the bottom of
each page is an RSS link, but access is denied to Python scripts, presumably to protect
the publisher’s commercial assets, which seems reasonable enough in this case.

Appendix B: Web sites that block access to Python scripts
As noted above, some web servers will block access to web documents by Python programs
in the belief that they may be malware, or in order to protect the owner’s computing
resources and data assets from abuse. In this situation they usually return a short HTML
document containing an “access denied” message instead of the desired document. This can
be very confusing because you can usually view the document without any problems using a
standard web browser even though your Python program is delivered something different by
the server.
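As a rough illustration of how a script might notice this situation for itself, the sketch below inspects text that has already been downloaded. It guesses whether the text is an RSS feed or a short “access denied” style HTML page. The function names and the list of warning phrases are assumptions of mine; servers word these messages in many different ways, so the check is a heuristic only.

```python
import re

def looks_like_rss(document_text):
    """Heuristic: the text has an <rss> element and at least one <item>."""
    has_root = re.search(r'<rss[\s>]', document_text) is not None
    has_item = re.search(r'<item[\s>]', document_text) is not None
    return has_root and has_item

def looks_blocked(document_text):
    """Heuristic: not a feed, and the text mentions a typical refusal phrase."""
    text = document_text.lower()
    warning_phrases = ["access denied", "forbidden", "captcha"]  # assumed examples
    return (not looks_like_rss(document_text)
            and any(phrase in text for phrase in warning_phrases))

# Made-up documents for illustration only
BLOCKED_PAGE = ("<html><head><title>Access denied</title></head>"
                "<body>The owner of this website has blocked your request.</body></html>")
FEED_PAGE = ('<rss version="2.0"><channel>'
             "<item><title>Blue Widget</title></item></channel></rss>")

print(looks_blocked(BLOCKED_PAGE))
print(looks_blocked(FEED_PAGE))
```

A check like this lets your application show a sensible error message in its interface instead of silently displaying an empty product list.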
If you suspect that your Python program isn’t being allowed to access your chosen web page,
use the web_doc_downloader.py program to check whether or not your Python program is
being sent an access denied message. When viewed in a web browser, such messages typically
look something like the following example. In this case an older version of the blog
www.wayofcats.com had used anti‐malware application Cloudflare to block access to the
blog’s contents by the Python program.
In this situation you are encouraged to choose another source of data. Although it’s possible
to trick some web sites into delivering blocked pages to a Python script by changing the “user
agent” signature sent to the server in the request, I don’t recommend doing so, partly because
this solution is not reliable and partly because it could be considered unethical to deliberately
override the website owner’s wishes.
