SCHOOL OF ARCHITECTURE, COMPUTING &
Cover sheet to be attached to the front of the assignment when submitted
Question paper to be attached to assignment when submitted
All pages to be numbered sequentially
All work has to be presented in a ready to submit state upon arrival at
the ACE Helpdesk. Assignment cover sheets or stationery will NOT be
provided by Helpdesk staff
|Module title||Big Data Analytics|
|Module leader||Amin Karami|
|Assignment tutor||Amin Karami and Fahimeh Jafari|
|Assignment title||Big Data Analytics: Group Coursework|
|Handout date||Week 5 (2nd November 2018)|
|Submission date||Presentation: 21st Dec. 2018 and 11th Jan. 2019; Turnitin submission: 30th December 2018 (midnight)|
assessed by this
|4, 5, 6, 7, 8, 9|
|Yes||Turnitin GradeMark feedback used?||No|
|UEL Plus Grade Book
|No||UEL Plus Grade Book feedback used?|
|Yes||Are submissions / feedback totally electronic?||Yes|
Form of assessment:
Individual work Group work
For group work assessment which requires members to submit both individual
and group work aspects for the assignment, the work should be submitted as:
Consolidated single document Separately by each member
Number of assignment copies required:
1 2 Other
Assignment to be presented in the following format:
Stapled once in the top left-hand corner
Placed in an A4 ring bound folder (not lever arch)
Note: To students submitting work on A3/A2 boards, work has to be
contained in suitable protective case to ensure any damage to work is
CD (to be attached to the work in an envelope or purpose made wallet
adhered to the rear)
USB (to be attached to the work in an envelope or purpose made wallet
adhered to the rear)
Soft copy not required
Note to all students
Assignment cover sheets can be downloaded by logging into UEL Direct and following
this pathway:
UEL Direct → My Record → My Programme → Assessment log dates with Barcoded Frontsheet
All work has to be presented in a ready to submit state upon arrival at the ACE Helpdesk.
Assignment cover sheets or stationery (including staplers) will NOT be provided by
Helpdesk staff. This will mean students will not be able to staple cover sheets at the
CN7022 – Big Data Analytics
Group assignment 2018-19 Academic Year
This coursework must be attempted in groups of 2-3 students. It is divided into two
sections: (1) Big Data analytics on a real case study and (2) a group presentation. All
members of the group must attend on the presentation date. If you do not turn up on the
presentation date, you will fail the module.
The overall coursework mark comes from two main activities:
1- Big Data Analytics (around 3,000 words, with a tolerance of ± 10%) (70%)
2- Presentation (30%)
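As a worked illustration of the figures above, the following minimal sketch computes the
word-count window implied by 3,000 words ± 10% and combines two component marks using
the 70/30 weighting. The function names and the sample marks are illustrative, and the
sketch assumes each component is expressed as a percentage before weighting, which this
brief does not state explicitly.

```python
def word_count_window(target=3000, tolerance=0.10):
    """Return the (minimum, maximum) acceptable word count."""
    margin = round(target * tolerance)
    return target - margin, target + margin


def overall_mark(analytics_pct, presentation_pct):
    """Combine two component percentages using the 70/30 weighting.

    Assumption: both inputs are percentages in the range 0-100.
    """
    return 0.70 * analytics_pct + 0.30 * presentation_pct


if __name__ == "__main__":
    low, high = word_count_window()
    print(f"Acceptable report length: {low} to {high} words")
    # The marks 60 and 80 are made-up sample values.
    print(f"Example overall mark: {overall_mark(60, 80):.1f}")
```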
(breakdown of marks for each sub-task)
|30||(20)||Provide big data query and analysis by Apache Hive.|
|(10)||Visualize the outcomes of queries into the graphical representations to get big insights.|
|50||(30)||Design and build advanced analytics over the big data for converting raw data to knowledge.|
|(10)||Visualize the outcomes into the graphical representations.|
|(10)||Evaluate the accuracy of the models.|
|(1) Express new understanding and knowledge of the topic, (2) Find alternative solutions for high level query languages and analytics approaches, (3) Express findings from big data analytics with relevant
|Documentation||10||(10)||Write down a scientific report.|
Big Data Analytics using Hadoop and Spark
CN7022 – Big Data Analytics (70%)
(1) Understanding Dataset: UNSW-NB15
The raw network packets of the UNSW-NB15 dataset were created by the IXIA PerfectStorm
tool in the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS) to
generate a hybrid of real modern normal activities and synthetic contemporary attack
behaviours. The Tcpdump tool was used to capture 100 GB of raw traffic (e.g., Pcap
files). The dataset has nine types of attacks, namely Fuzzers, Analysis, Backdoors, DoS,
Exploits, Generic, Reconnaissance, Shellcode and Worms. The Argus and Bro-IDS tools were
used and twelve algorithms were developed to generate a total of 49 features with the
class label.
a) The features are described here.
b) The number of records per traffic type is described here.
c) In this coursework, we use all 2,540,044 records, which are stored in a CSV file
(download). The total size is 560 MB, which is big enough to warrant big data
methodologies for analysis. As big data specialists, we first want to read and
understand the features, then apply modelling techniques. To see a few records of
this dataset, you can import it into Hadoop HDFS and then write a Hive query that
prints the first 5-10 records.
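Before the file is loaded into HDFS, a quick local look at the first few records can also
be taken with the Python standard library. The sketch below is a stand-in for the Hive
query described above, not a replacement for it. The function name is illustrative, and
the sketch assumes a local copy of the CSV that decodes as UTF-8; the path is supplied on
the command line because the brief does not give a file name.

```python
import csv
import sys
from itertools import islice


def head_records(csv_path, n=5):
    """Return the first n rows of a CSV file as lists of strings."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # islice stops after n rows, so the whole file is never read.
        return list(islice(csv.reader(handle), n))


if __name__ == "__main__":
    # Usage: python peek.py <path-to-local-csv>
    if len(sys.argv) > 1:
        for row in head_records(sys.argv[1], n=5):
            print(row)
```

Because only the first rows are read, this stays fast even on the full 560 MB file.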
(2) Big Data Query & Analysis by Apache Hive (30 marks)
This task uses Apache Hive to convert big raw data into useful information for end
users. First, understand the dataset carefully. Then write at least four Hive queries
that extract information from this big dataset. Apply appropriate visualization tools to
present your findings numerically and graphically, and briefly interpret your findings.
Finally, include screenshots of your scripts/code in the report.
Tip: the mark for this section depends on the complexity of the Hive queries; for
instance, simple SELECT queries alone will not earn full marks.
(3) Advanced Analytics using PySpark (50 marks)
In this section, you will conduct advanced analytics using PySpark.
3.1. Analyze and Interpret Big Data (20 marks)
a) We need to learn and understand the data through 3-4 descriptive analysis methods.
You need to present your work numerically and graphically. Apply tooltip text, legend,
title, X-Y labels etc. as appropriate to help end users gain insights. [10 marks]
b) Apply 3-4 advanced statistical analysis methods (e.g., correlation, hypothesis testing,
density estimation and so on) to interpret the data precisely. Report your methods and
their configurations, and interpret your findings. [10 marks]
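To make one of the statistical methods named in (b) concrete, the following minimal
sketch computes the Pearson correlation coefficient between two numeric columns in plain
Python. It only shows the arithmetic; the coursework itself asks for PySpark. The
function name and the toy values are illustrative and are not taken from UNSW-NB15.

```python
from math import sqrt


def pearson_correlation(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values standing in for two numeric features.
    feature_a = [1.0, 2.0, 3.0, 4.0, 5.0]
    feature_b = [2.1, 3.9, 6.2, 8.1, 9.8]
    print(f"r = {pearson_correlation(feature_a, feature_b):.3f}")
```

A coefficient near +1 or -1 indicates a strong linear relationship, while a value near 0
indicates little linear relationship; it says nothing about non-linear dependence.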
3.2. Design and Build a Classifier (30 marks)
a) Design and build a binary classifier over the dataset. Explain your algorithm and its
configuration. Present your findings in both numerical and graphical form. [5 marks]
b) How do you evaluate the performance of the model? [5 marks]
c) How do you verify the accuracy and the effectiveness of your model? [5 marks]
d) Apply a multi-class classifier to classify the data into ten classes: one normal and
nine attack classes (Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic,
Reconnaissance, Shellcode and Worms). Briefly explain your model with supporting
statements on its parameters, accuracy and effectiveness. [15 marks]
Tip: For this task 3.2, you can get help from:
https://goo.gl/GSJb8s (Book: Learning Apache Spark with Python, 2018)
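Questions (b) and (c) of task 3.2 concern evaluation. As one possible starting point, the
sketch below derives accuracy, precision, recall and F1 from the confusion counts of a
binary classifier in plain Python. The function names and the sample labels are
illustrative, and labelling the attack class as 1 is an assumption. These are common
metrics, not the only acceptable answer to those questions.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for the given positive label."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        elif actual == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    # Each ratio falls back to 0.0 when its denominator is zero.
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Made-up labels: 1 = attack, 0 = normal (an assumed encoding).
    actual = [1, 1, 0, 0, 1, 0]
    predicted = [1, 0, 0, 1, 1, 0]
    for name, value in binary_metrics(actual, predicted).items():
        print(f"{name}: {value:.3f}")
```

On an imbalanced dataset, accuracy alone can be misleading, which is why precision and
recall are reported alongside it here.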
(4) Individual Assessment (10 marks)
Discuss (1) what you learned from this coursework, (2) what alternative technologies
are available for tasks 2 and 3 and how they differ (use academic references), and (3)
what surprising new thinking was evoked and/or neglected at your end.
Tip: add the individual assessment of each member to the same report.
(5) Documentation (10 marks)
Document all your work. Your final report must follow the five sections detailed in the
“format of final submission” section (refer to next page). Your work must demonstrate
appropriate understanding of academic writing and integrity.
FORMAT OF FINAL SUBMISSION
You need to prepare one single file in PDF format as your group coursework within the
1. Cover Letter
2. Table of Contents
3. Report of above-mentioned tasks 1-4 (it needs sub-sections of each task, accordingly)
4. Teamwork minutes (including minutes of meetings, task allocation, etc.)
5. References (if any)
Please upload ONLY one single PDF per group into Turnitin in Moodle. One member of
each group must submit the work, NOT all members. The submission link will be available
from week 10, and you are free to amend your submitted file several times before the
submission deadline. Your last submission will be saved in the Moodle database for marking.
The University defines an assessment offence as any action(s) or behaviour likely to confer
an unfair advantage in assessment, whether by advantaging the alleged offender or
disadvantaging (deliberately or unconsciously) another or others. A number of examples are
set out in the Regulations and these include:
“D.5.7.1 (e) the submission of material (written, visual or oral), originally produced by another
person or persons, without due acknowledgement, so that the work could be assumed the
student’s own. For the purposes of these Regulations, this includes incorporation of
significant extracts or elements taken from the work of (an) other(s), without
acknowledgement or reference, and the submission of work produced in collaboration for an
assignment based on the assessment of individual work. (Such offences are typically
described as plagiarism and collusion.)”. The University’s Assessment Offences Regulations
can be found on our web site. Also, information about plagiarism can be found on the
FEEDBACK TO STUDENTS
Feedback is central to learning and is provided to students to develop their knowledge,
understanding, skills and to help promote learning and facilitate improvement.
Feedback will be provided as soon as possible after the student has completed
the assessment task.
Feedback will be in relation to the learning outcomes and assessment criteria.
It will be offered via Turnitin GradeMark or Moodle post.
As the feedback (including marks) is provided before Award & Field Board, marks are:
available for External Examiner scrutiny
subject to change and approval by the Assessment Board
ASSESSMENT FORM FOR PRESENTATION
CN7022 – Big Data Analytics (30%)
|Students have to fill this section correctly. Assessors will not be liable for any mistakes.
Group No: ……………….
Group Members (Student IDs): ……………………………………………………………………………
All students agree to equal distribution of marks? : Yes / No
If NO, state percentage for each.
Assessors are responsible for filling in the rest of the form as follows.
|All Group Members|
|All students agreed distribution of marks|
|Clear, concise and all the group members played an active part||5 marks|
|Able to demonstrate the Big Data stack and its analysis||5 marks|
|Present a working solution of the application (scripting, programming and analysis)
|Ability to answer questions||10 marks|
Overall Mark: ………….………
Date & Time: ………………….
Assessors’ signature and comments: