Session 3
Data Driven Decision Making – DDDM
Agenda for this week and next
Recap session two
Virtual Environments
NumPy and Pandas
Module Assessment
Reminder: see Blackboard for assessment details
To start with…
Recap on session two
Learning Outcomes
On completion of this workshop you will be able to:
Install a Python programming environment
Write and execute Python scripts
Access and query data sets using Python scripts
Reflect on the applicability of these topics to different organisations
Setting up a Virtual Environment (venv)
At its core, the main purpose of Python virtual environments is to create an
isolated environment for Python projects.
This means that each project can have its own dependencies, regardless of what
dependencies every other project has.
In this session, we will demonstrate how to set up multiple virtual environments
so that you can develop more than one Data Driven Decision-Making app
at the same time.
To get started, if you’re not using Python 3, you’ll want to install the virtualenv tool with pip:
$ pip install virtualenv
Then create a new virtual environment inside your project directory:
# Python 2:
$ virtualenv mynewenv
# Python 3
$ python3 -m venv mynewenv
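The same kind of environment can also be created from Python itself with the standard-library venv module. This is a minimal sketch: the environment name demo-env and the helper make_env are hypothetical, and pip installation is skipped (with_pip=False) so the example runs without network access.

```python
import tempfile
import venv
from pathlib import Path


def make_env(parent: Path, name: str = "demo-env") -> Path:
    """Create a virtual environment called `name` under `parent` (hypothetical helper)."""
    env_dir = parent / name
    # with_pip=False keeps the example offline; the command-line
    # `python3 -m venv` installs pip by default.
    venv.create(env_dir, with_pip=False)
    return env_dir


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        env = make_env(Path(tmp))
        # Every venv records its base interpreter in pyvenv.cfg
        print((env / "pyvenv.cfg").read_text())
```

Using a temporary directory keeps the demonstration from leaving an environment behind; in a real project you would pass your project folder instead.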
Using Virtual Environments
The Python 3 venv approach has the benefit of forcing you to choose a specific version of the Python 3
interpreter that should be used to create the virtual environment. This avoids any confusion as to
which Python installation the new environment is based on.
In Python 3.3 and 3.4, the recommended way to create a virtual environment was the pyvenv
command-line tool, which also came included with your Python 3 installation by default. From Python 3.5
onwards, python3 -m venv is the way to go; pyvenv was deprecated in 3.6 and removed in 3.8.
Approaches to Virtual Environments
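In this example, the command creates a directory called mynewenv, which contains a bin/ directory
(Scripts\ on Windows) holding the python executable and the activate scripts, plus include/, lib/
(where installed packages go, under site-packages) and a pyvenv.cfg configuration file.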
What exactly does it mean to “activate” an environment? Knowing what’s going on
under the hood can be pretty important for a developer, especially when you need
to understand execution environments, dependency resolution, and so on.
To explain how this works, let’s first check out the locations of the different python
executables. With the environment “deactivated,” run the following:
$ which python
Now, activate it and run the command again:
$ source mynewenv/bin/activate
(mynewenv) $ which python
How does a Virtual Environment work?
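Activating an environment mainly prepends its bin directory (Scripts on Windows) to PATH and sets the VIRTUAL_ENV variable, which is why `which python` gives a different answer. From inside Python, a common check is to compare sys.prefix with sys.base_prefix. The sketch below assumes a venv-created environment; the helper name in_virtualenv is hypothetical.

```python
import os
import shutil
import sys


def in_virtualenv() -> bool:
    """Return True when this interpreter runs from a venv (hypothetical helper)."""
    # In a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Running inside a venv:", in_virtualenv())
    print("Interpreter:", sys.executable)
    # VIRTUAL_ENV is set by the activate script, not by Python itself
    print("VIRTUAL_ENV:", os.environ.get("VIRTUAL_ENV"))
    # Rough equivalent of `which python`: the first match on PATH
    print("python on PATH:", shutil.which("python"))
```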
Python Language Constructs
There are some core characteristics of Python, many of which arise from the fact that it
is an ‘object-oriented’ language.
This means that every item, even a number or a string, is an object: it has both a data element
(its value and attributes) and a functional element (methods that do something with that data).
This is a very powerful property for a language to have.
In this session we will consider the characteristics of the language and the
implications that follow from it being an object-oriented language.
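A quick way to see this in practice: the sketch below asks several built-in values for their type and calls methods directly on them. The sample values are arbitrary illustrations.

```python
# Every value in Python is an object: it has a type (its class),
# some data, and methods that act on that data.
values = [42, 3.14, "hello", [3, 1, 2], len]

for v in values:
    # type() reports the class the object was created from
    print(type(v).__name__, isinstance(v, object))

# Calling methods directly on built-in values
print((42).bit_length())   # number of bits needed to represent 42
print("hello".upper())     # HELLO
nums = [3, 1, 2]
nums.sort()                # list methods change the list in place
print(nums)                # [1, 2, 3]
```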
File I/O
with open("foo", "r") as f:
    line = f.readline()
print(line, end="")
# Can use sys.stdin as input (after import sys);
# Can use sys.stdout as output.

Files: Input

infile = open('data', 'r')     Open the file for input
S = infile.read()              Read the whole file into one string
S = infile.read(N)             Read up to N characters (N >= 1; bytes in binary mode)
L = infile.readlines()         Return a list of lines

Files: Output

output = open('data', 'w')     Open the file for writing
output.write(S)                Write the string S to the file
output.writelines(L)           Write each of the strings in list L to the file
output.close()                 Close the file manually
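The calls listed above can be combined into a short round trip. This sketch writes to a file named data (as in the slides) inside a temporary directory, and uses with blocks, which close the file automatically instead of relying on a manual close(). The helper round_trip is hypothetical.

```python
import tempfile
from pathlib import Path


def round_trip(folder: Path):
    """Write a small file, then read it back three ways (hypothetical helper)."""
    path = folder / "data"
    # Writing: "w" creates the file (or truncates an existing one)
    with open(path, "w") as output:
        output.write("first line\n")
        output.writelines(["second line\n", "third line\n"])

    # Read the whole file into one string
    with open(path, "r") as infile:
        whole = infile.read()

    # Read only the first 5 characters (text mode counts characters)
    with open(path, "r") as infile:
        first_five = infile.read(5)

    # Read every line into a list (each line keeps its "\n")
    with open(path, "r") as infile:
        lines = infile.readlines()

    return whole, first_five, lines


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        whole, first_five, lines = round_trip(Path(tmp))
        print(repr(whole))
        print(first_five)   # first
        print(lines)
```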

open() and file()
In Python 2 these were identical:
f = open(filename, "r")
f = file(filename, "r")
file() used object constructor syntax (next lecture)
Even in Python 2, open() was the recommended way to open a file
file() was removed in Python 3, so open() is the way to open a file now
OOP Terminology
class — a template for building objects
instance — an object created from the template (an
instance of the class)
method — a function that is part of the object and
acts on instances directly
constructor — special “method” that creates new instances

What is an object?
A combination of:
a data structure, and
functions (methods) that operate on it
class Thingy:
    pass  # definition of the class here, next slide
t = Thingy()
print(t.value)  # works once the class defines a value field
Built-in data structures (lists, dictionaries) are also objects,
though their internal representation is different
Defining a class
class Thingy:
    """This class stores an arbitrary object."""

    def __init__(self, value):
        """Initialize a Thingy."""
        self.value = value

    def showme(self):
        """Print this object to stdout."""
        print("value = %s" % self.value)
Using a class (1)
t = Thingy(10)  # calls __init__ method
t.showme()      # prints "value = 10"
t is an instance of class Thingy
showme is a method of class Thingy
__init__ is the constructor method of class Thingy
when a Thingy is created, the __init__ method is called
Methods starting and ending with __ are “special” methods
Using a class (2)
print(t.value)  # prints "10"
value is a field (instance attribute) of class Thingy
t.value = 20    # change the field value
print(t.value)  # prints "20"

“Special” methods
All start and end with __ (two underscores)
Most are used to emulate functionality of built-in
types in user-defined classes
e.g. operator overloading
__add__, __sub__, __mul__, …
see the "Data model" chapter of the Python language reference for more information
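As an illustration of operator overloading, the hypothetical Money class below defines __add__ and __mul__ so that + and * work on its instances, __eq__ so that == compares amounts, and __repr__ so that print shows something readable.

```python
class Money:
    """Hypothetical example class: an amount in a single currency."""

    def __init__(self, amount):
        self.amount = amount

    def __add__(self, other):
        # Called for `self + other`
        return Money(self.amount + other.amount)

    def __mul__(self, factor):
        # Called for `self * factor`
        return Money(self.amount * factor)

    def __eq__(self, other):
        # Called for `self == other`
        return isinstance(other, Money) and self.amount == other.amount

    def __repr__(self):
        # Used by the interactive prompt, and by print() when no __str__ exists
        return f"Money({self.amount})"


total = Money(10) + Money(5)
print(total)          # Money(15)
print(total * 2)      # Money(30)
```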
Control flow (1)
if, if/else, if/elif/else
if a == 0:
    print("zero!")
elif a < 0:
    print("negative!")
else:
    print("positive!")
blocks delimited by indentation!
colon (:) used at end of lines containing control flow keywords
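Wrapping the if/elif/else chain in a function makes it easy to try with different values of a. The function name describe is hypothetical.

```python
def describe(a):
    """Classify a number as zero, negative or positive (hypothetical helper)."""
    if a == 0:
        return "zero!"
    elif a < 0:
        return "negative!"
    else:
        return "positive!"


for a in (0, -3, 7):
    print(a, describe(a))
```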
Control flow (3)
while loops
a = 10
while a > 0:
    print(a)
    a -= 1

Control flow (4)
for loops
for a in range(10):
    print(a)
really a “foreach” loop
Control flow (5)
Common for loop idiom:
a = [3, 1, 4, 1, 5, 9]
for i in range(len(a)):
    print(a[i])
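The range(len(a)) idiom works, but when both the position and the value are needed, the built-in enumerate() is usually considered more idiomatic. This sketch shows that both loops produce the same (index, value) pairs.

```python
a = [3, 1, 4, 1, 5, 9]

# Index-based loop, as in the idiom above
by_index = []
for i in range(len(a)):
    by_index.append((i, a[i]))

# enumerate() yields (index, value) pairs directly
by_enumerate = []
for i, value in enumerate(a):
    by_enumerate.append((i, value))

print(by_index == by_enumerate)  # True
```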

Control flow (6)
Common while loop idiom:
f = open(filename, "r")
while True:
    line = f.readline()
    if not line:
        break  # readline() returns "" at end of file
    # do something with line
f.close()
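File objects can also be iterated directly, which gives the same line-by-line behaviour without the explicit readline() and break. In this sketch the helper count_nonblank_lines and the file example.txt are hypothetical, and a with block closes the file automatically.

```python
import tempfile
from pathlib import Path


def count_nonblank_lines(filename):
    """Count lines containing something other than whitespace (hypothetical helper)."""
    count = 0
    with open(filename, "r") as f:
        for line in f:          # iteration stops automatically at end of file
            if line.strip():
                count += 1
    return count


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "example.txt"
        path.write_text("alpha\n\nbeta\n   \ngamma\n")
        print(count_nonblank_lines(path))  # 3
```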
Control flow (7): odds & ends
continue statement, like in C (skips to the next loop iteration)
pass keyword:
if a == 0:
    pass  # do nothing
else:
    print("non-zero")  # whatever
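A short loop shows the difference between the two: continue skips the rest of the current iteration, while pass is only a placeholder that does nothing. The input list is an arbitrary example.

```python
kept = []
for a in [3, 0, -2, 0, 5]:
    if a == 0:
        pass            # placeholder: no action, execution carries on below
    if a < 0:
        continue        # skip negative numbers entirely
    kept.append(a)

print(kept)  # [3, 0, 0, 5]
```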

Defining functions
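def foo(x):
    y = 10 * x + 2
    return y
Variables assigned inside a function are local unless
declared global
Arguments are passed by object reference: a function can mutate
a mutable argument (e.g. a list), but rebinding the parameter does not affect the caller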
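The sketch below illustrates both points with hypothetical names: counter can only be changed inside increment() after a global declaration, and a list passed to append_item() is changed for the caller because the parameter refers to the same object, whereas rebind() only reassigns its local name.

```python
counter = 0


def increment():
    global counter      # without this line, `counter += 1` raises UnboundLocalError
    counter += 1


def append_item(items):
    items.append("new")  # mutates the caller's list object


def rebind(items):
    items = ["replaced"]  # only rebinds the local name; the caller is unaffected
    return items


increment()
data = ["old"]
append_item(data)
rebind(data)
print(counter)  # 1
print(data)     # ['old', 'new']
```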
Executing functions
def foo(x):
    y = 10 * x + 2
    return y
foo(10)  # 102
Pandas and NumPy
NumPy
The "granddad" of all the other important data science libraries
The fundamental library for scientific computing in Python
Libraries such as pandas, Matplotlib and scikit-learn are built on top of it,
and TensorFlow and PyTorch interoperate closely with NumPy arrays.
© 2021 ULaw and ULBS 33
It is a multidimensional array library
Manipulating vectors and matrices (represented as NumPy arrays) is much
easier with NumPy.
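A minimal NumPy sketch, assuming NumPy is installed (for example with pip install numpy inside the virtual environment): arithmetic on whole arrays works element by element, and the @ operator performs matrix multiplication. The vector and matrix values are arbitrary.

```python
import numpy as np

v = np.array([1, 2, 3])             # a vector (1-D array)
M = np.array([[1, 0, 0],
              [0, 2, 0],
              [0, 0, 3]])           # a 3x3 matrix (2-D array)

print(v * 2)        # element-wise: [2 4 6]
print(v + v)        # element-wise: [2 4 6]
print(M @ v)        # matrix-vector product: [1 4 9]
print(M.shape)      # (3, 3)
print(v.mean())     # 2.0
```

With plain Python lists, each of these operations would need an explicit loop; NumPy runs them as vectorised operations over the whole array.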
Pandas
One of the most popular libraries built on top of NumPy
Some people consider it the most important tool for data analysts
Used for data pre-processing, data cleanup, exploratory data analysis and feature engineering
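A small pandas sketch of the cleanup and exploration steps listed above, assuming pandas is installed; the table of sales by region is made up purely for illustration.

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "sales": [120.0, 80.0, None, 100.0, 50.0],
})

# Data cleanup: drop rows where sales is missing
clean = df.dropna(subset=["sales"])

# Exploratory analysis: summary statistics and a grouped mean
print(clean["sales"].describe())
by_region = clean.groupby("region")["sales"].mean()
print(by_region)
```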
Food for thought…
Thank you