Data Driven Decision Making – Global Homework Experts

Session 3
Data Driven Decision Making – DDDM
Agenda for this/next week
Recap session two
Python
Virtual Environments
NumPy and Pandas
Module Assessment
Reminder: see Blackboard for assessment details
To start with…
Recap on session two
Learning Outcomes
On completion of this workshop you will be able to:
Install a Python programming environment
Write and execute Python scripts
Access and query data sets using Python scripts
Reflect on the applicability of these topics to different organisations
Setting up a Virtual Environment (venv)
At its core, the main purpose of Python virtual environments is to create an
isolated environment for Python projects.
This means that each project can have its own dependencies, regardless of what
dependencies every other project has.
In this session, we will demonstrate how to set up multiple virtual environments,
so that you can have more than one Data Driven Decision-Making App in
development all at once.
Synopsis
To get started, if you’re not using Python 3, you’ll want to install the virtualenv tool with pip:
$ pip install virtualenv
Create a new virtual environment inside the directory:
# Python 2:
$ virtualenv mynewenv
# Python 3
$ python3 -m venv mynewenv
Using Virtual Environments
The Python 3 venv approach has the benefit of forcing you to choose a specific version of the Python 3
interpreter that should be used to create the virtual environment. This avoids any confusion as to
which Python installation the new environment is based on.
In Python 3.3 and 3.4, the recommended way to create a virtual environment was the pyvenv
command-line tool that ships with a Python 3 installation. pyvenv was deprecated in Python 3.6, so on
3.6 and above, python3 -m venv is the way to go.
Approaches to Virtual Environments
In this example, this command creates a directory called mynewenv, which contains a directory structure
similar to this:
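The layout of a new environment can also be inspected from Python itself. The sketch below is a minimal illustration, assuming the standard-library venv module is available; the helper name create_and_list is hypothetical, and the environment is built in a temporary directory so nothing is left behind. Exact entries differ by platform (for example bin on Linux and macOS, Scripts on Windows), but a pyvenv.cfg file should be present in each case.

```python
import tempfile
import venv
from pathlib import Path


def create_and_list(env_name="mynewenv"):
    """Create a throwaway virtual environment and return its top-level entries.

    `create_and_list` is a hypothetical helper used only for illustration.
    """
    with tempfile.TemporaryDirectory() as workdir:
        env_dir = Path(workdir) / env_name
        # with_pip=False keeps creation fast and avoids needing ensurepip.
        venv.create(env_dir, with_pip=False)
        return sorted(entry.name for entry in env_dir.iterdir())


entries = create_and_list()
print(entries)
```

The pyvenv.cfg file records which base Python installation the environment was created from.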
Example
What exactly does it mean to “activate” an environment? Knowing what’s going on
under the hood can be pretty important for a developer, especially when you need
to understand execution environments, dependency resolution, and so on.
To explain how this works, let’s first check out the locations of the different python
executables. With the environment “deactivated,” run the following:
$ which python
/usr/bin/python
Now, activate it and run the command again:
$ source mynewenv/bin/activate
(mynewenv) $ which python
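Activation works by putting the environment's bin directory at the front of PATH, so the shell finds the environment's python first. The same check can be made from inside Python. This is a minimal sketch; the function name in_virtualenv is hypothetical, and it relies on sys.prefix and sys.base_prefix differing only when the interpreter runs inside an environment created with venv.

```python
import shutil
import sys


def in_virtualenv():
    """Return True when the running interpreter belongs to a venv.

    Inside a venv, sys.prefix points at the environment directory while
    sys.base_prefix still points at the installation it was created from.
    """
    return sys.prefix != sys.base_prefix


print("running interpreter:   ", sys.executable)
print("first python on PATH:  ", shutil.which("python"))
print("inside a virtual env:  ", in_virtualenv())
```

Running this script before and after activating mynewenv should show the interpreter path and the final flag changing.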
How does a Virtual Environment work?
Python Language Constructs
There are some core characteristics of Python, largely arising from the fact that it
is an ‘object-oriented’ language.
This means that any item has both a data element and a functional element (it
does something).
This is actually very powerful in a language.
In this session we will consider the characteristics of the language and the
implications that result from the fact that it is an object-oriented language.
Synopsis
File I/O
f = open("foo", "r")
line = f.readline()
print(line, end="")
f.close()
# Can use sys.stdin as input;
# Can use sys.stdout as output.

Files: Input

input = open("data", "r")   # Open the file for input
S = input.read()            # Read the whole file into one string
S = input.read(N)           # Read up to N characters (N >= 1)
L = input.readlines()       # Return a list of line strings
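The three read calls can be tried on a small sample file. The sketch below is illustrative only: it writes its own temporary file (the contents are made up) and uses the with statement, which closes the file automatically, rather than a manual close.

```python
import os
import tempfile

# Create a small sample file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("alpha\nbeta\ngamma\n")
    path = tmp.name

with open(path, "r") as infile:
    whole = infile.read()          # the whole file as one string

with open(path, "r") as infile:
    first_three = infile.read(3)   # at most three characters

with open(path, "r") as infile:
    lines = infile.readlines()     # list of strings, newlines kept

os.remove(path)

print(repr(whole))
print(repr(first_three))
print(lines)
```

Note that readlines() keeps the trailing newline on each line, so strip() is often applied before further processing.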

Files: Output

output = open("data", "w")  # Open the file for writing
output.write(S)             # Write the string S to the file
output.writelines(L)        # Write each of the strings in list L to the file
output.close()              # Manual close
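A short round trip shows what write and writelines actually put in the file. This is a minimal sketch that writes to a temporary directory; the sample strings are invented for illustration. The with statement replaces the manual close.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "data")

S = "header\n"
L = ["row 1\n", "row 2\n"]

with open(path, "w") as outfile:
    outfile.write(S)        # writes one string
    outfile.writelines(L)   # writes each string in the list, adding nothing

# Read the file back to confirm what was written.
with open(path, "r") as infile:
    contents = infile.read()

print(contents, end="")
```

writelines does not add line endings, so each string in the list needs its own newline.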

open() and file()
In Python 2 these were identical:
f = open(filename, "r")
f = file(filename, "r")
The file() built-in was removed in Python 3
open() is the recommended way to open a file, and in Python 3 the only one
file() used object constructor syntax (next lecture)
OOP Terminology
class — a template for building objects
instance — an object created from the template (an
instance of the class)
method — a function that is part of the object and
acts on instances directly
constructor — special “method” that creates new
instances

Objects
Objects:
What is an object?
data structure, and
functions (methods) that operate on it
class thingy:
    ...  # Definition of the class here, next slide
t = thingy()
t.method()
print(t.field)
Built-in data structures (lists, dictionaries) are also objects
though internal representation is different
Defining a class
class Thingy:
    """This class stores an arbitrary object."""

    def __init__(self, value):  # constructor
        """Initialize a Thingy."""
        self.value = value

    def showme(self):  # method
        """Print this object to stdout."""
        print("value = %s" % self.value)
Using a class (1)
t = Thingy(10) # calls __init__ method
t.showme() # prints “value = 10”
t is an instance of class Thingy
showme is a method of class Thingy
__init__ is the constructor method of class Thingy
when a Thingy is created, the __init__ method is called
Methods starting and ending with __ are “special” methods
Using a class (2)
print(t.value)  # prints "10"
value is a field of class Thingy
t.value = 20  # change the field value
print(t.value)  # prints "20"
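Put together, the class definition and its use can be run as one Python 3 script. This sketch follows the Thingy class from the slides; the second instance, other, is added here only to show that each instance keeps its own field value.

```python
class Thingy:
    """This class stores an arbitrary object."""

    def __init__(self, value):
        """Initialize a Thingy."""
        self.value = value

    def showme(self):
        """Print this object to stdout."""
        print("value = %s" % self.value)


t = Thingy(10)            # calls __init__
t.showme()
t.value = 20              # change the field on this instance only
t.showme()

other = Thingy("hello")   # a second, independent instance
other.showme()
```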

“Special” methods
All start and end with __ (two underscores)
Most are used to emulate functionality of built-in
types in user-defined classes
e.g. operator overloading
__add__, __sub__, __mul__, …
see python docs for more information
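As an illustration of operator overloading, the sketch below defines a small two-dimensional vector. The class name Vector2D and its fields are hypothetical and chosen only for this example; the special method names are the standard ones Python looks up for +, -, *, == and repr().

```python
class Vector2D:
    """A tiny 2-D vector used to demonstrate special methods."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __add__(self, other):      # called for a + b
        return Vector2D(self.x + other.x, self.y + other.y)

    def __sub__(self, other):      # called for a - b
        return Vector2D(self.x - other.x, self.y - other.y)

    def __mul__(self, scalar):     # called for a * 3
        return Vector2D(self.x * scalar, self.y * scalar)

    def __eq__(self, other):       # called for a == b
        return (self.x, self.y) == (other.x, other.y)

    def __repr__(self):            # called by repr() and print()
        return "Vector2D(%r, %r)" % (self.x, self.y)


a = Vector2D(1, 2)
b = Vector2D(3, 4)
print(a + b)
print(b - a)
print(a * 3)
```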
Control flow (1)
if, if/else, if/elif/else
if a == 0:
    print("zero!")
elif a < 0:
    print("negative!")
else:
    print("positive!")
Notes:
blocks delimited by indentation!
colon (:) used at end of lines containing control flow keywords
Control flow (3)
while loops
a = 10
while a > 0:
    print(a)
    a -= 1

Control flow (4)
for loops
for a in range(10):
    print(a)
really a “foreach” loop
Control flow (5)
Common for loop idiom:
a = [3, 1, 4, 1, 5, 9]
for i in range(len(a)):
    print(a[i])
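Because the for loop is really a "foreach" loop, the index is often unnecessary. The sketch below compares the index-based idiom with direct iteration and with the built-in enumerate, which supplies the index when it is actually needed. The variable names are illustrative.

```python
a = [3, 1, 4, 1, 5, 9]

# Index-based loop, as in the idiom above.
by_index = []
for i in range(len(a)):
    by_index.append(a[i])

# Direct iteration: the loop variable is the item itself.
direct = []
for item in a:
    direct.append(item)

# enumerate yields (index, item) pairs when both are needed.
pairs = []
for i, item in enumerate(a):
    pairs.append((i, item))

print(by_index)
print(direct)
print(pairs)
```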

Control flow (6)
Common while loop idiom:
f = open(filename, "r")
while True:
    line = f.readline()
    if not line:
        break
    # do something with line
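In Python 3 the same loop is usually written by iterating over the file object directly. The sketch below runs both versions on a temporary sample file and counts the lines; the function names are hypothetical and the file contents are made up for the example.

```python
import os
import tempfile


def count_lines_readline(path):
    """Count lines with the while/readline idiom."""
    count = 0
    f = open(path, "r")
    while True:
        line = f.readline()
        if not line:          # empty string means end of file
            break
        count += 1
    f.close()
    return count


def count_lines_iter(path):
    """Count lines by iterating over the file object."""
    with open(path, "r") as f:
        return sum(1 for _ in f)


with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("one\ntwo\nthree\n")
    sample_path = tmp.name

n_readline = count_lines_readline(sample_path)
n_iter = count_lines_iter(sample_path)
os.remove(sample_path)

print(n_readline, n_iter)
```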
Control flow (7): odds & ends
continue statement like in C
pass keyword:
if a == 0:
    pass  # do nothing
else:
    print("a is not zero")  # whatever

Defining functions
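def foo(x):
    y = 10 * x + 2
    return y
Variables assigned inside a function are local unless declared global
Arguments are passed by object reference: rebinding a parameter does not
affect the caller, but mutating a mutable argument (such as a list) does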
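The scoping and argument-passing rules are easiest to see in a small experiment. In this sketch all function and variable names are hypothetical: one function uses the global keyword, one rebinds its parameter, and one mutates the list it was given.

```python
counter = 0


def increment():
    """Modify a module-level variable; requires the global keyword."""
    global counter
    counter += 1


def rebind(items):
    """Rebinding the parameter only changes the local name."""
    items = [99]
    return items


def mutate(items):
    """Mutating the object is visible to the caller."""
    items.append(99)


data = [1, 2, 3]

rebind(data)
after_rebind = list(data)   # copy of the list after rebind()

mutate(data)
after_mutate = list(data)   # copy of the list after mutate()

increment()

print(after_rebind, after_mutate, counter)
```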
Executing functions
def foo(x):
    y = 10 * x + 2
    return y
print(foo(10))  # 102
Pandas and NumPy
NumPy
Granddad of all other important data science libraries
Fundamental library for scientific computing in Python
Libraries like Pandas, Matplotlib and scikit-learn are built on top of it;
TensorFlow and PyTorch interoperate closely with NumPy arrays.
© 2021 ULaw and ULBS 33
NumPy
It is a multidimensional array library
Manipulating vectors and matrices (both represented as arrays in NumPy) is much
easier with NumPy.
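A few lines are enough to show why. The sketch below assumes NumPy is installed in the active environment (for example via pip install numpy); the array values are arbitrary.

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0])
w = np.array([4.0, 5.0, 6.0])
m = np.array([[1.0, 2.0],
              [3.0, 4.0]])

total = v + w      # element-wise addition, no explicit loop
scaled = 2 * v     # scalar multiplication
dot = v @ w        # dot product of two vectors
prod = m @ m       # matrix multiplication
transposed = m.T   # matrix transpose

print(total)
print(scaled)
print(dot)
print(prod)
print(transposed)
```

With plain Python lists, each of these operations would need an explicit loop or a list comprehension.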
Pandas
One of the popular libraries that is built on top of NumPy
Some people consider it the most important tool for data analysts
Data pre-processing, Data cleanup, Exploratory data analysis and feature
engineering
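The sketch below touches each of these uses on a tiny table. It assumes Pandas is installed in the active environment; the sales data, column names and values are invented purely for illustration.

```python
import pandas as pd

# Made-up sales data with one missing value.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "South", "East"],
    "units": [10, None, 7, 3, 5],
    "unit_price": [2.5, 4.0, 2.5, 4.0, 3.0],
})

# Data cleanup: drop rows where the unit count is missing.
clean = sales.dropna(subset=["units"]).copy()

# Feature engineering: derive a new column from existing ones.
clean["revenue"] = clean["units"] * clean["unit_price"]

# Exploratory data analysis: total revenue per region.
summary = clean.groupby("region")["revenue"].sum()

print(clean)
print(summary)
```

Dropping incomplete rows is only one cleanup strategy; filling missing values with fillna is a common alternative when rows cannot be discarded.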
QUESTIONS?
Food for thought…
Thank you