Andrew Lau
# this just gets the notebook to print all the output
from IPython.core.interactiveshell import InteractiveShell
InteractiveShell.ast_node_interactivity = "all"
Introduction
Python Crash Course
Before we begin, let's talk a bit about notebooks and IDEs.
Jupyter Notebooks
IDEs
Python:
R:
SAS
# indentation (by convention, four spaces) and new lines are used to mark which lines belong
# to a block, e.g. the body of a loop or the action to execute if a condition is true
for i in range(5):
    print(i, "hello")
# function and if statement
def double_num(x):
    if x > 5:
        return 2 * x
    return x / 2
# hashes are used to comment. press ctrl + '/' to (un)comment
print("double_num(2):", double_num(2))  # you can also inline comment
print("double_num(6):", double_num(6))
"""
Three quotation marks create a multi-line string. A string that isn't assigned
to anything has no effect, so it is often used as a block comment.
"""
x = 5
if x > 2:
    print("x is greater than 2")
if x == 5:  # note that unlike SAS, the equality operator "==" is different from the assignment operator "="
    print("x is 5")
# in Python we can loop over many things (we say they are 'iterable')
for i in range(10):  # range(n) lets us loop over the integers 0 to n - 1
    print(i)
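range() also accepts a start value and a step size. A short sketch of the three common forms (the variable names are made up for this example):

```python
up_to_five = list(range(5))          # range(stop): 0 up to, but not including, stop
two_to_five = list(range(2, 6))      # range(start, stop): begin somewhere other than 0
every_third = list(range(0, 10, 3))  # range(start, stop, step): jump by step each time
countdown = list(range(5, 0, -1))    # a negative step counts down
print(up_to_five, two_to_five, every_third, countdown)
```

Wrapping range() in list() is only needed here to see all the values at once; a for loop can use the range directly.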
my_list = ['t', 'o', 'n', 'y']
for letter in my_list:  # we can loop over lists
    print(letter)
x = 0
while x < 10:
    print(x)
    x += 1
The basic data types in Python are:
# integers
2
3
4
# floats - you can think of these as decimals (floats store numbers in a binary form of
# scientific notation, which allows efficient storage of decimals and very big/small numbers)
2.234
3.2309
# strings - series of characters
"hello"
'bye'  # you can use single or double quotation marks to delimit strings
# if you want single (double) quotation marks in your string, you can use the double (single)
# quotation marks to delimit
"single 'quotations'"
'double "quotations"'
# there are many useful string operations
[x for x in dir(str) if not x.startswith('_')] # this is a list comprehension, more on this later
bad = 'bad'
bad.capitalize()
bad.upper()
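A few more of the string methods listed by dir(str) that come up often when cleaning data. This is a sketch using a made-up example string:

```python
sentence = "  pandas and numpy  "

cleaned = sentence.strip()                      # remove leading/trailing whitespace
words = cleaned.split(" ")                      # split into a list of strings on each space
joined = "-".join(words)                        # glue a list of strings together with a separator
replaced = cleaned.replace("numpy", "sklearn")  # swap one substring for another
print(cleaned, words, joined, replaced)
```

Like upper() and capitalize(), these methods return a new string and leave the original unchanged.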
# booleans
True
False
# data types can be 'cast' into other data types. The boolean True has the value 1 and False 0:
True + 1
50 - False
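Casting can also be done explicitly by calling the type's name like a function. A minimal sketch:

```python
as_int = int("42")       # string -> integer: 42
as_float = float("3.5")  # string -> float: 3.5
as_str = str(7)          # integer -> string: '7'
truncated = int(9.99)    # float -> integer drops the decimal part (no rounding): 9
print(as_int + 1, as_float * 2, as_str + "!", truncated)
```

This is handy when numbers are read in as text, e.g. from a CSV file, and need to be converted before doing arithmetic.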
# zero is considered False, and all other numbers True (a non-empty string like 'Jessie' is also True)
def is_true(number):
    if number:
        print(number, "is True")
    else:
        print(number, "is False")
for number in [-100, -0.5, 0, 0.5, 100, 'Jessie']:
    is_true(number)
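The same rule extends beyond numbers: empty strings, empty containers and None count as False, and almost everything else counts as True. The built-in bool() shows how a value will be treated in an if statement; the values below are just examples:

```python
values = [0, 0.0, 1, -3, "", "Jessie", [], [0], {}, None]
for value in values:
    print(repr(value), "->", bool(value))

# keep only the values that an if statement would treat as False
falsy = [v for v in values if not v]
print(falsy)
```

Note that [0] is True: the list is not empty, even though its only element is 0.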
The standard data structures (things that can hold data) in Python are:
Lists are perhaps the most versatile and commonly used data structure in Python.
# lists can hold anything! from basic data types like integers, floats, strings...
[1, 2, 3, 'one', 'two', 'three', 1.0, 2.0, 3.0]
# to other data structures...
[['this is a str in a list'], {'this is a str in a set'}, {'apple':'red', 'strawberry':'red'}]
# to functions...
def add_2(x):
    return x + 2
def add_3(x):
    return x + 3
[add_2, add_3]
# and any other 'object' <- more on this later!
# accessing elements of a list
my_list = [1, 2, 3, 4, 5]
my_list[0]  # note that Python (and a lot of other languages) starts indexing at 0 rather than 1!
my_list[3]
# 'slicing' - accessing subsets of a list
my_list[0:2]  # the index to the right of the colon is not included in the slice
my_list[2:4]
my_list[:2]
my_list[2:]
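Indices can also be negative, which counts back from the end, and a slice can take a third number, the step. A sketch using a fresh list so my_list is left alone:

```python
nums = [1, 2, 3, 4, 5]

last = nums[-1]              # negative indices count back from the end: 5
last_two = nums[-2:]         # the last two elements: [4, 5]
every_other = nums[::2]      # [start:stop:step], here every second element: [1, 3, 5]
reversed_nums = nums[::-1]   # a step of -1 walks backwards: [5, 4, 3, 2, 1]
print(last, last_two, every_other, reversed_nums)
```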
# generating lists
# one of the most common ways to build a list is to start with an empty one and build it up with a loop
x = []  # initialise empty list
for i in range(5):
    x.append(i)
x
# list comprehensions are a more compact way to build a list
y = [i for i in range(5)]
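A comprehension can also filter with an if clause. This sketch builds the same list both ways (squares of the even numbers below 10) so the two forms can be compared side by side:

```python
# loop version
even_squares_loop = []
for i in range(10):
    if i % 2 == 0:            # keep only even numbers
        even_squares_loop.append(i ** 2)

# comprehension version: [ACTION for ITEM in ITERABLE if CONDITION]
even_squares_comp = [i ** 2 for i in range(10) if i % 2 == 0]

print(even_squares_loop)
print(even_squares_loop == even_squares_comp)
```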
# the dir() function gives you a list of any object's attributes/methods <- more on this later
dir(list)
# use a 'list comprehension' to find all the list methods
# [DO ACTION TO SOMETHING for THAT SOMETHING in ITERABLE if CONDITION]
['list.' + something for something in dir(list) if not something.startswith('_')]
[x for x in dir(str) if not x.startswith('_')]
my_string = 'pandas'
my_string
my_string = my_string.upper()
my_string
# useful list methods and the help function
help(list.extend)
help(list.append)
help(list.pop)
help(list.sort)
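The four methods above in action. A sketch highlighting two common surprises: append adds its argument as a single element, and sort changes the list in place and returns None:

```python
numbers = [3, 1, 2]

numbers.append([4, 5])   # append adds ONE item (here, a whole list): [3, 1, 2, [4, 5]]
removed = numbers.pop()  # pop removes and returns the last item: [4, 5]
numbers.extend([4, 5])   # extend adds each element separately: [3, 1, 2, 4, 5]
result = numbers.sort()  # sort reorders the list itself...
print(numbers)           # [1, 2, 3, 4, 5]
print(result)            # ...and returns None, so avoid writing numbers = numbers.sort()
```

If you want a sorted copy and the original left untouched, the built-in sorted() function returns a new list instead.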
# like lists, tuples can hold any object
my_tup = ('p', 'a', 'n', 'd', 'a', 123, 2.3, dir())
# and they can be sliced
my_tup[:5]
# but they are 'immutable', meaning they cannot be altered once they have been defined
# useful if you need to store something that you DO NOT want changed after creation
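Trying to change an element of a tuple raises a TypeError. A sketch that catches the error and then builds a new tuple instead:

```python
letters = ('p', 'a', 'n', 'd', 'a')

try:
    letters[0] = 'P'  # tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print("TypeError:", message)

# to "change" a tuple, create a new one from pieces of the old one
new_letters = ('P',) + letters[1:]  # ('P',) needs the comma to be a one-element tuple
print(new_letters)
```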
As in maths, a set in Python is an unordered collection with no duplicate elements. Uses of sets:
[x for x in dir(set) if not x.startswith('_')]
stuff_powerlifters_like = {"bench", "squat"}
stuff_weightlifters_like = {"clean and jerk", "snatch", "squat"}
stuff_powerlifters_like
stuff_weightlifters_like
stuff_powerlifters_like.add("deadlift")
stuff_powerlifters_like.add("bench") # adding an existing element does nothing
# can perform mathematical set operations
stuff_powerlifters_like.intersection(stuff_weightlifters_like)
stuff_powerlifters_like.union(stuff_weightlifters_like)
'squat' in stuff_powerlifters_like
'foam rolling' in stuff_powerlifters_like
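Sets also support union, intersection and difference with the operators |, & and -, which do the same thing as the union(), intersection() and difference() methods. A sketch with copies of the lifting sets (the names are made up so the originals aren't overwritten):

```python
powerlifting = {"bench", "squat", "deadlift"}
weightlifting = {"clean and jerk", "snatch", "squat"}

either = powerlifting | weightlifting      # union: in at least one set
both = powerlifting & weightlifting        # intersection: in both sets
only_power = powerlifting - weightlifting  # difference: in the first set but not the second

# a set is also a quick way to drop duplicates from a list (the original order is not kept)
unique_lifts = set(["squat", "bench", "squat", "squat"])
print(either, both, only_power, unique_lifts)
```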
Dictionaries map a unique key to a value (you can think of a dictionary like Excel's VLOOKUP, except that the lookup values have to be unique).
ML_models = {'Linear Regression Models': 'Regressors', 'Decision Trees': 'Regressors or Classifiers', 'SVMs':'Regressors or Classifiers',
'Naive Bayes Models':'Classifiers', 'K Nearest Neighbors Models':'Regressors or Classifiers'}
ML_models
# can access dictionary values using the index notation
ML_models['Naive Bayes Models']
ML_models['K Nearest Neighbors Models']
for model in ML_models:  # can iterate over dictionaries
    print(model, "are", ML_models[model])
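A few more everyday dictionary operations: adding a key, looking up a key that may be missing, and looping over keys and values together. This is a sketch on a smaller, separate dictionary; 'Random Forests' is chosen only because it is not a key in it:

```python
small_models = {'Linear Regression Models': 'Regressors', 'Naive Bayes Models': 'Classifiers'}

small_models['SVMs'] = 'Regressors or Classifiers'    # assigning to a new key adds it
label = small_models.get('Random Forests', 'unknown')  # .get() returns a default instead of raising KeyError

for model, task in small_models.items():  # .items() gives (key, value) pairs
    print(model, "are", task)
```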
# general form of a function definition: def func_name(arg_1, arg_2, ...):
# an example of a recursive function
def fibonacci(x=5):  # you can set a default parameter value
    """
    It is good practice to add documentation at the beginning of your function.
    When someone calls help() on your function, they will see the documentation.
    """
    if x <= 1:
        return 1  # the function call ends when a return statement is reached, so an else statement is not needed
    return fibonacci(x - 2) + fibonacci(x - 1)  # if no return statement is reached, the function returns None
# list comprehension
[fibonacci(x) for x in range(10)]
# default parameter value
fibonacci()
help(fibonacci)
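The recursive version recomputes the same values many times, so it slows down quickly as x grows. For comparison, here is a sketch of a loop-based version; the name fibonacci_loop is made up for this example, and it follows the same convention as fibonacci() above (x = 0 and x = 1 both give 1):

```python
def fibonacci_loop(x=5):
    """Return the same values as the recursive fibonacci(), using a loop instead of recursion."""
    previous, current = 1, 1  # the values for x = 0 and x = 1
    for _ in range(x - 1):    # each pass moves one step further along the sequence
        previous, current = current, previous + current
    return current

print([fibonacci_loop(x) for x in range(10)])
print(fibonacci_loop())  # default parameter value, x = 5
```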
Something to be aware of in Python is that everything is an object. The full details of object-oriented programming are beyond the scope of this training, but there are some things to be aware of:
object.attribute
- attributes are data about that object, stored in the object
object.method()
- methods are functions associated with an object, and as such need a set of brackets to call them. Like all functions, these methods may take arguments.

This is all probably a bit confusing, so let's go through some examples.
# everything in Python is an object. For example, strings are objects
my_string = "python"  # this creates an instance/object of the str (string) class
my_string_2 = "pandas"  # this creates another instance/object of the str class
# the below are string attributes and methods. all strings have these.
[x for x in dir(my_string) if not x.startswith("_")]
[x for x in dir(my_string_2) if not x.startswith("_")]
# both the above strings have inherited the same attributes/methods from the overarching string class.
# we can access string methods like this
my_string.upper() # returns the string with everything made into upper case
my_string_2.isdigit()  # returns True if every character in the string is a digit (False for "pandas")
# lists are objects as well
higher_level_languages = ['Python', 'JavaScript']  # this creates an instance/object of the list class
lower_level_languages = ['C', 'x86 Assembly']
# all lists have the below attributes/methods, including the ones we just made
[x for x in dir(higher_level_languages) if not x.startswith("_")]
higher_level_languages.extend(lower_level_languages)
higher_level_languages
You can also create your own classes as well!
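A minimal sketch of a class of your own, using the hypothetical name Lifter, to show where attributes and methods come from. The __init__ method runs when an object is created, and self refers to that particular object:

```python
class Lifter:
    """A hypothetical class, made up for this example."""

    def __init__(self, name, best_squat_kg):
        # attributes: data stored on each object
        self.name = name
        self.best_squat_kg = best_squat_kg

    def add_to_squat(self, kg):
        # a method: a function attached to the object that can read and change its attributes
        self.best_squat_kg += kg
        return self.best_squat_kg


tony = Lifter("Tony", 150)    # create an instance/object of the class
print(tony.name)              # access an attribute: no brackets
print(tony.add_to_squat(10))  # call a method: brackets, here with one argument
```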
We will see next week that machine learning models are objects, and that a basic understanding of how objects work will be needed to use them.
library() function
import pandas as pd
import numpy as np
import sklearn
import your_module  # placeholder: a .py file of your own can be imported the same way
import pandas as pd # import a module under an alias
from matplotlib import pyplot as plt # you can import specific things from a module with this syntax
# use a dot '.' to access things from that module, in this case the DataFrame class
my_df = pd.DataFrame({'x':[1, 2, 3, 4, 5], 'y':[2, 4, 6, 8, 10]})
# the DataFrame class has a plot method.
my_df.plot()
plt.show()
from math import sqrt # import a specific thing (sqrt function) from a module. you can now call sqrt().
import math # import the whole module, you will need to specify what you are calling with math.THING.
print("sqrt(4) is", sqrt(4))
print("math.pi is", math.pi)