# Pavia Python seminar

If you want to know how to run Python scripts on Windows (in its command prompt), read the FAQ.

If you want to learn more about Python, you may try these: an Italian tutorial, a good open source book translated into Italian, or the official Python tutorial (in English). You will find many more tutorials on the Internet, and you may also watch video tutorials on YouTube or work through some interactive tutorials. Just Google it!

You will find answers to many questions related to Python programming on Stack Overflow.

You may use the online tool repl.it, but I recommend installing Python 3 on your computer. You will find many how-tos online (just Google it); you may use e.g. this one. After the installation you can interact with Python in a few ways, more info here.

### Expressions

Write the following formulae as Python expressions. Print out the results.

$${2^{(2+3)}} \over {(8\times2)}$$

```python
# "to the power of" is written as **
a = 2 ** (2+3) / (8*2)
print(a)
# 2 ** ((2+3) / (8*2)) would give a different result!
```


$$\sqrt x$$

You may use the fact that

$$\sqrt[{n}]{x}=x^{\frac {1}{n}}$$

```python
# square root of x is x to the power of 1/2
x = 10
print(x**(1/2))
```
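The standard library also offers math.sqrt, and the same power trick gives any n-th root. A small sketch comparing the two (x = 10 and n = 3 are just example values):

```python
import math

x = 10
# square root via the power operator and via the math module
root_power = x ** (1/2)
root_math = math.sqrt(x)
print(root_power, root_math)

# the n-th root of x is x to the power of 1/n, e.g. the cube root
n = 3
cube_root = x ** (1/n)
print(cube_root)
```

Both results are floats, so if you compare them, use math.isclose rather than ==.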


### Area of a general triangle

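If a, b, c are the sides of a general triangle, then

$$s = {{a+b+c} \over 2}$$

and

$$\text{area} = \sqrt{s \times (s-a) \times (s-b) \times (s-c)}$$

Define three variables a, b, c. Be aware of the triangle inequality: each side must be shorter than the sum of the other two, otherwise the expression under the square root becomes negative (and Python's ** then returns a complex number). Write these formulae as Python expressions.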

```python
a = 4
b = 5
c = 6
s = (a+b+c)/2
area = (s*(s-a)*(s-b)*(s-c))**(1/2)
print(area)
```
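To respect the triangle inequality, you might check the sides first and only then apply the formula (known as Heron's formula). A minimal sketch; the function name triangle_area and the choice to raise ValueError are my own, not part of the task:

```python
import math

def triangle_area(a, b, c):
    """Heron's formula, after checking the triangle inequality."""
    # each side must be shorter than the sum of the other two
    if a + b <= c or a + c <= b or b + c <= a:
        raise ValueError("these sides do not form a triangle")
    s = (a + b + c) / 2
    return math.sqrt(s * (s - a) * (s - b) * (s - c))

print(triangle_area(4, 5, 6))
print(triangle_area(3, 4, 5))  # a right triangle: 3*4/2 = 6
```

The strict check also rejects degenerate "triangles" whose area would be 0.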


### Floating point precision

Floats (Python's type for real numbers) have limited precision, so some comparisons give surprising results.

```python
# the following evaluates to True
1.00000000000000000000001 == 1
# the following evaluates to False
0.1 + 0.1 + 0.1 == 0.3
```
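Because of this, floats are usually compared with a tolerance instead of ==. The standard library function math.isclose does that; printing the sum shows where the difference comes from:

```python
import math

total = 0.1 + 0.1 + 0.1
print(total)                     # prints 0.30000000000000004
print(total == 0.3)              # prints False
print(math.isclose(total, 0.3))  # prints True
```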


### Numeral systems

I’ve briefly spoken about the hexadecimal and binary number systems. Read more about them (in Italian), as you may encounter them in the future.
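Python has built-in functions for converting between these systems: bin and hex turn an integer into a binary or hexadecimal string, and int with a second argument (the base) parses such a string back. A short example:

```python
n = 255
print(bin(n))              # prints 0b11111111
print(hex(n))              # prints 0xff
print(int("11111111", 2))  # prints 255
print(int("ff", 16))       # prints 255
# literals can also be written directly in binary or hexadecimal
print(0b1010, 0xA)         # prints 10 10
```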

For the following you will need a so-called list comprehension. This is a very useful expression in Python, as it lets you create a list in a very flexible way. First, the general syntax is this:

```python
l = [expression for item in list if condition]
```

It reads as follows: for each item from list, if condition is True, evaluate expression and put the result into the resulting list. The if condition part may be left out.
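If it helps, a list comprehension is shorthand for a for loop that appends to a list. Loops are covered later, but here is the equivalent long form of the general syntax on a sample list (the names numbers, result_comp and result_loop are just for illustration):

```python
numbers = [1, 2, 3, 4, 5, 6]

# the comprehension: [expression for item in list if condition]
result_comp = [x * 10 for x in numbers if x > 3]

# the same thing written as a loop
result_loop = []
for x in numbers:
    if x > 3:                       # condition
        result_loop.append(x * 10)  # expression

print(result_comp)  # prints [40, 50, 60]
print(result_loop)  # prints [40, 50, 60]
```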

This is very powerful, since you can easily generate e.g. a list of even numbers:

```python
evens = [x for x in [1, 2, 3, 4, 5, 6] if (x % 2) == 0]
```

You can also use a more complex expression, e.g.

```python
evens_squared = [x**2 for x in [1, 2, 3, 4, 5, 6] if (x**2 % 2) == 0]
```

which generates the squares that are even, i.e. [4, 16, 36].

For the following task you will also need the enumerate function, which takes a list and pairs each item with its index.

```python
print(list(enumerate([5,6,7])))
# prints [(0, 5), (1, 6), (2, 7)]
```

Here we need to explicitly convert the result into a list with the function list; otherwise print would show only Python's abstract representation of the enumeration (something like `<enumerate object at 0x...>`). This is for a technical reason (to save memory, the pairs are produced only when they are needed); just take it as a fact. :) Notice that the enumeration starts at 0, as with list indexes.
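enumerate also accepts an optional second argument, start, if you prefer to count from another number. This is handy when positions are counted from 1, as in the h-index below:

```python
letters = ["a", "b", "c"]
print(list(enumerate(letters)))           # prints [(0, 'a'), (1, 'b'), (2, 'c')]
print(list(enumerate(letters, start=1)))  # prints [(1, 'a'), (2, 'b'), (3, 'c')]
```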

### H-index

The h-index is used to compare the academic performance of researchers and institutions using the number of citations of their papers. If L is the list of citation counts, one per publication, we compute the h-index as follows. First we order the values of L from the largest to the smallest. Then we look for the last position in L (counting positions from 1) where the value is greater than or equal to the position; this position is h.

Your task is to compute the h-index for a given list of citation counts. Since we haven't covered iterating over list items in a loop yet, we will use a list comprehension.


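```python
citations = [5, 20, 1, 7, 8, 14]
# order from the largest to the smallest value
citations.sort()
citations.reverse()
# enumerate counts from 0, so i+1 is the position counted from 1;
# the values never increase while positions grow, so the matches form a prefix
hindex = len([x for i, x in enumerate(citations) if x >= i+1])
print(hindex)  # prints 5
```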
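The same computation can be wrapped in a function and checked on a few edge cases. A sketch with a hypothetical function name h_index; sorted(..., reverse=True) returns a new descending list, so the caller's list is left unchanged:

```python
def h_index(citations):
    # a new list ordered from the largest to the smallest value
    ordered = sorted(citations, reverse=True)
    # positions counted from 1 thanks to start=1
    return len([x for position, x in enumerate(ordered, start=1) if x >= position])

print(h_index([5, 20, 1, 7, 8, 14]))  # prints 5
print(h_index([]))                    # prints 0
print(h_index([10, 10, 10]))          # prints 3
```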


### Palindrome

A palindrome is a word (a string) that reads the same backward as forward. Define a variable x and then write an expression that evaluates to True if and only if x is a palindrome.


```python
x = "1234321"
l = len(x) // 2

# We use a list comprehension with a condition that tests characters from the
# left and from the right at the same time. It is enough to test up to the
# centre, i.e. up to the index l. The generated list has length l only if the
# word is a palindrome. For odd lengths, the middle character is skipped.
indexes = [i for i in range(l) if x[i] == x[-i-1]]
# -i-1 because we compare x[0] to x[-1], x[1] to x[-2] etc.
is_palindrome = len(indexes) == l
print(is_palindrome)
```
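Once you know slicing, there is a shorter idiom: x[::-1] is the string read backward (a slice with step -1), so comparing it with x tests for a palindrome directly.

```python
x = "1234321"
# x[::-1] reverses the string
is_palindrome = x == x[::-1]
print(is_palindrome)           # prints True
print("abca" == "abca"[::-1])  # prints False
```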


### 3rd May

By calling set([1, 2, 3]) you can turn a list into a set. You can do a similar conversion from a list to a dictionary with the function dict, but since a dictionary consists of key-value pairs, the list must contain pairs (two-item lists): the first items are used as keys and the second items as values. Try this: dict([[1, 2], [2, 3], [3, 4]])

Or you can try this: dict([["a", 1], ["b", True], ["c", "TEXT"]])

The result is a dictionary with the key-value pairs taken from the list of lists. Be aware that the sublists must always contain two items.

You can turn any list into a dictionary with numerical keys like this:

```python
dict(enumerate(["a", "word", 1, 5.2, True, [3, 9]]))
```
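If you have the keys and the values in two separate lists, the built-in zip function pairs them up, and dict accepts the result directly. A small example with made-up keys and values:

```python
keys = ["a", "b", "c"]
values = [1, True, "TEXT"]
# zip produces the pairs ("a", 1), ("b", True), ("c", "TEXT")
d = dict(zip(keys, values))
print(d)  # prints {'a': 1, 'b': True, 'c': 'TEXT'}
```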

### Lexicon

We saw how to turn a text into a frequency list; with dictionaries this is more natural, as the dictionary data type is very well suited to the task. We will turn words into keys and their frequencies into values. Your task is to copy a text from the Internet, store it in a variable text, remove punctuation, split it into words (tokens), pick one word and print out its frequency. You will again use a list comprehension, which will generate a list of pairs (a word and its frequency) that you will turn into a dictionary with the function dict. Then you can access the chosen word with your_dictionary[your_word_of_choice].

```python
text = """Harari is interested in how Homo sapiens reached their current condition, and in
their future. His research focuses on macro-historical questions such as: What
is the relation between history and biology? What is the essential difference
between Homo sapiens and other animals? Is there justice in history? Does
history have a direction? Did people become happier as history unfolded?
Harari regards dissatisfaction as the "deep root" of human reality, and as
related to evolution.
In a 2017 article, Harari has argued that through continuing technological
progress and advances in the field of artificial intelligence, "by 2050 a new
class of people might emerge – the useless class. People who are not just
unemployed, but unemployable." He put forward the case that dealing with
this new social class economically, socially and politically will be a central
challenge for humanity in the coming decades.
Harari has commented on the plight of animals, particularly domesticated animals
since the agricultural revolution, and is a vegan. In a 2015 Guardian article
under the title "Industrial farming is one of the worst crimes in history" he
called "the fate of industrially farmed animals [...] one of the most pressing
ethical questions of our time."
Harari summed up his views on the world in a 2018 interview with Steve
Paulson of Nautilus thusly: "Things are better than ever before. Things are
still quite bad. Things can get much worse. This adds up to a somewhat
optimistic view because if you realize things are better than before, this means
we can make them even better."
Harari wrote that although the idea of free will and the liberal values it
helped consolidate "emboldened people who had to fight against the Inquisition,
the divine right of kings, the KGB and the KKK", it has become dangerous in a
world of a data economy, where, he argues, in reality there is no such thing,
and governments and corporations are coming to know the individual better than
they know themselves and "if governments and corporations succeed in hacking the
human animal, the easiest people to manipulate will be those who believe in free
will." Harari elaborates that "Humans certainly have a will – but it isn’t
free. You cannot decide what desires you have. Every choice depends on a lot
of biological, social and personal conditions that you cannot determine for
yourself. I can choose what to eat, whom to marry and whom to vote for, but
these choices are determined in part by my genes, my biochemistry, my gender, my
family background, my national culture, etc – and I didn’t choose which genes or
family to have."
"""

# I can chain replace methods like this:
text = text.replace('.', '').replace(',', '').replace(':', '').replace('"', '')
# split without an argument splits at whitespace (newlines, tabs, ...)
tokens = text.split()
# list comprehension will generate pairs [word, word_count]
# it doesn't matter that there are duplicates, the dictionary stores unique keys
lexicon = dict([[x, tokens.count(x)] for x in tokens])
# print out the frequency of the word "people"
print(lexicon["people"])
```
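The standard library's collections.Counter builds the same word-frequency dictionary in one step and counts each word in a single pass (tokens.count(x) rescans the whole list for every token). Lowercasing the text first, which is an addition not in the task above, makes "People" and "people" count as the same word. A sketch on a short sample sentence:

```python
from collections import Counter

text = "People like people. The people, the city."
# remove punctuation as above, then lowercase (optional)
for mark in ".,:\"":
    text = text.replace(mark, "")
tokens = text.lower().split()
lexicon = Counter(tokens)
print(lexicon["people"])   # prints 3
print(lexicon["missing"])  # prints 0, a Counter does not raise KeyError
```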


### Interlanguage homonyms

Here you have two lists of the top 100 English and Italian words. Your task is to find the words common to both lists, which should give you interlanguage homonyms, false friends or “international” words.

English: the and of to a in for is that on with it as are i this be by at from was you we have or an will not has their can they but all he more our his which one your new about also who there were if my been so other do its when what up out had time would these n’t some her people into how first work them she like than may no us only through over many two most just after any years use such said now well information year very where me make world get

Italian: di e il la che in a per un del è l’ con i della non le una si da al dei sono nel più come ha delle alla dell’ ma anche o gli se ed ad dal lo questo nella all’ essere su cui ci alle tra ai dalla degli mi tutti sul solo d’ hanno questa due anni loro stato parte prima sua sia sempre un’ tutto ho era uno suo dopo molto sulla può ogni c nei quando poi nell’ senza perché così ancora quello tempo fare fatto dall’ nelle vita quanto proprio altri chi lavoro dove

```python
english = """
the and of to a in for is that on with it as are i this be by at from
was you we have or an will not has their can they but all he more our
his which one your new about also who there were if my been so other do
its when what up out had time would these n't some her people into how
first work them she like than may no us only through over many two most
just after any years use such said now well information year very where
me make world get"""

italian = """
di e il la che in a per un del è l' con i della non le una si da al dei
sono nel più come ha delle alla dell' ma anche o gli se ed ad dal lo
questo nella all' essere su cui ci alle tra ai dalla degli mi tutti sul
solo d' hanno questa due anni loro stato parte prima sua sia sempre un'
tutto ho era uno suo dopo molto sulla può ogni c nei quando poi nell'
senza perché così ancora quello tempo fare fatto dall' nelle vita quanto
proprio altri chi lavoro dove"""

# the & operator intersects sets, so I first need to convert
# the lists of words into sets
intersection = set(english.split()) & set(italian.split())
# to sort the intersection I need to convert the set into a list
intersection_list = list(intersection)
# now I can sort it
intersection_list.sort()
print(intersection_list)
```
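The built-in function sorted accepts a set and returns a new sorted list, so the conversion and the sort fit into one line. The two short strings here are just sample data:

```python
english = "the and of to a in i"
italian = "di e il la che in a i"
# & intersects the sets, sorted returns a list
common = sorted(set(english.split()) & set(italian.split()))
print(common)  # prints ['a', 'i', 'in']
```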


### Sort words in retrograde order

On the Internet you will find many wordlists; here you will find 1,000 Italian words, which we will use in this example. The task is to save the list to a file, open it from a script and store the words in a list. Then sort the list alphabetically, but compare the words read backward. So “abc” will come after “cba”, since the former ends with “c” and the latter with “a”. Sorting the list this way reveals common suffixes of words and puts together words that rhyme with each other.

```python
# we need the file in the same directory as the script
# (assuming the file is UTF-8 encoded)
italianWords = []
with open("italian.txt", encoding="utf-8") as f:
    for line in f:
        # each line ends with "\n", remove it with the strip() method
        word = line.strip()
        # now we build the retrograde string
        # (we can't use reverse(), it exists only for lists)
        revword = ""
        # we create the reversed word character by character
        for character in word:
            # by prefixing the current partial word with the current character
            revword = character + revword
        # if word was "abcd", revword now contains "dcba"
        # we append the pair ["dcba", "abcd"] as an item to the whole list
        italianWords.append([revword, word])
# now we have processed all words from the file
# if we sort the list, it will be sorted according to the revwords
italianWords.sort()
for item in italianWords:
    # item[0] is the reversed word, we print the second item (the original word)
    print(item[1])
```
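sort and sorted also take a key function, which lets Python compare the reversed words without storing them; word[::-1] reverses a string. A sketch with a few sample words instead of the file:

```python
words = ["casa", "cane", "rosa", "mare"]
# compare the words read backward
retrograde = sorted(words, key=lambda w: w[::-1])
print(retrograde)  # prints ['casa', 'rosa', 'cane', 'mare']
```

The rhyming pairs casa/rosa and cane/mare end up next to each other.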


### Find similar words

Now we have some interesting data to analyse! We can e.g. find pairs of Italian words that are similar to each other. Have you heard about the Levenshtein distance? Google it. We will use a much simpler metric for measuring the “distance” between words: we consider only words of the same length with at least 4 characters, and we count how many positions hold the same character in both words. That count is our measure of “similarity”. We will check all possible pairs of the 1,000 words (!) and sort them by the similarity measure.
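For comparison, here is a minimal sketch of the Levenshtein distance: the minimum number of single-character insertions, deletions and substitutions needed to turn one word into the other. Unlike our simple metric, it also works for words of different lengths. This is the standard dynamic-programming formulation, not something the exercise requires:

```python
def levenshtein(a, b):
    # previous[j] holds the distance between the previous prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to the empty string
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # prints 3
print(levenshtein("casa", "cosa"))       # prints 1
```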

Your task is to read the words from the file “italian.txt” into a list, go through all possible pairs of words (minimum length 4, both words of the same length), measure their similarity, save the scores into a new list, sort it and print out the 10 most similar pairs (there will be many, many items in the list).

```python
# assuming the file is UTF-8 encoded and in the same directory as the script
words = []
with open("italian.txt", encoding="utf-8") as f:
    for line in f:
        word = line.strip()
        # keep only words with at least 4 characters
        if len(word) > 3:
            words.append(word)

# here we will store the pairs and their similarity scores
allPairs = []

# for all words
for i in range(len(words)):
    word1 = words[i]
    # only the words after word1 => we skip pairs abc-abc and
    # consider only abc-def and not def-abc
    for j in range(i + 1, len(words)):
        word2 = words[j]
        # only when the lengths are the same
        if len(word1) == len(word2):
            # number of the same characters
            shared = 0
            for k in range(len(word1)):
                # check characters at the same positions
                if word1[k] == word2[k]:
                    shared += 1
            # triplets where the first item is the number
            allPairs.append([shared, word1, word2])

# sort by the similarity number (first item in triplets)
allPairs.sort()
# the highest number ~ the most similar
allPairs.reverse()

# print out only the top 10 items (fewer if there are not enough pairs)
for item in allPairs[:10]:
    print(item[1], item[2], item[0])

print("All possible pairs: ", (len(words)*(len(words)-1))//2)
print("All considered pairs: ", len(allPairs))
```
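The standard library module itertools has a combinations function that yields every pair exactly once, and heapq.nlargest picks the top items without reversing the whole sorted list. A sketch of the same computation on a small sample word list instead of the file (the helper name similarity is hypothetical):

```python
import heapq
from itertools import combinations

def similarity(word1, word2):
    # number of positions where both words have the same character
    return sum(1 for c1, c2 in zip(word1, word2) if c1 == c2)

words = ["casa", "cosa", "rosa", "cane", "mare"]
# each unordered pair once: (casa, cosa), (casa, rosa), ...
pairs = [
    [similarity(w1, w2), w1, w2]
    for w1, w2 in combinations(words, 2)
    if len(w1) == len(w2) and len(w1) > 3
]
# the 3 highest-scoring triplets, from the largest down
for score, w1, w2 in heapq.nlargest(3, pairs):
    print(w1, w2, score)
```

With the real file you would use heapq.nlargest(10, allPairs), which gives the same top 10 as sorting and reversing.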

Last modification: 2020-06-05 10:24:11 +0200 CEST