WHAT ON EARTH ARE LOG FUNCTIONS¶

...and why are you making me think about them?¶

The next lab works with something called log probabilities

No log jokes please

THE PLAN¶

I will (try to) explain log functions (and why we're using them)...

at three levels of detail.

The levels of detail are:

1) I hate and fear log functions, is this a problem?

2) I want to know why we're using log functions, with minimal mathematics.

3) Maths is just super, and there's a good chance I already understand log functions, but...

I want to see how linguists do maths.

IMPORTANT: Level 1 is enough to do everything on this course.

Everything else is to support your understanding...
...if you think that helps.
I will...
- leave these slides up
- put a notebook of these slides on Noteable and the website

You can always come back later.

I will go through each level of detail:
- You can stop at any point and come back later
  - (or not come back at all)
- Everything is going to be OK.

LEVEL ONE¶

Q: I hate and fear log functions, is this going to be a problem?¶

A: No¶

This is not a problem at all.

Many historical figures hated and feared logarithms and went on to do great things.

Q: So what do I need to do?¶

A: Just ignore the word 'log'¶

Whenever you see a log function, just pretend the word isn't there.

It's all just probability stuff, except with a trick to make it work on computers.

That's it! But would you like to know a little more?

LEVEL TWO¶

Q: Actually, can you tell me why we use these log things? With minimal mathematics please?¶

A: OK then¶

To get the probability of several independent events all happening, we multiply their probabilities together.

Look at these awesome dice:

[Image: two 20-sided dice]

The dice have 20 sides each, so...

The probability of rolling one of them...
and getting a 20 is:
- $\frac{1}{20}$, or 0.05 as a decimal.

The probability of rolling both and getting two 20s is:
- $\frac{1}{20} \times \frac{1}{20} = \frac{1}{400}$

When you multiply probabilities they get really small

$\frac{1}{400}$ is way smaller than $\frac{1}{20}$

Written as a decimal, it's 0.0025
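If you'd like to check the dice arithmetic yourself, here is a small sketch using Python's built-in `fractions` module, which keeps probabilities as exact fractions instead of decimals. The variable names are just made up for illustration.

```python
from fractions import Fraction

# Probability of rolling a 20 on one fair 20-sided die
p_one_20 = Fraction(1, 20)

# Two independent dice: multiply the probabilities
p_two_20s = p_one_20 * p_one_20

print(p_two_20s)         # 1/400
print(float(p_two_20s))  # 0.0025
```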

What if we roll 100 dice?

What's the probability of getting all 20s?

It's $\frac{1}{20} \times \frac{1}{20} \times \frac{1}{20} \times \frac{1}{20} \times \dots$ and on and on...

until we've done it a hundred times.

A better way to write this is $\frac{1}{20^{100}}$
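Here's a minimal sketch showing that multiplying $\frac{1}{20}$ by itself 100 times gives the same number as $\frac{1}{20^{100}}$. It assumes Python 3.8 or later (for `math.prod`); the two results can differ in the last few digits because of rounding, so they're compared with `math.isclose`.

```python
import math

# One probability of 1/20 for each of the 100 dice
per_die = [1 / 20] * 100

# Multiply them all together, one die at a time
product = math.prod(per_die)

# The "better way to write it": 1 / 20^100
closed_form = 1 / (20 ** 100)

print(product)
print(closed_form)
print(math.isclose(product, closed_form, rel_tol=1e-9))  # True
```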

What is it written in decimal?

Let's get Python to do the maths:

The maths¶

What is $\frac{1}{20^{100}}$ written in decimal?

In [1]:

1/(20**100)

Out[1]:

7.888609052210118e-131

How small is that number?

Another way to write the number above is $7.9 \times 10^{-131}$

Written out in full, that's 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000079

That's a really really small number.

What does $7.9 \times 10^{-131}$ mean in terms of probability?

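To give an extremely natural and relatable example:

it's even less likely than...

two people

each picking out a single sub-atomic particle

at random from anywhere in the entire known universe

and choosing the same one...

by chance

(Estimates of the number of particles in the observable universe are somewhere around $10^{80}$ to $10^{90}$, so those odds are roughly $10^{-80}$ to $10^{-90}$. $10^{-131}$ is smaller still, by 40 or more orders of magnitude.)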

What's the lesson here?

Once you start combining probabilistic events and observations

the chance of any particular event...

becomes very very small...

...very very quickly.

That's really very interesting, thank you so much, but why should I care?

Let's roll some more dice!

Exactly 249 of them in fact, for no particular reason.

What are the chances of rolling 249 dice and getting all 20s?
- Written the fancy way: $\frac{1}{20^{249}}$
- And written in decimal?

In [2]:

1/(20**249)

Out[2]:

0.0

We broke Python

Python has some clever ways of representing very small numbers, but only up to a point.
After that point, it just gives up and rounds the number down to exactly 0.0, even though the real probability isn't zero.
This is not ideal.
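To see exactly where Python gives up, we can ask it about its floating-point numbers. This sketch uses the standard `sys.float_info` and assumes ordinary 64-bit floats (which is what Python uses on essentially every modern computer). The smallest positive float is about $5 \times 10^{-324}$; anything much smaller gets silently rounded to `0.0`, which is called *underflow*.

```python
import sys

# Smallest "normal" positive float (about 2.2e-308)
print(sys.float_info.min)

# Even smaller "subnormal" floats exist, down to about 5e-324
tiniest = 5e-324
print(tiniest > 0)    # True
print(tiniest / 2)    # 0.0 (underflow, with no warning)

# The dice example: 248 dice still fits, 249 dice does not
p_248 = 1 / (20 ** 248)
p_249 = 1 / (20 ** 249)
print(p_248 > 0)      # True
print(p_249)          # 0.0
```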

Log functions are a way for Python to deal with really really small numbers.

That's the most important lesson, really.

Once numbers get really small, Python starts to break in weird ways.

We use logarithms (i.e. log functions) to convert really small numbers into something Python can deal with.

When you see the log functions in the notebook, that's all they're doing.

When do we use log functions?

In general, the procedure works like this:

1) Turn all the probabilities into log-probabilities

2) Do all the math stuff using the log-probabilities (so Python doesn't break in weird ways)

3) When all the math stuff is finished, turn the log-probabilities back into normal probabilities.
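Here's a minimal sketch of that three-step procedure in plain Python, using the 100-dice example from earlier. The function name `multiply_via_logs` is made up for illustration; the lab notebook may organise things differently (and may use a different log base).

```python
import math

def multiply_via_logs(probs):
    """Multiply a list of probabilities by adding their log10 values."""
    # Step 1: turn the probabilities into log-probabilities
    log_probs = [math.log10(p) for p in probs]
    # Step 2: do the maths in log space (multiplying becomes adding)
    total_log_prob = sum(log_probs)
    # Step 3: turn the log-probability back into a normal probability
    return 10 ** total_log_prob

dice = [1 / 20] * 100   # 100 dice, each with a 1/20 chance of a 20
result = multiply_via_logs(dice)
print(result)           # roughly 7.9e-131
print(math.isclose(result, 1 / (20 ** 100), rel_tol=1e-9))  # True
```

Step 3 only works if the final answer is big enough to be a Python float: for 249 dice, `10 ** total_log_prob` would underflow to `0.0` again. That's why it often pays to stay in log space until the very end.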

That's it! Onwards...

LEVEL THREE¶

Give me the maths¶

Here's what you need to know:

Logarithms scale over orders of magnitude
- $\log_{10}$ of {1, 10, 100, 1000, ...} $\rightarrow$ {0, 1, 2, 3, ...}

Logarithms change operators from:
- multiplication $\rightarrow$ addition
- division $\rightarrow$ subtraction

Logarithms are the inverse of exponents
- $10^{\text{exp}}$ of {0, 1, 2, 3, ...} $\rightarrow$ {1, 10, 100, 1000, ...}
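A quick way to check these three facts is with Python's standard `math` module. This is just a sketch; `math.isclose` is used because floating-point logarithms aren't always exact to the last digit.

```python
import math

powers_of_ten = [1, 10, 100, 1000]

# Scale over orders of magnitude: log10 turns 1, 10, 100, 1000 into 0, 1, 2, 3
logs = [math.log10(x) for x in powers_of_ten]
print(logs)

# Inverse of exponents: 10 ** exponent turns 0, 1, 2, 3 back into 1, 10, 100, 1000
print([10 ** e for e in range(4)])

# Multiplication -> addition, division -> subtraction
a, b = 100, 1000
print(math.isclose(math.log10(a * b), math.log10(a) + math.log10(b)))  # True
print(math.isclose(math.log10(a / b), math.log10(a) - math.log10(b)))  # True
```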

Logarithmic scale¶

Here are the awesome dice again!

We can plot the probability of getting all 20s for increasing numbers of dice:

In [27]:

import matplotlib.pyplot as plt

# Probability of all 20s when rolling i dice, for i = 0 to 99
dice_probs = []
for i in range(100):
    dice_probs.append(1/(20**i))
plt.plot(dice_probs)
plt.ylim(-0.05,1.05)
plt.ylabel('probability all 20s')
plt.xlabel('Number of dice rolled')

[Plot: probability of all 20s (y-axis) against number of dice rolled (x-axis)]

That's incredibly boring and unhelpful: the probabilities get super small really quickly

What if we plot the logarithm of the probabilities?

In [47]:

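import numpy as np
import matplotlib.pyplot as plt

# Take log10 of each probability from the previous plot
dice_logprobs = []
for p in dice_probs:
    dice_logprobs.append(np.log10(p))
plt.plot(dice_logprobs)
plt.ylabel('log probability of all 20s')
plt.xlabel('Number of dice rolled')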

[Plot: log probability of all 20s (y-axis) against number of dice rolled (x-axis)]

It's still a bit boring, but now it decreases linearly
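Why a straight line? Each extra die multiplies the probability by $\frac{1}{20}$, which on the log scale means adding the same amount every time: $\log_{10}\frac{1}{20} \approx -1.301$. This sketch, using only the standard `math` module, checks that the step between neighbouring points is always that same number.

```python
import math

step = math.log10(1 / 20)   # about -1.30103
print(step)

# log10 of the "all 20s" probability for 0 to 99 dice
log_probs = [math.log10(1 / (20 ** n)) for n in range(100)]

# Differences between neighbouring points on the plot
diffs = [later - earlier for earlier, later in zip(log_probs, log_probs[1:])]
print(all(math.isclose(d, step, rel_tol=1e-9) for d in diffs))  # True
```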

Logarithms scale over orders of magnitude

For numbers of 1 or bigger, that means that:
- 1, 10, 100, 1000, 10000, ... $10^{\text{big number}}$
have logarithms (in base 10) of:
- 0, 1, 2, 3, 4, ... $\text{big number}$

For numbers between 0 and 1, that means that:
- $\frac{1}{1}$, $\frac{1}{10}$, $\frac{1}{100}$, $\frac{1}{1000}$, $\frac{1}{10000}$, ... $\frac{1}{10^{\text{huge number}}}$
have logarithms (in base 10) of:
- 0, -1, -2, -3, -4, ... $-\text{huge number}$

That's why logarithm-transformed numbers are easier for computers to represent.
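Here's that idea applied to the 249 dice that broke Python earlier. The probability itself underflows to `0.0`, but its log10 is just an ordinary number a little below $-323$. We never compute the tiny probability directly: we multiply the log of one die's probability by the number of dice, which is the same as adding it 249 times. A sketch using only the standard library:

```python
import math

n_dice = 249

# The probability itself is too small for a Python float
print(1 / (20 ** n_dice))              # 0.0

# Its log10 is easy: 249 copies of log10(1/20) added together
log_prob = n_dice * math.log10(1 / 20)
print(log_prob)                        # about -323.96

# math.log10 also accepts huge Python integers directly
print(math.isclose(log_prob, -math.log10(20 ** n_dice)))  # True
```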

Logarithms are the inverse of exponents¶

Remember:

We take the logarithm of some probabilities
We do a bunch of maths
We convert the log-probabilities back into normal probabilities

How do we do the last bit?

We convert back by raising 10 (the same base we used for the logarithm) to the power of the log-probability, i.e. using the log-probability as the exponent

In [41]:

probs = [1, 0.1, 0.01, 0.001, 0.0001, 10**-42]
print('Initial probs:    ', probs)
log_probs = []
converted_probs = []
for p in probs:
    log_probs.append(np.log10(p))
print('Log probabilities:', log_probs)
for lp in log_probs:
    converted_probs.append(10**lp)
print('Converted back:   ', converted_probs)

Initial probs:     [1, 0.1, 0.01, 0.001, 0.0001, 1e-42]
Log probabilities: [0.0, -1.0, -2.0, -3.0, -4.0, -42.0]
Converted back:    [1.0, 0.1, 0.01, 0.001, 0.0001, 1e-42]

Logarithms change the Operators¶

Remember:

We take the logarithm of some probabilities
We do a bunch of maths
We convert the log-probabilities back into normal probabilities

We still haven't talked about the bit where we do the maths!

It's really simple:

Multiplication $\rightarrow$ Addition
Division $\rightarrow$ Subtraction
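In symbols, for any positive numbers $a$ and $b$ (in base 10, to match the rest of these slides):

$$\log_{10}(a \times b) = \log_{10} a + \log_{10} b \qquad \log_{10}\left(\frac{a}{b}\right) = \log_{10} a - \log_{10} b$$

For example, $\log_{10}(100 \times 1000) = \log_{10}(100000) = 5 = 2 + 3$. With the dice: $\log_{10}\left(\frac{1}{20} \times \frac{1}{20}\right) = \log_{10}\frac{1}{20} + \log_{10}\frac{1}{20} \approx -1.301 + (-1.301) = -2.602$, and $10^{-2.602} \approx 0.0025 = \frac{1}{400}$.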

Because we changed the scale (to orders of magnitude),

the operators become:
1. simpler
2. easier on the computer
3. less prone to weird errors (like numbers underflowing to 0.0)

In [45]:

prob_a = 1/(10**50)
prob_b = 1/(10**60)
prob_a * prob_b

Out[45]:

1e-110

Now converting into log-probabilities, adding, and converting back into normal probability:

In [46]:

logprob_a = np.log10(prob_a)
logprob_b = np.log10(prob_b)
log_product = logprob_a + logprob_b
10**log_product

Out[46]:

1e-110

We get the same result! But why should I care?

An example of why we should care¶

When we're doing stuff with Bayesian probability, most of the time:

We don't care about the actual probability that something will happen

The actual probability that something will happen is usually extremely small anyway

Instead, we usually want to compare probabilities, for example:

I need to choose between A and B

Individually, both A and B are highly unlikely

But if A is 100 times as likely as B, I should choose A

Let's see this in action:

In [63]:

prob_A = 10**-100 * 10**-150 * 10**-200
prob_B = 10**-100 * 10**-150 * 10**-202

We want to compare the likelihood of A and B.
What will happen if we divide A by B?

In [64]:

prob_A / prob_B

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-64-7084ccd17c93> in <module>
----> 1 prob_A / prob_B

ZeroDivisionError: float division by zero

Disaster!

Logs to the rescue¶

We can still divide A by B

Using logs of course!

In [65]:

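# Add the logs of the individual factors instead of multiplying the factors
logprob_A = np.log10(10**-100) + np.log10(10**-150) + np.log10(10**-200)
logprob_B = np.log10(10**-100) + np.log10(10**-150) + np.log10(10**-202)
# Dividing probabilities becomes subtracting log-probabilities
logratio = logprob_A - logprob_B
# Convert the log-ratio back into an ordinary ratio
ratio = 10**logratio
print('The probability ratio of A/B=', ratio)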

The probability ratio of A/B= 100.0
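Because $\log_{10}$ always goes up when its input goes up, you don't even need to convert back to make the choice: whichever option has the bigger log-probability is the more likely one. Here's a small sketch of that, reusing the exponents from the example above; `choose_more_likely` is a hypothetical helper, not something from the lab.

```python
import math

def choose_more_likely(log_prob_a, log_prob_b):
    """Pick the option with the larger log10-probability (log10 is increasing)."""
    return "A" if log_prob_a > log_prob_b else "B"

# Add up the log10 of each factor instead of multiplying the factors
logprob_A = sum(math.log10(p) for p in [1e-100, 1e-150, 1e-200])   # about -450
logprob_B = sum(math.log10(p) for p in [1e-100, 1e-150, 1e-202])   # about -452

print(choose_more_likely(logprob_A, logprob_B))   # A
print(10 ** (logprob_A - logprob_B))              # about 100
```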

And that's pretty much it!