WHAT ON EARTH ARE LOG FUNCTIONS

...and why are you making me think about them?

  • The next lab works with something called log probabilities
  • No log jokes please

THE PLAN

  • I will (try to) explain log functions (and why we're using them)...
  • at three levels of detail.

THE PLAN

The levels of detail are:

1) I hate and fear log functions, is this a problem?

2) I want to know why we're using log functions, with minimal mathematics.

3) Maths is just super, and there's a good chance I already understand log functions, but...

      I want to see how linguists do maths.

THE PLAN

IMPORTANT: Level 1 is enough to do everything on this course.

  • Everything else is to support your understanding...
  • ...if you think that helps.
  • I will...
    • leave these slides up
    • put a notebook of these slides on Noteable and the website
    • you can always come back later.

THE PLAN

  • I will go through each level of detail:
    • You can stop at any point and come back later
      • (or not come back at all)
    • Everything is going to be OK.


LEVEL ONE

Q: I hate and fear log functions, is this going to be a problem?

A: No

  • This is not a problem at all.
  • Many historical figures hated and feared logarithms and went on to do great things.

Q: So what do I need to do?

A: Just ignore the word 'log'

  • Whenever you see a log function, just pretend the word isn't there.
  • It's all just probability stuff, except with a trick to make it work on computers.

That's it! But would you like to know a little more?

LEVEL TWO

Q: Actually, can you tell me why we use these log things? With minimal mathematics please?

A: OK then

We multiply probabilities for multiple events.

Look at these awesome dice:


  • The dice have 20 sides each, so...
  • The probability of rolling one of them...
  • and getting a 20 is:
    • $\frac{1}{20}$, or 0.05 as a decimal.
  • the probability of rolling both and getting two 20s is:
    • $\frac{1}{20} \times \frac{1}{20} = \frac{1}{400}$

When you multiply probabilities they get really small

  • $\frac{1}{400}$ is way smaller than $\frac{1}{20}$
  • Written as a decimal, it's 0.0025
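
As a quick check, the two-dice sum can be done in Python. This is only a minimal sketch, and the variable names are made up for illustration. The second printed value may differ from 0.0025 in the very last decimal place because of floating-point rounding.

```python
# Probability of rolling a 20 on one 20-sided die
p_one_twenty = 1 / 20

# Two independent rolls: multiply the two probabilities together
p_two_twenties = p_one_twenty * p_one_twenty

print(p_one_twenty)
print(p_two_twenties)
```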

What if we roll 100 dice?

  • What's the probability of getting all 20s?
  • It's $\frac{1}{20} \times \frac{1}{20} \times \frac{1}{20} \times \frac{1}{20}$ etc etc and on and on...
  • until we've done it a hundred times.
  • A better way to write this is $\frac{1}{20^{100}}$
  • What is it written in decimal?
  • Let's get Python to do the maths:

The maths

What is $\frac{1}{20^{100}}$ written in decimal?

In [1]:
1/(20**100)
Out[1]:
7.888609052210118e-131

How small is that number?

  • Another way to write the number above is $7.9 \times 10^{-131}$
  • Written out in full, that's 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000079
  • That's a really really small number.

What does $7.9 \times 10^{-131}$ mean in terms of probability?

To give an extremely natural and relatable example:

  • it's roughly the same probability as...
  • two people
  • each picking out a single sub-atomic particle
  • at random from anywhere in the entire known universe
  • and choosing the same one...
  • by chance

What's the lesson here?

  • Once you start combining probabilistic events and observations
  • the chance of any particular event...
  • becomes very very small...
  • ...very very quickly.
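
To watch this happen, here is a small sketch that prints the probability of all 20s for a growing number of dice. The particular dice counts are arbitrary choices for illustration.

```python
# Probability of getting all 20s for a growing number of 20-sided dice
for n_dice in [1, 2, 5, 10, 50, 100]:
    prob_all_20s = 1 / (20 ** n_dice)
    print(n_dice, 'dice:', prob_all_20s)
```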

That's really very interesting thank you so much, but why should I care?

  • Let's roll some more dice!
  • Exactly 249 of them in fact, for no particular reason.
  • What are the chances of rolling 249 dice and getting all 20s?
    • Written the fancy way: $\frac{1}{20^{249}}$
    • And written in decimal?
In [2]:
1/(20**249)
Out[2]:
0.0

We broke Python

  • Python has some clever ways of representing very small numbers, but only up to a point.
  • After that point, it just gives up and rounds the number down to exactly 0.0 (this is called underflow).
  • This is not ideal.
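
If you want to see roughly where that point is, Python will tell you. The sketch below assumes the usual setup where Python floats are 64-bit numbers; `still_ok` and `too_small` are just illustrative names.

```python
import sys

# Smallest positive float that Python stores at full precision
# (around 2.2e-308 on typical machines)
print(sys.float_info.min)

still_ok = 1 / (20 ** 200)    # tiny, but still representable
too_small = 1 / (20 ** 249)   # too small: stored as exactly 0.0

print(still_ok > 0)
print(too_small == 0.0)
```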

Log functions are a way for Python to deal with really really small numbers.

  • That's the most important lesson, really.
  • Once numbers get really small, Python starts to break in weird ways.
  • We use logarithms (i.e. log functions) to convert really small numbers into something Python can deal with.
  • When you see the log functions in the notebook, that's all they're doing.

When do we use log functions?

In general, the procedure works like this:

1) Turn all the probabilities into log-probabilities

2) Do all the maths stuff using the log-probabilities (so Python doesn't break in weird ways)

3) When all the maths stuff is finished, turn the log-probabilities back into normal probabilities.
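
Here is a minimal sketch of those three steps for the 100-dice example. It uses `math.log10` from Python's standard library as a stand-in (an assumption; the lab notebook may use a different log function), and it relies on the fact that multiplying probabilities becomes adding log-probabilities.

```python
import math

# Probabilities for 100 dice, each with a 1/20 chance of a 20
probs = [1 / 20] * 100

# 1) Turn all the probabilities into log-probabilities
log_probs = [math.log10(p) for p in probs]

# 2) Do the maths with the log-probabilities:
#    multiplying probabilities becomes adding log-probabilities
total_log_prob = sum(log_probs)

# 3) Turn the result back into a normal probability
prob_all_20s = 10 ** total_log_prob

print(total_log_prob)
print(prob_all_20s)
```

The final value should agree with `1/(20**100)` from earlier, give or take a tiny rounding difference.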

That's it! Onwards...

LEVEL THREE

Give me the maths

Here's what you need to know:

  • Logarithms scale over orders of magnitude
    • $\log_{10}$ of {1,10,100,1000,...} $\rightarrow$ {0,1,2,3,...}
  • Logarithms change operators from:
    • multiplication $\rightarrow$ addition
    • division $\rightarrow$ subtraction
  • Logarithms are the inverse of exponents
    • $10^{exp}$ of {0,1,2,3,...} $\rightarrow$ {1,10,100,1000,...}

Logarithmic scale

Here are the awesome dice again!

We can plot the probability of getting all 20s for increasing numbers of dice:

In [27]:
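import matplotlib.pyplot as plt
import numpy as np  # used in the later cells

# Probability of rolling all 20s with i dice, for i = 0..99
dice_probs = []
for i in range(100):
    dice_probs.append(1/(20**i))
plt.plot(dice_probs)
plt.ylim(-0.05,1.05)
plt.ylabel('Probability of all 20s')
plt.xlabel('Number of dice rolled')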
Out[27]:
Text(0.5, 0, 'Number of dice rolled')

That's incredibly boring and unhelpful: the probabilities get super small really quickly

What if we plot the logarithm of the probabilities?

In [47]:
dice_logprobs = []
for p in dice_probs:
    dice_logprobs.append(np.log10(p))
plt.plot(dice_logprobs)
plt.ylabel('log probability of all 20s')
plt.xlabel('Number of dice rolled')
Out[47]:
Text(0.5, 0, 'Number of dice rolled')

It's still a bit boring, but now it decreases linearly
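
One way to see why the line is straight: each extra die multiplies the probability by $\frac{1}{20}$, which adds the same fixed amount, $\log_{10}\frac{1}{20} \approx -1.3$, to the log probability. The sketch below checks this with `math.log10`; the variable names are illustrative only.

```python
import math

# Log probability contributed by each extra die
log_step = math.log10(1 / 20)

# Log probability of all 20s for 0, 1, 2, 3 and 4 dice
log_probs = [n_dice * log_step for n_dice in range(5)]

# The gap between neighbouring values is always the same size
steps = [b - a for a, b in zip(log_probs, log_probs[1:])]

print(log_step)
print(steps)
```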

Logarithms scale over orders of magnitude

  • For numbers bigger than 1, that means that:
    • 1, 10, 100, 1000, 10000, ... $10^{\text{big number}}$
  • Have logarithms (in base 10) of:
    • 0, 1, 2, 3, 4, ... $\text{big number}$
  • For numbers between 1 and 0, that means that:
    • $\frac{1}{1}$, $\frac{1}{10}$, $\frac{1}{100}$, $\frac{1}{1000}$, $\frac{1}{10000}$, ... $\frac{1}{10^{\text{huge number}}}$
  • Have logarithms (in base 10) of:
    • 0, -1, -2, -3, -4, ... $-\text{huge number}$

That's why logarithm-transformed numbers are easier for computers to represent.
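
A small sketch of what that looks like in practice, using `math.log10` and some made-up tiny probabilities. Even the 249-dice case that rounded to 0.0 earlier has a perfectly ordinary log probability of roughly -324.

```python
import math

# Some very small probabilities (arbitrary examples)
tiny_probs = [1e-10, 1e-100, 1e-300]

# Their base-10 logarithms are modest, easy-to-store numbers
log_tiny_probs = [math.log10(p) for p in tiny_probs]
print(log_tiny_probs)

# 249 dice: work out the log probability directly,
# without ever forming the tiny probability itself
log_prob_249_dice = 249 * math.log10(1 / 20)
print(log_prob_249_dice)
```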

Logarithms are the inverse of exponents

Remember:

  1. We take the logarithm of some probabilities
  2. We do a bunch of maths
  3. We convert the log-probabilities back into normal probabilities

How do we do the last bit?

We convert back by using the log-probability as the exponent

In [41]:
probs = [1, 0.1, 0.01, 0.001, 0.0001, 10**-42]
print('Initial probs:    ', probs)
log_probs = []
converted_probs = []
for p in probs:
    log_probs.append(np.log10(p))
print('Log probabilities:', log_probs)
for lp in log_probs:
    converted_probs.append(10**lp)
print('Converted back:   ', converted_probs)
Initial probs:     [1, 0.1, 0.01, 0.001, 0.0001, 1e-42]
Log probabilities: [0.0, -1.0, -2.0, -3.0, -4.0, -42.0]
Converted back:    [1.0, 0.1, 0.01, 0.001, 0.0001, 1e-42]

Logarithms change the operators

Remember:

  1. We take the logarithm of some probabilities
  2. We do a bunch of maths
  3. We convert the log-probabilities back into normal probabilities

We still haven't talked about the bit where we do the maths!

Logarithms change the operators

It's really simple:

  1. Multiplication $\rightarrow$ Addition
  2. Division $\rightarrow$ Subtraction
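
Written as equations, these are the standard logarithm rules (shown here in base 10, for positive $a$ and $b$):

$$\log_{10}(a \times b) = \log_{10}(a) + \log_{10}(b)$$

$$\log_{10}\left(\frac{a}{b}\right) = \log_{10}(a) - \log_{10}(b)$$

For example, $\log_{10}(100 \times 1000) = \log_{10}(100) + \log_{10}(1000) = 2 + 3 = 5$, and $10^{5} = 100000$.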

Because we changed the scale (to orders of magnitude),

  • the operations become:
    1. simpler
    2. easier on the computer
    3. less prone to weird errors
In [45]:
prob_a = 1/(10**50)
prob_b = 1/(10**60)
prob_a * prob_b
Out[45]:
1e-110

Now converting into log-probabilities, adding, and converting back into normal probability:

In [46]:
logprob_a = np.log10(prob_a)
logprob_b = np.log10(prob_b)
log_product = logprob_a + logprob_b
10**log_product
Out[46]:
1e-110
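
The same trick works for division, which turns into subtraction. This sketch reuses the two probabilities above but swaps in `math.log10` from the standard library, so it runs without NumPy. Both printed values should come out at about $10^{10}$, possibly with a tiny floating-point rounding difference.

```python
import math

prob_a = 1 / (10 ** 50)
prob_b = 1 / (10 ** 60)

# Ordinary division of the two probabilities
direct_ratio = prob_a / prob_b

# The same thing in log space: subtract, then convert back
log_ratio = math.log10(prob_a) - math.log10(prob_b)
ratio_from_logs = 10 ** log_ratio

print(direct_ratio)
print(ratio_from_logs)
```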

We get the same result! But why should I care?

An example of why we should care

When we're doing stuff with Bayesian probability, most of the time:

  • We don't care about the actual probability that something will happen
  • The actual probability something will happen is extremely small

Instead, we usually want to compare probabilities, for example:

  • I need to choose between A and B
  • Individually, both A and B are highly unlikely
  • But if A is 100 times as likely as B, I should choose A

Let's see this in action:

In [63]:
prob_A = 10**-100 * 10**-150 * 10**-200
prob_B = 10**-100 * 10**-150 * 10**-202
  • We want to compare the likelihood of A and B.
  • What will happen if we divide A by B?
In [64]:
prob_A / prob_B
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-64-7084ccd17c93> in <module>
----> 1 prob_A / prob_B

ZeroDivisionError: float division by zero

Disaster!

Logs to the rescue

  • We can still divide A by B
  • Using logs of course!
In [65]:
logprob_A = np.log10(10**-100) + np.log10(10**-150) + np.log10(10**-200)
logprob_B = np.log10(10**-100) + np.log10(10**-150) + np.log10(10**-202)
logratio = logprob_A - logprob_B
ratio = 10**logratio
print('The probability ratio of A/B=', ratio)
The probability ratio of A/B= 100.0
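
If all you need is to choose between A and B, you often don't have to convert back at all: a bigger log probability always means a bigger probability. A minimal sketch of that idea follows. The helper `log10_prob` is hypothetical (it is not part of the lab notebook), and it uses `math.log10` rather than NumPy.

```python
import math

def log10_prob(factors):
    """Log (base 10) of a product of probabilities, computed as a sum of logs."""
    return sum(math.log10(f) for f in factors)

# The individual probabilities that get multiplied together for A and B
factors_A = [1e-100, 1e-150, 1e-200]
factors_B = [1e-100, 1e-150, 1e-202]

# Subtracting log probabilities gives the log of the ratio A/B
log_ratio = log10_prob(factors_A) - log10_prob(factors_B)

# A positive log ratio means A is more likely than B
a_is_more_likely = log_ratio > 0

print(log_ratio)
print(a_is_more_likely)
```

A log ratio of about 2 means A is about $10^{2} = 100$ times as likely as B, which matches the result above.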

And that's pretty much it!