Logistic regression doesn't work if there is a 0 in the data.

Koen Van de Moortel shared this problem 3 months ago
Not a Problem

Logistic regression doesn't work if there is a 0 in the data. That is because the log of the data is taken, and that is the wrong way to do regression.

Comments (9)

I don't see any log() here https://github.com/geogebra...


Do you have a suggestion / reference for a better algorithm?

I don't see it either, but it may be hidden in some routine. I've seen the same problem elsewhere too. Even for ordinary exponential regression, everybody seems to have the bad habit of reducing it to linear regression by first taking the logs of the data.

This causes the log(0) error, which is not only annoying (a measured 0 is a legitimate measurement), but also gives the data points the wrong weights.
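
A minimal sketch of the log shortcut being criticised, for y = a·e^(bx): fit a straight line to (x, ln y) and transform the intercept back. The function name exp_fit_via_logs and the sample values are made up for illustration; this is not claimed to be GeoGebra's code.

```python
import math


def exp_fit_via_logs(xs, ys):
    """Fit y = a*exp(b*x) by a straight-line fit to (x, ln y): the log shortcut."""
    # math.log raises ValueError for y <= 0, so a measured 0 stops the fit here
    ls = [math.log(y) for y in ys]
    n = len(xs)
    mx, ml = sum(xs) / n, sum(ls) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxl = sum((x - mx) * (l - ml) for x, l in zip(xs, ls))
    b = sxl / sxx               # slope of ln y against x
    a = math.exp(ml - b * mx)   # intercept transformed back to the y scale
    return a, b


xs = [0, 1, 2, 3]
print(exp_fit_via_logs(xs, [2 * math.exp(0.5 * x) for x in xs]))  # approx (2.0, 0.5)
try:
    exp_fit_via_logs(xs, [4.0, 1.5, 0.5, 0.0])  # data containing a measured 0
except ValueError as err:
    print("log shortcut failed:", err)
```

On the weights: to first order, ln y_i - ln ŷ_i ≈ (y_i - ŷ_i)/ŷ_i, so minimizing the squared log residuals roughly minimizes Σ((y_i - ŷ_i)/ŷ_i)². Each squared deviation is divided by about ŷ_i², which gives the smallest values far more weight than an ordinary least-squares fit would.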

Using pure iteration, it works correctly.

Do you have a reference for a better algorithm?


Yes, my own. It searches for the minimum of the sum of the squares of the deviations purely by iteration.
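
Written out, the quantity being minimized for a model f(x; p_1, ..., p_m) and data points (x_i, y_i), i = 1, ..., n (the symbol names are just notation chosen here):

$$
S(p_1,\dots,p_m) = \sum_{i=1}^{n} \bigl(y_i - f(x_i; p_1,\dots,p_m)\bigr)^2
$$

A measured y_i = 0 enters this sum like any other value, since no logarithm is taken.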

Is it secret or would you like to share it? :)

I can't share the code, but the essence of the method is not complicated. Just as the Newton-Raphson method brings you closer to a zero by calculating the zero of the tangent line, you can come closer to a minimum (of S, the sum of the squares of the deviations) by calculating the minimum of a tangent parabola. So you vary each parameter p by plus and minus dp and calculate the parabola through (p-dp, S(p-dp)), (p, S(p)) and (p+dp, S(p+dp)), which gives S = ap² + bp + c. Then -b/(2a) is the next value for p. And so on. Of course, you have to add some code to check that the p values stay within the specified limits. I invented the method 30 years ago, but I guess others did it too?
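
A minimal Python sketch of the method as described, which amounts to successive parabolic interpolation applied to one parameter at a time. The poster's code is not public, so the name parabolic_search, the rule of halving dp when no trial value lowers S, the stopping rule and the clamping to limits are all assumptions. For equally spaced probes, the vertex -b/(2a) works out to p - dp·(S(p+dp) - S(p-dp)) / (2·(S(p+dp) - 2S(p) + S(p-dp))), which is what the code uses; a move is only kept if it lowers S.

```python
from __future__ import annotations

import math
from typing import Callable, Sequence


def parabolic_search(
    S: Callable[[list[float]], float],
    p0: Sequence[float],
    dp0: Sequence[float],
    lower: Sequence[float] | None = None,
    upper: Sequence[float] | None = None,
    max_sweeps: int = 1000,
    min_dp: float = 1e-9,
) -> list[float]:
    """Minimize S by moving one parameter at a time to a parabola's vertex."""
    p = [float(v) for v in p0]
    dp = [float(h) for h in dp0]
    n = len(p)
    lo = list(lower) if lower is not None else [-math.inf] * n
    hi = list(upper) if upper is not None else [math.inf] * n

    def s_at(j: int, v: float) -> float:
        # S with only parameter j replaced by the value v
        q = p.copy()
        q[j] = v
        return S(q)

    s0 = S(p)
    for _ in range(max_sweeps):
        if all(h < min_dp for h in dp):
            break  # every step size has shrunk below the tolerance
        for j in range(n):
            h = dp[j]
            # symmetric probes (not clamped, so the vertex formula holds)
            sm, sp = s_at(j, p[j] - h), s_at(j, p[j] + h)
            trials = [p[j] - h, p[j] + h]
            curv = sp - 2.0 * s0 + sm  # > 0 means the parabola opens upward
            if curv > 0:
                # vertex of the parabola through (p-h, sm), (p, s0), (p+h, sp)
                trials.append(p[j] - h * (sp - sm) / (2.0 * curv))
            # keep every candidate value inside the specified limits
            trials = [min(max(v, lo[j]), hi[j]) for v in trials]
            best_s, best_v = min((s_at(j, v), v) for v in trials)
            if best_s < s0:
                step = abs(best_v - p[j])
                p[j], s0 = best_v, best_s
                dp[j] = max(min(h, step), min_dp)  # probe closer as steps shrink
            else:
                dp[j] = h / 2.0  # no improvement: halve the probe distance
    return p


if __name__ == "__main__":
    # Points from the FitLogistic example in this thread (note the measured 0),
    # fitted with y = A*exp(k*x) purely as an illustration of the method.
    xs = [0, 0.99, 1.96, 2.91, 3.85]
    ys = [232, 40, 5, 1, 0]

    def sq_dev(q: list[float]) -> float:
        return sum((y - q[0] * math.exp(q[1] * x)) ** 2 for x, y in zip(xs, ys))

    A, k = parabolic_search(sq_dev, [200.0, -1.0], [10.0, 0.1],
                            lower=[0.0, -5.0], upper=[1000.0, 5.0])
    print(f"A = {A:.4g}, k = {k:.4g}, S = {sq_dev([A, k]):.4g}")
```

Because every accepted move lowers S, the search never gets worse, but like any local method it can stop at a local minimum, so sensible start values and dp still matter. The probes p ± dp may step up to dp past a limit, so S must be defined there.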

Sorry, that's not enough to go off. Maybe you can make a demo where clicking a button in GeoGebra does one cycle of the iteration?

Sorry, that would be a lot of work. If GeoGebra hired me... (I'm struggling to survive because the corona madness destroyed my business!)

For reference

FitLogistic( {(0, 232), (0.99, 40), (1.96, 5), (2.91, 1), (3.85, 0)} )
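
With these points, one can see where a log bites if the logistic model is linearized. GeoGebra's manual gives the FitLogistic model as a/(1 + b·e^(-kx)); if a is fixed in advance, ln(a/y - 1) = ln b - kx is a straight line in x. This is a hedged illustration of that common shortcut, not GeoGebra's implementation (the source linked earlier in the thread shows no log()); the helper linearize_logistic and the guess a = 300 are hypothetical.

```python
import math

# Points from the FitLogistic example above.
POINTS = [(0, 232), (0.99, 40), (1.96, 5), (2.91, 1), (3.85, 0)]


def linearize_logistic(points, a):
    """Hypothetical helper: map (x, y) to (x, ln(a/y - 1)).

    For y = a / (1 + b*exp(-k*x)) this gives ln(a/y - 1) = ln(b) - k*x,
    a straight line in x. The transform only exists for 0 < y < a.
    """
    out = []
    for x, y in points:
        if not 0 < y < a:
            raise ValueError(f"cannot take ln(a/y - 1) at point ({x}, {y})")
        out.append((x, math.log(a / y - 1)))
    return out


print(linearize_logistic(POINTS[:4], a=300))  # the first four points transform fine
try:
    linearize_logistic(POINTS, a=300)
except ValueError as err:
    print(err)  # the measured 0 at x = 3.85 has no transformed value
```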

© 2020 International GeoGebra Institute