# Feature request: 95% ellipse for bivariate data

themadmathematician shared this idea 8 years ago

It would make a very nice addition to the statistical functions in GG. Something like

StatisticalEllipse[ <List of Points>, <Confidence Level> ]

Have you tried making a custom tool?

Well... To be completely honest I can't wrap my head around the math enough to translate the proposed R-commands to GeoGebra commands. :flushed: I never did enough statistics to really cope with this. I did do enough to realize what a wonderful 2D extension this is to the ordinary concept of confidence intervals though and I'm one of those teachers who believe in letting students get a glimpse of possibilities ahead. It's rather like the RegularPolygon command. You don't have to know how it works, or even be able to do it yourself, to be able to use it to great effect.

Hi,

One of GeoGebra's best features for statistics in multivariate analysis is shown in the attached screenshot. The pink area is the interquartile range (IQR), which, together with the values written at the bottom, nicely visualizes the behaviour of the samples. In short, column B has a tiny variance and a correspondingly narrow min-max range.

Nevertheless, more GeoGebra features would be welcome and appreciated. Please post a screenshot of your idea for bivariate or multivariate analysis, if possible using the data from the attached sheet (solved with different software, i.e. R).

Best regards

Philippe

In the link I gave there is this suggestive image:

I completely agree that boxplots are great, but they really only operate on one variable at a time. If you have x-y data, it would be nice to be able to plot the boxplot for the y-data along the y-axis.

The ellipse is very nice though, as it lies parallel to the regression line and thus shows the spread away from the line rather than the overall spread along the coordinate axes. If the fit is good and the slope is approximately 1, then the ellipse will be narrow, but the boxplots for the x- and y-data will still show big spreads.
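The point about a narrow ellipse alongside wide per-axis spreads can be checked numerically. What follows is a minimal Python sketch (not GeoGebra code) using made-up data near the line y = x; the function names are illustrative assumptions. It compares the standard deviations along the coordinate axes with those along the principal axes of the sample covariance matrix, which are the directions the ellipse's axes follow. Note that the major principal axis is close to, but in general not identical with, the least-squares regression line.

```python
import math


def covariance_2d(points):
    """Return (mean_x, mean_y, sxx, sxy, syy) using the sample (n - 1) convention."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    return mx, my, sxx, sxy, syy


def principal_spreads(points):
    """Standard deviations along the major and minor axes of the covariance ellipse."""
    _, _, sxx, sxy, syy = covariance_2d(points)
    # Eigenvalues of the symmetric 2x2 matrix [[sxx, sxy], [sxy, syy]].
    half_trace = (sxx + syy) / 2
    root = math.sqrt(((sxx - syy) / 2) ** 2 + sxy ** 2)
    lam_major = half_trace + root
    lam_minor = max(half_trace - root, 0.0)  # guard against tiny negative rounding
    return math.sqrt(lam_major), math.sqrt(lam_minor)


# Hypothetical data, invented for illustration: points scattered tightly around y = x.
points = [(0.0, 0.1), (1.0, 0.9), (2.0, 2.1), (3.0, 2.9), (4.0, 4.1), (5.0, 4.9)]

_, _, sxx, _, syy = covariance_2d(points)
sd_x, sd_y = math.sqrt(sxx), math.sqrt(syy)
sd_major, sd_minor = principal_spreads(points)

print(f"spread along x-axis: {sd_x:.3f}, along y-axis: {sd_y:.3f}")
print(f"spread along major axis: {sd_major:.3f}, along minor axis: {sd_minor:.3f}")
```

For data like this, the per-axis spreads are both large while the spread along the minor axis is small, which is the situation where boxplots on x and y look wide even though the ellipse is thin.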

Well, here's the data used in my boxplot example. Column B, the second one, has a small standard deviation, so we expect ... your ellipse to be flattened!

Cheers

Philippe

2.298 4.470

2.330 4.469

2.370 4.471

2.409 4.472

2.449 4.471

2.496 4.472

2.520 4.472

2.559 4.472

2.599 4.472

2.646 4.470

2.678 4.473

2.717 4.474

2.765 4.474

2.789 4.475

2.836 4.476

2.868 4.478

2.899 4.477

2.939 4.477

2.970 4.476

3.002 4.480

3.042 4.479

3.073 4.479

3.113 4.479

3.144 4.477

3.184 4.478

3.208 4.479

3.239 4.483

3.271 4.483

3.310 4.483

3.350 4.483

3.382 4.482

3.421 4.482

3.445 4.486

3.477 4.486

3.516 4.488

3.556 4.489

3.587 4.486

3.619 4.484

3.651 4.487

3.690 4.491

3.714 4.490

3.761 4.490

3.785 4.491

3.824 4.493

3.856 4.490

3.896 4.488

3.935 4.491

3.967 4.490

4.006 4.493

4.030 4.492

4.062 4.494

4.093 4.490

4.141 4.493

4.172 4.494

4.204 4.493

4.244 4.492

4.267 4.497

4.315 4.494

4.354 4.494

4.386 4.494

4.425 4.494

4.465 4.497

4.497 4.497

4.536 4.497

4.584 4.496

4.607 4.499

4.655 4.499

4.678 4.498

4.718 4.498

4.750 4.499

4.789 4.501

4.829 4.499

4.868 4.500

4.916 4.500

4.939 4.502

4.987 4.498

5.018 4.500

5.050 4.499

5.090 4.502

5.137 4.503

5.169 4.503

5.208 4.505

5.256 4.502

5.287 4.504

5.327 4.504

5.374 4.505

5.406 4.504

5.453 4.507

5.493 4.505

5.525 4.506

5.572 4.504

5.604 4.506

5.651 4.506

5.698 4.508

5.722 4.510

5.762 4.509

5.809 4.510

5.841 4.509

5.888 4.511

5.928 4.509

Edit: The article cited above (in the initial post) seems a little cryptic, as shown in the next screenshot.

Well, I don't know R either... :cry:

Here is a visual example of the kind of data I have in mind. The (fake) ellipse is flat, but the boxplots show large spreads anyway because the data does not lie along a coordinate axis. The suggested algorithm finds the deviations along the regression line and orthogonal to it, which says more than the deviations along the axes.

Good! Besides, the proposed feature can be summarized as follows and seems well suited to GeoGebra's capabilities.

Best regards

Philippe

Files: 19.png

Here's another one of my old favourites that I'm pushing for inclusion. It would be great as a fit command: FitEllipseOfConfidence[ <List of Points>, <Confidence Level> ] :D

It's explained better here if you want to make a custom tool http://www.visiondummy.com/...

Maybe this code is doing what you want?

N=100

l1=Sequence[RandomNormal[1, 1], k, 1, N]

l2=Sequence[RandomNormal[1, 1], k, 1, N]

l=Zip[(A,A+B), A, l1, B, l2]

x0=MeanX[l]

y0=MeanY[l]

sxx=Sxx[l]/(N-1)

sxy=Sxy[l]/(N-1)

syy=Syy[l]/(N-1)

c2=InverseChiSquared[2, .95]

Conic[syy, sxx, syy x0^2 - 2sxy x0 y0 + sxx y0^2 - (sxx syy - sxy^2) c2, -2sxy, 2 (sxy y0 - syy x0), 2 (sxy x0 - sxx y0)]

Regards,

Håkan

PS: I do not think it is correct to call this ellipse a confidence ellipse though, but IANAS (I am not a statistician).
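For anyone who wants to check the construction outside GeoGebra, here is a rough Python port of the commands above. It is a sketch under assumptions: it takes GeoGebra's `Conic[a, b, c, d, e, f]` to mean the curve a x² + d xy + b y² + e x + f y + c = 0, it takes `Sxx`, `Sxy` and `Syy` to be sums of squared deviations (hence the division by N - 1), and it replaces `InverseChiSquared[2, p]` with the closed form -2 ln(1 - p), which holds for 2 degrees of freedom. The function names are made up for illustration.

```python
import math
import random


def chi2_quantile_2dof(p):
    """Inverse CDF of the chi-squared distribution with 2 degrees of freedom.

    With 2 degrees of freedom the CDF is 1 - exp(-x / 2), so it inverts in closed form.
    """
    return -2.0 * math.log(1.0 - p)


def sample_stats(points):
    """Means and sample (N - 1) variances/covariance of a list of (x, y) points."""
    n = len(points)
    x0 = sum(x for x, _ in points) / n
    y0 = sum(y for _, y in points) / n
    sxx = sum((x - x0) ** 2 for x, _ in points) / (n - 1)
    sxy = sum((x - x0) * (y - y0) for x, y in points) / (n - 1)
    syy = sum((y - y0) ** 2 for _, y in points) / (n - 1)
    return x0, y0, sxx, sxy, syy


def ellipse_conic(points, level=0.95):
    """Coefficients (a, b, c, d, e, f), in the same order as the Conic command above."""
    x0, y0, sxx, sxy, syy = sample_stats(points)
    c2 = chi2_quantile_2dof(level)
    a = syy
    b = sxx
    c = syy * x0**2 - 2 * sxy * x0 * y0 + sxx * y0**2 - (sxx * syy - sxy**2) * c2
    d = -2 * sxy
    e = 2 * (sxy * y0 - syy * x0)
    f = 2 * (sxy * x0 - sxx * y0)
    return a, b, c, d, e, f


def conic_value(coeffs, x, y):
    """Left-hand side of the conic equation; negative inside the ellipse."""
    a, b, c, d, e, f = coeffs
    return a * x * x + d * x * y + b * y * y + e * x + f * y + c


# Same kind of test data as above: A and B normal with mean 1 and sd 1, points (A, A + B).
random.seed(0)
n = 100
l1 = [random.gauss(1, 1) for _ in range(n)]
l2 = [random.gauss(1, 1) for _ in range(n)]
points = [(p, p + q) for p, q in zip(l1, l2)]

coeffs = ellipse_conic(points)
inside = sum(1 for x, y in points if conic_value(coeffs, x, y) < 0)
print(f"{inside} of {n} points lie inside the ellipse")
```

Counting the points inside the curve gives a quick sanity check: for roughly normal data, the share inside should land near the chosen level, which describes the coverage of the data points themselves.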

Thanks to both of you for the useful help. I've created two tools: one with a variable confidence level, the other fixed at 95% confidence. Both work on lists of points.

It wasn't clear to me exactly what the covariance matrix was, but with Håkan's solution I didn't need to know that, fortunately. I'm most certainly not a statistician, so I'll call them what I want :D

https://ggbm.at/1576503
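For readers with the same question about the covariance matrix, here is a short sketch in the notation of the commands earlier in the thread (x0, y0, sxx, sxy, syy, c2). It assumes the sample (N - 1) convention used there and is meant as an aid to reading the Conic command, not as a full statistical treatment. The first line defines the matrix, the second states the ellipse as a quadratic form, and the third multiplies through by the determinant to give the polynomial whose coefficients go into the command.

```latex
\[
\Sigma = \begin{pmatrix} s_{xx} & s_{xy} \\ s_{xy} & s_{yy} \end{pmatrix},
\qquad
s_{xx} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - x_0)^2,\quad
s_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - x_0)(y_i - y_0),\quad
s_{yy} = \frac{1}{N-1}\sum_{i=1}^{N} (y_i - y_0)^2
\]
\[
\begin{pmatrix} x - x_0 & y - y_0 \end{pmatrix}
\Sigma^{-1}
\begin{pmatrix} x - x_0 \\ y - y_0 \end{pmatrix} = c_2
\]
\[
s_{yy}(x - x_0)^2 - 2 s_{xy}(x - x_0)(y - y_0) + s_{xx}(y - y_0)^2
= \left(s_{xx} s_{yy} - s_{xy}^2\right) c_2
\]
```

Expanding the last line and collecting the terms in x², xy, y², x, y and the constant reproduces the six arguments of the Conic command.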