In math I’ve been working on bivariate data. Bivariate data analysis is the study of the relationship between two variables and how one affects the other. This is done by plotting the data collected on the two variables on a graph, usually a scatter plot. For example, we could study the relationship between sea temperature and penguin population. The sea temperature would be shown on the X axis because it is the independent variable: it is not affected by the other variable, but changing it will affect the other variable. The penguin population changes with the sea temperature, so it is the dependent variable and is shown on the Y axis.
Other parts of this study involve identifying outliers (data points that do not fit the overall trend of the rest of the data) and making predictions. For example, if we find that the penguin population decreases as the sea temperature rises, we can predict the population at a temperature not shown on the graph by extending the trend in the existing data. Let’s say that the population of penguins is, on average, equal to 1 million minus 31200 times the temperature of the sea in degrees. With this model, even if there is no data beyond 7 degrees, we can predict that if the temperature of the sea is 8 degrees, for example, there will be approximately 750400 penguins alive, because
1000000-(8*31200)=750400
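As a rough sketch, the same calculation can be written as a small Python function. The function name `predict_population` and the constant names are made up for illustration; only the numbers come from the model described above.

```python
# Model from the text: population = 1,000,000 - 31,200 * temperature.
# INTERCEPT, SLOPE and predict_population are illustrative names only.
INTERCEPT = 1_000_000  # predicted population at 0 degrees
SLOPE = -31_200        # change in population for each extra degree


def predict_population(sea_temp_degrees):
    """Return the predicted penguin population for a given sea temperature."""
    return INTERCEPT + SLOPE * sea_temp_degrees


print(predict_population(8))  # 750400
```

Writing the model this way makes it easy to try other temperatures, although predictions far outside the range of the collected data become less trustworthy.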
This prediction may not be exactly right in any single case, but it should be accurate on average. Many other factors that influence the number of living penguins are not captured by these two variables. For the purposes of the model they are assumed to stay constant, and are known as “control variables”. Because in reality they do change, they are usually discussed in the conclusions drawn from the model.
A simplified conclusion for the example used could be: In conclusion, the number of penguins appears to have a negative (downward) linear relationship with the temperature of the ocean: as the temperature of the ocean increases, the number of penguins decreases. The data points lie close enough to the idealized trend line for there to be a distinct relationship between the sea temperature and penguin population.
I predict that, based on this model, if the temperature of the sea is 8 degrees, there will be approximately 750400 penguins alive, with a confidence interval of plus or minus 20000 for the response variable (the penguin population).
The confidence interval is how much the result can vary, that is, the level of uncertainty in the result. This means that the actual population could be as much as 20000 more than the predicted value (770400), as much as 20000 fewer (730400), or anywhere in between. 750400 is the middle of this range and the model’s best single estimate.
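The range described above can be worked out with a short Python sketch. The function name `prediction_range` is hypothetical, and it simply assumes the result can vary by the same amount above and below the prediction.

```python
def prediction_range(predicted, margin):
    """Return the (lowest, highest) values when the prediction can vary by +/- margin."""
    return predicted - margin, predicted + margin


# Numbers from the example: 750400 penguins, give or take 20000.
low, high = prediction_range(750_400, 20_000)
print(low, high)  # 730400 770400
```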