














Table of Contents

Introduction

Task 1

Task 2

Task 3

Task 4

Task 5

Task 6

Task 7

Task 8

Conclusion

References



List of figures

Figure 1: Graph of count vs. longevity

Figure 2: Graph of close relation vs. longevity

Figure 3: Graph of social integration vs. longevity

Figure 4: Graph of IQ ranges vs. longevity

Figure 5: Linear model

Figure 6: Summary Linear model

Figure 7: Standardized residuals

Figure 8: Visualization in linear regression

Figure 9: Two-sample t-test

Figure 10: Summary of the quadratic model

Figure 11: Graph of residual vs. leverage

Figure 12: Summary of cubic model

Figure 13: Visualization of cubic model

Figure 14: Multivariate regression

Figure 15: Visualization of Multivariate regressor

Figure 16: Prediction of close_relationship

Figure 17: Graph of close relation with longevity

Figure 18: Prediction of social_integration

Figure 19: Graph of social integration with longevity

Figure 20: Prediction of IQ

Figure 21: Graph of IQ with longevity

Figure 22: Multivariate_lr with close relation and social integration

Figure 23: Visualization of multivariate regression with two variables





Introduction

The main purpose of this project is to analyze a dataset on the health and longevity of dolphins. The dataset was loaded, modeled, and visualized in R, with RStudio as the working environment. The aim is to identify health risks in dolphins and to understand how they relate to longevity by analyzing them alongside GPS tracking. The work is divided into several tasks, each designed to extract a specific result from the dataset (Alfons, 2021). Across these tasks, several statistical approaches are applied and visualized to understand how the other variables relate to longevity.

Task 1

In this task the dataset is imported into RStudio and cleaned in a preprocessing stage so that it is ready for building the models, beginning with a linear model. The dataset is then visualized (Jensen and Panduro, 2018). The plots below show the distribution of longevity and its relationship with each of the other variables.

Figure 1: Graph of count vs. longevity

(Source: Self-developed)

The graph above shows the distribution of longevity: the count of observations is plotted against longevity as a distribution plot.



Figure 2: Graph of close relation vs. longevity

(Source: Self-developed)

This graph, produced in RStudio, plots close relationships against longevity to show how the two are related.


Figure 3: Graph of social integration vs. longevity

(Source: Self-developed)

This graph plots social integration against longevity, so that any pattern between how socially integrated a dolphin is and how long it lives can be read directly from the plot.


Figure 4: Graph of IQ ranges vs. longevity

(Source: Self-developed)

The box plot shows how longevity is distributed within each IQ range.
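The four plots of this task can be sketched in base R as follows. The original dataset is not reproduced here, so the sketch uses a small synthetic data frame; the column names (`longevity`, `close_relationships`, `social_integration`, `IQ`), the simulated values, and the choice of four IQ ranges are all assumptions made for illustration.

```r
# Synthetic stand-in for the dolphin dataset (names and values are assumptions)
set.seed(1)
n <- 200
dolphins <- data.frame(
  close_relationships = runif(n, 0, 10),
  social_integration  = runif(n, 0, 10),
  IQ                  = rnorm(n, mean = 100, sd = 15)
)
dolphins$longevity <- 20 + 2.5 * dolphins$close_relationships +
  1.5 * dolphins$social_integration + rnorm(n, sd = 4)

# Bin IQ into ranges so that it can be shown as a box plot
dolphins$IQ_range <- cut(dolphins$IQ, breaks = 4)

# Distribution of longevity (count vs. longevity)
hist(dolphins$longevity, main = "Distribution of longevity", xlab = "longevity")

# Scatter plots of the two social variables against longevity
plot(longevity ~ close_relationships, data = dolphins)
plot(longevity ~ social_integration, data = dolphins)

# Longevity within each IQ range
boxplot(longevity ~ IQ_range, data = dolphins, xlab = "IQ range")
```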

Task 2

The data are split into training and test sets, and a linear model is then fitted to relate longevity to all the other variables in the dataset. Predictions from the fitted model are computed as well.
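A minimal sketch of a train/test split followed by a linear fit is given below. The 80/20 split ratio, the synthetic data, and the column names are assumptions; the report does not state which split was used.

```r
# Synthetic stand-in for the dolphin dataset (names and values are assumptions)
set.seed(42)
n <- 200
dolphins <- data.frame(
  close_relationships = runif(n, 0, 10),
  social_integration  = runif(n, 0, 10),
  IQ                  = rnorm(n, mean = 100, sd = 15)
)
dolphins$longevity <- 20 + 2.5 * dolphins$close_relationships +
  1.5 * dolphins$social_integration + rnorm(n, sd = 4)

# Assumed 80/20 split: sample row indices for the training set
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- dolphins[train_idx, ]
test  <- dolphins[-train_idx, ]

# Fit longevity against all other columns on the training rows only
fit <- lm(longevity ~ ., data = train)

# Predict on the held-out rows and measure the error
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$longevity - pred)^2))
rmse
```

Evaluating on rows the model has not seen gives a less optimistic error estimate than the residuals of the training fit.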


Figure 5: Linear model

(Source: Self-developed)

The linear model is built to obtain the best possible fit to the dataset, and the correlations with longevity are calculated alongside it (Gorzelany et al. 2022). The correlation between social integration and longevity is 0.87, between close relationships and longevity 0.92, and between IQ and longevity -0.043.
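Correlations of this kind can be obtained with `cor()`. The sketch below runs on synthetic data with assumed column names, so its values will not match the figures reported above.

```r
# Synthetic stand-in for the dolphin dataset (names and values are assumptions)
set.seed(3)
n <- 200
dolphins <- data.frame(
  close_relationships = runif(n, 0, 10),
  social_integration  = runif(n, 0, 10),
  IQ                  = rnorm(n, mean = 100, sd = 15)
)
dolphins$longevity <- 20 + 2.5 * dolphins$close_relationships +
  1.5 * dolphins$social_integration + rnorm(n, sd = 4)

# Pearson correlation of every column with longevity
cors <- cor(dolphins)[, "longevity"]
round(cors, 2)
```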

Figure 6: Summary Linear model

(Source: Self-developed)

The summary of the linear model reports the distribution of the residuals: the minimum is -13.33, the median is -0.45, and the maximum is 15.28.

Figure 7: Standardized residuals

(Source: Self-developed)

The diagnostic plots of the linear model show its standardized residuals, including the scale-location plot and the normal Q-Q plot.
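As a sketch of how such diagnostics are usually produced in R, `rstandard()` returns the standardized residuals and `plot()` on a fitted `lm` object draws the standard diagnostic panels. The data and column names below are synthetic assumptions.

```r
# Synthetic stand-in for the dolphin dataset (names and values are assumptions)
set.seed(5)
n <- 200
dolphins <- data.frame(
  close_relationships = runif(n, 0, 10),
  social_integration  = runif(n, 0, 10),
  IQ                  = rnorm(n, mean = 100, sd = 15)
)
dolphins$longevity <- 20 + 2.5 * dolphins$close_relationships +
  1.5 * dolphins$social_integration + rnorm(n, sd = 4)

fit <- lm(longevity ~ close_relationships + social_integration + IQ,
          data = dolphins)

# Standardized residuals: residuals scaled by their estimated standard deviation
std_res <- rstandard(fit)
summary(std_res)

# Default panels: residuals vs. fitted, normal Q-Q, scale-location,
# and residuals vs. leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```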


Figure 8: Visualization in linear regression

(Source: Self-developed)

This visualization shows the fitted regression line, which gives the best prediction of longevity from the data in the least-squares sense. The same fit can also be viewed as the result of an optimization, for example by gradient descent.

Task 3

The confidence in these models is assessed, and the calculations are visualized so that the results of the model built in RStudio can be validated. A hypothesis test is conducted to make inferences about a population parameter under stated assumptions (Medina et al. 2019).

Figure 9: Two-sample t-test

(Source: Self-developed)

A two-sample t-test is used for the hypothesis test. The sample estimates are a mean of 0.16 for x and a mean of 0.123 for y, tested under the null hypothesis that the true difference in means is zero. The test supports the purpose of building the linear regression, which is to determine how health issues should be treated so that longevity can be increased.
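A two-sample t-test with a null difference of zero can be run with `t.test()`. The report does not say what x and y contain, so the two samples below are hypothetical draws chosen only to have means near the reported values.

```r
# Hypothetical samples; the real x and y from the report are not available
set.seed(7)
x <- rnorm(50, mean = 0.16, sd = 1)
y <- rnorm(50, mean = 0.12, sd = 1)

# Welch two-sample t-test (the default), H0: true difference in means is 0
res <- t.test(x, y, mu = 0)

res$estimate   # sample means of x and y
res$statistic  # t statistic
res$p.value    # p-value for the null hypothesis
res$conf.int   # 95% confidence interval for the difference in means
```

A large p-value means the data give no evidence against a zero difference in means; it does not show that the means are equal.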

Task 4

The model with the best accuracy is used to make predictions from the dataset, so that dolphins at high risk can be detected (Hosseinzadeh et al. 2020). Such predictions make it possible to anticipate future outcomes and to act on high health risks early, with the goal of increasing longevity. The statistical procedures used here are those involved in training and evaluating the linear model.



Task 5

The linear model was chosen as the best model on the basis of the model summaries (Gaillard et al. 2019). Quadratic and cubic models are built here for comparison.

Figure 10: Summary of the quadratic model

(Source: Self-developed)

The summary of the quadratic regression describes how well a second-degree model fits this dataset. The quadratic formula, in which close_relationships2 denotes the squared term, is:

Formula = longevity ~ close_relationships + close_relationships2
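One way to fit such a quadratic model in R, without first creating a separate squared column, is to wrap the squared term in `I()`. This is a sketch on synthetic data with assumed column names, not necessarily the way the original model was specified.

```r
# Synthetic stand-in with a mild curvature (names and values are assumptions)
set.seed(11)
n <- 200
dolphins <- data.frame(close_relationships = runif(n, 0, 10))
dolphins$longevity <- 20 + 3 * dolphins$close_relationships -
  0.1 * dolphins$close_relationships^2 + rnorm(n, sd = 4)

# I() protects ^ so that it means arithmetic squaring inside the formula
quad_fit <- lm(longevity ~ close_relationships + I(close_relationships^2),
               data = dolphins)
coef(quad_fit)

# Without I(), ^ is formula syntax and the squared term silently disappears
no_square <- lm(longevity ~ close_relationships + close_relationships^2,
                data = dolphins)
length(coef(no_square))
```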


Figure 11: Graph of residual vs. leverage

(Source: Self-developed)

The residuals vs. leverage graph helps to identify the weaknesses of the model, such as influential observations, so that it can be improved accordingly.


Figure 12: Summary of cubic model

(Source: Self-developed)

The cubic model is fitted to see whether it improves on the linear regression for this dataset. Its summary is used to estimate the errors and reports the minimum and maximum residuals.


Figure 13: Visualization of cubic model

(Source: Self-developed)

The visualization of the cubic regression plots close relationships against longevity, together with the fitted cubic curve.
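The comparison between the linear, quadratic, and cubic fits can be sketched with `poly()` and `anova()`. The data are synthetic and the column names are assumptions, so the outcome of the comparison here says nothing about the real dataset.

```r
# Synthetic stand-in (names and values are assumptions)
set.seed(13)
n <- 200
dolphins <- data.frame(close_relationships = runif(n, 0, 10))
dolphins$longevity <- 20 + 3 * dolphins$close_relationships -
  0.1 * dolphins$close_relationships^2 + rnorm(n, sd = 4)

# Nested polynomial models of increasing degree
lin_fit   <- lm(longevity ~ close_relationships, data = dolphins)
quad_fit  <- lm(longevity ~ poly(close_relationships, 2, raw = TRUE),
                data = dolphins)
cubic_fit <- lm(longevity ~ poly(close_relationships, 3, raw = TRUE),
                data = dolphins)

# F-tests between successive models, and AIC as a second criterion
anova(lin_fit, quad_fit, cubic_fit)
AIC(lin_fit, quad_fit, cubic_fit)

# Scatter plot with the fitted cubic curve drawn in order of x
plot(longevity ~ close_relationships, data = dolphins)
ord <- order(dolphins$close_relationships)
lines(dolphins$close_relationships[ord], fitted(cubic_fit)[ord], lwd = 2)
```

A higher-degree model never fits the training data worse, so the F-test or AIC is what indicates whether the extra term is worth keeping.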

Task 6

Multivariate regression is one of the most useful techniques for measuring the degree to which several independent variables are related to a dependent variable. Here it is used to quantify how the variables in the dolphin dataset relate to longevity.



Figure 14: Multivariate regression

(Source: Self-developed)

A multivariate regression is fitted to the dataset, and its summary gives the coefficient estimates, their standard errors, the t values, and the Pr(>|t|) values used in predicting the longevity of dolphins (El Aissaoui et al. 2019). The minimum residual is -15.483, the maximum is 18.791, and the standard error is 0.8889.
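The quantities listed above can be read from the object returned by `summary()` on a fitted model. The following sketch uses synthetic data and assumed column names, so its numbers differ from those reported.

```r
# Synthetic stand-in for the dolphin dataset (names and values are assumptions)
set.seed(17)
n <- 200
dolphins <- data.frame(
  close_relationships = runif(n, 0, 10),
  social_integration  = runif(n, 0, 10),
  IQ                  = rnorm(n, mean = 100, sd = 15)
)
dolphins$longevity <- 20 + 2.5 * dolphins$close_relationships +
  1.5 * dolphins$social_integration + rnorm(n, sd = 4)

fit <- lm(longevity ~ close_relationships + social_integration + IQ,
          data = dolphins)
s <- summary(fit)

s$coefficients         # Estimate, Std. Error, t value, Pr(>|t|) per term
range(residuals(fit))  # minimum and maximum residual
s$sigma                # residual standard error
```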


Figure 15: Visualization of Multivariate regressor

(Source: Self-developed)

Here longevity is plotted against the multiple variables of the dataset, together with the fit of the multivariate model.

Task 7

Predictions are made from the historical data, and the fitted models are used to forecast future outcomes so that the approach can be refined accordingly. The p-value of each variable is also calculated (Chen et al. 2020). The prediction value for close_relationship is above 95% (reported as 111%), which makes it the best predictor in comparison with social_integration; the predicted value for IQ is lower than 95%.
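A sketch of per-variable p-values and of predictions with 95% intervals is shown below. It fits one single-predictor model per variable, which is an assumption about how the comparison was made; the data, the column names, and the new observations are hypothetical.

```r
# Synthetic stand-in for the dolphin dataset (names and values are assumptions)
set.seed(19)
n <- 200
dolphins <- data.frame(
  close_relationships = runif(n, 0, 10),
  social_integration  = runif(n, 0, 10),
  IQ                  = rnorm(n, mean = 100, sd = 15)
)
dolphins$longevity <- 20 + 2.5 * dolphins$close_relationships +
  1.5 * dolphins$social_integration + rnorm(n, sd = 4)

# p-value of the slope in a separate single-predictor model per variable
predictors <- c("close_relationships", "social_integration", "IQ")
p_values <- sapply(predictors, function(v) {
  f <- lm(reformulate(v, response = "longevity"), data = dolphins)
  summary(f)$coefficients[v, "Pr(>|t|)"]
})
p_values

# Predictions with 95% prediction intervals for hypothetical new dolphins
fit_cr  <- lm(longevity ~ close_relationships, data = dolphins)
new_obs <- data.frame(close_relationships = c(2, 5, 8))
pred <- predict(fit_cr, newdata = new_obs,
                interval = "prediction", level = 0.95)
pred
```

The `lwr` and `upr` columns bound where a single new observation is expected to fall, which is wider than a confidence interval for the mean response.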


Figure 16: Prediction of close_relationship

(Source: Self-developed)


Figure 17: Graph of close relation with longevity

(Source: Self-developed)


Figure 18: Prediction of social_integration

(Source: Self-developed)


Figure 19: Graph of social integration with longevity

(Source: Self-developed)


Figure 20: Prediction of IQ

(Source: Self-developed)



Figure 21: Graph of IQ with longevity

(Source: Self-developed)

Task 8

The multivariate regression here is built around close relation, the best predictor variable, and all the calculations are made accordingly.
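Whether adding a second predictor improves on the best single predictor can be checked by comparing adjusted R-squared values, as in this sketch. The data and column names are synthetic assumptions.

```r
# Synthetic stand-in for the dolphin dataset (names and values are assumptions)
set.seed(23)
n <- 200
dolphins <- data.frame(
  close_relationships = runif(n, 0, 10),
  social_integration  = runif(n, 0, 10)
)
dolphins$longevity <- 20 + 2.5 * dolphins$close_relationships +
  1.5 * dolphins$social_integration + rnorm(n, sd = 4)

# Best single predictor alone, then with social integration added
fit_one <- lm(longevity ~ close_relationships, data = dolphins)
fit_two <- lm(longevity ~ close_relationships + social_integration,
              data = dolphins)

# Adjusted R-squared penalizes the extra term, unlike plain R-squared
adj_one <- summary(fit_one)$adj.r.squared
adj_two <- summary(fit_two)$adj.r.squared
c(one_predictor = adj_one, two_predictors = adj_two)
```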

Figure 22: Multivariate_lr with close relation and social integration

(Source: Self-developed)



Figure 23: Visualization of multivariate regression with two variables

(Source: Self-developed)


Conclusion

In this project a linear model was built and the correlations between the variables were examined. Multivariate linear regression was then used to predict longevity from the dataset (Castilla et al. 2020). In this way the main objectives of the research were met, with the overall goal of improving the longevity of dolphins.




References

Alfons, A., 2021. robustHD: An R package for robust regression with high-dimensional data. Journal of Open Source Software, 6(67), p.3786.

Castilla, E., Ghosh, A., Jaenada, M. and Pardo, L., 2020. On regularization methods based on Rényi's pseudodistances for sparse high-dimensional linear regression models. arXiv preprint arXiv:2007.15929.

Chen, J., de Hoogh, K., Gulliver, J., Hoffmann, B., Hertel, O., Ketzel, M., Weinmayr, G., Bauwelinck, M., van Donkelaar, A., Hvidtfeldt, U.A. and Atkinson, R., 2020. Development of Europe-wide models for particle elemental composition using supervised linear regression and random forest. Environmental science & technology, 54(24), pp.15698-15709.

El Aissaoui, O., El Alami El Madani, Y., Oughdir, L., Dakkak, A. and El Allioui, Y., 2019, July. A multiple linear regression-based approach to predict student performance. In International Conference on Advanced Intelligent Systems for Sustainable Development (pp. 9-23). Springer, Cham.

Fernández-Delgado, M., Sirsat, M.S., Cernadas, E., Alawadi, S., Barro, S. and Febrero-Bande, M., 2019. An extensive experimental survey of regression methods. Neural Networks, 111, pp.11-34.

Gaillard, P., Gerchinovitz, S., Huard, M. and Stoltz, G., 2019, March. Uniform regret bounds over $\mathbb{R}^d$ for the sequential linear regression problem with the square loss. In Algorithmic Learning Theory (pp. 404-432). PMLR.

Gorzelany, J., Belcar, J., Kuźniar, P., Niedbała, G. and Pentoś, K., 2022. Modelling of Mechanical Properties of Fresh and Stored Fruit of Large Cranberry Using Multiple Linear Regression and Machine Learning. Agriculture, 12(2), p.200.

Hosseinzadeh, A., Baziar, M., Alidadi, H., Zhou, J.L., Altaee, A., Najafpoor, A.A. and Jafarpour, S., 2020. Application of artificial neural network and multiple linear regression in modeling nutrient recovery in vermicompost under different conditions. Bioresource technology, 303, p.122926.

Jensen, C.U. and Panduro, T.E., 2018. PanJen: an R package for ranking transformations in a linear regression. R Journal, 10(1).

Medina, L., Kreutzmann, A.K., Rojas-Perilla, N. and Castro, P., 2019. The R Package trafo for Transforming Linear Regression Models. R Journal, 11(2), p.99.


