
COMP5318 DATA MINING & MACHINE LEARNING

Introduction

This assignment describes the models used to build a voting classifier that predicts whether a patient has diabetes. The report analyses the chosen dataset and its attributes, explains the voting classification model, and presents the commands used to run it in Python. The results of each algorithm are reported to show how effective the voting ensemble is. The project also introduces the LSTM method for visualising the business statistics of Apple by analysing one year of customer data.

Figure 1: Importing variables

The command above imports the libraries needed to read the dataset and build the voting ensemble in Python. The "pandas", "numpy" and "seaborn" libraries are imported. They are used to read and explore the chosen dataset, which contains diabetes information along with patient details, and to inspect the types of the variables it contains.

Figure 2: Importing dataset

The dataset above was chosen because it describes diabetic patients, with each column holding one item of patient data. The columns cover pregnancies, blood pressure, skin thickness, insulin level, BMI, age and other diabetes-related information. The dataset also includes the diabetes pedigree function, which helps in predicting the outcome. The analysis is carried out in Python on the Jupyter Notebook platform, and several algorithms are applied to the dataset. The dataset contains only numerical values, which makes the voting ensemble straightforward to implement because no categorical encoding is required. The range of patient parameters it provides gives the algorithms useful training data.
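As a rough illustration of the kind of table described above, the sketch below builds a small data frame and checks that every column is numeric. The column headers are assumptions based on the attributes named in the text, the values are random placeholders rather than patient data, and the file name in the comment is hypothetical.

```python
import numpy as np
import pandas as pd

# Assumed column headers: the report names these attributes but not the exact spelling.
columns = ["Pregnancies", "BloodPressure", "SkinThickness", "Insulin",
           "BMI", "DiabetesPedigreeFunction", "Age", "Outcome"]

# Random placeholder values, NOT real patient data.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 100, size=(10, len(columns))), columns=columns)
df["Outcome"] = rng.integers(0, 2, size=10)  # 1 = diabetic, 0 = not diabetic (assumed coding)

# With a real file the frame would instead come from something like:
#   df = pd.read_csv("diabetes.csv")   # hypothetical file name
print(df.head())
print(df.dtypes)        # confirms that every column is numeric
print(df.isna().sum())  # counts missing values per column
```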

Figure 3: Train test split

The figure above shows the train-test split of the dataset, performed with the train_test_split function from "sklearn". The dataset is split into two parts: one is used to train the algorithms on the diabetes data, and the other is used to test them. As the figure shows, 80% of the data is used for training and 20% for testing the accuracy of the algorithms that predict the chance of diabetes from the patient's health information. Two data frames, x and y, are created for the algorithms: x holds the input variables and y holds the output, which keeps the inputs and the prediction target clearly separated. The train_test_split function is imported from "sklearn.model_selection" and divides the dataset in the proportion the developer specifies. The random_state parameter seeds the random selection of training and test rows, so the same split is reproduced on every run.
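The 80/20 split described above can be sketched as follows. The data frame is a random placeholder with an assumed "Outcome" target column, and the seed value 42 is an arbitrary choice, not one taken from the report.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Placeholder data: 100 rows with three assumed feature columns and a binary target.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(100, 3)), columns=["BMI", "Age", "Insulin"])
df["Outcome"] = rng.integers(0, 2, size=100)

X = df.drop(columns="Outcome")  # input variables
y = df["Outcome"]               # prediction target

# test_size=0.2 keeps 20% of the rows for testing; random_state fixes the shuffle.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
print(X_train.shape, X_test.shape)
```

Calling train_test_split again with the same random_state returns the same rows in each part, which is what makes results comparable across runs.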

Figure 4: Importing KNN classifier model

The figure above shows how the KNN algorithm is implemented and how its confusion matrix is produced for predicting the chance of diabetes from the patient's health information (Mahabub, 2020). The "K neighbours" classifier predicts the chance of diabetes in a simple way, and its accuracy score is straightforward to compute. Precision and recall were checked alongside their macro and weighted averages, and the results are presented with a confusion matrix drawn as a heat map. The KNN algorithm achieved 75% accuracy, which the voting classifier is intended to improve. The "Ypred" data frame stores the values predicted by the algorithm for the test data. These predictions are used to compute the accuracy and support the later voting ensemble.
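A minimal sketch of a KNN workflow of this kind is given below. It uses synthetic data from make_classification as a stand-in for the diabetes dataset, and n_neighbors=5 is an assumed setting, so the scores it prints are not the report's results.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-in data (assumption): 500 rows, 8 numeric features, binary target.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

knn = KNeighborsClassifier(n_neighbors=5)  # assumed number of neighbours
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)               # predicted labels for the test rows

cm = confusion_matrix(y_test, y_pred)      # rows: actual class, columns: predicted class
accuracy = accuracy_score(y_test, y_pred)
print(cm)
print(accuracy)
# The report lists precision and recall per class plus macro and weighted averages.
print(classification_report(y_test, y_pred))
```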

Figure 5: Importing Logistic regression model

The logistic regression model is applied to the same "diabetes" dataset, using the code above in the Jupyter Notebook platform. The platform was chosen because it offers a simple and effective way to develop data visualisations and predictions for the dataset. The figure shows the code used to import and build the logistic regression model, along with the result and accuracy it achieved on the diabetes dataset. The model achieved an accuracy of 75%, and its precision and recall were also reported. This accuracy can be increased by using the voting classifier. The voting ensemble combines different methods, so it is useful to understand the effectiveness of each model implemented in Jupyter Notebook with Python and its libraries.
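The accuracy, precision and recall mentioned above could be computed along these lines. The data is again a synthetic stand-in, and max_iter=1000 is an assumed setting chosen only so that the solver converges.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption), not the diabetes dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

log_reg = LogisticRegression(max_iter=1000)
log_reg.fit(X_train, y_train)
y_pred = log_reg.predict(X_test)

accuracy = accuracy_score(y_test, y_pred)    # share of correct predictions
precision = precision_score(y_test, y_pred)  # of predicted positives, how many are real
recall = recall_score(y_test, y_pred)        # of real positives, how many were found
print(accuracy, precision, recall)
```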

Figure 6: Importing SVM model

The support vector machine (SVM) model is also used to produce a heatmap and confusion matrix for the dataset. It is one of the models later combined in the voting classifier, which raises accuracy by applying soft or hard voting across different algorithms. The figure above shows the code used to import the SVM algorithm and apply it to the dataset collected to predict diabetes from patient health data. The confusion matrix is built from the "y test" and "y pred" data frames and gives a visual comparison of the actual and predicted values of y. The y data frame contains the "output" column, which indicates whether the patient has diabetes. The dataset provides patient skin thickness, blood pressure, age and other variables relevant to diabetes, and the SVM model analyses how the outcome depends on these attributes. The model achieved an accuracy of 79%.
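One way to build the confusion matrix for "y test" and "y pred" is sketched below. The data is synthetic, the RBF kernel is an assumed choice, and the row and column labels are illustrative names.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the diabetes dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

svm = SVC(kernel="rbf", random_state=42)  # assumed kernel
svm.fit(X_train, y_train)
y_pred = svm.predict(X_test)

# Label the matrix so actual and predicted classes are easy to read.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
cm_df = pd.DataFrame(
    cm,
    index=["actual 0", "actual 1"],
    columns=["predicted 0", "predicted 1"],
)
print(cm_df)
```

A labelled table such as cm_df can be passed to seaborn's heatmap function with annot=True to draw the matrix as a heat map.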

Figure 7: Importing different algorithms

The voting classifier implemented in this assignment combines the different models. It can improve on the individual models used to predict the chance of diabetes from the attributes of the patient's health. The figure above shows the code used to import and call the algorithms. The algorithms are imported from the sklearn library, and each classifier is assigned to its own variable so that its confusion matrix and heatmap can be produced separately. Combining the models in this way is intended to improve accuracy. The figure therefore summarises the different models implemented in this work, which was developed in Python in Jupyter Notebook.
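To make the idea of combining models concrete, the sketch below contrasts hard voting (majority of predicted labels) with soft voting (average of predicted probabilities) for a binary problem. The probabilities are invented for illustration and do not come from the report's models.

```python
import numpy as np

# Hypothetical probabilities of class 1 from three classifiers (columns)
# for four patients (rows). These numbers are made up for illustration.
proba = np.array([
    [0.90, 0.60, 0.70],
    [0.95, 0.45, 0.40],
    [0.20, 0.30, 0.10],
    [0.05, 0.55, 0.60],
])

# Hard voting: each classifier casts one vote, and the majority label wins.
votes = (proba >= 0.5).astype(int)
hard = (votes.sum(axis=1) >= 2).astype(int)

# Soft voting: average the probabilities, then threshold the mean.
soft = (proba.mean(axis=1) >= 0.5).astype(int)

print(hard)  # [1 0 0 1]
print(soft)  # [1 1 0 0]
```

The second and fourth rows show where the two schemes disagree: one very confident classifier can outweigh two uncertain ones under soft voting but not under hard voting.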

Figure 8: Importing voting classifier model

The voting classification model is implemented with the commands above in the Jupyter Notebook platform, where the model is given the list of estimators to combine (Ren et al. 2020). The type of voting is also specified in the program, which determines how the predictions of the different models are combined (Park et al. 2020). The model's fit command trains the voting classifier on the training data, after which a confusion matrix is produced to show how well the chance of diabetes is predicted for the chosen dataset. The dataset provides previously collected patient health information, and the prediction uses the train and test data frames in Jupyter Notebook.
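A minimal sketch of such a model with scikit-learn's VotingClassifier follows. The estimator names, their settings and the choice of soft voting are assumptions, and the data is a synthetic stand-in.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the diabetes dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Each estimator is a (name, model) pair; the names are illustrative.
estimators = [
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(probability=True, random_state=42)),  # probability=True is needed for soft voting
]

# voting="soft" averages predicted probabilities; voting="hard" takes the majority label.
voting_clf = VotingClassifier(estimators=estimators, voting="soft")
voting_clf.fit(X_train, y_train)  # trains every estimator on the training data
y_pred = voting_clf.predict(X_test)
print(y_pred[:10])
```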

Figure 9: Checking accuracy for voting classifier

The figure above shows that the voting classifier improved the accuracy of the prediction, with the result presented as a confusion matrix and heat map. The visual presentation makes it easy to compare the results achieved by the different algorithms. The voting classifier achieved 81% accuracy, compared with 75% for KNN, 75% for logistic regression and 79% for SVM. The ensemble therefore improved on the accuracy of the individual models for predicting the chance of diabetes from the patient's health information.
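A comparison of this kind can be sketched by fitting each model and the ensemble on the same split and collecting their accuracy scores. The data is synthetic and the settings are assumed, so the printed scores will differ from the figures reported above, and the ensemble is not guaranteed to come out on top.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Synthetic stand-in data (assumption), not the diabetes dataset.
X, y = make_classification(n_samples=500, n_features=8, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

base = [
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
    ("svm", SVC(random_state=42)),
]
voting = VotingClassifier(estimators=base, voting="hard")  # majority vote

# Fit every model on the same training rows and score it on the same test rows.
scores = {}
for name, model in base + [("voting", voting)]:
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```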

LSTM

Long short-term memory (LSTM) can be used to visualise the sales and revenue data of Apple, which helps in understanding the company's current and future business. Python and R scripts can be used to visualise the data while maintaining the accuracy of the models (Yousaf et al. 2020). The process can be carried out in Python on the Google Colab online platform. LSTM is a kind of recurrent neural network (RNN). In an RNN, the output from the previous step is fed as input to the current step. LSTM addresses the problem of long-term dependencies in RNNs: a plain RNN cannot recall information stored far back in the sequence, although it can make accurate predictions from recent information, and its performance degrades as the gap length increases. LSTM can retain information for a long period of time. It is used for processing, predicting and classifying on the basis of time-series data (Kumari, Kumar and Mittal, 2021).

Long short-term memory (LSTM) is an artificial neural network used in the fields of artificial intelligence and deep learning. Unlike standard feedforward neural networks, LSTM has feedback connections. Such a recurrent neural network can process not only single data points but also entire sequences of data. For example, LSTM is applicable to tasks such as unsegmented, connected handwriting recognition, speech recognition, machine translation, robot control, video games and healthcare. LSTM has become the most cited neural network of the 20th century.

A common LSTM unit is composed of a cell, an input gate, an output gate and a forget gate. The cell remembers values over arbitrary time intervals, and the three gates regulate the flow of information into and out of the cell (El-Kenawy et al. 2020).
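The role of the cell and the three gates can be illustrated with a single forward step written in NumPy. This is a sketch of the standard LSTM update, not code from the report: the function name lstm_step, the weight layout and the sizes D and H are assumptions, and the weights are random rather than trained.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM forward step.

    W has shape (4*H, D), U has shape (4*H, H) and b has shape (4*H,).
    The four blocks are ordered: input gate, forget gate, output gate, candidate.
    """
    H = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0:H])          # input gate: how much new information to write
    f = sigmoid(z[H:2 * H])      # forget gate: how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # output gate: how much of the cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values to write into the cell
    c_t = f * c_prev + i * g     # updated cell state (the long-term memory)
    h_t = o * np.tanh(c_t)       # updated hidden state (the step's output)
    return h_t, c_t

# Run the cell over a short random sequence with random, untrained weights.
rng = np.random.default_rng(0)
D, H = 3, 4                      # assumed input size and hidden size
W = rng.normal(scale=0.1, size=(4 * H, D))
U = rng.normal(scale=0.1, size=(4 * H, H))
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.normal(size=(5, D)):
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h)
```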

LSTM networks are well suited to classifying, processing and making predictions based on time series data, since there can be lags of unknown duration between important events in a time series. LSTMs were developed to deal with the vanishing gradient problem that can be encountered when training traditional RNNs. Relative insensitivity to gap length is an advantage of LSTM over RNNs, hidden Markov models and other sequence learning methods in numerous applications.
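Before a time series can be fed to an LSTM, it is usually cut into fixed-length input windows, each paired with the value that follows it. The sketch below shows one way to do this. The helper name make_windows and the window length are assumptions, and the series is a toy sequence of the numbers 0 to 9, not Apple's sales or revenue data.

```python
import numpy as np

def make_windows(series, window):
    """Split a 1-D series into (samples, window, 1) inputs and next-step targets."""
    series = np.asarray(series, dtype=float)
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]              # the value that follows each window
    return X[..., np.newaxis], y     # trailing axis = one feature per time step

series = np.arange(10)               # toy series: 0, 1, ..., 9
X, y = make_windows(series, window=3)

print(X.shape, y.shape)              # (7, 3, 1) (7,)
print(X[0, :, 0], y[0])              # [0. 1. 2.] 3.0
```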

Conclusion

This report has shown that data visualisation and prediction can be carried out in Python on the Jupyter platform with good model accuracy. Different models can be used to predict the chance of diabetes for a patient from the chosen dataset, using its attributes to describe the patient's health. Voting classification was carried out in this report and improved the accuracy of the individual models and algorithms. The voting classification code has also been provided to give a brief idea of how the process works in machine learning. Data visualisation can likewise be carried out in Python for the data of Apple, examining the company's activity and predicting its future business revenue. The study also introduced long short-term memory, a machine learning algorithm that can be used to analyse the company's business statistics and understand its future scope.

Reference

El-Kenawy, E.S.M., Ibrahim, A., Mirjalili, S., Eid, M.M. and Hussein, S.E., 2020. Novel feature selection and voting classifier algorithms for COVID-19 classification in CT images. IEEE Access, 8, pp.179317-179335.

Kumari, S., Kumar, D. and Mittal, M., 2021. An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier. International Journal of Cognitive Computing in Engineering, 2, pp.40-46.

Yousaf, A., Umer, M., Sadiq, S., Ullah, S., Mirjalili, S., Rupapara, V. and Nappi, M., 2020. Emotion recognition by textual tweets classification using voting classifier (LR-SGD). IEEE Access, 9, pp.6286-6295.

Mahabub, A., 2020. A robust technique of fake news detection using Ensemble Voting Classifier and comparison with other classifiers. SN Applied Sciences, 2(4), pp.1-9.

Park, K., Choi, Y., Choi, W.J., Ryu, H.Y. and Kim, H., 2020. LSTM-based battery remaining useful life prediction with multi-channel charging profiles. IEEE Access, 8, pp.20786-20798.

Ren, L., Dong, J., Wang, X., Meng, Z., Zhao, L. and Deen, M.J., 2020. A data-driven auto-CNN-LSTM prediction model for lithium-ion battery remaining useful life. IEEE Transactions on Industrial Informatics, 17(5), pp.3478-3487.
