Please use this identifier to cite or link to this item: http://hdl.handle.net/1822/38196

Title: Using data mining for prediction of hospital length of stay: an application of the CRISP-DM Methodology
Author(s): Caetano, Nuno; Cortez, Paulo; Laureano, Raul
Keywords: Medical data mining; Hospitalization process; Length of stay; CRISP-DM; Regression; Random forest
Issue date: Sep-2015
Publisher: Springer
Journal: Lecture Notes in Business Information Processing
Citation: Caetano, N., Cortez, P., & Laureano, R. S. (2015). Using Data Mining for Prediction of Hospital Length of Stay: An Application of the CRISP-DM Methodology. In J. Cordeiro, S. Hammoudi, L. Maciaszek, O. Camp & J. Filipe (Eds.), Enterprise Information Systems (Vol. 227, pp. 149-166): Springer International Publishing.
Abstract(s): Hospitals are nowadays collecting vast amounts of data related to patient records. All these data hold valuable knowledge that can be used to improve hospital decision making. Data mining techniques aim precisely at the extraction of useful knowledge from raw data. This work describes an implementation of a medical data mining project approach based on the CRISP-DM methodology. Recent real-world data, from 2000 to 2013, were collected from a Portuguese hospital and relate to inpatient hospitalization. The goal was to predict generic hospital Length of Stay based on indicators that are commonly available at the hospitalization process (e.g., gender, age, episode type, medical specialty). At the data preparation stage, the data were cleaned and variables were selected and transformed, leading to 14 inputs. Next, at the modeling stage, a regression approach was adopted, where six learning methods were compared: Average Prediction, Multiple Regression, Decision Tree, Artificial Neural Network ensemble, Support Vector Machine and Random Forest. The best learning model was obtained by the Random Forest method, which presents a high-quality coefficient of determination value (0.81). This model was then opened by using a sensitivity analysis procedure that revealed three influential input attributes: the hospital episode type, the physical service where the patient is hospitalized and the associated medical specialty. Such extracted knowledge confirmed that the obtained predictive model is credible and has potential value for supporting the decisions of hospital managers.
Type: Conference paper
URI: http://hdl.handle.net/1822/38196
ISBN: 978-3-319-22347-6; 978-3-319-22348-3
DOI: 10.1007/978-3-319-22348-3_9
ISSN: 1865-1348
Publisher version: The original publication is available at: http://link.springer.com/chapter/10.1007%2F978-3-319-22348-3_9#
Peer-Reviewed: yes
Access: Open access
Appears in Collections: CAlg - Livros e capítulos de livros/Books and book chapters
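
The abstract ranks the learning methods by the coefficient of determination and includes Average Prediction as a baseline. As a minimal sketch of how that metric behaves, the snippet below computes it with the standard definition, 1 - SS_res / SS_tot. The function name `r_squared` and all length-of-stay values are hypothetical illustrations, not the paper's code or data, and the paper's own evaluation procedure may differ.

```python
from statistics import mean


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    avg = mean(actual)
    ss_tot = sum((y - avg) ** 2 for y in actual)  # spread around the mean
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))  # model error
    if ss_tot == 0:
        raise ValueError("actual values are constant; R^2 is undefined")
    return 1.0 - ss_res / ss_tot


# Hypothetical lengths of stay in days (NOT taken from the paper's data set).
actual_days = [2, 3, 5, 8, 12]

# "Average Prediction" baseline: always predict the mean of the observed values.
average_prediction = [mean(actual_days)] * len(actual_days)

# Hypothetical output of some fitted regression model.
model_prediction = [3, 3, 4, 9, 11]

print(r_squared(actual_days, average_prediction))
print(r_squared(actual_days, model_prediction))
```

When the baseline is scored on the same values its mean was computed from, its coefficient of determination is exactly 0, so any value above 0 means the model explains more of the variance than always predicting the average.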

Files in This Item:
File: ext151.pdf; Size: 370 kB; Format: Adobe PDF
