|Year : 2016 | Volume
| Issue : 4 | Page : 124-130
Hemodialysis mining and patients intelligent clustering technologies
Mohammed El-Rashedy1, Ahmed Akl2
1 Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menofia University, Menofia, Egypt
2 Nephrology Department, Urology & Nephrology Center, Mansoura University, Mansoura, Egypt
|Date of Submission||29-Nov-2016|
|Date of Acceptance||22-Dec-2016|
|Date of Web Publication||20-Feb-2017|
Nephrology Department, Urology & Nephrology Center, Mansoura University, Mansoura, 35111
Source of Support: None, Conflict of Interest: None
Medical information systems collect vast amounts of monitored clinical data. Interpreting the portions of the data that are relevant to a specific clinical problem can be a hard task. Data mining is used in a very wide range of applications. It depends mainly on mathematical algorithms and analytical skills to derive the desired results from huge databases and/or collections. Clustering is one of the most important data-mining techniques. Most earlier work on clustering has focused on numerical relationships between the values of the attributes and ignored the inherent meaning of the values.
In this work, an enhancement is added to the k-means algorithm for clustering data.
Material & Methods:
The difference values between the attributes were modified. The proposed clustering technique was used to improve the quality and efficiency of health services and decision making in hemodialysis centers. Extensive experiments and tests were carried out on a variety of clustered attributes from hemodialysis patient information systems.
The results showed that our enhancement of the k-means algorithm yielded a maximum distance and separate values for each cluster that were lower than those of the traditional k-means algorithm.
Decision making for the session period and blood rate was improved and made more accurate. This supports achieving robust and optimal dialysis adequacy for each specific patient case.
Keywords: clustering, data mining, enhanced k-means, hemodialysis adequacy, k-means
|How to cite this article:|
El-Rashedy M, Akl A. Hemodialysis mining and patients intelligent clustering technologies. J Egypt Soc Nephrol Transplant 2016;16:124-30
| Introduction|| |
The quantity and complexity of data acquired, time-stamped, and stored in clinical databases by automated medical devices are rapidly and continuously increasing. As a result, it becomes more and more important to provide clinicians with easy-to-use interactive tools to analyze huge amounts of such data. Data mining is concerned with finding models, patterns, and knowledge in the available data. It includes, but is not limited to, predictive data-mining algorithms, which produce models that can be used for prediction and classification, and descriptive data-mining algorithms, which find interesting patterns in the data, such as associations, clusters, and subgroups. Decision support is concerned with helping decision makers to solve problems and make decisions. Data mining and decision support can be integrated to build better problem-solving, data-analysis, and decision-support systems that assist in making clinical decisions, evaluating the quality of provided care, and carrying out medical research.
In the evaluation of patients on long-term hemodialysis (HD), biochemical data determined at monthly intervals, as well as clinical parameters registered at each dialysis session, hide important information that could be very useful for the management of the patients and for the continuing education of the nephrologists themselves. Efforts to predict HD adequacy have already started. Kt/V is an index that describes the efficiency of the removal of protein catabolism products [urea and creatinine (CR)]. K is the in-vivo clearance of blood urea nitrogen (BUN) of the dialyzer being used (ml/min, at blood flow rate QB), t is the session length (h), and V is the urea distribution volume. We focused on evaluating these parameters to predict the blood flow rate, session efficiency, and session duration using a data-mining clustering technique.
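As a small worked illustration of the Kt/V index defined above, the following sketch converts the three quantities to consistent units and forms the ratio. The function name and the patient values (K = 250 ml/min, t = 4 h, V = 40 L) are hypothetical and are not taken from the study data.

```python
def kt_over_v(clearance_ml_min, session_hours, urea_volume_l):
    """Dimensionless dialysis dose Kt/V.

    clearance_ml_min: dialyzer urea clearance K in ml/min
    session_hours:    session length t in hours
    urea_volume_l:    urea distribution volume V in litres
    """
    if urea_volume_l <= 0:
        raise ValueError("urea distribution volume must be positive")
    cleared_ml = clearance_ml_min * session_hours * 60.0  # K * t, in ml
    return cleared_ml / (urea_volume_l * 1000.0)          # divide by V, in ml


# Hypothetical patient: K = 250 ml/min, t = 4 h, V = 40 L
print(kt_over_v(250.0, 4.0, 40.0))  # 1.5
```

With these assumed values, shortening the session to 3 h would give Kt/V = 1.125, which shows how strongly the index depends on the session length chosen by the physician.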
Clustering is a useful data-mining tool for grouping data points such that points within a single cluster have similar characteristics or are close to each other, whereas points in different clusters are dissimilar. The HD database, for example, can be used to cluster new patients so that patients with similar cases are grouped together. Many clustering algorithms have been developed; the most prominent are the partitioned, hierarchical, and graph theoretical methods. Typical examples of the three methods are the well-known k-means, single linkage, and minimal spanning tree based algorithms, respectively. Several improved k-means algorithms have been developed over the past years. A stochastic k-means algorithm was developed to improve the clustering result of k-means. Based on the k-d tree data structure, an improved k-means algorithm can speed up the time performance while preserving the same clustering results as the original k-means algorithm.
A global k-means algorithm has been presented, which is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N executions (N is the size of the data set) of the k-means algorithm. An algorithm based on k-means, namely a split and merge circular k-means, has been proposed for circular invariant clustering of vectors.
Our aim was to use data mining and decision support for the HD session to predict the blood rate, HD duration, and best quality of HD by clustering the data of newly arrived patients. All of these enhanced k-means algorithms ignore the meaning of close, near, and far values of the database variables. We describe the traditional k-means algorithm and compare it with our novel enhancement, which addresses this problem.
| Patients and methods|| |
Patients and hemodialysis sessions
The patient cohort included 30 patients (27 male and three female) on regular HD therapy for 12–120 months. The study was approved by the institutional ethics committee, and all patients signed a consent form. No patient had systemic or metabolic diseases, and all patients were considered metabolically stable at the time of the study. All patients underwent dialysis thrice weekly through two needles inserted into an arteriovenous fistula. Blood flow was 400 ml/min for all patients, and dialysate flow was 500 ml/min. No patient had residual renal function; dialyzers were polysulfone.
Sampling and laboratory analysis
A total of nine blood samples were collected during each HD session (predialysis sample, 30, 60, 90, 120, 150, 180, and 210 min from the start of the session and 240 min at the end of the session). An online bed-scale monitor was used to measure patient weight every 30 min during the session.
Data mining model
Traditional k-means algorithm and our enhancement
The k-means algorithm takes the input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The k-means algorithm proceeds as follows. First, it randomly selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean for each cluster. This process iterates until the criterion function converges. Typically, the square-error criterion is used, which is defined as follows:
$$E = \sum_{i=1}^{k} \sum_{p \in C_i} \lvert p - m_i \rvert^{2} \qquad (1)$$
where E is the sum of the square errors for all objects in the data set, p is the point in space representing a given object, and mi is the mean of cluster Ci (both p and mi are multidimensional). In other words, for each object in each cluster, the distance from the object to its cluster center is squared, and these squared distances are summed. This criterion tries to make the resulting k clusters as compact and as separate as possible. The k-means algorithm works well with multidimensional objects whose values are close to each other. The distance between objects, measured by the Euclidean distance, is computed as follows:
$$d(i,j) = \sqrt{(x_{i1}-x_{j1})^{2} + (x_{i2}-x_{j2})^{2} + \cdots + (x_{ip}-x_{jp})^{2}} \qquad (2)$$
where i=(xi1, xi2,…, xip) and j=(xj1, xj2,…, xjp) are two p-dimensional data objects. This measure ignores the meaning of the values because it adds large values and small values from different dimensions in the same sum. Summing values on different scales produces a distance that is disproportionately determined by some dimensions. When large-scale values are added to much smaller-scale values, a change in a small-scale dimension has almost no effect on the distance; such values are ineffective in the summation with large-scale dimension values. The meaning of these values is therefore lost in the resulting clusters. For clusters whose objects have near values in large-scale dimensions and far values in small-scale dimensions, the intercluster similarity is low. The model-building steps and prediction concept of classic k-means data mining are illustrated in [Figure 1]a.
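The traditional procedure described above (random initial centers, assignment by Euclidean distance, recomputation of the means, and the square-error criterion E) can be sketched as follows. This is a minimal illustration under stated assumptions, not the software used in the study: the function names, the optional `init_centers` argument, and the toy data are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, init_centers=None, max_iter=100, seed=0):
    """Plain k-means; returns (centers, labels)."""
    rng = random.Random(seed)
    # Random initial centers, unless the caller fixes them
    # (init_centers is a hypothetical convenience for reproducible runs).
    centers = list(init_centers) if init_centers is not None else rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each object goes to the nearest cluster mean.
        new_labels = [
            min(range(k), key=lambda c: euclidean(p, centers[c])) for p in points
        ]
        if new_labels == labels:  # no object changed cluster: converged
            break
        labels = new_labels
        # Update step: recompute each mean; an empty cluster keeps its old center.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = mean_point(members)
    return centers, labels


def square_error(points, centers, labels):
    """Sum over all objects of the squared distance to their cluster mean."""
    return sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Toy two-dimensional data with two obvious groups (hypothetical values).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(points, 2, init_centers=[points[0], points[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
print(square_error(points, centers, labels))
```

In this toy data both dimensions share the same scale, so the plain Euclidean distance behaves well; the problem discussed in the text arises only when the dimensions differ widely in scale.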
|Figure 1: Model concept: (a) classic k-means clustering; (b) enhanced k-means clustering; and (c) prediction.|
We enhanced the k-means algorithm to avoid this problem. The enhancement concentrates on scaling multidimensional objects. Given a database D of n objects and the number of clusters k to form:
$$D = \{(x_i, y_i, z_i, \ldots, w_i)\} \qquad (3)$$
where i=1, 2, 3,… n and x, y, z, …w are dimensions of D database.
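First, the maximum value of each dimension is calculated, and each value is divided by the maximum of its own dimension to produce the database F:
$$F = \left\{\left(\frac{x_i}{M_x}, \frac{y_i}{M_y}, \frac{z_i}{M_z}, \ldots, \frac{w_i}{M_w}\right)\right\} \qquad (4)$$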
where i=1, 2, 3,… n and Mx, My, Mz, …, Mw are maximum values for each dimension x, y, z, …, w.
In the F database, every dimension has the same scale, from zero to one. This equalized scaling preserves the meaning of the values in the Euclidean distance, so that a change in the values of any one dimension produces a noticeable difference in the distance. The model-building steps and concept of enhanced k-means data mining are illustrated in [Figure 1]b and [Figure 1]c.
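The effect of this scaling on the Euclidean distance can be seen in a short sketch. It assumes strictly positive maxima, as the zero-to-one range in the text implies; the function names and the three two-dimensional objects (a small-scale and a large-scale dimension, loosely modeled on ALB and BUN) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale_by_column_max(rows):
    """Divide every value by the maximum of its own dimension.

    Returns (scaled_rows, maxima). Assumes every dimension has a
    positive maximum, so scaled values lie in (0, 1] for positive data.
    """
    maxima = [max(col) for col in zip(*rows)]
    if any(m <= 0 for m in maxima):
        raise ValueError("every dimension needs a positive maximum")
    scaled = [tuple(v / m for v, m in zip(row, maxima)) for row in rows]
    return scaled, maxima


# Hypothetical objects: (small-scale dimension, large-scale dimension)
rows = [(3.0, 60.0), (4.5, 60.0), (3.0, 62.0)]
scaled, maxima = scale_by_column_max(rows)

# Raw distances: a 2-unit change on the large scale outweighs a
# 1.5-unit change on the small scale, although the latter is a 50% change.
print(euclidean(rows[0], rows[1]), euclidean(rows[0], rows[2]))

# Scaled distances: the relative change in the small-scale dimension now dominates.
print(euclidean(scaled[0], scaled[1]), euclidean(scaled[0], scaled[2]))
```

After scaling, the object that differs by half of the small-scale range is the farther one, which is the behavior the enhancement is meant to restore. The scaled rows can be passed to any standard k-means routine unchanged.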
Data mining and decision support for the hemodialysis session
An HD session depends on significant parameters determined, or decisions taken, by the physician. These decisions include HD duration, blood rate during the dialysis session, filter type, and dialysis rate. Together they are the parameters necessary to achieve a high-quality dialysis session (Kt/V). They depend on the patient's case, which is specified by age, sex, weight, volume, height, CR, BUN, hematocrit (HCT), bicarbonate (HCO3), albumin (ALB), calcium (CA), and phosphorus (PO4). The decision is very difficult for the physician because each decision parameter has many possible values. The session period can take many widely spaced values, such as 30 min and 1, 1.5, 2, 3, and 4 h. Blood rate (dialysis rate) was 400 ml/min. Many types of HD filters (dialyzers) are available.
To achieve a high Kt/V, the physician must reconcile the choices among these possibilities, and many of the possible choices are wrong. When a wrong choice is made, the patient's health deteriorates. For this reason, many HD centers have adopted a fixed HD period of 4 h to be sure of achieving an acceptable Kt/V. However, patients are not alike; one patient may achieve optimal Kt/V with a longer or shorter HD period, or a higher Kt/V with different parameter choices.
We cluster the data of HD patients as a decision tool that may help the nephrologist choose the best parameters to achieve a higher and/or acceptable Kt/V for each patient case. Given k clusters, each cluster has a mean value M and a large distance L, which is the maximum distance between any object in the cluster and the cluster mean. A new object is assigned as follows:
- Compute the distance between the new object and the mean value (M) of a cluster.
- If this distance is lower than or equal to the maximum distance (L) for this cluster, the object belongs to this cluster; find the object in the cluster that is nearest to the new object using Euclidean distance and predict the remaining values (Kt/V, HD duration, and blood rate), as illustrated in [Figure 1].
- If not, repeat the first step with another cluster.
- If the test fails for all clusters, the object does not belong to any of these clusters.
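The assignment steps listed above can be sketched as a single function. This is an illustrative sketch only: the cluster representation (a dictionary with `mean`, `max_dist`, and `members`), the function name, and all numeric values are assumptions made for the example, and the features are taken to be already scaled to the zero-to-one range.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign_and_predict(new_obj, clusters):
    """Return (cluster_index, predicted_outcome), or (None, None).

    Each cluster is a dict with:
      'mean'     : the cluster mean M
      'max_dist' : the large distance L (max member-to-mean distance)
      'members'  : list of (features, outcome) pairs
    Clusters are tested in order and the first one that accepts the object wins.
    """
    for index, cluster in enumerate(clusters):
        if euclidean(new_obj, cluster["mean"]) <= cluster["max_dist"]:
            # Nearest stored patient in the accepting cluster supplies the prediction.
            _, outcome = min(
                cluster["members"], key=lambda m: euclidean(new_obj, m[0])
            )
            return index, outcome
    return None, None  # the object does not belong to any cluster


# Hypothetical clusters over two scaled features.
clusters = [
    {"mean": (0.2, 0.2), "max_dist": 0.15, "members": [
        ((0.15, 0.2), {"kt_v": 1.3, "hours": 4.0, "blood_rate": 300}),
        ((0.25, 0.2), {"kt_v": 1.4, "hours": 3.5, "blood_rate": 350}),
    ]},
    {"mean": (0.8, 0.8), "max_dist": 0.2, "members": [
        ((0.75, 0.8), {"kt_v": 1.2, "hours": 3.0, "blood_rate": 400}),
        ((0.9, 0.8), {"kt_v": 1.5, "hours": 4.0, "blood_rate": 400}),
    ]},
]

print(assign_and_predict((0.78, 0.82), clusters))
print(assign_and_predict((0.5, 0.5), clusters))
```

The first call falls within L of the second cluster and inherits the outcome of its nearest member; the second call lies outside every cluster and returns no prediction. Testing clusters in order and stopping at the first match follows the steps as written; choosing the closest accepting cluster instead would be a different design.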
The clustering concept of the k-means data-mining model is illustrated in [Figure 2].
|Figure 2: Our enhanced clustering concept. M, mean value; C, cluster; L, large distance in cluster.|
Findings were recorded and analyzed using SPSS for Windows (SPSS Inc., Chicago, Illinois, USA). Quantitative data are described in terms of arithmetic mean±SD. Qualitative data were compared using the χ2 test. The paired t-test was used to compare the means of two related variables.
| Results|| |
[Table 1] lists demographic characteristics of the patients. The experimental results were used to compare the performance of the k-means algorithm and our enhanced version. Long-term experiments were carried out on the database of 30 patients, each with 18 sessions with different HD durations and blood rates and with a fixed dialysis rate and filter type. These data were obtained from the Urology and Nephrology Center, Mansoura, Egypt. The data were multidimensional, with small-scale dimensions (sex, CR, ALB, CA, and PO4), medium-scale dimensions (age, weight, volume, HCT, and HCO3), and high-scale dimensions (height and BUN). In addition, some dimensions had highly separated values, that is, a large distance between the maximum and minimum values of the dimension (age, weight, volume, BUN, HCT, HCO3, and PO4), whereas others had near values, that is, a small distance between the maximum and minimum values of the dimension (height, CR, ALB) ([Figure 3]).
These are illustrated in [Figure 4]. Unifying the scale of the multidimensional database enhanced the k-means algorithm for clustering data that combine high-scale and small-scale dimensions, and even more so for data that combine highly separated dimensions with small-scale, near-valued dimensions. The HD database was clustered in two phases: first without BUN (a highly separated dimension) and then with BUN, to show how a highly separated dimension interacts with small-scale, near-valued dimensions in the clustering. With and without BUN, the clusters produced by our enhanced algorithm had a lower maximum distance and fewer separate values (the average number of distinct values in a cluster) than the clusters produced by the traditional k-means algorithm ([Figure 4]). Generally, our enhancement made the intercluster similarity high with or without large differences in scale between dimensions. Without BUN, the square error [Eq (1)] of the age, weight, and volume dimensions was higher for the clusters of our enhancement than for those of the traditional k-means algorithm. With BUN, the square error [Eq (1)] of these dimensions was lower for the clusters of our enhancement than for those of the traditional k-means algorithm. The effect was smaller in the sex, CR, HCT, HCO3, ALB, CA, and PO4 dimensions, which have near values. The age, weight, and volume dimensions have far, separated values and at the same time a lower scale than BUN. The traditional k-means algorithm ignores far values in the smallest-scale dimensions when they are clustered together with higher-scale values, because the small values are lost in the total summation. [Figure 5] shows the generated data-mining HD software and its application-phase windows.
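The three per-dimension quantities compared in the results (maximum distance, separate values, and square error) could be computed for one cluster along the following lines. The article does not give formulas for the first two, so this sketch assumes that the maximum distance of a dimension is the largest absolute deviation from the cluster mean in that dimension and that separate values are the count of distinct values; the function name and the sample cluster are hypothetical.

```python
def dimension_metrics(members):
    """Per-dimension summary of one cluster.

    members: non-empty list of equal-length tuples (one tuple per object).
    Returns one dict per dimension with:
      'max_distance'    : largest |value - mean| (assumed definition)
      'separate_values' : number of distinct values (assumed definition)
      'square_error'    : sum of squared deviations from the mean
    """
    metrics = []
    for column in zip(*members):
        mean = sum(column) / len(column)
        metrics.append({
            "max_distance": max(abs(v - mean) for v in column),
            "separate_values": len(set(column)),
            "square_error": sum((v - mean) ** 2 for v in column),
        })
    return metrics


# Hypothetical cluster of three objects over two dimensions.
cluster = [(50.0, 3.5), (60.0, 3.5), (70.0, 4.0)]
for dim in dimension_metrics(cluster):
    print(dim)
```

Averaging these values over all clusters, separately for the traditional and the enhanced clustering, would give per-dimension comparisons of the kind plotted in the panels of [Figure 4].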
|Figure 4: Multidimensional clustering of hemodialysis data without and with blood urea nitrogen (BUN) clustering. (a) Multidimensional hemodialysis database; (b) maximum distance for each dimension in the clusters without BUN clustering; (c) separate values for each dimension in the clusters without BUN clustering; (d) the square error for each dimension in the clusters without BUN clustering; (e) maximum distance for each dimension in the clusters with BUN clustering; (f) separate values for each dimension in the clusters with BUN clustering; and (g) the square error for each dimension in the clusters with BUN clustering.|
|Figure 5: Our developed data-mining software interface. (a) Model generation and (b) model validation|
| Discussion|| |
Many scholars have applied data-mining techniques for disease prediction. These techniques include clustering, association rules, and time-series analysis. Different analyses may require different mining techniques. Selection of an appropriate mining technique is the key to obtaining valuable data.
HD adequacy has been estimated by two methods: direct dialysate quantification and urea kinetic modelling (UKM). UKM has many drawbacks and limitations in comparison with direct dialysate quantification, as it uses a variety of formulas to derive the protein catabolic rate and Kt/V from just two (or three) BUN determinations, the ultrafiltration volume, an assumed weight parameter, and an assigned dialyzer clearance. The main advantage of using a urea kinetic model instead of direct quantification is that the model can predict the dialysis dose that should be achieved by any prescription. Given the present state of knowledge, dialysis treatment should be planned to achieve a minimum Kt/V of 1.2 and to strive for a Kt/V of 1.4. The basis of the dialysis session should be dialysis dose rather than dialysis duration.
UKM only assesses BUN and CR levels and CR clearance. However, increasing amounts of data indicate that some hidden rules and relationships may exist. Therefore, this work uses an entropy function to identify key features related to HD. By identifying these key features, the nephrologist can determine which dose of HD a patient requires. This work uses these key features as dimensions in cluster analysis. When patients requiring HD are classified into one group and the other patients into another group, the key features can effectively determine whether a patient requires HD. The proposed data-mining scheme finds association rules for each cluster. Hidden rules for the causes of any kidney disease can therefore be identified.
Although many clustering techniques have been proposed, the k-means algorithm is the most representative and widely applied. The k-means algorithm is also called the generalized Lloyd algorithm. It transforms each data record into a data point, and random numbers are used to generate the initial cluster centers, which determine the cluster to which each data point belongs. The distance between each data point and each cluster center is calculated, and a data point belongs to a cluster center when it is closer to that center than to any other. Each recomputed cluster center is the average of all data points in its cluster, and the new cluster centers are taken as the basis for the next iteration. This process is repeated until no change occurs.
Our analysis was carried out on the database of 30 patients, each with 18 sessions with different HD durations and blood rates and with a fixed dialysis rate and filter type. Our HD data were multidimensional, ranging from small-scale to high-scale dimensions and from dimensions with highly separated values to dimensions with near values. These are illustrated in [Figure 3]. Unifying the scale of the multidimensional database enhanced the k-means algorithm for clustering data that combine high-scale and small-scale dimensions, and even more so for data that combine highly separated dimensions with small-scale, near-valued dimensions.
The HD database was clustered in two phases: first without BUN (a highly separated dimension) and then with BUN, to show how a highly separated dimension interacts with small-scale, near-valued dimensions in the clustering. With and without BUN, the clusters produced by our enhanced algorithm had a lower maximum distance and a lower average number of distinct values per cluster than the clusters produced by the traditional k-means algorithm. Generally, our enhancement made the intercluster similarity high with or without large differences in scale between dimensions. The traditional k-means algorithm ignores far values in the smallest-scale dimensions when they are clustered together with higher-scale values, because the small values are lost in the total summation.
| Conclusion|| |
The traditional k-means algorithm organizes data into clusters. The clusters contain multidimensional data on different scales, and the resulting intracluster similarity is high but the intercluster similarity is low. Equalized scaling of the multidimensional data preserved the meaning of the data values: the resulting intercluster similarity was high, and the maximum distance and separate values for each dimension were lower than with the traditional k-means algorithm. Data mining and decision support were integrated to analyze HD data and predict the best HD adequacy (Kt/V), HD duration, and blood rate.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| References|| |
Chittaro L. Information visualization and its application to medicine. Artif Intell Med 2001; 22:81–88.
Lavrac N, Bohanec M, Pur A, Cestnik B, Debeljak M, Kobler A. Data mining and visualization for decision support and modeling of public health-care resources. J Biomed Inform 2007; 40:438–447.
Chittaro L, Combi C, Trapasso G. Data mining on temporal data: a visual approach and its clinical application to hemodialysis. J Visual Lang Comput 2003; 14:591–620.
Akl AI, Sobh MA, Enab YM, Tattersall J. Artificial intelligence: a new approach for prescription and monitoring of hemodialysis therapy. Am J Kidney Dis 2001; 38:1277–1283.
Catarci T, Santucci G, Silva S. An interactive visual exploration of medical data for evaluating health centers. J Res Pract Inf Tech 2003; 35:99–119.
Guha S, Rastogi R, Shim K. A robust clustering algorithm for categorical attributes. J Inf Syst 2000; 2:345–366.
Bandyopadhyay S, Saha S. A clustering method using a new point symmetry-based distance measure. J Pattern Recognit 2007; 40:3430–3451.
Kovesi B, Boucher J, Saoodi S. Stochastic k-means algorithm for vector quantization. Pattern Recognit 2001; 22:603–610.
Anderberg M. Computational geometry: algorithms and applications. Berlin: Springer; 2000.
Kanungo T, Mount D, Netanyahu N, Piatko C, Silverman R, Wu A. An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 2002; 24:881–892.
Likas A, Vlassis N, Verbeek J. The global k-means clustering algorithm. J Pattern Recognit 2003; 36:451–461.
Charalampidis D. A modified k-means algorithm for circular invariant clustering. IEEE Trans Pattern Anal Mach Intell 2005; 27:1856–1865.
Han J, Kamber M. Data mining: concepts and techniques. 2nd ed. University of Illinois at Urbana Champaign: Morgan Kaufmann; 2006.
Lai JZC, Huang TJ, Liaw YC. A fast k-means clustering algorithm using cluster center displacement. Pattern Recognit 2009; 42:2551–2556.
Aebischer P, Schorderet D, Juillerat A, Wauters JP, Felly G. Comparison of urea kinetics and direct dialysis quantification in hemodialysis patients. Trans Am Soc Artif Intern Organs 1985; 31:338–342.
Jindal KK, Goldstein MB. Urea kinetic modelling in chronic hemodialysis: Benefits, problems and practical solutions. Semin Dial 1988; 1:82–85.
Thayer JF, Von Eye A, Rovine MJ. Assessment of neural network models using prediction analysis. Biomed Sci Instrum 1995; 31:25–28.
Shohat J, Boner G. Adequacy of hemodialysis 1996. Nephron 1997; 76:1–6.