

ORIGINAL ARTICLE 

Year : 2016  Volume : 16  Issue : 4  Page : 124-130

Hemodialysis mining and patients intelligent clustering technologies
Mohammed ElRashedy^{1}, Ahmed Akl^{2}
^{1} Department of Computer Science and Engineering, Faculty of Electronic Engineering, Menofia University, Menofia, Egypt ^{2} Nephrology Department, Urology & Nephrology Center, Mansoura University, Mansoura, Egypt
Date of Submission: 29-Nov-2016
Date of Acceptance: 22-Dec-2016
Date of Web Publication: 20-Feb-2017
Correspondence Address: Ahmed Akl Nephrology Department, Urology & Nephrology Center, Mansoura University, Mansoura, 35111 Egypt
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/11109165.200355
Background: Medical information systems collect vast amounts of monitored clinical data, and identifying the portions of those data that are relevant to a specific clinical problem can be a hard task. Data mining is used in a very wide range of applications and relies mainly on mathematical algorithms and analytical skills to derive the desired results from huge databases and data collections. Clustering is one of the most important data-mining techniques. Most earlier work on clustering has focused on numerical relationships between attribute values and ignored the inherent meaning of the values.
Aim: In this work, an enhancement is added to the k-means algorithm for clustering data.
Materials and Methods: The difference values between the attributes were modified. The proposed clustering technique was used to improve the quality and efficiency of health services and decision making in hemodialysis centers. Extensive experiments and tests were performed on a variety of clustered attributes from hemodialysis patient information systems.
Results: Compared with the traditional k-means algorithm, our enhanced k-means algorithm produced clusters with a lower maximum distance and fewer separate values per cluster.
Conclusion: Decision making for the session period and blood rate was improved and made more accurate, which helps provide robust, optimal dialysis adequacy for the specific patient case.
Keywords: clustering, data mining, enhanced k-means, hemodialysis adequacy, k-means
How to cite this article: ElRashedy M, Akl A. Hemodialysis mining and patients intelligent clustering technologies. J Egypt Soc Nephrol Transplant 2016;16:124-30
Introduction   
The quantity and complexity of the data acquired, time-stamped, and stored in clinical databases by automated medical devices are increasing rapidly and continuously. As a result, it is becoming more and more important to provide clinicians with easy-to-use interactive tools for analyzing huge amounts of such data [1]. Data mining is concerned with finding models, patterns, and knowledge in the available data. It includes, but is not limited to, predictive data-mining algorithms, which produce models that can be used for prediction and classification, and descriptive data-mining algorithms, which find interesting patterns in the data, such as associations, clusters, and subgroups. Decision support is concerned with helping decision makers solve problems and make decisions [2]. Data mining and decision support can be integrated to build better problem-solving, data-analysis, and decision-support systems that assist in making clinical decisions, evaluating the quality of the care provided, and carrying out medical research [3].
In the evaluation of patients on long-term hemodialysis (HD), biochemical data determined at monthly intervals, as well as clinical parameters registered at each dialysis session, hide important information that could be very useful for the management of the patients and for the continuing education of the nephrologists themselves. Efforts to predict HD adequacy have already started [4]. K_{t}/V is an index that describes the efficiency of the removal of protein catabolism products [urea and creatinine (CR)]. K is the in-vivo blood urea nitrogen (BUN) clearance of the dialyzer being used at the given blood flow rate (QB, ml/min), t is the session length (h), and V is the urea distribution volume [5]. We focused on the evaluation of these parameters to predict the blood flow rate, session efficiency, and session duration using a data-mining clustering technique.
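As a rough illustration of the index described above, the following sketch computes K_{t}/V from its three components. The function name, the unit conversions, and the example patient values are assumptions made for illustration only; they are not taken from the study data.

```python
def kt_over_v(clearance_ml_min: float, session_hours: float, urea_volume_l: float) -> float:
    """Dimensionless dialysis dose Kt/V.

    clearance_ml_min: dialyzer urea clearance K in ml/min
    session_hours:    session length t in hours
    urea_volume_l:    urea distribution volume V in litres
    """
    if urea_volume_l <= 0:
        raise ValueError("urea distribution volume must be positive")
    # Convert K from ml/min to L/h so that K * t has the same unit as V.
    clearance_l_per_hour = clearance_ml_min * 60.0 / 1000.0
    return clearance_l_per_hour * session_hours / urea_volume_l


# Hypothetical patient (illustrative values only): K = 250 ml/min, t = 4 h, V = 40 L
dose = kt_over_v(250.0, 4.0, 40.0)
print(f"Kt/V = {dose:.2f}")
```

Because the result is a ratio of volumes, the only care needed is to bring K, t, and V to compatible units before dividing.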
The clustering technique of data mining is a useful tool for grouping data points such that points within a single cluster have similar characteristics or are close to each other, whereas points in different clusters are dissimilar [6]. The HD database, for example, can be used to cluster new patients so that patients with similar cases are grouped together. Many clustering algorithms have been developed, the most prominent being the partitional, hierarchical, and graph-theoretical methods. Typical examples of the three methods are the well-known k-means, single-linkage, and minimal-spanning-tree-based algorithms, respectively [7]. Several improved k-means algorithms have been developed over the past several years to improve its performance. A stochastic k-means algorithm was developed to improve the clustering result of k-means [8] (based on the kd-tree data structure [9]). An improved k-means algorithm can speed up the running time while preserving the same clustering results as the original k-means algorithm [10].
A global k-means algorithm has also been presented; it is an incremental approach to clustering that dynamically adds one cluster center at a time through a deterministic global search procedure consisting of N executions of the k-means algorithm, where N is the size of the data set [11]. An algorithm based on k-means, namely split-and-merge circular k-means, has been proposed for circular invariant clustering of vectors [12].
Our aim was to use data mining and decision support for the HD session to predict the blood rate, the HD duration, and the best quality of HD by clustering the data of newly arrived patients. The enhanced k-means algorithms cited above ignore the meaning of close, near, and far values of the database variables. We describe the traditional k-means algorithm and compare it with our novel enhancement, which addresses this problem.
Patients and methods   
Patients and hemodialysis sessions
The patient cohort included 30 patients (27 male and three female) who had been on regular HD therapy for 12–120 months. The study was approved by the institutional ethical committee, and all patients signed a consent form. No patient had systemic or metabolic disease, and all patients were considered metabolically stable at the time of the study. All patients underwent dialysis thrice weekly through two needles inserted into an arteriovenous fistula. Blood flow was 400 ml/min for all patients, and dialysate flow was 500 ml/min. No patient had residual renal function; the dialyzers were polysulfone.
Sampling and laboratory analysis
A total of nine blood samples were collected during each HD session: a predialysis sample; samples at 30, 60, 90, 120, 150, 180, and 210 min from the start of the session; and a sample at 240 min, the end of the session. An online bed-scale monitor was used to measure patient weight every 30 min during the session.
Data mining model
Traditional k-means algorithm and our enhancement
The k-means algorithm takes an input parameter k and partitions a set of n objects into k clusters so that the resulting intracluster similarity is high but the intercluster similarity is low. Cluster similarity is measured with respect to the mean value of the objects in a cluster. The algorithm proceeds as follows. First, it randomly selects k of the objects, each of which initially represents a cluster mean or center. Each of the remaining objects is then assigned to the cluster to which it is most similar, based on the distance between the object and the cluster mean. The algorithm then computes the new mean of each cluster. This process iterates until the criterion function converges. Typically, the square-error criterion is used, which is defined as follows:
$$E = \sum_{i=1}^{k} \sum_{p \in C_{i}} \left| p - m_{i} \right|^{2} \qquad (1)$$
where E is the sum of the square errors for all objects in the data set, p is the point in space representing a given object, and m_{i} is the mean of cluster C_{i} (both p and m_{i} are multidimensional). In other words, for each object in each cluster, the distance from the object to its cluster center is squared, and these squared distances are summed. This criterion tries to make the resulting k clusters as compact and as separate as possible [13]. The k-means algorithm works well with multidimensional objects whose values are close to one another. The distance between objects, measured by the Euclidean distance, can be computed as follows:
$$d(i, j) = \sqrt{(x_{i1} - x_{j1})^{2} + (x_{i2} - x_{j2})^{2} + \cdots + (x_{ip} - x_{jp})^{2}}$$
where i=(x_{i1}, x_{i2}, …, x_{ip}) and j=(x_{j1}, x_{j2}, …, x_{jp}) are two p-dimensional data objects. This distance ignores the meaning of the values because it adds large-scale values to small-scale values from different dimensions. Summing values of different scales produces a distance that is disproportionate across dimensions: when large-scale values are added to much smaller-scale values, a change in a small-scale dimension has almost no effect on the distance. Such values are ineffective in the summation alongside large-scale dimensions, so their meaning is lost in the resulting clusters. For clusters whose values are close in the large-scale dimensions but far apart in the small-scale dimensions, the intercluster similarity is low. The classic k-means data-mining model building steps and the prediction concept are illustrated in [Figure 1]a.
Figure 1: Model concept: (a) classic k-means clustering; (b) enhanced k-means clustering; and (c) prediction.
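The traditional procedure described above can be sketched in a few lines of plain Python. This is a minimal illustration of standard k-means with the square-error criterion, not the software used in the study; the function names, the fixed random seed, the handling of empty clusters, and the toy data points are all assumptions.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    return tuple(sum(col) / len(points) for col in zip(*points))


def kmeans(points, k, max_iter=100, seed=0):
    """Plain k-means: random initial centers, then assign/update until stable."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # k objects chosen as the initial means
    labels = None
    for _ in range(max_iter):
        # Assign every object to the cluster with the nearest mean.
        new_labels = [min(range(k), key=lambda c: euclidean(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable, so the criterion converged
            break
        labels = new_labels
        # Recompute each cluster mean from its current members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # assumption: an emptied cluster keeps its old center
                centers[c] = mean_point(members)
    return centers, labels


def square_error(points, centers, labels):
    """Sum of squared distances from each object to its cluster mean, Eq. (1)."""
    return sum(euclidean(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


# Toy two-dimensional data (illustrative only).
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
centers, labels = kmeans(points, k=2)
print(labels, round(square_error(points, centers, labels), 3))
```

The result depends on the random initial centers, which is one reason the algorithm is usually run several times in practice.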
We enhanced the k-means algorithm to avoid this problem. The enhancement concentrates on scaling the multidimensional objects. Given a database D of n objects and the number of clusters k to form:
$$D = \{ (x_{i}, y_{i}, z_{i}, \ldots, w_{i}) \}$$
where i=1, 2, 3, …, n and x, y, z, …, w are the dimensions of database D.
First, the maximum value of each dimension is calculated, and each value is divided by the maximum value of its own dimension, which produces database F:
$$F = \left\{ \left( \frac{x_{i}}{M_{x}}, \frac{y_{i}}{M_{y}}, \frac{z_{i}}{M_{z}}, \ldots, \frac{w_{i}}{M_{w}} \right) \right\}$$
where i=1, 2, 3, …, n and M_{x}, M_{y}, M_{z}, …, M_{w} are the maximum values of the dimensions x, y, z, …, w.
Database F has the same scale, from zero to one, for every dimension. This equalized scaling preserves the meaning of the values in the Euclidean distance: a change in the values of any one dimension now produces a visible difference in the distance. The enhanced k-means data-mining model building steps and concept are illustrated in [Figure 1]b and [Figure 1]c.
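The scaling step can be illustrated with a short sketch that divides every value by the maximum of its own dimension and compares Euclidean distances before and after. The function names and the three two-dimensional records (read here as BUN in mg/dl and albumin in g/dl) are hypothetical, and the sketch assumes that all values are positive.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def scale_by_max(rows):
    """Divide every value by the maximum of its own dimension (column)."""
    maxima = [max(col) for col in zip(*rows)]
    if any(m <= 0 for m in maxima):
        raise ValueError("each dimension needs a positive maximum")
    scaled = [tuple(v / m for v, m in zip(row, maxima)) for row in rows]
    return scaled, maxima


# Hypothetical records: (large-scale dimension, small-scale dimension).
a, b, c = (60.0, 4.0), (62.0, 2.0), (80.0, 4.0)
scaled, maxima = scale_by_max([a, b, c])
sa, sb, sc = scaled

print("raw    d(a,b), d(a,c):", round(euclidean(a, b), 3), round(euclidean(a, c), 3))
print("scaled d(a,b), d(a,c):", round(euclidean(sa, sb), 3), round(euclidean(sa, sc), 3))
```

On the raw values, record b looks closest to a even though its small-scale value is halved; after scaling, that halving dominates and c becomes the nearer record, which is the effect the enhancement is meant to expose.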
Data mining and decision support for the hemodialysis session
An HD session depends on significant parameters that are determined, or decisions that are taken, by the physician. These decisions include the HD duration, the blood rate during the dialysis session, the filter type, and the dialysis rate. Together they are the parameters needed to achieve a high-quality dialysis session (K_{t}/V). These decision-making parameters depend on the patient case, which specifies age, sex, weight, volume, height, CR, BUN, hematocrit (HCT), bicarbonate (HCO_{3}), albumin (ALB), calcium (CA), and phosphorus (PO_{4}). The decision is very difficult for the physician because each decision parameter has many possible values. The session period has many possible, unevenly spaced values, such as 30 min and 1, 1.5, 2, 3, and 4 h. Blood rate (dialysis rate) was 400 ml/min. Many types of HD filters (dialyzers) are available.
To achieve a high K_{t}/V, the physician must reconcile the choices among these options, many of which are poor choices that can cause the patient's health to deteriorate. For this reason, many HD centers have adopted a fixed HD period of four hours to be sure of achieving an acceptable K_{t}/V. However, patients are not alike; one patient may achieve an optimal K_{t}/V with a longer or shorter HD period, or may achieve a higher K_{t}/V with different parameter choices.
We cluster the data of HD patients so that the clusters may serve as a decision tool that helps the nephrologist choose the best parameter values to achieve a higher and/or acceptable K_{t}/V for each patient case. Given k clusters, each cluster has a mean value M and a large distance L, which is the maximum distance between any object in the cluster and the mean values of the cluster. A new object is assigned as follows:
1. Compute the distance between the new object and the mean values (M) of a cluster.
2. If this distance is lower than or equal to the maximum distance (L) of the cluster, the object belongs to this cluster; find the object in the cluster that is nearest to the new object using the Euclidean distance and predict the remaining values (K_{t}/V, HD duration, and blood rate), as illustrated in [Figure 1].
3. If not, repeat (1) with another cluster.
4. If this has been repeated with all clusters, the object does not belong to any of these clusters.
The k-means data-mining model clustering concept is illustrated in [Figure 2].
Figure 2: Our enhanced clustering concept. M, mean value; C, cluster; L, large distance in cluster.
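The assignment steps above can be sketched as follows. This is an illustrative reading of the procedure rather than the authors' implementation: the record layout, the function names, and every feature and outcome value are hypothetical, and the features are assumed to be already scaled to the range zero to one.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def summarize_clusters(records, labels):
    """For each cluster, compute the mean M and the large distance L."""
    grouped = {}
    for record, label in zip(records, labels):
        grouped.setdefault(label, []).append(record)
    summaries = {}
    for label, members in grouped.items():
        features = [f for f, _ in members]
        mean = tuple(sum(col) / len(features) for col in zip(*features))
        large = max(euclidean(f, mean) for f in features)
        summaries[label] = (mean, large, members)
    return summaries


def assign_and_predict(new_features, summaries):
    """Return (cluster label, outcome of the nearest member) or (None, None)."""
    for label, (mean, large, members) in summaries.items():
        if euclidean(new_features, mean) <= large:  # steps 1 and 2
            nearest = min(members, key=lambda rec: euclidean(new_features, rec[0]))
            return label, nearest[1]
    return None, None  # step 4: the object fits none of the clusters


# Hypothetical records: (scaled features, recorded session outcome).
records = [
    ((0.2, 0.2), {"kt_v": 1.3, "hours": 4.0, "blood_rate": 300}),
    ((0.3, 0.2), {"kt_v": 1.4, "hours": 3.0, "blood_rate": 350}),
    ((0.2, 0.3), {"kt_v": 1.2, "hours": 4.0, "blood_rate": 250}),
    ((0.8, 0.8), {"kt_v": 1.5, "hours": 2.0, "blood_rate": 400}),
    ((0.9, 0.8), {"kt_v": 1.3, "hours": 3.0, "blood_rate": 400}),
    ((0.8, 0.9), {"kt_v": 1.4, "hours": 4.0, "blood_rate": 350}),
]
labels = [0, 0, 0, 1, 1, 1]
summaries = summarize_clusters(records, labels)

print(assign_and_predict((0.28, 0.21), summaries))
print(assign_and_predict((0.5, 0.5), summaries))
```

As in the listed steps, the first cluster whose large distance covers the new object is accepted; a variant that checks all clusters and picks the nearest mean would be a different design choice.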
Statistical analysis
Findings were recorded and analyzed using SPSS for Windows (SPSS Inc., Chicago, Illinois, USA). Quantitative data are described as arithmetic mean±SD. Qualitative data were compared using the χ^{2} test. The paired t-test was used to compare the means of two related variables.
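For readers unfamiliar with the paired t-test mentioned above, the sketch below computes its test statistic with the Python standard library. The analysis in the study was performed in SPSS; this is only an illustration of the formula, and the function name and the two lists of values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(first, second):
    """Paired t statistic and degrees of freedom for two related samples."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two samples of equal length with at least 2 pairs")
    diffs = [b - a for a, b in zip(first, second)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all paired differences are identical")
    t = statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical paired measurements of the same four patients (illustrative only).
first = [1.1, 1.2, 1.3, 1.0]
second = [1.3, 1.3, 1.5, 1.3]
t_value, dof = paired_t_statistic(first, second)
print(f"t = {t_value:.3f}, degrees of freedom = {dof}")
```

Turning the statistic into a P value additionally requires the t distribution, which a statistics package provides.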
Results   
[Table 1] lists the demographic characteristics of the patients. The experimental results were used to compare the performance of the k-means algorithm with that of our enhanced version. Long-term experiments were carried out on a database of 30 patients, each with 18 sessions with different HD durations and blood rates and with a fixed dialysis rate and filter type. These data were taken from the Urology and Nephrology Center, Mansoura, Egypt. The data were multidimensional, comprising small-scale dimensions (sex, CR, ALB, CA, and PO_{4}), medium-scale dimensions (age, weight, volume, HCT, and HCO_{3}), high-scale dimensions (height and BUN), high separate scale dimensions (age, weight, volume, BUN, HCT, HCO_{3}, and PO_{4}), which have a large distance between the maximum and minimum values of the dimension, and high nearest scale dimensions (height, CR, and ALB), which have a small distance between the maximum and minimum values of the dimension ([Figure 3]).
These are illustrated in [Figure 4]. Unifying the scale of the multidimensional database improved the k-means algorithm for clustering data that combine high-scale and small-scale dimensions, and improved it even more for data that combine high separate scale dimensions with small nearest scale dimensions. The HD database was clustered in two phases: first without BUN (a high separate scale dimension) and then with the BUN dimension, to relate the effect of high separate scale dimensions to that of small nearest scale dimensions in the clustering. The clusters produced by our enhanced algorithm have a lower maximum distance and fewer separate values (the average number of distinct values in a cluster) than the clusters produced by the traditional k-means algorithm, both with and without BUN clustering. This is clear in [Figure 4]. Generally, our enhancement made the intercluster similarity high with or without large differences in scale between dimensions. For clusters without BUN, the square error [Eq. (1)] of the age, weight, and volume dimensions is higher for our enhancement than for the traditional k-means algorithm. For clusters with BUN, the square error [Eq. (1)] of these dimensions is lower for our enhancement than for the traditional k-means algorithm. The effect appears smaller in the sex, CR, HCT, HCO_{3}, ALB, CA, and PO_{4} dimensions, which have the nearest scale values. The age, weight, and volume dimensions have far separate scale values that are at the same time lower than the BUN scale. The traditional k-means algorithm ignores far-apart values in the smallest-scale dimensions when they are clustered together with higher-scale values, because these small values are swamped in the total summation. [Figure 5] shows the developed data-mining HD software and its application-phase windows.
Figure 4: Multidimensional clustering of hemodialysis data without and with blood urea nitrogen (BUN) clustering. (a) Multidimensional hemodialysis database; (b) maximum distance for each dimension in the clusters without BUN clustering; (c) separate values for each dimension in the clusters without BUN clustering; (d) the square error for each dimension in the clusters without BUN clustering; (e) maximum distance for each dimension in the clusters with BUN clustering; (f) separate values for each dimension in the clusters with BUN clustering; and (g) the square error for each dimension in the clusters with BUN clustering.
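The per-dimension quantities reported in Figure 4 can be approximated with the sketch below. The exact definitions used in the study are not spelled out, so the sketch assumes that "maximum distance" is the largest absolute deviation from the cluster mean in a dimension, that "separate values" is the number of distinct values in that dimension, and that the square error is the sum of squared deviations from the cluster mean. The function name and the three-member cluster are hypothetical.

```python
def per_dimension_metrics(members):
    """Compute assumed per-dimension quality metrics for one cluster."""
    if not members:
        raise ValueError("cluster must contain at least one object")
    metrics = []
    for column in zip(*members):  # one column per dimension
        mean = sum(column) / len(column)
        metrics.append({
            # Assumption: largest absolute deviation from the cluster mean.
            "max_distance": max(abs(v - mean) for v in column),
            # Assumption: count of distinct values within the cluster.
            "separate_values": len(set(column)),
            # Contribution of this dimension to the square error.
            "square_error": sum((v - mean) ** 2 for v in column),
        })
    return metrics


# Hypothetical cluster with two dimensions (illustrative values only).
cluster = [(50.0, 3.5), (60.0, 3.5), (70.0, 4.0)]
for index, dimension in enumerate(per_dimension_metrics(cluster)):
    print(index, dimension)
```

Computing the same metrics on clusters obtained from raw and from scaled data gives a like-for-like comparison only if both are evaluated on the same scale.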
Figure 5: Our developed data-mining software interface. (a) Model generation and (b) model validation.
Discussion   
Many scholars have applied data-mining techniques to disease prediction. These techniques include clustering, association rules, and time-series analysis. Different analyses may require different mining techniques, and selecting an appropriate mining technique is the key to obtaining valuable results [14].
HD adequacy has been estimated by two methods: first, direct dialysate quantification, and second, urea kinetic modelling (UKM). UKM has many drawbacks and limitations in comparison with direct dialysate quantification, as it uses a variety of formulas to derive the protein catabolic rate and K_{t}/V from just two (or three) BUN determinations, the ultrafiltration volume, an assumed weight parameter, and an assigned dialyzer clearance [15],[16]. The main advantage of using a urea kinetic model instead of direct quantification is that the model can predict the dialysis dose that should be achieved by any prescription [17]. Taking into account the present state of knowledge, dialysis treatment should be planned to achieve a minimal K_{t}/V of 1.2 and should strive for a K_{t}/V of 1.4. The basis of the dialysis session should be the dialysis dose rather than the dialysis duration [18].
UKM assesses only BUN and CR levels and CR clearance. However, increasing amounts of data indicate that some hidden rules and relationships may exist. Therefore, this work uses an entropy function to identify key features related to HD. By identifying these key features, nephrologists can determine which dose of HD a patient requires. This work uses these key features as dimensions in cluster analysis. When patients requiring HD are classified into one group and the other patients into another group, the key features can effectively determine whether a patient requires HD. The proposed data-mining scheme finds the association rules of each cluster. Hidden rules for the causes of kidney disease can therefore be identified.
Although many clustering techniques have been proposed, the k-means algorithm is the most representative and widely applied. The k-means algorithm is also called the generalized Lloyd algorithm [14]. It transforms each data record into a data point and uses random numbers to generate the initial cluster centers. The distance between each data point and each cluster center is then calculated, and a data point is assigned to the cluster center to which it is closest. Each cluster center is recomputed as the average of all data points in its cluster, and the new cluster center is used as the basis for the next iteration. This process is repeated until no change occurs.
Our analysis was carried out on a database of 30 patients, each with 18 sessions with different HD durations and blood rates and with a fixed dialysis rate and filter type. Our HD data were multidimensional, ranging from small-scale to high-scale dimensions and from high separate scale dimensions to high nearest scale dimensions. These are illustrated in [Figure 3]. Unifying the scale of the multidimensional database improved the k-means algorithm for clustering data that combine high-scale and small-scale dimensions, and improved it even more for data that combine high separate scale dimensions with small nearest scale dimensions.
The HD database was clustered in two phases: first without BUN (a high separate scale dimension) and then with the BUN dimension, to relate the effect of high separate scale dimensions to that of small nearest scale dimensions in the clustering. The clusters produced by our enhanced algorithm have a lower maximum distance and a lower average number of distinct values per cluster than the clusters produced by the traditional k-means algorithm, both with and without BUN clustering. Generally, our enhancement made the intercluster similarity high with or without large differences in scale between dimensions. The traditional k-means algorithm ignores far-apart values in the smallest-scale dimensions when they are clustered together with higher-scale values, because these small values are swamped in the total summation.
Conclusion   
The traditional k-means algorithm organizes data into clusters. When the clusters are built from multidimensional data of different scales, the resulting intracluster similarity is high but the intercluster similarity is low. The equalized scaling of multidimensional data preserved the meaning of the data values; the resulting intercluster similarity is high, and the maximum distance and separate values for each dimension are lower than with the traditional k-means algorithm. Data mining and decision support have been integrated to analyze HD data and to predict the best HD adequacy (K_{t}/V), HD duration, and blood rate.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References   
1.  Chittaro L. Information visualization and its application to medicine. Artif Intell Med 2001; 22:81–88. 
2.  Lavrac N, Bohanec M, Pur A, Cestnik B, Debeljak M, Kobler A. Data mining and visualization for decision support and modeling of public healthcare resources. J Biomed Inform 2007; 40:438–447. 
3.  Chittaro L, Combi C, Trapasso G. Data mining on temporal data: a visual approach and its clinical application to hemodialysis. J Visual Lang Comput 2003; 14:591–620. 
4.  Akl AI, Sobh MA, Enab YM, Tattersall J. Artificial intelligence: a new approach for prescription and monitoring of hemodialysis therapy. Am J Kidney Dis 2001; 38:1277–1283. 
5.  Catarci T, Santucci G, Silva S. An interactive visual exploration of medical data for evaluating health centers. J Res Pract Inf Tech 2003; 35:99–119. 
6.  Guha S, Rastogi R, Shim K. A robust clustering algorithm for categorical attributes. J Inf Syst 2000; 2:345–366. 
7.  Bandyopadhyay S, Saha S. A clustering method using a new point symmetry-based distance measure. J Pattern Recognit 2007; 40:3430–3451. 
8.  Kovesi B, Boucher J, Saoodi S. Stochastic k-means algorithm for vector quantization. Pattern Recognit 2001; 22:603–610. 
9.  Anderberg M. Computational geometry: algorithms and applications. Berlin: Springer; 2000. 
10.  Kanungo T, Mount D, Netanyahu N, Piatko C, Silverman R, Wu A. An efficient k-means clustering algorithm: analysis and implementation. IEEE Trans Pattern Anal Mach Intell 2002; 24:881–892. 
11.  Likas A, Vlassis N, Verbeek J. The global k-means clustering algorithm. J Pattern Recognit 2003; 36:451–461. 
12.  Charalampidis D. A modified k-means algorithm for circular invariant clustering. IEEE Trans Pattern Anal Mach Intell 2005; 27:1856–1865. 
13.  Han J, Kamber M. Data mining: concepts and techniques. 2nd ed. University of Illinois at Urbana Champaign: Morgan Kaufmann; 2006. 
14.  Lai JZC, Huang TJ, Liaw YC. A fast k-means clustering algorithm using cluster center displacement. Pattern Recognit 2009; 42:2551–2556. 
15.  Aebischer P, Schorderet D, Juillerat A, Wauters JP, Felly G. Comparison of urea kinetics and direct dialysis quantification in hemodialysis patients. Trans Am Soc Artif Intern Organs 1985; 31:338–342. 
16.  Jindal KK, Goldstein MB. Urea kinetic modelling in chronic hemodialysis: Benefits, problems and practical solutions. Semin Dial 1988; 1:82–85. 
17.  Thayer JF, Von Eye A, Rovine MJ. Assessment of neural network models using prediction analysis. Biomed Sci Instrum 1995; 31:25–28. 
18.  Shohat J, Boner G. Adequacy of hemodialysis 1996. Nephron 1997; 76:1–6. 
