
INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS

The International Journal of Design, Analysis and Tools for Integrated Circuits and Systems (IJDATICS) was created by a network of researchers and engineers both from academia and industry. IJDATICS is an international journal intended for professionals and researchers in all fields of design, analysis and tools for integrated circuits and systems. The objective of the IJDATICS is to serve a better understanding between the community of researchers and practitioners both from academia and industry.

Vijayakumar Nanjappan, University College Cork, Ireland
Jie Zhang, Xi'an Jiaotong-Liverpool University
Hui-Huang Hsu, Tamkang University, Taiwan

Editor-In-Chief
Ka Lok Man, Xi'an Jiaotong-Liverpool University, China

Associate Editors
Danny Hughes, Katholieke Universiteit Leuven, Belgium
M L Dennis Wong, Heriot-Watt University, Scotland

Editorial Board
Yuxuan Zhao, Xi'an Jiaotong-Liverpool University, China
Kamran Siddique, University of Alaska Anchorage
Tomas Krilavičius, Vytautas Magnus University, Lithuania
Young B. Park, Dankook University, Korea
Vladimir Hahanov, Kharkov National University of Radio Electronics, Ukraine
Salah Merniz, Mentouri University, Algeria
Paolo Prinetto, Politecnico di Torino, Italy
Massimo Poncino, Politecnico di Torino, Italy
Alberto Macii, Politecnico di Torino, Italy
Joongho Choi, University of Seoul, South Korea
Wei Li, Fudan University, China
Michel Schellekens, University College Cork, Ireland
Emanuel Popovici, University College Cork, Ireland
Jong-Kug Seon, LS Industrial Systems R&D Center, South Korea
Umberto Rossi, STMicroelectronics, Italy
Franco Fummi, University of Verona, Italy
Graziano Pravadelli, University of Verona, Italy
Vladimir PavLov, Intl. Software and Productivity Engineering Institute, USA
Ajay Patel, Intelligent Support Ltd, United Kingdom
Thierry Vallee, Georgia Southern University, USA
Menouer Boubekeur, University College Cork, Ireland
Monica Donno, Minteos, Italy
Jun-Dong Cho, Sung Kyun Kwan University, South Korea
AHM Zahirul Alam, International Islamic University Malaysia, Malaysia
Gregory Provan, University College Cork, Ireland
Miroslav N. Velev, Aries Design Automation, USA
M. Nasir Uddin, Lakehead University, Canada
Dragan Bosnacki, Eindhoven University of Technology, The Netherlands
Dave Hickey, University College Cork, Ireland
Maria OKeeffe, University College Cork, Ireland
Milan Pastrnak, Siemens IT Solutions and Services, Slovakia
John Herbert, University College Cork, Ireland
Zhe-Ming Lu, Sun Yat-Sen University, China
Jeng-Shyang Pan, National Kaohsiung University of Applied Sciences, Taiwan
Chin-Chen Chang, Feng Chia University, Taiwan
Mong-Fong Horng, Shu-Te University, Taiwan
Liang Chen, University of Northern British Columbia, Canada
Chee-Peng Lim, University of Science Malaysia, Malaysia
Ngo Quoc Tao, Vietnamese Academy of Science and Technology, Vietnam
Oscar Valero, University of Balearic Islands, Spain
Yang Yi, Sun Yat-Sen University, China
Damien Woods, University of Seville, Spain
Franck Vedrine, CEA LIST, France
Bruno Monsuez, ENSTA, France
Kang Yen, Florida International University, USA
Takenobu Matsuura, Tokai University, Japan
R. Timothy Edwards, MultiGiG, Inc., USA
Olga Tveretina, Karlsruhe University, Germany
Maria Helena Fino, Universidade Nova De Lisboa, Portugal
Adrian Patrick ORiordan, University College Cork, Ireland
Grzegorz Labiak, University of Zielona Gora, Poland
Jian Chang, Texas Instruments Inc, USA
Yeh-Ching Chung, National Tsing-Hua University, Taiwan
Anna Derezinska, Warsaw University of Technology, Poland
Kyoung-Rok Cho, Chungbuk National University, South Korea
Yong Zhang, Shenzhen University, China
R. Liutkevicius, Vytautas Magnus University, Lithuania
Yuanyuan Zeng, University College Cork, Ireland
D.P. Vasudevan, University College Cork, Ireland
Arkadiusz Bukowiec, University of Zielona Gora, Poland
Maziar Goudarzi, University College Cork, Ireland
Jin Song Dong, National University of Singapore, Singapore
Dhamin Al-Khalili, Royal Military College of Canada, Canada
Zainalabedin Navabi, University of Tehran, Iran
Lyudmila Zinchenko, Bauman Moscow State Technical University, Russia
Muhammad Almas Anjum, National University of Sciences and Technology, Pakistan
Deepak Laxmi Narasimha, University of Malaya, Malaysia
Danny Hughes, Xi'an Jiaotong-Liverpool University, China
Jun Wang, Fujitsu Laboratories of America, Inc., USA
A.P. Sathish Kumar, PSG Institute of Advanced Studies, India
N. Jaisankar, VIT University, India
Atif Mansoor, National University of Sciences and Technology, Pakistan
Steven Hollands, Synopsys, Ireland
Felipe Klein, State University of Campinas, Brazil
Enggee Lim, Xi'an Jiaotong-Liverpool University, China
Kevin Lee, Murdoch University, Australia
Prabhat Mahanti, University of New Brunswick, Saint John, Canada
Tammam Tillo, Xi'an Jiaotong-Liverpool University, China
Yanyan Wu, Xi'an Jiaotong-Liverpool University, China
Wen Chang Huang, Kun Shan University, Taiwan
Masahiro Sasaki, The University of Tokyo, Japan
Vineet Sahula, Malaviya National Institute of Technology, India
D. Boolchandani, Malaviya National Institute of Technology, India
Zhao Wang, Xi'an Jiaotong-Liverpool University, China
Shishir K. Shandilya, NRI Institute of Information Science & Technology, India
J.P.M. Voeten, Eindhoven University of Technology, The Netherlands
Wichian Sittiprapaporn, Mahasarakham University, Thailand
Aseem Gupta, Freescale Semiconductor Inc., USA
Kevin Marquet, Verimag Laboratory, France
Matthieu Moy, Verimag Laboratory, France
Ramy Iskander, LIP6 Laboratory, France
Suryaprasad Jayadevappa, PES School of Engineering, India
S. Hariharan, B. S. Abdur Rahman University, India
Chung-Ho Chen, National Cheng-Kung University, Taiwan
Kyung Ki Kim, Daegu University, South Korea
Shiho Kim, Chungbuk National University, South Korea
Hi Seok Kim, Cheongju University, South Korea
Siamak Mohammadi, University of Tehran, Iran
Brian Logan, University of Nottingham, UK
Ben Kwang-Mong Sim, Gwangju Institute of Science & Technology, South Korea
Asoke Nath, St. Xavier's College, India
Tharwon Arunuphaptrairong, Chulalongkorn University, Thailand
Shin-Ya Takahasi, Fukuoka University, Japan
Cheng C. Liu, University of Wisconsin at Stout, USA
Farhan Siddiqui, Walden University, Minneapolis, USA
Yui Fai Lam, Hong Kong University of Science & Technology, Hong Kong
Jinfeng Huang, Philips & LiteOn Digital Solutions, The Netherlands

Assistant Editor-In-Chief
Shuaibu Musa Adam, Katholieke Universiteit Leuven, Belgium

Publisher
Cooperation Name: Solari Co., Hong Kong
Address: Unit 1-5, 20/F, Midas Plaza, 1 Tai Yau Street, San Po Kong, Kowloon, Hong Kong
Phone: (852) 3966-2536
ISSN: 2071-2987 (online version), 2223-523X (print version)

INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS
https://www.cicet.org/ijdatics/

Preface

Welcome to Volume 13, Number 2 of the International Journal of Design, Analysis and Tools for Integrated Circuits and Systems (IJDATICS). This issue presents six high-quality academic papers, providing a well-rounded snapshot of current research in the fields of Machine Learning Applications, Intelligent Systems and AI, and Advanced Data Processing. There are two key themes evident in these papers:

• Machine Learning for Medical Diagnosis and Clinical Decision Support: Three papers propose interpretable machine learning frameworks using ensemble learning, preprocessing, and clustering to improve survival prediction and personalized clinical decision-making for hepatocellular carcinoma patients.
• Intelligent Systems and Efficient AI Processing for Real-World Applications: Three papers design intelligent systems integrating natural language processing, multimodal learning, and parallel computation for IT event management, violence detection, and model explainability, demonstrating practical, scalable AI solutions with improved performance.

We would also like to thank the IJDATICS editorial team, which is led by:

Editor-In-Chief
Ka Lok Man, Xi'an Jiaotong-Liverpool University, China
Xiaomei Fang, Xi'an Jiaotong-Liverpool University, China

Guest Editors
Jie Zhang, Xi'an Jiaotong-Liverpool University, China
Yuxuan Zhao, Xi'an Jiaotong-Liverpool University, China

Assistant Editor-In-Chief
Shuaibu Musa Adam, Katholieke Universiteit Leuven, Belgium

Table of Contents
Vol. 13, No. 2, December 2024

Preface
Table of Contents
1. Tsung-Jung Lin, Tzu-Chia Huang, Syu-Jhih Jhang, Hsiang-Chuan Chang, Chih-Yung Chang, ISPF: An Interpretable Survival Prediction Framework for Liver Cancer, Tamkang University, Taiwan, China
2. Tsung-Jung Lin and Chih-Yung Chang, HCC-SPML: Hepatocellular Carcinoma Survival Prediction via Machine Learning, Tamkang University, Taiwan, China
3. Wei-Ting Chang and Chih-Yung Chang, An Intelligent Event Management Framework Integrating NLP, Knowledge Graphs, and Multi-Agent Coordination, Tamkang University, Taiwan, China
4. Zelong Liu, Nanlin Jin, Pingfan Wang, and Ka Lok Man, Adaptive Feature Selection for Drift Detection in Big Data Streams, Xi'an Jiaotong-Liverpool University, China
5. Syu-Jhih Jhang, Yu-Ting Chin, Chih-Yung Chang, and Shih-Jung Wu, Efficient Violence Recognition in Surveillance Videos with Semantic-Aware Multimodal Matching, Tamkang University, Taiwan, China
6.
Wen-Dong Jiang, Yu-Ting Chin, Tzu-Chia Huang, Syu-Jhih Jhang, and Chih-Yung Chang, EnEXPVI 5: A Parallel Tree-Based Framework for Fast Interpretable Matrix Operations, Tamkang University, Taiwan, China

INTERNATIONAL JOURNAL OF DESIGN, ANALYSIS AND TOOLS FOR INTEGRATED CIRCUITS AND SYSTEMS, VOL. 13, NO. 2, Dec. 2024

ISPF: An Interpretable Survival Prediction Framework for Liver Cancer

Tsung-Jung Lin, Tzu-Chia Huang, Syu-Jhih Jhang, Hsiang-Chuan Chang, Chih-Yung Chang

Abstract: Liver cancer is a leading cause of cancer-related death, with variable patient prognoses due to clinical heterogeneity. Traditional models struggle with nonlinear patterns in high-dimensional data, hindering accurate survival prediction. This study proposes an interpretable survival prediction framework for liver cancer, called ISPF, consisting of data preprocessing, K-means pre-clustering, and model training. Data preprocessing includes handling missing values and normalizing features to ensure data quality. Pre-clustering reveals latent patient subgroups to reduce heterogeneity. Finally, ensemble learning models are trained on clustered data to predict survival time accurately. Experimental results demonstrate that the proposed ISPF achieves improved predictive performance and interpretability, offering valuable insights for clinical decision-making.

Index Terms: Interpretable framework, K-means, Liver cancer prediction.

I. INTRODUCTION

Globally, hepatocellular carcinoma (HCC) is the most common form of primary liver cancer and ranks among the top two causes of cancer-related deaths in Taiwan and many parts of Asia [1]. The global burden of HCC is increasing due to rising incidences of viral hepatitis and metabolic-related liver diseases.
The onset of HCC is often preceded by liver cirrhosis, with well-established risk factors including chronic infection with hepatitis B virus (HBV) or hepatitis C virus (HCV), excessive alcohol intake, prolonged exposure to aflatoxins, and inherited conditions such as hereditary hemochromatosis [2][3]. Despite advances in screening and treatment, HCC prognosis remains poor, with survival outcomes varying significantly across patients. This variability stems from multiple interacting factors such as tumor size and number, liver function (e.g., Child-Pugh classification), vascular invasion, metastatic status, as well as the patient's general health and comorbidities. Therefore, accurately predicting patient-specific survival duration is critical for informing clinical decision-making, selecting appropriate therapies, and facilitating long-term care planning. Traditionally, statistical models like the Cox proportional hazards regression have been widely used for survival prediction due to their interpretability and mathematical tractability. However, such models assume linearity and proportional hazards, which severely limit their ability to model complex, nonlinear, and high-dimensional clinical data. With the emergence of large-scale electronic health records and multi-omics datasets, there is a growing need for more sophisticated predictive tools that can fully exploit these data sources. In recent years, artificial intelligence (AI), particularly machine learning (ML) and deep learning (DL), has gained momentum in the domain of survival analysis. These approaches can capture intricate patterns and interactions among features, offering improved predictive accuracy in comparison to traditional methods. Nonetheless, existing AI models often rely heavily on tree-based architectures for feature selection and prediction, such as random forests or gradient boosting. 
While effective in many cases, such models may introduce bias, are prone to overfitting on imbalanced datasets, and generally function as black boxes with limited clinical interpretability.

Moreover, most previous studies fail to account for the inherent heterogeneity among HCC patients. Ignoring such diversity can lead to generalized models that perform poorly for specific subpopulations. In addition, many frameworks do not incorporate any stratification or clustering mechanism prior to modeling, resulting in an averaged prediction that may not be clinically informative.

To address these limitations, this study introduces an integrated and interpretable survival prediction framework (ISPF) specifically tailored for liver cancer prognosis. The ISPF framework incorporates a K-means-based pre-clustering stage to identify latent subgroups of patients with shared clinical characteristics. This enables the model to better handle data heterogeneity by allowing more personalized, subgroup-specific learning. Furthermore, ISPF integrates the predictive outputs of multiple machine learning models through ensemble learning, which enhances overall robustness and accuracy. Through this hybrid strategy, which combines rigorous data preprocessing, patient stratification, and ensemble modeling, ISPF achieves improved survival prediction performance while maintaining interpretability, thus offering a promising clinical decision-support tool in the management of HCC.

II. RELATED WORK

Survival prediction in oncology, particularly for hepatocellular carcinoma (HCC), has been extensively studied using both traditional statistical methods and modern machine learning techniques. The Cox proportional hazards model remains one of the most widely used approaches due to its interpretability; however, its assumption of linearity and proportional hazards often fails to capture complex, nonlinear interactions among clinical variables [4].
(Tsung-Jung Lin is with the Department of Computer Science and Information Engineering, Tamkang University, New Taipei, Taiwan. He is also with the Department of Gastroenterology, Taipei City Hospital, Ren-Ai Branch, Taipei, Taiwan. Chih-Yung Chang is with the Department of Computer Science and Information Engineering, Tamkang University, New Taipei 25137, Taiwan. Email: dab70@tpech.gov.tw, cychang@mail.tku.edu.tw)

To overcome these limitations, numerous studies have explored the use of machine learning (ML) and deep learning (DL) methods, such as random forests, gradient boosting machines, support vector regression, and deep neural networks, which have shown superior predictive performance in high-dimensional medical datasets [5][6]. Despite their accuracy, many of these models act as “black boxes,” offering limited transparency in how individual features contribute to predictions, a major drawback in clinical settings where interpretability is critical for decision-making. Moreover, most existing ML-based frameworks do not explicitly address the heterogeneity of liver cancer patients, leading to biased predictions for minority or atypical subgroups. Some recent approaches have attempted to mitigate this issue by incorporating latent variable models or hierarchical learning, but often at the expense of computational simplicity and practical deployment [7]. Furthermore, while ensemble learning has demonstrated robustness in various predictive tasks, its integration with patient stratification strategies remains underexplored in the context of HCC. Clustering techniques such as K-means have been employed in oncology for subgroup identification, yet rarely are they combined systematically with predictive modeling pipelines to enhance both accuracy and interpretability.
The proposed ISPF framework addresses these gaps by combining rigorous data preprocessing, unsupervised patient stratification using K-means, and ensemble-based survival modeling. This integration not only boosts prediction performance but also improves clinical trust by allowing subgroup-specific insights that align with real-world heterogeneity observed in liver cancer patients.

III. THE PROPOSED ISPF MECHANISM

This study aims to develop a robust and interpretable prediction framework, referred to as ISPF, designed to accurately estimate the survival time of liver cancer patients based on their comprehensive clinical records. Given the inherent complexity, heterogeneity, and high dimensionality of medical datasets, traditional prediction methods often struggle to capture the nonlinear patterns and interactions among clinical variables. Additionally, clinical data are frequently plagued by issues such as missing entries, variable scales, and hidden subpopulations, further complicating the modeling process. In response to these challenges, the proposed ISPF framework integrates advanced data processing techniques with machine learning strategies to enhance prediction accuracy, generalizability, and interpretability in real-world clinical contexts.

The ISPF architecture consists of three sequential phases: Data Preprocessing, Pre-Clustering, and Model Training. Each phase is designed to incrementally refine the raw input data and enable the construction of more accurate, subgroup-aware predictive models. The process ensures that the final predictions are not only statistically reliable but also clinically meaningful.

A. Data Preprocessing Phase

This phase ensures that the raw clinical records are properly cleaned and formatted for modeling. The goal is to eliminate noise, reduce bias, and standardize the dataset so that downstream machine learning models can extract patterns more effectively.
Let the raw dataset be denoted as $\tilde{X} = [\tilde{x}_{ij}] \in \mathbb{R}^{M \times N}$, where $\tilde{x}_{ij}$ represents the value of the $j$-th feature for the $i$-th patient, $M$ is the number of patients, and $N$ is the number of clinical features.

(1) Missing Value Imputation

Missing values in $\tilde{X}$ are identified and imputed based on the data type. For continuous numerical features, the mean value is used to fill in missing entries, thereby preserving the overall data distribution while preventing information loss. The imputation is calculated using:

$$\tilde{x}_{ij} = \frac{1}{M} \sum_{i=1}^{M} \tilde{x}_{ij} \tag{1}$$

(2) Normalization of Continuous Features

To avoid scale dominance by features with larger numerical ranges, all continuous variables are normalized using Z-score standardization. This not only accelerates model convergence but also ensures fair comparison across features. For each entry $\tilde{x}_{ij}$, the normalized value $x_{ij}$ is given by:

$$x_{ij} = \frac{\tilde{x}_{ij} - \mu_j}{\sigma_j}, \tag{2}$$

where $\mu_j$ and $\sigma_j$ represent the mean and standard deviation of feature $j$, respectively. The resulting standardized feature matrix is denoted as $X = [x_{ij}] \in \mathbb{R}^{M \times N}$.

B. Pre-Clustering Phase

Given the substantial variability observed among liver cancer patients (in terms of disease progression, treatment responsiveness, and a wide array of clinical indicators), this phase introduces an unsupervised learning approach to uncover latent subgroups within the patient population. Specifically, K-means clustering is employed to stratify patients based on similarities in their standardized clinical profiles. The rationale behind this step is to mitigate intra-group heterogeneity and enable more focused, subgroup-specific modeling, thereby enhancing both predictive accuracy and clinical relevance. The clustering process operates on the standardized feature matrix $X = [x_{ij}] \in \mathbb{R}^{M \times N}$, where each row vector $x_i \in \mathbb{R}^{N}$ represents the complete clinical feature set of the $i$-th patient.
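As a concrete illustration of the Data Preprocessing Phase, the mean imputation and Z-score standardization described above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function names `impute_mean` and `zscore`, the use of `None` to mark a missing entry, and the toy records are all hypothetical. Two assumptions go beyond the text: the column mean is taken over observed entries only, and the population standard deviation is used.

```python
from statistics import fmean, pstdev


def impute_mean(rows):
    """Fill missing entries (None) with the mean of the observed values in the same column."""
    columns = list(zip(*rows))
    # Assumption: the mean is computed over observed entries only.
    # A column with no observed values would raise statistics.StatisticsError.
    means = [fmean(v for v in col if v is not None) for col in columns]
    return [[means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


def zscore(rows):
    """Standardize each column to zero mean and unit standard deviation."""
    columns = list(zip(*rows))
    mu = [fmean(col) for col in columns]
    # Assumption: population standard deviation (divide by M, not M - 1).
    sigma = [pstdev(col) for col in columns]
    # Constant columns (sigma == 0) are mapped to 0.0 to avoid division by zero.
    return [
        [(v - mu[j]) / sigma[j] if sigma[j] > 0 else 0.0 for j, v in enumerate(row)]
        for row in rows
    ]


# Hypothetical toy records: [age, tumor size in cm]; None marks a missing entry.
raw = [[50.0, 2.0], [60.0, None], [70.0, 4.0]]
imputed = impute_mean(raw)
standardized = zscore(imputed)
print(imputed[1])
print([round(v, 3) for v in standardized[0]])
```

Imputing before standardizing means the filled-in entries sit exactly at zero after the Z-score step, so they neither pull a patient toward nor away from any cluster centroid along that feature.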
The goal of K-means is to partition the dataset into $K$ clusters $C = \{C_1, C_2, \ldots, C_K\}$ in such a way that the total within-cluster variance is minimized. Mathematically, this is expressed as:

$$\min_{C} \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \upsilon_k \rVert^2, \tag{3}$$

where $\upsilon_k$ denotes the centroid of cluster $C_k$, representing the average clinical profile of patients assigned to that group.

Once the clustering is complete, the resulting cluster labels are integrated into the prediction pipeline in one of two ways. In the first approach, the cluster label is appended as a categorical feature to each patient's input vector, allowing a global model to incorporate subgroup information as part of the learning process. In the second approach, the dataset is divided according to cluster membership, and a distinct predictive model is trained within each subgroup. Both strategies aim to capture localized survival dynamics that might otherwise be diluted in a pooled analysis of the entire cohort.

By aligning the modeling process with the natural structure of the patient population, this clustering-based stratification enhances the model's capacity to detect subgroup-specific prognostic patterns. Furthermore, it facilitates clinical interpretability by highlighting which combinations of features are most influential within each identified cluster, thereby supporting more personalized risk assessment and treatment planning for liver cancer patients.

C. Model Training Phase

The final phase of the ISPF framework focuses on the construction and optimization of predictive models that estimate patient-specific survival durations based on their clinical feature representations. Let the target variable be defined as the survival duration vector $y = [y_1, y_2, \ldots, y_M]^{\top}$, where $y_i$ denotes the observed survival time for the $i$-th patient.
The model's predictions are represented by $\hat{y} = [\hat{y}_1, \hat{y}_2, \ldots, \hat{y}_M]^{\top}$. To assess and guide model learning, the Mean Squared Error (MSE) is adopted as the objective loss function, measuring the average squared difference between actual and predicted survival times. The MSE is formally defined as:

$$L(y, \hat{y}) = \frac{1}{M} \sum_{i=1}^{M} (y_i - \hat{y}_i)^2 \tag{4}$$

Minimizing this objective function allows the model to learn patterns within the clinical data that are most predictive of survival duration. The choice of regression architecture (whether based on decision trees, gradient boosting machines, or neural networks) is flexible within the ISPF framework, and ensemble techniques may be adopted to enhance predictive robustness and reduce overfitting.

A distinguishing aspect of this phase lies in its integration with the clustering results from the previous stage. Each patient is first assigned to a specific cluster based on their clinical profile, as determined by the K-means algorithm. This clustering information is then incorporated into the modeling process in one of two ways: either as an additional categorical input feature that informs a global predictive model, or by segmenting the dataset and training a separate, specialized model within each cluster. Regardless of the approach, the goal remains consistent: to enable the model to adapt to heterogeneity in the patient population and to capture distinct survival patterns that may be obscured in aggregate modeling. By leveraging the structure revealed through pre-clustering, the predictive models within ISPF can more accurately reflect intra-cohort variability. This not only leads to improved prediction accuracy but also enhances the interpretability of the results, allowing clinicians to gain insights into subgroup-specific risk factors and survival trajectories.
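The two ways of using cluster labels and the MSE objective described above can be made concrete with a small Python sketch. It is only an illustration under stated assumptions: the cluster labels are taken as given, all function names and toy values are hypothetical, and the per-cluster "model" is a placeholder that predicts the cluster's mean survival time, standing in for whatever regressor or ensemble is actually used (the text leaves that choice open).

```python
from collections import defaultdict
from statistics import fmean


def mse(y_true, y_pred):
    """Mean squared error: average squared gap between observed and predicted survival."""
    return fmean((t - p) ** 2 for t, p in zip(y_true, y_pred))


def append_cluster_feature(X, labels, k):
    """Strategy 1: append a one-hot cluster indicator to each patient's feature vector."""
    return [row + [1.0 if c == lab else 0.0 for c in range(k)] for row, lab in zip(X, labels)]


def fit_per_cluster(y, labels):
    """Strategy 2: fit one model per cluster.
    Placeholder model (assumption): the mean survival time of the cluster."""
    groups = defaultdict(list)
    for target, lab in zip(y, labels):
        groups[lab].append(target)
    return {lab: fmean(values) for lab, values in groups.items()}


def predict_per_cluster(models, labels):
    """Route each patient to the model of the cluster they belong to."""
    return [models[lab] for lab in labels]


# Hypothetical standardized features, survival months, and cluster labels.
X = [[0.1, 1.2], [0.0, 1.0], [2.1, -0.3], [1.9, -0.5]]
y = [10.0, 14.0, 40.0, 44.0]
labels = [0, 0, 1, 1]

X_aug = append_cluster_feature(X, labels, k=2)       # input for a single global model
models = fit_per_cluster(y, labels)                  # one placeholder model per cluster
pooled = [fmean(y)] * len(y)                         # baseline that ignores clusters
stratified = predict_per_cluster(models, labels)

print(mse(y, pooled), mse(y, stratified))  # 229.0 4.0
```

On this toy data the pooled baseline has an MSE of 229.0 while the per-cluster placeholder reaches 4.0, which shows the kind of dilution that stratified training is meant to avoid. The numbers say nothing about the clinical dataset used in the paper.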
Through this integrative design, the ISPF framework demonstrates its capacity to support personalized and data-driven decision-making in liver cancer prognosis.

IV. SIMULATION STUDY

To explore latent patient subgroups and enhance model personalization, the proposed ISPF applies K-means clustering to the pre-processed clinical dataset. The number of clusters $K$ was varied from 2 to 10, and the clustering performance was evaluated using the silhouette coefficient, which quantifies the consistency within clusters compared to separation from other clusters. As shown in Fig. 1, the silhouette score reaches its maximum value of 0.36 when $K = 4$, suggesting the best trade-off between intra-cluster compactness and inter-cluster separation at this configuration. Based on this quantitative assessment, four clusters are adopted as the optimal partition, enabling the model to effectively capture patient heterogeneity while preserving clinical coherence within each subgroup.

Fig. 1. Silhouette scores for K-means clustering with varying numbers of clusters.

V. CONCLUSION

This study presents a novel and interpretable prediction framework, ISPF, for estimating survival duration in liver cancer patients. By integrating advanced data preprocessing, unsupervised pre-clustering, and machine learning-based modeling, ISPF addresses key challenges in medical survival prediction, including data heterogeneity, high dimensionality, and the need for personalized prognostic insights. The introduction of K-means clustering enables the identification of latent patient subgroups, effectively capturing intra-cohort variability and enhancing model performance. Experimental results, particularly the silhouette analysis, indicate that four clusters give the best partition among the tested values of $K$, supporting the framework's ability to reflect clinically meaningful stratification.
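The cluster-count selection used in the simulation study (run K-means for several values of K and keep the K with the highest mean silhouette coefficient) can be sketched as follows. This is a hedged, self-contained illustration rather than the authors' code: the points are synthetic, the function names are hypothetical, and two choices are assumptions of the sketch, namely a deterministic farthest-point initialization and the convention that a point alone in its cluster scores 0. It does not reproduce the reported value of 0.36.

```python
import math


def farthest_point_init(points, k):
    """Deterministic seeding (assumption of this sketch): start at the first point,
    then repeatedly add the point farthest from the centroids chosen so far."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns one cluster label per point."""
    centroids = farthest_point_init(points, k)
    labels = []
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the old centroid if a cluster loses all of its members.
            updated.append(
                tuple(sum(col) / len(members) for col in zip(*members)) if members else centroids[c]
            )
        if updated == centroids:
            break
        centroids = updated
    return labels


def mean_silhouette(points, labels):
    """Mean of s = (b - a) / max(a, b), where a is the mean distance to the point's own
    cluster and b is the mean distance to the nearest other cluster."""
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    scores = []
    for i, p in enumerate(points):
        mean_dist = {}
        for c in clusters:
            d = [math.dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if d:
                mean_dist[c] = sum(d) / len(d)
        a = mean_dist.pop(labels[i], None)
        if a is None:  # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        b = min(mean_dist.values())
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Synthetic data: three tight groups of five points each (hypothetical, not patient data).
offsets = [(0.0, 0.0), (0.5, 0.0), (-0.5, 0.0), (0.0, 0.5), (0.0, -0.5)]
centers = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
points = [(cx + dx, cy + dy) for cx, cy in centers for dx, dy in offsets]

scores = {k: mean_silhouette(points, kmeans(points, k)) for k in range(2, 7)}
best_k = max(scores, key=scores.get)
print(best_k)
```

On these synthetic points the sweep selects K = 3, matching the three groups that were built in. On real clinical data the curve is usually much flatter, so a maximum such as 0.36 indicates the best of the tested partitions rather than strongly separated clusters.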
Through comprehensive model training guided by subgroup-specific characteristics, ISPF demonstrates improved prediction accuracy and robustness compared to conventional approaches. This work not only contributes to methodological advancements in survival analysis but also provides a practical tool for personalized decision-making in liver cancer treatment and management. Future research will explore dynamic temporal modeling and external clinical validation to further extend the applicability of ISPF in broader oncological contexts.

REFERENCES

[1] Wakil, A., Wu, Y. C., Mazzaferro, N., Greenberg, P., & Pyrsopoulos, N. T. (2024). Trends of hepatocellular carcinoma (HCC) inpatients mortality and financial burden from 2011 to 2017: A nationwide analysis. Journal of Clinical Gastroenterology, 58(1), 85–90.
[2] Huang, J., Wu, Q., Geller, D. A., & Yan, Y. (2023). Macrophage metabolism, phenotype, function, and therapy in hepatocellular carcinoma (HCC). Journal of Translational Medicine, 21(1), 815.
[3] Toh, M. R., Wong, E. Y. T., Wong, S. H., Ng, A. W. T., Loo, L. H., Chow, P. K. H., & Ngeow, J. (2023). Global epidemiology and genetics of hepatocellular carcinoma. Gastroenterology, 164(5), 766–782.
[4] Cox, D. R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187–202.
[5] Wang, P., Li, Y., & Reddy, C. K. (2019). Machine learning for survival analysis: A survey. ACM Computing Surveys (CSUR), 51(6), 1–36.
[6] Huang, C., Clayton, E. A., Matyunina, L. V., McDonald, L. D., Benigno, B. B., Vannberg, F., & McDonald, J. F. (2020). Machine learning predicts individual cancer patient responses to therapeutic drugs with high accuracy. Scientific Reports, 10(1), 1–12.
[7] Zhang, Y., Jiang, J., Chen, Y., Zhang, Y., & Lv, Q.
(2022). A multi-task learning approach for cancer survival prediction using patient subgroup information. IEEE Journal of Biomedical and Health Informatics, 26(1), 364–375.

HCC-SPML: Hepatocellular Carcinoma Survival Prediction via Machine Learning

Tsung-Jung Lin, Chih-Yung Chang
dab70@tpech.gov.tw, cychang@mail.tku.edu.tw

Abstract: This study proposes HCC-SPML, a predictive framework designed to estimate both survival status and postoperative survival duration in patients with hepatocellular carcinoma (HCC) using clinical and pathological data. The preprocessing pipeline consists of three stages: (1) sample selection (patients who underwent definitive liver resection and whose cause of death was recorded as liver cancer or who are still alive), (2) dual-label construction (binary labels for classification and survival months for regression), and (3) normalization and removal of samples with missing values. Three machine learning models, including Support Vector Machine (SVM), Random Forest, and Linear Regression, were trained and compared. Input features include eight clinical variables: age, gender, AFP level, tumor size and number, histological grade, vascular invasion, and liver fibrosis score. Experimental results demonstrate that the proposed method effectively predicts postoperative prognosis and offers valuable insights for personalized clinical decision-making in HCC management.

Index Terms: Support Vector Machine (SVM), Linear Regression, Random Forest, Machine learning, Liver cancer prediction.

I. INTRODUCTION

According to the GLOBOCAN 2020 global cancer statistics, hepatocellular carcinoma (HCC) ranks as the sixth most commonly diagnosed malignancy and the fourth leading cause of cancer-related mortality worldwide, with an estimated 830,000 deaths annually [1].
This alarming figure reflects the aggressive nature of HCC and the challenges in its early detection and effective treatment. In Taiwan, the incidence and mortality rates of liver cancer have remained persistently high over the past decades, with HCC consistently ranking among the top ten causes of cancer-related deaths, thereby exerting a considerable burden on both the public health system and clinical care resources [2]. This underscores the urgent need for more effective tools to stratify patients, guide treatment, and improve outcomes.

The development of HCC is strongly associated with underlying chronic liver diseases, especially liver cirrhosis, which itself is primarily driven by chronic hepatitis B virus (HBV) and hepatitis C virus (HCV) infections. Other well-documented etiological factors include excessive alcohol consumption, exposure to dietary aflatoxins, nonalcoholic fatty liver disease (NAFLD), and genetic conditions such as hereditary hemochromatosis [3–5]. These factors often interact in complex and individualized ways, influencing tumor progression, treatment response, and ultimately, survival outcomes.

Although significant progress has been made in surgical resection techniques, anesthesia, and perioperative care, the long-term prognosis of HCC patients following hepatic resection remains highly heterogeneous. Postoperative survival is influenced not only by tumor-related characteristics (e.g., size, vascular invasion) but also by liver functional reserve, comorbidity burden, and host immune status [6]. Given this clinical complexity, accurately predicting patient outcomes remains a challenge, and more sophisticated approaches are required beyond traditional rule-based staging.

Currently, conventional prognostic systems, such as the Tumor-Node-Metastasis (TNM) staging and the Barcelona Clinic Liver Cancer (BCLC) classification, are widely adopted in clinical guidelines and daily practice [7,8].
These systems provide a standardized framework for treatment decisions, but they are largely constructed from a limited set of linear, static variables and often fail to account for the nonlinear and high-dimensional nature of real-world patient data. Consequently, patients with similar TNM or BCLC stages may exhibit widely divergent outcomes, highlighting the limitations of one-size-fits-all staging approaches.

With the advent of electronic health records (EHRs) and the increasing availability of multi-modal clinical datasets, the field of machine learning (ML) has shown substantial promise in augmenting clinical decision support for cancer prognosis. ML algorithms, including both classical approaches and more recent developments in deep learning (DL), offer powerful capabilities to automatically learn latent patterns, capture feature interactions, and adaptively model complex survival dynamics without requiring manually predefined assumptions [9–11]. In oncology, ML-based models have already demonstrated impressive performance in predicting cancer recurrence, response to therapy, and patient stratification, especially in domains like breast, lung, and prostate cancer.

However, despite the rapid advancement in computational tools, the application of ML to postoperative survival prediction in HCC remains relatively underexplored. Existing studies are often limited by small sample sizes, lack of external validation, or a narrow focus on specific variables. Furthermore, few studies have systematically compared ML models against conventional staging systems or evaluated their performance across different time thresholds of survival. There is also a lack of consensus on which ML techniques are most effective in this domain, and which feature combinations yield the most clinically actionable results.
This research therefore seeks to address these gaps by applying and evaluating multiple machine learning classifiers, including Logistic Regression, Random Forest, and Support Vector Machine, on a curated clinical dataset of postoperative HCC patients. The objective is to assess their discriminative power, robustness, and potential clinical utility in predicting patient survival across defined time horizons. By leveraging both confusion matrix analysis and ROC curve-based evaluation, this study aims to provide insights into the feasibility and advantages of ML-based survival modeling in HCC, and to contribute empirical evidence for its integration into future clinical workflows.

II. THE PROPOSED MULTI-TASK PREDICTION FRAMEWORK

A. Data Preparation and Label Construction

Let the original clinical dataset be denoted as:

$$D = \{(x_i, y_i)\}_{i=1}^{n}$$

where $x_i \in \mathbb{R}^{d}$ represents the clinical feature vector of the $i$-th patient, and $y_i$ denotes the associated survival label. To ensure relevant samples, the dataset is filtered as:

$$D' = \{(x_i, y_i) \in D \mid \mathrm{SurgeryCode}_i > 20 \land (\mathrm{DeathCause}_i = \text{C22.0} \lor \text{0000})\}$$

Two tasks are defined for prediction:

Task 1: $y_i^{(1)} = 1$ if alive, 0 if deceased
Task 2: If $y_i^{(1)} = 0$, then $y_i^{(2)} = 1$ if survival months $\geq 24$, otherwise 0

B. Feature Engineering and Preprocessing

The dataset contains variables such as age, gender, AFP level, tumor size and count, histological grade, vascular invasion, and liver fibrosis. Preprocessing includes:

(1) Missing Value Removal

$$\tilde{D} = \{(x_i, y_i) \in D' \mid x_i \text{ has no missing entries}\}$$

(2) Normalization of Continuous Features

$$\hat{x}_{ij} = \frac{x_{ij} - \min(x_j)}{\max(x_j) - \min(x_j)}$$

C. Pre-Clustering for Patient Stratification

To reflect latent population heterogeneity, K-means clustering is applied on $X \in \mathbb{R}^{M \times N}$:

$$\min_{C} \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \nu_k \rVert^2$$

where $\nu_k$ is the centroid of cluster $C_k$.

D.
Model Architecture and Learning Objective

Let classifiers $f^{(1)}$ and $f^{(2)}$ be learned for the two tasks. Models used: (1) Logistic Regression (2) Random Forest (3) SVM

Each model is trained with 10-fold cross-validation in Orange. Performance is evaluated by Accuracy, AUC, and Confusion Matrix. Each task uses binary cross-entropy loss:

Task 1:

$$\mathcal{L}^{(1)} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i^{(1)} \log \hat{y}_i^{(1)} + \left(1 - y_i^{(1)}\right) \log\left(1 - \hat{y}_i^{(1)}\right) \right]$$

Task 2:

$$\mathcal{L}^{(2)} = -\frac{1}{n'} \sum_{i=1}^{n'} \left[ y_i^{(2)} \log \hat{y}_i^{(2)} + \left(1 - y_i^{(2)}\right) \log\left(1 - \hat{y}_i^{(2)}\right) \right]$$

Total Multi-task Loss:

$$\mathcal{L}_{total} = \mathcal{L}^{(1)} + \lambda \cdot \mathcal{L}^{(2)}, \quad \lambda \in [0, 1]$$

III. SIMULATION STUDY

To evaluate model performance on postoperative survival classification (Task 1), we compared three classifiers (Logistic Regression, Random Forest, and SVM) using 10-fold cross-validation. The evaluation results are illustrated using confusion matrices in Fig. 1. As shown in Fig. 1(a), Random Forest achieved a higher true positive rate than Logistic Regression [Fig. 1(b)] and SVM [Fig. 1(c)], while maintaining a lower false negative rate. This indicates that Random Forest can better capture the survival characteristics of HCC patients.

Figure 1(a): Confusion matrix of Random Forest
Figure 1(b): Confusion matrix of Logistic Regression
Figure 1(c): Confusion matrix of SVM

Tsung-Jung Lin is with the Department of Computer Science and Information Engineering, Tamkang University, New Taipei, Taiwan. He is also with the Department of Gastroenterology, Taipei City Hospital, Ren-Ai Branch, Taipei, Taiwan. Chih-Yung Chang is with the Department of Computer Science and Information Engineering, Tamkang University, New Taipei 25137, Taiwan. email: dab70@tpech.gov.tw, cychang@mail.tku.edu.tw

To evaluate the performance of different classification models in the postoperative survival prediction task (Task 1), we compared three widely used machine learning classifiers: Logistic Regression, Random Forest, and Support Vector Machine (SVM). The models were trained and tested using a 10-fold cross-validation strategy to ensure robustness and generalizability of the results. The classification performance of each model is illustrated through the confusion matrices presented in Fig. 1.

As shown in Fig. 1(a), the Random Forest classifier demonstrates a well-balanced prediction capability, achieving 87 true positives and 20 true negatives out of a total of 166 samples. This indicates that Random Forest effectively identifies patients who survive more than two years (Class 1), while maintaining a relatively low number of false negatives (20 cases) and false positives (39 cases). Compared to other models, Random Forest provides a stronger ability to distinguish between survival outcomes, especially for longer-term survival cases.

In contrast, the confusion matrix of Logistic Regression [Fig. 1(b)] shows a slightly lower performance. Although it correctly classifi