Publications

For recent publications, refer to my Google Scholar profile.
† denotes equal contribution; * denotes corresponding author.

Published

Graph frequency-domain factor modeling
Kyusoon Kim, Hee-Seok Oh
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

We propose a novel factor model in the graph frequency domain for multivariate data residing on the vertices of a graph, referred to as a multivariate graph signal. By utilizing graph filters, our model extends the frequency-domain approach of the dynamic factor model from time series to graphs, enabling a graph-aware and multiscale interpretation of factors across graph frequencies. This latent modeling approach reduces the dimensionality of graph signals, thereby improving the understanding of their structure. It also allows the use of the extracted factors for subsequent analyses, such as clustering. We describe the estimation of factors and their loadings and investigate the consistency of the factor estimator. In addition, we propose two consistent estimators for determining the number of factors. The finite sample performance of the proposed method is demonstrated through simulation studies across various graph structures. We also compare it with classical factor analysis and examine how the choice of graph structure affects the results. The findings show that our model achieves lower reconstruction errors and successfully incorporates the graph structure. Furthermore, we illustrate the effectiveness of the proposed method by applying it to G20 economic data, water quality data from the Geum River, and passenger data from the Seoul Metropolitan subway.

Cross-spectral analysis of bivariate graph signals
Kyusoon Kim, Hee-Seok Oh
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025

With the advancements in technology and monitoring tools, we often encounter multivariate graph signals, which can be seen as the realizations of multivariate graph processes, and revealing the relationship between their constituent quantities is one of the important problems. To address this issue, we propose a cross-spectral analysis tool for bivariate graph signals. The main goal of this study is to extend the scope of spectral analysis of graph signals to bivariate graph signals. In this study, we define joint weak stationarity graph processes and introduce graph cross-spectral density and coherence for bivariate graph processes. We propose several estimators for the cross-spectral density and investigate the theoretical properties of the proposed estimators. Furthermore, we demonstrate the effectiveness of the proposed estimators through numerical experiments, including simulation studies and a real data application. Finally, as an interesting extension, we discuss robust spectral analysis of graph signals in the presence of outliers.

Quantile-based fitting for graph signals
Kyusoon Kim, Hee-Seok Oh
Statistics and Computing, 2025

The development of monitoring tools has led to an emerging demand for analyzing data residing on graphs, referred to as graph signals. In this study, we propose a quantile-based fitting method for graph signals, which can be applicable to graph signals with a wide range of distributions. Unlike traditional data fitting methods, such as smoothing splines or quantile smoothing splines in Euclidean space, the proposed method is designed for the graph domain, considering the inherent structure of graphs. In contrast to prevalent graph signal fitting methods that rely on optimization problems with L2-norm fidelity, the proposed method provides robust fits for graph signals in the presence of outliers. More importantly, it identifies various distributional structures of graph signals beyond the mean feature. We further investigate the theoretical properties of the proposed solution, including its existence and uniqueness. Through a comprehensive simulation study and real data analysis, we demonstrate the promising performance of the proposed method.

Semiparametric approaches for the inference of univariate and multivariate extremes
Seungwoo Kang†, Kyusoon Kim†, Youngwook Kwon†, Seeun Park†, Seoncheol Park†, Ha-Young Shin†, Joonpyo Kim, Hee-Seok Oh
Extremes, 2025

In this paper, we present several semiparametric approaches for the inference of univariate and multivariate extremes to resolve the tasks from the EVA (2023) Conference Data Challenge. We implement generalized additive models to capture the flexible relationship for point and interval estimations of the conditional quantiles. We also adopt Lp-quantile to estimate the marginal quantiles of extreme levels. To predict probabilities of multivariate extreme events, we implement conditional methods by Heffernan and Tawn (2004) and Keef et al. (2013). We further validate predicted models evaluating their performance scores constructed based on the notion of equally extreme level of quantiles and cross-validation to select the best estimates to achieve high accuracy. When estimating the excess probability of 50-dimensional data, we cluster variables with high correlation after simple data exploration and combine the results obtained from each cluster. Finally, we also provide post-mortem analysis based on the ground truth.

Prediction of wafer performance: Use of functional outlier detection and regression
Kyusoon Kim†, Seunghee Oh†, Kiwook Bae, Hee-Seok Oh
IEEE Access, 2025

Optical emission spectroscopy (OES) data is essential for virtual metrology, enabling accurate predictions of wafer performance in plasma etching processes. This approach not only reduces the need for physical measurements of product quality, leading to significant resource savings, but also supports improved decision-making, particularly in process control and quality assurance. To exploit the consecutive nature of OES data, we propose a prediction method based on a functional approach using multivariate functional partial least squares regression, coupled with dimension reduction and a novel outlier detection technique via functional independent component analysis. The proposed approach improves prediction performance by capturing the continuous nature of OES data and effectively extracting the components that describe the data structure. Numerical experiments, including simulation studies and real-world applications of OES data, demonstrate the effectiveness of the proposed method through superior prediction performance, as evidenced by low RMSE and MAE values, particularly in the presence of outliers.

Network time series forecasting using spectral graph wavelet transform
Kyusoon Kim, Hee-Seok Oh
International Journal of Forecasting, 2024

We propose a novel method for forecasting network time series that occur in graphs or networks. Our approach is based on a spectral graph wavelet transform (SGWT) that provides the localized behavior of graph signals around each node. The proposed method improves forecasting performance over other existing methods. In particular, the advantages of the proposed method stand out when signals observed on a graph are inhomogeneous or non-stationary. We demonstrate the strength of the proposed approach through real-world data analysis. This analysis uses two network time series datasets: the daily number of people getting on and off the Seoul Metropolitan Subway, and daily Covid-19 confirmed cases reported in South Korea. We further conduct a simulation study to evaluate the effectiveness of the proposed method.

Principal component analysis for river network data: Use of spatiotemporal correlation and heterogeneous covariance structure
Kyusoon Kim, Hee-Seok Oh, Minsu Park
Environmetrics, 2023

Spatiotemporal measurements observed through river networks have two distinct characteristics: a spatiotemporal correlation under the flow-connected structure and the existence of heterogeneous covariances, which require a careful approach to implement principal component analysis (PCA). This paper focuses on developing a PCA method to reflect the unique characteristics of river networks. We propose a novel method combining flow-directed PCA and geographically weighted PCA for the domain of river networks. The strengths of our approach are that it can (i) reduce dimensionality for streamflow data while effectively removing correlation among them and (ii) identify the group structure of data. It is possible to find essential patterns and sources of variation that may not be disclosed due to the attributes of flow-connected networks. We apply the proposed method to the daily monitoring records of total organic carbon in the Geum River catchment area in South Korea. The results show that the proposed method successfully adjusts for the topological structure of the network and temporal correlation among observations while considering the spatial heterogeneity, enabling a more concrete understanding of monitoring networks.

Under Review

Extremal PCA-based synthetic event generation for the inference of extreme precipitation
Kyusoon Kim†, Seeun Park†, Ha-Young Shin†, Jisoo Choi, Jihoon Ha, Joonpyo Kim, Youngwook Kwon, Seoncheol Park, Seungwoo Kang
Under review, 2026+

Graph canonical coherence analysis
Kyusoon Kim, Hee-Seok Oh
Under review, 2026+

Statistical graph empirical mode decomposition by graph denoising and boundary treatment
Hyeonglae Cho†, Kyusoon Kim†, Hee-Seok Oh
Under review, 2026+

Graph frequency-domain regression
Kyusoon Kim*
Under review, 2026+

Principal component analysis in the graph frequency domain
Kyusoon Kim, Hee-Seok Oh
Under review, 2026+

Absolute average and median treatment effects as causal estimands on metric spaces
Ha-Young Shin, Kyusoon Kim*, Kwonsang Lee, Hee-Seok Oh
Under review, 2026+