KORE-Map 1.0: Korean medicine Omics Resource Extension Map on transcriptome data of tonifying herbal medicine
Preparation of herbs
Dried medicinal plants, conforming to the Korean Pharmacopoeia standards, were provided by Kwangmyung-dang Medicinal Herbs Co., located in Ulsan, Republic of Korea. These samples underwent an organoleptic examination by Dr. Choi Goya, a herbal medicine organoleptic examination expert appointed by the Korea Food and Drug Administration. The identification to species level was accomplished through DNA barcode region sequencing. Voucher specimens have been stored at the Korean Herbarium of Standard Herbal Resources, within the Herbal Medicine Resources Research Center, at the Korea Institute of Oriental Medicine in Naju, Republic of Korea (Table 1). All herbs and extracts, which were sourced from the Oriental Medicine Resources Research Center (KIOM), are available online at https://oasis.kiom.re.kr/herblib.
Preparation method of hot water and 70% ethanol extracts of herbs and THMs
Hot water and 70% ethanol extracts of each plant were prepared and supplied by KOC Biotech Co., located in Daejeon, Republic of Korea. Initially, dried plants (1,000 g) were pulverized and extracted in 10 L of hot distilled water for 3 h using a reflux extraction system (MS-DM609; MTOPS, Seoul, Republic of Korea), or in 10 L of 70% ethanol for 1 h using an ultrasonication system (VCP-20, Lab companion, Dajeon, Republic of Korea) twice. The resulting extract solutions were filtered through a 5 µm cartridge filter, concentrated using a rotary evaporator (Ev-1020, SciLab, Seoul, Republic of Korea), and finally lyophilized in a freeze dryer (LP-20, Ilshin-Bio-Base, Dongducheon, Republic of Korea) to produce the final extracts. These extracts were then finely homogenized and packaged in glass bottles with desiccant silica gel. THMs were prepared by blending and homogenizing these extracts in accordance with the composition ratios and extract yields of the individual medicinal herbs, according to the Korean Pharmacopoeia (Table 2). For in vitro applications, extracts (100 mg) were vigorously vortexed for 30 min in 10 mL of phosphate-buffered saline (PBS; Thermo Fisher Scientific, Rockford, IL, USA) containing 2% DMSO. This mixture was then sterilized by filtration through a 0.22 µm membrane to obtain a stock solution (10 mg/mL), which was divided into small aliquots and stored at −80 °C until their use.
Cell culture
All cell lines were purchased from the American Type Culture Collection (Manassas, VA, USA) and were cultured in a basal medium enriched with 10% heat-inactivated fetal bovine serum, 100 IU/mL penicillin, and 100 µg/mL streptomycin, all within a humidified incubator (Table 3). Cell confluence levels between 80–90% prompted the replacement of the growth medium every 3–4 days to maintain optimal growth conditions. To ensure the absence of mycoplasma contamination, the MycoAlert PLUS mycoplasma detection kit (Lonza, Rockland, ME, USA) was employed for regular testing.
Drug treatment and total RNA preparation for RNA sequencing (RNA-seq) analysis
To determine the appropriate treatment drug concentrations, we performed cell cytotoxicity tests to investigate drug doses that maintained 80% cell viability (IC20s), which were then adopted as the maximal doses for RNA-seq data collection. For drugs whose IC20s could not be determined, the highest treatment concentrations were capped at 500 µg/mL for extracts, considering both their solubility and relevance for clinical application. To confirm the influence of concentration, cells were treated with three different concentrations using 1/5 serial dilutions, thereby exposing them to low, medium, and high doses. Positive control drugs such as wortmannin (Sigma, W1628), LY294002 (Sigma, L9908), and Thioridazine (Sigma, T9025) were incorporated into the assay for comparative analysis against the connectivity map (CMap) data. Cells treated with a 2% DMSO/PBS solution served as the vehicle control. One day before drug administration, cells were seeded into 6-well culture plates with 3 mL of growth medium. Following a 24 h treatment period, the cells were washed with ice-cold PBS, and total RNA was isolated using QIAzol RNA isolation reagents (Thermo Fisher Scientific) in accordance with the manufacturer’s instructions.
RNA-seq data generation and preprocessing
Total RNA (over 500 ng) from each sample was processed for the mRNA sequencing library using the MGIEasy RNA directional library prep kit (MGI Tech Co., Ltd., China), following the manufacturer’s instructions. The library concentration was quantified using the QuantiFluor® ssDNA System (Promega Corporation, WI, USA). The prepared DNA nanoball was sequenced on an MGISEQ system (MGI Tech Co., Ltd., China) employing 100 bp paired-end reads. The RNA-seq data quality was assessed using FastQC (v0.11.9). To remove common MGISEQ adapter sequences, TrimGalore (v0.6.6) was utilized. Trimmed reads were then mapped to the human reference genome assembly GRCh38 (hg38) using the STAR aligner (v2.7.3a) with default settings12. Gene transcript abundance, including expected read counts and transcripts per million, was quantified using RSEM (v1.3.3), with the gene annotation GRCh38.8413. The raw sequence data (FASTQ files) and the preprocessed expression values for each gene have been deposited in the GEO under accession numbers GSE244687, GSE244707, GSE244694, and GSE245912.
Differential gene expression analysis
Using the gene symbols of protein-coding genes, we utilized the collapseRows function from the WGCNA package (v.1.72-1)14, specifically designed to merge expression data for genes represented by multiple probes. This approach effectively reduces redundancy and potential noise, enhancing the clarity of subsequent analyses. Additionally, the filterByExpr function from the genefilter package (v.1.78.0)15 was utilized to exclude genes that failed to meet predetermined expression criteria across samples. This filtering ensured that only genes most likely to provide reliable and relevant signals were retained for analysis.
For evaluation of each set of treatment conditions—encompassing four cell lines, 14 herbs and THMs, two extraction methods, and three concentration levels— we conducted differential gene expression (DGE) analysis against the corresponding control samples. This analysis was performed using the Wald test statistic as implemented in the DESeq. 2 package (v.1.36.0)16. Differentially expressed genes (DEGs) were determined based on a fold-change threshold of 1.5 and an adjusted P-value of less than 0.05.
Clustering analysis
The fold-change values derived from the DGE analysis across all treatment conditions were clustered using the t-distributed stochastic neighbor embedding (t-SNE) algorithm. This machine learning technique, designed for dimensionality reduction, excels in visualizing high-dimensional datasets, making it a valuable tool for interpreting complex gene expression patterns. The analysis was conducted utilizing the Rtsne package (v.0.16), with the perplexity parameter set to 1017.
Comparisons with connectivity map transcriptome data
Connectivity Map data were obtained from the Clue.io platform(clue.io/data/CMap2020#LINCS2020). For our analysis, we selected level five gene expression signatures with high reproducibility, defined by moderated z-scores that met specific criteria (distil_cc_q75 > 0.5 and pct_self_rank_q25 > 0.05), to compare with our RNA-seq data. The R package CMapR (v1.8.0) was used to manipulate the level 5 GCTX file (level5_beta_trt_cp_ n720216 × 12328.gctx). Given the variance in gene expression profiling between our RNA-seq data and L1000 assays9 used in CMap, a direct comparison between gene expression values was difficult due to distinct distributions of expression values. To navigate this, we employed gene set enrichment analysis (GSEA) as an alternative method to explore the genome-wide perturbing effects of treatments such as wortmannin at the pathway level18. We utilized 2,229 gene sets from several databases—Hallmark, Biocarta, KEGG, REACTOME, PID, and Wikipathways—available through MSigDB ( The analysis involved performing GSEA on all genes, ranked according to their Wald test statistics or level5 z-scores. To obtain the MSigDB gene sets and conduct GSEA, we utilized the R package MSigDBR (v7.5.1) and FGSEA (v3.18). From the GSEA results, we defined pathway activity score (PAC) as ‘sign (enrichment score) × -log10(p-value)’ value to quantify the significance level. PAC vectors of equal lengths (n = 2,229) were generated for both our dataset and the CMap dataset. Subsequently, we determined the Pearson correlation coefficient to assess the relationship between the PACs from our samples and those from CMap (Fig. 1).
link