An integrated knowledge database dedicated to ncRNAs, especially lncRNAs.
Overview


NONCODE (version 7.0) is a comprehensive knowledge database dedicated to the collection of long non-coding RNAs (lncRNAs). Compared with version 6.0, version 7.0 adds expression profiles of lncRNAs in single cells. NONCODE v7.0 collects raw sequencing data from hundreds of single-cell RNA sequencing datasets that were generated on the 10x Genomics platform and obtained from the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI). These datasets relate to cancer, development, hematological diseases, and healthy human peripheral blood. Table 1 presents an overview of the data, covering disease categories, datasets, and sample sizes. In summary, the database contains data from 229 studies and 2061 samples: 557 cancer samples, 217 hematological disease samples, 459 normal human peripheral blood mononuclear cell (PBMC) samples, 250 development-related samples, 162 samples of immune cells in diseases, tissues, and organs, and 416 samples from other diseases. The single-cell RNA sequencing data from all of these samples have been incorporated into the database.
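As a quick consistency check, the per-category sample counts quoted above can be summed and compared with the stated total. The dictionary keys below are shorthand labels chosen for this example, not field names used by the database.

```python
# Sample counts per category, copied from the overview paragraph.
# The keys are informal labels, not NONCODE field names.
sample_counts = {
    "cancer": 557,
    "hematological disease": 217,
    "normal PBMC": 459,
    "development": 250,
    "immune cells in diseases, tissues, and organs": 162,
    "other diseases": 416,
}

# The categories should add up to the reported number of samples.
total_samples = sum(sample_counts.values())
print(total_samples)  # 2061
```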

Table 1 Data overview of single-cell datasets

Data analysis workflow

Because our analysis focuses on long non-coding RNAs, we first downloaded the lncRNA Gene Transfer Format (GTF) file from the NONCODE v6 database and merged it with the mRNA GTF file from the hg38 reference dataset, producing a single GTF file that contains both lncRNAs and mRNAs. We then used Cell Ranger v8.0.1, which is designed for the 10x Genomics platform, to process the data and generate gene expression matrices (UMI counts). Next, we used the Seurat v5.1.0 package for cell quality control, normalization, principal component analysis (PCA) dimension reduction, and clustering, followed by tSNE and UMAP embedding. The number of principal components and the clustering resolution were chosen separately for each dataset.
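The GTF merging step can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the script used to build the database: the function name `merge_gtf_lines` and the example records are hypothetical, and the sketch simply concatenates the records of both files while dropping comment lines and checking that each record has the nine tab-separated columns required by the GTF format.

```python
from typing import Iterable, Iterator


def merge_gtf_lines(lncrna_gtf: Iterable[str], mrna_gtf: Iterable[str]) -> Iterator[str]:
    """Yield the records of both GTF inputs, skipping comments and blank lines.

    Hypothetical helper for illustration; the source does not describe
    how the two annotation files were actually merged.
    """
    for source in (lncrna_gtf, mrna_gtf):
        for line in source:
            line = line.rstrip("\n")
            # GTF header/comment lines start with '#'; skip them and blanks.
            if not line or line.startswith("#"):
                continue
            # A GTF record has exactly nine tab-separated columns.
            if len(line.split("\t")) != 9:
                raise ValueError(f"Malformed GTF line: {line!r}")
            yield line


# Tiny in-memory example with made-up records (not real annotations).
lncrna_example = [
    "#lncRNA annotation\n",
    'chr1\tNONCODE\texon\t100\t200\t.\t+\t.\tgene_id "NONHSAG000001.2"; transcript_id "NONHSAT000001.2";\n',
]
mrna_example = [
    'chr1\tHAVANA\texon\t500\t900\t.\t-\t.\tgene_id "ENSG00000000001"; transcript_id "ENST00000000001";\n',
]

merged = list(merge_gtf_lines(lncrna_example, mrna_example))
print(len(merged))  # number of records kept from both inputs
```

With real files, the same function can be fed two open file handles and its output written line by line to a new GTF file. A Cell Ranger reference would then typically be built from the merged annotation, although the text above does not describe that step.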
Subsequently, we annotated the cells using ScType and SingleR v2.4.1, together with manual annotation based on dozens of marker genes. It is important to note that we defined cell types based on mRNA expression and then associated the single-cell barcodes with lncRNAs to obtain lncRNA expression profiles. Finally, we performed differential gene expression analysis for each specific cell type using the "FindMarkers" function in Seurat. Figure 1 illustrates the detailed cell processing workflow.
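The idea of linking lncRNAs to cell types through barcodes can be sketched in a few lines of Python. This is only an illustration: the function name, the barcodes, and the counts are hypothetical, and the sketch averages raw UMI counts, whereas the workflow above relies on Seurat's normalization.

```python
from collections import defaultdict


def mean_expression_by_cell_type(barcode_to_type, lncrna_counts):
    """Average each lncRNA's UMI count over all cells of each cell type.

    barcode_to_type: dict mapping barcode -> cell type label (from mRNA-based annotation)
    lncrna_counts:   dict mapping barcode -> {lncRNA id: UMI count}
    Cell types in which no lncRNA entry was recorded are omitted from the result.
    """
    cells_per_type = defaultdict(int)
    sums = defaultdict(lambda: defaultdict(float))
    for barcode, cell_type in barcode_to_type.items():
        cells_per_type[cell_type] += 1
        for gene, count in lncrna_counts.get(barcode, {}).items():
            sums[cell_type][gene] += count
    return {
        cell_type: {gene: total / cells_per_type[cell_type] for gene, total in genes.items()}
        for cell_type, genes in sums.items()
    }


# Made-up barcodes, labels, and counts for illustration only.
barcode_to_type = {"AAAC-1": "T cell", "AAAG-1": "T cell", "AACC-1": "B cell"}
lncrna_counts = {
    "AAAC-1": {"NONHSAG000001.2": 4},
    "AAAG-1": {"NONHSAG000001.2": 2},
    "AACC-1": {"NONHSAG000001.2": 0, "NONHSAG000002.2": 3},
}

profile = mean_expression_by_cell_type(barcode_to_type, lncrna_counts)
print(profile["T cell"]["NONHSAG000001.2"])  # 3.0
```

Because the cell type label travels with the barcode, no lncRNA information is needed to define the cell types; the lncRNA counts are only read once the labels exist.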

Figure 1 Data analysis workflow

Web Interface

We have further expanded the web-based visualization capabilities for single-cell data, incorporating several key modules.

1. Browsing of Single-Cell Datasets

The website has added a dedicated browsing function for comprehensive access to all single-cell datasets:

Tool Access: Available via the "Single-cell" option in the navigation bar;

Operation Method: Click to access and browse the complete list of single-cell datasets;

Function Purpose: Allows users to systematically view and explore all available single-cell datasets.

2. Quick Retrieval of Single-Cell Datasets

To improve the efficiency of retrieving single-cell datasets, the current version of NONCODE has added a "Search Single-cell" function. The specific usage methods are as follows:

Tool Access: Integrated in the main search interface;

Operation Method: Enter an lncRNA or GSE ID as the query term;

Function Purpose: Returns relevant single-cell datasets (including lncRNA expression profiles for lncRNA queries) or corresponding dataset details (for GSE ID queries).
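The two query types described above can be told apart by their format. The following Python sketch is hypothetical: how the NONCODE website actually parses queries is not documented here, and `classify_query` is an illustrative name. It assumes that GEO series accessions consist of "GSE" followed by digits and treats every other non-empty query as an lncRNA identifier or name.

```python
import re

# GEO series accessions have the form "GSE" followed by digits.
GSE_PATTERN = re.compile(r"^GSE\d+$", re.IGNORECASE)


def classify_query(query: str) -> str:
    """Return "gse" for a GEO series accession, otherwise "lncrna".

    Hypothetical helper; the real search logic may differ.
    """
    cleaned = query.strip()
    if not cleaned:
        raise ValueError("Empty query")
    return "gse" if GSE_PATTERN.match(cleaned) else "lncrna"


print(classify_query("GSE12345"))         # gse
print(classify_query("NONHSAT000001.2"))  # lncrna
```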

3. Cross-Dataset Comparison Tool for Single-Cell Data

NONCODE provides a user-friendly cross-dataset comparison tool, supporting multi-dimensional sample analysis:

Tool Access: Located in the "Single-cell" module under the "Tools" section;

Operation Method: Select multiple target datasets through filter criteria such as "Category", "Tissue", and "Disease";

Function Purpose: Enables comparative analysis of cell composition and gene expression across different samples.
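The kind of comparison this tool supports can be illustrated with a small Python sketch. All records, field names, and values below are hypothetical and do not reflect the database schema; the sketch only shows filtering datasets by criteria such as category and tissue and then comparing cell type fractions between the selected datasets.

```python
from collections import Counter

# Hypothetical dataset records; field names and values are illustrative only.
datasets = [
    {"id": "dataset_A", "category": "Cancer", "tissue": "Lung", "disease": "Lung cancer",
     "cell_types": ["T cell", "T cell", "B cell", "Macrophage"]},
    {"id": "dataset_B", "category": "Cancer", "tissue": "Lung", "disease": "Lung cancer",
     "cell_types": ["T cell", "B cell", "B cell", "B cell"]},
    {"id": "dataset_C", "category": "Development", "tissue": "Heart", "disease": "None",
     "cell_types": ["Cardiomyocyte", "Fibroblast"]},
]


def select_datasets(records, **criteria):
    """Keep the records whose fields equal every given filter value."""
    return [r for r in records if all(r.get(k) == v for k, v in criteria.items())]


def cell_type_fractions(record):
    """Return the fraction of cells assigned to each cell type in one dataset."""
    counts = Counter(record["cell_types"])
    total = sum(counts.values())
    return {cell_type: n / total for cell_type, n in counts.items()}


selected = select_datasets(datasets, category="Cancer", tissue="Lung")
composition = {r["id"]: cell_type_fractions(r) for r in selected}
for dataset_id, fractions in composition.items():
    print(dataset_id, fractions)
```

Comparing fractions rather than raw cell counts keeps datasets of different sizes on the same scale.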

4. Download of Single-Cell Datasets

A dedicated download function has been added to facilitate data acquisition:

Tool Access: Located on the "Download" page;

Operation Method: Select the single-cell datasets of interest from the available options;

Function Purpose: Enables users to download selected datasets for further analysis and research.