Search for bisulfite data
This post was originally published on Medium. You can also read it there and leave comments.
The main methylation data types are bisulfite and array. I’ll focus on searching for bisulfite datasets.
Query NCBI SRA advanced search
With the SRA Advanced Search Builder:
- For the Strategy field, click Show index list and choose “bisulfite seq”
- Click Search. I filter for species and DNA on the next page

Filter runs
I filtered for Source: DNA and Organism: Homo sapiens.

You can also search by editing the query under Search details. Mine currently reads:
"bisulfite seq"[Strategy] AND "Homo sapiens"[orgn] AND "biomol dna"[Properties]
You get the same results with this URL:
https://www.ncbi.nlm.nih.gov/sra?term=%22bisulfite%20seq%22%5BStrategy%5D%20AND%20%22Homo%20sapiens%22%5Borgn%5D%20AND%20%22biomol%20dna%22%5BProperties%5D&cmd=DetailsSearch
I got 54368 results as of 2025.07.23.
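If you want to generate that URL from the query string instead of copying it, here is a minimal sketch in base R. The helper name `build_sra_url` is my own, not something NCBI provides, and I simply append the same `cmd=DetailsSearch` parameter that appears in the URL above.

```r
# Query exactly as shown under Search details
query <- '"bisulfite seq"[Strategy] AND "Homo sapiens"[orgn] AND "biomol dna"[Properties]'

# Hypothetical helper: percent-encode the query and attach it to the SRA search URL.
# reserved = TRUE also encodes quotes, spaces, and square brackets.
build_sra_url <- function(term) {
  paste0(
    "https://www.ncbi.nlm.nih.gov/sra?term=",
    utils::URLencode(term, reserved = TRUE),
    "&cmd=DetailsSearch"
  )
}

sra_url <- build_sra_url(query)
print(sra_url)
```

Changing the organism or strategy is then just a matter of editing `query`.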
Send to SRA Run Selector
Click Send to -> Choose Destination: Run Selector
This brings you to the Run Selector page for your search results.

Download Metadata
Click Download Metadata. You get a CSV file.

This contains metadata like Age for all runs. Not every dataset publishes age though.
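To see how sparse a column like AGE really is, it helps to check what fraction of runs has a value in each column. Below is a rough sketch in base R. The function name `column_coverage` is made up for this post, and I assume that both `NA` and empty strings count as missing. The file path is the one I use later in this post.

```r
# Hypothetical helper: fraction of non-missing values per column,
# treating NA and empty or whitespace-only strings as missing.
column_coverage <- function(df) {
  filled <- vapply(df, function(col) {
    mean(!is.na(col) & trimws(as.character(col)) != "")
  }, numeric(1))
  sort(filled, decreasing = TRUE)
}

path <- "data/sra_human_bisulfite.csv"
if (file.exists(path)) {
  # check.names = FALSE keeps column names such as tissue/cell_type unchanged
  meta <- read.csv(path, check.names = FALSE, stringsAsFactors = FALSE)
  print(head(column_coverage(meta), 20))  # best-populated columns
  print(column_coverage(meta)["AGE"])     # share of runs with an age
}
```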
Process metadata
My goal in this step was to find datasets from certain tissues that include Age metadata. Unfortunately the metadata is not unified: tissue information is spread across several differently named columns, so it takes some work. I processed it in R.
library(tidyverse)

# Metadata CSV downloaded from the SRA Run Selector
df_read <- read_csv("data/sra_human_bisulfite.csv")

# Keep runs that report an age, then collect the distinct values of
# each tissue-like column per BioProject
df <-
  df_read %>%
  filter(!is.na(AGE)) %>%
  group_by(BioProject) %>%
  summarise(
    n = n(),  # number of runs with an age in this BioProject
    unique_tissues = list(unique(tissue)),
    unique_celltype = list(unique(cell_type)),
    unique_tissue_cell_type = list(unique(`tissue/cell_type`)),
    unique_tissue_type = list(unique(tissue_type)),
    unique_sample_type = list(unique(sample_type)),
    unique_isolate = list(unique(isolate)),
    unique_age = list(unique(AGE)),
    unique_source_name = list(unique(source_name))
  ) %>%
  # Largest datasets first
  arrange(desc(n))
This aggregates the runs by dataset (BioProject) and sorts the datasets by number of runs with an age. I then investigated the most interesting ones further.
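Since the tissue information is scattered across columns, one way to make filtering easier is to collapse them into a single column. The sketch below is one possible approach, not what I ran for this post: `unify_tissue` and the `tissue_unified` column are hypothetical names, and the priority order of the candidate columns is an assumption you would want to adjust after looking at your own data. It is written in base R so it runs without extra packages.

```r
# Hypothetical helper: take the first non-empty value across several
# tissue-like columns (in the given priority order) and lowercase it.
unify_tissue <- function(df,
                         candidates = c("tissue", "tissue/cell_type",
                                        "tissue_type", "cell_type",
                                        "source_name")) {
  present <- intersect(candidates, names(df))  # skip columns the CSV lacks
  unified <- rep(NA_character_, nrow(df))
  for (col in present) {
    values <- trimws(as.character(df[[col]]))
    values[!is.na(values) & values == ""] <- NA_character_
    fill <- is.na(unified) & !is.na(values)    # only fill rows still empty
    unified[fill] <- values[fill]
  }
  df$tissue_unified <- tolower(unified)
  df
}

# Usage on the metadata read above (assumes df_read exists):
# df_tissue <- unify_tissue(df_read)
# table(df_tissue$tissue_unified, useNA = "ifany")
```

This only merges the columns. Values such as "whole blood" and "peripheral blood" still need manual mapping if you want them grouped together.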
