Datasets

Build, manage, and explore omics datasets with BioMiner Indexd

Featured Datasets

Explore our curated collection of omics datasets

Building Your Own Datasets

Learn how to create and import custom datasets into BioMiner Indexd

Required Files

To create a dataset, you need two essential files:

dataset.txt

Dataset description file containing metadata and configuration

Example:
key: my_dataset
name: My Research Dataset
description: Comprehensive analysis of...
citation: Author et al. Journal 2024
pmid: 12345678
groups: PUBLIC; RESEARCH;
tags: disease:Cancer; organ:Lung;
total: 100
is_filebased: false
version: v1.0.0
license: CC-BY-4.0
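
The example above can be read with a few lines of Python. This is a minimal sketch, not BioMiner Indexd's own parser: the function name parse_dataset_txt is hypothetical, and it assumes each line is a single "key: value" pair and that groups and tags are the only semicolon-separated fields.

```python
def parse_dataset_txt(text):
    """Parse 'key: value' lines into a dict (hypothetical helper, not part of BioMiner)."""
    list_fields = {"groups", "tags"}  # assumption: only these hold semicolon-separated lists
    record = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or ":" not in line:
            continue
        # Split on the first colon only, so values such as "disease:Cancer" stay intact.
        key, value = line.split(":", 1)
        key, value = key.strip(), value.strip()
        if key in list_fields:
            record[key] = [item.strip() for item in value.split(";") if item.strip()]
        else:
            record[key] = value
    return record


example = """\
key: my_dataset
name: My Research Dataset
groups: PUBLIC; RESEARCH;
tags: disease:Cancer; organ:Lung;
total: 100
is_filebased: false
version: v1.0.0
"""

meta = parse_dataset_txt(example)
print(meta["groups"])
print(meta["tags"])
```

All scalar values are kept as strings here; converting total to an integer or is_filebased to a boolean is left to the caller.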

metadata_table.tsv

Tab-separated file with sample metadata and clinical information

Format:
#Patient ID    Age    Gender    Diagnosis
#Unique ID     Age    Sex       Disease
#STRING        NUMBER STRING    STRING
#1             1      1         1
PATIENT_ID     AGE    GENDER    DIAGNOSIS
SAMPLE001      45     Female    Cancer
SAMPLE002      52     Male      Control
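
To illustrate how the four header rows relate to the data rows, the layout above can be loaded with Python's standard csv module. This is a sketch under stated assumptions: the function read_metadata_table is hypothetical, the fields are separated by real tab characters, and the first line after the four # rows is treated as the row of column keys.

```python
import csv
import io


def read_metadata_table(text):
    """Split a metadata table into a column schema and data records (illustrative only)."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    header_rows, body = rows[:4], rows[4:]
    # Drop the leading '#' from the first cell of each header row.
    names, descriptions, types, orders = (
        [row[0].lstrip("#")] + row[1:] for row in header_rows
    )
    columns = body[0]  # assumption: the row after the headers holds the column keys
    schema = [
        {"key": c, "name": n, "description": d, "type": t, "order": o}
        for c, n, d, t, o in zip(columns, names, descriptions, types, orders)
    ]
    records = [dict(zip(columns, row)) for row in body[1:]]
    return schema, records


sample = "\n".join("\t".join(row) for row in [
    ["#Patient ID", "Age", "Gender", "Diagnosis"],
    ["#Unique ID", "Age", "Sex", "Disease"],
    ["#STRING", "NUMBER", "STRING", "STRING"],
    ["#1", "1", "1", "1"],
    ["PATIENT_ID", "AGE", "GENDER", "DIAGNOSIS"],
    ["SAMPLE001", "45", "Female", "Cancer"],
    ["SAMPLE002", "52", "Male", "Control"],
])

schema, records = read_metadata_table(sample)
print(schema[1])
print(records[0])
```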

Conversion Process

Use our conversion script to transform your files into a BioMiner-compatible format:

1. Prepare Your Files

Ensure your dataset.txt and metadata_table.tsv files are properly formatted

2. Run Conversion Script

python examples/build_dataset.py convert /path/to/input /path/to/output --version v1.0.0

3. Generate Index

./biominer-indexd-cli index-datasets --datasets-dir datasets

4. Access Your Dataset

Your dataset is now available in the BioMiner interface and API
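
Steps 2 and 3 can be chained from a small Python wrapper if you rebuild datasets often. The sketch below only reuses the two commands shown above; the function names pipeline_commands and run_pipeline are hypothetical, and it assumes the script is started from the directory that contains examples/build_dataset.py and biominer-indexd-cli.

```python
import subprocess


def pipeline_commands(input_dir, output_dir, version, datasets_dir="datasets"):
    """Return the conversion and indexing commands as argument lists."""
    convert = [
        "python", "examples/build_dataset.py", "convert",
        input_dir, output_dir, "--version", version,
    ]
    index = ["./biominer-indexd-cli", "index-datasets", "--datasets-dir", datasets_dir]
    return [convert, index]


def run_pipeline(commands):
    """Run each command in order; check=True stops at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)


commands = pipeline_commands("/path/to/input", "/path/to/output", "v1.0.0")
for argv in commands:
    print(" ".join(argv))
# Call run_pipeline(commands) once the paths point at real directories.
```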

File Format Specifications

dataset.txt Requirements
  • key: Unique identifier (letters, numbers, underscores only)
  • name: Human-readable dataset name
  • description: Detailed dataset description
  • citation: Publication reference (optional)
  • pmid: PubMed ID (optional)
  • groups: Access groups (semicolon-separated)
  • tags: Classification tags (semicolon-separated)
  • total: Number of samples/records
  • is_filebased: true/false for file-based datasets
  • version: Version identifier
  • license: License information (optional)
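
The requirements above can be checked before running the conversion. The following is a rough sketch rather than the validation BioMiner Indexd itself performs: validate_dataset_fields is a hypothetical helper, it takes the fields as a dictionary of strings, and it assumes every field not marked optional is required and that total is a non-negative integer.

```python
import re

# Assumption: fields not marked "(optional)" in the list above are required.
REQUIRED = ("key", "name", "description", "groups", "tags",
            "total", "is_filebased", "version")


def validate_dataset_fields(fields):
    """Return a list of problems found in a dict of dataset.txt fields."""
    errors = []
    for name in REQUIRED:
        if not fields.get(name, "").strip():
            errors.append(f"missing required field: {name}")
    key = fields.get("key", "")
    if key and not re.fullmatch(r"[A-Za-z0-9_]+", key):
        errors.append("key may contain only letters, numbers, and underscores")
    total = fields.get("total", "")
    if total and not total.isdigit():
        errors.append("total must be a non-negative integer")
    flag = fields.get("is_filebased", "")
    if flag and flag not in ("true", "false"):
        errors.append("is_filebased must be 'true' or 'false'")
    return errors


good = {
    "key": "my_dataset", "name": "My Research Dataset",
    "description": "Example description", "groups": "PUBLIC; RESEARCH;",
    "tags": "disease:Cancer; organ:Lung;", "total": "100",
    "is_filebased": "false", "version": "v1.0.0",
}
bad = dict(good, key="my-dataset", is_filebased="no")

print(validate_dataset_fields(good))
print(validate_dataset_fields(bad))
```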

metadata_table.tsv Requirements
  • 4 header rows starting with # (Name, Description, Type, Order)
  • Data types: STRING, NUMBER, BOOLEAN
  • Tab-separated values
  • UTF-8 encoding
  • Consistent column count across all rows
  • Sample ID column as first column (recommended)
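
Several of these requirements are structural and easy to test automatically. The sketch below checks three of them (four # header rows, allowed data types, consistent column count); check_metadata_table is a hypothetical helper, and it assumes the third header row is the Type row, as listed above.

```python
ALLOWED_TYPES = {"STRING", "NUMBER", "BOOLEAN"}


def check_metadata_table(text):
    """Return a list of structural problems in a metadata table (illustrative only)."""
    problems = []
    lines = [line for line in text.split("\n") if line != ""]
    if len(lines) < 4 or not all(line.startswith("#") for line in lines[:4]):
        return ["expected 4 header rows starting with '#'"]
    rows = [line.split("\t") for line in lines]
    width = len(rows[0])
    # Every row, header or data, must have the same number of columns.
    for number, row in enumerate(rows, start=1):
        if len(row) != width:
            problems.append(f"row {number} has {len(row)} columns, expected {width}")
    # Third header row holds the data types (Name, Description, Type, Order).
    types = [rows[2][0].lstrip("#")] + rows[2][1:]
    for column, type_name in enumerate(types, start=1):
        if type_name not in ALLOWED_TYPES:
            problems.append(f"column {column} has unsupported type {type_name!r}")
    return problems


good_table = "#Name\tAge\n#Desc\tAge\n#STRING\tNUMBER\n#1\t2\nID\tAGE\nS1\t45\n"
bad_table = good_table.replace("NUMBER", "INT") + "S2\n"

print(check_metadata_table(good_table))
print(check_metadata_table(bad_table))
```

Encoding is not checked here; opening the file with encoding="utf-8" before passing its text to the function will raise an error on invalid bytes.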

Best Practices

Guidelines for creating high-quality datasets

Organize Your Data

Keep raw data separate from processed files. Use clear, consistent naming conventions for all files.

Validate Your Data

Ensure data quality by checking for missing values, format consistency, and logical relationships.

Document Everything

Provide comprehensive documentation including data collection methods, processing steps, and quality metrics.

Privacy & Ethics

Ensure compliance with data privacy regulations and obtain proper consent for data sharing.

Version Control

Use semantic versioning for dataset updates and maintain backward compatibility when possible.
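
As an illustration of semantic versioning applied to dataset tags such as v1.0.0, the sketch below parses a tag into comparable numbers and computes the next version. The helpers parse_version and bump are hypothetical, and the "v" prefix with three numeric parts is assumed from the example version used on this page.

```python
import re


def parse_version(tag):
    """Turn 'v1.2.3' (or '1.2.3') into a tuple of ints that compares correctly."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {tag!r}")
    return tuple(int(part) for part in match.groups())


def bump(tag, part):
    """Return the next version tag for a 'major', 'minor', or 'patch' change."""
    major, minor, patch = parse_version(tag)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


print(parse_version("v1.10.0") > parse_version("v1.9.3"))
print(bump("v1.0.0", "minor"))
```

Comparing the parsed tuples avoids the string-comparison trap in which "v1.10.0" sorts before "v1.9.3". A common convention is to bump the major number for changes that break backward compatibility, such as removing or renaming a metadata column.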

Community Standards

Follow established community standards for data formats, metadata, and sharing practices.

Additional Resources

Tools and documentation to help you get started

Documentation

Comprehensive guides and tutorials for dataset creation and management.


Conversion Tools

Download our conversion scripts and utilities for dataset preparation.


Support

Get help from our community and find answers to common questions.


Community

Connect with other researchers and share best practices.
