Overview
This document provides comprehensive API documentation for the BioMedGPS Explainer toolkit, covering all major classes, methods, and their usage.
Core Classes
DrugDiseaseCore
The main class responsible for drug discovery analysis using knowledge graph embeddings.
Constructor
DrugDiseaseCore()
DrugDiseaseCore Methods
run_full_pipeline()
run_full_pipeline(
    disease_id: str,
    entity_file: Optional[str] = None,
    knowledge_graph: Optional[str] = None,
    entity_embeddings: Optional[str] = None,
    relation_embeddings: Optional[str] = None,
    output_dir: Optional[str] = None,
    model: str = 'TransE_l2',
    top_n_diseases: int = 100,
    gamma: float = 12.0,
    threshold: float = 0.5,
    relation_type: str = 'GNBR::T::Compound:Disease',
    top_n_drugs: int = 1000
) -> None
Description: Runs the complete analysis pipeline in a single call and writes all analysis results to annotated_drugs.xlsx.
Parameters:
- disease_id: Disease identifier (e.g., "MONDO:0004979")
- entity_file: Path to entity annotations file (optional, automatically downloaded from wandb if not specified)
- knowledge_graph: Path to knowledge graph file (optional, automatically downloaded from wandb if not specified)
- entity_embeddings: Path to entity embeddings file (optional, automatically downloaded from wandb if not specified)
- relation_embeddings: Path to relation embeddings file (optional, automatically downloaded from wandb if not specified)
- output_dir: Output directory for results
- model: KGE model type (default: 'TransE_l2')
- top_n_diseases: Number of similar diseases to consider (default: 100)
- gamma: Margin parameter for KGE training (default: 12.0)
- threshold: Drug filtering threshold (default: 0.5)
- relation_type: Relation type for drug-disease associations (default: 'GNBR::T::Compound:Disease')
- top_n_drugs: Number of drugs to analyze (default: 1000)
Returns:
None
Example:
core = DrugDiseaseCore()
core.run_full_pipeline(
    disease_id="MONDO:0004979",
    output_dir="results/",
    model='TransE_l2',
    top_n_diseases=50,
    gamma=12.0,
    threshold=0.5,
    top_n_drugs=100
)
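To give a feel for what the model and gamma parameters control, here is a minimal sketch of the scoring rule commonly used for TransE with an L2 distance (for example in DGL-KE): the score of a (head, relation, tail) triple is gamma minus the Euclidean norm of head + relation - tail. Whether the toolkit computes scores exactly this way is an assumption; the function name transe_l2_score and the toy vectors are hypothetical.

```python
import math


def transe_l2_score(head, relation, tail, gamma=12.0):
    """Return gamma minus the Euclidean distance between (head + relation) and tail.

    Assumption: this mirrors the usual TransE_l2 convention; it is not taken
    from the toolkit's source code.
    """
    if not (len(head) == len(relation) == len(tail)):
        raise ValueError("head, relation and tail must have the same dimension")
    distance = math.sqrt(
        sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail))
    )
    return gamma - distance


# Toy 2-dimensional vectors, for illustration only.
score = transe_l2_score([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
print(score)
```

Under this convention a higher score means the tail embedding lies closer to head + relation, so a drug-disease pair with a higher score is a more plausible association.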
predict_drugs()
predict_drugs(
    disease_id: str,
    entity_file: str,
    knowledge_graph: str,
    entity_embeddings: str,
    relation_embeddings: str,
    model: str,
    top_n_diseases: int,
    gamma: float,
    threshold: float,
    relation_type: str,
    output_file: str
) -> None
Description: Generate potential drug list using KGE models.
Parameters:
- disease_id: Target disease identifier
- entity_file: Path to entity annotations file
- knowledge_graph: Path to knowledge graph file
- entity_embeddings: Path to entity embeddings file
- relation_embeddings: Path to relation embeddings file
- model: KGE model type
- top_n_diseases: Number of similar diseases
- gamma: Margin parameter
- threshold: Prediction threshold
- relation_type: Relation type for drug-disease associations
- output_file: Output Excel file path
get_disease_name()
get_disease_name(disease_id: str, entity_file: str) -> str
Description: Get disease name from disease ID.
Parameters:
- disease_id: Disease identifier
- entity_file: Path to entity annotations file
Returns:
Disease name as string
get_drug_names()
get_drug_names(drug_ids: List[str], entity_file: str) -> List[str]
Description: Get drug names from drug IDs.
Parameters:
- drug_ids: List of drug identifiers
- entity_file: Path to entity annotations file
Returns:
List of drug names
DrugFilter
Class for filtering drug candidates based on various criteria.
Constructor
DrugFilter()
filter_drugs()
filter_drugs(
    input_file: str,
    expression: str,
    output_file: str,
    sheet_names: Tuple[str, str] = ("annotated_drugs", "filtered_drugs")
) -> None
Description: Filter drugs based on logical expressions.
Parameters:
- input_file: Input Excel file path
- expression: Filter expression (e.g., "score > 0.6 and existing == False")
- output_file: Output Excel file path
- sheet_names: Tuple of (input_sheet, output_sheet) names
Example:
# Named drug_filter to avoid shadowing Python's built-in filter()
drug_filter = DrugFilter()
drug_filter.filter_drugs(
    input_file="results/annotated_drugs.xlsx",
    expression="score > 0.7 and num_of_shared_genes_in_path >= 1",
    output_file="results/filtered_drugs.xlsx"
)
Supported Filter Expressions:
- Numerical comparisons: >, <, >=, <=, ==, !=
- Logical operators: and, or, not
- Boolean fields: existing, is_key_gene
- String matching and pattern matching
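How the toolkit evaluates these expressions internally is not described here. As an illustration of the semantics only, the sketch below checks a single row (a plain dict) against an expression built from the comparison and logical operators listed above. The helper row_matches, the whitelist, and the sample rows are all hypothetical, and string or pattern matching is deliberately left out.

```python
import ast

# Syntax permitted in this sketch: comparisons, and/or/not, names, constants.
_ALLOWED_NODES = (
    ast.Expression, ast.BoolOp, ast.And, ast.Or, ast.UnaryOp, ast.Not,
    ast.Compare, ast.Gt, ast.Lt, ast.GtE, ast.LtE, ast.Eq, ast.NotEq,
    ast.Name, ast.Load, ast.Constant,
)


def row_matches(expression, row):
    """Return True if the row (column name -> value) satisfies the expression."""
    tree = ast.parse(expression, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, _ALLOWED_NODES):
            raise ValueError(f"unsupported syntax: {type(node).__name__}")
    # Column names resolve against the row; builtins are unavailable.
    code = compile(tree, "<filter>", "eval")
    return bool(eval(code, {"__builtins__": {}}, dict(row)))


# Made-up rows, for illustration only.
rows = [
    {"drug": "drug_a", "score": 0.82, "existing": False, "num_of_shared_genes_in_path": 2},
    {"drug": "drug_b", "score": 0.91, "existing": True, "num_of_shared_genes_in_path": 0},
    {"drug": "drug_c", "score": 0.40, "existing": False, "num_of_shared_genes_in_path": 3},
]
kept = [r["drug"] for r in rows if row_matches("score > 0.6 and existing == False", r)]
print(kept)  # ['drug_a']
```

Restricting the parsed syntax tree to a whitelist before evaluating keeps function calls and attribute access out of user-supplied expressions.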
Visualizer
Class for generating comprehensive visualizations and reports.
Constructor
Visualizer(disease_id: str, disease_name: str, embed_images: bool = True)
Parameters:
- disease_id: Disease identifier
- disease_name: Disease name
- embed_images: Whether to embed images in HTML report (default: True)
create_visualization()
create_visualization(
    data_file: str,
    viz_type: str,
    output_file: str,
    sheet_names: Tuple[str, str] = ("annotated_drugs", "filtered_drugs")
) -> str
Description: Generate specific visualization chart.
Parameters:
- data_file: Input data file path
- viz_type: Visualization type (see supported types below)
- output_file: Output file path
- sheet_names: Tuple of sheet names for input data
Supported Visualization Types:
- score_distribution: Score distribution histogram
- score_boxplot: Score distribution by existing status
- disease_similarity: Disease similarity heatmap
- network_centrality: Network centrality analysis
- shared_genes_pathways: Gene and pathway overlap
- drug_similarity_network: Drug similarity network
- shared_gene_count: Shared gene count distribution
- score_vs_degree: Score vs network degree
generate_report()
generate_report(
    data_file: str,
    output_file: str,
    title: str = "Drug Discovery Analysis Report"
) -> str
Description: Generate comprehensive HTML report with all visualizations.
Parameters:
- data_file: Input data file path
- output_file: Output HTML file path
- title: Report title
Returns:
Path to generated HTML report
Command Line Interface
Main Commands
Run Analysis
biomedgps-explainer run [OPTIONS]
Options:
- --disease-id: Disease ID (required)
- --output-dir: Output directory (required)
- --model-run-id: Model run ID (default: 6vlvgvfq)
- --top-n-diseases: Number of similar diseases (default: 100)
- --threshold: Drug filtering threshold (default: 0.5)
- --relation-type: Relation type (default: GNBR::T::Compound:Disease)
- --top-n-drugs: Number of drugs to analyze (default: 1000)
Filter Drugs
biomedgps-explainer filter [OPTIONS]
Options:
- --input-file: Input Excel file (required)
- --expression: Filter expression (required)
- --output-file: Output Excel file (required)
Generate Visualizations
biomedgps-explainer visualize [OPTIONS]
Options:
- --input-file: Input Excel file (required)
- --output-dir: Output directory (required)
- --viz-type: Visualization type (default: all)
- --disease-id: Disease ID (required)
- --disease-name: Disease name (required)
Run Complete Pipeline
biomedgps-explainer pipeline [OPTIONS]
Description: Execute complete workflow (run → filter → visualize) in a single command.
Options:
- --disease-id: Disease ID (required)
- --model-run-id: Model run ID (default: 6vlvgvfq)
- --filter-expression: Filter expression (optional)
- --output-dir: Output directory (default: results)
- --top-n-diseases: Number of similar diseases (default: 100)
- --threshold: Drug filtering threshold (default: 0.5)
- --relation-type: Relation type (default: GNBR::T::Compound:Disease)
- --top-n-drugs: Number of drugs to interpret (default: 100)
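When the pipeline command is launched from a script rather than typed by hand, it can help to assemble the arguments as a list so that a filter expression containing spaces needs no manual quoting. The sketch below uses only the options documented for the pipeline command; the helper name build_pipeline_command is hypothetical, and the actual subprocess call is left commented out because it requires the CLI to be installed.

```python
import shlex


def build_pipeline_command(disease_id, output_dir="results", filter_expression=None):
    """Assemble the argument list for `biomedgps-explainer pipeline`."""
    argv = [
        "biomedgps-explainer", "pipeline",
        "--disease-id", disease_id,
        "--output-dir", output_dir,
    ]
    if filter_expression is not None:
        argv += ["--filter-expression", filter_expression]
    return argv


argv = build_pipeline_command(
    "MONDO:0004979",
    filter_expression="score > 0.6 and existing == False",
)
# Shell-quoted form, suitable for pasting into a terminal.
print(shlex.join(argv))
# import subprocess
# subprocess.run(argv, check=True)  # requires the CLI to be installed
```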
Data Structures
Input Data Format
Entity Annotations (annotated_entities.tsv)
id             label     name
MONDO:0004979  Disease   asthma
CHEBI:12345    Compound  aspirin
HGNC:1234      Gene      TNF
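Lookups such as get_disease_name() and get_drug_names() resolve identifiers against this file. The following is a minimal sketch of that kind of lookup, assuming the file is tab-separated with a header row naming the id, label, and name columns. It is not the toolkit's implementation, and load_entity_names is a hypothetical helper.

```python
import csv
import io

# Inline copy of the sample rows above, tab-separated.
SAMPLE_ENTITIES = (
    "id\tlabel\tname\n"
    "MONDO:0004979\tDisease\tasthma\n"
    "CHEBI:12345\tCompound\taspirin\n"
    "HGNC:1234\tGene\tTNF\n"
)


def load_entity_names(handle):
    """Map each entity id to its name from a tab-separated annotations file."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["id"]: row["name"] for row in reader}


names = load_entity_names(io.StringIO(SAMPLE_ENTITIES))
print(names["MONDO:0004979"])  # asthma
```

For a file on disk, pass open(path, newline="", encoding="utf-8") instead of the StringIO object.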
Knowledge Graph (knowledge_graph.tsv)
source_id    source_type  source_name  target_id      target_type  target_name  relation_type
CHEBI:12345  Compound     aspirin      MONDO:0004979  Disease      asthma       GNBR::T::Compound:Disease
HGNC:1234    Gene         TNF          MONDO:0004979  Disease      asthma       GNBR::T::Gene:Disease
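The relation_type column is what ties this file to the relation_type parameter. As a rough illustration, the sketch below collects the compounds linked to each disease through a given relation, assuming a tab-separated file with the header shown above and compound-to-disease edges stored with the compound as source. The helper compounds_by_disease is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Inline copy of the sample rows above, tab-separated.
SAMPLE_KG = (
    "source_id\tsource_type\tsource_name\ttarget_id\ttarget_type\ttarget_name\trelation_type\n"
    "CHEBI:12345\tCompound\taspirin\tMONDO:0004979\tDisease\tasthma\tGNBR::T::Compound:Disease\n"
    "HGNC:1234\tGene\tTNF\tMONDO:0004979\tDisease\tasthma\tGNBR::T::Gene:Disease\n"
)


def compounds_by_disease(handle, relation_type="GNBR::T::Compound:Disease"):
    """Group source ids by target id for edges of the given relation type."""
    linked = defaultdict(set)
    for row in csv.DictReader(handle, delimiter="\t"):
        if row["relation_type"] == relation_type:
            linked[row["target_id"]].add(row["source_id"])
    return dict(linked)


known = compounds_by_disease(io.StringIO(SAMPLE_KG))
print(known)  # {'MONDO:0004979': {'CHEBI:12345'}}
```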
Entity Embeddings (entity_embeddings.tsv)
entity_id      entity_type  embedding
MONDO:0004979  Disease      0.1|0.2|0.3|0.4|...
CHEBI:12345    Compound     0.5|0.6|0.7|0.8|...
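The embedding column stores each vector as a single pipe-delimited string. A minimal sketch of turning such a field into numbers follows; parse_embedding is a hypothetical helper, and the four-value string is for illustration only, since real vectors are longer than the truncated samples above.

```python
def parse_embedding(field):
    """Convert a pipe-delimited string such as "0.1|0.2|0.3" into a list of floats."""
    return [float(value) for value in field.split("|")]


vector = parse_embedding("0.1|0.2|0.3|0.4")
print(len(vector), vector[0])  # 4 0.1
```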
Output Data Format
Annotated Drugs (annotated_drugs.xlsx)
Excel file with multiple sheets containing:
- annotated_drugs: Main results with all annotations
- predicted_drugs: Initial drug predictions
- shared_genes_pathways: Gene and pathway overlap analysis
- shared_diseases: Disease similarity analysis
- network_annotations: Network centrality features
Error Handling
Common Exceptions
FileNotFoundError
Cause: Required data files not found
Solution: Verify file paths and run data validation
ValueError
Cause: Invalid parameters or data format
Solution: Check parameter values and data format
MemoryError
Cause: Insufficient memory for large datasets
Solution: Reduce dataset size or increase system memory
Error Handling Best Practices
- Always validate data before running analysis
- Use try/except blocks for file operations
- Check system resources before large computations
- Implement proper logging for debugging
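These practices can be combined in a thin wrapper around each analysis step. The sketch below checks input paths up front, catches the three exceptions listed under Common Exceptions, and logs them. The names missing_files, run_step, read_first_line, and the logger name are hypothetical and not part of the toolkit.

```python
import logging
from pathlib import Path

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("biomedgps_example")  # hypothetical logger name


def missing_files(paths):
    """Return the paths that do not point to an existing file."""
    return [str(p) for p in paths if not Path(p).is_file()]


def run_step(step, *args, **kwargs):
    """Run one step; log and return None on the exceptions listed above."""
    try:
        return step(*args, **kwargs)
    except FileNotFoundError as exc:
        logger.error("Required data file not found: %s", exc)
    except ValueError as exc:
        logger.error("Invalid parameter or data format: %s", exc)
    except MemoryError:
        logger.error("Out of memory; reduce dataset size or increase system memory")
    return None


def read_first_line(path):
    """Stand-in for a real analysis step that reads an input file."""
    with open(path, encoding="utf-8") as handle:
        return handle.readline()


result = run_step(read_first_line, "does/not/exist.tsv")
```

Returning None on failure keeps the sketch short; a real script may prefer to re-raise after logging so that a failed step stops the pipeline.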