METADATA

Project

field_name

description

example

notes

data_format

project_id

Short unique project identifier or acronym

ocular-microbiome

required

string

project_name

Full name of the research project

Ocular Microbiome Project

required

string

project_goal

Brief summary of project aim or objective

Investigate the microbial composition on the surface of the human eye.

required

string

project_type

Main methodological approach (e.g., metagenomic, genomic, RNA-Seq)

Metagenomic

required

string

project_design

Description of project design, experiment type or sampling scheme

Cross-sectional observational study of healthy and diseased eyes

recommended

string

bioproject_accession

NCBI BioProject accession number (if available)

PRJNA123456

recommended

string

contact_name

Project coordinator, PI, or submitter

Lisa A. Neuhold

required

string

contact_email

Email address of coordinator/submitter

lneuhold@nei.nih.gov

required

string

library_count

Number of libraries

10

recommended

numeric
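
As an illustration of how the "required" notes in the Project table might be enforced, here is a minimal Python sketch. The helper name `missing_required` and the constant `REQUIRED_PROJECT_FIELDS` are hypothetical and not part of any existing tool; the field list simply mirrors the rows marked required above.

```python
# Hypothetical constant: Project fields marked "required" in the table above.
REQUIRED_PROJECT_FIELDS = (
    "project_id",
    "project_name",
    "project_goal",
    "project_type",
    "contact_name",
    "contact_email",
)


def missing_required(record, required=REQUIRED_PROJECT_FIELDS):
    """Return the required field names that are absent or blank in `record`."""
    missing = []
    for field in required:
        value = record.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


# Example record using the example values from the table; the email is
# deliberately left blank to show how a gap is reported.
project = {
    "project_id": "ocular-microbiome",
    "project_name": "Ocular Microbiome Project",
    "project_goal": "Investigate the microbial composition on the surface of the human eye.",
    "project_type": "Metagenomic",
    "contact_name": "Lisa A. Neuhold",
    "contact_email": "",
}
print(missing_required(project))
```

The same helper could be reused for the other tables by passing a different `required` tuple.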

Center

field_name

description

example

notes

data_format

center_id

Unique code or acronym for the center

JHU

required

string

center_name

Full name of the parent center, lab, or institution

Johns Hopkins University

required

string

contact_name

Name of submitting PI or contact person

Laura Ensign

required

string

contact_email

Primary contact’s email

lensign@jhmi.edu

required

string

Individual

field_name

description

example

notes

data_format

individual_id

Unique identifier for the individual/subject

OMI00001

required

string

individual_alias

Internal study label or code

52

recommended

string

age

Age at time of sampling (years)

35

recommended

numeric

birth_year

Year of birth (YYYY)

1990

recommended

numeric

consent_code

IRB or consent identifier

IRB2024-001

required

string

disease_status

Overall health status or diagnosis

healthy

recommended

string

ethnicity

0=Non Hispanic, 1=Hispanic

0

recommended

categorical

gender

Self-identified gender

Female

recommended

string

has_autoimmune_disease

Autoimmune disease presence

1

0=No, 1=Yes

binary

has_diabetes

Diabetes mellitus status

0

0=No, 1=Yes

binary

has_hypercholesterolemia

Hypercholesterolemia status

1

0=No, 1=Yes

binary

has_ocular_allergies

Ocular allergy presence

0

0=No, 1=Yes

binary

has_ocular_trauma_history

History of ocular trauma

0

0=No, 1=Yes

binary

has_pets

Household pet exposure

cats and dogs

recommended

string

has_refractive_error

Presence of refractive error

1

0=No, 1=Yes

binary

has_refractive_surgery_history

History of refractive surgery

0

0=No, 1=Yes

binary

has_rosacea

Rosacea diagnosis

1

0=No, 1=Yes

binary

has_systemic_allergies

Systemic allergy presence

0

0=No, 1=Yes

binary

longitudinal_sampling

Repeated sampling indicator (yes/no)

yes

recommended

string

race

0=White, 1=Black, 2=Asian, 3=Other

0

recommended

categorical

sex

Biological sex at birth (M, F, Other, Unknown)

M

required

string

smoking_status

Smoking status category

2

0=never, 1=former, 2=current

categorical

uses_anti_depressants

Antidepressant medication use

0

0=No, 1=Yes

binary

uses_anti_epileptics

Anti-epileptic medication use

0

0=No, 1=Yes

binary

uses_anti_hypertensives

Antihypertensive medication use

1

0=No, 1=Yes

binary

uses_asa

Aspirin (ASA) use

0

0=No, 1=Yes

binary

uses_antibiotics

Antibiotic exposure (systemic or topical)

penicillin

drug name or binary depending on encoding

string/binary

uses_antihistamines

Antihistamine use

0

0=No, 1=Yes

binary

uses_artificial_tears

Artificial tear use

1

0=No, 1=Yes

binary

uses_contact_lenses

Contact lens use status

1

0=No, 1=Yes or lens type if available

string/binary

uses_eye_mask

Eye mask use

0

0=No, 1=Yes

binary

uses_fish_oil

Fish oil supplementation

0

0=No, 1=Yes

binary

uses_lid_wipes

Lid hygiene wipe use

1

0=No, 1=Yes

binary

uses_makeup_routinely

Routine makeup use

0

0=No, 1=Yes

binary

uses_multivitamins

Multivitamin use

1

0=No, 1=Yes

binary

uses_nsaids

NSAID use

0

0=No, 1=Yes

binary

uses_probiotics

Probiotic supplementation

1

0=No, 1=Yes

binary

uses_statins

Statin medication use

0

0=No, 1=Yes

binary

other_medication_or_condition

Other reported condition or medication

NA

optional

string
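
The coded Individual fields (binary 0/1 flags and the categorical codes for ethnicity, race, and smoking_status) can be translated back to labels with simple lookup tables. The sketch below is only one possible approach; the dictionary names and the `decode` helper are hypothetical, and the code-to-label mappings are copied from the table above.

```python
# Code-to-label mappings as listed in the Individual table.
YES_NO = {0: "No", 1: "Yes"}
ETHNICITY = {0: "Non Hispanic", 1: "Hispanic"}
RACE = {0: "White", 1: "Black", 2: "Asian", 3: "Other"}
SMOKING_STATUS = {0: "never", 1: "former", 2: "current"}


def decode(value, codes):
    """Map a coded value (int or numeric string) to its label.

    Returns None when the value is missing, non-numeric (e.g. "NA"),
    or not one of the known codes.
    """
    try:
        return codes.get(int(value))
    except (TypeError, ValueError):
        return None


print(decode(1, YES_NO))
print(decode("2", SMOKING_STATUS))
print(decode("NA", RACE))
```

Returning None for unknown codes, rather than raising, is a design choice that suits exploratory loading; a stricter validator might raise an error instead.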

BioSample

field_name

description

example

notes

data_format

biosample_id

Unique identifier for the sample

OMS00001

required

string

biosample_alias

Alternate code or name used at collection site

S89

recommended*

string

project_id

Short unique project identifier or acronym

ocular-microbiome

required

string

individual_id

Unique subject/individual identifier (linked to subject table)

OMI000123

recommended*

string

center_id

Unique identifier for associated (collection or extraction) center

JHU

recommended

string

visit_number

Number/label of visit or collection event

1

recommended

numeric

collection_date

Date of sample collection (ISO 8601)

2024-05-07

required

date

biosample_type

Anatomical or environmental site type

Microbial Community Standard, Human Ocular Surface, OMR-110, Air Control

required*

string

biosample_site

Anatomical site of origin

Skin, Lid, Conj

required

string

biosample_material

Material type of sample

swab, tissue, tear

required

string

biosample_side

Laterality of sample

left, right, NA

recommended*

string

collection_method

Method used for sample collection

swab, wash, filter paper

recommended

string

collection_time

Time of day of sample collection (AM/PM)

5pm

string

collection_medium

Fluid in which the biosample is stored

ANE (Amies Transport Medium), BSS (Balanced Salt Solution)

recommended

string

collection_device_lot

Collection device batch

BG001

optional

string

collection_device_id

Device identifier or lot number for collection device

DEV001-2023

optional

string

swab_type

Brand/model/type of swab used for sample collection

FloQ, Isohelix, Puritan

optional

string

swab_kit_lot

Manufacturer and lot for swab/kit

Qiagen-12345

optional

string

swab_lot_number

Lot number of the swab used

A12345

optional

string

storage_duration

Time from collection to processing (ISO 8601)

PT2H30M

optional

string

storage_temperature

Sample storage temperature after collection (with units)

-80 °C

optional

string

preservation_method

Method or reagent used for preservation

RNAlater, ethanol

optional

string

stabilizing_fluid

Type of fluid used to stabilize the collected sample

OMR

optional*

string

stabilizing_fluid_lot

Lot number of stabilizing fluid used

BD801

optional*

string

microbial_standard

Indicates if a microbial community standard was used

MCS

optional*

string

replicate_type

Replicate type

biological

optional

string

batch_number

Identifier for batch

Batch5

optional*

string

plate_number

Identifier for plate

Plate2

optional*

string

plate_location

Identifier for plate location

B3

optional*

string

lysozyme_treatment

Indicates if lysozyme treatment was applied during processing

yes

optional*

bool

host_depletion_performed

Indicates host depletion step performed

yes, no

recommended

bool

host_depletion_method

Specific host depletion method applied

MolYsis Basic5, NEBNext

recommended*

string

host_depletion_kit

Kit(s) used for host depletion

NEB, Molzym, HostZero

optional*

string

extraction_date

Date of DNA/RNA extraction (ISO 8601)

2024-05-08

recommended

date

extraction_kit

Brand/model of DNA extraction kit

Qiagen DNeasy Blood_Tissue Kit, Zymo Micro Prep Kit, Qiagen PowerSoil Pro Kits, MasterPure Gram Positive DNA Purification Kit

recommended*

string

extraction_kit_lot

Lot number for DNA extraction kit

L20240155

optional

string

extraction_protocol_version

Version or code/identifier of extraction protocol

v1.2

optional

string

extraction_protocol_modifications

Details of any protocol modifications

Increased elution volume to 100 µl

optional

string

sample_volume

Total volume or mass collected (with units)

200 µl

recommended*

string

elution_volume

Final elution volume (with units)

50 µl

optional

string

nucleic_acid_concentration

DNA/RNA concentration (with units)

20 ng/µl

recommended*

string

dna_yield

Measured DNA quantity (with units)

100 ng

optional

string

dna_qc_metrics

DNA quality control metrics

260/280 ratio 1.85, Qubit 22 ng/µl

optional

string

biosample_accession

External accession IDs (BioSample, SRA, ENA)

SAMN12345678

recommended

string

geo_loc_name

Geographic location where the sample was collected

USA: Maryland: Baltimore

Should include country, state/province, and city if known; standardized names recommended

string
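
Several BioSample fields use ISO 8601: collection_date and extraction_date are calendar dates, and storage_duration is a duration such as PT2H30M. The Python standard library parses ISO dates directly but has no duration parser, so the sketch below uses a small regular expression. It is an illustration only: `parse_duration` is a hypothetical helper that handles just the day/hour/minute/second designators, not years, months, weeks, or fractional values.

```python
import re
from datetime import date, timedelta

# Matches durations such as "PT2H30M", "P1D", or "P1DT12H".
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+)S)?)?$"
)


def parse_duration(text):
    """Parse a simple ISO 8601 duration into a timedelta."""
    match = _DURATION.match(text)
    parts = {}
    if match:
        parts = {k: int(v) for k, v in match.groupdict().items() if v is not None}
    if not parts:
        raise ValueError(f"unsupported ISO 8601 duration: {text!r}")
    return timedelta(**parts)


# Example values taken from the table above.
storage = parse_duration("PT2H30M")
collected = date.fromisoformat("2024-05-07")
extracted = date.fromisoformat("2024-05-08")

print(storage.total_seconds() / 3600)  # storage time in hours
print((extracted - collected).days)    # days between collection and extraction
```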

Library

field_name

description

example

notes

data_format

library_id

Unique experiment identifier

OMX00001

required

string

library_title

Title or brief name of experiment

Ocular Microbiome RNA-Seq Batch 1

recommended

string

library_description

Short summary of experiment purpose/design

RNA-seq to profile eye surface samples

recommended

string

biosample_id

Related sample ID

OMS00001

required

string

library_name

Name/identifier of the prepared sequencing library

LIB001-A

required

string

library_strategy

Sequencing strategy (controlled vocabulary)

rna-seq

required

string

library_source

Source material for library

mRNA

recommended

string

library_selection

Method of nucleic acid selection/enrichment

polyA selection

recommended

string

library_layout

Read layout (single/paired-end)

paired-end

required

string

library_prep_kit

Kit used to prepare sequencing library

NEBNext Ultra II RNA

recommended

string

library_prep_kit_lot

Lot number of library prep kit

RN2023-02354

recommended

string

library_prep_protocol_version

Version or code for the library prep protocol

v2.0

recommended

string

library_prep_date

Date of library preparation (ISO 8601)

2024-05-09

recommended

date

index_barcodes

Index/barcodes used for multiplexing, comma-separated

CGATCAGG,ACTGTGTA

recommended

string

sequencing_platform

Sequencing platform manufacturer/model

illumina

required

string

sequencer_model

Sequencer instrument model

novaseq

required

string

sequencing_date

Date sequencing run performed (ISO 8601)

2024-05-12

required

date

sample_count

Number of samples in the library

10

recommended

numeric

design_description

Additional details or rationale for experiment design

Comparative RNA-Seq of left/right eye samples over 3 timepoints

recommended

string
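
The tables are linked through shared identifiers: a Library row points to a BioSample through biosample_id, and a BioSample points to a Project, Individual, and Center through project_id, individual_id, and center_id. A minimal sketch of a referential check follows, assuming each table has been loaded as a list of dictionaries; `dangling_references` is a hypothetical helper, and the second library row is invented to show a failure.

```python
def dangling_references(children, parents, key):
    """Return rows in `children` whose `key` value has no match in `parents`."""
    known = {row[key] for row in parents}
    return [row for row in children if row.get(key) not in known]


biosamples = [
    {"biosample_id": "OMS00001", "project_id": "ocular-microbiome"},
]
libraries = [
    {"library_id": "OMX00001", "biosample_id": "OMS00001"},
    # Hypothetical orphan: no BioSample row carries this ID.
    {"library_id": "OMX00002", "biosample_id": "OMS00099"},
]

orphans = dangling_references(libraries, biosamples, "biosample_id")
print([row["library_id"] for row in orphans])
```

The same call works for any parent/child pair, for example Sample rows against Library rows with `key="library_id"`.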

Sample

field_name

description

example

notes

data_format

omc_id

Unique identifier for the sequencing run

OMR00001 (jhu00001)

required

string

sample_id

Unique center identifier for the sequencing run

Molzym_Ultra_Deep_Air_Ctr_3

required

string

sample_accession

Public database run accession (e.g. NCBI SRA) if deposited

SRR00001

required

string

library_id

Associated experiment identifier linking to the experiment table

OMX00001

required

string

sample_name

Short title or name for the run

Ocular_Run01_2024-05-12

recommended

string

sample_description

Summary or purpose of the sequencing run

Non-human read run for sample OMS00001

recommended

string

avg_spot_length

Average read length (specify units)

150 bp

recommended

numeric

adapter_sequences

Adapter sequences used

AGATCGGAAGAGC

recommended

string

original_file_id

Original raw read file name or ID

OMS00001_R1.fastq

recommended

string

total_reads

Total number of read pairs in the original file

45000000

required

numeric

total_bases

Total number of bases in the original file

4500000000

required

numeric

unmapped_reads

Total number of reads not aligned to the reference(s)

required

numeric

original_read_file_size

File size of original read file (with units, e.g. MB or GB)

10 GB

recommended

string

original_read_file_checksum

Checksum (e.g. MD5 or SHA256) of original read file

d41d8cd98f00b204e9800998ecf8427e

recommended

string

non_human_read_file_id

Processed non-human read file name or ID

OMS00001_R1_nonhuman.fastq

optional

string

non_human_reads

Number of non-human read pairs

4500000

optional

numeric

non_human_read_file_size

File size of non-human read file

1 GB

optional

string

non_human_read_file_checksum

Checksum of non-human read file

e339a515c5ed4f561237b1799335c30b

optional

string

analysis_date

Date the sample was analyzed (ISO 8601)

2024-05-12

required

date

reference_genome

Reference genome version(s) used for human read detection

hs38DH, chm13v2.0

optional

string

microbial_database

Microbial database(s) used for classification

krakendb-2020-08-16-all_plus_eupath

optional

string

human_read_detection_software

Software (name and version) used for human read detection

minimap2 2.30-r1287

optional

string

non_human_read_classification_software

Software (name and version) used for non-human read classification

KrakenUniq 1.0.2

optional

string

base_quality_metrics

Sequencing base quality metrics

90% Q30, Q20

optional

string

data_processing_pipeline

URL or name of processing pipeline (include version/date)

https://github.com/dpuiu/ocular-metagenome 20240608

optional

string

sequencing_control

Sequencing run control(s) used (e.g. spike-ins)

PhiX spike-in 1%

optional

string

release_status

Data release status

public

optional

string

release_date

Date of public or controlled release (ISO 8601)

2024-06-30

optional

date

sequencing_lane

Illumina sequencing lane

L001

optional

string
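
The checksum fields (original_read_file_checksum, non_human_read_file_checksum) can be filled in with Python's `hashlib`. The sketch below reads the file in chunks so that large FASTQ files need not fit in memory; the function name `file_checksum` and the chunk size are assumptions made for illustration. Note that the example value shown for original_read_file_checksum, d41d8cd98f00b204e9800998ecf8427e, is the MD5 digest of empty input, so it should be read as a placeholder rather than the checksum of a real read file.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="md5", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on an empty temporary file (stands in for a real FASTQ).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    empty_path = tmp.name
empty_md5 = file_checksum(empty_path)
empty_sha256 = file_checksum(empty_path, algorithm="sha256")
os.remove(empty_path)

print(empty_md5)
```

Recording which algorithm was used alongside the digest avoids ambiguity, since the table allows either MD5 or SHA256.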

Analysis

field_name

description

example

notes

data_format

sample_id

Unique identifier for the sequencing run

OMR00001 (jhu00001)

string

taxonomy_name

The scientific classification name of an organism

Cutibacterium acnes

string

abbreviation

The shortened form of taxonomy_name

C. acnes

string

taxonomy_id

A unique identifier that corresponds to a specific taxonomic classification within NCBI

1747

numeric

taxonomy_lvl

The rank or level of classification within the taxonomic hierarchy, such as kingdom, phylum, or species.

S (species)

string

assigned_reads

The number of sequencing reads assigned unambiguously to this specific taxonomic group by the non_human_read_classification_software

10000

numeric

added_reads

The number of additional reads assigned to this specific taxonomic group

1000

numeric

microbial_reads

assigned_reads + added_reads for this specific taxonomic group

11000

numeric

total_microbial_reads

Sum of microbial_reads for all species in this sample

10000000

numeric

fraction_microbial_reads

(assigned_reads+added_reads)/total_microbial_reads

0.1

numeric

avg_fraction_microbial_reads

Average fraction_microbial_reads for this specific taxonomic group across multiple samples

0.15

numeric
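
The derived Analysis fields follow directly from the per-taxon counts: microbial_reads is assigned_reads plus added_reads, the sample total is the sum of microbial_reads over all taxa in the sample, and fraction_microbial_reads is the ratio of the two. A minimal sketch, assuming the rows of one sample are held as a list of dictionaries and using the plural field names that appear in the formulas; `summarize_taxa` is a hypothetical helper and the second taxon's counts are invented for the example.

```python
def summarize_taxa(rows):
    """Add the derived read counts and fractions to the taxon rows of one sample."""
    for row in rows:
        row["microbial_reads"] = row["assigned_reads"] + row["added_reads"]
    total = sum(row["microbial_reads"] for row in rows)
    for row in rows:
        row["total_microbial_reads"] = total
        # Guard against samples with no classified reads.
        row["fraction_microbial_reads"] = row["microbial_reads"] / total if total else 0.0
    return rows


rows = summarize_taxa([
    {"taxonomy_name": "Cutibacterium acnes", "assigned_reads": 10000, "added_reads": 1000},
    # Hypothetical second taxon, included only so the total differs from the first row.
    {"taxonomy_name": "Staphylococcus epidermidis", "assigned_reads": 8000, "added_reads": 1000},
])
for row in rows:
    print(row["taxonomy_name"], row["microbial_reads"], row["fraction_microbial_reads"])
```

avg_fraction_microbial_reads would then be the mean of fraction_microbial_reads for a given taxon over the samples being compared.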