File Formats

Below are the FileFormat values currently supported by SMaHT Portal.

For AlignedReads

Name ID Description
bam d13d06cf-218e-4f61-aaf0-91f226248b3c Binary version of a SAM file. Format used to represent aligned sequences.
File Extension: .bam
Valid Types: AlignedReads, OutputFile, ReferenceFile, UnalignedReads
cram d363c5f9-7159-45b1-b516-e5ec433f9b86 Compressed version of a BAM file. Format used to represent aligned sequences.
File Extension: .cram
Valid Types: AlignedReads, OutputFile, ReferenceFile

For UnalignedReads

Name ID Description
bam d13d06cf-218e-4f61-aaf0-91f226248b3c Binary version of a SAM file. Format used to represent aligned sequences.
File Extension: .bam
Valid Types: AlignedReads, OutputFile, ReferenceFile, UnalignedReads
fastq_gz c13d06cf-218e-4f61-aaf0-91f226248b2c Format used to represent short read sequence data, compressed.
File Extension: .fastq.gz
Valid Types: OutputFile, ReferenceFile, UnalignedReads

For VariantCalls

Name ID Description
vcf fcc2647d-301b-4888-8d9d-83ea4270309c Format used to represent genomic variants.
File Extension: .vcf
Valid Types: OutputFile, ReferenceFile, VariantCalls
vcf_gz 1b8f525f-aecb-4211-9ae5-a2c998b05599 Compressed version of a VCF file. Format used to represent genomic variants.
File Extension: .vcf.gz
Valid Types: OutputFile, ReferenceFile, VariantCalls
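
The three per-type tables above can be read as a mapping from file type to the formats listed for it. The following is a minimal sketch of that reading, not a portal API: the dictionary is transcribed by hand from the tables, and the names `FORMATS_BY_FILE_TYPE` and `is_format_allowed` are hypothetical.

```python
# Transcribed from the AlignedReads, UnalignedReads, and VariantCalls tables above.
# The dictionary and function names are illustrative assumptions, not portal identifiers.
FORMATS_BY_FILE_TYPE = {
    "AlignedReads": {"bam", "cram"},
    "UnalignedReads": {"bam", "fastq_gz"},
    "VariantCalls": {"vcf", "vcf_gz"},
}


def is_format_allowed(file_type: str, file_format: str) -> bool:
    """Return True if file_format is listed for file_type in the tables above."""
    return file_format in FORMATS_BY_FILE_TYPE.get(file_type, set())


if __name__ == "__main__":
    print(is_format_allowed("AlignedReads", "cram"))
    print(is_format_allowed("UnalignedReads", "cram"))
```

Note that bam appears under both AlignedReads and UnalignedReads, so a check like this has to be keyed on the file type rather than on the format alone.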

All File Formats

Name ID Description
Rdata ce424ef5-86c8-4522-aecf-6c1c98f365b5 Format used to represent R objects.
File Extension: .Rdata
Valid Types: OutputFile, ReferenceFile
alt 9ed3e9f9-fee2-47e3-bbe3-c63a52f8d3b8 Companion format to BWT.
File Extension: .alt
Valid Types: OutputFile, ReferenceFile
amb 8db70ed6-0121-4fe1-a72e-d91dc5aa6cd3 Companion format to BWT.
File Extension: .amb
Valid Types: OutputFile, ReferenceFile
ann 106199e5-5a85-4817-9a55-7b31698e1fda Companion format to BWT.
File Extension: .ann
Valid Types: OutputFile, ReferenceFile
bam d13d06cf-218e-4f61-aaf0-91f226248b3c Binary version of a SAM file. Format used to represent aligned sequences.
File Extension: .bam
Valid Types: AlignedReads, OutputFile, ReferenceFile, UnalignedReads
bam_bai d13d06c1-218e-4f61-aaf0-91f226248b3c Companion format to BAM. Format used to represent the index of a BAM file.
File Extension: .bam.bai
Valid Types: OutputFile, ReferenceFile
bed 4c04f6de-89a7-4477-8dc4-811b50c67401 BED (Browser Extensible Data) format is a text file format used to store genomic regions as coordinates and associated annotations.
File Extension: .bed
Valid Types: OutputFile, ReferenceFile
bed_gz 4f074eca-29a0-4a49-b335-aef988e6cbd7 Compressed version of a BED file. Format used to store genomic regions as coordinates and associated annotations.
File Extension: .bed.gz
Valid Types: OutputFile, ReferenceFile
bed_gz_tbi 40346690-6359-4436-97ff-562698ab4b31 Companion format to compressed BED. Format used to represent the index of a compressed BED file (Tabix generated).
File Extension: .bed.gz.tbi
Valid Types: OutputFile, ReferenceFile
big f66af4df-c107-44f0-bc94-227f1b4ccf72 Format used to represent a binary index for the genome.
File Extension: .big
Valid Types: OutputFile, ReferenceFile
bigWig 33f30c42-d582-4163-af44-fecf586b9dd3 Binary version of a Wig file. Format used for display of dense continuous data with genomic coordinates.
File Extension: .bw
Valid Types: OutputFile, ReferenceFile
bwt 813b0001-5f3f-4e28-9203-4cdf261e19c4 Format used to represent the genome index based on Burrows-Wheeler Transform (BWT).
File Extension: .bwt
Valid Types: OutputFile, ReferenceFile
chain dd1ef82d-da5e-4680-bd5c-cf471a87eb5b Format used to represent pairwise alignments that allow gaps in both sequences simultaneously, compressed.
File Extension: .chain.gz
Valid Types: OutputFile, ReferenceFile
contamination d0cba8b5-cd01-41f5-bfed-e5369293d2dd TXT format to report metrics generated by the Sentieon ContaminationModel algorithm.
File Extension: .contamination
Valid Types: OutputFile, ReferenceFile
cram d363c5f9-7159-45b1-b516-e5ec433f9b86 Compressed version of a BAM file. Format used to represent aligned sequences.
File Extension: .cram
Valid Types: AlignedReads, OutputFile, ReferenceFile
dbnsfp_gz 65a2cca2-dae8-4ff2-ac8b-aa1e92f5416b Format to represent the dbNSFP database as a compressed VCF.
File Extension: .dbnsfp.gz
Valid Types: OutputFile, ReferenceFile
dbnsfp_gz_tbi 311ac7bf-e1d5-463f-af15-61ebfea29608 Companion format to compressed dbNSFP. Format used to represent the index of a compressed dbNSFP file (Tabix generated).
File Extension: .dbnsfp.gz.tbi
Valid Types: OutputFile, ReferenceFile
dbnsfp_readme_txt ac822ea4-d281-41e0-b9c9-f283c51f1c15 Companion format to compressed dbNSFP. Format used to store dbNSFP README as plain text.
File Extension: .dbnsfp.readme.txt
Valid Types: OutputFile, ReferenceFile
dict 4ed9f7e0-2b2f-4aca-9533-a0a652b43442 Companion format to FASTA.
File Extension: .dict
Valid Types: OutputFile, ReferenceFile
fa 5ced774b-a73e-4d1b-8186-d7fbbde7a3c2 FASTA format. Format used to represent the genome reference sequence.
File Extension: .fa
Valid Types: OutputFile, ReferenceFile
fa_fai fb728bb4-bc56-46d5-8df5-a05562826b9a Companion format to FASTA.
File Extension: .fa.fai
Valid Types: OutputFile, ReferenceFile
fastq eb417c0a-70dd-42e3-9841-ac7f1ee22962
File Extension: .fastq
Valid Types: OutputFile, ReferenceFile
fastq_gz c13d06cf-218e-4f61-aaf0-91f226248b2c Format used to represent short read sequence data, compressed.
File Extension: .fastq.gz
Valid Types: OutputFile, ReferenceFile, UnalignedReads
gff3 f87864e0-7d55-46bd-a67a-fb8753ce87db GFF (General Feature Format) Version 3, used for storing genomic features as a text file.
File Extension: .gff3
Valid Types: OutputFile, ReferenceFile
gvcf f592a45e-3b8a-4bad-bfd4-52acf9fd663d Format used to represent genomic variant sites. GVCF has records for all sites, whether or not there is a variant call there.
File Extension: .gvcf
Valid Types: OutputFile, ReferenceFile
gvcf_gz ad47d469-4561-4234-bce2-820f08f58e7c Compressed version of a GVCF file. Format used to represent genomic variant sites.
File Extension: .gvcf.gz
Valid Types: OutputFile, ReferenceFile
gvcf_gz_tbi b01ee86e-b2c7-4725-81d7-798718674485 Companion format to compressed GVCF. Format used to represent the index of a compressed GVCF file (Tabix generated).
File Extension: .gvcf.gz.tbi
Valid Types: OutputFile, ReferenceFile
json ff12517a-d51e-45a4-8f44-a1cfe418dba5 Format used to represent JavaScript Object Notation (JSON).
File Extension: .json
Valid Types: OutputFile, ReferenceFile
md5_list 1362126e-e6ee-4010-9fb8-06e9b39dbb83 Format to represent the list of contig MD5 checksums produced by the cramtools getref command.
File Extension: .md5_list
Valid Types: OutputFile, ReferenceFile
pac 7373ca48-0b3e-467b-967a-80870846f89b Companion format to BWT.
File Extension: .pac
Valid Types: OutputFile, ReferenceFile
pdf 81b7ce7f-64ed-4933-96d1-b6df498a7664 Format used to represent Portable Document Format (PDF).
File Extension: .pdf
Valid Types: OutputFile, ReferenceFile
plugins_tar 65ccbf65-80f9-41b4-b002-468500821c62 Companion format to VEP archive. Format used to represent VEP plugins as archive, compressed.
File Extension: .plugins.tar.gz
Valid Types: OutputFile, ReferenceFile
png 7c525767-e142-45f6-b4c3-84f52bc6f4cc PNG (Portable Network Graphics). Format used to represent a losslessly compressed raster image.
File Extension: .png
Valid Types: OutputFile, ReferenceFile
priors c4f4538f-ff79-42f0-a3be-d416251475ae TXT format to report metrics generated by the Sentieon OrientationBias algorithm.
File Extension: .priors
Valid Types: OutputFile, ReferenceFile
rck 228190b1-4178-46ad-872e-9b8ca1931a31 RCK (Read Count Keeper) format, used to store pileup read counts by strand and allele.
File Extension: .rck
Valid Types: OutputFile, ReferenceFile
rck_gz 20d4d3aa-5f1c-4b75-9e25-73f9f370fefa Compressed version of RCK file. Format used to store pileup read counts by strand and allele.
File Extension: .rck.gz
Valid Types: OutputFile, ReferenceFile
rck_gz_tbi c55ace88-3289-49b0-a67a-c046e1004e5a Companion format to compressed RCK. Format used to represent the index of a compressed RCK file (Tabix generated).
File Extension: .rck.gz.tbi
Valid Types: OutputFile, ReferenceFile
rck_tar 39f836d8-bbb1-46c7-80d4-e321d4a44204 Format used to represent an archive of compressed RCK files.
File Extension: .rck.tar
Valid Types: OutputFile, ReferenceFile
rck_tar_index 1c7dc723-811c-4fcf-b8e5-d5e17a2013f7 Companion format to RCK archive. Format used to represent the index of files in the archive.
File Extension: .rck.tar.index
Valid Types: OutputFile, ReferenceFile
sa 11f2fc36-9a65-4d60-9365-d8ff241df4f7 Companion format to BWT.
File Extension: .sa
Valid Types: OutputFile, ReferenceFile
sam 3311fb05-a0df-43e5-b0af-234c82b6bee9 Format used to represent aligned sequences.
File Extension: .sam
Valid Types: OutputFile, ReferenceFile
tar 39866342-e4f8-4a50-87bf-b61a782549c8 Format used to represent an archive of files.
File Extension: .tar
Valid Types: OutputFile, ReferenceFile
tar_gz f2ec3b9f-a898-4e6c-8da5-734a7a6410b8 Compressed version of a TAR archive. Format used to represent an archive of files.
File Extension: .tar.gz
Valid Types: OutputFile, ReferenceFile
tsv c369d5d6-2861-47ab-bc39-99083cfe48bd Format used to represent Tab-Separated Values (TSV).
File Extension: .tsv
Valid Types: OutputFile, ReferenceFile
tsv_gz 11ca3783-db6e-430e-997b-9cf0ca275814 Compressed version of a TSV file. Format used to represent Tab-Separated Values.
File Extension: .tsv.gz
Valid Types: OutputFile, ReferenceFile
tsv_gz_tbi 829ed303-e427-4d9a-a217-be75ad11317e Companion format to compressed TSV. Format used to represent the index of a compressed TSV file (Tabix generated).
File Extension: .tsv.gz.tbi
Valid Types: OutputFile, ReferenceFile
txt 0cd4e777-a596-4927-95c8-b07716121aa3 Format used to represent plain text.
File Extension: .txt
Valid Types: OutputFile, ReferenceFile
vcf fcc2647d-301b-4888-8d9d-83ea4270309c Format used to represent genomic variants.
File Extension: .vcf
Valid Types: OutputFile, ReferenceFile, VariantCalls
vcf_gz 1b8f525f-aecb-4211-9ae5-a2c998b05599 Compressed version of a VCF file. Format used to represent genomic variants.
File Extension: .vcf.gz
Valid Types: OutputFile, ReferenceFile, VariantCalls
vcf_gz_stats ec465f66-f1ae-44e0-9885-a2b24e7ce268 Companion format to compressed VCF. Format used to collect metrics for a compressed VCF file.
File Extension: .vcf.gz.stats
Valid Types: OutputFile, ReferenceFile
vcf_gz_tbi f84f1922-7f4e-4afc-922f-bec620969bf1 Companion format to compressed VCF. Format used to represent the index of a compressed VCF file (Tabix generated).
File Extension: .vcf.gz.tbi
Valid Types: OutputFile, ReferenceFile
vcf_idx ec96f95a-cf13-4633-ab0d-c4a5138bbe0b Companion format to VCF. Format used to represent the index of a VCF file.
File Extension: .vcf.idx
Valid Types: OutputFile, ReferenceFile
vcf_tar 3d140fc3-fd0e-4d09-8294-4536e388e665 Format used to represent an archive of compressed VCF files.
File Extension: .vcf.tar
Valid Types: OutputFile, ReferenceFile
vep_tar d05f9688-0ee1-4a86-83f4-656e6e21352a Format to represent VEP datasource as archive, compressed.
File Extension: .vep.tar.gz
Valid Types: OutputFile, ReferenceFile
wig 19e290b5-2743-4311-a860-5dfca41858b1 Format used for display of dense continuous data with genomic coordinates.
File Extension: .wig
Valid Types: OutputFile, ReferenceFile
zip 1125243b-3acd-4793-9264-4abd7d788e58 Archive file format that supports lossless data compression.
File Extension: .zip
Valid Types: OutputFile, ReferenceFile
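
Several extensions in the table above are suffixes of longer ones (for example .tar.gz versus .vep.tar.gz and .plugins.tar.gz, .tar versus .rck.tar and .vcf.tar, and .txt versus .dbnsfp.readme.txt), so a naive suffix check can pick the wrong format. The sketch below shows one way to resolve a file name to a format name by trying the longest extension first. It covers only a subset of the table, matches case-sensitively, and uses the hypothetical names `EXTENSION_TO_FORMAT` and `guess_file_format`; it is not how the portal itself resolves formats.

```python
from typing import Optional

# Subset of the table above (file extension -> format name), copied by hand.
# Extend with further rows as needed; the names here are illustrative only.
EXTENSION_TO_FORMAT = {
    ".bam": "bam",
    ".vcf.gz": "vcf_gz",
    ".tar": "tar",
    ".rck.tar": "rck_tar",
    ".vcf.tar": "vcf_tar",
    ".tar.gz": "tar_gz",
    ".vep.tar.gz": "vep_tar",
    ".plugins.tar.gz": "plugins_tar",
    ".txt": "txt",
    ".dbnsfp.readme.txt": "dbnsfp_readme_txt",
}


def guess_file_format(filename: str) -> Optional[str]:
    """Return the format name whose extension is the longest suffix of filename."""
    # Longest extensions are tried first, so "cache.vep.tar.gz" resolves to
    # vep_tar rather than to the more generic tar_gz.
    for extension in sorted(EXTENSION_TO_FORMAT, key=len, reverse=True):
        if filename.endswith(extension):
            return EXTENSION_TO_FORMAT[extension]
    return None


if __name__ == "__main__":
    for name in ("cache.vep.tar.gz", "bundle.tar.gz", "calls.vcf.tar", "notes.txt"):
        print(name, guess_file_format(name))
```

A file name that matches no listed extension returns None, which leaves the decision to the caller instead of guessing a format.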