File Formats
Below are the FileFormat values currently supported by SMaHT Portal.
For AlignedReads
| Name | ID | Description |
|---|---|---|
| bam | d13d06cf-218e-4f61-aaf0-91f226248b3c | Binary version of a SAM file. Format used to represent aligned sequences. File Extension: .bam Valid Types: AlignedReads, OutputFile, ReferenceFile, UnalignedReads |
| cram | d363c5f9-7159-45b1-b516-e5ec433f9b86 | Compressed version of a BAM file. Format used to represent aligned sequences. File Extension: .cram Valid Types: AlignedReads, OutputFile, ReferenceFile |
For UnalignedReads
| Name | ID | Description |
|---|---|---|
| bam | d13d06cf-218e-4f61-aaf0-91f226248b3c | Binary version of a SAM file. Format used to represent aligned sequences. File Extension: .bam Valid Types: AlignedReads, OutputFile, ReferenceFile, UnalignedReads |
| fastq_gz | c13d06cf-218e-4f61-aaf0-91f226248b2c | Format used to represent short read sequence data, compressed. File Extension: .fastq.gz Valid Types: OutputFile, ReferenceFile, UnalignedReads |
For VariantCalls
| Name | ID | Description |
|---|---|---|
| vcf | fcc2647d-301b-4888-8d9d-83ea4270309c | Format used to represent genomic variants. File Extension: .vcf Valid Types: OutputFile, ReferenceFile, VariantCalls |
| vcf_gz | 1b8f525f-aecb-4211-9ae5-a2c998b05599 | Compressed version of a VCF file. Format used to represent genomic variants. File Extension: .vcf.gz Valid Types: OutputFile, ReferenceFile, VariantCalls |
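
The three per-type tables above can be read as a mapping from a file type to the formats listed for it. The following is a minimal sketch of that reading, not a portal API: the names `VALID_FORMATS` and `is_valid_format` are hypothetical, and the mapping is transcribed by hand from the tables above.

```python
# Mapping transcribed from the per-type tables above.
# VALID_FORMATS and is_valid_format are hypothetical names, not portal APIs.
VALID_FORMATS = {
    "AlignedReads": {"bam", "cram"},
    "UnalignedReads": {"bam", "fastq_gz"},
    "VariantCalls": {"vcf", "vcf_gz"},
}


def is_valid_format(file_type: str, file_format: str) -> bool:
    """Return True if file_format is listed for file_type in the tables above."""
    return file_format in VALID_FORMATS.get(file_type, set())


print(is_valid_format("AlignedReads", "cram"))    # True
print(is_valid_format("UnalignedReads", "cram"))  # False
```

An unknown file type yields `False` rather than raising, which may or may not match how a real submission check should behave.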
All File Formats
| Name | ID | Description |
|---|---|---|
| Rdata | ce424ef5-86c8-4522-aecf-6c1c98f365b5 | Format used to represent R objects. File Extension: .Rdata Valid Types: OutputFile, ReferenceFile |
| alt | 9ed3e9f9-fee2-47e3-bbe3-c63a52f8d3b8 | Companion format to BWT. File Extension: .alt Valid Types: OutputFile, ReferenceFile |
| amb | 8db70ed6-0121-4fe1-a72e-d91dc5aa6cd3 | Companion format to BWT. File Extension: .amb Valid Types: OutputFile, ReferenceFile |
| ann | 106199e5-5a85-4817-9a55-7b31698e1fda | Companion format to BWT. File Extension: .ann Valid Types: OutputFile, ReferenceFile |
| bam | d13d06cf-218e-4f61-aaf0-91f226248b3c | Binary version of a SAM file. Format used to represent aligned sequences. File Extension: .bam Valid Types: AlignedReads, OutputFile, ReferenceFile, UnalignedReads |
| bam_bai | d13d06c1-218e-4f61-aaf0-91f226248b3c | Companion format to BAM. Format used to represent the index of a BAM file. File Extension: .bam.bai Valid Types: OutputFile, ReferenceFile |
| bed | 4c04f6de-89a7-4477-8dc4-811b50c67401 | BED (Browser Extensible Data) format is a text file format used to store genomic regions as coordinates and associated annotations. File Extension: .bed Valid Types: OutputFile, ReferenceFile |
| bed_gz | 4f074eca-29a0-4a49-b335-aef988e6cbd7 | Compressed version of a BED file. Format used to store genomic regions as coordinates and associated annotations. File Extension: .bed.gz Valid Types: OutputFile, ReferenceFile |
| bed_gz_tbi | 40346690-6359-4436-97ff-562698ab4b31 | Companion format to compressed BED. Format used to represent the index of a compressed BED file (Tabix generated). File Extension: .bed.gz.tbi Valid Types: OutputFile, ReferenceFile |
| big | f66af4df-c107-44f0-bc94-227f1b4ccf72 | Format used to represent a binary index for the genome. File Extension: .big Valid Types: OutputFile, ReferenceFile |
| bigWig | 33f30c42-d582-4163-af44-fecf586b9dd3 | Binary version of a Wig file. Format used for display of dense continuous data with genomic coordinates. File Extension: .bw Valid Types: OutputFile, ReferenceFile |
| bwt | 813b0001-5f3f-4e28-9203-4cdf261e19c4 | Format used to represent the genome index based on Burrows-Wheeler Transform (BWT). File Extension: .bwt Valid Types: OutputFile, ReferenceFile |
| chain | dd1ef82d-da5e-4680-bd5c-cf471a87eb5b | Format used to represent pairwise alignments that allow gaps in both sequences simultaneously, compressed. File Extension: .chain.gz Valid Types: OutputFile, ReferenceFile |
| contamination | d0cba8b5-cd01-41f5-bfed-e5369293d2dd | TXT format to report metrics generated by Sentieon ContaminationModel algorithm. File Extension: .contamination Valid Types: OutputFile, ReferenceFile |
| cram | d363c5f9-7159-45b1-b516-e5ec433f9b86 | Compressed version of a BAM file. Format used to represent aligned sequences. File Extension: .cram Valid Types: AlignedReads, OutputFile, ReferenceFile |
| dbnsfp_gz | 65a2cca2-dae8-4ff2-ac8b-aa1e92f5416b | Format to represent the dbNSFP database as a compressed VCF. File Extension: .dbnsfp.gz Valid Types: OutputFile, ReferenceFile |
| dbnsfp_gz_tbi | 311ac7bf-e1d5-463f-af15-61ebfea29608 | Companion format to compressed dbNSFP. Format used to represent the index of a compressed dbNSFP file (Tabix generated). File Extension: .dbnsfp.gz.tbi Valid Types: OutputFile, ReferenceFile |
| dbnsfp_readme_txt | ac822ea4-d281-41e0-b9c9-f283c51f1c15 | Companion format to compressed dbNSFP. Format used to store dbNSFP README as plain text. File Extension: .dbnsfp.readme.txt Valid Types: OutputFile, ReferenceFile |
| dict | 4ed9f7e0-2b2f-4aca-9533-a0a652b43442 | Companion format to FASTA. File Extension: .dict Valid Types: OutputFile, ReferenceFile |
| fa | 5ced774b-a73e-4d1b-8186-d7fbbde7a3c2 | FASTA format. Format used to represent the genome reference sequence. File Extension: .fa Valid Types: OutputFile, ReferenceFile |
| fa_fai | fb728bb4-bc56-46d5-8df5-a05562826b9a | Companion format to FASTA. File Extension: .fa.fai Valid Types: OutputFile, ReferenceFile |
| fastq | eb417c0a-70dd-42e3-9841-ac7f1ee22962 | Format used to represent short read sequence data. File Extension: .fastq Valid Types: OutputFile, ReferenceFile |
| fastq_gz | c13d06cf-218e-4f61-aaf0-91f226248b2c | Format used to represent short read sequence data, compressed. File Extension: .fastq.gz Valid Types: OutputFile, ReferenceFile, UnalignedReads |
| gff3 | f87864e0-7d55-46bd-a67a-fb8753ce87db | GFF (General Feature Format) Version 3, used for storing genomic features as a text file. File Extension: .gff3 Valid Types: OutputFile, ReferenceFile |
| gvcf | f592a45e-3b8a-4bad-bfd4-52acf9fd663d | Format used to represent genomic variant sites. A GVCF has records for all sites, whether or not there is a variant call there. File Extension: .gvcf Valid Types: OutputFile, ReferenceFile |
| gvcf_gz | ad47d469-4561-4234-bce2-820f08f58e7c | Compressed version of a GVCF file. Format used to represent genomic variant sites. File Extension: .gvcf.gz Valid Types: OutputFile, ReferenceFile |
| gvcf_gz_tbi | b01ee86e-b2c7-4725-81d7-798718674485 | Companion format to compressed GVCF. Format used to represent the index of a compressed GVCF file (Tabix generated). File Extension: .gvcf.gz.tbi Valid Types: OutputFile, ReferenceFile |
| json | ff12517a-d51e-45a4-8f44-a1cfe418dba5 | Format used to represent JavaScript Object Notation (JSON). File Extension: .json Valid Types: OutputFile, ReferenceFile |
| md5_list | 1362126e-e6ee-4010-9fb8-06e9b39dbb83 | Format to represent the list of contig MD5 checksums produced by the cramtools getref command. File Extension: .md5_list Valid Types: OutputFile, ReferenceFile |
| pac | 7373ca48-0b3e-467b-967a-80870846f89b | Companion format to BWT. File Extension: .pac Valid Types: OutputFile, ReferenceFile |
| pdf | 81b7ce7f-64ed-4933-96d1-b6df498a7664 | Format used to represent Portable Document Format (PDF). File Extension: .pdf Valid Types: OutputFile, ReferenceFile |
| plugins_tar | 65ccbf65-80f9-41b4-b002-468500821c62 | Companion format to VEP archive. Format used to represent VEP plugins as archive, compressed. File Extension: .plugins.tar.gz Valid Types: OutputFile, ReferenceFile |
| png | 7c525767-e142-45f6-b4c3-84f52bc6f4cc | PNG (Portable Network Graphics). Format used to represent a losslessly compressed raster image. File Extension: .png Valid Types: OutputFile, ReferenceFile |
| priors | c4f4538f-ff79-42f0-a3be-d416251475ae | TXT format to report metrics generated by Sentieon OrientationBias algorithm. File Extension: .priors Valid Types: OutputFile, ReferenceFile |
| rck | 228190b1-4178-46ad-872e-9b8ca1931a31 | RCK (Read Count Keeper) format, used to store pileup read counts by strand and allele. File Extension: .rck Valid Types: OutputFile, ReferenceFile |
| rck_gz | 20d4d3aa-5f1c-4b75-9e25-73f9f370fefa | Compressed version of RCK file. Format used to store pileup read counts by strand and allele. File Extension: .rck.gz Valid Types: OutputFile, ReferenceFile |
| rck_gz_tbi | c55ace88-3289-49b0-a67a-c046e1004e5a | Companion format to compressed RCK. Format used to represent the index of a compressed RCK file (Tabix generated). File Extension: .rck.gz.tbi Valid Types: OutputFile, ReferenceFile |
| rck_tar | 39f836d8-bbb1-46c7-80d4-e321d4a44204 | Format used to represent an archive of compressed RCK files. File Extension: .rck.tar Valid Types: OutputFile, ReferenceFile |
| rck_tar_index | 1c7dc723-811c-4fcf-b8e5-d5e17a2013f7 | Companion format to RCK archive. Format used to represent the index of files in the archive. File Extension: .rck.tar.index Valid Types: OutputFile, ReferenceFile |
| sa | 11f2fc36-9a65-4d60-9365-d8ff241df4f7 | Companion format to BWT. File Extension: .sa Valid Types: OutputFile, ReferenceFile |
| sam | 3311fb05-a0df-43e5-b0af-234c82b6bee9 | Format used to represent aligned sequences. File Extension: .sam Valid Types: OutputFile, ReferenceFile |
| tar | 39866342-e4f8-4a50-87bf-b61a782549c8 | Format used to represent an archive of files. File Extension: .tar Valid Types: OutputFile, ReferenceFile |
| tar_gz | f2ec3b9f-a898-4e6c-8da5-734a7a6410b8 | Compressed version of a TAR archive. Format used to represent an archive of files. File Extension: .tar.gz Valid Types: OutputFile, ReferenceFile |
| tsv | c369d5d6-2861-47ab-bc39-99083cfe48bd | Format used to represent Tab-Separated Values (TSV). File Extension: .tsv Valid Types: OutputFile, ReferenceFile |
| tsv_gz | 11ca3783-db6e-430e-997b-9cf0ca275814 | Compressed version of a TSV file. Format used to represent Tab-Separated Values. File Extension: .tsv.gz Valid Types: OutputFile, ReferenceFile |
| tsv_gz_tbi | 829ed303-e427-4d9a-a217-be75ad11317e | Companion format to compressed TSV. Format used to represent the index of a compressed TSV file (Tabix generated). File Extension: .tsv.gz.tbi Valid Types: OutputFile, ReferenceFile |
| txt | 0cd4e777-a596-4927-95c8-b07716121aa3 | Format used to represent plain text. File Extension: .txt Valid Types: OutputFile, ReferenceFile |
| vcf | fcc2647d-301b-4888-8d9d-83ea4270309c | Format used to represent genomic variants. File Extension: .vcf Valid Types: OutputFile, ReferenceFile, VariantCalls |
| vcf_gz | 1b8f525f-aecb-4211-9ae5-a2c998b05599 | Compressed version of a VCF file. Format used to represent genomic variants. File Extension: .vcf.gz Valid Types: OutputFile, ReferenceFile, VariantCalls |
| vcf_gz_stats | ec465f66-f1ae-44e0-9885-a2b24e7ce268 | Companion format to compressed VCF. Format used to collect metrics for a compressed VCF file. File Extension: .vcf.gz.stats Valid Types: OutputFile, ReferenceFile |
| vcf_gz_tbi | f84f1922-7f4e-4afc-922f-bec620969bf1 | Companion format to compressed VCF. Format used to represent the index of a compressed VCF file (Tabix generated). File Extension: .vcf.gz.tbi Valid Types: OutputFile, ReferenceFile |
| vcf_idx | ec96f95a-cf13-4633-ab0d-c4a5138bbe0b | Companion format to VCF. Format used to represent the index of a VCF file. File Extension: .vcf.idx Valid Types: OutputFile, ReferenceFile |
| vcf_tar | 3d140fc3-fd0e-4d09-8294-4536e388e665 | Format used to represent an archive of compressed VCF files. File Extension: .vcf.tar Valid Types: OutputFile, ReferenceFile |
| vep_tar | d05f9688-0ee1-4a86-83f4-656e6e21352a | Format to represent VEP datasource as archive, compressed. File Extension: .vep.tar.gz Valid Types: OutputFile, ReferenceFile |
| wig | 19e290b5-2743-4311-a860-5dfca41858b1 | Format used for display of dense continuous data with genomic coordinates. File Extension: .wig Valid Types: OutputFile, ReferenceFile |
| zip | 1125243b-3acd-4793-9264-4abd7d788e58 | Archive file format that supports lossless data compression. File Extension: .zip Valid Types: OutputFile, ReferenceFile |
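
Several extensions in the table above are suffixes of one another (for example `.tar.gz` and `.vep.tar.gz`), so a lookup by file name has to prefer the longest match. The following is a minimal sketch of such a lookup, not something the portal provides: `EXTENSIONS` and `guess_format` are hypothetical names, and the dictionary holds only a hand-copied subset of the rows above.

```python
from typing import Optional

# Subset of (extension, format name) pairs transcribed from the table above.
# EXTENSIONS and guess_format are hypothetical names, not portal APIs.
EXTENSIONS = {
    ".bam": "bam",
    ".bam.bai": "bam_bai",
    ".cram": "cram",
    ".fastq": "fastq",
    ".fastq.gz": "fastq_gz",
    ".vcf": "vcf",
    ".vcf.gz": "vcf_gz",
    ".vcf.gz.tbi": "vcf_gz_tbi",
    ".tar.gz": "tar_gz",
    ".vep.tar.gz": "vep_tar",
    ".bw": "bigWig",
}


def guess_format(filename: str) -> Optional[str]:
    """Return the format name whose extension is the longest suffix of filename."""
    matches = [ext for ext in EXTENSIONS if filename.endswith(ext)]
    if not matches:
        return None
    # Prefer the most specific (longest) extension, e.g. .vep.tar.gz over .tar.gz.
    return EXTENSIONS[max(matches, key=len)]


print(guess_format("cache.vep.tar.gz"))  # vep_tar
print(guess_format("sample.vcf.gz"))     # vcf_gz
print(guess_format("notes.docx"))        # None
```

Note that the format name and the extension do not always coincide: `bigWig` uses `.bw`, and `vep_tar` uses `.vep.tar.gz`, so the lookup is keyed on the extension column rather than derived from the name.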