|
samtools的說(shuō)明文檔:http://samtools./samtools.shtml 1. viewview命令的主要功能是:將sam文件轉(zhuǎn)換成bam文件;然后對(duì)bam文件進(jìn)行各種操作,比如數(shù)據(jù)的排序(不屬于本命令的功能)和提取(這些操作 是對(duì)bam文件進(jìn)行的,因而當(dāng)輸入為sam文件的時(shí)候,不能進(jìn)行該操作);最后將排序或提取得到的數(shù)據(jù)輸出為bam或sam(默認(rèn)的)格式。 bam文件優(yōu)點(diǎn):bam文件為二進(jìn)制文件,占用的磁盤空間比sam文本文件小;利用bam二進(jìn)制文件的運(yùn)算速度快。 view命令中,對(duì)sam文件頭部的輸入(-t或-T)和輸出(-h)是單獨(dú)的一些參數(shù)來(lái)控制的。 Usage: samtools view [options] <in.bam>|<in.sam> [region1 [...]]
默認(rèn)情況下不加 region,則是輸出所有的 region.
Options: -b output BAM
默認(rèn)下輸出是 SAM 格式文件,該參數(shù)設(shè)置輸出 BAM 格式
-h print header for the SAM output
默認(rèn)下輸出的 sam 格式文件不帶 header,該參數(shù)設(shè)定輸出sam文件時(shí)帶 header 信息
-H print header only (no alignments)
-S input is SAM
默認(rèn)下輸入是 BAM 文件,若是輸入是 SAM 文件,則最好加該參數(shù),否則有時(shí)候會(huì)報(bào)錯(cuò)。
-u uncompressed BAM output (force -b)
該參數(shù)的使用需要有-b參數(shù),能節(jié)約時(shí)間,但是需要更多磁盤空間。
-c Instead of printing the alignments, only count them and print the
total number. All filter options, such as ‘-f’, ‘-F’ and ‘-q’ ,
are taken into account.
-1 fast compression (force -b)
-x output FLAG in HEX (samtools-C specific)
-X output FLAG in string (samtools-C specific)
-c print only the count of matching records
-L FILE output alignments overlapping the input BED FILE [null]
-t FILE list of reference names and lengths (force -S) [null]
使用一個(gè)list文件來(lái)作為header的輸入
-T FILE reference sequence file (force -S) [null]
使用序列fasta文件作為header的輸入
-o FILE output file name [stdout]
-R FILE list of read groups to be outputted [null]
-f INT required flag, 0 for unset [0]
-F INT filtering flag, 0 for unset [0]
Skip alignments with bits present in INT [0]
數(shù)字4代表該序列沒有比對(duì)到參考序列上
數(shù)字8代表該序列的mate序列沒有比對(duì)到參考序列上
-q INT minimum mapping quality [0]
-l STR only output reads in library STR [null]
-r STR only output reads in read group STR [null]
-s FLOAT fraction of templates to subsample; integer part as seed [-1]
-? longer help
例子: 將sam文件轉(zhuǎn)換成bam文件
$ samtools view -bS abc.sam > abc.bam
$ samtools view -b -S abc.sam -o abc.bam
提取比對(duì)到參考序列上的比對(duì)結(jié)果
$ samtools view -bF 4 abc.bam > abc.F.bam
提取paired reads中兩條reads都比對(duì)到參考序列上的比對(duì)結(jié)果,只需要把兩個(gè)4+8的值12作為過(guò)濾參數(shù)即可
$ samtools view -bF 12 abc.bam > abc.F12.bam
提取沒有比對(duì)到參考序列上的比對(duì)結(jié)果
$ samtools view -bf 4 abc.bam > abc.f.bam
提取bam文件中比對(duì)到caffold1上的比對(duì)結(jié)果,并保存到sam文件格式
$ samtools view abc.bam scaffold1 > scaffold1.sam
提取scaffold1上能比對(duì)到30k到100k區(qū)域的比對(duì)結(jié)果
$ samtools view abc.bam scaffold1:30000-100000 $gt; scaffold1_30k-100k.sam
根據(jù)fasta文件,將 header 加入到 sam 或 bam 文件中
$ samtools view -T genome.fasta -h scaffold1.sam > scaffold1.h.sam
2. sortsort對(duì)bam文件進(jìn)行排序。 Usage: samtools sort [-n] [-m <maxMem>] <in.bam> <out.prefix> -m 參數(shù)默認(rèn)下是 500,000,000 即500M(不支持K,M,G等縮寫)。對(duì)于處理大數(shù)據(jù)時(shí),如果內(nèi)存夠用,則設(shè)置大點(diǎn)的值,以節(jié)約時(shí)間。 -n 設(shè)定排序方式按short reads的ID排序。默認(rèn)下是按序列在fasta文件中的順序(即header)和序列從左往右的位點(diǎn)排序。 例子: $ samtools sort abc.bam abc.sort ###注意 abc.sort 是輸出文件的前綴,實(shí)際輸出是 abc.sort.bam
$ samtools view abc.sort.bam | less -S
3.merge將2個(gè)或2個(gè)以上的已經(jīng)sort了的bam文件融合成一個(gè)bam文件。融合后的文件不需要?jiǎng)t是已經(jīng)sort過(guò)了的。 Usage: samtools merge [-nr] [-h inh.sam] <out.bam> <in1.bam> <in2.bam>[...]
Options: -n sort by read names
-r attach RG tag (inferred from file names)
-u uncompressed BAM output
-f overwrite the output BAM if exist
-1 compress level 1
-R STR merge file in the specified region STR [all]
-h FILE copy the header in FILE to <out.bam> [in1.bam]
Note: Samtools' merge does not reconstruct the @RG dictionary in the header. Users
must provide the correct header with -h, or uses Picard which properly maintains
the header dictionary in merging.
4.index必須對(duì)bam文件進(jìn)行默認(rèn)情況下的排序后,才能進(jìn)行index。否則會(huì)報(bào)錯(cuò)。 建立索引后將產(chǎn)生后綴為.bai的文件,用于快速的隨機(jī)處理。很多情況下需要有bai文件的存在,特別是顯示序列比對(duì)情況下。比如samtool的tview命令就需要;gbrowse2顯示reads的比對(duì)圖形的時(shí)候也需要。 Usage: samtools index <in.bam> [out.index] 例子: 以下兩種命令結(jié)果一樣
$ samtools index abc.sort.bam
$ samtools index abc.sort.bam abc.sort.bam.bai
5. faidx對(duì)fasta文件建立索引,生成的索引文件以.fai后綴結(jié)尾。該命令也能依據(jù)索引文件快速提取fasta文件中的某一條(子)序列 Usage: samtools faidx <in.bam> [ [...]] 對(duì)基因組文件建立索引 $ samtools faidx genome.fasta 生成了索引文件genome.fasta.fai,是一個(gè)文本文件,分成了5列。第一列是子序列的名稱; 第二列是子序列的長(zhǎng)度;個(gè)人認(rèn)為“第三列是序列所在的位置”,因?yàn)樵摂?shù)字從上往下逐漸變大, 最后的數(shù)字是genome.fasta文件的大?。坏?和5列不知是啥意思。于是通過(guò)此文件,可以定 位子序列在fasta文件在磁盤上的存放位置,直接快速調(diào)出子序列。 由于有索引文件,可以使用以下命令很快從基因組中提取到fasta格式的子序列 $ samtools faidx genome.fasta scffold_10 > scaffold_10.fasta 6. tviewtview能直觀的顯示出reads比對(duì)基因組的情況,和基因組瀏覽器有點(diǎn)類似。 Usage: samtools tview <aln.bam> [ref.fasta] 當(dāng)給出參考基因組的時(shí)候,會(huì)在第一排顯示參考基因組的序列,否則,第一排全用N表示。 按下 g ,則提示輸入要到達(dá)基因組的某一個(gè)位點(diǎn)。例子“scaffold_10:1000"表示到達(dá)第 10號(hào)scaffold的第1000個(gè)堿基位點(diǎn)處。 使用H(左)J(上)K(下)L(右)移動(dòng)顯示界面。大寫字母移動(dòng)快,小寫字母移動(dòng)慢。 使用空格建向左快速移動(dòng)(和 L 類似),使用Backspace鍵向左快速移動(dòng)(和 H 類似)。 Ctrl+H 向左移動(dòng)1kb堿基距離; Ctrl+L 向右移動(dòng)1kb堿基距離 可以用顏色標(biāo)注比對(duì)質(zhì)量,堿基質(zhì)量,核苷酸等。30~40的堿基質(zhì)量或比對(duì)質(zhì)量使用白色表示; 20~30黃色;10~20綠色;0~10藍(lán)色。 使用點(diǎn)號(hào)'.'切換顯示堿基和點(diǎn)號(hào);使用r切換顯示read name等 還有很多其它的使用說(shuō)明,具體按 ? 鍵來(lái)查看。 7. flagstat給出BAM文件的比對(duì)結(jié)果 Usage: samtools flagstat <in.bam> $ samtools flagstat example.bam 11945742 + 0 in total (QC-passed reads + QC-failed reads) #總共的reads數(shù) 0 + 0 duplicates 7536364 + 0 mapped (63.09%:-nan%) #總體上reads的匹配率 11945742 + 0 paired in sequencing #有多少reads是屬于paired reads 5972871 + 0 read1 #reads1中的reads數(shù) 5972871 + 0 read2 #reads2中的reads數(shù) 6412042 + 0 properly paired (53.68%:-nan%) #完美匹配的reads數(shù):比對(duì)到同一條參考序列,并且兩條reads之間的距離符合設(shè)置的閾值 6899708 + 0 with itself and mate mapped #paired reads中兩條都比對(duì)到參考序列上的reads數(shù) 636656 + 0 singletons (5.33%:-nan%) #單獨(dú)一條匹配到參考序列上的reads數(shù),和上一個(gè)相加,則是總的匹配上的reads數(shù)。 469868 + 0 with mate mapped to a different chr #paired reads中兩條分別比對(duì)到兩條不同的參考序列的reads數(shù) 243047 + 0 with mate mapped to a different chr (mapQ>=5) #同上一個(gè),只是其中比對(duì)質(zhì)量>=5的reads的數(shù)量 7. depth得到每個(gè)堿基位點(diǎn)的測(cè)序深度,并輸出到標(biāo)準(zhǔn)輸出。 Usage: bam2depth [-r reg] [-q baseQthres] [-Q mapQthres] [-b in.bed] <in1.bam> [...] 8. 其它有用的命令reheader 替換bam文件的頭 $ samtools reheader <in.header.sam> <in.bam> cat 連接多個(gè)bam文件,適用于非sorted的bam文件 $ samtools cat [-h header.sam] [-o out.bam] <in1.bam> <in2.bam> [ ... ] idxstats 統(tǒng)計(jì)一個(gè)表格,4列,分別為”序列名,序列長(zhǎng)度,比對(duì)上的reads數(shù),unmapped reads number”。第4列應(yīng)該是paired reads中有一端能匹配到該scaffold上,而另外一端不匹配到任何scaffolds上的reads數(shù)。 $ samtools idxstats <aln.bam> 9. 將bam文件轉(zhuǎn)換為fastq文件有時(shí)候,我們需要提取出比對(duì)到一段參考序列的reads,進(jìn)行小范圍的分析,以利于debug等。這時(shí)需要將bam或sam文件轉(zhuǎn)換為fastq格式。 $ wget http://www./gsl/static/software/bam2fastq-1.1.0.tgz $ tar zxf bam2fastq-1.1.0.tgz $ cd bam2fastq-1.1.0 $ make $ ./bam2fastq <in.bam> 10. mpileupsamtools還有個(gè)非常重要的命令mpileup,以前為pileup。該命令用于生成bcf文件,再使用bcftools進(jìn)行SNP和Indel的分析。bcftools是samtool中附帶的軟件,在samtools的安裝文件夾中可以找到。 最常用的參數(shù)有2: -f 來(lái)輸入有索引文件的fasta參考序列; -g 輸出到bcf格式。用法和最簡(jiǎn)單的例子如下 Usage: samtools mpileup [-EBug] [-C capQcoef] [-r reg] [-f in.fa] [-l list] [-M capMapQ] [-Q minBaseQ] [-q minMapQ] in.bam [in2.bam [...]]
$ samtools mpileup -f genome.fasta abc.bam > abc.txt
$ samtools mpileup -gSDf genome.fasta abc.bam > abc.bcf
$ samtools mpileup -guSDf genome.fasta abc.bam | bcftools view -cvNg - > abc.vcf
mpileup不使用-u或-g參數(shù)時(shí),則不生成二進(jìn)制的bcf文件,而生成一個(gè)文本文件(輸出到標(biāo)準(zhǔn)輸出)。該文本文件統(tǒng)計(jì)了參考序列中每個(gè)堿基位點(diǎn)的比對(duì)情況;該文件每一行代表了參考序列中某一個(gè)堿基位點(diǎn)的比對(duì)結(jié)果。比如: scaffold_1 2841 A 11 ,,,...,.... BHIGDGIJ?FF scaffold_1 2842 C 12 ,$,,...,....^I. CFGEGEGGCFF+ scaffold_1 2843 G 11 ,,...,..... FDDDDCD?DD+ scaffold_1 2844 G 11 ,,...,..... FA?AAAA<AA+ scaffold_1 2845 G 11 ,,...,..... F656666166* scaffold_1 2846 A 11 ,,...,..... (1.1111)11* scaffold_1 2847 A 11 ,,+9acggtgaag.+9ACGGTGAAT.+9ACGGTGAAG.+9ACGGTGAAG,+9acggtgaag.+9ACGGTGAAG.+9ACGGTGAAG.+9ACGGTGAAG.+9ACGGTGAAG.+9ACGGTGAAG %.+....-..) scaffold_1 2848 N 11 agGGGgGGGGG !!$!!!!!!!! scaffold_1 2849 A 11 c$,...,..... !0000000000 scaffold_1 2850 A 10 ,...,..... 353333333 mpileup生成的結(jié)果包含6行:參考序列名;位置;參考?jí)A基;比對(duì)上的reads數(shù);比對(duì)情況;比對(duì)上的堿基的質(zhì)量。其中第5列比較復(fù)雜,解釋如下: 1 ‘.’代表與參考序列正鏈匹配。 pileup具體的參數(shù)如下: 輸入?yún)?shù)
-6 Assume the quality is in the Illumina 1.3+ encoding. -A Do not skip anomalous read pairs in variant calling.
-B Disable probabilistic realignment for the computation of base alignment quality (BAQ). BAQ is the Phred-scaled probability of a read base being misaligned. Applying this option greatly helps to reduce false SNPs caused by misalignments.
-b FILE List of input BAM files, one file per line [null]
-C INT Coefficient for downgrading mapping quality for reads containing excessive mismatches. Given a read with a phred-scaled probability q of being generated from the mapped position, the new mapping quality is about sqrt((INT-q)/INT)*INT. A zero value disables this functionality; if enabled, the recommended value for BWA is 50. [0]
-d INT At a position, read maximally INT reads per input BAM. [250]
-E Extended BAQ computation. This option helps sensitivity especially for MNPs, but may hurt specificity a little bit.
-f FILE The faidx-indexed reference file in the FASTA format. The file can be optionally compressed by razip. [null]
-l FILE BED or position list file containing a list of regions or sites where pileup or BCF should be generated [null]
-M INT cap mapping quality at INT [60]
-q INT Minimum mapping quality for an alignment to be used [0]
-Q INT Minimum base quality for a base to be considered [13]
-r STR Only generate pileup in region STR [all sites]
輸出參數(shù)
-D Output per-sample read depth (require -g/-u)
-g Compute genotype likelihoods and output them in the binary call format (BCF).
-S Output per-sample Phred-scaled strand bias P-value (require -g/-u)
-u Similar to -g except that the output is uncompressed BCF, which is preferred for piping.
Options for Genotype Likelihood Computation (for -g or -u):
-e INT Phred-scaled gap extension sequencing error probability. Reducing INT leads to longer indels. [20]
-h INT Coefficient for modeling homopolymer errors. Given an l-long homopolymer run, the sequencing error of an indel of size s is modeled as INT*s/l. [100]
-I Do not perform INDEL calling
-L INT Skip INDEL calling if the average per-sample depth is above INT. [250]
-o INT Phred-scaled gap open sequencing error probability. Reducing INT leads to more indel calls. [40]
-P STR Comma dilimited list of platforms (determined by @RG-PL) from which indel candidates are obtained. It is recommended to collect indel candidates from sequencing technologies that have low indel error rate such as ILLUMINA. [all]
11. 使用bcftoolsbcftools和samtools類似,用于處理vcf(variant call format)文件和bcf(binary call format)文件。前者為文本文件,后者為其二進(jìn)制文件。 bcftools使用簡(jiǎn)單,最主要的命令是view命令,其次還有index和cat等命令。index和cat命令和samtools中類似。此處主講使用view命令來(lái)進(jìn)行SNP和Indel calling。該命令的使用方法和例子為: $ bcftools view [-AbFGNQSucgv] [-D seqDict] [-l listLoci] [-s listSample]
[-i gapSNPratio] [-t mutRate] [-p varThres] [-P prior]
[-1 nGroup1] [-d minFrac] [-U nPerm] [-X permThres]
[-T trioType] in.bcf [region]
$ bcftools view -cvNg abc.bcf > snp_indel.vcf
生成的結(jié)果文件為vcf格式,有10列,分別是:1 參考序列名;2 varianti所在的left-most位置;3 variant的ID(默認(rèn)未設(shè)置,用’.'表示);4 參考序列的allele;5 variant的allele(有多個(gè)alleles,則用’,'分隔);6 variant/reference QUALity;7 FILTers applied;8 variant的信息,使用分號(hào)隔開;9 FORMAT of the genotype fields, separated by colon (optional); 10 SAMPLE genotypes and per-sample information (optional)。 scaffold_1 2847 . A AACGGTGAAG 194 . INDEL;DP=11;VDB=0.0401;AF1=1;AC1=2;DP4=0,0,8,3;MQ=35;FQ=-67.5 GT:PL:GQ 1/1:235,33,0:63 scaffold_1 3908 . G A 111 . DP=13;VDB=0.0085;AF1=1;AC1=2;DP4=0,0,5,7;MQ=42;FQ=-63 GT:PL:GQ 1/1:144,36,0:69 scaffold_1 4500 . A G 31.5 . DP=8;VDB=0.0034;AF1=1;AC1=2;DP4=0,0,1,3;MQ=42;FQ=-39 GT:PL:GQ 1/1:64,12,0:21 scaffold_1 4581 . TGGNGG TGG 145 . INDEL;DP=8;VDB=0.0308;AF1=1;AC1=2;DP4=0,0,0,8;MQ=42;FQ=-58.5 GT:PL:GQ 1/1:186,24,0:45 scaffold_1 4644 . G A 195 . DP=21;VDB=0.0198;AF1=1;AC1=2;DP4=0,0,10,10;MQ=42;FQ=-87 GT:PL:GQ 1/1:228,60,0:99 scaffold_1 4827 . NACAAAGA NA 4.42 . INDEL;DP=1;AF1=1;AC1=2;DP4=0,0,1,0;MQ=40;FQ=-37.5 GT:PL:GQ 0/1:40,3,0:3 scaffold_1 4854 . A G 48 . DP=6;VDB=0.0085;AF1=1;AC1=2;DP4=0,0,2,1;MQ=41;FQ=-36 GT:PL:GQ 1/1:80,9,0:16 scaffold_1 5120 . A G 85 . DP=8;VDB=0.0355;AF1=1;AC1=2;DP4=0,0,5,3;MQ=42;FQ=-51 GT:PL:GQ 1/1:118,24,0:45 第8列中顯示了對(duì)variants的信息描述,比較重要,其中的 Tag 的描述如下: Tag Format Description AF1 double Max-likelihood estimate of the site allele frequency (AF) of the first ALT allele DP int Raw read depth (without quality filtering) DP4 int[4] # high-quality reference forward bases, ref reverse, alternate for and alt rev bases FQ int Consensus quality. Positive: sample genotypes different; negative: otherwise MQ int Root-Mean-Square mapping quality of covering reads PC2 int[2] Phred probability of AF in group1 samples being larger (,smaller) than in group2 PCHI2 double Posterior weighted chi^2 P-value between group1 and group2 samples PV4 double[4] P-value for strand bias, baseQ bias, mapQ bias and tail distance bias QCHI2 int Phred-scaled PCHI2 RP int # permutations yielding a smaller PCHI2 CLR int Phred log ratio of genotype likelihoods with and without the trio/pair constraint UGT string Most probable genotype configuration without the trio constraint CGT string Most probable configuration with the trio constraint bcftools view 的具體參數(shù)如下: Input/Output Options:
-A Retain all possible alternate alleles at variant sites. By default, the view command discards unlikely alleles.
-b Output in the BCF format. The default is VCF.
-D FILE Sequence dictionary (list of chromosome names) for VCF->BCF conversion [null]
-F Indicate PL is generated by r921 or before (ordering is different).
-G Suppress all individual genotype information.
-l FILE List of sites at which information are outputted [all sites]
-N Skip sites where the REF field is not A/C/G/T
-Q Output the QCALL likelihood format
-s FILE List of samples to use. The first column in the input gives the sample names and the second gives the ploidy, which can only be 1 or 2. When the 2nd column is absent, the sample ploidy is assumed to be 2. In the output, the ordering of samples will be identical to the one in FILE. [null]
-S The input is VCF instead of BCF.
-u Uncompressed BCF output (force -b).
Consensus/Variant Calling Options:
-c Call variants using Bayesian inference. This option automatically invokes option -e.
-d FLOAT When -v is in use, skip loci where the fraction of samples covered by reads is below FLOAT. [0]
當(dāng)有多個(gè)sample用于variants calling時(shí),比如多個(gè)轉(zhuǎn)錄組數(shù)據(jù)或多個(gè)重測(cè)序
數(shù)據(jù)需要比對(duì)到參考基因組上,設(shè)置該值,表明至少有該<float 0~1>比例的
samples在該位點(diǎn)都有覆蓋才計(jì)算入variant.所以對(duì)于只有一個(gè)sample的情況
下,該值設(shè)置在0~1之間沒有意義,大于1則得不到任何結(jié)果。
-e Perform max-likelihood inference only, including estimating the site allele frequency, testing Hardy-Weinberg equlibrium and testing associations with LRT.
-g Call per-sample genotypes at variant sites (force -c)
-i FLOAT Ratio of INDEL-to-SNP mutation rate [0.15]
-p FLOAT A site is considered to be a variant if P(ref|D)
-t FLOAT Scaled muttion rate for variant calling [0.001]
-T STR Enable pair/trio calling. For trio calling, option -s is usually needed to be applied to configure the trio members and their ordering. In the file supplied to the option -s, the first sample must be the child, the second the father and the third the mother. The valid values of STR are ‘pair’, ‘trioauto’, ‘trioxd’ and ‘trioxs’, where ‘pair’ calls differences between two input samples, and ‘trioxd’ (‘trioxs’) specifies that the input is from the X chromosome non-PAR regions and the child is a female (male). [null]
-v Output variant sites only (force -c)
Contrast Calling and Association Test Options:
-1 INT Number of group-1 samples. This option is used for dividing the samples into two groups for contrast SNP calling or association test. When this option is in use, the following VCF INFO will be outputted: PC2, PCHI2 and QCHI2. [0]
-U INT Number of permutations for association test (effective only with -1) [0]
-X FLOAT Only perform permutations for P(chi^2)
使用bcftools得到variant calling結(jié)果后。需要對(duì)結(jié)果再次進(jìn)行過(guò)濾。主要依據(jù)比對(duì)結(jié)果中第8列信息。其中的 DP4 一行尤為重要,提供了4個(gè)數(shù)據(jù):1 比對(duì)結(jié)果和正鏈一致的reads數(shù)、2 比對(duì)結(jié)果和負(fù)鏈一致的reads數(shù)、3 比對(duì)結(jié)果在正鏈的variant上的reads數(shù)、4 比對(duì)結(jié)果在負(fù)鏈的variant上的reads數(shù)??梢栽O(shè)定 (value3 + value4)大于某一閾值,才算是variant。比如: $ perl -ne 'print $_ if /DP4=(\d+),(\d+),(\d+),(\d+)/ && ($3+$4)>=10 && ($3+$4)/($1+$2+$3+$4)>=0.8' snp_indel.vcf > snp_indel.final.vcf 12. samtools rmdupNGS上機(jī)測(cè)序前需要進(jìn)行PCR一步,使一個(gè)模板擴(kuò)增出一簇,從而在上機(jī)測(cè)序的時(shí)候表現(xiàn)出為1個(gè)點(diǎn),即一個(gè)reads。若一個(gè)模板擴(kuò)增出了多簇,結(jié) 果得到了多個(gè)reads,這些reads的坐標(biāo)(coordinates)是相近的。在進(jìn)行了reads比對(duì)后需要將這些由PCR duplicates獲得的reads去掉,并只保留最高比對(duì)質(zhì)量的read。使用rmdup命令即可完成. Usage: samtools rmdup [-sS] -s 對(duì)single-end reads。默認(rèn)情況下,只對(duì)paired-end reads -S 將Paired-end reads作為single-end reads處理。 $ samtools input.sorted.bam output.bam |
|
|
來(lái)自: 我的生信收藏 > 《轉(zhuǎn)錄組分析軟件》