1. Reading
pybedtools reads every supported format through the BedTool class: besides BED, GFF and GTF files, it can also read gzip-compressed (.gz) files directly.
from pybedtools import BedTool
snps = BedTool('snps.bed.gz')  # gzip-compressed BED file
genes = BedTool('hg19.gff')    # GFF annotation file
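BedTool opens gzip-compressed files such as snps.bed.gz without any extra arguments. The sketch below shows what that involves at the file level. It uses only the standard library and is not pybedtools' own reader. The helper names `is_gzip` and `read_bed_intervals` are hypothetical. The sketch detects compression from the two-byte gzip magic number rather than trusting the extension, and it assumes a header-free BED file.

```python
import gzip
import os
import tempfile
from typing import Iterator, Tuple

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def is_gzip(path: str) -> bool:
    """Return True if the file starts with the gzip magic number."""
    with open(path, "rb") as fh:
        return fh.read(2) == GZIP_MAGIC


def read_bed_intervals(path: str) -> Iterator[Tuple[str, int, int]]:
    """Yield (chrom, start, end) from a plain or gzipped, header-free BED file.

    Hypothetical helper for illustration only; BedTool(path) handles this
    (and much more) for you.
    """
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rt") as fh:
        for line in fh:
            if not line.strip():
                continue  # ignore blank lines
            chrom, start, end = line.split("\t")[:3]
            # int() tolerates the trailing newline on a 3-column line
            yield chrom, int(start), int(end)


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "snps.bed.gz")
    with gzip.open(path, "wt") as out:
        out.write("chr1\t100\t101\trs1\nchr2\t200\t201\trs2\n")
    print(list(read_bed_intervals(path)))
    # [('chr1', 100, 101), ('chr2', 200, 201)]
```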
pybedtools.bedtool.BedTool
class pybedtools.bedtool.BedTool(fn=None, from_string=False, remote=False)
Wrapper around Aaron Quinlan's BEDtools suite of programs (https://github.com/arq5x/bedtools); also contains many useful methods for more detailed work with BED files.
fn is typically the name of a BED-like file, but can also be one of the following:
a string filename
another BedTool object
an iterable of Interval objects
an open file object
a "file contents" string (see below)
If from_string is True, then you can pass a string that contains the contents of the BedTool you want to create. This will treat all spaces as TABs and write to a temporary file, treating whatever you pass as fn as the contents of the BED file. This also strips empty lines.
Typical usage is to point to an existing file:
a=BedTool('a.bed')
But you can also create one from scratch from a string:
>>> s = '''
... chrX  1  100
... chrX  25  800
... '''
>>> a = BedTool(s, from_string=True)
Or use examples that come with pybedtools:
>>> example_files = pybedtools.list_example_files()
>>> assert 'a.bed' in example_files
>>> a = pybedtools.example_bedtool('a.bed')
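The docstring describes three things that from_string=True does. Runs of spaces become TABs, empty lines are stripped, and the text is written to a temporary file. That behaviour can be sketched with the standard library. The helper name `bed_from_string` is hypothetical, and this is an illustration of the documented behaviour, not pybedtools' source.

```python
import os
import tempfile


def bed_from_string(contents: str) -> str:
    """Write BED-like text to a temp file and return its path.

    Hypothetical helper mimicking the documented from_string=True behaviour:
    whitespace between fields becomes single TABs and empty lines are dropped.
    """
    fd, path = tempfile.mkstemp(suffix=".bed")
    with os.fdopen(fd, "w") as out:
        for line in contents.splitlines():
            if not line.strip():
                continue  # strip empty (or whitespace-only) lines
            # str.split() with no argument splits on any run of whitespace
            out.write("\t".join(line.split()) + "\n")
    return path


if __name__ == "__main__":
    s = '''
    chrX  1  100
    chrX 25  800
    '''
    with open(bed_from_string(s)) as fh:
        print(repr(fh.read()))  # 'chrX\t1\t100\nchrX\t25\t800\n'
```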
2. Writing
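If you want to save the result under a filename you can refer to later, use the BedTool.saveas() method, which makes a copy of the file the BedTool points to. The same method also lets you optionally add a track line for uploading directly to the UCSC Genome Browser, instead of opening the file afterwards and adding the track line by hand.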
>>> c=a_with_b.saveas('intersection-of-a-and-b.bed',trackline='track name="a and b"')
>>> print(c.fn)
intersection-of-a-and-b.bed
>>> # opening the underlying file shows the track line
>>> print(open(c.fn).read())
track name="a and b"
chr1	155	200	feature2	0	+
chr1	155	200	feature3	0	-
chr1	900	901	feature4	0	+
>>> # printing file-based BedTool objects will not print the track line
>>> print(c)
chr1	155	200	feature2	0	+
chr1	155	200	feature3	0	-
chr1	900	901	feature4	0	+
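The track line is stored in the file on disk but is not a feature, which is why print(c) omits it. The sketch below is a rough plain-Python illustration of that distinction. It assumes that feature iteration simply skips UCSC `track`/`browser` lines, `#` comments and blank lines. The helper `feature_lines` is hypothetical and is not pybedtools code.

```python
import os
import tempfile
from typing import List


def feature_lines(path: str) -> List[str]:
    """Return only the data lines of a BED file.

    Hypothetical helper: skips UCSC "track"/"browser" lines, "#" comments
    and blank lines, keeping each remaining line (newline included).
    """
    with open(path) as fh:
        return [
            line for line in fh
            if line.strip() and not line.startswith(("track", "browser", "#"))
        ]


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "intersection-of-a-and-b.bed")
    with open(path, "w") as out:
        out.write('track name="a and b"\n')
        out.write("chr1\t155\t200\tfeature2\t0\t+\n")
        out.write("chr1\t900\t901\tfeature4\t0\t+\n")
    with open(path) as fh:
        print(fh.read(), end="")                # raw file, track line included
    print("".join(feature_lines(path)), end="")  # features only
```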
Note that BedTool.saveas() returns a new BedTool object that points to the newly created file on disk. This lets you insert a saveas() call in the middle of a chain of commands.
BedTool.saveas(*args, **kwargs)
Make a copy of the BedTool.
Optionally adds?trackline?to the beginning of the file.
Optionally compresses output using gzip.
If the filename extension is .gz, or compressed=True, the output is compressed using gzip.
Returns a new BedTool for the newly saved file.
A newline is automatically added to the trackline if it does not already have one.
Example usage:
>>> import pybedtools
>>> a = pybedtools.example_bedtool('a.bed')
>>> b = a.saveas('other.bed')
>>> b.fn
'other.bed'
>>> print(b == a)
True
>>> import os
>>> b = a.saveas('other.bed', trackline="name='test run' color=0,55,0")
>>> open(b.fn).readline()
"name='test run' color=0,55,0\n"
>>> if os.path.exists('other.bed'):
...     os.unlink('other.bed')
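The documented saveas() rules are:

- copy the file, leaving the original in place
- optionally write a track line first, appending a newline if it lacks one
- gzip the output when the name ends in .gz or compressed=True
- return a handle to the new file

The sketch below mirrors those rules using only the standard library. `save_as` is a hypothetical stand-in that returns a path rather than a BedTool. It is not pybedtools' implementation.

```python
import gzip
import os
import shutil
import tempfile
from typing import Optional


def save_as(src: str, dst: str, trackline: Optional[str] = None,
            compressed: Optional[bool] = None) -> str:
    """Copy a BED file to dst following the documented saveas() rules.

    Hypothetical helper: src is left untouched, an optional trackline is
    written first (newline added if missing), and the output is gzipped when
    dst ends in .gz or compressed=True. Returns the new path.
    """
    if compressed is None:
        compressed = dst.endswith(".gz")
    opener = gzip.open if compressed else open
    with open(src) as fin, opener(dst, "wt") as fout:
        if trackline is not None:
            if not trackline.endswith("\n"):
                trackline += "\n"
            fout.write(trackline)
        shutil.copyfileobj(fin, fout)  # copy the features after the header
    return dst


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "a.bed")
    with open(src, "w") as out:
        out.write("chr1\t1\t100\tfeature1\t0\t+\n")
    b = save_as(src, os.path.join(tmp, "other.bed"), trackline="track name='test run'")
    with open(b) as fh:
        print(fh.readline(), end="")  # track name='test run'
    print(os.path.exists(src))        # True: a copy was made
```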
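Alternatively, if you don't need a track line, you can use BedTool.moveto(), which can be much faster and is better suited to large files. It renames (moves) the file instead of copying it. This means any later attempt to use the original file will fail, because that file no longer exists.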
Use with care:
>>> d=a_with_b.moveto('another_location.bed')
BedTool.moveto(*args, **kwargs)
Move to a new filename (can be much quicker than BedTool.saveas())
Move BED file to new filename,?fn.
Returns a new BedTool for the new file.
Example usage:
>>> # make a copy so we don't mess up the example file
>>> a = pybedtools.example_bedtool('a.bed').saveas()
>>> a_contents = str(a)
>>> b = a.moveto('other.bed')
>>> b.fn
'other.bed'
>>> b == a_contents
True
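The standard library alone can show the difference between copying and renaming. shutil.move performs a rename when source and destination are on the same filesystem, so no data is rewritten, but the original path disappears. The helper `move_to` is hypothetical. It only illustrates the documented moveto() semantics and is not pybedtools' code.

```python
import os
import shutil
import tempfile


def move_to(src: str, dst: str) -> str:
    """Move (rename) src to dst and return the new path.

    Hypothetical helper: on the same filesystem shutil.move is a rename,
    so it is cheap even for large files, but src no longer exists afterwards.
    """
    shutil.move(src, dst)
    return dst


if __name__ == "__main__":
    tmp = tempfile.mkdtemp()
    src = os.path.join(tmp, "a_with_b.bed")
    with open(src, "w") as out:
        out.write("chr1\t155\t200\tfeature2\t0\t+\n")
    dst = move_to(src, os.path.join(tmp, "another_location.bed"))
    print(os.path.exists(dst), os.path.exists(src))  # True False
```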