We are pleased to announce the availability of Genomedata 1.4.0!
This is the first major release in a number of years, and it brings a few notable enhancements.
Writing to archives is now possible. Write operations are significantly slower than read operations, which are what Genomedata was originally designed for. Writes to existing archives should only be done on a small scale, since each write can take on the order of seconds. Genome-wide alteration of archives with writes is not recommended. If you were planning a genome-wide alteration, create a new archive instead.
There is now support for masking data as it is loaded. Any data that overlaps the mask, supplied as a BED file, is discarded. This is a convenience step for users and requires bedtools to be installed.
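To make the masking behaviour concrete, here is a minimal Python sketch of the idea. It is an illustration only, not Genomedata's actual implementation (which relies on bedtools). It assumes the mask is a list of BED-style intervals (0-based start, exclusive end) on a single chromosome, and it assumes masked positions are marked as missing with NaN. The function name `apply_mask` and the sample values are hypothetical.

```python
import math


def apply_mask(values, mask_intervals):
    """Return a copy of values with masked positions set to NaN.

    mask_intervals holds (start, end) pairs in BED convention:
    0-based start, exclusive end. Hypothetical helper for illustration.
    """
    masked = list(values)
    for start, end in mask_intervals:
        # Clamp each interval to the bounds of the data
        lo = max(start, 0)
        hi = min(end, len(masked))
        for pos in range(lo, hi):
            masked[pos] = math.nan
    return masked


# Made-up signal for one short stretch of a chromosome
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
result = apply_mask(signal, [(1, 3), (5, 10)])
print(result)
```

Positions 1, 2, and 5 fall inside the mask and become NaN; the second interval extends past the end of the data and is simply clamped.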
AGP support has been fixed and improved, allowing a more accurate representation of the underlying assembly.
Metadata in Genomedata archives describing where data is considered present across contigs has been improved by trimming telomeric regions and large empty gaps between contigs. Segway automatically takes advantage of this metadata. Performance in Segway using these newer archives will likely improve, since it will consider fewer large regions where data is not present.
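The effect of the improved metadata can be pictured with a small Python sketch. This is an illustration under stated assumptions, not the algorithm used by genomedata-close-data: missing data is represented as NaN, and the function name `present_regions` and the `max_gap` threshold are hypothetical.

```python
import math


def present_regions(values, max_gap):
    """Return (start, end) half-open regions that contain data.

    Leading and trailing runs of NaN are dropped, and any internal run of
    more than max_gap NaN positions splits a region in two.
    """
    regions = []
    start = None      # start of the region being built
    last_seen = None  # index of the most recent non-NaN value
    for pos, value in enumerate(values):
        if math.isnan(value):
            continue
        if start is None:
            start = pos
        elif pos - last_seen - 1 > max_gap:
            # Gap too large: close the current region and open a new one
            regions.append((start, last_seen + 1))
            start = pos
        last_seen = pos
    if start is not None:
        regions.append((start, last_seen + 1))
    return regions


nan = math.nan
# Made-up track: empty ends, one small gap, one large gap
track = [nan, nan, 1.0, 2.0, nan, 3.0, nan, nan, nan, nan, 4.0, nan]
regions = present_regions(track, max_gap=2)
print(regions)
```

A downstream tool that iterates only over the returned regions skips the empty ends and the large internal gap, which is the kind of saving described above.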
Please let me know if you have any comments on Genomedata, its documentation, web site, installation, or anything else. (The preferred place to make these comments is the genomedata-users mailing list or the issue tracker, both linked from the main web site.)
Here is a full list of changes made since the last announcement here:
* Genome: add ability to open archives for writing
* genomedata-close-data: chunk metadata now truncates telomeres and trims large
  gaps between supercontigs
* genomedata-load-data: new option for masking data with --maskfile
* genomedata-hardmask: new command added to filter out track regions
* hardmask_data: new python interface to filter out track regions
* genomedata-load-seq: AGP are now correctly loaded regardless of filename and
  may be concatenated together
* genomedata-load-seq: fix assertion failure on argument parsing when loading
  fasta sequence (thanks to Kate Cook)
* genomedata-load: fix agp files not being recognized from this entry point
* docs: clarified that agp files cannot be combined
* docs: warned users that globs must be quoted to be parsed by genomedata-load