Bin-assembled Escherichia coli genomes from a study in Punjab, Pakistan

Restricted Availability

Date

2024-06-25

Publication Type

dataset

Description

Bin-assembled Escherichia coli genomes from Punjab, Pakistan

These assemblies are part of a cross-sectional study conducted in Punjab, Pakistan, which investigated E. coli colonisation diversity in healthy carriage using CLED enrichment plates.

About

Version history

v0.1.1 (current version): Added reference to the study.
v0.1.0: Added brief description with a few missing parts.

Distribution

If you use these assemblies in your study, please cite the source as appropriate. These assemblies are made available under a CC-BY 4.0 license.

Citation

Khawaja, T., Mäklin, T., Kallonen, T. et al. Deep sequencing of Escherichia coli exposes colonisation diversity and impact of antibiotics in Punjab, Pakistan. Nature Communications 15, 5196 (2024). https://doi.org/10.1038/s41467-024-49591-5

Methods briefly

Species identification

Sequencing data from the ENA project PRJEB36642 were error-corrected with fastp and pseudoaligned with Themisto against a species-level index (available from https://doi.org/10.5281/zenodo.6656881). Reads were assigned to species using the mSWEEP/mGEMS pipeline as described in https://www.nature.com/articles/s41467-022-35178-5.

Lineage identification

Reads from the species-level bins were pseudoaligned with Themisto a second time, now against an E. coli index (will be made available in a later version). Lineage-level assignment was performed using mSWEEP and mGEMS at the level of PopPUNK sequence clusters. The resulting bins were screened with demix_check, and bins that received a score of 1 or 2 were kept. Data in the kept bins were assembled with shovill, and the bin-assembled genomes (BAGs) were quality controlled with checkm, requiring >= 90% completeness and <= 10% contamination. Finally, BAGs shorter than 4 Mb or longer than 6 Mb were removed.

Contact

Tommi Mäklin <tommi'at'maklin.fi>.
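The filtering criteria listed under "Lineage identification" can be summarised as a single predicate. The following is a minimal sketch, not the code used in the study: the `Bag` record and its field names are hypothetical, 1 Mb is assumed to mean 1,000,000 bp, and the demix_check screen (applied to bins before assembly in the actual workflow) is folded into the same check for brevity.

```python
from dataclasses import dataclass


@dataclass
class Bag:
    """Hypothetical summary record for one bin-assembled genome (BAG)."""
    name: str
    demix_score: int      # demix_check score of the bin the BAG was assembled from
    completeness: float   # checkm completeness, in percent
    contamination: float  # checkm contamination, in percent
    length_bp: int        # total assembly length, in base pairs


# Assumption: 1 Mb is taken as 1,000,000 bp.
MIN_LENGTH_BP = 4_000_000
MAX_LENGTH_BP = 6_000_000


def passes_filters(bag: Bag) -> bool:
    """Return True if the BAG meets every threshold stated in the description."""
    return (
        bag.demix_score in (1, 2)                          # bin kept by demix_check
        and bag.completeness >= 90.0                       # checkm completeness
        and bag.contamination <= 10.0                      # checkm contamination
        and MIN_LENGTH_BP <= bag.length_bp <= MAX_LENGTH_BP  # 4 Mb to 6 Mb
    )


def filter_bags(bags: list[Bag]) -> list[Bag]:
    """Keep only the BAGs that pass all filters, preserving input order."""
    return [bag for bag in bags if passes_filters(bag)]
```

The thresholds are treated as inclusive, matching the ">=" and "<=" wording for checkm and the "shorter than 4 Mb or longer than 6 Mb" wording for assembly length.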
