BBS phase 1 & phase 2 high quality E. coli bin assembled genomes

Restricted Availability

Date

2024-11-07

Persistent identifier of the Data Catalogue metadata

Creator/contributor

Editor

Journal title

Journal volume

Publisher

Publication Type

dataset

Peer Review Status

Repositories

Access rights

ISBN

ISSN

Description

1,402 Escherichia coli bin assembled genomes (BAGs) derived from the metagenome data collected as part of the BabyBiome study (BBS) phase 1 & phase 2. The data in this upload was first published as part of "Group 2 and 3 ABC-transporter dependant K-antigen loci contribute significantly to variation in the invasive potential of Escherichia coli" (Gladstone et al. 2024, to be released).

Files

Assembly data:
BBS_E_coli_BAGs.tar: Archive containing sequences of the 1,402 bin assembled genomes.
BBS_E_coli_metadata.tsv: Table linking the sequence assemblies to the subject data.

Capsule predictions:
BBS_E_coli_Kaptive_output.csv: Capsule predictions for all sequence data.
BBS_E_coli_deduplicated_sequences_IDs.txt: Filenames for assemblies that constitute the 873 deduplicated sequences analysed in Gladstone et al. 2024.

Quality control data:
BBS_E_coli_demix_check_scores.tsv: Output from demix_check for the sequence assemblies.
BBS_E_coli_checkm_results.tsv: Output from checkm.
BBS_E_coli_gunc_results.tsv: Output from gunc.

Methods

Bin assembled genomes
Source data:
BBS phase 1: Shao et al. 2019
BBS phase 2: Shao et al. 2024
The data was produced using the mSWEEP and mGEMS pipeline (Mäklin et al. 2020 & Mäklin et al. 2021) following the steps described in Khawaja, Mäklin, Kallonen, et al. 2024.

Quality control
The BAGs in this upload were filtered with demix_check (https://github.com/harry-thorpe/demix_check) and only those with a quality score of 1 or 2 are included. For the capsule type annotations, contigs shorter than 5,000 bp were removed (but the short contigs are still present in the uploaded files). Further QC data is available from the checkm (Parks et al. 2015) and gunc (Orakov et al. 2022) results.

Multilocus sequence typing
Sequence type (ST) was determined using fastmlst (Guerrero-Araya et al. 2021) with the `ecoli#1` database.

PopPUNK clustering
Sequence clusters (SC) correspond to the database available from https://zenodo.org/records/12528310 and were created using PopPUNK (Lees et al. 2019). Construction is described in Khawaja, Mäklin, Kallonen, et al. 2024.

Capsule type annotations
The capsule type annotations were created using Kaptive (Lam et al. 2022) with an E. coli specific database available from https://github.com/rgladstone/EC-K-typing and described in Gladstone et al. 2024.
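Because the uploaded assemblies still contain contigs shorter than 5,000 bp, anyone wanting to reproduce the input used for the capsule type annotations would need to apply that length filter themselves. The following is a minimal Python sketch of such a filter, not the authors' own script. It assumes the assemblies are plain-text FASTA files; the function names and the file name in the usage note are hypothetical, and only the 5,000 bp threshold comes from the description above.

```python
from typing import Iterable, Iterator, List, Tuple

# Threshold stated in the dataset description: contigs shorter than this were removed.
MIN_CONTIG_LENGTH = 5000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_contigs(
    records: Iterable[Tuple[str, str]],
    min_length: int = MIN_CONTIG_LENGTH,
) -> List[Tuple[str, str]]:
    """Keep only contigs that are at least min_length bp long."""
    return [(header, seq) for header, seq in records if len(seq) >= min_length]


def format_fasta(records: Iterable[Tuple[str, str]], width: int = 80) -> str:
    """Render records as FASTA text with wrapped sequence lines."""
    out: List[str] = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""
```

For a single assembly extracted from BBS_E_coli_BAGs.tar (here a hypothetical file `example_BAG.fasta`), the filter could be applied with `with open("example_BAG.fasta") as handle: kept = filter_contigs(read_fasta(handle))`, and the result written back out with `format_fasta(kept)`.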

Keyword (yso)

Publication Series

Location of the original dataset