Escherichia coli 13k assemblies PopPUNK database
dc.contributor.affiliation | Department of Computer Science, University of Helsinki, Finland-Mäklin, Tommi | |
dc.contributor.author | Mäklin, Tommi | |
dc.date.accessioned | 2025-04-29T14:02:46Z | |
dc.date.issued | 2022-03-01 | |
dc.description | # _Escherichia coli_ 13k reference
## PopPUNK database files
## v1.0.0 (1 March 2022)

### Description
This tarball contains the PopPUNK v2.4.0 [1] database files for a clustering of the 13435 _E. coli_ assemblies from three studies [2-4]. A file matching the clustering with the multilocus sequence types [5] (identified using mlst v2.19.0 [6]) is provided in `ecoli_sequence_information.tsv`. The corresponding assemblies and a Themisto v2.1.0 [7] pseudoalignment index are also available as separate uploads on Zenodo.

__Note:__ the `esc_ra9772aa_as` entry in the PopPUNK files has no corresponding assembly and is not included in the Themisto pseudoalignment index or in the `ecoli_sequence_information.tsv` file. This is because the entry for this sequence was corrupted in the original run of PopPUNK.

### Files
- `pop_db`: the PopPUNK sketch files.
- `pop_fit_dbscan`: the initial DBSCAN fit for the sketch.
- `pop_fit_refined`: the final refined version of the DBSCAN fit.
- `pop_fit_refined_viz`: microreact visualisation files from the refined fit.
- `ecoli_sequence_information.tsv`: a tab-separated text file containing the MLST types and the PopPUNK clusters.

### References
- [1] Lees J et al., _Fast and flexible bacterial genomic epidemiology with PopPUNK._ https://doi.org/10.1101/gr.241455.118
- [2] Horesh G et al., _A comprehensive and high-quality collection of Escherichia coli genomes and their genes._ https://doi.org/10.1099/mgen.0.000499
- [3] Gladstone R et al., _Emergence and dissemination of antimicrobial resistance in Escherichia coli causing bloodstream infections in Norway in 2002–17: a nationwide, longitudinal, microbial population genomic study._ https://doi.org/10.1016/S2666-5247(21)00031-8
- [4] Shao Y et al., _Stunted microbiota and opportunistic pathogen colonization in caesarean-section birth._ https://doi.org/10.1038/s41586-019-1560-1
- [5] Jolley K et al., _Open-access bacterial population genomics: BIGSdb software, the PubMLST.org website and their applications._ https://doi.org/10.12688/wellcomeopenres.14826.1
- [6] Seemann T, _mlst._ GitHub. https://github.com/tseemann/mlst
- [7] Mäklin T et al., _Bacterial genomic epidemiology with mixed samples._ https://doi.org/10.1099/mgen.0.000691 | |
dc.identifier | https://doi.org/10.5281/zenodo.6320571 | |
dc.identifier.uri | https://datakatalogi.helsinki.fi/handle/123456789/5356 | |
dc.rights.license | cc-by-4.0 | |
dc.subject | escherichia coli | |
dc.subject | poppunk | |
dc.title | Escherichia coli 13k assemblies PopPUNK database | |
dc.type | dataset |
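
The description says `ecoli_sequence_information.tsv` pairs each assembly's MLST type with its PopPUNK cluster. A minimal sketch of how such a table could be summarised follows. The record does not list the file's column names, so the headers `id`, `mlst` and `poppunk_cluster` and the three sample rows are hypothetical and would need to be replaced with the real header names.

```python
import csv
import io
from collections import defaultdict


def clusters_by_sequence_type(tsv_text, st_column, cluster_column):
    """Map each sequence type to the set of PopPUNK clusters it appears in.

    The column names are passed in because the actual headers of
    ecoli_sequence_information.tsv are not stated in this record.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    mapping = defaultdict(set)
    for row in reader:
        mapping[row[st_column]].add(row[cluster_column])
    return dict(mapping)


# Made-up rows with hypothetical header names, for illustration only.
example = (
    "id\tmlst\tpoppunk_cluster\n"
    "sample_a\t131\t1\n"
    "sample_b\t131\t1\n"
    "sample_c\t73\t2\n"
)

result = clusters_by_sequence_type(example, "mlst", "poppunk_cluster")
for sequence_type, clusters in sorted(result.items()):
    print(sequence_type, sorted(clusters))
```

To use this on the real file, read its contents with `open(path, encoding="utf-8").read()` and pass the actual header names. A sequence type that maps to more than one cluster marks a place where the two typing schemes disagree.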