Searching for just a few words should be enough to get started. If you need to make more complex queries, use the tips below to guide you.
Article type: Research Article
Authors: Lomas-Barrie, Victora; 1 | Reyes-Camacho, Michelleb | Neme, Antonioa; *
Affiliations: [a] Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, México | [b] Postgraduation Program in Computer Science, Universidad Nacional Autónoma de México, México
Correspondence: [*] Corresponding author. Antonio Neme, Instituto de Investigaciones en Matemáticas Aplicadas y en Sistemas, Universidad Nacional Autónoma de México, México. E-mail: [email protected]; ORCID: 0000-0002-9952-1943.
Note: [1] ORCID: 0000-0003-2281-0891. All authors contributed equally.
Abstract: Horizontal gene transference is a biological process that involves the donation of DNA or RNA from an organism to a second, unrelated organism. This process is different from the more common one, vertical transference, which is present whenever an organism or pair of organisms reproduce and transmit their genetic material to the descendants. The identification of segments of genetic material that are the result of horizontal transference is relevant to construct accurate phylogenetic trees, on one hand, and to detect possible drug-resistance mechanisms, on the other, since this movement of genetic material is the main cause behind antibiotic resistance in bacteria. Here, we describe a novel algorithm able to detect sequences of foreign origin, and thus, possible acquired via horizontal transference. The general idea of our method is that within the genome of an organism, there might be sequences that are different from the vast majority of the remaining sequences from the same organism. The former are candidate anomalies, and thus, their origin may be explained by horizontal transference. This approach is equivalent to a particular instance of the authorship attribution problem, that in which from a set of texts or paragraphs, almost all of them were written by the same author, whereas a minority has a different authorship. The constraint is that the author of each text is not known, so the algorithm has to attribute the authorship of each one of the texts. The texts detected to be written by a different author are the equivalent of the sequences of foreign origin for the case of genetic material. We describe here a novel method to detect anomalous sequences, based on interpretable embeddings derived from a common attention mechanism in humans, that of identifying novel tokens within a given sequence. Our proposal achieves novel and consistent results over the genome of a well known organism.
Keywords: Horizontal gene transference, anomaly detection, embeddings, natural language processing, genomics
DOI: 10.3233/JIFS-219337
Journal: Journal of Intelligent & Fuzzy Systems, vol. Pre-press, no. Pre-press, pp. 1-12, 2024
IOS Press, Inc.
6751 Tepper Drive
Clifton, VA 20124
USA
Tel: +1 703 830 6300
Fax: +1 703 830 2300
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
IOS Press
Nieuwe Hemweg 6B
1013 BG Amsterdam
The Netherlands
Tel: +31 20 688 3355
Fax: +31 20 687 0091
[email protected]
For editorial issues, permissions, book requests, submissions and proceedings, contact the Amsterdam office [email protected]
Inspirees International (China Office)
Ciyunsi Beili 207(CapitaLand), Bld 1, 7-901
100025, Beijing
China
Free service line: 400 661 8717
Fax: +86 10 8446 7947
[email protected]
For editorial issues, like the status of your submitted paper or proposals, write to [email protected]
如果您在出版方面需要帮助或有任何建, 件至: [email protected]