IV106 Bioinformatics seminar

2023-02-07

There is no specific theme for this semester, so anything that falls within the broad definition of bioinformatics is welcome. We will be sharing several guests with a “sister seminar” at CEITEC, described here:

The Bioinformatics Seminar is a seminar series organized together with the Faculty of Informatics in the interdisciplinary fields of Bioinformatics, Biostatistics, and Computational Biology. The series aims to invite exciting speakers working on current, state-of-the-art bioinformatics problems. The goal is to broaden bioinformatics knowledge among scientists and students and to promote connections and interactions between bioinformaticians at MU. The seminars take place every other Wednesday from 16:00, alternating between CEITEC MU (E35/211) and FI MU (A319), and are also streamed online.

Link to CEITEC Bioinformatics Seminar webpage

Please join us for the talks below, either in person or via MS Teams (click on the title). All times are Central European (Prague, Vienna, Budapest). Please note that not all seminars will be held jointly. Contact lexa @ fi.muni.cz or vojtech.bystry @ ceitec.muni.cz for additional information.

15.2. 4 PM

FI MU
Introduction to the seminar
Students Only

22.2. 4 PM

Invited Talk
Matej Trojak, Sybila, FI MU
In systems biology, models play a crucial role in understanding the systems under study. Among the many modelling approaches, the rule-based framework provides a way of describing systems in a concise and understandable form. In this talk, the speaker will explain the key principles employed by the rule-based approach. These principles will be demonstrated using the Biochemical Space Language (BCSL), a representative of languages with rule-based features. The talk will also include a brief demonstration of eBCSgen, a software tool integrated into the Galaxy framework that supports modelling and analysis with BCSL.
Biochemical Space Language: a rule-based modelling formalism for describing biochemical processes

1.3. 4 PM

Invited Talk
Sung won Lim, Binomica Labs, New York, USA
SWL is an amateur researcher located in New York City, working with a small research group of like-minded independent researchers called Binomica Labs. A warehouse worker with no formal academic training, he has sequenced and assembled a number of microbial genomes, including the first complete genome of Deinococcus radiophilus and the first genome of Halococcus dombrowskii, revealing another genus of Haloarchaea containing plasmid-carried rRNA operons. He is currently collaborating with a number of biology labs on different projects, focusing on microbial phylogeny and phage biology.
Independent research circa 2023; an amateur biology perspective

8.3. 4 PM

Invited Talk (CEITEC)
Dr. Petr Simecek, Bioinformatics Core Facility, CEITEC MU
The field of natural language processing (NLP) has undergone a significant transformation with the emergence of large language models such as GPT-3.5, OPT, and BLOOM. More recently, similar neural network architectures have been adapted to genomics and proteomics, paving the way for exciting developments in these domains. In this presentation, we will examine current protein language models, such as ProtDistillBert, ProtBertBFD, and ESM2, and illustrate how to fine-tune them for specific tasks. We will also explain how protein embeddings capture both evolutionary and functional information. To conclude, we will showcase this methodology on the problem of detecting knotted proteins, i.e., proteins whose backbones are intricately tangled into knots. Specifically, we will classify knotted proteins based solely on their protein sequence.
Could we have ChatGPT for proteins?

15.3. 4 PM

Invited Talk
Dr. Karel Sedlář, Institut für Informatik, Ludwig-Maximilians-University of Munich, Germany
White biotechnology, i.e., technology that uses living cells to synthesize easily degradable products, is key to the transition from a linear oil-based economy to a circular bio-based one. Suitable microbes are selected based on their functional potential, as revealed by computational analyses of their genomes. However, bioinformatics analyses of non-model organisms pose specific challenges: many computational tools require datasets that are unavailable for novel bacteria, either because no microbiological kits exist to perform the required experiments or simply because the required input data are unknown. Because there are hundreds of computational tools and packages for various kinds of bioinformatics analyses, often giving different results, combining them is usually necessary to infer new knowledge. In this talk, we’ll go through some basic steps in the analysis of bacterial genomes, show how to combine multiple tools, and try to adapt current methods to datasets they were not originally designed for.
Computational Analyses and Functional Annotations of Non-Model Bacteria for White Biotechnology

22.3. 4 PM

FI MU Journal Club.
Guman, Kubin+Olajec
Student presentations (Students Only)

29.3. 4 PM

Invited Talk (CEITEC)
Tomáš Pluskal, IOCB Prague
Although plants are an incredibly rich source of pharmaceutically relevant specialized metabolites, elucidating biosynthetic pathways in plants has proven challenging. Unlike bacteria and many fungal species, which contain biosynthetic operons, plants typically have the genes of a given pathway scattered randomly across the genome, making pathway discovery via genome mining nearly impossible. My lab is developing generalized workflows for connecting biosynthetic gene sequences (RNA-seq data) to their downstream metabolites (LC-MS data). In this talk, I will demonstrate two approaches: a top-down approach, based on correlating the expression levels of enzymes with metabolite abundance across a plant family, and a bottom-up approach, based on predicting the substrate specificity and function of individual biosynthetic enzymes directly from their sequences using deep learning.
Decoding the chemical language of plants

5.4. 4 PM

Invited Talk
Jordan M Eizenga, University of California, Santa Cruz, Genomics Institute, USA
Many bioinformatics pipelines crucially depend on a reference genome: an assembly that stands in for the genome of a species. However, due to genomic variation, there is a limit to how well a single assembly can represent an entire species. This introduces a pervasive reference bias where analyses become less accurate whenever a sample’s genome differs from the reference. The field of pangenomics has emerged to combat this bias, mostly by using sequence graphs that incorporate genomic variation as an alternative to conventional reference genomes. This talk will survey some of the recent developments in human pangenomics, including the Human Pangenome Reference Consortium’s creation of high-quality pangenome data resources. It will also discuss some of the applications of pangenomes, with an emphasis on the presenter’s research into pantranscriptomics: methods that incorporate pangenomic techniques into transcriptomic analyses.
Sequence graph methods for the analysis of the human pangenome and pantranscriptome

12.4. 4 PM

FI MU Journal Club.
Michalik+Sloup, Sramkova+Slany
Student presentations (Students Only)

19.4. 4 PM

FI MU Journal Club.
Melus+Varga, Charpentier+Lostak
Student presentations (Students Only)

26.4. 4 PM

Invited Talk (CEITEC)
!!CANCELLED!! Adam Krejčí, Myllia Biotechnology, Vienna, Austria
Pooled CRISPR screens with single-cell readout - applications & challenges

3.5. 4 PM

FI MU Journal Club.
Kadasi+Pavlik, Ratajik
Student presentations (Students Only)

10.5. 4 PM

Invited Talk
!!CANCELLED!! Monika Cechova, FI MUNI, currently University of California, Santa Cruz, Genomics Institute, USA
Complete genomes of a multi-generational pedigree to expand studies of genetic and epigenetic inheritance

17.5. 4 PM

Invited Talk (CEITEC)
Karolína Trachtová, Christian Doppler Laboratory for Applied Metabolomics, Medical University of Vienna
Next-generation sequencing (NGS) is a powerful method that enables massively parallel sequencing of millions of DNA or RNA fragments. However, despite its widespread use, there is still a need for comprehensive bioinformatic approaches to NGS data analysis, particularly in small RNA research. Accurate identification and quantification of the full spectrum of small RNA classes, including snoRNA, snRNA, piRNA, and isomiRs, is critical for obtaining reliable results. Unfortunately, most existing pipelines focus only on microRNA and ignore other important small RNA classes.

To address this issue, we have developed a novel bioinformatic pipeline for the accurate quantification of various small RNA classes. The pipeline consists of stand-alone modules, each dedicated to a specific part of the sequencing data analysis: quality control, pre-processing, RNA quantification, and differential expression analysis. The most crucial is the RNA quantification module, in which successive rounds of mapping ensure accurate quantification of all the different small non-coding RNAs. To achieve this, we have created a custom Python tool that counts reads assigned to different small RNAs while handling multi-loci RNAs (such as piRNA) and overlapping RNA annotations.

To aid in the interpretation of results, each module generates a comprehensive PDF/HTML report that includes tables, plots, and explanations. The report guides users in further exploring the expression levels of the various small RNA classes. Moreover, we have developed an interactive Shiny application that allows real-time visualization of differential expression results. This application lets users easily modify the content and appearance of popular plots such as heatmaps, principal component analysis (PCA) plots, and volcano plots, making it well suited to producing publication-ready figures.

Overall, our novel bioinformatic pipeline offers a comprehensive approach to the analysis of small RNA sequencing data. Our pipeline is flexible, scalable, and user-friendly, providing researchers with a valuable tool for exploring the complex landscape of small RNAs.
Bioinformatic Pipeline for Comprehensive Analysis of Small RNA-seq Data

ANNOUNCEMENT: If you are looking for a late-summer bioinformatics and computational biology conference with a slightly Central European focus, set in the High Tatras mountain range (Slovakia), please consider WBCB 2023 (the conference language is English): http://wbcb.biocenter.sk/


Authored by MC and ML

Bioinformatics Group at FI MU. This article is licensed under a Creative Commons Attribution 4.0 International License.
