Olga Krebs Data, operations and models in SEEK: storing, interlinking, finding, and reusing
Maxim Zakharsev Topological analysis of the stoichiometric models of metabolic networks
Alexey Kolodkin Use modeling to make data predictive
Documentation can be found here
Register here (local SEEK instance)
FAIRDOMHub (public instance) with real projects
Anastasia Bakulina Homology modeling of proteins
Introduction to the Protein Data Bank (PDB) and to SCOP and CATH, the databases that classify PDB protein structures. Introduction to servers that help build protein models (Genesilico metaserver, Swiss-Model, I-TASSER). Submitting sequences for modeling on different servers.
Working with programs for viewing three-dimensional protein structures in PDB format (Accelrys Visualizer, PyMOL) and their main features. Study of the PDB format.
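To make the study of the PDB format concrete, here is a minimal Python 3 sketch that reads ATOM and HETATM records by their fixed column positions. It covers only a subset of the record fields, the function and field names are hypothetical, and the sample line is a hand-written example in PDB layout rather than data quoted from a specific entry. For real work, a maintained parser such as Biopython's Bio.PDB is a safer choice.

```python
from collections import namedtuple

# Hypothetical container holding a subset of the fields of an ATOM/HETATM record.
Atom = namedtuple("Atom", "record serial name res_name chain res_seq x y z element")


def parse_atom_line(line):
    """Parse one ATOM/HETATM line by fixed column positions.

    PDB columns are 1-based and inclusive; Python slices are 0-based and
    end-exclusive, so columns 7-11 become line[6:11], and so on.
    """
    return Atom(
        record=line[0:6].strip(),      # columns 1-6: record name
        serial=int(line[6:11]),        # columns 7-11: atom serial number
        name=line[12:16].strip(),      # columns 13-16: atom name
        res_name=line[17:20].strip(),  # columns 18-20: residue name
        chain=line[21],                # column 22: chain identifier
        res_seq=int(line[22:26]),      # columns 23-26: residue sequence number
        x=float(line[30:38]),          # columns 31-38: x coordinate (angstroms)
        y=float(line[38:46]),          # columns 39-46: y coordinate
        z=float(line[46:54]),          # columns 47-54: z coordinate
        element=line[76:78].strip(),   # columns 77-78: element symbol (may be absent)
    )


def read_atoms(lines):
    """Yield parsed atoms, skipping all other record types (HEADER, REMARK, TER, END, ...)."""
    for line in lines:
        if line.startswith(("ATOM  ", "HETATM")):
            yield parse_atom_line(line)


# Hand-written sample line laid out according to the PDB column rules.
SAMPLE = "ATOM      1  N   THR A   1      17.047  14.099   3.625  1.00 13.79           N"

if __name__ == "__main__":
    for atom in read_atoms(["HEADER    EXAMPLE", SAMPLE, "END"]):
        print(atom.res_name, atom.res_seq, atom.name, atom.x, atom.y, atom.z)
```

Slicing by column rather than splitting on whitespace matters because adjacent fields in a PDB file can run together (for example, large negative coordinates), so whitespace splitting is not reliable.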
Structural alignment. Comparison of models obtained by different methods with the real (experimentally determined) structures. Using PyMOL and Accelrys Visualizer to prepare illustrations.
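A common single number for comparing a model with the experimental structure after structural alignment is the root-mean-square deviation over corresponding atoms (often the C-alpha atoms): RMSD = sqrt((1/N) Σ_i |a_i − b_i|²). The Python 3 sketch below computes only this final step. It assumes the superposition has already been done (for example with PyMOL's align command, which itself reports an RMSD) and that atom i of one list corresponds to atom i of the other; the coordinates in the example are made up.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) coordinates.

    Assumes coords_a[i] and coords_b[i] are the same atom (e.g. the C-alpha of the
    same residue) and that the structures were already superimposed; no fitting
    is done here.
    """
    if not coords_a or len(coords_a) != len(coords_b):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))


if __name__ == "__main__":
    # Hypothetical C-alpha coordinates of a model and a reference, already aligned.
    model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    reference = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
    print(rmsd(model, reference))  # every pair is 1.0 angstrom apart, so this prints 1.0
```

Because RMSD depends on both the superposition and the atom pairing, values from different tools are comparable only when they use the same atom selection.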
UGENE Analysis of high-throughput sequencing data
The team will present high-throughput sequencing analysis tasks using two examples: the Ebola virus and breast cancer.
To attend our practical session, all participants should bring a laptop with the following recommended software installed:
1) UGENE 1.24
All classes will be conducted using the UGENE program. Install UGENE version 1.24 (FULL package), available at ugene.net.
2) JDK and JRE
The Java Development Kit (JDK) and Java Runtime Environment (JRE) are required by some of the NGS tools.
Installation instructions are available via the link.
3) cummeRbund
It is recommended to install cummeRbund. It will be used for the RNA-Seq tasks. Installation instructions can be viewed here.
4) Mac OS X or Linux
Some tasks, such as “Intro” and “Variant analysis”, can be done on any operating system, but tasks such as “De novo assembly” and “Transcriptomics” can only be solved on Mac OS X or Linux.
Students whose computers run Windows will be able to skip running some of the tools, and the output will be shown to them instead.
Olga Saik Application of ANDSystem for the reconstruction and analysis of molecular-genetic networks associated with diseases and phenotypic traits
What is needed for the practical session:
1) A laptop is desirable.
2) Python 2.7 programming language
3) Notepad++ for Windows (https://notepad-plus-plus.org/) or Gedit for Linux
4) ANDSystem available at http://www-bionet.sscc.ru/andvisio/
5) Cytoscape, an open source software platform for visualizing complex networks
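Cytoscape can import networks from simple interaction format (SIF) files, in which each line lists a source node, an interaction type and a target node, separated by tabs. The sketch below, written to avoid Python-3-only syntax so that it should also run under the Python 2.7 listed above, shows one way to turn interaction triples into SIF text and to count node degrees as a first look at hub genes. It assumes the interactions are already available as (source, relation, target) tuples; how ANDSystem exports its networks is not covered here, and all node and relation names are made-up placeholders.

```python
from collections import Counter


def sif_text(interactions):
    """Format (source, relation, target) tuples as tab-separated SIF lines."""
    return "".join("%s\t%s\t%s\n" % triple for triple in interactions)


def write_sif(interactions, path):
    """Write the triples to a .sif file that Cytoscape can import as a network."""
    with open(path, "w") as handle:
        handle.write(sif_text(interactions))


def node_degrees(interactions):
    """Undirected degree: the number of interactions each node takes part in."""
    degree = Counter()
    for source, _relation, target in interactions:
        degree[source] += 1
        degree[target] += 1
    return degree


# Hypothetical placeholder triples; real ones would come from your own ANDSystem results.
EXAMPLE = [
    ("GENE_A", "regulates", "GENE_B"),
    ("GENE_A", "interacts", "PROTEIN_C"),
    ("PROTEIN_C", "associated", "DISEASE_X"),
]

if __name__ == "__main__":
    print(sif_text(EXAMPLE))
    print(node_degrees(EXAMPLE)["GENE_A"])
```

Tab separation is used so that node names containing spaces stay intact when Cytoscape reads the file.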
Yuriy Bukin Comprehensive bioinformatics analysis of data using the R programming language
Ivo Grosse Multiple Testing
The fundamental problem of multiple testing occurs in bioinformatics and computational biology when high-throughput data are analyzed and many statistical tests are performed simultaneously. For example, this occurs when identifying differentially expressed genes or alternatively spliced isoforms from RNA-seq data, where the statistical significance of gene expression changes or of alternative splicing events must be estimated for every gene or every isoform simultaneously. Several measures of false positive predictions, i.e., type-I errors, have been proposed, such as the family-wise error rate (FWER), the probability of making at least one false positive prediction, and the false discovery rate (FDR), the expected proportion of false positives among all positive predictions. Over the past decades, several correction procedures that control the FWER or the FDR have been developed. In this course, the participants of the summer school will learn about these approaches and several correction procedures, including one-step correction procedures such as the Sidak procedure or the Bonferroni procedure, step-down correction procedures such as the Holm-Sidak procedure or the Holm-Bonferroni procedure, and step-up correction procedures such as the Hochberg procedure, the Benjamini-Hochberg procedure, or the Benjamini-Yekutieli procedure.
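The correction procedures named above can be compared side by side as adjusted p-values. The following Python 3 sketch is a textbook-style implementation written for this summary, not the course's own material. It assumes a plain list of p-values and returns adjusted p-values in the original order, so a hypothesis is rejected at level alpha when its adjusted value is at most alpha. For real analyses, tested implementations such as R's p.adjust or statsmodels' multipletests are preferable.

```python
def bonferroni(p):
    """One-step Bonferroni: adjusted p = min(1, m * p)."""
    m = len(p)
    return [min(1.0, m * pi) for pi in p]


def sidak(p):
    """One-step Sidak: adjusted p = 1 - (1 - p)^m (exact FWER control for independent tests)."""
    m = len(p)
    return [1.0 - (1.0 - pi) ** m for pi in p]


def _step_down(p, adjust):
    """Visit p-values from smallest to largest; a running maximum keeps adjusted values monotone."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i])
    out = [0.0] * m
    running = 0.0
    for rank, i in enumerate(order):
        k = m - rank  # hypotheses still in play at this step, including the current one
        running = max(running, min(1.0, adjust(p[i], k)))
        out[i] = running
    return out


def _step_up(p, adjust):
    """Visit p-values from largest to smallest; a running minimum keeps adjusted values monotone."""
    m = len(p)
    order = sorted(range(m), key=lambda i: p[i], reverse=True)
    out = [0.0] * m
    running = 1.0
    for offset, i in enumerate(order):
        j = m - offset  # 1-based rank of p[i] in ascending order
        running = min(running, min(1.0, adjust(p[i], j, m)))
        out[i] = running
    return out


def holm_bonferroni(p):
    return _step_down(p, lambda pi, k: k * pi)


def holm_sidak(p):
    return _step_down(p, lambda pi, k: 1.0 - (1.0 - pi) ** k)


def hochberg(p):
    return _step_up(p, lambda pi, j, m: (m - j + 1) * pi)


def benjamini_hochberg(p):
    return _step_up(p, lambda pi, j, m: pi * m / float(j))


def benjamini_yekutieli(p):
    # c(m) = 1 + 1/2 + ... + 1/m makes BH valid under arbitrary dependence.
    c_m = sum(1.0 / k for k in range(1, len(p) + 1))
    return _step_up(p, lambda pi, j, m: c_m * pi * m / float(j))


def reject(adjusted, alpha=0.05):
    """Reject H0 wherever the adjusted p-value is at most alpha."""
    return [q <= alpha for q in adjusted]


if __name__ == "__main__":
    pvals = [0.01, 0.04, 0.03, 0.005]  # made-up p-values for illustration
    for name, method in [("Bonferroni", bonferroni), ("Holm", holm_bonferroni),
                         ("Hochberg", hochberg), ("BH", benjamini_hochberg)]:
        print(name, method(pvals))
```

The two helpers mirror the distinction drawn in the course description: step-down procedures start from the most significant p-value and stop at the first non-rejection, while step-up procedures start from the least significant one and reject everything below the first rejection.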