Introduction to Bioinformatics at Universität für Bodenkultur Wien | Flashcards & Summaries

Study materials for Introduction to bioinformatics at Universität für Bodenkultur Wien

2. Explain the two experimental methods to determine protein
structure! What parameters are measured? How do we get the
structure?

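a. X-ray crystallography:
The protein must be grown into large, well-ordered crystals, in which the
molecules are fixed in position. When X-rays are sent through the crystal, they
are diffracted by the electron clouds surrounding the atoms, producing a regular
pattern of diffraction spots. Measured parameters: the positions and intensities
of the diffraction spots, which give the amplitudes of the diffracted waves. The
phases cannot be measured directly (the phase problem) and must be recovered,
e.g. by molecular replacement or heavy-atom methods. With amplitudes and phases,
the diffraction pattern is converted into an electron density map by Fourier
transformation; the atomic model is then built into this map and refined.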



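b. NMR spectroscopy:
NMR is based on detecting the spin of atomic nuclei in a strong magnetic field.
Protein samples in solution are labelled with the stable (non-radioactive)
isotopes 13C and 15N. Radio-frequency radiation induces transitions between
nuclear spin states, and the resulting radio signals are recorded and
interpreted. Measured parameters: chemical shifts, which assign the signals to
individual atoms, and nuclear Overhauser effects (NOEs), which show which atoms
are close in space and therefore give distance restraints. The structure is
calculated as an ensemble of models that satisfy these restraints.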
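
For the X-ray part, the Fourier step can be written as the standard electron-density synthesis. It shows why the phases are needed: the measured spot intensities I(hkl) give only the amplitudes |F(hkl)|, while every term of the sum also needs its phase α(hkl). Here V is the unit-cell volume, (h, k, l) index the reflections and (x, y, z) are fractional coordinates in the cell.

```latex
\rho(x,y,z) = \frac{1}{V} \sum_{h,k,l} |F(hkl)|\, e^{i\alpha(hkl)}\, e^{-2\pi i (hx + ky + lz)},
\qquad I(hkl) \propto |F(hkl)|^{2}
```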

24. What is the difference between heuristic and dynamic programming methods,
and when do we use which?

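Dynamic programming is exhaustive: it is guaranteed to find the optimal alignment
for the given scoring scheme and is therefore very accurate, but it is slow and
computationally intensive (time and memory grow with the product of the sequence
lengths). Heuristic methods are much faster but do not guarantee the optimal
alignment. Dynamic programming is therefore used to align a few sequences
accurately; heuristic methods are used to search large databases and to align
many sequences.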



Dynamic programming:
• Needleman-Wunsch (global)
• Smith-Waterman (local)
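
To make the dynamic-programming idea concrete, here is a minimal Python sketch of Needleman-Wunsch global alignment. It assumes a simple linear gap penalty and illustrative scores (match +1, mismatch -1, gap -2); real tools use substitution matrices such as BLOSUM and affine gap penalties. Filling the whole (m+1) x (n+1) table is what makes the method exact but slow.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Global alignment by dynamic programming with a linear gap penalty."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap          # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        score[0][j] = j * gap          # b[:j] aligned entirely against gaps

    def sub(x: str, y: str) -> int:
        return match if x == y else mismatch

    # Fill the table: each cell takes the best of diagonal, up or left
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback from the bottom-right cell to recover one optimal alignment
    i, j = n, m
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

Here the optimal score is 1 (three matches and one gap). Smith-Waterman differs mainly in that cell scores are floored at 0 and the traceback starts from the highest-scoring cell, which yields a local alignment.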



Heuristic methods:
• Pairwise (word method for fast sequence alignment)
o BLAST
o FASTA
• Multiple sequence alignment
o Progressive
o Iterative
o Blockwise
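
The word idea behind BLAST and FASTA can be sketched as exact k-letter word matching. This is a simplified illustration, not the actual BLAST or FASTA algorithm: BLAST also uses neighbourhood words scored with a substitution matrix and then extends the hits, and FASTA joins and rescores hits on the same diagonal. The function names and the word length k = 3 are illustrative assumptions.

```python
from collections import Counter, defaultdict


def word_hits(query: str, subject: str, k: int = 3) -> list[tuple[int, int]]:
    """Return (query_pos, subject_pos) pairs where both sequences share an exact k-letter word."""
    # Index every k-letter word of the subject (database) sequence by its start positions
    index = defaultdict(list)
    for j in range(len(subject) - k + 1):
        index[subject[j:j + k]].append(j)
    # Look each query word up in the index instead of comparing all position pairs
    hits = []
    for i in range(len(query) - k + 1):
        for j in index.get(query[i:i + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits: list[tuple[int, int]]) -> tuple[int, int]:
    """Return (diagonal, hit_count); many hits on one diagonal suggest an ungapped aligned region."""
    counts = Counter(j - i for i, j in hits)
    return counts.most_common(1)[0]


hits = word_hits("ACGTAC", "TTACGTAC")
print(hits)                 # [(0, 2), (1, 3), (2, 4), (3, 1), (3, 5)]
print(best_diagonal(hits))  # (2, 4)
```

Only the word lookups are done for every position; the expensive alignment step would then be restricted to the region around the best diagonal, which is where the speed of word methods comes from.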

10. Explain 3 non-experimental methods to get protein models!
Why do they work?

a. Homology modelling:
Homology (comparative) modelling builds a model from the experimentally
determined structures of homologous proteins (templates). It works because
structure is more conserved than sequence: if proteins share a high enough
sequence similarity, they are likely to have a similar 3D structure. The
all-atom model is produced from an alignment with the template proteins:
• Search for homologous proteins in the structure database
• Align the sequences
• Determine structurally conserved regions
• Determine the coordinates of the model from the template

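b. Protein threading:
Protein threading (fold recognition) predicts the fold of an unknown protein
sequence by fitting the sequence onto each known fold in a structural database,
scoring each fit and selecting the best-fitting model. It works because the
number of distinct folds is limited, so proteins without detectable sequence
similarity often still share a known fold.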

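c. Ab initio structure prediction:
Ab initio prediction uses only the single query sequence, without a template. It
works because the structure is determined by the amino acid sequence. In the
simplest approach, it uses the relative propensity of each amino acid to occur in
a certain secondary structure element (helix, strand or coil), with scores
derived from known crystal structures. Statistical programs (e.g. Chou-Fasman)
then predict the secondary structure elements.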
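
The propensity idea can be illustrated with a sliding window. This is a simplified sketch inspired by the Chou-Fasman method, not the full algorithm (which also uses nucleation and extension rules and compares helix with strand propensities). The helix propensities are approximate Chou-Fasman P(α) values reproduced for illustration only and should be checked against the original table; the window of 6 and the threshold 1.03 are assumptions.

```python
# Approximate Chou-Fasman helix propensities P(alpha); illustrative values, verify before use.
HELIX_PROPENSITY = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13, "Q": 1.11,
    "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83,
    "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}


def predict_helix(seq: str, window: int = 6, threshold: float = 1.03) -> str:
    """Mark a residue 'H' if it lies in any window whose mean helix propensity exceeds threshold."""
    labels = ["-"] * len(seq)
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        mean = sum(HELIX_PROPENSITY[aa] for aa in segment) / window
        if mean > threshold:
            # The whole window is predicted as helical
            for i in range(start, start + window):
                labels[i] = "H"
    return "".join(labels)


print(predict_helix("AAAAAAGGGGGG"))  # HHHHHHHH----
```

Alanine-rich stretches score as helix and glycine-rich stretches do not, which is the statistical pattern the propensity tables capture.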

6. Describe the classification of proteins based on secondary structure!

Classification shows the relationships among structures. A hierarchical
classification system is built in three steps:
a. Remove redundancy from the databases
b. Separate structurally distinct domains within each structure (manually or
with algorithms)
c. Group proteins/domains with similar structures and cluster them



According to Levitt and Chothia, domain structures can be classified into 3 main
classes:
• α-domains: core built up exclusively from α-helices
• β-domains: usually 2 antiparallel β-sheets packed against each other
• α/β-domains: combinations of β-α-β motifs; parallel β-sheets surrounded by
α-helices
Two main databases:
a) SCOP:
• Based on manual examination of structures
• Grouped in: classes, folds, superfamilies and families
• Classes contain folds; a fold groups structures with a similar core structure



b) CATH:
• Proteins are classified semi-automatically, using automatic structural
alignment programs combined with manual comparison
• Grouped in: class, architecture, topology, homologous superfamily,
homologous family


1. What are the 4 basic steps to establish a phylogenetic tree?

a) Choose molecular markers
• Closely related organisms → nucleotide sequences (change faster, so recent
divergences can be resolved)
• Distantly related organisms → protein sequences (more conserved)



b) Perform a multiple sequence alignment
• It establishes the positional correspondence of sequences
• Incorrect alignments result in systematic error



c) Choose tree building method
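• Distance-based methods (e.g. UPGMA, neighbor joining): based on the distance
(amount of dissimilarity) between pairs of sequences
• Character-based methods (e.g. maximum parsimony, maximum likelihood): based
on discrete characters (the sequence positions); the basic assumption is that
characters at corresponding positions in an MSA are homologous among the
sequences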



d) Assess tree reliability
• How reliable is the tree (or branch)?
• Is this tree better than another?
• Jackknifing
• Bootstrapping (parametric, non-parametric)
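
The distance-based route of step c) and non-parametric bootstrapping from step d) can be sketched on a toy alignment. The p-distance used here is the simplest distance (proportion of differing sites, gap positions ignored); real analyses usually apply a substitution-model correction such as Jukes-Cantor, and the tree-building step itself (e.g. neighbor joining) is not shown. The function names and example sequences are hypothetical.

```python
import random


def p_distance(s1: str, s2: str) -> float:
    """Proportion of aligned positions (gaps excluded) at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(1 for x, y in pairs if x != y) / len(pairs)


def distance_matrix(msa: list[str]) -> list[list[float]]:
    """Pairwise p-distances for all sequences in a multiple sequence alignment."""
    return [[p_distance(a, b) for b in msa] for a in msa]


def bootstrap_replicate(msa: list[str], rng: random.Random) -> list[str]:
    """Non-parametric bootstrap: resample alignment columns with replacement."""
    length = len(msa[0])
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(seq[c] for c in cols) for seq in msa]


msa = ["ACGTACGT", "ACGTACGA", "TCGTACGA"]
print(distance_matrix(msa))
print(bootstrap_replicate(msa, random.Random(1)))
```

Each bootstrap replicate would be passed through the same tree-building method; the support for a branch is the percentage of replicate trees that contain it.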

20. Is regression a supervised or an unsupervised method?

Regression is a supervised method: the model is trained on data whose outcome
values (labels) are known. While supervised classification assigns data to a set
of predefined categories, regression predicts a continuous value.
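
A minimal example of why regression counts as supervised: the model is fitted to training pairs (x, y) whose outcomes y are known, and is then used to predict y for new x. This sketch uses ordinary least squares for a straight line; the function name and the data are hypothetical.

```python
def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least squares fit of y = slope * x + intercept to labelled training data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)       # 2.0 1.0
print(slope * 4 + intercept)  # prediction for an unseen x = 4
```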

18. What is the molecular clock hypothesis and how is it used?

The hypothesis states that DNA and protein sequences accumulate mutations at a
roughly constant rate. It is used to estimate when speciation or mutation events
occurred, using DNA/protein sequences together with fossil evidence: if molecules
evolve at constant rates, the number of accumulated mutations is proportional to
evolutionary time. The problem is that uniform evolutionary rates are rarely
found, because of:
• Changing generation times
• Population size
• Species-specific differences
• Evolving functions of the encoded protein
• Changes in the intensity of natural selection

The “strict” clock assumes perfectly constant rates of evolution, whereas the
“relaxed” clock allows different evolutionary rates on different branches.
Calibration: individual molecular clocks can be tested for accuracy, but they
need to be calibrated against material evidence, such as fossils. Over long time
spans, estimates can be off by 50% or more.
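
Under a strict clock the dating arithmetic is simple: after two lineages split t years ago, both accumulate substitutions, so the distance between them is d = 2rt, where r is the substitution rate per site per year. Calibrating with a fossil-dated split gives r, which then dates other splits. This sketch assumes a strict clock and distances already corrected for multiple substitutions; the function names and numbers are hypothetical.

```python
def calibrate_rate(d_cal: float, t_cal: float) -> float:
    """Rate r (substitutions/site/year) from a fossil-dated divergence, using d = 2 * r * t."""
    # Factor 2: substitutions accumulate independently along both lineages since the split
    return d_cal / (2 * t_cal)


def divergence_time(d: float, rate: float) -> float:
    """Time since two sequences diverged, assuming a strict (constant-rate) clock."""
    return d / (2 * rate)


# Hypothetical example: a fossil-dated split 50 million years ago with d = 0.10
r = calibrate_rate(0.10, 50e6)
t = divergence_time(0.04, r)
print(f"rate = {r:.1e} per site per year, divergence ~ {t / 1e6:.0f} million years ago")
```

A relaxed clock would instead let r vary between branches, which is why it needs statistical (e.g. Bayesian) estimation rather than this one-line formula.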

11. Explain the Ramachandran plot and how it can be used to
evaluate models!

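The Ramachandran plot visualizes the energetically allowed conformations of the
protein backbone by plotting the dihedral angle ψ against φ for each amino acid
in a protein structure.
Because the peptide bond is planar, rotation of the polypeptide backbone is
limited to two angles per residue: φ (around the N–Cα bond) and ψ (around the
Cα–C bond). The plot maps the entire conformational space of a peptide and shows
allowed and disallowed regions. φ = 0° is disallowed because the carbonyl oxygen
atoms of neighbouring residues would bump into each other. If φ and ψ are both
0°, a carbonyl oxygen and an amide hydrogen atom would bump into each other. The
fully extended conformation (φ = ψ = 180°) is sterically allowed.
To evaluate a model, the φ/ψ angles of all its residues are plotted: in a good
model almost all residues lie in the allowed regions (a common guideline is more
than 90% in the most favoured regions), while residues in disallowed regions
point to errors in the backbone.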
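
φ and ψ are ordinary dihedral (torsion) angles: φ is defined by the backbone atoms C(i-1), N, Cα, C and ψ by N, Cα, C, N(i+1). A minimal sketch of computing a dihedral from four atom coordinates follows; a validation program does this for every residue and then checks where each (φ, ψ) pair falls on the plot. Reading coordinates from a PDB file is not shown, and the helper names are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Dihedral angle in degrees (-180..180) for atoms p0-p1-p2-p3 (IUPAC sign convention)."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    # Normalise the central bond vector
    n = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n, b1[1] / n, b1[2] / n)
    # Project the two outer bonds onto the plane perpendicular to the central bond
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    # Angle between the projections, signed by the direction of rotation
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


# Toy coordinates (not a real residue): cis, +90 degrees and trans arrangements
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```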

12. Explain the β-barrel!

a. Closed barrel:
The closed barrel has a simple (up-and-down) structure: each successive β-strand
is added next to the previous one until the last strand is joined by hydrogen
bonds to the first. The strands are antiparallel and connected by hairpins. In
membrane barrels such as porins, the inside is often hydrophilic and the outside
hydrophobic.



b. Jelly roll barrel:
The jelly roll structure consists of 8 β-strands arranged in two four-stranded
antiparallel β-sheets that pack together across a hydrophobic interface.



c. TIM barrel:
The TIM barrel is a conserved protein fold consisting of 8 α-helices and 8
parallel β-strands that alternate along the peptide backbone; the strands form
the inner barrel, which is surrounded by the helices.

13. Is clustering a supervised or an unsupervised method?

Clustering is an unsupervised method. It does not assume predefined categories;
instead it identifies categories in the data according to similar patterns. In
gene expression analysis, for example, clustering groups genes with correlated
expression profiles.

16. What is k-means clustering?

The k-means algorithm is an iterative procedure and depends on the (randomly)
chosen starting values.



Algorithm:
• Initially (at step 0), choose k observations at random; these points represent
the initial cluster centroids
• Then calculate the distance of each object to all centroids and assign it to
the cluster with the nearest centroid
• When all objects have been assigned, recalculate the centroids of the k
clusters
• Repeat the last two steps until a maximum number of iterations is reached or
the centroids no longer change



k-means clustering:
• Classification of data through a single step partition
• Divisive approach (all data start as a single cluster, which is then divided
into smaller groups according to similarity)
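
The algorithm steps above can be sketched in Python. It assumes Euclidean distance and numeric feature vectors (e.g. gene expression profiles); keeping an empty cluster's previous centroid is one of several possible design choices. The function name and the toy points are hypothetical.

```python
import math
import random

Point = tuple[float, ...]


def kmeans(points: list[Point], k: int, max_iter: int = 100,
           seed: int = 0) -> tuple[list[int], list[Point]]:
    """Return (cluster label per point, final centroids)."""
    rng = random.Random(seed)
    # Step 0: choose k observations at random as the initial centroids
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assign each point to the cluster with the nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Recalculate each centroid as the mean of its member points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(coord) / len(members)
                                           for coord in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # empty cluster keeps its old centroid
        # Stop when the centroids no longer change
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
labels, centroids = kmeans(pts, k=2)
print(labels, centroids)
```

Because the result depends on the random starting centroids, k-means is usually run several times with different seeds and the solution with the smallest within-cluster distances is kept.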

25. What are the 4 amino acids that are found in the core of
globular proteins?

• Leucine → participates in hydrophobic interactions
• Isoleucine → participates in hydrophobic interactions
• Methionine → non-reactive side chain
• Valine → non-reactive side chain

• Phenylalanine → aromatic interactions (π-stacking)
• Tyrosine → aromatic interactions (π-stacking)
