Introduction To Bioinformatics at Universität Für Bodenkultur Wien | Flashcards & Summaries


Study materials for Introduction to bioinformatics at the Universität für Bodenkultur Wien



18. What is the molecular clock hypothesis and how is it used?


This hypothesis is used to estimate when speciation or mutation events occurred,
using fossil evidence or DNA/protein sequences. If molecules evolve at constant
rates, the number of accumulated mutations is proportional to evolutionary time.
In practice, however, evolutionary rates are rarely uniform, because of:
• Changing generation times
• Population size
• Species-specific differences
• Evolving functions of the encoded protein
• Changes in the intensity of natural selection

The “strict” clock assumes perfectly constant rates of evolution, whereas the
“relaxed” clock allows different evolutionary rates on different branches.
Calibration: to be tested for accuracy, individual molecular clocks need to be
calibrated against material evidence, such as fossils. Over long time spans, estimates
can be off by 50% or more.
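A strict clock turns a genetic distance into a time estimate. The Python sketch below assumes ungapped, equal-length aligned DNA sequences and uses the Jukes-Cantor correction for multiple substitutions at the same site, a standard model that these notes do not mention. The calibration distance, the 50 Myr fossil date and the sequences are hypothetical illustration values.

```python
import math

def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned, ungapped, equal-length sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq1, seq2)) / len(seq1)

def jukes_cantor(p):
    """Jukes-Cantor correction for multiple hits at one site (defined for p < 0.75)."""
    return -0.75 * math.log(1 - 4 * p / 3)

def calibrate_rate(distance, calibration_time):
    """Substitutions per site per time unit from a split of known age (e.g. fossil-dated).
    Divided by 2 because both lineages accumulate changes independently after the split."""
    return distance / (2 * calibration_time)

def divergence_time(distance, rate):
    """Strict clock: time since the split, t = d / (2 * r)."""
    return distance / (2 * rate)

# Hypothetical calibration: a pair of species with d = 0.10 whose split is dated to 50 Myr.
rate = calibrate_rate(0.10, 50.0)
# Hypothetical query pair: distance corrected for multiple substitutions, then dated.
d_query = jukes_cantor(p_distance("ACGTACGTAC", "ACGTTCGTAA"))
print(rate, divergence_time(d_query, rate))
```

Under a relaxed clock, `rate` would no longer be a single number but would vary from branch to branch, which is exactly what this strict-clock sketch leaves out.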


2. Explain the two experimental methods to determine protein
structure! What parameters are measured? How do we get the
structure?


a. X-Ray Crystallography:
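Proteins must be grown into large crystals in which their positions are fixed.
X-rays sent through the crystal are diffracted by the electron clouds surrounding
the atoms, producing a regular diffraction pattern. The diffraction pattern can be
converted into an electron density map using a Fourier transformation, and the
atomic model is built into this map. Parameters: the positions and intensities of the
diffraction spots are measured; the phases of the diffracted waves are lost in the
measurement and must be determined indirectly (the phase problem).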



b. NMR Spectroscopy:
It is based on detecting the spin states of atomic nuclei in a magnetic field.
Protein samples are labelled with the stable (non-radioactive) isotopes 13C and
15N. Radio-frequency radiation induces transitions between nuclear spin states,
and the resulting radio signals can be interpreted. From them, the proximity of and
distances between labelled atoms can be determined, and these distance constraints
are used to calculate the structure.
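The role of the phases in X-ray crystallography can be illustrated with a one-dimensional toy Fourier synthesis. In this Python sketch the two "atoms", their scattering weights and the number of reflections are made-up illustration values, not real crystallographic data. With the correct phases the reconstructed density peaks at the heavier atom; keeping only the amplitudes (which is all the measured intensities provide) moves the peak to the origin.

```python
import cmath
import math

# Hypothetical toy 1D "crystal": (fractional position, scattering weight) per atom.
ATOMS = [(0.2, 8.0), (0.6, 6.0)]
H_MAX = 20   # highest reflection index used in the synthesis
GRID = 100   # number of sampling points along the unit cell

def structure_factor(h):
    """F(h) = sum_j f_j * exp(2*pi*i*h*x_j) for the toy 1D cell."""
    return sum(f * cmath.exp(2j * math.pi * h * x) for x, f in ATOMS)

def density(factors):
    """Fourier synthesis rho(x) = sum_h F(h) * exp(-2*pi*i*h*x), sampled on GRID points."""
    rho = []
    for i in range(GRID):
        x = i / GRID
        total = sum(F * cmath.exp(-2j * math.pi * h * x) for h, F in factors.items())
        rho.append(total.real)
    return rho

with_phases = {h: structure_factor(h) for h in range(-H_MAX, H_MAX + 1)}
# The detector only records intensities |F(h)|^2, so here the phases are discarded (set to 0).
amplitudes_only = {h: abs(F) for h, F in with_phases.items()}

rho_true = density(with_phases)
rho_nophase = density(amplitudes_only)
print(max(range(GRID), key=rho_true.__getitem__) / GRID)     # 0.2: peak at the heavier atom
print(max(range(GRID), key=rho_nophase.__getitem__) / GRID)  # 0.0: peak collapses to the origin
```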


25. What are the 4 amino acids that are found in the core of
globular proteins?


• Leucine → participates in hydrophobic interactions
• Isoleucine → participates in hydrophobic interactions
• Methionine → non-reactive side chain
• Valine → non-reactive side chain

• Phenylalanine → aromatic interactions (π-stacking)
• Tyrosine → aromatic interactions (π-stacking)


38. What are substitution and scoring matrices and what are they
used for?


Scoring matrix:
Scoring matrices are used to determine the score for matching two characters in a
sequence alignment. The scores are usually log-odds values: the logarithm of the
ratio between the probability that the two characters derive from a common
ancestral character and the probability that they are aligned by chance. (→ PAM)



Substitution matrix:
A substitution matrix is a collection of scores for aligning nucleotides or amino acids
with one another. These scores generally represent the relative ease with which one
nucleotide or amino acid may mutate into, or be substituted by, another. They are used to
measure similarity in sequence alignments. (→ BLOSUM)
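The log-odds principle behind both PAM and BLOSUM can be shown with a tiny nucleotide example. In this sketch the background and pair frequencies are invented for illustration (real matrices derive them from curated alignments), and the scores are scaled to half-bits and rounded to integers, a common convention.

```python
import math
from itertools import product

BASES = "ACGT"
# Hypothetical frequencies, chosen only for illustration:
P_BACKGROUND = {b: 0.25 for b in BASES}  # p_a: each base equally likely
Q_IDENTICAL = 0.125                      # q_aa: each identical pair in trusted alignments
Q_DIFFERENT = 0.5 / 12                   # q_ab: each of the 12 ordered mismatch pairs

def log_odds(q_ab, p_a, p_b, scale=2):
    """Score in 1/scale-bit units: round(scale * log2(q_ab / (p_a * p_b)))."""
    return round(scale * math.log2(q_ab / (p_a * p_b)))

MATRIX = {
    (a, b): log_odds(Q_IDENTICAL if a == b else Q_DIFFERENT, P_BACKGROUND[a], P_BACKGROUND[b])
    for a, b in product(BASES, repeat=2)
}

def score_ungapped(s1, s2, matrix=MATRIX):
    """Alignment score: sum of matrix entries over the aligned columns (no gaps)."""
    return sum(matrix[a, b] for a, b in zip(s1, s2))

print(MATRIX["A", "A"], MATRIX["A", "G"])  # 2 -1
print(score_ungapped("ACGT", "ACGA"))      # 5
```

A positive entry means the pair is seen in related sequences more often than expected by chance; a negative entry means less often.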


13. Is clustering a supervised or an unsupervised method?


Clustering is an unsupervised method. It does not assume predefined categories;
instead, it identifies categories in the data according to similar patterns. In gene
analysis, for example, genes with correlated profiles (such as expression profiles)
are grouped into the same cluster.
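Grouping genes by "correlated profiles" needs a distance measure that rewards a similar shape rather than a similar magnitude. One common choice, assumed here, is 1 minus the Pearson correlation coefficient; note that no labels are used anywhere, which is what makes clustering unsupervised. The profiles are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has no defined correlation")
    return cov / (sx * sy)

def correlation_distance(x, y):
    """0 for perfectly correlated profiles, 2 for perfectly anti-correlated ones."""
    return 1 - pearson(x, y)

# Hypothetical expression profiles over four conditions.
gene_a = [1, 2, 3, 4]
gene_b = [2, 4, 6, 8]   # same shape, twice the level: should be close to gene_a
gene_c = [4, 3, 2, 1]   # opposite shape: should be far from gene_a
print(correlation_distance(gene_a, gene_b), correlation_distance(gene_a, gene_c))
```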


19. How is the tertiary protein structure defined and classified?


The packing and arrangement of the secondary structure elements form the tertiary
structure. Proteins are generally classified as globular or membrane proteins.



Globular proteins: exist in aqueous solvent thanks to hydrophilic residues on their
surface, which is energetically favourable.



Membrane proteins: exist in membrane lipids and are stabilized through hydrophobic
interactions. Their exterior is hydrophobic, and their transmembrane segments are
typically alpha-helices.



The tertiary structure is stabilized mainly by non-covalent interactions between side
chains within a protein (e.g. coiled coil); covalent disulfide bonds between cysteine
residues can also contribute.


6. Describe the classification of proteins based on secondary structure!


To establish the relationships among structures, a hierarchical classification system
is built by:
a. Removing redundancy from databases
b. Separating structurally distinct domains within the structure (manually or with
algorithms)
c. Grouping proteins/domains of similar structure and clustering them



According to Levitt and Chothia, domain structures can be classified into 3 main
classes:
• α-domains: core built up exclusively from α-helices
• β-domains: usually 2 antiparallel β-sheets packed against each other
• α/β-domains: combinations of β-α-β motifs; parallel β-sheets surrounded by
α-helices
Two main databases:
a) SCOP:
• Based on manual examination of structures
• Grouped in: classes, folds, superfamilies and families
• Classes consist of folds; each fold groups proteins with a similar core structure



b) CATH:
• Proteins are classified using an automatic structural alignment
program combined with manual comparison
• Grouped in: class, architecture, topology, homologous superfamily,
homologous family



11. Explain the Ramachandran plot and how it can be used to
evaluate models!


The Ramachandran plot is a diagram that visualizes the energetically allowed
combinations of the backbone dihedral angles φ (phi) and ψ (psi) for each of the
amino acids in a protein structure.
Rotation of the polypeptide backbone is limited to these two angles, because the
peptide bond itself is planar. Plotting ψ against φ maps the entire conformational
space of a peptide and shows allowed and disallowed regions. φ cannot be 0 degrees
because two carbonyl oxygen atoms would clash. If φ and ψ are both 0 degrees, an
amide hydrogen and a carbonyl oxygen atom would clash. The most stable
conformation is at 180 degrees.
To evaluate a model, the φ/ψ pair of every residue is plotted: in a reliable model
nearly all residues fall into allowed regions, while many residues in disallowed
regions point to errors in the model.
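Model evaluation with a Ramachandran plot starts from the dihedral angles. For residue i, φ is the dihedral defined by atoms C(i-1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). The sketch below computes a dihedral from four coordinates with the usual atan2 formula and then counts residues inside two rough rectangular boxes; these boxes are a hypothetical simplification, since real validation tools use empirically derived region maps.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees (between -180 and 180) defined by four atom positions."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    y = math.sqrt(dot(b2, b2)) * dot(b1, cross(b2, b3))
    x = dot(cross(b1, b2), cross(b2, b3))
    return math.degrees(math.atan2(y, x))

def rough_region(phi, psi):
    """Very coarse, hypothetical boxes for the two main allowed regions."""
    if -160 <= phi <= -20 and -120 <= psi <= 50:
        return "alpha"   # right-handed helix region
    if -180 <= phi <= -45 and (psi >= 90 or psi <= -150):
        return "beta"    # extended / sheet region
    return "other"

def fraction_favoured(phi_psi_pairs):
    """Share of residues whose (phi, psi) falls in either rough box."""
    hits = sum(rough_region(phi, psi) != "other" for phi, psi in phi_psi_pairs)
    return hits / len(phi_psi_pairs)

# Hypothetical (phi, psi) pairs for a four-residue model.
print(fraction_favoured([(-60, -45), (-120, 130), (60, 60), (-65, -40)]))  # 0.75
```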


16. What is k-means clustering?


The k-means algorithm is an iterative procedure and depends on the (randomly)
chosen starting values.



Algorithm:
• Initially (at step 0), choose k observations at random; these points represent
the initial cluster centroids
• Then calculate the distances of each object to all centroids and assign it to a
cluster which has the nearest centroid
• When all objects have been assigned, recalculate the centroids of the k
clusters
• Repeat the last two steps until a maximum number of iterations is reached or
the centroids no longer change



k-means clustering:
• Classification of data through a single step partition
• Divisive approach (all data into single cluster and then dividing into smaller
groups according to similarity)
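The algorithm steps listed above translate almost line by line into plain Python. This is a minimal sketch that assumes Euclidean distance (the notes do not fix the distance measure), takes an optional seed so the random start is reproducible, and keeps a centroid unchanged if its cluster ever becomes empty.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=None):
    """Plain k-means on equal-length numeric tuples; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Step 0: choose k distinct observations at random as the initial centroids.
    centroids = [points[i] for i in rng.sample(range(len(points)), k)]
    labels = None
    for _ in range(max_iter):
        # Assign each point to the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments unchanged, so the centroids no longer change either
        labels = new_labels
        # Recalculate each centroid as the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(data, 2, seed=0)
print(labels, centroids)
```

Because the result depends on the random start, in practice k-means is often run several times with different seeds and the partition with the smallest within-cluster spread is kept.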


5. What are the advantages of HMM and PSSM compared to
regular expression?


Regular expression:
A regular expression is a pattern notation that describes a motif, e.g. in PROSITE
format. It describes a motif in a more informative way than a consensus sequence. It
is used to search PROSITE for proteins with a matching sequence and to discover distant
homologs in sequence databases.
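To see why a regular expression is rigid, here is a hypothetical helper `prosite_to_regex` that translates a subset of PROSITE pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, repeat counts such as `x(2,4)`, and the `<`/`>` terminus anchors) into a Python regular expression. It is a sketch, not a complete PROSITE parser. A sequence either matches or does not; near-misses get no score.

```python
import re

ELEMENT = re.compile(r"(x|\[[A-Z]+\]|\{[A-Z]+\}|[A-Z])(?:\((\d+)(?:,(\d+))?\))?")

def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern (e.g. 'C-x(2,4)-C') into a Python regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        start_anchor = element.startswith("<")
        end_anchor = element.endswith(">")
        m = ELEMENT.fullmatch(element.strip("<>"))
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        residue, low, high = m.groups()
        if residue == "x":
            core = "."                          # any residue
        elif residue.startswith("{"):
            core = "[^" + residue[1:-1] + "]"   # any residue except those listed
        else:
            core = residue                      # single letter or [..] class is valid regex
        if low and high:
            core += "{" + low + "," + high + "}"
        elif low:
            core += "{" + low + "}"
        parts.append(("^" if start_anchor else "") + core + ("$" if end_anchor else ""))
    return "".join(parts)

print(prosite_to_regex("C-x(2,4)-C-x(3)-[LIVMFYWC]"))
hit = re.search(prosite_to_regex("[ST]-x-[RK]"), "AAASGKAA")
print(hit.start(), hit.group())
```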



Position-specific scoring matrix (PSSM):
A PSSM statistically represents a multiple sequence alignment: it records the
frequency of each residue (base or amino acid) at each position of the alignment and
can then calculate how well a new sequence fits into it. Profiles are PSSMs with gap
information.
Advantage: a PSSM can tell how likely a new sequence is to fit in, and it is also
more flexible than a regular expression.
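A sketch of how a PSSM is built and used, assuming an ungapped DNA alignment, a pseudocount of 1 and a uniform background of 0.25 per base (all assumptions; the alignment is made up). Unlike a regular expression, every window of a sequence gets a graded log-odds score, so a near-match still ranks above unrelated sequence.

```python
import math

ALPHABET = "ACGT"

def build_pssm(alignment, pseudocount=1.0, background=0.25):
    """Log-odds PSSM (one dict per column) from an ungapped alignment of equal-length sequences."""
    n = len(alignment)
    pssm = []
    for column in zip(*alignment):
        scores = {}
        for base in ALPHABET:
            freq = (column.count(base) + pseudocount) / (n + pseudocount * len(ALPHABET))
            scores[base] = math.log2(freq / background)
        pssm.append(scores)
    return pssm

def score(pssm, seq):
    """Sum of position-specific scores for a sequence as long as the PSSM."""
    return sum(col[base] for col, base in zip(pssm, seq))

def best_hit(pssm, sequence):
    """Slide the PSSM along a longer sequence; return (start position, best score)."""
    width = len(pssm)
    windows = range(len(sequence) - width + 1)
    return max(((i, score(pssm, sequence[i:i + width])) for i in windows), key=lambda t: t[1])

pssm = build_pssm(["ACGT", "ACGA", "ACCT", "ACGT"])  # hypothetical motif alignment
print(best_hit(pssm, "TTTACGTTT"))
```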



Hidden Markov Model (HMM):
The HMM is based on the Markov model, which describes a series of events occurring
one after another in a chain, where the probability of each event depends on the
event before it. By combining such chains, we can include states that are not
observed, i.e. hidden. Observed states are the symbols we actually see (e.g.
nucleotides or amino acids), while states that must be inferred, e.g. exon, intron or
protein domain, are hidden states. In a profile HMM, 3 states are possible at each
position of a multiple sequence alignment: match state, insertion state and
deletion state.
Advantage: hidden states can be included, and a trained HMM can be used to assess
how well an unknown sequence matches the model. It is also more flexible than a
regular expression and gives statistical information about the probability of
sequences.
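Both advantages can be seen in a small example. This sketch uses a hypothetical two-state HMM (a GC-rich, exon-like state "E" and an AT-rich, intron-like state "I", with made-up probabilities), not a profile HMM with match, insertion and deletion states. The forward algorithm gives the probability of a sequence under the model, i.e. how well it matches; the Viterbi algorithm infers the most probable path of hidden states.

```python
STATES = ("E", "I")
# Hypothetical model parameters, chosen only for illustration.
START = {"E": 0.5, "I": 0.5}
TRANS = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
EMIT = {"E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
        "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4}}

def forward(seq):
    """P(seq | model): sums over all hidden-state paths."""
    alpha = {s: START[s] * EMIT[s][seq[0]] for s in STATES}
    for symbol in seq[1:]:
        alpha = {s: sum(alpha[r] * TRANS[r][s] for r in STATES) * EMIT[s][symbol]
                 for s in STATES}
    return sum(alpha.values())

def viterbi(seq):
    """Most probable hidden-state path for seq, as a string of state labels."""
    best = {s: (START[s] * EMIT[s][seq[0]], s) for s in STATES}
    for symbol in seq[1:]:
        best = {s: max((best[r][0] * TRANS[r][s] * EMIT[s][symbol], best[r][1] + s)
                       for r in STATES)
                for s in STATES}
    return max(best.values())[1]

print(forward("GC"))           # probability of the observed sequence under the model
print(viterbi("GGCGCATATAT"))  # inferred hidden states
```

For long sequences these products underflow, so practical implementations work with log probabilities or scaling factors.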


12. Explain the β-barrel!


a. Closed barrel:
The closed barrel has a simple structure: each successive β-strand is added
next to the previous β-strand until the last one is joined to the first β-strand
by hydrogen bonds. The strands are antiparallel and connected by hairpins.
They are often hydrophilic inside and hydrophobic outside.



b. Jelly roll barrel:
The jelly roll structure consists of 8 β-strands arranged in two four-stranded
antiparallel β-sheets that pack together across a hydrophobic interface.



c. TIM barrel:
The TIM barrel is a conserved protein fold consisting of 8 α-helices and
8 parallel β-strands that alternate along the peptide backbone.


10. Explain 3 non-experimental methods to get protein models!
Why do they work?


a. Homology modelling:
Homology modelling relies on previous knowledge and the structure is based
on sequence homology. If proteins share a high enough sequence similarity,
they are likely to have a similar 3D-structure. The production of an all-atom
model is based on alignment with template proteins:
• Search for homologous proteins in the database
• Align sequences
• Determine structurally conserved regions
• Determine coordinates

b. Protein threading:
Protein threading predicts the fold of an unknown protein sequence by fitting
the sequence onto the known folds in a structural database and selecting the
best-fitting model.

c. Ab initio structure prediction:
It is based on a single query sequence and uses the relative propensity of
each amino acid to occur in a certain secondary structure element (scores
derived from crystal structures). Statistical programs then predict the
secondary structure elements.
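The propensity idea can be sketched as a sliding-window average along the query sequence. The propensity values below are hypothetical round numbers, not a published scale, and the window size and the threshold of 1.0 are assumptions; real methods combine several propensity scales with additional rules.

```python
# Hypothetical helix propensities (> 1 favours helix, < 1 disfavours it); illustration only.
HELIX_PROPENSITY = {"A": 1.4, "E": 1.5, "L": 1.2, "M": 1.4, "K": 1.2,
                    "G": 0.6, "P": 0.6, "S": 0.8, "N": 0.7}

def helix_profile(seq, window=5, default=1.0):
    """Average propensity in a window centred on each residue (shorter at the ends)."""
    half = window // 2
    scores = [HELIX_PROPENSITY.get(aa, default) for aa in seq]
    profile = []
    for i in range(len(seq)):
        segment = scores[max(0, i - half): i + half + 1]
        profile.append(sum(segment) / len(segment))
    return profile

def predict_helix(seq, window=5, threshold=1.0):
    """'H' where the windowed average exceeds the threshold, '-' elsewhere."""
    return "".join("H" if s > threshold else "-" for s in helix_profile(seq, window))

print(predict_helix("GPSNAELMKAELMKGPSNG"))
```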
