Access free flashcards, summaries, practice exercises, and past exams for your Introduction to bioinformatics course at the Universität für Bodenkultur Wien.
18. What is the molecular clock hypothesis and how is it used?
The hypothesis states that molecules (DNA or protein sequences) evolve at constant
rates, so the number of accumulated mutations is proportional to evolutionary time.
It is used to estimate when speciation or mutation events occurred, using
DNA/protein sequence differences together with fossil evidence. The problem is that
uniform evolutionary rates are rarely found because of:
• Changing generation times
• Population size
• Species-specific differences
• Evolving functions of the encoded protein
• Changes in the intensity of natural selection
The “strict” clock assumes perfectly constant rates of evolution, whereas the
“relaxed” clock allows different evolutionary rates on different branches.
Calibration: Individual molecular clocks can be tested for accuracy, but they need to
be calibrated against material evidence, such as fossils. Over long time spans,
estimates can be off by 50% or more.
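The proportionality between accumulated mutations and time can be sketched as a small calculation. This is a minimal illustration of a strict clock only; the function names and all numbers are made up for the example, and the factor 2 assumes that both lineages accumulate changes independently since their split.

```python
def calibrate_rate(substitutions_per_site: float, fossil_age_years: float) -> float:
    """Estimate the substitution rate per site per year from a dated split.

    The factor 2 reflects that both lineages have been accumulating
    changes since they diverged (strict-clock assumption).
    """
    return substitutions_per_site / (2.0 * fossil_age_years)


def divergence_time(substitutions_per_site: float, rate: float) -> float:
    """Estimate the time since two lineages split, given a calibrated rate."""
    return substitutions_per_site / (2.0 * rate)


# Hypothetical calibration: 0.10 substitutions/site between two species
# whose split is dated to 50 million years ago by a fossil.
rate = calibrate_rate(0.10, 50e6)

# Hypothetical query: another species pair differs by 0.04 substitutions/site.
estimated_years = divergence_time(0.04, rate)
print(f"rate = {rate:.2e} per site per year")
print(f"estimated divergence = {estimated_years:.2e} years ago")
```

A relaxed clock would replace the single `rate` with branch-specific rates, which this sketch does not attempt.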
2. Explain the two experimental methods to determine protein
structure! What parameters are measured? How do we get the
structure?
a. X-Ray Crystallography:
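Proteins need to be grown into large crystals, in which the molecules are held in
fixed positions. When the crystal is exposed to X-rays, they are deflected by the
electron clouds surrounding the atoms, producing a regular diffraction pattern. The
diffraction pattern is converted into an electron density map by Fourier
transformation, and the atomic model is built into this map. Parameters: the
intensities of the diffraction spots are measured; the phases are not measured
directly and have to be determined separately (the phase problem).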
b. NMR Spectroscopy:
It is based on detecting the spin states of atomic nuclei in a magnetic field.
Protein samples are labelled with the stable (non-radioactive) isotopes 13C and 15N.
Radio-frequency radiation induces transitions between nuclear spin states, and
the resulting radio signals are recorded and interpreted. Parameters: the signals
reveal which labelled atoms are close to each other; these distances are used as
restraints to compute the structure.
25. What are the 4 amino acids that are found in the core of
globular proteins?
• Leucine → participates in hydrophobic interactions
• Isoleucine → participates in hydrophobic interactions
• Methionine → non-reactive side chain
• Valine → non-reactive side chain
• Phenylalanine → aromatic interactions (π-stacking)
• Tyrosine → aromatic interactions (π-stacking)
38. What are substitution and scoring matrices and what are they
used for?
Scoring matrix:
Scoring matrices are used to determine the relative score for matching two
characters in a sequence alignment. The scores are usually log-odds of the likelihood
that the two characters are derived from a common ancestral character. (→ PAM)
Substitution matrix:
A substitution matrix is a collection of scores for aligning nucleotides or amino acids
with one another. These scores generally represent the relative ease with which one
nucleotide or amino acid may mutate into or substitute for another. They are used to
measure similarity in sequence alignments. (→ BLOSUM)
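The log-odds idea behind such scores can be sketched as follows. This is a simplified illustration, not the exact PAM or BLOSUM construction: the function name, the frequencies, and the scale factor of 2 (half-bit units) are assumptions chosen for the example.

```python
import math


def log_odds_score(observed_pair_freq: float,
                   freq_a: float,
                   freq_b: float,
                   scale: float = 2.0) -> int:
    """Rounded log-odds score for aligning residues a and b.

    observed_pair_freq: how often a and b are seen aligned in trusted alignments
    freq_a * freq_b:    how often they would be aligned by chance
    A positive score means the pair occurs more often than expected by chance.
    """
    expected = freq_a * freq_b
    return round(scale * math.log2(observed_pair_freq / expected))


# Hypothetical frequencies: both residues have a background frequency of 0.05,
# so the chance expectation for the pair is 0.0025.
conservative = log_odds_score(0.01, 0.05, 0.05)       # seen 4x more often than chance
unfavourable = log_odds_score(0.000625, 0.05, 0.05)   # seen 4x less often than chance
print(conservative, unfavourable)
```

Pairs that substitute easily for one another end up with positive scores, rare substitutions with negative ones, which is how the matrix measures similarity in an alignment.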
13. Is clustering a supervised or an unsupervised method?
Clustering is an unsupervised method. It does not assume predefined categories;
instead it identifies categories in the data according to similar patterns. In gene
expression analysis, for example, the result is clusters of genes with correlated profiles.
19. How is the tertiary protein structure defined and classified?
The packing and arrangement of the secondary structure elements form the tertiary
structure. Proteins are generally classified as globular or membrane proteins.
Globular proteins: exist in solvent, with hydrophilic residues on their surface,
which is energetically favourable
Membrane proteins: exist in membrane lipids and are stabilized through hydrophobic
interactions. Their exterior is hydrophobic and the transmembrane segments are
typically alpha-helices
Tertiary structure involves only interactions and bonds between side chains within
a single protein (e.g. coiled coil); these are mostly non-covalent, although covalent
disulfide bonds between cysteine side chains can also contribute.
6. Describe the classification of proteins based on secondary structure!
To understand the relationships among structures, a hierarchical classification system is built:
a. Remove redundancy from databases
b. Separate structurally distinct domains within the structure (manually or with
algorithms)
c. Group proteins/domains of similar structure and cluster them
According to Levitt and Chothia, domain structures can be classified into 3 main
classes:
• α-domains: core built up exclusively from α-helices
• β-domains: usually 2 antiparallel β-sheets packed against each other
• α/β-domains: combinations of β-α-β motifs; parallel β-sheets surrounded by
α-helices
Two main databases:
a) SCOP:
• Based on manual examination of structures
• Grouped in: classes, folds, superfamilies and families
• Classes consist of folds with similar core structure
b) CATH:
• Proteins are classified based on an automatic structural alignment
program and manual comparison
• Grouped in: class, architecture, topology, homologous superfamily,
homologous family
11. Explain the Ramachandran plot and how it can be used to
evaluate models!
The Ramachandran plot is a two-dimensional plot that visualizes the energetically
stable combinations of the backbone angles φ and ψ for each of the amino acids in a
protein structure.
Rotation of the polypeptide backbone is limited to these two angles (because the
peptide bond is planar). The plot draws φ and ψ against each other, maps the entire
conformational space of a peptide, and shows allowed and disallowed regions. φ is
not allowed to be 0 degrees because two oxygen atoms would bump into each
other. If φ and ψ are both 0 degrees, a hydrogen and an oxygen atom would bump
into each other. The most stable conformation is at 180 degrees (fully extended chain).
To evaluate a model, the φ/ψ pair of every residue is plotted: residues that fall into
disallowed regions point to likely errors in the model.
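Using the plot for model evaluation can be sketched as a check of each residue's two backbone angles (called phi and psi here) against allowed regions. The rectangular regions below are rough, illustrative boxes around typical helix and sheet angles; they are an assumption made for this example, not published contours, and real validation tools use empirically derived regions.

```python
# ASSUMPTION: rough illustrative boxes (phi_min, phi_max, psi_min, psi_max)
# in degrees, not the empirical contours used by real validation software.
ROUGH_REGIONS = {
    "beta": (-180.0, -45.0, 90.0, 180.0),      # extended strands
    "alpha_R": (-160.0, -20.0, -120.0, 20.0),  # right-handed helix
    "alpha_L": (30.0, 90.0, 0.0, 90.0),        # left-handed helix
}


def classify(phi: float, psi: float):
    """Return the name of the rough region containing (phi, psi), or None."""
    for name, (phi_min, phi_max, psi_min, psi_max) in ROUGH_REGIONS.items():
        if phi_min <= phi <= phi_max and psi_min <= psi <= psi_max:
            return name
    return None


def outlier_fraction(angles) -> float:
    """Fraction of residues whose (phi, psi) pair lies outside all rough regions."""
    outliers = sum(1 for phi, psi in angles if classify(phi, psi) is None)
    return outliers / len(angles)


# Hypothetical model with four residues; (0, 0) is the clash described above.
model_angles = [(-60.0, -45.0), (-120.0, 130.0), (0.0, 0.0), (60.0, 45.0)]
print([classify(phi, psi) for phi, psi in model_angles])
print(f"outlier fraction: {outlier_fraction(model_angles):.2f}")
```

A model with a high outlier fraction would deserve a closer look at the flagged residues; glycine and proline usually need separate treatment, which this sketch ignores.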
16. What is k-means clustering?
The k-means algorithm is an iterative procedure and depends on the (randomly)
chosen starting values.
Algorithm:
• Initially (at step 0), choose k observations at random; these points represent
the initial cluster centroids
• Then calculate the distance of each object to all centroids and assign it to
the cluster with the nearest centroid
• When all objects have been assigned, recalculate the centroids of the k
clusters
• Repeat the last two steps until a maximum number of iterations is reached or
the centroids no longer change
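The steps above can be sketched in plain Python. This is a minimal illustration using squared Euclidean distance; the optional `init` parameter and the toy data are assumptions added so that the example is reproducible, since the result otherwise depends on the random starting values.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100, init=None, seed=0):
    """Cluster points into k groups; returns (centroids, assignment)."""
    rng = random.Random(seed)
    # Step 0: pick k observations at random as initial centroids (unless given).
    centroids = list(init) if init is not None else rng.sample(points, k)
    assignment = []
    for _ in range(max_iter):
        # Assign every object to the cluster with the nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Recalculate each centroid as the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        # Stop when the centroids no longer change.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, assignment


# Toy data: two well-separated groups of three points each.
data = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(data, k=2, init=[data[0], data[3]])
print(labels)
print(centroids)
```

Running the function several times with different seeds and comparing the results is a common way to deal with the dependence on the starting values.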
k-means clustering:
• Classification of data through a single-step partition
• Divisive approach (all data start in a single cluster, which is then divided into
smaller groups according to similarity)
5. What are the advantages of HMM and PSSM compared to
regular expression?
Regular expression:
A regular expression is a pattern notation that describes a motif, e.g. in PROSITE
format. It describes a motif in a way that is more informative than a consensus
sequence. It is used to search PROSITE for proteins with a matching sequence and to
discover distant homologs in sequence databases.
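As a sketch of how such a motif is matched, a PROSITE-style pattern can be translated into a Python regular expression. The converter below is a hypothetical helper that covers only a subset of the notation (x, [..], {..}, repeat counts, and the < and > anchors); the test sequences are made up.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple PROSITE-style pattern into Python regex syntax."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        element = element.replace("{", "[^").replace("}", "]")  # excluded residues
        element = element.replace("(", "{").replace(")", "}")   # repeat counts
        element = element.replace("x", ".")                     # any residue
        element = element.replace("<", "^").replace(">", "$")   # sequence ends
        parts.append(element)
    return "".join(parts)


# N-glycosylation site pattern: N, not P, S or T, not P.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)

hit = re.search(regex, "AANGTAA")   # hypothetical sequence containing the motif
miss = re.search(regex, "AANPTAA")  # proline at the second position breaks it
print(hit.group(), hit.start(), miss)
```

The match is all-or-nothing: a sequence either fits the pattern or it does not, which is the rigidity that PSSMs and HMMs relax.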
Position-specific-scoring-matrix (PSSM):
A PSSM statistically represents a multiple sequence alignment by recording the
frequency of each residue at each position of the alignment; it is then used to
calculate how well a new sequence fits the PSSM. Profiles are PSSMs with gap
information.
Advantage: A PSSM gives the likelihood that a new sequence fits the alignment, and it
is more flexible than a regular expression.
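A PSSM can be sketched as one log-odds column per alignment position. The example below is a minimal illustration for ungapped DNA; the toy alignment, the uniform background frequency, and the pseudocount of 1 are assumptions made for this sketch.

```python
import math
from collections import Counter


def build_pssm(aligned, alphabet="ACGT", pseudocount=1.0):
    """Build a log-odds PSSM from equal-length, ungapped aligned sequences."""
    background = 1.0 / len(alphabet)  # ASSUMPTION: uniform background frequencies
    pssm = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = len(column) + pseudocount * len(alphabet)
        pssm.append({
            symbol: math.log2(((counts[symbol] + pseudocount) / total) / background)
            for symbol in alphabet
        })
    return pssm


def score(pssm, sequence):
    """Sum the position-specific scores of a sequence of the same width."""
    return sum(column[symbol] for column, symbol in zip(pssm, sequence))


def best_window(pssm, sequence):
    """Return (start, score) of the best-scoring window in a longer sequence."""
    width = len(pssm)
    starts = range(len(sequence) - width + 1)
    best = max(starts, key=lambda i: score(pssm, sequence[i:i + width]))
    return best, score(pssm, sequence[best:best + width])


# Toy alignment of four hypothetical motif instances.
alignment = ["TATAAT", "TATAAT", "TACAAT", "TATGAT"]
pssm = build_pssm(alignment)
print(round(score(pssm, "TATAAT"), 2), round(score(pssm, "GCGCGC"), 2))
print(best_window(pssm, "GGCTATAATGC"))
```

Unlike a regular expression, every candidate sequence receives a graded score, so near-matches are ranked rather than rejected.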
Hidden Markov Model (HMM):
The HMM is based on the Markov model, which describes a series of events occurring
one after another in a chain, where each event determines the probability of the next
event. By combining these chains we can include states that are not observed, i.e. hidden.
Observed and hidden states: the sequence characters are observed states, while states
that have to be inferred, e.g. exon, intron, protein domain, are hidden states. There are
3 states possible at each position of a multiple sequence alignment: match state,
insertion state and deletion state.
Advantage: Hidden states can be included and a trained HMM can be used to assess
how well an unknown sequence matches the model. It is also more flexible than
regular expression and gives statistical information about the probability of
sequences.
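Inferring hidden states from an observed sequence can be sketched with the Viterbi algorithm on a toy two-state model. All probabilities below are invented for illustration (a GC-rich "exon" state E and an AT-rich "intron" state I); this is not a trained model and not a full profile HMM with match, insertion and deletion states.

```python
import math


def viterbi(sequence, states, start, trans, emit):
    """Return the most probable hidden-state path and its log-probability."""
    # Best log-probability of any path ending in each state at position 0.
    scores = [{s: math.log(start[s]) + math.log(emit[s][sequence[0]]) for s in states}]
    backpointers = []
    for symbol in sequence[1:]:
        row, pointers = {}, {}
        for s in states:
            # Best previous state to come from when entering state s.
            prev = max(states, key=lambda p: scores[-1][p] + math.log(trans[p][s]))
            row[s] = scores[-1][prev] + math.log(trans[prev][s]) + math.log(emit[s][symbol])
            pointers[s] = prev
        scores.append(row)
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: scores[-1][s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, scores[-1][last]


# Hypothetical model parameters.
states = ["E", "I"]
start = {"E": 0.5, "I": 0.5}
trans = {"E": {"E": 0.9, "I": 0.1}, "I": {"E": 0.1, "I": 0.9}}
emit = {
    "E": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "I": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path, log_prob = viterbi("GGCCAATT", states, start, trans, emit)
print("".join(path), round(log_prob, 2))
```

The returned log-probability is the kind of statistical information a regular expression cannot provide: it says how well the sequence fits the model, not only whether it fits.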
12. Explain the β-barrel!
a. Closed barrel:
The closed barrel has a simple structure: each successive β-strand is added
next to the previous β-strand until the last one is joined by hydrogen
bonds to the first β-strand. The strands are antiparallel and connected by
hairpins. They are often hydrophilic inside and hydrophobic outside.
b. Jelly roll barrel:
The jelly roll structure consists of 8 β-strands arranged in two four-stranded
antiparallel β-sheets that pack together across a hydrophobic interface.
c. TIM barrel:
The TIM barrel is a conserved protein fold consisting of 8 α-helices and
8 parallel β-strands that alternate along the peptide backbone.
10. Explain 3 non-experimental methods to get protein models!
Why do they work?
a. Homology modelling:
Homology modelling relies on previous knowledge: the structure is predicted from
sequence homology. If proteins share a high enough sequence similarity,
they are likely to have a similar 3D structure. An all-atom
model is produced from an alignment with template proteins:
• Search for homologous proteins in the database
• Align sequences
• Determine structurally conserved regions
• Determine coordinates
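Whether a template is similar enough is usually judged from the alignment. A minimal sketch of one such measure, percent identity over aligned (non-gap) columns, is shown below; the function name and sequences are hypothetical, and other definitions of identity use a different denominator.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of identical residues among columns without gaps."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences have a residue.
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)


# Hypothetical query/template alignments.
print(round(percent_identity("ACDEFG", "ACDQFG"), 1))  # one mismatch in six columns
print(percent_identity("AC-EFG", "ACDEFG"))            # gap column is ignored
```

The higher this value for a query/template pair, the more reasonable it is to copy the template's coordinates for the structurally conserved regions.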
b. Protein threading:
Protein threading predicts the fold of an unknown protein sequence by fitting
the sequence to the structures in a structural database and selecting the best-fitting fold.
c. Ab initio structure prediction:
It is based on a single query sequence and measures the relative propensity of
each amino acid to belong to a certain secondary structure element (scores
derived from crystal structures). Statistical programs then predict the
secondary structure elements.