Identification of novel proteins not present in public databases

Question:

My protein was analyzed using Alphalyse’s Pick ‘n Post protein identification service. The sample gave very good MS data, but was not identified because it is a novel protein not yet present in the NCBI database. Can you provide me with the raw MS peptide fingerprint spectra and MS/MS peptide sequence data for additional database searching?

Answer:

For unidentified proteins with good MS data, Alphalyse can provide Mascot Generic Format files (MGF files) containing the MS data that was used for the Mascot database search. The MGF files contain the peptide masses and the peptide fragment masses obtained for each sample in a format that can be uploaded to the Mascot database search software.

The MGF file will enable you to repeat the database search:

A) on your own in-house Mascot search engine with proprietary databases,

B) on the public Mascot server http://www.matrixscience.com/ when more protein sequences are added in the future, and

C) on other database search programs using different search algorithms.
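For readers who want to inspect an MGF file before uploading it, the sketch below reads spectra from MGF text. It assumes the commonly used layout in which each spectrum sits between BEGIN IONS and END IONS, with KEY=value parameter lines (for example TITLE, PEPMASS, CHARGE) followed by "m/z intensity" peak lines. The function name parse_mgf and the example values are illustrative only and are not part of any Alphalyse or Mascot tool; real files may contain extra header lines, which this sketch ignores.

```python
from typing import Dict, Iterable, List


def parse_mgf(lines: Iterable[str]) -> List[Dict]:
    """Collect the spectra found between BEGIN IONS / END IONS markers.

    Assumption: parameter lines look like KEY=value and peak lines look
    like "m/z intensity". Lines outside a spectrum block are ignored.
    """
    spectra: List[Dict] = []
    current = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        if line == "BEGIN IONS":
            current = {"params": {}, "peaks": []}
        elif line == "END IONS":
            if current is not None:
                spectra.append(current)
            current = None
        elif current is not None:
            if "=" in line and not line[0].isdigit():
                key, value = line.split("=", 1)
                current["params"][key.upper()] = value
            else:
                fields = line.split()
                mz = float(fields[0])
                intensity = float(fields[1]) if len(fields) > 1 else 0.0
                current["peaks"].append((mz, intensity))
    return spectra


# Made-up example spectrum, for illustration only.
example = """BEGIN IONS
TITLE=example spectrum
PEPMASS=842.51
CHARGE=2+
175.119 1200
276.155 850.5
END IONS
""".splitlines()

spectra = parse_mgf(example)
print(len(spectra), spectra[0]["params"]["PEPMASS"], spectra[0]["peaks"])
```

To read a real file, pass an open file handle instead of the example lines, for instance parse_mgf(open("sample.mgf")), where sample.mgf is a placeholder name.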

Please contact Alphalyse at info@alphalyse.com for a price quotation on the MGF file conversion for your samples, and to obtain a short tutorial on how to perform the MGF Mascot database searches. Learn more about protein identification using mass spectrometry here.


Databases and organisms available for mass spec protein identification service

Question:

I would like to identify proteins from an organism whose genome has just been sequenced. How many proteins are available in the database you use, and from what organisms?

Answer:

The database used for Pick ‘n Post protein identification is the public nr database (nrdb) from NCBI. The database is a non-redundant (nr) compilation of all known protein sequences from GenBank CDS translations + PDB + SwissProt + PIR + PRF. The database is constantly updated with new sequences and downloaded regularly to our in-house Mascot server. Currently, the database contains more than 9 million protein sequences.

To find out how many protein sequences are included for your organism, go to the NCBI Entrez website and select the specific organism.

The NCBI Entrez Taxonomy Homepage:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=taxonomy

Or search the database by organism name in Protein search:

http://www.ncbi.nlm.nih.gov/sites/entrez?db=Protein&itool=toolbar


Useful Protein Molecular Weight Calculator


The rule of thumb for conversion of microgram amounts into picomole amounts at different protein molecular weights and peptide molecular weights is:

 1000/Mw of protein in kDa = pmol/ug 

Protein size (kDa) | Microgram | Picomole | Microgram | Picomole
1                  | 1         | 1000     | 1         | 1000
10                 | 10        | 1000     | 1         | 100
20                 | 20        | 1000     | 1         | 50
50                 | 50        | 1000     | 1         | 20
100                | 100       | 1000     | 1         | 10
150                | 150       | 1000     | 1         | 6.7
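The rule of thumb above can be written as a pair of small helper functions. This is a minimal sketch; the function names are illustrative and not part of any Alphalyse tool.

```python
def micrograms_to_picomoles(micrograms: float, mw_kda: float) -> float:
    """Convert a protein amount in micrograms to picomoles.

    Uses the rule of thumb: pmol per ug = 1000 / (Mw in kDa).
    """
    if mw_kda <= 0:
        raise ValueError("Molecular weight must be positive")
    return micrograms * 1000.0 / mw_kda


def picomoles_to_micrograms(picomoles: float, mw_kda: float) -> float:
    """Inverse conversion: picomoles back to micrograms."""
    if mw_kda <= 0:
        raise ValueError("Molecular weight must be positive")
    return picomoles * mw_kda / 1000.0


# 1 ug of a 50 kDa protein
print(micrograms_to_picomoles(1, 50))    # 20.0
# 1000 pmol of a 20 kDa protein
print(picomoles_to_micrograms(1000, 20)) # 20.0
```

Both printed values match the corresponding rows of the table.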

Useful Links: 

Protein molar calculator and formulas:

http://www.promega.com/biomath/calc08.htm

Coding capacity of DNA, conversion of DNA base pairs to approximate protein Mw:

http://www.promega.com/biomath/calc09.htm

Compute pI/Mw tool from protein amino acid sequence or database accession number:

http://www.expasy.ch/tools/pi_tool.html

Protein Mw determination by mass spectrometry:

http://www.pick-n-post.com/default.asp?ID=50010300070


Common protein contaminants observed in mass spectrometric protein identification

Questions:

  1. Is the keratin identified by the Pick ‘n Post protein ID service in my 1D gel band a contamination or a relevant human protein from the sample?
  2. How is it possible that only keratin was identified in the sample although the protein band was clearly visible?
  3. Does this mean that keratin was the only protein in the sample, or is the identification of other proteins impossible when even small amounts of keratin are present?
  4. The size of my protein and of the band cut from the gel was 50-55 kDa and keratin is 66 kDa, so how did the keratin get into the sample?
  5. Is there even the slightest possibility that something happened in your lab?

These are just a few things that come to mind as I think about what might have gone wrong. It would be nice if you could help me with these concerns before I start the whole process once again.

Answers:

When keratin is identified in a gel band it might be a real protein purified from the cells, but more frequently it is a contamination that occurred somewhere during protein purification and sample processing. Keratin is present in all lab dust, which contains dead human skin cells and small fibres from your woollen sweater!

If the keratin contamination occurred prior to the gel electrophoresis, the keratin is observed as protein bands in the gel around 55 kDa and 65 kDa. This keratin is identified with a high score and good sequence coverage, and often several keratin types are identified. Your gel band was observed at Mw 55 kDa, so either this is a contamination that happened before the electrophoresis, or it was purified from the cells.

Keratin contamination can also occur after the electrophoresis during gel staining, or during gel scanning and spot excision if dust comes into contact with the sample. In such cases, the keratin amount is usually lower and only a few keratin peptides are observed in the mass spectra. The main protein component in the gel band can still be identified.

At Alphalyse we take extreme care to work in a clean dust-free environment. All samples are handled in 96-well plates together with quality control standards, and contamination from our lab is almost never observed.

The best advice to avoid keratin contamination is to avoid dust in the lab in general, and to always work with gloves. Electrophoresis and staining equipment should be cleaned and free of dust, and the lids on pipette tip boxes should always be closed.

Additional References:

Common protein and keratin contaminants observed in proteomics experiments are given in the references below.

The Common Repository of Adventitious Proteins, cRAP, is a list of proteins commonly found in proteomics experiments that are present either by accident or through contamination of protein samples.

The following keratins were observed in the publication cited below:

keratin 1 (SWISS-PROT: P04264); similar to keratin 1 (ENSP00000301445); keratin 2a (SWISS-PROT:P35508); similar to keratin 2a (ENSP00000252247); keratin 5 (ENSP00000252242); keratin, type II cytoskeletal 6F (SWISS-PROT:P48669); keratin 9 (ENSP00000246662); similar to keratin, type I cytoskeletal 10 (SWISS-PROT: P13645); keratin 10 (TREMBL: Q14664); keratin 14 (SWISS-PROT: P02533); keratin 16 (ENSP00000301653).

Large-Scale Proteomic Analysis of the Human Spliceosome. Juri Rappsilber, Ursula Ryder, Angus I. Lamond, and Matthias Mann. Genome Res. 2002. 12: 1231-1245 http://genome.cshlp.org/content/12/8/1231.full

Common Peptide Contaminants Observed by Nanoelectrospray MS in Low Level Sequencing of Gel-Separated Proteins. Jens S. Andersen, Bernhard Küster, Alexandre Podtelejnikov, Ejvind Mørtz, and Matthias Mann. www.pil.sdu.dk/files/1143.doc

This reference comes with a list of contaminating peptides (peptide mass, sequence, Y-ion fragment masses) from keratins, and trypsin autolysis peptides from Roche bovine trypsin and Promega modified trypsin: www.pil.sdu.dk/files/ContaTableASMS1999.doc


How do I assess the stability of a protein drug substance and formulated drug product?

After protein purification, an important issue is the stability of a protein drug substance and formulated drug product. Stability assays of protein vaccines formulated with alum adjuvants are not straightforward to perform because most standard analytical methods for protein characterization cannot be easily applied to proteins immobilized on alum.

Alphalyse performed a stability assay on a protein vaccine. Vaccine vials were incubated at the normal storage temperature of 4 °C and, in an accelerated stability study, at 37 °C. Samples were taken at different storage intervals for stability measurements. For analysis of protein degradation products, the protein was eluted from the alum adjuvant by treatment with SDS-PAGE sample buffer. The samples were analyzed by electrophoresis, and the gels were stained with a sensitive silver staining method compatible with MS analysis.

The figure below shows a silver-stained gel of the drug product stored for nine months at 37 °C in the accelerated stability study. Degradation protein fragment bands were cut out from the silver-stained gel and analyzed by in-gel trypsin digestion and MS peptide mapping. The protein sequence coverage map shows the peptide maps obtained for the individual protein fragments. Those peptide maps confirm that the proteins are drug substance degradation products. The sequence coverage maps show that bands 1 and 2 are protein fragments missing the C-terminal region, bands 3 and 4 are also missing the N-terminal region, and bands 5 and 6, found around 6 kDa in the gel, contain only the middle part of the sequence. Thus, the combination of SDS-PAGE and MS peptide mapping provided very valuable information (about protein degradation patterns, in this case) that is not easily obtained by other analytical techniques.

 


Figure: Stability analysis of alum-formulated protein stored at 37 °C for nine months. The protein was eluted from the alum, and 10 μg were analyzed by 1D SDS-PAGE. The break-down products (Bands 1–6) were cut out from the silver-stained gel together with intact protein (Band 0) and analyzed by MALDI MS peptide mapping. Peptide masses obtained from the degradation products were correlated to the protein sequence using GPMAW software from Lighthouse Data (www.gpmaw.com), and the results are shown in the protein sequence coverage map.

More about Protein Stability Assays here


How to optimize electroblotting conditions onto PVDF membrane for N-terminal Edman sequencing

How do I optimize electroblotting conditions onto PVDF membrane for N-terminal Edman sequencing? Our 12 kDa protein separates well on 1D SDS PAGE but does not appear on the PVDF blot for N-terminal sequencing. We are using the NuPAGE Transfer buffer with 20% MeOH from Invitrogen. What can we do to optimize the blotting conditions?

Answer: 

The blotting buffer generally works well for most proteins. However, some proteins may show poor electroblotting efficiency, and the choice of PVDF membrane, blotting buffer and blotting conditions should be optimized. During optimization it is an advantage to stain the gel after blotting and to use 2 layers of PVDF. Some large proteins (above 80 kDa) may be difficult to get out of the gel, and it can help to add 0.1% SDS to the buffer since SDS increases the mobility of the proteins. The same effect can be obtained by omitting MeOH from the buffer, because MeOH strips SDS from the protein.

Some small proteins (below 15 kDa) may move too quickly out of the gel and pass through the first PVDF membrane. In that case, SDS should not be used and the MeOH concentration should be increased to 20%. The gel can also be pre-soaked in blotting buffer for 5-10 minutes before blotting. Blotting buffers with a neutral pH (Tris-Glycine buffers) may be useful for very basic proteins with high isoelectric points. Basic proteins may be positively charged, in which case the PVDF membrane should be placed on the other side or on both sides of the gel. Glycine-containing buffers will give a high glycine yield in the first Edman cycle, so the PVDF membrane should be washed extensively after staining. Use this PVDF transfer protocol for N-terminal Edman Sequencing.
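The advice above can be summarized as a simple checklist. The sketch below is only an illustration of that decision logic: the function name blotting_suggestions is hypothetical, the 80 kDa and 15 kDa cut-offs are the ones quoted in the answer, and because the answer gives no pI threshold for "very basic" proteins, that is left as a flag the user sets.

```python
from typing import List


def blotting_suggestions(mw_kda: float, very_basic: bool = False) -> List[str]:
    """Return starting points for optimizing electroblotting onto PVDF.

    Thresholds are taken from the answer text; they are guidelines,
    not validated limits.
    """
    tips = [
        "While optimizing, stain the gel after blotting and use 2 layers of PVDF."
    ]
    if mw_kda > 80:
        # Large proteins may be difficult to get out of the gel.
        tips.append("Add 0.1% SDS to the blotting buffer, or omit MeOH.")
    elif mw_kda < 15:
        # Small proteins may pass through the first membrane.
        tips.append("Do not use SDS and use 20% MeOH in the blotting buffer.")
        tips.append("Pre-soak the gel in blotting buffer for 5-10 minutes.")
    if very_basic:
        # Assumption: the caller decides what counts as 'very basic'.
        tips.append(
            "Try a Tris-Glycine buffer and place PVDF on the other "
            "side or on both sides of the gel."
        )
    return tips


for tip in blotting_suggestions(12):
    print("-", tip)
```

For the 12 kDa protein in the question, this prints the general optimization tip followed by the two suggestions for small proteins.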


Mass Spectrometry for Host Cell Protein monitoring and control in pharmaceutical process development

Mass spectrometry based Host Cell Protein identification and quantification

Here the Mass Spec Host Cell Protein (HCP) assay is applied for monitoring product purity and consistency of the pre-clinical tox batch and 2 clinical batches of a recombinant vaccine protein.

The proteins were separated by 1D SDS PAGE and visualized by silver staining. The protein bands were digested with trypsin and the proteins identified by MALDI MS/MS peptide mapping and database searching. In total, 8 specific HCPs were identified.

The protein identities and information about physical properties, such as pI, Mw, and hydrophobicity can guide further protein purification and process development for phase III. An advantage compared to ELISA-based HCP assays is that the MS HCP assay provides the identities of the contaminating HCPs and does not require time-consuming development of antibodies.

Advantages

  •  Immediate solution
  •  Alternative to ELISA when an ELISA is not available
  •  Identity for each HCP
  •  Relative quantity of HCPs
  •  Complementary with Western blot
  •  Fast assay setup in 2 weeks

Applications

  •  PAT compliant
  •  Monitoring process development
  •  Comparison of pre-clinical tox batches and clinical batches

 Learn more about Mass Spec Host Cell Protein monitoring

MS Host Cell Protein Assay Principle


Figure 1: The drug substance [DS] from different cGMP batches (samples 1-3) was loaded on a 1D SDS-PAGE gel and stained with sensitive silver stain. The gel was overloaded with DS for detection of very small amounts of contaminating proteins. Protein bands were cut out and analyzed by MALDI mass spectrometry to identify contaminating host cell proteins. Eight specific HCPs were identified and marked HCP1-HCP8, as well as product-related variants, marked DS.

Reference:
Mortz, E. et al. “Proteomics technology applied to upstream and downstream process development of a protein vaccine”, Bioprocess International 6, p36-43, 2008.


Facts to Remember about Amino Acid Analysis of Proteins and Peptides

Amino acid analysis is a method to determine the absolute amounts of individual amino acids in a sample. The method can be applied to samples containing free amino acids, as well as to peptides and protein samples after hydrolysis into amino acids. Amino acid analysis can be used for determination of the relative composition of amino acids in a protein, for determination of the absolute amount of a protein or peptide if the amino acid sequence is known, and for purity estimation of a purified protein.

In amino acid analysis it should be noted that:

  • Serine and threonine are degraded slightly during acid hydrolysis, and recoveries can be 10% lower than expected.
  • Methionine can be oxidized during hydrolysis; usually less than 10% is oxidized.
  • Bonds between valine and isoleucine residues (Val-Val, Ile-Val, Val-Ile, Ile-Ile) are difficult to hydrolyse, and recoveries can be 5-15% lower than expected.
  • Glycine content is often higher than expected because it is a frequent contaminant due to its use in many buffers.
  • Known amounts of amino acid standards are analyzed to determine a compensation factor that corrects for differences in ninhydrin reactivity.
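When the amino acid sequence is known, the absolute protein amount can be estimated by dividing the measured amount of each amino acid by the number of times it occurs in the sequence and averaging the results. The sketch below illustrates this idea. As an assumption of this example (not a stated Alphalyse procedure), it leaves out the residues flagged in the notes above (Ser, Thr, Met, Val, Ile, Gly), and it only uses residues for which a measured value is supplied. Note that acid hydrolysis converts Asn and Gln into Asp and Glu, so those pairs cannot be used individually. The function name, sequence, and numbers are made up for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Dict

# Residues excluded in this sketch because the notes above flag their
# recoveries as less reliable (an assumption of this example).
LESS_RELIABLE = {"S", "T", "M", "V", "I", "G"}


def estimate_protein_pmol(sequence: str, measured_pmol: Dict[str, float]) -> float:
    """Estimate the protein amount (pmol) from amino acid analysis data.

    sequence: protein sequence in one-letter code.
    measured_pmol: measured amount of each amino acid, keyed by
    one-letter code.
    """
    counts = Counter(sequence.upper())
    estimates = [
        measured_pmol[aa] / n
        for aa, n in counts.items()
        if aa in measured_pmol and aa not in LESS_RELIABLE
    ]
    if not estimates:
        raise ValueError("No usable amino acids to base the estimate on")
    return mean(estimates)


# Made-up example: a short peptide and hypothetical measured amounts.
amount = estimate_protein_pmol(
    "AALKGS", {"A": 200.0, "L": 100.0, "K": 100.0, "G": 150.0, "S": 90.0}
)
print(amount)  # 100.0
```

In the example, Ala (2 residues, 200 pmol), Leu and Lys (1 residue, 100 pmol each) all point to 100 pmol of peptide, while the elevated Gly and reduced Ser values are ignored.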

Read more about Amino Acid Analysis here


1D SDS PAGE versus 2D PAGE. I’m considering 2D electrophoresis, any recommendations?

Question:

I’m interested in isolating and identifying cell surface receptor proteins, typically with high Mw, glycosylated and with transmembrane domains. I’m considering 2D electrophoresis, any recommendations?

Answer:

2D PAGE is the classical approach to separate and visualize many proteins from complex proteomics samples. However, 2D PAGE has some known disadvantages: hydrophobic membrane proteins precipitate during IEF and are almost never observed; only proteins within the pI range of the gel (typically pH 4-7 or 3-10) are observed; and the Mw range of proteins in the gel is approximately 10-130 kDa.

For large and hydrophobic proteins it is therefore better to use 1D SDS PAGE, because the proteins can be dissolved in the 1D SDS PAGE buffer containing 0.1% SDS, the gels have no pI limits, and the Mw range can go up to 1,000 kDa.

Another possibility is to use in-solution digestion of the protein mixture and protein identification by LC-MS/MS.

Some relevant references:
Albert Sickmann et al. PNAS 2003 vol. 100, 23, 13207-13212. The proteome of Saccharomyces cerevisiae mitochondria. http://www.pnas.org/content/100/23/13207.full.pdf+html

Jens S. Andersen et al. Current Biology, vol.12, 1, 2002, 1-11 Directed Proteomic Analysis of the Human Nucleolus.

Wiśniewski JR et al. Nat Methods. 2009 May;6(5):359-62. Epub 2009 Apr 19. Universal sample preparation method for proteome analysis. http://www.ncbi.nlm.nih.gov/pubmed/19377485


How should I prepare my samples for 2D electrophoresis, and how much protein should I load to get protein identification by mass spec?

How should I prepare my samples for 2D electrophoresis, and how much protein should I load to get protein ID by mass spec?

Answer:

Protein samples for 2D PAGE should not contain high salt concentrations, ionic detergents like SDS, or cellular debris and DNA from cell lysis, as these will disturb isoelectric focussing (IEF) and cause spot streaking in the gel.

A general 2D gel sample preparation protocol:

1. Use the lysis procedure and buffer that you have used before.

2. Clean up and concentrate the sample, for example using the GE 2D clean-up kit:

http://www6.gelifesciences.com/aptrix/upp01077.nsf/Content/Products?OpenDocument&parentid=80648451&moduleid=164990

3. Dissolve the proteins in 2D lysis buffer (8.9 M urea, 2% Triton X-100, 0.5% IPG buffer, 0.13 M DTT and 8 mM PMSF). This buffer is directly compatible with 2D electrophoresis.

4. Determine the protein concentration, for example using the 2D Quant kit:

http://www6.gelifesciences.com/aptrix/upp01077.nsf/Content/Products?OpenDocument&parentid=80648356&moduleid=164989

The protein concentration should be approximately 1-20 ug/ul.

The protein load on each gel should be approximately 50-300 microgram for complex samples with many proteins, and 5-50 microgram for pure protein samples. For protein identification by mass spectrometry, the gels can be stained by standard Coomassie staining methods, or by more sensitive silver staining methods that have been optimized for MS protein ID.
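Once the concentration is known, the sample volume needed for a given protein load is simply the load divided by the concentration. The sketch below shows this arithmetic together with a check against the load ranges quoted above. The function names and the "complex"/"pure" labels are illustrative choices made for this example.

```python
# Suggested protein loads per gel (microgram), as quoted in the text.
LOAD_RANGES_UG = {"complex": (50.0, 300.0), "pure": (5.0, 50.0)}


def load_volume_ul(target_ug: float, concentration_ug_per_ul: float) -> float:
    """Volume (ul) of sample that contains the target protein amount."""
    if concentration_ug_per_ul <= 0:
        raise ValueError("Concentration must be positive")
    return target_ug / concentration_ug_per_ul


def load_in_suggested_range(target_ug: float, sample_type: str) -> bool:
    """Check a planned load against the suggested range for the sample type."""
    low, high = LOAD_RANGES_UG[sample_type]
    return low <= target_ug <= high


# Example: loading 100 ug of a complex sample at 5 ug/ul
print(load_volume_ul(100.0, 5.0))                 # 20.0
print(load_in_suggested_range(100.0, "complex"))  # True
```

A 100 microgram load would fall outside the suggested range for a pure protein sample, so load_in_suggested_range(100.0, "pure") returns False.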

Find more information at:

2-D Electrophoresis, Principles and Methods, including troubleshooting guide from GE Healthcare:

http://www6.gelifesciences.com/aptrix/upp01077.nsf/Content/2d_electrophoresis~2delectrophoresis_handbook

Alphalyse 2D gel services:

http://www.alphalyse.com/default.asp?ID=50010300051

Silver staining optimized for protein identification by mass spectrometry:

http://www.pick-n-post.com/default.asp?ID=50010300037
