Amino Acids and Protein Sequences

Each protein or peptide consists of a linear sequence of amino acids. By convention, this primary structure is written from the amino-terminal (N) end to the carboxyl-terminal (C) end. The sequence may be determined by sequencing the protein directly or inferred from the DNA sequence of the gene that encodes it.

Knowing the amino acid sequence of a protein or peptide helps to characterize it, identify it in a sample and categorize its post-translational modifications. The process of determining the amino acid sequence is known as protein sequencing.


The sequence of a protein is usually written as a string of letters, in the order of the amino acids from the amino-terminal to the carboxyl-terminal end of the protein. Either a one-letter or a three-letter code may be used to represent each amino acid in the sequence.

There are 20 standard amino acids found in proteins, each of which can be represented by a three-letter or one-letter code as follows:

  • Alanine (Ala, A)
  • Arginine (Arg, R)
  • Asparagine (Asn, N)
  • Aspartic acid (Asp, D)
  • Cysteine (Cys, C)
  • Glutamic acid (Glu, E)
  • Glutamine (Gln, Q)
  • Glycine (Gly, G)
  • Histidine (His, H)
  • Isoleucine (Ile, I)
  • Leucine (Leu, L)
  • Lysine (Lys, K)
  • Methionine (Met, M)
  • Phenylalanine (Phe, F)
  • Proline (Pro, P)
  • Serine (Ser, S)
  • Threonine (Thr, T)
  • Tryptophan (Trp, W)
  • Tyrosine (Tyr, Y)
  • Valine (Val, V)
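
As an illustration of the two notations, the short Python sketch below converts a sequence between the three-letter and one-letter codes listed above. The function names and the hyphen used to separate three-letter codes are assumptions made for this example, not part of any standard library.

```python
# Three-letter to one-letter codes for the 20 amino acids listed above.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Glu": "E", "Gln": "Q", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
# Reverse lookup: one-letter code to three-letter code.
ONE_TO_THREE = {one: three for three, one in THREE_TO_ONE.items()}


def to_one_letter(three_letter_seq: str, sep: str = "-") -> str:
    """Convert e.g. 'Met-Ala-Lys' (N- to C-terminus) into 'MAK'."""
    return "".join(THREE_TO_ONE[res] for res in three_letter_seq.split(sep))


def to_three_letter(one_letter_seq: str, sep: str = "-") -> str:
    """Convert e.g. 'MAK' into 'Met-Ala-Lys'."""
    return sep.join(ONE_TO_THREE[res] for res in one_letter_seq)


print(to_one_letter("Met-Ala-Lys"))  # MAK
print(to_three_letter("GSH"))        # Gly-Ser-His
```

In both notations the first residue written is the one at the N-terminus.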

Methods of Protein Sequencing

There are two main methods used to determine the amino acid sequences of proteins. Mass spectrometry is the most common method in use today because of its ease of use. The second method is Edman degradation using a protein sequenator, which is most useful when the N-terminus of a protein needs to be characterized.

It is helpful to know which amino acid is at the N-terminus of the protein, both for ordering the peptide fragments into the whole chain and for reducing the impact of impurities that commonly occur in the first round of Edman degradation. The N-terminus can be identified by:

  1. Using a reagent to label the amino acid at the N-terminal end of the protein.
  2. Hydrolyzing the protein.
  3. Using chromatography and other methods of comparison to identify the labeled amino acid.

There are fewer methods that can practically be used to identify the C-terminus of the protein. One method involves adding carboxypeptidases to a solution of the protein and taking samples at regular intervals. Because these enzymes remove amino acids one at a time from the C-terminal end, plotting the concentration of each released amino acid against time can help to identify the amino acid at the C-terminus.
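
The reasoning behind the time-course approach can be sketched in Python. The concentrations below are invented for illustration, and the ranking rule (the amino acid that appears soonest is closest to the C-terminus) is a simplification: real carboxypeptidases release different residues at different rates, and repeated residues complicate the picture.

```python
def rank_by_release(time_course: dict[str, list[float]]) -> list[str]:
    """Order amino acids from fastest to slowest released.

    time_course maps an amino acid to its measured concentrations at
    successive sampling times (the same times for every amino acid).
    Earlier samples are compared first, so the residue that appears
    soonest ranks first.
    """
    return sorted(time_course, key=lambda aa: tuple(-c for c in time_course[aa]))


# Hypothetical concentrations (arbitrary units) at four sampling times.
samples = {
    "Ser": [0.10, 0.45, 0.80, 0.95],
    "Gly": [0.00, 0.10, 0.40, 0.75],
    "Leu": [0.40, 0.80, 0.95, 1.00],
}

order = rank_by_release(samples)
print(order)  # ['Leu', 'Ser', 'Gly']
print("Likely C-terminal residue:", order[0])
```

With these made-up data, leucine is released first, which would suggest a C-terminal sequence ending in Gly-Ser-Leu.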

Edman degradation allows the sequence of amino acids in a protein to be determined using Edman sequencers, which can currently sequence peptides up to about 50 amino acids in length. Sequencing a larger protein therefore involves several steps:

  1. Use a reducing agent to break any disulfide bridges in the protein.
  2. Separate the chain(s) of the protein complex and purify them.
  3. Determine the composition and terminal amino acids of each chain.
  4. Break each chain into small fragments (fewer than 50 amino acids each).
  5. Separate the fragments and purify them.
  6. Determine the amino acid sequence of each fragment.
  7. Repeat the preceding steps with a different fragmentation pattern so that the overall protein sequence can be reconstructed with minimal errors.
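
The final step works because fragments produced by two different cleavage patterns overlap one another. The Python sketch below shows the idea with a greedy overlap merge. The fragment sequences are hypothetical, the function names are invented for this example, and a greedy merge is only one simple way to do this; it can fail on repetitive sequences.

```python
def overlap(a: str, b: str) -> int:
    """Length of the longest suffix of a that is also a prefix of b."""
    for k in range(min(len(a), len(b)), 0, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def assemble(fragments: list[str]) -> str:
    """Greedily merge fragments by their largest suffix/prefix overlap."""
    unique = list(dict.fromkeys(fragments))  # de-duplicate, keep order
    # Fragments fully contained in another fragment add no ordering information.
    frags = [f for f in unique if not any(f != g and f in g for g in unique)]
    while len(frags) > 1:
        k, a, b = max(
            ((overlap(a, b), a, b) for a in frags for b in frags if a != b),
            key=lambda t: t[0],
        )
        if k == 0:
            raise ValueError("fragments do not overlap; another digest is needed")
        frags.remove(a)
        frags.remove(b)
        frags.append(a + b[k:])  # join the pair, writing the shared part once
    return frags[0]


# Hypothetical fragments, cleaved after K or R (as trypsin does).
digest_a = ["MK", "TAYIAK", "QR", "QISFVK", "SHFSR", "Q"]
# Hypothetical fragments of the same chain, cleaved after F, Y or W.
digest_b = ["MKTAY", "IAKQRQISF", "VKSHF", "SRQ"]

print(assemble(digest_a + digest_b))  # MKTAYIAKQRQISFVKSHFSRQ
```

Neither digest alone fixes the order of its fragments; the second digest supplies the overlaps that bridge the cleavage points of the first.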

Amino Acid Composition and Analysis

The unordered amino acid composition of a protein is often useful information when attempting to determine its ordered sequence, because it can help to identify errors and interpret ambiguous results. The frequency of each amino acid can also help to decide which protease is most appropriate for digesting the protein.
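
Both uses of the composition can be sketched in Python. The residue counts below are hypothetical and idealized (real analyses, for example, report glutamine together with glutamic acid after acid hydrolysis), and the function names are invented for this example. The fragment-length estimate simply assumes one cut per cleavable residue.

```python
from collections import Counter

# Hypothetical residue counts per molecule from amino acid analysis.
measured = {"A": 2, "F": 2, "H": 1, "I": 2, "K": 3, "M": 1,
            "Q": 3, "R": 2, "S": 3, "T": 1, "V": 1, "Y": 1}


def composition_mismatch(candidate: str, composition: dict[str, int]) -> dict[str, int]:
    """Residue -> (count in candidate sequence - measured count), non-zero only."""
    found = Counter(candidate)
    residues = sorted(set(found) | set(composition))
    diff = {aa: found[aa] - composition.get(aa, 0) for aa in residues}
    return {aa: d for aa, d in diff.items() if d != 0}


def mean_fragment_length(composition: dict[str, int], cleavage_residues: str) -> float:
    """Rough average fragment length if every listed residue is a cut site."""
    total = sum(composition.values())
    sites = sum(composition.get(aa, 0) for aa in cleavage_residues)
    return total / (sites + 1)


# A proposed sequence that agrees with the measured composition.
print(composition_mismatch("MKTAYIAKQRQISFVKSHFSRQ", measured))  # {}
# A proposed sequence that is missing one glutamine.
print(composition_mismatch("MKTAYIAKQRQISFVKSHFSR", measured))   # {'Q': -1}

# Compare a protease cutting after K or R with one cutting after F, W or Y.
print(round(mean_fragment_length(measured, "KR"), 2))   # 3.67
print(round(mean_fragment_length(measured, "FWY"), 2))  # 5.5
```

A mismatch flags a likely sequencing error, while the fragment-length estimate indicates whether a given protease would cut the chain into pieces of a practical size.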

Amino acid analysis, the process used to determine the frequency of amino acids, has two main steps. First, a known quantity of the protein is hydrolyzed to break it up into its constituent amino acids. These are then separated and quantified using various methods.

The hydrolysis is typically carried out by heating a sample of the protein to over 100°C in hydrochloric acid for an extended period of time (at least 24 hours), allowing more time for proteins with bulky hydrophobic groups. As some amino acids can be degraded under these conditions, particularly cysteine, glutamine, serine, threonine, tryptophan, and tyrosine, it is recommended to use several samples and to heat them for different times. Once the protein is hydrolyzed, the amino acids can be separated and identified with techniques such as ion-exchange chromatography or reverse phase HPLC.
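
One common way to combine measurements from several hydrolysis times, assumed here rather than taken from a specific protocol, is to fit a straight line to the recovered amount of a labile amino acid and extrapolate back to zero time. The Python sketch below does this with an ordinary least-squares fit; the serine values are invented for illustration.

```python
def extrapolate_to_zero(times_h: list[float], amounts: list[float]) -> float:
    """Least-squares line through (time, amount); return the value at time zero."""
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(amounts) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, amounts))
    slope = sxy / sxx
    return mean_y - slope * mean_t  # intercept of the fitted line


# Hypothetical serine recovery (residues per molecule) after 24, 48 and 72 hours.
hours = [24.0, 48.0, 72.0]
serine = [2.7, 2.4, 2.1]

print(round(extrapolate_to_zero(hours, serine), 2))  # 3.0
```

In this made-up case the recovered serine falls steadily with heating time, and the extrapolated value of about 3 residues estimates the amount present before any degradation.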




Last Updated: Jul 19, 2023

Yolanda Smith




Please use the following format to cite this article in your essay, paper or report:

    Smith, Yolanda. (2023, July 19). Amino Acids and Protein Sequences. News-Medical.
