Protein clustering and RNA phylogenetic reconstruction of the influenza A [corrected] virus NS1 protein allow an update in classification and identification of motif conservation.

PloS one | 2013

The non-structural protein 1 (NS1) of influenza A virus (IAV), coded by its third most diverse gene, interacts with multiple molecules within infected cells. NS1 is involved in host immune response regulation and is a potential contributor to the virus host range. Early phylogenetic analyses using 50 sequences led to the classification of NS1 gene variants into groups (alleles) A and B. We reanalyzed NS1 diversity using 14,716 complete NS IAV sequences, downloaded from public databases, without host bias. Removal of sequence redundancy and further structured clustering at 96.8% amino acid similarity produced 415 clusters that enhanced our capability to detect distinct subgroups and lineages, which were assigned a numerical nomenclature. Maximum likelihood phylogenetic reconstruction using RNA sequences indicated the previously identified deep branching separating group A from group B, with five distinct subgroups within A as well as two and five lineages within the A4 and A5 subgroups, respectively. Our classification model proposes that sequence patterns in thirteen amino acid positions are sufficient to fit >99.9% of all currently available NS1 sequences into the A subgroups/lineages or the B group. This classification reduces host and virus bias through the prioritization of NS1 RNA phylogenetics over host or virus phenetics. We found significant sequence conservation within the subgroups and lineages with characteristic patterns of functional motifs, such as the differential binding of CPSF30 and crk/crkL or the availability of a C-terminal PDZ-binding motif. To understand selection pressures and evolution acting on NS1, it is necessary to organize the available data. This updated classification may help to clarify and organize the study of NS1 interactions and pathogenic differences and allow the drawing of further functional inferences on sequences in each group, subgroup and lineage rather than on a strain-by-strain basis.
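The idea of assigning a sequence to a group from the residues at a small set of diagnostic positions can be sketched as follows. This is an illustration only: the positions, residues and group labels in `SIGNATURES` are made-up placeholders, not the thirteen positions reported in the publication, and `classify` is a hypothetical helper that assumes the input sequence is already aligned to a common coordinate system.

```python
from typing import Dict, Optional

# HYPOTHETICAL signature table: 1-based alignment position -> allowed residues.
# These values are placeholders for illustration only; they are NOT the
# diagnostic positions or residues reported in the publication.
SIGNATURES: Dict[str, Dict[int, str]] = {
    "group_X": {2: "D", 5: "KR"},
    "group_Y": {2: "E", 5: "T"},
}


def classify(
    aligned_seq: str,
    signatures: Dict[str, Dict[int, str]] = SIGNATURES,
) -> Optional[str]:
    """Return the single group whose pattern matches at every position, else None."""
    matches = []
    for group, pattern in signatures.items():
        if all(
            pos <= len(aligned_seq) and aligned_seq[pos - 1] in allowed
            for pos, allowed in pattern.items()
        ):
            matches.append(group)
    # Unmatched or ambiguous sequences are left unclassified.
    return matches[0] if len(matches) == 1 else None


if __name__ == "__main__":
    for seq in ("MDSNKV", "MESNTV", "MQSNTV"):
        print(seq, classify(seq))
```

Returning `None` for unmatched or ambiguous sequences keeps the small fraction that does not fit any pattern visible instead of forcing it into a group.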

PubMed ID: 23667580

Publication data is provided by the National Library of Medicine® and PubMed®. Data is retrieved from PubMed® on a weekly schedule. For terms and conditions, see the National Library of Medicine Terms and Conditions.

This is a list of tools and resources that we have found mentioned in this publication.


FigTree (tool)

RRID:SCR_008515

A graphical viewer of phylogenetic trees and a program for producing publication-ready figures. It is designed to display summarized and annotated trees produced by BEAST.

WEBLOGO (tool)

RRID:SCR_010236

Web application for generating sequence logos, which are graphical representations of patterns within a multiple sequence alignment. It is designed to make generating sequence logos easy.
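As a rough illustration of what a sequence logo encodes, the height of each logo column is commonly taken as its information content, R = log2(alphabet size) - H, where H is the Shannon entropy of the residues in that column. The sketch below computes this per column for a toy protein alignment. The function names are hypothetical, and the small-sample correction that logo tools may apply is deliberately omitted, so the values need not match a given tool's output.

```python
import math
from collections import Counter
from typing import List


def column_information(column: str, alphabet_size: int = 20) -> float:
    """Information content (bits) of one alignment column, gaps ignored.

    Uses R = log2(alphabet_size) - H, without any small-sample correction.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    counts = Counter(residues)
    total = len(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(alignment: List[str]) -> List[float]:
    """Per-column information content for equal-length aligned sequences."""
    if len({len(s) for s in alignment}) > 1:
        raise ValueError("all sequences must have the same aligned length")
    return [column_information("".join(col)) for col in zip(*alignment)]


if __name__ == "__main__":
    toy_alignment = ["MDSNT", "MDPNT", "MESNT"]  # made-up sequences
    for i, bits in enumerate(alignment_information(toy_alignment), start=1):
        print(f"column {i}: {bits:.2f} bits")
```

Fully conserved columns reach the maximum of log2(20) bits for proteins, while variable columns score lower.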

CD-HIT (tool)

RRID:SCR_007105

THIS RESOURCE IS NO LONGER IN SERVICE. Documented on February 28, 2023. Software program for clustering biological sequences, with applications such as making non-redundant databases, finding duplicates, identifying protein families, filtering sequence errors and improving sequence assembly. It is very fast and can handle extremely large databases. CD-HIT significantly reduces the computational and manual effort in many sequence analysis tasks and aids in understanding the data structure and correcting bias within a dataset. The CD-HIT package has CD-HIT, CD-HIT-2D, CD-HIT-EST, CD-HIT-EST-2D, CD-HIT-454, CD-HIT-PARA, PSI-CD-HIT, CD-HIT-OTU and over a dozen scripts.

* CD-HIT (CD-HIT-EST) clusters similar proteins (DNAs) into clusters that meet a user-defined similarity threshold.
* CD-HIT-2D (CD-HIT-EST-2D) compares 2 datasets and identifies the sequences in db2 that are similar to db1 above a threshold.
* CD-HIT-454 identifies natural and artificial duplicates from pyrosequencing reads.
* CD-HIT-OTU clusters rRNA tags into OTUs.

The usage of other programs and scripts can be found in the CD-HIT user's guide. CD-HIT was originally developed by Dr. Weizhong Li at Dr. Adam Godzik's Lab at the Burnham Institute (now Sanford-Burnham Medical Research Institute).
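Clustering sequences against a user-defined similarity threshold can be illustrated with a minimal greedy sketch. This is not CD-HIT's implementation: it assumes the sequences are already aligned to equal length and computes identity column by column, whereas CD-HIT works on unaligned sequences with its own filtering and alignment steps. The function names are hypothetical, and the toy sequences are made up.

```python
from typing import List


def identity(a: str, b: str) -> float:
    """Fraction of identical columns between two equal-length aligned sequences.

    Simplification: a gap facing a gap counts as identical.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    if not a:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / len(a)


def greedy_cluster(seqs: List[str], threshold: float) -> List[List[str]]:
    """Greedy incremental clustering.

    Each sequence joins the first cluster whose representative it matches
    at >= threshold identity; otherwise it founds a new cluster.
    """
    clusters: List[List[str]] = []  # clusters[i][0] is the representative
    # Process the least-gapped (longest) sequences first; ties keep input order.
    for seq in sorted(seqs, key=lambda s: s.count("-")):
        for cluster in clusters:
            if identity(seq, cluster[0]) >= threshold:
                cluster.append(seq)
                break
        else:
            clusters.append([seq])
    return clusters


if __name__ == "__main__":
    toy = ["MDSNTVSSFQ", "MDSNTVSSFE", "MEPNTLSSFQ"]  # made-up sequences
    for members in greedy_cluster(toy, threshold=0.9):
        print(members)
```

With a threshold of 0.9, the first two toy sequences (9 of 10 columns identical) share a cluster and the third (7 of 10) starts its own. A threshold such as 0.968 would correspond to the 96.8% similarity cut-off mentioned in the abstract.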

MAFFT (tool)

RRID:SCR_011811

Software package for multiple alignment of amino acid or nucleotide sequences. Can align up to 500 sequences or a maximum file size of 1 MB. The first version of MAFFT used an algorithm based on progressive alignment, in which sequences were clustered with the help of the Fast Fourier Transform. Subsequent versions have added other algorithms and modes of operation, including options for faster alignment of large numbers of sequences, higher-accuracy alignments, alignment of non-coding RNA sequences, and addition of new sequences to existing alignments.

Recombination Detection Program (tool)

RRID:SCR_018537

Software package for analysing nucleotide sequence data and identifying evidence of genetic recombination. RDP3 is a version of the RDP program for characterizing recombination events in DNA sequence alignments. RDP4 is a version of the RDP program for detecting and analysing recombination patterns in virus genomes.
