IILLS: predicting virus-receptor interactions based on similarity and semi-supervised learning

Background

Viruses are the most abundant biological entities on the planet and are widely distributed in the organs of living organisms and in the environment [1, 2]. In particular, they are an important part of the human microbiome, which is closely related to human health and disease. Hundreds of human diseases are caused by viruses, such as Ebola virus (EBOV), Zika virus, American Machupo virus (MACV), Guanarito virus (GTOV), Sabia virus (SABV) and Junin virus (JUNV). In marine environments, viruses can kill up to 40% of the standing stock of prokaryotes daily. In addition, virus infections can cause cellular and physiological changes in host cells, such as altered genomic sequences and impaired host function [9, 10].

When viruses contact the surface of host cells, the infection process starts. In general, receptor binding is considered the first step of the viral infection of host cells. Specificity and affinity are the main factors that determine which of the diverse types of molecules a virus can use to attach to and enter cells. With the development of high-throughput technologies, many studies indicate that various molecules, including proteins, carbohydrates and lipids, act as viral receptors. Furthermore, the virus-receptor interaction is a dynamic process: it can evolve over the course of an infection, and virus variants with distinct receptor-binding specificity and tropism can appear. To help understand the interaction mechanism between viruses and receptors, Zhang et al. constructed a database of mammalian virus-receptor interactions called viralReceptor. ViralReceptor consists of 128 viral species or sub-species, 119 mammalian receptors and 268 interaction pairs between them. In addition, structural and functional analyses of receptors, covering features such as protein domains, a higher level of N-glycosylation and a higher ratio of self-interaction, provide a theoretical basis for discovering new virus-receptor interactions.

In this study, we propose a computational method (IILLS) to predict virus-receptor interactions. It is based on Initial Interaction scores computed from neighbors and on the Laplacian regularized Least Square algorithm, a semi-supervised learning method. IILLS integrates the known virus-receptor interactions and the amino acid sequences of receptors to compute similarities of viruses and receptors. It then uses the Laplacian regularized Least Square algorithm and the neighbor-based initial interaction scores to construct the computational model. We conduct 10-fold cross validation (10CV) and leave-one-out cross validation (LOOCV) to assess the prediction performance of IILLS and compare it with three other methods. IILLS achieves the best prediction performance in terms of AUC (the area under the ROC curve), with AUC values of 0.8675 and 0.9061 in 10CV and LOOCV, respectively. The results of the case study also show that IILLS is an effective virus-receptor prediction method.

We also provide IILLS as a web server for predicting virus-receptor interactions. The input of the web server is either a single receptor amino acid sequence or a txt file with multiple sequences in FASTA format. When a single sequence is uploaded, the prediction result is displayed after submission. When a sequence file is uploaded, the prediction results are sent by email as a link to a result page, so an email address must be provided. In addition, a job ID is assigned to each submission, and the user can also use the job ID to retrieve the prediction result from the web server.

Materials

We download the known mammalian virus-receptor interactions from the viralReceptor database and extract the human virus-receptor interactions as the benchmark dataset. It includes 104 virus species or sub-species, 74 receptors and 211 interaction pairs between viruses and receptors. The node degree distributions of viruses and receptors in this benchmark virus-receptor interaction network are shown in Figs. 1 and 2. The degree of a node is the number of edges in the virus-receptor interaction network that have this node as an endpoint. Each color represents the proportion of viruses (receptors) that have the same node degree. In Fig. 1, the node degrees of the 104 viruses range from 1 to 8, and the corresponding proportions are 56.7%, 19.2%, 8.7%, 6.7%, 1.9%, 3.8%, 1.0% and 1.9%, respectively. In Fig. 2, each color represents the proportion of receptors with the same node degree. For example, the red color indicates that 8.1% of all receptors have a node degree of 4.
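The degree proportions reported for Figs. 1 and 2 can be recomputed from a 0/1 interaction matrix. The following is a minimal sketch, assuming viruses are rows and receptors are columns; the function name `degree_distribution` and the toy matrix are illustrative and are not taken from the benchmark data.

```python
import numpy as np

def degree_distribution(Y, axis):
    """Return {degree: proportion of nodes} for one side of a bipartite graph.

    Y is a 0/1 matrix with viruses as rows and receptors as columns.
    axis=1 sums each row (virus degrees); axis=0 sums each column
    (receptor degrees).
    """
    degrees = np.asarray(Y).sum(axis=axis)
    values, counts = np.unique(degrees, return_counts=True)
    return {int(v): float(c) / degrees.size for v, c in zip(values, counts)}

# Toy network (an assumption, not the benchmark): 4 viruses x 3 receptors.
Y_toy = np.array([[1, 0, 0],
                  [1, 1, 0],
                  [0, 0, 1],
                  [1, 0, 0]])
virus_degree_share = degree_distribution(Y_toy, axis=1)
receptor_degree_share = degree_distribution(Y_toy, axis=0)
```

In the toy network three of the four viruses have degree 1, so `virus_degree_share[1]` is 0.75, which is the same kind of quantity as the 56.7% reported for degree-1 viruses in Fig. 1.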

Similarity of viruses

Based on the assumption that similar viruses exhibit similar interaction profiles with receptors [17–20], we used the Gaussian Interaction Profile (GIP) similarity to measure the virus similarity. Let $V=\{v_{1},v_{2},...,v_{N_{v}}\}$ be the set of $N_{v}$ viruses, $P=\{p_{1},p_{2},...,p_{N_{p}}\}$ be the set of $N_{p}$ receptors, and $Y \in \mathbb{R}^{N_{v} \times N_{p}}$ be the adjacency matrix of the bipartite graph that describes the known virus-receptor associations. When virus $v_{i}$ and receptor $p_{j}$ have a known interaction, the value of $y_{ij}$ is 1; otherwise it is 0. The GIP similarity of viruses $v_{1}$ and $v_{2}$ can be computed as follows:

$$ S_{v}(v_{1},v_{2}) = G_{v}(v_{1},v_{2}) = \exp\left(-\gamma_{v} \|{yv}_{1}-{yv}_{2}\|^{2}\right), \tag{1} $$

$$ \gamma_{v} = \gamma'_{v} \Big/ \left(\frac{1}{N_{v}}\sum\limits_{i=1}^{N_{v}} \|{yv}_{i}\|^{2}\right), \tag{2} $$

in which ${yv}_{1}=\{y_{11},y_{12},...,y_{1N_{p}}\}$ and ${yv}_{2}=\{y_{21},y_{22},...,y_{2N_{p}}\}$ are the interaction profiles of virus $v_{1}$ and virus $v_{2}$, respectively. The parameter $\gamma_{v}$ regulates the kernel bandwidth. The value of the bandwidth parameter $\gamma'_{v}$ can be set by cross validation. In this study, $\gamma'_{v}$ is set to 1, following previous successful studies [17, 21, 22] and our analysis of the influence of $\gamma'_{v}$ on the prediction performance in 10-fold cross validation.
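The GIP similarity of Eqs. 1 and 2 can be sketched in a few lines of NumPy. This is an illustrative sketch rather than the authors' implementation: the function name `gip_similarity`, the argument name `gamma_prime` and the toy matrix are assumptions.

```python
import numpy as np

def gip_similarity(Y, gamma_prime=1.0):
    """Gaussian interaction profile similarity between the rows of Y.

    Each row of Y is one interaction profile. The bandwidth gamma is
    gamma_prime divided by the mean squared norm of the profiles (Eq. 2).
    """
    Y = np.asarray(Y, dtype=float)
    sq_norms = (Y ** 2).sum(axis=1)
    mean_sq_norm = sq_norms.mean()
    if mean_sq_norm == 0:
        raise ValueError("Y contains no known interactions")
    gamma = gamma_prime / mean_sq_norm
    # Squared Euclidean distance between every pair of profiles.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (Y @ Y.T)
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against rounding below zero
    return np.exp(-gamma * sq_dists)

# Toy adjacency matrix (an assumption): 3 viruses x 3 receptors.
Y = np.array([[1, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])
S_v = gip_similarity(Y)  # virus similarity from the rows of Y
```

Here the mean squared profile norm is 4/3, so the bandwidth is 0.75 and the first two viruses, whose profiles differ in one receptor, get a similarity of exp(-0.75). Passing `Y.T` instead of `Y` applies the same kernel to receptor profiles.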

Similarity of receptors

In this study, we use two measures of receptor similarity: the GIP similarity and the amino acid sequence similarity. The GIP similarity of receptors is computed from the known interactions of the receptors. Specifically, for receptors $p_{1}$ and $p_{2}$, the GIP similarity can be calculated as follows:

$$ G_{p}(p_{1},p_{2}) = \exp\left(-\gamma_{p} \|{yp}_{1}-{yp}_{2}\|^{2}\right), \tag{3} $$

$$ \gamma_{p} = \gamma'_{p} \Big/ \left(\frac{1}{N_{p}}\sum\limits_{i=1}^{N_{p}} \|{yp}_{i}\|^{2}\right), \tag{4} $$

in which ${yp}_{1}=\{y_{11},y_{21},...,y_{N_{v}1}\}^{T}$ is the interaction profile of receptor $p_{1}$, while ${yp}_{2}=\{y_{12},y_{22},...,y_{N_{v}2}\}^{T}$ is the interaction profile of receptor $p_{2}$. The parameter $\gamma_{p}$ likewise controls the kernel bandwidth, and the parameter $\gamma'_{p}$ is also set to 1.

In addition, we compute the sequence similarity between receptors. First, we download the amino acid sequences of receptors from the KEGG GENE database. The receptor sequence similarity is computed by their normalized Smith-Waterman score [24, 25]. For receptors p1 and p2, the sequence similarity can be calculated as follows:

$$ G_{s}(p_{1},p_{2}) = \frac{SW(p_{1},p_{2})}{\sqrt{SW(p_{1},p_{1})}\,\sqrt{SW(p_{2},p_{2})}}, \tag{5} $$

in which $SW(p_{1},p_{2})$ is the original Smith-Waterman score between receptor $p_{1}$ and receptor $p_{2}$.

Based on the GIP similarity and the sequence similarity of receptors, we construct the final similarity of receptors $S_{p}$ as follows:

$$ S_{p} = \alpha\, G_{p}+(1-\alpha)\, G_{s}, \quad 0 \leq \alpha \leq 1.0, \tag{6} $$

where α is the weight parameter.
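Equations 5 and 6 can be sketched together as follows. The sketch assumes that a matrix of raw pairwise Smith-Waterman scores, with self-alignment scores on the diagonal, has already been produced by an alignment tool; the function names and the toy numbers are illustrative assumptions.

```python
import numpy as np

def normalized_sw_similarity(raw_scores):
    """Normalize raw Smith-Waterman scores as in Eq. 5.

    raw_scores[i, j] is the raw score between receptors i and j, and the
    diagonal holds each receptor's self-alignment score.
    """
    raw = np.asarray(raw_scores, dtype=float)
    self_scores = np.diag(raw)
    if np.any(self_scores <= 0):
        raise ValueError("self-alignment scores must be positive")
    root = np.sqrt(self_scores)
    return raw / np.outer(root, root)

def combine_receptor_similarity(G_p, G_s, alpha):
    """Weighted combination of GIP and sequence similarity (Eq. 6)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    G_p = np.asarray(G_p, dtype=float)
    G_s = np.asarray(G_s, dtype=float)
    return alpha * G_p + (1.0 - alpha) * G_s

# Toy inputs (assumptions): two receptors.
raw = np.array([[100.0, 30.0],
                [30.0, 25.0]])
G_s = normalized_sw_similarity(raw)   # off-diagonal: 30 / (10 * 5) = 0.6
G_p = np.array([[1.0, 0.2],
                [0.2, 1.0]])
S_p = combine_receptor_similarity(G_p, G_s, alpha=0.5)
```

With `alpha=0` the combined matrix reduces to the sequence similarity alone, and with `alpha=1.0` it reduces to the GIP similarity alone.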

Initialized interaction profiles for new viruses and receptors

The quality of the known virus-receptor interactions has an important impact on the performance of the prediction method. In this study, we set initial interaction scores for viruses (receptors) that have no known interaction with any receptor (virus). Inspired by the KNN method, we take into consideration the interaction profiles of all neighbors that have known interactions. For example, the initial interaction score between a new virus $v_{i}$ and receptor $p_{j}$ can be calculated as follows:

$$ y(v_{i},p_{j}) = \frac{\sum\limits_{l=1}^{N_{v}} S_{v}^{(il)}\, y_{lj}}{\sum\limits_{l=1}^{N_{v}} S_{v}^{(il)}} \tag{7} $$

in which $S_{v}^{(il)}$ is the GIP similarity between viruses $v_{i}$ and $v_{l}$.

Similarly, we apply the same model to calculate the interaction profiles of new receptors. Specifically, the initial interaction score between virus $v_{i}$ and a new receptor $p_{j}$ can be calculated as follows:

$$ y(v_{i},p_{j}) = \frac{\sum\limits_{l=1}^{N_{p}} S_{p}^{(jl)}\, y_{il}}{\sum\limits_{l=1}^{N_{p}} S_{p}^{(jl)}} \tag{8} $$

in which $S_{p}^{(jl)}$ is the final similarity between receptors $p_{j}$ and $p_{l}$.
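The initialization for new viruses can be sketched as a similarity-weighted average of existing profiles. The sketch below sums over every virus, exactly as the index range of the virus-side formula is written, so the denominator also includes the new virus itself; whether the authors restrict the sum to neighbors with known interactions is not stated precisely, so this reading is an assumption. The function name and the toy matrices are also assumptions.

```python
import numpy as np

def initialize_new_virus_profiles(Y, S_v):
    """Fill the all-zero rows of Y with similarity-weighted average profiles.

    Y is the N_v x N_p interaction matrix and S_v the N_v x N_v virus
    similarity matrix. Rows with at least one known interaction are kept.
    """
    Y = np.asarray(Y, dtype=float)
    S_v = np.asarray(S_v, dtype=float)
    Y_init = Y.copy()
    new_viruses = np.where(Y.sum(axis=1) == 0)[0]
    for i in new_viruses:
        weights = S_v[i]            # similarity of virus i to every virus l
        total = weights.sum()
        if total > 0:
            Y_init[i] = (weights @ Y) / total
    return Y_init

# Toy inputs (assumptions): the third virus has no known interaction.
Y = np.array([[1, 0],
              [0, 1],
              [0, 0]])
S_v = np.array([[1.00, 0.50, 0.50],
                [0.50, 1.00, 0.25],
                [0.50, 0.25, 1.00]])
Y_init = initialize_new_virus_profiles(Y, S_v)
```

The receptor-side formula has the same form on the transposed matrix, so under the same assumptions it can be obtained as `initialize_new_virus_profiles(Y.T, S_p).T`, where `S_p` is the receptor similarity matrix.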

Laplacian regularized least square for virus-receptor interaction prediction

Inspired by successful applications of Laplacian regularized Least Square (LapRLS) model in predicting drug-target interactions [26–28], we adopt the LapRLS model to predict virus-receptor interactions. After obtaining the similarity matrices, we construct the normalized Laplacian matrices for viruses and receptors as follows:

$$ L^{v} = (D^{v})^{-1/2}(D^{v}-S_{v})(D^{v})^{-1/2}, \tag{9} $$

$$ L^{p} = (D^{p})^{-1/2}(D^{p}-S_{p})(D^{p})^{-1/2}, \tag{10} $$

where $D^{v}$ is the diagonal matrix whose element $D^{v}(i,i)$ is the sum of row $i$ of the virus similarity matrix $S_{v}$. Similarly, the matrix $D^{p}$ is calculated from the receptor similarity matrix $S_{p}$.
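The normalized Laplacian defined above can be sketched as follows. The function name `normalized_laplacian` and the toy similarity matrix are assumptions, and the sketch requires every row sum to be positive.

```python
import numpy as np

def normalized_laplacian(S):
    """Compute D^{-1/2} (D - S) D^{-1/2}, where D = diag(row sums of S)."""
    S = np.asarray(S, dtype=float)
    d = S.sum(axis=1)
    if np.any(d <= 0):
        raise ValueError("every row of S must have a positive sum")
    d_inv_sqrt = 1.0 / np.sqrt(d)
    # Scaling rows and columns by d^{-1/2} avoids building D^{-1/2} explicitly.
    return d_inv_sqrt[:, None] * (np.diag(d) - S) * d_inv_sqrt[None, :]

# Toy similarity matrix (an assumption).
S_toy = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
L_toy = normalized_laplacian(S_toy)
```

For this toy matrix both row sums are 1.5, so the result is (D - S)/1.5, with 1/3 on the diagonal and -1/3 off the diagonal.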

For viruses and receptors, the prediction matrices $F_{v}$ and $F_{p}$ are calculated from the LapRLS model by minimizing the following cost functions:

$$ F_{v}^{*} = \underset{F_{v}}{\arg\min} \left[ \|Y-F_{v}\|_{F}^{2} + \beta_{v}\, tr\left(F_{v}^{T} L^{v} F_{v}\right)\right], \tag{11} $$

$$ F_{p}^{*} = \underset{F_{p}}{\arg\min} \left[ \|Y^{T}-F_{p}\|_{F}^{2} + \beta_{p}\, tr\left(F_{p}^{T} L^{p} F_{p}\right)\right], \tag{12} $$

in which $tr(\cdot)$ is the trace of a matrix, $Y$ is the adjacency matrix of the known virus-receptor interactions, $L^{v}$ and $L^{p}$ are the normalized Laplacian matrices of virus similarity and receptor similarity, and $\|\cdot\|_{F}$ is the Frobenius norm. $\beta_{v}$ and $\beta_{p}$ are the trade-off parameters and are set to 1. According to previous studies, the model can be solved in closed form:

$$ F_{v}^{*} = S_{v}\left(S_{v}+\beta_{v} L^{v} S_{v}\right)^{-1} Y, \tag{13} $$

$$ F_{p}^{*} = S_{p}\left(S_{p}+\beta_{p} L^{p} S_{p}\right)^{-1} Y^{T}. \tag{14} $$

Finally, we obtain the virus-receptor interaction prediction matrix $F^{*}$ as the mean of the virus-side and receptor-side results:

$$ F^{*} = \left(F_{v}^{*}+(F_{p}^{*})^{T}\right)/2. \tag{15} $$
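The closed-form solutions and the final averaging step can be sketched as below. This is a sketch under stated assumptions, not the authors' code: the function names and toy matrices are illustrative, and the Moore-Penrose pseudo-inverse (`np.linalg.pinv`) is substituted for the plain inverse so that the example does not fail when a similarity matrix is singular, for instance when two viruses share an identical interaction profile.

```python
import numpy as np

def normalized_laplacian(S):
    """D^{-1/2} (D - S) D^{-1/2} with D = diag(row sums of S)."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(d)
    return d_inv_sqrt[:, None] * (np.diag(d) - S) * d_inv_sqrt[None, :]

def laprls_scores(S, Y, beta=1.0):
    """One-sided closed form S (S + beta L S)^+ Y.

    The pseudo-inverse replaces the inverse (a substitution made here).
    """
    S = np.asarray(S, dtype=float)
    Y = np.asarray(Y, dtype=float)
    L = normalized_laplacian(S)
    return S @ np.linalg.pinv(S + beta * (L @ S)) @ Y

def combined_scores(Y, S_v, S_p, beta_v=1.0, beta_p=1.0):
    """Average of the virus-side and receptor-side predictions."""
    Y = np.asarray(Y, dtype=float)
    F_v = laprls_scores(S_v, Y, beta_v)      # shape N_v x N_p
    F_p = laprls_scores(S_p, Y.T, beta_p)    # shape N_p x N_v
    return (F_v + F_p.T) / 2.0

# Toy inputs (assumptions): 2 viruses, 3 receptors.
Y = np.array([[1, 0, 1],
              [0, 1, 0]])
S_v = np.array([[1.0, 0.5],
                [0.5, 1.0]])
S_p = np.array([[1.0, 0.2, 0.1],
                [0.2, 1.0, 0.3],
                [0.1, 0.3, 1.0]])
F = combined_scores(Y, S_v, S_p)
```

When the similarity matrix is invertible, the one-sided solution simplifies to $(I+\beta L)^{-1}Y$, because $S(S+\beta L S)^{-1} = S S^{-1}(I+\beta L)^{-1}$. This gives a convenient numerical check for the sketch.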

Performance evaluation

To assess the prediction performance of IILLS, we conduct 10CV and LOOCV and use the AUC as the evaluation metric. We compare our method with three other methods: BRWH, LapRLS and CMF.
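The AUC used as the evaluation metric can be computed directly from labels and predicted scores as the probability that a randomly chosen positive pair scores higher than a randomly chosen negative pair, with ties counted as one half. The sketch below uses this pairwise definition; the function name `auc_score` and the toy data are assumptions, and the way held-out pairs are selected in each cross-validation fold is not shown.

```python
import numpy as np

def auc_score(labels, scores):
    """Area under the ROC curve from the pairwise ranking definition."""
    labels = np.asarray(labels).astype(bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    if pos.size == 0 or neg.size == 0:
        raise ValueError("need at least one positive and one negative")
    diff = pos[:, None] - neg[None, :]
    wins = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return wins / (pos.size * neg.size)

# Toy example (an assumption): 2 known interactions, 2 non-interactions.
toy_auc = auc_score([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1])
```

In the toy example three of the four positive-negative pairs are ranked correctly, so `toy_auc` is 0.75. The pairwise comparison uses memory proportional to the number of positives times the number of negatives, which is acceptable for a matrix of the size used here.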

Comparison with other methods

Figure 3 shows the prediction performance of four methods in 10CV. Compared with other methods (BRWH: 0.7959, LapRLS: 0.7577, CMF: 0.7128), IILLS achieves the best prediction performance with the AUC value of 0.8675.

Figure 4 shows that IILLS is also superior to the other methods in LOOCV in terms of AUC values (IILLS: 0.9061, BRWH: 0.8105, LapRLS: 0.7713, CMF: 0.7421). These experimental results illustrate that IILLS achieves better prediction performance than the compared methods.

Analyzing receptor similarity

In this study, we also analyze how the weight parameter α, which balances the GIP similarity and the sequence similarity of receptors, influences the prediction performance of our method. We conduct 10CV and LOOCV to compute the prediction performance.

Table 1 shows the 10CV prediction performances of various parameter values of α ranging from 0 to 1.0 with the increment of 0.1. We can see from Table 1 that our method obtains the best prediction performance in 10CV when only using sequence similarity (α=0). The AUC value of our method has a slightly descending trend when α ranges from 0 to 1.0.

Table 2 shows the LOOCV prediction performance for values of α ranging from 0 to 1.0 in increments of 0.1. We can see from Table 2 that our method also obtains the best prediction performance in LOOCV when only the sequence similarity is used (α=0). The AUC value again decreases slightly as α increases from 0 to 1.0. Therefore, we set α to 0 in this study.

In addition, we also provide the ROC of our method on different values of parameter α in three cases. The first only uses the sequence similarity of receptors (α=0). The second only uses the GIP similarity of receptors (α=1.0). The third is with the mean of GIP similarity and sequence similarity of receptors (α=0.5).

Figures 5 and 6 show the prediction performances of IILLS under three different receptor similarities in 10CV and LOOCV, respectively. We can also see from Figs. 5 and 6 that IILLS achieves the best prediction performance when only using the sequence similarity.

Parameter analysis for $\gamma'_{v}$

In this section, we analyze the parameter $\gamma'_{v}$. Because the effect of $\gamma'_{p}$ is similar to that of $\gamma'_{v}$, we set $\gamma'_{p}=\gamma'_{v}$. With only the sequence similarity in use, Table 3 shows the 10CV prediction performance for $\gamma'_{v}$ in the value set (0.25, 0.5, 1, 2, 4). We can see from Table 3 that our method obtains the best 10CV prediction performance when $\gamma'_{v}$ is set to 2, but the AUC value at $\gamma'_{v}=2$ is only slightly better than the AUC value at $\gamma'_{v}=1$. Therefore, we simply keep $\gamma'_{v}=1$ as the default value, based on previous successful studies and the 10CV results.

Case studies

To further evaluate the prediction performance of IILLS in applications, we analyze the ability of our method to discover new virus-receptor interactions. The extracted human virus-receptor interactions are used as the benchmark dataset. Table 4 shows the validation results for the top 10 virus-receptor interactions predicted by IILLS. We can see from Table 4 that 5 of the 10 predicted associations are validated by previous studies. C-type lectin domain family 4 member M (CLEC4M, also called L-SIGN or CD209L) is equipped with a carbohydrate recognition domain (CRD) that mediates the recognition of fucose and high-mannose glycans in a Ca2+-dependent manner. These carbohydrate structures can be found in multiple pathogens, such as Lassa virus and Ebola virus, among others [32, 33]. CD209 is also a receptor of SARS-CoV and of other known human coronaviruses, including 229E, although the disease caused by SARS-CoV differs from the diseases caused by these coronaviruses. CD209 (also called DC-SIGN) is related to CLEC4M and is a C-type lectin involved in both innate and adaptive immunity. Both lectins are known to bind multiple pathogens and to function as cellular receptors for various viruses, such as Dengue virus. Rift Valley fever virus (RVFV) uses L-SIGN to infect cells that express the lectin ectopically [32, 36]. Phleboviruses such as Uukuniemi virus (UUKV) can also exploit L-SIGN for infection [32, 36].

Discussion

With the development of high-throughput sequencing technology and microbiology, many studies have shown that microbes have key impacts on human health and disease. Furthermore, viruses are an important part of the human microbiome and are also the direct cause of infectious diseases, such as those caused by Sabia virus. Receptor binding is the first step of the viral infection of host cells. Therefore, to systematically understand the mechanisms of virus-receptor interaction and to improve the diagnosis and treatment of infectious diseases, effective methods need to be developed to identify new virus-receptor interactions.

Conclusion

In this study, we develop a computational method (IILLS) to predict human virus-receptor interactions from known virus-receptor interactions and the amino acid sequences of receptors. First, IILLS computes the virus similarity with the GIP kernel. It then calculates the receptor GIP kernel similarity and the receptor sequence similarity. Based on the experimental results, the final receptor similarity uses only the sequence similarity. IILLS uses the Laplacian regularized Least Square (LapRLS) model to predict potential virus-receptor interactions, and it further improves the prediction performance by adding an initial interaction score step for new viruses and receptors. In terms of AUC with 10CV and LOOCV, IILLS achieves better prediction performance than the three competing methods. The case studies also show that IILLS can effectively predict virus-receptor interactions and may help control viral infectious diseases in the future.

However, IILLS still has some limitations. First, the virus similarity is calculated only by the GIP kernel from known virus-receptor interactions; more relevant biological information, such as sequence information, should be considered. Second, other methods of integrating the receptor similarities should be considered in the future. Finally, other recent matrix factorization methods should also be considered, such as DNRLMF-MDA, DRRS, SIMCLDA [39] and BNNR. We would therefore like to develop a more effective method for predicting virus-receptor interactions by addressing these limitations in the future.