 Research
 Open Access
 Published:
Combining graph embedding and sparse regression with structure lowrank representation for semisupervised learning
Complex Adaptive Systems Modeling volume 4, Article number: 22 (2016)
Abstract
In this paper, we propose a novel method for semisupervised learning by combining graph embedding and sparse regression, termed as graph embedding and sparse regression with structure low rank representation (GESRLR), in which the embedding learning and the sparse regression are performed in a combined approach. Most of the graph based semisupervised learning methods take into account the local neighborhood information while ignoring the global structure of the data. The proposed GESRLR method learns a lowrank weight matrix by projecting the data onto a lowdimensional subspace. The GESRLR makes full use of the supervised learning information in the construction of the affinity matrix, and the affinity construction is combined with graph embedding in a single step to guarantee the global optimal solution. In the dimensionality reduction procedure, the proposed GESRLR can preserve the global structure of the data, and the learned lowrank weight matrix can effectively reduce the influence of the noise. An effective novel algorithm to solve the corresponding optimization problem was designed and is presented in this paper. Extensive experimental results demonstrate that the GESRLR method can obtain a higher classification accuracy than other stateoftheart methods.
Introduction
Complex adaptive systems (CAS) research area is trying to establish a comprehensive and general understanding of the complex world around us (Niazi and Hussain 2013). Complex systems typically involve the generation of high dimensional data and rely on effective analysis and management of such highdimensional data. High dimensional data exists in a wide variety of real applications, such as text mining, image retrieval, and visual object recognition. While the high performance of computers can address some of the problems of high dimensional data, for example, the time consuming problem, however, the processing of highdimensional data often suffers from a series of other problems, such as the curse of dimensionality and the impact of noise and redundancy. Fortunately, it has been shown that the high dimensionality of the data is usually small in the intrinsic reduced space.
In the more or less recent past time, researchers have put forward a lot of efficient data dimensionality reduction algorithms (Wang et al. 2014; Zhou and Tao 2013; Nie et al. 2011; Xu et al. 2009; Li et al. 2008). Principal component analysis (PCA) (Belhumeur et al. 1997) is a traditional method that projects the high dimensional data onto a low dimensional space. Linear discriminant analysis (LDA) (Zuo et al. 2006) is a supervised dimensionality reduction method by maximizing the amount of betweenclass variance relative to the amount of withinclass variance (Nie et al. 2009; Yang et al. 2010). Neighborhood component analysis (NCA) (Goldberger et al. 2004) learns a linear transformation by directly maximizing the stochastic variant of the expected leaveoneout classification accuracy on the training set. In order to find the intrinsic manifold structure of data samples, researchers also proposed some nonlinear dimension reduction methods, such as the locally linear embedding (LLE) (Roweis and Saul 2000) and the Laplacian eigenmap (LE) (Belkin and Niyogi 2003). If there are new data samples in the training set, the Laplacian methods need to learn the whole training set again, this is one of the disadvantages of these types of algorithms. In order to solve this problem, He et al. (2005a) put forward the algorithm of locality preserving projection (LPP), in which the linear projection is used to deal with new data samples. Wu et al. (2007) proposed the local learning projection (LLP) method to solve this problem. In addition, the neighborhood preserving embedding (NPE) (He et al. 2005b) algorithm was put forward to keep the local neighborhood structure on the manifold of the data samples. Some previous studies (Zhang et al. 2009; Tenenbaum et al. 2000; Yan et al. 2007) proved that many dimensionality reduction algorithms can be expressed as a unified framework.
However, in real applications, most of the methods mentioned above can only preserve the information of the local neighbors, while ignoring the global structure of the data. The local structure of the dataset may be easily affected by some factors such as noise, illumination or corruption. As a result, the performance of clustering or classification tasks will be reduced because of these. Fortunately, some researches have shown that the recently proposed lowrank representation (LRR) (Liu et al. 2010, 2013) algorithm has a good robustness for datasets that contain noise or corruption. In the past few years, a series of robust classification algorithms based on lowrank representation have been put forward. The Robust PCA (RPCA) (Wright et al. 2009; Candès et al. 2011) use the lowrank representation to recover the structure of subspaces from the dataset corrupted by noise. For subspace segmentation problem, Liu et al. (2010, 2013) use the nuclear norm to find the lowest rank representation of a dataset; in this way, the global structure of the dataset can be well preserved. Unlike the lowrank representation seeking the lowest rank of the dataset, sparse representation finds the sparest representation of a dataset. Zhuang et al. (2012) combine the sparsity and lowrankness together to put forward a nonnegative lowrank and sparse representation (NNLRS) for dealing with the highdimensional dataset. And then they use the representation coefficient matrix to construct the affinity graph for subspace segmentation. Through the combination of sparse representation and lowrank representation, the NNLRS method can both capture the global structure and the local structure of the dataset.
Through the analysis of the above problems, a novel method is proposed in this paper by combining the graph embedding and sparse regression method in a joint optimization framework. And the supervised learning information is also used in the framework to guide the construction of the affinity graph. In this paper, the construction of the affinity and graph embedding are combined to ensure the overall optimal solution. In the whole learning process, the label information can be accurately propagated through the graph construction. Thus, the linear regression can learn the discriminative projection to better adapt to the labels of the samples and improve the classification rate of the new samples. In order to solve the corresponding optimization problem, this paper proposes an iterative optimization procedure.
In general, the main contributions of this paper are summarized as follows:

1.
Different from conventional methods, by both using the lowrank representation and sparse representation which can preserve the global structure and the local structure of the data, the proposed GESRLR method can learn a novel weight graph.

2.
By unifying the graph learning, projection learning and label propagation into a joint optimization framework, the proposed GESRLR method can guarantee an overall optimum solution.
The remaining of this paper is organized as follows: “Background and related work” section briefly reviews the background and some related works. The proposed GESRLR method and the corresponding solution are described in “Combined graph embedding and sparse regression with structure lowrank representation” section. Extensive experiments are conducted in “Experiments” section. Finally, we conclude the paper in “Conclusion” section.
Background and related work
Since the proposed method in this paper is based on lowrank representation and manifold embedding (Nie et al. 2014), we briefly review the relevant methods. Given the dataset \(X = [x_{1} ,x_{2} , \ldots ,x_{u} ,x_{u + 1} ,x_{n} ] \in R^{m \times n} ,\) where the labeled samples are denoted as \(x_{i} _{i = 1}^{u}\) and the unlabeled samples are denoted as \(x_{j} _{j = u + 1}^{n} .\) The label information of the labeled samples is denoted as \(y_{i} \in \{ 1,2, \ldots ,c\} ,\) where the number of the total classes is c. The label binary indicator matrix Y are defined as follows: given the training sample \(x_{i} (i = 1, \ldots ,n)\) and its label vector \(y_{i} \in R^{c} ,\) if x _{ i } is the sample from the kth class \((k = 1, \ldots ,c),\) then the kth entry of the label vector y _{ i } is 1 and for the other entries, the value is 0. In this paper, the l _{ r,p }norm is defined as follows:
Lowrank representation (LRR)
Given the dataset \(X \in R^{m \times n}\) which is drawn from a union of subspaces \(\{ \Pi_{i} \}_{i = 1}^{c} ,\) where c is the dimension of the lowdimensional subspaces, and the dataset is corrupted by noise matrix E, the objective function of the LRR method is defined as follows:
where A is the dictionary for the lowrank representation, E is the error matrix of the noise or corruption and γ is the parameter to control the influence of the error matrix. Due to the optimization of the rank norm is NPhard (Nie et al. 2014), in practice, we often use the nuclear norm for relaxation. Thus the objective function of the lowrank representation is defined as follows:
where \(\left\ \cdot \right\_{*}\) represents the nuclear norm which is a relaxation of the rank norm. \(\left\ \cdot \right\_{1}\) represents the l _{1}norm which is a relaxation of the l _{0}norm for error matrix. If let A = I, we can see that the objective function of LRR is equivalent to RPCA while the goal of RPCA is to recover an approximate matrix from a corrupted subspace. In real applications, we often use the original matrix X as the dictionary. Therefore, the objective of the optimization problem (2) can be rewritten as:
There are many optimization methods for solving the problem (3). After we get the final result of representation coefficient matrix Z, we can use is as a kind of similarity to construct an affinity graph (Z + Z ^{T}). Then we use the spectral clustering method on the affinity graph to obtain the final clustering result.
Flexible Manifold Embedding (FME)
Given the dataset X, we assume the predicted label matrix is F, then we can have F = X ^{T} W + 1b ^{T} if the label is strict to lie in the space of the give matrix X, in which \(1 \in R^{n \times 1}\) is an all 1 vector. \(W \in R^{m \times c}\) is the projection matrix. However, as the objective function F = X ^{T} W + 1b ^{T} is a linear format, if the samples is from a nonlinear manifold, this may be too strict to fit the samples. Therefore, it is reasonable to add a residual item in the regression model of FME (Nie et al. 2010). Then the objective function of FME is relaxed to F = X ^{T} W + 1b ^{T} + F _{0}, where F _{0} is the residual item between the predicted label matrix F and X ^{T} W + 1b ^{T}. The advantage of this kind of relaxation can make the processing of the sample data points on the nonlinear manifolds more flexible. The goal of FME is to predict the sample label matrix F and reduce the residual of regression F _{0} at the same time. The objective function of FME is defined as follows:
where the two parameters μ and γ are used to balance the influence of the two terms. \(L \in R^{n \times n}\) is the Laplacian matrix and \(U \in R^{n \times n}\) is the diagonal matrix. \(tr( \cdot )\) represents the trace of a matrix. The first two terms in (4) is used to propagate the labels from the labeled samples to unlabeled samples. The last two terms are the regression model. If we use the X ^{T} W + 1b ^{T} − F to replace the regression residual F _{0}, then the objective function of FME can be expressed as follows:
Combined graph embedding and sparse regression with structure lowrank representation
In this section, we introduce the details of the proposed method in this paper. The objective of GESRLR is to unify the graph embedding and regression into a unified framework. The objective of regression model is to find a projection matrix \(W \in R^{m \times c}\) to match the sample labels \(F \in R^{n \times c} ,\) and use it to classify the new samples. Thus, the objective function of the regression model can be defined as follows:
where F _{0} is the regression residual (Nie et al. 2010).
In the following section, we first introduce the motivation of the proposed method of GESRLR, summarize the objective function of the GESRLR method and propose the optimization solution.
Motivations
For the label propagation problem, we usually have the following hypothesis: a data sample and its nearest neighbors usually belong to the same class, and the nearest neighbors would have a big influence in the determination of the labels of new data samples. In short, the labels of similar samples should be close and we can propagate the labels to similar samples. Therefore, in the construction of an ideal graph we should consider that similar data points and their nearest neighbors should be assigned larger weight values. However, the evaluation of similarity of most traditional graph construction methods mainly depends on the pairwise Euclidean distance, while the Euclidean distance is very sensitive to noise and any other corruption of the data samples (Zhuang et al. 2012). However, these methods can only capture the local structure of the dataset, but ignore to preserve the global structure of the dataset. Fortunately, some recent studies show that the LRR method can preserve the global structure of the dataset, and it is robust to noise and the corruption of the dataset (Liu et al. 2010, 2013). As a result, these lowrank properties can be combined with the graph embedding problem, and thus it can address the sensitivity with respect to the local and neighbor properties. So, the main idea of constructing an informative graph is to use the lowrankness property to preserve the local and the global structure of the dataset with noise. Following the above analysis, we put forward a novel method of joint graph embedding and sparse regression with structure lowrank representation, named GESRLR, presented in the next sections in this paper.
GESRLR model
The aim of the proposed GESRLR method is to design an optimization framework to combine graph embedding and sparse regression in order to get a global overall optimum solution. Based on the above analysis of lowrank representation (LRR) and flexible manifold embedding (FME), the objective function of the proposed GESRLR is defined as follows:
where \(M_{ij} = \left\ {X_{i}  X_{j} } \right\_{2}^{2} ,\) U is a diagonal matrix defined as
\(\varsigma\) is a large constant such that \(F_{*l}\) and \(Y_{*l}\) \((l = 1,2, \ldots ,c)\) can be approximately satisfied, and \(F \in R^{n \times c}\) is the predicted labels of both labeled and unlabeled samples. In the objective function (7), the aim of the first term is to assess the fitness of labels which means that the predicted labels F should be close to the labels of the labeled data samples. The second term is the graph embedding and it aims at integrating the regression, graph embedding and label propagation for the unlabeled data samples from the labeled data samples. For the data point x _{ i }, if we get a larger weight Z _{ ij }, this means that the label \(F_{*j}\) has a bigger influence on the prediction of the label \(F_{*i}\) for the data point x _{ i }. The third item is used to minimize the regression residual. The third and fourth items represent the regression model, the goal being to learn the projection for fitting the labels of the data samples and classifying new data points. In this method, we adopt the l _{2,1}norm to regularize the projection matrix W, so that it is guaranteed that W is sparse in row for feature selection. The last three items adopt the lowrank representation to learn a weight graph. The five parameters α, β, λ _{1}, λ _{2} and γ are used to balance the influence of the corresponding five terms. Therefore, the objective function of the proposed GESRLR method can be formulated as follows:
where L = D − S is the Laplacian matrix, and D is a diagonal matrix with \(D_{ii} = \frac{{\sum {Z_{i*} } + \sum {Z_{*i} } }}{2}.\)
The solution of GESRLR
The optimization problem of (9) can be solved by calculating W independently and updating F and Z iteratively. In order to solve the optimization problem of (9), we introduce an auxiliary variable S to separate the objective function. We firstly convert the problem of (9) to the following equivalent optimization problem:
In order to solve the optimization problem, we first transfer the optimization problem to the Lagrange function, and the Lagrange function of problem (10) is as follows:
where \(\varphi \left( {Z,S,E,Y_{1} ,Y_{2} ,\mu } \right) = \frac{\mu }{2}\left( {\left\ {X  AZ  E + \frac{{Y_{1} }}{\mu }} \right\_{F}^{2} + \left\ {Z  S + \frac{{Y_{2} }}{\mu }} \right\_{F}^{2} } \right)\) and \(\left\langle {A,B} \right\rangle = tr(A^{T} B).\) Y _{1} and Y _{2} are the Lagrange multipliers and μ ≥ 0 is a penalty parameter. For solving the optimization problem, we use the LADMAP method. By fixing the other variables, the LADMAP updates the variables W, F, Z, S and E alternately, and then it updates Y _{1} and Y _{2}.
1. By fixing F, Z and S, W is solved by the following optimization problem:
By setting the derivative \(\frac{\partial L\left( W \right)}{\partial W} = 0,\) we have the following equation:
where \(D_{ii} = \frac{1}{{2\left\ {w_{i} } \right\_{2} }}\)
Or equivalently
where \(A = (XX^{T} + \beta D)^{  1} X.\)
2. By fixing W, S and Z, F is solved by the following optimization problem:
This is an unconstrained optimization problem. Let W = AF and integrate in the objective function and find the derivation of the problem (11) with respect to F, by making the value of the derivative to zero, and we have
where B = ((X ^{T} A − I)^{T}(X ^{T} A − I)).
3. By fixing W, S and F, Z is solved by the following optimization problem:
where \(\nabla_{Z} \varphi\) is the partial differential of \(\varphi\) with respect to Z, \(\theta = \left\ A \right\_{F}^{2} ,\) \(\left\ \cdot \right\_{F}\) represents the Frobenius norm.
where J is the thresholding operator with respect to the singular value \(\frac{{\lambda_{1} }}{\theta \mu }\). A proximal optimization method can be used to find the solution of Z.
4. By fixing W, Z and F, S is solved by the following optimization:
Let R = λ _{2} M + V, \(V_{ij} = \left\ {F_{i}  F_{j} } \right\_{2}^{2} .\) The optimization problem in (20) can be decomposed into n independent subproblems, and each of these subproblems can be formulated as a weight nonnegative sparse coding problem as follows:
where \((S^{k} )_{g}^{i}\) and \((R)_{g}^{i}\) are the gth elements of ith columns of matrices S ^{k} and R. Therefore, the problem of (21) has a closed form solution (Yang et al. 2013; Zhang et al. 2012).
5. By fixing W, Z, S and F, E is solved by the following optimization problem as follows:
from the above analysis, we can find that, on one hand, the deduction of the variables F, Z, S and E are closely dependent. On the other hand, the solution of variable W is only related to the variable F. Therefore, we can update the variables F, Z, S and E, iteratively, by fixing the other variables fixed. We can calculate the variable W by W = AF after getting the optimal solution of F.
The overall optimization framework for the proposed GESRLR method is described in Algorithm 1.
Experiments
Human social systems and human facial structure recognition is the emergent outcome of adaptation over a period of time (Holland and John 2012). Here, in the experiments described in this paper, we have used several datasets to evaluate the performance of the proposed GESRLR method (http://www.cad.zju.edu.cn/home/dengcai/Data/data.html), including two human face images datasets (i,e., the ORL and the extended Yale B datasets) in addition to an object dataset (COIL20), a spoken letter recognition dataset (Isolet 5) and a handwritten digit dataset (USPS dataset). The datasets contain the common images information in daily life, and they are widely used in the areas of image processing, machine learning, etc. The computing platform is matlab R2015B in a PC with CPU i7 2600, RAM 16G,
Datasets descriptions

1.
ORL dataset The ORL dataset consists of 400 face images of 40 people. These face images are taken under different situations, such as different time, varying lighting, facial details (glasses/no glasses) and facial expressions (open/closed eyes, smiling/not smiling).

2.
The extended Yale B dataset The extended Yale B dataset contains the face images of 38 people, each individual has around 64 frontal face images which are taken under different illuminations. For computing efficiency, we adjust the size of each image to 32 × 32 pixels in this experiments.

3.
COIL20 dataset The COIL20 dataset contains the images of 20 objects, each object has 72 images and the images are collected from varying every five degrees. For computation efficiency purposes, we adjust the size of each image to 32 × 32 pixels in this experiments.

4.
ISOLET 5 dataset The ISOLET spoken letter recognition dataset consists of 150 subjects, where each person speaks each letter from the alphabet twice. The speakers are divided into 5 groups, each group has 30 speakers, and this is marked as ISOLET 5 dataset. In this work, the ISOLET 5 dataset contains 1559 images, with images from 26 people, each speaker providing 60 images.

5.
USPS dataset The USPS dataset is a handwritten digit dataset, which contains two parts: the training set with 7291 samples, and the test set with 2007 samples. In this experiment, we randomly selected 7000 images of the 10 letters. Thus, there are 700 images in each category. The size of each images is 16 × 16 pixels.
Classification results
In this section, we evaluate the performance of the proposed GESRLR method. For the semisupervised problem, we compare the proposed GESRLR method with the following algorithms: FME (Nie et al. 2010), GFHF (Zhu et al. 2003), NNSG, SDA (Cai et al. 2007), LapRLS/L (Belkin et al. 2006), Transductive component analysis (TCA) (Liu et al. 2008), and MFA (Yan et al. 2007). We also use the learned projection matrix to classify the new samples. The classification method used in our experiments is the nearest neighbor (NN) classification. For the NNSG and GFHF methods, the classification method is as indicated in the corresponding research paper in (Zhou et al. 2004). For some embedding algorithms, we first learn the graph Laplacian matrix L while the graph weight matrix is defined as \(S_{ij} = e^{{\frac{{  \left\ {x_{i}  x_{j} } \right\^{2} }}{\sigma }}}\). The number of the nearest neighbors are chosen from the set of \(\{ 3,4,5,6,7,8,9,10\} ,\) and the kernel parameters are from the set of \(\{ 10^{  4} ,10^{  3} ,10^{  2} ,10^{  1} ,10^{0} ,10^{1} ,10^{2} ,10^{3} ,10^{4} \} .\) The final dimensions of some algorithms, such as FME, LapRLS/L, SDA, TCA and MFA are set to the number of the classes and the parameters in these methods are set to the best value according to the related research papers. While the parameters of the proposed method GESRLR (α, β, λ _{1}, λ _{2} and γ) are chosen from the range of (10^{−4}, 10^{0}). For the sake of computational efficiency, all data in these data sets were eventually reduced to 60D vectors.
We performed the experiments on the above datasets: ORL, the extended Yale B, COIL20, Isolet5 and USPS. For every dataset, we randomly selected 50 % samples of each subject as the training sample set, while the remaining samples are selected as the testing set. For the semisupervised classification, we select p samples per subject as the labeled data samples, while the remaining formed the unlabeled data samples. The unlabeled data samples are used to test the performance of semisupervised classification, while the testing sample set is used to test the performance of classifying the new data samples with the learned projection matrix.
For the dataset of ORL, COIL20, Isolet5 and USPS, the number of the labeled data sample is set to p = 1, 2 and 3, respectively. For the dataset of the extended Yale B, the number of the labeled data samples is set to p = 5, 10 and 15, respectively. In addition to the MFA algorithm, where the labeled samples were used for subspace learning, for the other algorithms, the training samples are used to learn the projection matrix. We run the experiments 30 times on the unlabeled data samples and the test data samples, and we obtain the mean classification accuracy and standard deviation (%). In the Tables 1, 2, 3, 4, 5, the corresponding experiments are referred as Semi and Test respectively. From these experimental results, we can get the following conclusions:

1.
In terms of classification accuracy, the semisupervised classification algorithms TCA, LapRLS/L, SDA get a higher classification accuracy than the supervised classification algorithm MFA. This shows that the unlabeled data samples help to improve the performance of the semisupervised classification.

2.
In some datasets, the GFHF algorithm achieves higher semisupervised classification accuracy than that of TCA, LapRLS/L and SDA algorithms, especially on the datasets which have some strong variations. For example, the extended Yale B dataset has strong illumination changes and expression. In this case, the label propagation may not perform well. This phenomenon is more obvious on the extended Yale B dataset.

3.
On the unlabeled dataset, the performance of the proposed GESRLR algorithm is obviously better than the compared methods. This indicates that the structure of the graph obtained by the GESRLR method has more discriminant information, which is more effective for the label propagation. This also suggests that simultaneously performing label propagation and graph learning is necessary and effective.
The GESRLR method requires five parameters (α, β, λ _{1}, λ _{2} and γ) to be set in advance. Figure 1 shows the classification accuracy versus the variations of the five parameters, respectively, on the extended Yale B dataset. 50 % of samples per subject were randomly selected as training samples and remaining samples were used as test samples. We report the mean recognition accuracy over 20 random splits. Obviously, it can be found that when the parameters vary in a relatively large ranges, the performance of the proposed GESRLR method is more stable.
Next, we consider the effectiveness of the algorithm when different dimension sizes are used. The experiment is conduct on the extended Yale B dataset. We also report the mean recognition accuracy over 20 random splits. We can see from the Fig. 2, when use larger dimensions of the feature, the accuracy increase, while when the number of features is over 60, the accuracy increases slowly and it is much stable.
Conclusion
Complex adaptive systems (CAS) involve the processing of large amounts of high dimensional data. It is thus paramount to develop and employ effective machine learning techniques to deal with such high dimensional and large datasets generated from the CAS area. In this paper, we proposed a novel semisupervised learning method termed as graph embedding and sparse regression with structure low rank representation (GESRLR), by combing graph embedding and sparse regression, which are performed simultaneously in order to get an optimal solution. Different from some traditional methods, the proposed GESRLR method takes into account both the local and global structure of the dataset to construct an informative graph. Extensive experiments on five datasets demonstrate that the GESRLR method outperform the stateoftheart methods. In our future work, we will extend the ideas presented in this paper and will apply the proposed GESRLR method to other challenging problems.
Abbreviations
 GESRLR:

graph embedding and sparse regression with structure low rank representation
 CAS:

complex adaptive systems
 PCA:

Principal component analysis
 LDA:

linear discriminant analysis
 NCA:

neighborhood component analysis
 LLE:

locally linear embedding
 LE:

Laplacian eigenmap
 LPP:

locality preserving projection
 LLP:

local learning projection
 NPE:

neighborhood preserving embedding
 LRR:

lowrank representation
 RPCA:

robust principal component analysis
 NNLRS:

nonnegative lowrank and sparse representation
 FME:

flexible manifold embedding
 LADMAP:

linearized alternating direction method with adaptive penalty
 TCA:

transductive component analysis
 NN:

the nearest neighbor
References
Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 1(7):2399–2434
Cai D, He X, Han J (2007) Semisupervised discriminant analysis. In: IEEE 11th international conference on computer vision, 2007, ICCV 2007. IEEE, New York, pp 1–7
Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM 58(3):11
Goldberger J, Hinton GE, Roweis ST, Salakhutdinov R (2004) Neighborhood components analysis. In: Advances in neural information processing systems, pp 513–520
He X, Yan S, Hu Y, Niyogi P, Zhang HJ (2005a) Face recognition using Laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340
He X, Cai D, Yan S, Zhang HJ (2005b) Neighborhood preserving embedding. In: Tenth IEEE international conference on computer vision, 2005. ICCV 2005, vol 2. IEEE, New York, pp 1208–1213
Holland, John H (2012) Signals and boundaries: building blocks for complex adaptive systems. Mit Press, Cambridge
Li X, Lin S, Yan S, Xu D (2008) Discriminant locally linear embedding with highorder tensor data. IEEE Trans Syst Man Cybern Part B 38(2):342–352
Liu W, Tao D, Liu J (2008) Transductive component analysis. In: Eighth IEEE international conference on data mining, 2008, ICDM’08. IEEE, New York, pp 433–442
Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by lowrank representation. In: ICML, pp 663–670
Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y (2013) Robust recovery of subspace structures by lowrank representation. IEEE Trans Pattern Anal Mach Intell 35(1):171–184
Niazi Muaz A, Hussain Amir (2013) Complex adaptive systems. Cognitive agentbased computingI. Springer, Amsterdam, pp 21–32
Nie F, Xiang S, Jia Y, Zhang C (2009) Semisupervised orthogonal discriminant analysis via label propagation. Pattern Recogn 42(11):2615–2627
Nie F, Xu D, Tsang IW, Zhang C (2010) Flexible manifold embedding: a framework for semisupervised and unsupervised dimension reduction. IEEE Trans Image Process 19(7):1921–1932
Nie F, Xu D, Li X, Xiang S (2011) Semisupervised dimensionality reduction and classification through virtual label regression. IEEE Trans Syst Man Cybern Part B 41(3):675–685
Nie F, Yuan J, Huang H (2014) Optimal mean robust principal component analysis. In: Proceedings of the 31st international conference on machine learning (ICML14), pp 1062–1070
Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
Wang SJ, Yan S, Yang J, Zhou CG, Fu X (2014) A general exponential framework for dimensionality reduction. IEEE Trans Image Process 23(2):920–930
Wright J, Ganesh A, Rao S, Peng Y, Ma Y (2009) Robust principal component analysis: exact recovery of corrupted lowrank matrices via convex optimization. In: Advances in neural information processing systems, pp 2080–2088
Wu M, Yu K, Yu S, Schölkopf B (2007) Local learning projections. In: Proceedings of the 24th international conference on machine learning. ACM, New York, pp 1039–1046
Xu D, Yan S, Lin S, Huang TS, Chang SF (2009) Enhancing bilinear subspace learning by element rearrangement. IEEE Trans Pattern Anal Mach Intell 31(10):1913–1920
Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
Yang Y, Xu D, Nie F, Yan S, Zhuang Y (2010) Image clustering using local discriminant models and global integration. IEEE Trans Image Process 19(10):2761–2773
Yang J, Chu D, Zhang L, Xu Y, Yang J (2013) Sparse representation classifier steered discriminative projection with applications to face recognition. IEEE Trans Neural Networks Learn Syst 24(7):1023–1035
Zhang T, Tao D, Li X, Yang J (2009) Patch alignment for dimensionality reduction. IEEE Trans Knowl Data Eng 21(9):1299–1313
Zhang L, Zhou WD, Chang PC, Liu J, Yan Z, Wang T, Li FZ (2012) Kernel sparse representationbased classifier. IEEE Trans Signal Process 60(4):1684–1695
Zhou T, Tao D (2013) Double shrinking sparse dimension reduction. IEEE Trans Image Process 22(1):244–257
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. Adv Neural Inform Process Syst 16(16):321–328
Zhu X, Ghahramani Z, Lafferty J (2003) Semisupervised learning using Gaussian fields and harmonic functions. In: ICML, vol 3, pp. 912–919
Zhuang L, Gao H, Lin Z, Ma Y, Zhang X, Yu N (2012) Nonnegative low rank and sparse graph for semisupervised learning. In: IEEE conference on computer vision and pattern recognition (CVPR), 2012. IEEE, New York, pp 2328–2335
Zuo W, Zhang D, Yang J, Wang K (2006) BDPCA plus LDA: a novel fast feature extraction technique for face recognition. IEEE Trans Syst Man Cybern Part B 36(4):946–953
Authors’ contributions
CZY conceived the study, performed the experiments and wrote the paper. VP and XJW reviewed and edited the manuscript. All authors read and approved the final manuscript.
Authors’ information
CongZhe You is a Ph.D. Candidate in Jiangnan University, now he is a visiting student in Coventry University sponsored by the China Scholarship Council. His research interests are in the area of Machine Learning and Pattern Recognition.
Vasile Palade received the Ph.D. degree from the University of Galati, Galaţi, Romania, in 1999. He is currently a Reader with the School of Computing, Electronics and Maths, Coventry University, Coventry, U.K. His research interests include computational intelligence with application to bioinformatics, fault diagnosis, web usage mining, among others. He published more than 100 papers in Journals and conference proceedings as well as several books.
XiaoJun Wu received the B.Sc. degree in mathematics from Nanjing Normal University, Nanjing, China, in 1991. He received the M.S. degree in 1996, and the Ph.D. degree in pattern recognition and intelligent systems in 2002, both from Nanjing University of Science and Technology, Nanjing, China. He joined Jiangnan University in 2006, where he is currently a Professor. He has published more than 150 papers in his fields of research. He was a visiting researcher in the Centre for Vision, Speech, and Signal Processing (CVSSP), University of Surrey, U.K., from 2003 to 2004. His current research interests include pattern recognition, computer vision, fuzzy systems, neural networks, and intelligent systems.
Acknowledgements
The authors would like to thank the anonymous reviewers and editors for their valuable suggestions.
Competing interests
The authors declare that they have no competing interests.
Funding
This study was funded by the National Natural Science Foundation of China (Grant No. 61373055) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20130093110009).
Author information
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
About this article
Received
Accepted
Published
DOI
Keywords
 Lowrank representation
 Sparse representation
 Graph embedding
 Sparse regression
 Semisupervised classification