Combining graph embedding and sparse regression with structure low-rank representation for semi-supervised learning
 Cong-Zhe You^{1, 2},
 Vasile Palade^{2} and
 Xiao-Jun Wu^{1}
DOI: 10.1186/s40294-016-0034-7
© The Author(s) 2016
Received: 8 August 2016
Accepted: 1 October 2016
Published: 7 October 2016
Abstract
In this paper, we propose a novel method for semi-supervised learning that combines graph embedding and sparse regression, termed graph embedding and sparse regression with structure low-rank representation (GESRLR), in which the embedding learning and the sparse regression are performed jointly. Most graph-based semi-supervised learning methods take into account the local neighborhood information but ignore the global structure of the data. The proposed GESRLR method learns a low-rank weight matrix by projecting the data onto a low-dimensional subspace. GESRLR makes full use of the supervised information in the construction of the affinity matrix, and the affinity construction is combined with graph embedding in a single step to guarantee a globally optimal solution. In the dimensionality reduction procedure, GESRLR preserves the global structure of the data, and the learned low-rank weight matrix effectively reduces the influence of noise. An effective novel algorithm for solving the corresponding optimization problem is also presented in this paper. Extensive experimental results demonstrate that the GESRLR method obtains higher classification accuracy than other state-of-the-art methods.
Keywords
Low-rank representation, Sparse representation, Graph embedding, Sparse regression, Semi-supervised classification
Introduction
The complex adaptive systems (CAS) research area tries to establish a comprehensive and general understanding of the complex world around us (Niazi and Hussain 2013). Complex systems typically generate high-dimensional data and rely on the effective analysis and management of such data. High-dimensional data arises in a wide variety of real applications, such as text mining, image retrieval, and visual object recognition. While high-performance computing can address some of the problems of high-dimensional data, for example its time-consuming processing, high-dimensional data often suffers from a series of other problems, such as the curse of dimensionality and the impact of noise and redundancy. Fortunately, it has been shown that the intrinsic dimensionality of such data is usually much smaller than that of the ambient space.
In recent years, researchers have put forward many efficient dimensionality reduction algorithms (Wang et al. 2014; Zhou and Tao 2013; Nie et al. 2011; Xu et al. 2009; Li et al. 2008). Principal component analysis (PCA) (Belhumeur et al. 1997) is a traditional method that projects high-dimensional data onto a low-dimensional space. Linear discriminant analysis (LDA) (Zuo et al. 2006) is a supervised dimensionality reduction method that maximizes the between-class variance relative to the within-class variance (Nie et al. 2009; Yang et al. 2010). Neighborhood component analysis (NCA) (Goldberger et al. 2004) learns a linear transformation by directly maximizing a stochastic variant of the expected leave-one-out classification accuracy on the training set. To find the intrinsic manifold structure of the data samples, researchers have also proposed nonlinear dimensionality reduction methods, such as locally linear embedding (LLE) (Roweis and Saul 2000) and the Laplacian eigenmap (LE) (Belkin and Niyogi 2003). A disadvantage of such methods is that, whenever new samples are added to the training set, the whole training set must be learned again. To solve this problem, He et al. (2005a) put forward the locality preserving projection (LPP) algorithm, in which a linear projection is used to handle new data samples, and Wu et al. (2007) proposed the local learning projection (LLP) method. In addition, the neighborhood preserving embedding (NPE) (He et al. 2005b) algorithm was put forward to keep the local neighborhood structure on the manifold of the data samples. Previous studies (Zhang et al. 2009; Tenenbaum et al. 2000; Yan et al. 2007) have shown that many dimensionality reduction algorithms can be expressed within a unified framework.
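As a concrete point of reference for the linear projection methods reviewed above, the following is a minimal PCA sketch in NumPy. It is illustrative only and not from the paper; the function name `pca_project` is an assumption.

```python
import numpy as np

def pca_project(X, d):
    """Project rows of X (n_samples x n_features) onto the top-d principal components."""
    Xc = X - X.mean(axis=0)                       # center the data
    # eigenvectors of the covariance matrix, obtained via SVD of the centered data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    W = Vt[:d].T                                  # projection matrix (n_features x d)
    return Xc @ W, W

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
Y, W = pca_project(X, 3)                          # 3-dimensional embedding of 100 samples
```

The columns of `W` are orthonormal, so distances in the embedded space are not distorted by the projection basis itself.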
However, in real applications, most of the methods mentioned above preserve only the information of the local neighborhood and ignore the global structure of the data. The local structure of a dataset is easily affected by factors such as noise, illumination changes or corruption, which in turn degrades the performance of clustering or classification. Fortunately, research has shown that the recently proposed low-rank representation (LRR) (Liu et al. 2010, 2013) algorithm is robust to datasets that contain noise or corruption. In the past few years, a series of robust classification algorithms based on low-rank representation have been put forward. Robust PCA (RPCA) (Wright et al. 2009; Candès et al. 2011) uses low-rank representation to recover the structure of subspaces from a dataset corrupted by noise. For the subspace segmentation problem, Liu et al. (2010, 2013) use the nuclear norm to find the lowest-rank representation of a dataset, so that its global structure is well preserved. Unlike low-rank representation, which seeks the lowest-rank representation, sparse representation finds the sparsest representation of a dataset. Zhuang et al. (2012) combine sparsity and low-rankness in a non-negative low-rank and sparse representation (NNLRS) for dealing with high-dimensional datasets, and then use the representation coefficient matrix to construct the affinity graph for subspace segmentation. Through this combination of sparse and low-rank representation, the NNLRS method captures both the global and the local structure of the dataset.
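The nuclear-norm minimization at the heart of LRR and RPCA is typically handled through singular value thresholding (SVT), the proximal operator of the nuclear norm. The sketch below is not the solver used by any of the cited papers, only an illustration of that core low-rank update; the name `svt` and the toy data are assumptions.

```python
import numpy as np

def svt(M, tau):
    """Singular value thresholding: shrink every singular value of M by tau,
    zeroing those below tau. This is the proximal operator of tau * ||.||_*,
    the basic low-rank step inside RPCA/LRR solvers."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return U @ np.diag(s_shrunk) @ Vt

rng = np.random.default_rng(1)
# a rank-2 matrix corrupted by small dense noise
L = rng.normal(size=(50, 2)) @ rng.normal(size=(2, 40))
M = L + 0.01 * rng.normal(size=(50, 40))
M_low = svt(M, tau=1.0)   # thresholding removes the noise directions
```

Because the noise singular values are far below `tau` while the two signal singular values are far above it, the output is exactly rank 2.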
Motivated by the above analysis, a novel method is proposed in this paper that combines graph embedding and sparse regression in a joint optimization framework. Supervised information is also used in the framework to guide the construction of the affinity graph. The construction of the affinity graph and the graph embedding are combined to ensure a globally optimal solution. Throughout the learning process, the label information can be accurately propagated through the constructed graph; thus, the linear regression can learn a discriminative projection that better fits the sample labels and improves the classification rate on new samples. To solve the corresponding optimization problem, this paper proposes an iterative optimization procedure. The main contributions of this paper are as follows:
 1.
Different from conventional methods, the proposed GESRLR method learns a novel weight graph by using both low-rank representation and sparse representation, which preserve the global and the local structure of the data, respectively.
 2.
By unifying graph learning, projection learning and label propagation into a joint optimization framework, the proposed GESRLR method can guarantee an overall optimal solution.
The remainder of this paper is organized as follows: “Background and related work” section briefly reviews the background and some related work. The proposed GESRLR method and the corresponding solution are described in “Combined graph embedding and sparse regression with structure low-rank representation” section. Extensive experiments are reported in “Experiments” section. Finally, we conclude the paper in “Conclusion” section.
Background and related work
Low-rank representation (LRR)
There are many optimization methods for solving problem (3). After obtaining the final representation coefficient matrix Z, we can use it as a kind of similarity to construct an affinity graph (Z + Z^{T}). We then apply spectral clustering to the affinity graph to obtain the final clustering result.
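The affinity-construction step above can be sketched as follows. This is an illustrative NumPy version under assumed names (`lrr_affinity`, `spectral_embedding`): it symmetrizes a given coefficient matrix Z and computes the spectral embedding, whose rows would then be clustered (e.g., with k-means) in standard spectral clustering.

```python
import numpy as np

def lrr_affinity(Z):
    """Symmetrize a representation-coefficient matrix Z into an affinity graph:
    W = (|Z| + |Z|^T) / 2, as commonly done after solving an LRR problem."""
    A = np.abs(Z)
    return (A + A.T) / 2.0

def spectral_embedding(W, k):
    """Bottom-k eigenvectors of the normalized Laplacian L = I - D^{-1/2} W D^{-1/2}."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    L = np.eye(W.shape[0]) - D_inv_sqrt @ W @ D_inv_sqrt
    vals, vecs = np.linalg.eigh(L)      # ascending eigenvalues
    return vecs[:, :k]

# toy block-diagonal Z: two well-separated groups of three samples each
Z = np.zeros((6, 6))
Z[:3, :3] = 0.5
Z[3:, 3:] = 0.5
E = spectral_embedding(lrr_affinity(Z), 2)
```

For this disconnected toy graph, the embedding rows are constant within each group and differ across groups, so any reasonable clustering of the rows recovers the two subspaces.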
Flexible Manifold Embedding (FME)
Combined graph embedding and sparse regression with structure low-rank representation
In this section, we first introduce the motivation of the proposed GESRLR method, then present its objective function, and finally propose the optimization solution.
Motivations
For the label propagation problem, we usually adopt the following hypothesis: a data sample and its nearest neighbors usually belong to the same class, and the nearest neighbors have a strong influence on the labels assigned to new data samples. In short, the labels of similar samples should be close, so labels can be propagated to similar samples. Therefore, when constructing an ideal graph, similar data points and their nearest neighbors should be assigned larger weight values. However, in most traditional graph construction methods the evaluation of similarity mainly depends on the pairwise Euclidean distance, which is very sensitive to noise and any other corruption of the data samples (Zhuang et al. 2012). Moreover, these methods capture only the local structure of the dataset and fail to preserve its global structure. Fortunately, recent studies show that the LRR method preserves the global structure of the dataset and is robust to noise and corruption (Liu et al. 2010, 2013). These low-rank properties can therefore be combined with the graph embedding problem, addressing the sensitivity of purely local, neighborhood-based constructions. Thus, the main idea behind constructing an informative graph is to use the low-rankness property to preserve both the local and the global structure of a noisy dataset. Following the above analysis, we put forward a novel method of joint graph embedding and sparse regression with structure low-rank representation, named GESRLR, presented in the next sections of this paper.
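The label propagation hypothesis above is commonly instantiated by the method of Zhou et al. (2004), cited in the references: F* = (1 − α)(I − αS)^{-1}Y with S the normalized affinity. The sketch below is that classic scheme, not the GESRLR propagation itself; `propagate_labels` and the toy graph are assumptions for illustration.

```python
import numpy as np

def propagate_labels(W, Y, alpha=0.9):
    """Label propagation of Zhou et al. (2004): F = (1-alpha)(I - alpha*S)^{-1} Y,
    with S = D^{-1/2} W D^{-1/2}. Y holds one-hot rows for labeled samples and
    zero rows for unlabeled ones; predictions are the row-wise argmax of F."""
    d = W.sum(axis=1)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(np.maximum(d, 1e-12)))
    S = D_inv_sqrt @ W @ D_inv_sqrt
    n = W.shape[0]
    F = (1 - alpha) * np.linalg.solve(np.eye(n) - alpha * S, Y)
    return F.argmax(axis=1)

# toy graph: two clusters of three samples, one labeled sample per cluster
W = np.zeros((6, 6))
W[:3, :3] = 1.0
W[3:, 3:] = 1.0
np.fill_diagonal(W, 0.0)
Y = np.zeros((6, 2))
Y[0, 0] = 1.0    # sample 0 labeled as class 0
Y[3, 1] = 1.0    # sample 3 labeled as class 1
pred = propagate_labels(W, Y)   # labels spread within each cluster
```

Because the two clusters are disconnected, each unlabeled sample inherits the label of the single labeled sample in its own cluster.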
GESRLR model
The solution of GESRLR
The overall optimization framework for the proposed GESRLR method is described in Algorithm 1.
Experiments
Human social systems, and human facial structure recognition in particular, are the emergent outcome of adaptation over a period of time (Holland 2012). In the experiments described in this paper, we used several datasets to evaluate the performance of the proposed GESRLR method (http://www.cad.zju.edu.cn/home/dengcai/Data/data.html): two human face image datasets (i.e., the ORL and the extended Yale B datasets), an object dataset (COIL20), a spoken letter recognition dataset (Isolet 5) and a handwritten digit dataset (USPS). These datasets contain common images and signals from daily life and are widely used in areas such as image processing and machine learning. The computing platform was MATLAB R2015b on a PC with an Intel i7-2600 CPU and 16 GB of RAM.
Datasets descriptions
 1.
ORL dataset The ORL dataset consists of 400 face images of 40 people. The images were taken under different conditions: at different times, under varying lighting, with different facial details (glasses/no glasses) and facial expressions (open/closed eyes, smiling/not smiling).
 2.
The extended Yale B dataset The extended Yale B dataset contains the face images of 38 people; each individual has around 64 frontal face images taken under different illuminations. For computational efficiency, we resized each image to 32 × 32 pixels in these experiments.
 3.
COIL20 dataset The COIL20 dataset contains the images of 20 objects; each object has 72 images, captured at five-degree rotation intervals. For computational efficiency, we resized each image to 32 × 32 pixels in these experiments.
 4.
ISOLET 5 dataset The ISOLET spoken letter recognition dataset consists of 150 subjects, where each person speaks each letter of the alphabet twice. The speakers are divided into 5 groups of 30 speakers each; the fifth group is denoted the ISOLET 5 dataset. In this work, the ISOLET 5 subset contains 1559 samples covering the 26 letter classes.
 5.
USPS dataset The USPS dataset is a handwritten digit dataset consisting of two parts: a training set with 7291 samples and a test set with 2007 samples. In this experiment, we randomly selected 7000 images of the 10 digits, i.e., 700 images per category. The size of each image is 16 × 16 pixels.
Classification results
Semi-supervised classification results of different algorithms on the COIL20 dataset
Method  P = 1, Semi (%)  P = 1, Test (%)  P = 2, Semi (%)  P = 2, Test (%)  P = 3, Semi (%)  P = 3, Test (%)
GFHF  78.65 ± 2.07  –  81.32 ± 1.77  –  84.56 ± 2.02  – 
MFA  –  –  69.87 ± 2.24  70.10 ± 2.52  76.54 ± 2.28  76.27 ± 2.37 
SDA  64.92 ± 2.07  65.80 ± 2.54  72.24 ± 2.19  73.19 ± 2.15  78.89 ± 2.05  78.19 ± 2.66 
TCA  71.08 ± 2.23  70.83 ± 2.51  78.17 ± 3.15  77.29 ± 2.18  81.15 ± 2.32  80.96 ± 2.27 
LapRLS  69.46 ± 2.58  69.73 ± 2.76  75.21 ± 2.66  75.16 ± 2.31  79.61 ± 2.54  79.85 ± 2.59 
FME  76.31 ± 2.09  74.46 ± 2.13  82.35 ± 2.18  79.14 ± 2.39  85.86 ± 1.92  84.70 ± 2.03 
NNSG  79.15 ± 2.86  75.31 ± 2.01  83.79 ± 2.69  80.88 ± 2.43  86.62 ± 2.29  82.13 ± 2.24 
GESRLR  81.09 ± 2.33  76.79 ± 2.18  85.29 ± 2.62  81.07 ± 2.59  87.12 ± 2.15  83.32 ± 2.16 
We performed experiments on the above datasets: ORL, the extended Yale B, COIL20, Isolet 5 and USPS. For every dataset, we randomly selected 50% of the samples of each subject as the training set, while the remaining samples were used as the test set. For semi-supervised classification, we selected p samples per subject as the labeled data, while the remaining training samples formed the unlabeled data. The unlabeled samples are used to evaluate semi-supervised classification performance, while the test set is used to evaluate how well new samples are classified with the learned projection matrix.
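The evaluation protocol above can be sketched as follows; this is an illustrative NumPy version under assumed names (`split_per_subject` is a hypothetical helper, not the authors' code).

```python
import numpy as np

def split_per_subject(labels, p, rng):
    """Per class: 50% of samples go to the training pool and the rest to the
    test set; within the training pool, p samples per class are labeled and
    the remainder are unlabeled, matching the paper's protocol."""
    labels = np.asarray(labels)
    train_lab, train_unlab, test = [], [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        half = len(idx) // 2
        train, hold = idx[:half], idx[half:]
        train_lab.extend(train[:p])
        train_unlab.extend(train[p:])
        test.extend(hold)
    return np.array(train_lab), np.array(train_unlab), np.array(test)

rng = np.random.default_rng(0)
labels = np.repeat(np.arange(40), 10)   # e.g., ORL: 40 subjects x 10 images
lab, unlab, test = split_per_subject(labels, p=3, rng=rng)
```

For ORL with p = 3 this yields 120 labeled, 80 unlabeled and 200 test samples, and the three index sets partition the 400 images.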
Semi-supervised classification results of different algorithms on the USPS dataset
Method  P = 1, Semi (%)  P = 1, Test (%)  P = 2, Semi (%)  P = 2, Test (%)  P = 3, Semi (%)  P = 3, Test (%)
GFHF  72.39 ± 3.60  –  79.66 ± 3.67  –  83.39 ± 3.07  – 
MFA  –  –  68.74 ± 3.82  66.52 ± 4.23  72.76 ± 4.21  70.57 ± 3.19 
SDA  56.86 ± 3.11  54.91 ± 3.92  67.37 ± 3.26  67.43 ± 2.91  72.66 ± 2.64  69.32 ± 3.20 
TCA  70.39 ± 3.38  65.36 ± 3.17  76.52 ± 3.21  71.27 ± 3.28  79.58 ± 3.37  72.76 ± 2.94 
LapRLS  57.89 ± 4.08  58.42 ± 4.36  69.03 ± 3.86  69.39 ± 2.49  76.02 ± 3.28  74.08 ± 2.79 
FME  74.75 ± 6.52  67.91 ± 5.04  79.64 ± 3.41  73.26 ± 3.19  82.15 ± 2.26  74.97 ± 2.72 
NNSG  76.98 ± 3.80  68.92 ± 3.37  81.17 ± 2.59  76.85 ± 2.57  84.50 ± 2.13  76.38 ± 2.54 
GESRLR  78.49 ± 3.65  69.56 ± 3.18  83.61 ± 2.36  77.28 ± 2.29  86.07 ± 2.73  78.20 ± 2.17 
Semi-supervised classification results of different algorithms on the ISOLET5 dataset
Method  P = 1, Semi (%)  P = 1, Test (%)  P = 2, Semi (%)  P = 2, Test (%)  P = 3, Semi (%)  P = 3, Test (%)
GFHF  49.29 ± 2.15  –  56.26 ± 2.44  –  61.13 ± 2.14  – 
MFA  –  –  61.19 ± 2.14  61.46 ± 2.89  65.52 ± 2.27  65.19 ± 2.36 
SDA  52.01 ± 2.38  51.19 ± 2.54  61.31 ± 2.28  61.57 ± 2.35  67.55 ± 2.28  67.91 ± 2.06 
TCA  49.19 ± 2.94  49.30 ± 2.13  59.77 ± 2.36  59.16 ± 2.42  64.72 ± 2.37  65.01 ± 2.38 
LapRLS  51.71 ± 3.03  50.98 ± 2.84  61.63 ± 2.37  61.85 ± 2.21  65.19 ± 1.89  65.25 ± 2.05 
FME  49.92 ± 2.40  50.17 ± 2.49  59.92 ± 2.45  59.88 ± 2.56  65.98 ± 1.64  66.13 ± 2.29 
NNSG  53.39 ± 2.26  51.75 ± 2.37  62.84 ± 2.57  62.63 ± 2.26  67.33 ± 2.21  67.94 ± 2.15 
GESRLR  55.01 ± 2.25  52.26 ± 2.82  63.09 ± 2.12  63.13 ± 2.43  69.26 ± 2.24  70.03 ± 1.78 
Semi-supervised classification results of different algorithms on the ORL dataset
Method  P = 1, Semi (%)  P = 1, Test (%)  P = 2, Semi (%)  P = 2, Test (%)  P = 3, Semi (%)  P = 3, Test (%)
GFHF  52.81 ± 4.31  –  63.26 ± 3.78  –  68.97 ± 3.54  – 
MFA  –  –  78.22 ± 4.25  79.11 ± 3.76  85.40 ± 3.89  84.78 ± 2.54 
SDA  65.29 ± 2.72  65.32 ± 2.83  75.84 ± 3.61  76.92 ± 3.25  82.44 ± 2.54  82.95 ± 2.26 
TCA  64.75 ± 2.05  64.61 ± 2.29  77.02 ± 3.15  78.80 ± 2.57  84.49 ± 3.12  84.27 ± 2.67 
LapRLS  61.49 ± 3.31  59.88 ± 3.10  78.29 ± 2.54  77.86 ± 2.71  85.83 ± 2.75  85.94 ± 2.39 
FME  68.25 ± 2.58  66.69 ± 3.24  80.80 ± 3.25  80.73 ± 2.76  85.92 ± 3.67  84.35 ± 2.64 
NNSG  71.86 ± 3.29  67.77 ± 3.73  82.57 ± 2.65  82.91 ± 2.15  86.38 ± 3.83  85.52 ± 2.97 
GESRLR  73.08 ± 3.17  69.29 ± 3.68  85.52 ± 2.14  85.64 ± 2.89  87.45 ± 3.54  86.12 ± 2.99 
 1.
In terms of classification accuracy, the semi-supervised classification algorithms TCA, LapRLS/L and SDA achieve higher accuracy than the supervised classification algorithm MFA. This shows that the unlabeled data samples help to improve the performance of semi-supervised classification.
 2.
On some datasets, the GFHF algorithm achieves higher semi-supervised classification accuracy than the TCA, LapRLS/L and SDA algorithms. However, on datasets with strong variations, such as the extended Yale B dataset with its strong illumination and expression changes, label propagation may not perform well; this phenomenon is most obvious on the extended Yale B dataset.
 3.
On the unlabeled data, the performance of the proposed GESRLR algorithm is clearly better than that of the compared methods. This indicates that the graph structure obtained by the GESRLR method carries more discriminant information, making it more effective for label propagation. It also suggests that simultaneously performing label propagation and graph learning is both necessary and effective.
Semi-supervised classification results of different algorithms on the extended Yale B dataset
Method  P = 5, Semi (%)  P = 5, Test (%)  P = 10, Semi (%)  P = 10, Test (%)  P = 15, Semi (%)  P = 15, Test (%)
GFHF  27.49 ± 1.27  –  34.76 ± 2.11  –  40.13 ± 2.02  – 
MFA  –  –  69.52 ± 3.19  70.08 ± 3.26  73.90 ± 2.72  74.15 ± 3.42 
SDA  51.92 ± 2.36  52.06 ± 1.58  66.76 ± 1.65  67.49 ± 1.41  73.40 ± 1.19  73.08 ± 1.78 
TCA  51.47 ± 2.19  52.56 ± 2.34  65.94 ± 1.95  66.76 ± 2.25  74.38 ± 1.76  74.28 ± 2.37 
LapRLS  60.16 ± 2.24  59.47 ± 1.83  74.85 ± 1.67  74.19 ± 1.47  78.64 ± 2.54  78.08 ± 2.67 
FME  63.46 ± 2.14  63.75 ± 1.89  76.92 ± 2.38  74.37 ± 1.22  80.38 ± 1.77  78.19 ± 2.03 
NNSG  72.37 ± 2.25  66.92 ± 1.64  82.25 ± 1.64  75.42 ± 1.27  83.38 ± 1.93  79.06 ± 1.25 
GESRLR  75.26 ± 2.59  68.13 ± 1.54  84.11 ± 1.57  76.61 ± 1.95  85.87 ± 1.69  80.52 ± 1.28 
Conclusion
Complex adaptive systems (CAS) involve the processing of large amounts of high-dimensional data. It is thus paramount to develop and employ effective machine learning techniques to deal with the high-dimensional and large datasets generated in the CAS area. In this paper, we proposed a novel semi-supervised learning method termed graph embedding and sparse regression with structure low-rank representation (GESRLR), in which graph embedding and sparse regression are performed simultaneously in order to obtain an optimal solution. Different from traditional methods, the proposed GESRLR method takes into account both the local and the global structure of the dataset to construct an informative graph. Extensive experiments on five datasets demonstrate that the GESRLR method outperforms state-of-the-art methods. In future work, we will extend the ideas presented in this paper and apply the proposed GESRLR method to other challenging problems.
Abbreviations
 GESRLR:

graph embedding and sparse regression with structure low-rank representation
 CAS:

complex adaptive systems
 PCA:

principal component analysis
 LDA:

linear discriminant analysis
 NCA:

neighborhood component analysis
 LLE:

locally linear embedding
 LE:

Laplacian eigenmap
 LPP:

locality preserving projection
 LLP:

local learning projection
 NPE:

neighborhood preserving embedding
 LRR:

low-rank representation
 RPCA:

robust principal component analysis
 NNLRS:

non-negative low-rank and sparse representation
 FME:

flexible manifold embedding
 LADMAP:

linearized alternating direction method with adaptive penalty
 TCA:

transductive component analysis
 NN:

nearest neighbor
Declarations
Authors’ contributions
CZY conceived the study, performed the experiments and wrote the paper. VP and XJW reviewed and edited the manuscript. All authors read and approved the final manuscript.
Authors’ information
Cong-Zhe You is a Ph.D. candidate at Jiangnan University; he is currently a visiting student at Coventry University, sponsored by the China Scholarship Council. His research interests are in the areas of machine learning and pattern recognition.
Vasile Palade received the Ph.D. degree from the University of Galaţi, Romania, in 1999. He is currently a Reader with the School of Computing, Electronics and Maths, Coventry University, Coventry, U.K. His research interests include computational intelligence with applications to bioinformatics, fault diagnosis and web usage mining, among others. He has published more than 100 papers in journals and conference proceedings, as well as several books.
Xiao-Jun Wu received the B.Sc. degree in mathematics from Nanjing Normal University, Nanjing, China, in 1991. He received the M.S. degree in 1996, and the Ph.D. degree in pattern recognition and intelligent systems in 2002, both from Nanjing University of Science and Technology, Nanjing, China. He joined Jiangnan University in 2006, where he is currently a Professor. He has published more than 150 papers in his fields of research. He was a visiting researcher in the Centre for Vision, Speech, and Signal Processing (CVSSP), University of Surrey, U.K., from 2003 to 2004. His current research interests include pattern recognition, computer vision, fuzzy systems, neural networks, and intelligent systems.
Acknowledgements
The authors would like to thank the anonymous reviewers and editors for their valuable suggestions.
Competing interests
The authors declare that they have no competing interests.
Funding
This study was funded by the National Natural Science Foundation of China (Grant No. 61373055) and the Specialized Research Fund for the Doctoral Program of Higher Education of China (Grant No. 20130093110009).
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
 Belhumeur PN, Hespanha JP, Kriegman DJ (1997) Eigenfaces vs. fisherfaces: recognition using class specific linear projection. IEEE Trans Pattern Anal Mach Intell 19(7):711–720
 Belkin M, Niyogi P (2003) Laplacian eigenmaps for dimensionality reduction and data representation. Neural Comput 15(6):1373–1396
 Belkin M, Niyogi P, Sindhwani V (2006) Manifold regularization: a geometric framework for learning from labeled and unlabeled examples. J Mach Learn Res 1(7):2399–2434
 Cai D, He X, Han J (2007) Semi-supervised discriminant analysis. In: IEEE 11th international conference on computer vision (ICCV 2007). IEEE, New York, pp 1–7
 Candès EJ, Li X, Ma Y, Wright J (2011) Robust principal component analysis? J ACM 58(3):11
 Goldberger J, Hinton GE, Roweis ST, Salakhutdinov R (2004) Neighborhood components analysis. In: Advances in neural information processing systems, pp 513–520
 He X, Yan S, Hu Y, Niyogi P, Zhang HJ (2005a) Face recognition using Laplacianfaces. IEEE Trans Pattern Anal Mach Intell 27(3):328–340
 He X, Cai D, Yan S, Zhang HJ (2005b) Neighborhood preserving embedding. In: Tenth IEEE international conference on computer vision (ICCV 2005), vol 2. IEEE, New York, pp 1208–1213
 Holland JH (2012) Signals and boundaries: building blocks for complex adaptive systems. MIT Press, Cambridge
 Li X, Lin S, Yan S, Xu D (2008) Discriminant locally linear embedding with high-order tensor data. IEEE Trans Syst Man Cybern Part B 38(2):342–352
 Liu W, Tao D, Liu J (2008) Transductive component analysis. In: Eighth IEEE international conference on data mining (ICDM’08). IEEE, New York, pp 433–442
 Liu G, Lin Z, Yu Y (2010) Robust subspace segmentation by low-rank representation. In: ICML, pp 663–670
 Liu G, Lin Z, Yan S, Sun J, Yu Y, Ma Y (2013) Robust recovery of subspace structures by low-rank representation. IEEE Trans Pattern Anal Mach Intell 35(1):171–184
 Niazi MA, Hussain A (2013) Complex adaptive systems. In: Cognitive agent-based computing-I. Springer, Amsterdam, pp 21–32
 Nie F, Xiang S, Jia Y, Zhang C (2009) Semi-supervised orthogonal discriminant analysis via label propagation. Pattern Recogn 42(11):2615–2627
 Nie F, Xu D, Tsang IW, Zhang C (2010) Flexible manifold embedding: a framework for semi-supervised and unsupervised dimension reduction. IEEE Trans Image Process 19(7):1921–1932
 Nie F, Xu D, Li X, Xiang S (2011) Semi-supervised dimensionality reduction and classification through virtual label regression. IEEE Trans Syst Man Cybern Part B 41(3):675–685
 Nie F, Yuan J, Huang H (2014) Optimal mean robust principal component analysis. In: Proceedings of the 31st international conference on machine learning (ICML-14), pp 1062–1070
 Roweis ST, Saul LK (2000) Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500):2323–2326
 Tenenbaum JB, De Silva V, Langford JC (2000) A global geometric framework for nonlinear dimensionality reduction. Science 290(5500):2319–2323
 Wang SJ, Yan S, Yang J, Zhou CG, Fu X (2014) A general exponential framework for dimensionality reduction. IEEE Trans Image Process 23(2):920–930
 Wright J, Ganesh A, Rao S, Peng Y, Ma Y (2009) Robust principal component analysis: exact recovery of corrupted low-rank matrices via convex optimization. In: Advances in neural information processing systems, pp 2080–2088
 Wu M, Yu K, Yu S, Schölkopf B (2007) Local learning projections. In: Proceedings of the 24th international conference on machine learning. ACM, New York, pp 1039–1046
 Xu D, Yan S, Lin S, Huang TS, Chang SF (2009) Enhancing bilinear subspace learning by element rearrangement. IEEE Trans Pattern Anal Mach Intell 31(10):1913–1920
 Yan S, Xu D, Zhang B, Zhang HJ, Yang Q, Lin S (2007) Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans Pattern Anal Mach Intell 29(1):40–51
 Yang Y, Xu D, Nie F, Yan S, Zhuang Y (2010) Image clustering using local discriminant models and global integration. IEEE Trans Image Process 19(10):2761–2773
 Yang J, Chu D, Zhang L, Xu Y, Yang J (2013) Sparse representation classifier steered discriminative projection with applications to face recognition. IEEE Trans Neural Netw Learn Syst 24(7):1023–1035
 Zhang T, Tao D, Li X, Yang J (2009) Patch alignment for dimensionality reduction. IEEE Trans Knowl Data Eng 21(9):1299–1313
 Zhang L, Zhou WD, Chang PC, Liu J, Yan Z, Wang T, Li FZ (2012) Kernel sparse representation-based classifier. IEEE Trans Signal Process 60(4):1684–1695
 Zhou T, Tao D (2013) Double shrinking sparse dimension reduction. IEEE Trans Image Process 22(1):244–257
 Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B (2004) Learning with local and global consistency. Adv Neural Inf Process Syst 16(16):321–328
 Zhu X, Ghahramani Z, Lafferty J (2003) Semi-supervised learning using Gaussian fields and harmonic functions. In: ICML, vol 3, pp 912–919
 Zhuang L, Gao H, Lin Z, Ma Y, Zhang X, Yu N (2012) Non-negative low rank and sparse graph for semi-supervised learning. In: IEEE conference on computer vision and pattern recognition (CVPR 2012). IEEE, New York, pp 2328–2335
 Zuo W, Zhang D, Yang J, Wang K (2006) BDPCA plus LDA: a novel fast feature extraction technique for face recognition. IEEE Trans Syst Man Cybern Part B 36(4):946–953