
New Schemes for Activity Recognition Systems Using PCA-WSVM, ICA-WSVM, and LDA-WSVM

Laboratoire d'Ingénierie des Systèmes Intelligents et Communicants, Faculty of Electronics and Computer Sciences, University of Science and Technology Houari Boumediene (USTHB), 32, El Alia, Bab Ezzouar, 16111 Algiers, Algeria
*
Author to whom correspondence should be addressed.
Academic Editor: Ahmed El Oualkadi
Received: 9 June 2015 / Revised: 13 August 2015 / Accepted: 18 August 2015 / Published: 20 August 2015
(This article belongs to the Special Issue Selected Papers from MedICT 2015)

Abstract

Feature extraction and classification are two key steps for activity recognition in a smart home environment. In this work, we used three methods for feature extraction: Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Linear Discriminant Analysis (LDA). The new features selected by each method are then used as the inputs for a Weighted Support Vector Machines (WSVM) classifier. This classifier is used to handle the problem of imbalanced activity data from the sensor readings. The experiments, implemented on multiple real-world datasets with Conditional Random Fields (CRF), standard Support Vector Machines (SVM), Weighted SVM, and the combined methods PCA+WSVM, ICA+WSVM, and LDA+WSVM, showed that LDA+WSVM achieves a higher recognition rate than the other methods for activity recognition.
Keywords: activity recognition; principal component analysis; independent component analysis; linear discriminant analysis; weighted support vector machines

1. Introduction

Activity recognition is one of the most important tasks in pervasive computing applications [1,2,3,4]. Research in human activity recognition aims to determine a human user's activity, such as cooking, brushing teeth, dressing, or sleeping. To this end, different types of sensors have been used to sense users' activities in smart environments.
The collected sensor data need to be analyzed using machine learning and pattern recognition techniques [5,6] to determine which activity the dweller is performing. As for any pattern recognition task, the keys to successful activity recognition are: (i) appropriately designed feature extraction from the sensor data; and (ii) the design of suitable classifiers to infer the activity. The learning of such models is usually done in a supervised manner and requires a large annotated dataset recorded in different settings [1,2,3].
Existing activity recognition algorithms suffer from two problems that degrade performance: a non-informative feature space and imbalanced data. Feature extraction [7] preprocessing steps address the first problem by extracting a subset of new features from the original set, providing a better selection of relevant features from high-dimensional data as well as high discrimination between classes. In this paper, an attempt has been made to study three feature extraction methods, Principal Component Analysis (PCA) [8], Independent Component Analysis (ICA) [9], and Linear Discriminant Analysis (LDA) [10], and their relevance to improving the classification accuracy of existing activity recognition systems.
Another problem affecting the performance of activity classification algorithms is imbalanced data [11,12]. Activity recognition datasets are generally imbalanced, meaning certain activities occur more frequently than others (e.g., sleeping is generally done once a day, while toileting is done several times a day). This can negatively influence the learning process due to the known minority-class effect, which skews the outcome and may yield disastrous consequences for human activity recognition systems. This has motivated extensive research aiming to improve the effectiveness of SVM on imbalanced classification in the activity recognition field [13,14]. Approaches for addressing the imbalanced training-data problem can be categorized into two main streams: the data processing approach and the algorithmic approach [15,16,17].
The first approach preprocesses the data, either randomly or intelligently, by undersampling the majority instances [16] or oversampling the minority instances [15]. In this paper, we instead consider the algorithmic approach, because it keeps all the information and does not change the distribution of the training data. The solutions in this family include cost-sensitive learning [18,19], which treats misclassifications differently using weights assigned to the data in order to pursue a high classification accuracy.
Our paper addresses these issues and makes the following contributions. Firstly, we present new schemes using PCA+WSVM, ICA+WSVM, and LDA+WSVM to recognize activities of daily living from binary sensor data. The Weighted Support Vector Machine (WSVM) is employed to handle the imbalanced classification problem, using three methods independently for feature extraction: PCA, ICA, and LDA. Secondly, the proposed approaches are assessed and compared with Conditional Random Fields (CRF) [20], the standard SVM, and the Weighted SVM; CRF in particular has recently gained popularity in the activity recognition field [1,3]. The experiments were implemented on multiple annotated real-world datasets from sensor readings in different houses [21,22].

2. Proposed Strategy Based Activity Recognition System

Despite its popularity in machine learning, the SVM technique has not been extensively used in activity recognition studies, as pointed out in [23,24,25,26]. However, the high accuracy rates obtained in other contexts suggest possible success in activity recognition. Nevertheless, SVM is overwhelmed by majority-class instances in the case of imbalanced datasets. The Weighted Support Vector Machine (WSVM) technique has been suggested as a candidate solution because it uses an efficient training approach that improves its ability to learn from large or imbalanced datasets and, therefore, improves the performance of the multi-class SVM classifier.
In this paper, a new activity recognition scheme is proposed; the WSVM method is applied for imbalanced classification using three methods, independently, for feature extraction: PCA, ICA, and LDA, as shown in Figure 1. PCA aims to eliminate redundant information. ICA estimates components that are as statistically independent as possible. LDA improves the separability of samples in the subspace and extracts LDA features. The datasets transformed (into a lower-dimensional space) by each feature extraction method are then used for learning and testing a WSVM classifier. The outcome of the trained WSVM is then used to process new observations during the testing phase, where the associated class of activities of daily living is predicted.
Figure 1. Scheme of the proposed strategy based activity recognition system.

2.1. Feature Extraction Methods

Suppose $X = \{x_i,\ i = 1, 2, \dots, m\}$ is a set of training data with $x_i \in \mathbb{R}^n$, where $m$ is the total number of samples, $n$ is the feature dimension of a sample, and $N$ is the total number of classes. A projected sample is $x_i \in \mathbb{R}^p$ ($p < n$).

2.1.1. Principal Component Analysis (PCA)

Principal component analysis [8] is a projection-based technique that approximates the original data with lower dimensional feature vectors through the construction of uncorrelated principal components that are a linear combination of the original variables. However, PCA is ignorant of the class labels attached to the data, so a good class separation in the direction of the high variance principal components is not guaranteed [8]. The main process of PCA is as follows.
In PCA, the data matrix $X \in \mathbb{R}^{m \times n}$ is first centered, $x \leftarrow x - \bar{x}$, where $\bar{x}$ is the mean of the samples. PCA then diagonalizes the covariance matrix

$$\mathrm{Cov}(X) = \frac{1}{m-1}\sum_{i=1}^{m}(x_i - \bar{x})(x_i - \bar{x})^{T}, \quad \text{with } \bar{x} = \frac{1}{m}\sum_{i=1}^{m} x_i \tag{1}$$
This problem leads to solve the eigenvalue equation
$$\lambda V = \mathrm{Cov}(X)\, V, \quad \|V\| = 1 \tag{2}$$
where V = [v1, v2, …, vn] is the n × n matrix containing the n eigenvectors, and λ is an n × n diagonal matrix of the eigenvalues of the covariance matrix. In Equation (2), each n-dimensional eigenvector vi corresponds to the ith eigenvalue λi. The variance in any direction vi can be measured by dividing the associated eigenvalue λi by the sum of the n eigenvalues. The first p principal components, which will be used for classification, are selected when their accumulative contribution rate satisfies:
$$w = \frac{\sum_{j=1}^{p}\lambda_j}{\sum_{j=1}^{n}\lambda_j} \ge \mathrm{Threshold} = 0.85 \tag{3}$$
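The procedure above (center the data, diagonalize the covariance matrix, keep the leading components up to the 0.85 contribution threshold) can be sketched in a few lines of NumPy. The helper name `pca_select` and the toy data are illustrative, not from the paper:

```python
import numpy as np

def pca_select(X, threshold=0.85):
    """Center X, diagonalize the covariance matrix (Eqs. (1)-(2)), and
    keep the first p components whose accumulated variance ratio
    reaches the threshold of Eq. (3)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # x <- x - x_bar
    cov = Xc.T @ Xc / (len(X) - 1)               # Cov(X)
    eigvals, eigvecs = np.linalg.eigh(cov)       # lambda V = Cov(X) V
    order = np.argsort(eigvals)[::-1]            # sort by decreasing variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    ratio = np.cumsum(eigvals) / eigvals.sum()   # accumulative contribution w
    p = int(np.searchsorted(ratio, threshold)) + 1
    return Xc @ eigvecs[:, :p], p                # projected samples in R^p

# Toy data: the second dimension is almost a copy of the first,
# so PCA can discard the redundant direction.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
X = np.c_[Z[:, 0], 2 * Z[:, 0] + 0.1 * rng.normal(size=200), Z[:, 1]]
Xp, p = pca_select(X)
print(p)  # number of retained components (here fewer than the 3 original dimensions)
```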

2.1.2. Independent Component Analysis (ICA)

The most commonly used method for generating spatially-localized features is independent component analysis (ICA), which produces basis vectors that are statistically independent (not just linearly decorrelated, as with PCA) [9]. The algorithm works on the principle of minimizing mutual information between the variables, which is the correct criterion for judging independence; minimizing mutual information is also closely related to maximizing negentropy.
The ICA model can also be written as:
$$U = WX \tag{4}$$
Based on information theory, the negentropy of U can be used as the criterion to estimate the independence of the vectors; it is approximated using the contrast function [27]:
$$J_G(w_i) = \left[ E\{G(w_i^{T}X)\} - E\{G(V)\} \right]^{2} \tag{5}$$
where V is the standardized Gaussian random variable (zero mean and unit variance). G is a non-quadratic function, the commonly used G can be:
$$G_1(u) = \frac{1}{\alpha_1}\log\cosh(\alpha_1 u) \tag{6}$$

$$G_2(u) = -\exp(-u^{2}/2) \tag{7}$$
where $1 \le \alpha_1 \le 2$ is a suitable constant.
Maximizing the formula in Equation (5) leads to estimating $w_i$ by
$$w_i^{*} = E\{X g(w_i^{T}X)\} - E\{g'(w_i^{T}X)\}\, w_i \tag{8}$$

$$w_i = \frac{w_i^{*}}{\|w_i^{*}\|} \tag{9}$$
where $w_i^{*}$ is the new estimated value of $w_i$, and $g$ and $g'$ are, respectively, the first and second derivatives of $G$. Based on the maximal negentropy principle, the whole matrix $W$ can be computed by maximizing the sum of the one-unit contrast functions while taking into account the decorrelation constraint [27]. In practice, ICA can often uncover disjoint underlying trends in multi-dimensional data.
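To illustrate the fixed-point updates of Equations (8) and (9), the following sketch extracts one component from a toy two-source mixture. The function name, the toy sources, and the whitening step are assumptions made for the demo, not the paper's implementation:

```python
import numpy as np

def fastica_one_unit(X, alpha=1.0, n_iter=200, seed=0):
    """One-unit fixed-point ICA on whitened data X (d x m).
    Uses G(u) = (1/alpha) log cosh(alpha*u), so g = tanh(alpha*u)
    and g' = alpha * (1 - tanh(alpha*u)**2)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=X.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        wx = w @ X
        g = np.tanh(alpha * wx)
        gp = alpha * (1 - np.tanh(alpha * wx) ** 2)
        w_new = (X * g).mean(axis=1) - gp.mean() * w   # Eq. (8)
        w = w_new / np.linalg.norm(w_new)              # Eq. (9)
    return w

# Mix two independent non-Gaussian sources, then whiten the mixture.
rng = np.random.default_rng(1)
S = np.sign(rng.normal(size=(2, 2000))) + 0.1 * rng.normal(size=(2, 2000))
A = np.array([[1.0, 0.6], [0.4, 1.0]])
X = A @ S
X -= X.mean(axis=1, keepdims=True)
d, E = np.linalg.eigh(np.cov(X))
Xw = (E / np.sqrt(d)) @ E.T @ X          # whitening transform
w = fastica_one_unit(Xw)
u = w @ Xw                               # one estimated independent component
# |correlation| of u with one of the true sources should be close to 1
print(max(abs(np.corrcoef(u, S[0])[0, 1]), abs(np.corrcoef(u, S[1])[0, 1])))
```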

2.1.3. Linear Discriminant Analysis (LDA)

The aim of LDA is to find the optimal projection matrix $W_{opt} \in \mathbb{R}^{n \times p}$ using the Fisher criterion below, which maximizes the ratio of the between-class scatter $S_B$ to the within-class scatter $S_W$ of the projected samples:
$$J(W_{opt}) = \arg\max_{W} \frac{\left| W^{T} S_B W \right|}{\left| W^{T} S_W W \right|} \tag{10}$$
where the between and within class covariance SB and SW are defined as:
$$S_B = \sum_{i=1}^{N} p_i\, (\bar{x}_i - \bar{x})(\bar{x}_i - \bar{x})^{T} \tag{11}$$

$$S_W = \frac{1}{m}\sum_{i=1}^{N} \sum_{x \in C_i} (x - \bar{x}_i)(x - \bar{x}_i)^{T} \tag{12}$$
where $p_i = m_i/m$ is the prior probability of each class, $m_i$ is the number of training samples of the $i$th class, $\bar{x}_i$ is the mean of the $i$th class, and $\bar{x}$ is the overall mean vector.
To maximize (10), the optimal $W_{opt}$ consists of the eigenvectors associated with the largest eigenvalues of the following generalized eigenvalue problem:
$$S_B w_i = \lambda_i S_W w_i \tag{13}$$
The solution can be computed from the leading eigenvectors of $S_W^{-1} S_B$ corresponding to the eigenvalues $\lambda_i$; the column vectors $w_i$ then form the rows of the transformation matrix $W$. It should be noted that only those eigenvectors should be selected that correspond to the eigenvalues carrying most of the energy, i.e., the total dispersion. Another interesting property is that this transform decorrelates both the $S_B$ and $S_W$ matrices. The rank of $S_B$ is at most $N-1$, and hence no more than this number of new features can be obtained.
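The construction of $S_B$ and $S_W$ and the eigen-decomposition of $S_W^{-1} S_B$ can be sketched as follows; the helper name and the toy two-class data are illustrative, not from the paper:

```python
import numpy as np

def lda_fit(X, y):
    """Build S_B and S_W (Eqs. (11)-(12)) and return the leading
    eigenvectors of S_W^{-1} S_B; at most N-1 features, the rank of S_B."""
    X = np.asarray(X, float)
    classes = np.unique(y)
    m, mean_all = len(X), X.mean(axis=0)
    Sb = np.zeros((X.shape[1], X.shape[1]))
    Sw = np.zeros_like(Sb)
    for c in classes:
        Xc = X[y == c]
        mu = Xc.mean(axis=0)
        pi = len(Xc) / m                          # class prior p_i
        Sb += pi * np.outer(mu - mean_all, mu - mean_all)
        Sw += ((Xc - mu).T @ (Xc - mu)) / m
    evals, evecs = np.linalg.eig(np.linalg.inv(Sw) @ Sb)
    order = np.argsort(evals.real)[::-1]          # largest eigenvalues first
    p = len(classes) - 1                          # rank(S_B) <= N - 1
    return evecs.real[:, order[:p]]

# Two well-separated Gaussian classes in 3-D -> a single LDA feature
rng = np.random.default_rng(0)
X = np.r_[rng.normal(0, 1, (100, 3)), rng.normal(4, 1, (100, 3))]
y = np.r_[np.zeros(100), np.ones(100)]
W = lda_fit(X, y)
z = X @ W                                         # projected samples
print(W.shape)  # (3, 1): N-1 = 1 discriminant direction
```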

2.2. Weighted Support Vector Machines (WSVM)

A standard SVM classifier is sensitive to the problem of learning from imbalanced data: it assumes a balanced training set and uses the same cost parameter C for all classes, which may generate suboptimal classification models. The SVM primal optimization problem is given as follows:
$$\min_{w,b,\xi}\ \frac{1}{2}K(w,w) + C\sum_{i=1}^{m}\xi_i \quad \text{subject to}\quad y_i\left(w^{T}\varphi(x_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1,\dots,m \tag{14}$$
The Weighted Support Vector Machine (WSVM) was presented to deal with this problem by introducing two different cost parameters C + and C in the SVM optimization primal problem [5] for the majority classes (yi = +1) and minority (yi = −1), as given in Equation (15) below:
$$\min_{w,b,\xi}\ \frac{1}{2}K(w,w) + C^{+}\sum_{i:\, y_i = +1}\xi_i + C^{-}\sum_{i:\, y_i = -1}\xi_i \quad \text{subject to}\quad y_i\left(w^{T}\varphi(x_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1,\dots,m \tag{15}$$
The dual optimization problem of WSVM with different constraints on α i can be solved in the same way as solving the standard SVM optimization problem [5], which has the following dual form:
$$\max_{\alpha}\ \sum_{i=1}^{m}\alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to}\quad \sum_{i=1}^{m}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C^{+} \text{ if } y_i = +1,\ \ 0 \le \alpha_i \le C^{-} \text{ if } y_i = -1 \tag{16}$$
where m+ and m− are the numbers of samples in the +1 and −1 classes, and C+ and C− are the cost parameters for the positive and negative classes, respectively, used to construct the classifier; they control the trade-off between the margin and the training error. Some authors [19,28,29] have proposed adjusting different cost parameters for the different classes, which effectively improves the low classification accuracy caused by imbalanced samples. Veropoulos et al. [19] proposed increasing the trade-off associated with the minority class (i.e., C− > C+) to eliminate the effect of class imbalance; however, they did not suggest any guidelines for deciding what the regularization factors should be. The coefficients are typically chosen as [30]:
$$C^{+} = C \times w^{+} \tag{17}$$

$$C^{-} = C \times w^{-} \tag{18}$$
For the case where two classes of different sample sizes have similar boundary properties (that is, the ratio between the number of support vectors of each class and its total sample size is equal, or the two classes have similar error rates), Chew et al. [30] analyzed in detail how class size causes the loss of classification accuracy in the SVM algorithm, and put forward a corresponding solution. They obtained the following conclusion:
$$\frac{w^{+}}{w^{-}} = \frac{m^{-}}{m^{+}} \tag{19}$$
where C is the common cost coefficient for both classes in Equations (17) and (18), and w+ and w− are the weights for the +1 and −1 classes, respectively. In this paper, the weights are chosen as w+ = 1 and w− = m+/m− for the two-class WSVM. This criterion follows the reasoning that the trade-off C associated with the smallest class should be large, in order to improve the low classification accuracy caused by imbalanced samples. The modified SVM algorithm does not tend to skew the separating hyperplane towards the minority class examples to reduce the total misclassifications, since the minority class examples are now assigned a higher misclassification cost.
For multiclass imbalanced data classification, we used different misclassification penalties per class; typically, the smallest class is weighted highest. This allows the user to set individual weights for individual classes, which are then used in WSVM training. We set the cost ratio Ci for each class i (1, …, N) as a function of the class prior probabilities P(C+) and P(Ci) of the majority class C+ and of class Ci, respectively; it is given by:
$$C_i = C \times w_i \quad \text{where} \quad w_i = \left[ \frac{P(C^{+})}{P(C_i)} \right] \tag{20}$$
We estimate each class prior probability P(Ci) as the proportion of the number of samples in class i to the total number of training samples, as follows:
$$p(C_i) = \frac{m_i}{\sum_{i} m_i} \tag{21}$$
Based on the above equation, the corresponding cost criterion in feature space can be given as follows:
$$C_i = C \times \left[ m^{+}/m_i \right], \quad i = 1, \dots, N \tag{22}$$
where m+ is the number of samples of the majority class, mi is the number of samples of class i, and [ · ] denotes the integer part of the quantity inside the brackets. C is the common misclassification cost factor of the WSVM; the optimal value of the regularization parameter C is determined with the cross-validation method. Notice that it always holds that Ci ≥ C.
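The cost criterion of Equation (22) is straightforward to compute from class counts. The sketch below uses the TK26M counts from Table 2; note that the weights actually reported in Tables 3–6 differ slightly from this formula's literal output, so the paper may use additional rounding or count conventions. Per-class costs like these map onto, e.g., LIBSVM's per-class weight options:

```python
from math import floor

def wsvm_weights(counts, C=1):
    """Per-class cost C_i = C * [m+ / m_i] (Eq. (22)), where m+ is the
    size of the largest class and [.] is the integer part."""
    m_plus = max(counts.values())
    return {a: C * floor(m_plus / m) for a, m in counts.items()}

# Observation counts for the TK26M house (Table 2)
tk26m = {"Idle": 4627, "Leaving": 22617, "Toileting": 380,
         "Showering": 265, "Sleeping": 11601, "Breakfast": 109,
         "Dinner": 348, "Drink": 59}
w = wsvm_weights(tk26m)
print(w["Leaving"], w["Toileting"])  # → 1 59
```

As intended, the majority class ("Leaving") keeps cost C, while minority classes receive a much larger misclassification cost.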
In this study, the LIBSVM software package [31] was used to implement the multiclass classifier. It uses the one-versus-one (OVO) method [5], which consists of constructing N(N−1)/2 classifiers, each trained on data from two activity classes. Once all N(N−1)/2 classifiers are constructed, a voting strategy is used at test time: a point is assigned to the class with the largest number of votes (the "Max Wins" strategy). Chen et al. [32] discussed using the same or different parameters for the N(N−1)/2 two-class problems; their preliminary results show that both approaches give similar accuracy.

3. Experimental Setup and Results

3.1. Datasets

To evaluate the performance of our experiments, we used different annotated datasets collected with different sensor networks in a pervasive environment [21,22]. The details of all the datasets are shown in Table 1. Each network was installed in a different home setting and was composed of a different number of sensor nodes. These sensors were installed on everyday objects such as doors, cupboards, the refrigerator, and the toilet flush to record activation/deactivation events (opening/closing events) as the subject carried out everyday activities. The sensor data were labeled using different annotation methods.
Table 1. House settings description.
| Houses | TK26M | TK57M | TAP30F | TAP80F |
|---|---|---|---|---|
| Age | 26 | 57 | 30 | 80 |
| Gender | Male | Male | Female | Female |
| Annotation | Bluetooth headset | Handwritten diary | PDA | PDA |
| Duration | 28 days | 18 days | 16 days | 14 days |
| Sensors | 14 | 21 | 77 | 84 |
A list of the activities that were annotated for all datasets, with the number of observations of each activity, can be found in Table 2. Any period of time in which no activity took place was labelled "Idle". This table clearly shows that some activities occur very frequently (e.g., "toileting"), while others occur less frequently but have a longer duration (e.g., "leaving" and "sleeping"). The datasets therefore suffer from a severe class imbalance problem due to the nature of the data.
Table 2. Annotated list of activities for each house and the number of observations of each activity. The bold letters represent each activity.
| TK26M | TK57M | TAP30F | TAP80F |
|---|---|---|---|
| Idle (4627) | Idle (2732) | Idle (19025) | Idle (17673) |
| Leaving (22617) | Leaving (11993) | Leaving (87) | Toileting (630) |
| Toileting (380) | Eating (376) | Toileting (776) | Take medication (185) |
| Showering (265) | Toileting (243) | Bathing (459) | Prep.breakfast (466) |
| Sleeping (11601) | Showering (191) | Grooming (484) | Prep.lunch (843) |
| Breakfast (109) | Brush teeth (102) | Dressing (149) | Prep.dinner (506) |
| Dinner (348) | Shaving (67) | Prep.breakfast (233) | Prep.snack (320) |
| Drink (59) | Sleeping (7738) | Prep.lunch (676) | Washing dishes (328) |
|  | Dressing (112) | Prep.dinner (178) | Watching TV (717) |
|  | Medication (16) | Prep.snack (137) | Listen music (1100) |
|  | Breakfast (73) | Preparing a beverage (165) |  |
|  | Lunch (62) | Washing dishes (68) |  |
|  | Dinner (291) | Cleaning (186) |  |
|  | Snack (24) | Doing laundry (246) |  |
|  | Drink (34) |  |  |
|  | Relax (2435) |  |  |

3.2. Setup

The models were validated by splitting the original data into a test and a training set using a “Leave One Day Out cross validation” approach, retaining one full day of sensor readings for testing and using the remaining sub-samples as training data. The process is then repeated for each day and the average performance measure reported.
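The leave-one-day-out splitting scheme can be sketched as follows, assuming each sensor frame carries a day label (all names here are illustrative):

```python
import numpy as np

def leave_one_day_out(days):
    """Yield (train_idx, test_idx) pairs: each unique day is held out
    once for testing, and the remaining days form the training set."""
    days = np.asarray(days)
    for d in np.unique(days):
        test = np.where(days == d)[0]
        train = np.where(days != d)[0]
        yield train, test

# Toy timestamped data: 3 days of sensor frames
day_of_frame = ["d1"] * 4 + ["d2"] * 3 + ["d3"] * 5
folds = list(leave_one_day_out(day_of_frame))
print(len(folds))  # → 3, one fold per day
```

Averaging the per-fold performance measures then gives the reported scores.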
Sensor outputs are binary and are represented in a feature space that the model uses to recognize the performed activities. The feature vector contains one entry per sensor; each two-state sensor takes the value 0 or 1, and the features are the states of all sensors. The raw sensor representation uses the sensor data as received from the sensor network: the value is 1 when the sensor is active and 0 otherwise. We do not use the raw sensor data representation as observations; instead, we combine the "Change point" and "Last" representations, which have been shown to give much better results in activity recognition [3].
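A minimal sketch of the two representations follows, under assumed conventions (how ties between simultaneously-changing sensors are broken, and the state before the first event, are not specified in the text):

```python
import numpy as np

def changepoint_and_last(raw):
    """raw: (T, S) binary matrix of sensor states per time slice.
    Change point: 1 at slices where a sensor's state changes.
    Last: 1 for the sensor that changed most recently, until another fires.
    Returns the concatenation of both representations."""
    T, S = raw.shape
    change = np.zeros_like(raw)
    change[1:] = (raw[1:] != raw[:-1]).astype(raw.dtype)
    last = np.zeros_like(raw)
    current = None
    for t in range(T):
        fired = np.flatnonzero(change[t])
        if fired.size:
            current = fired[-1]   # assumed tie-break on simultaneous changes
        if current is not None:
            last[t, current] = 1
    return np.hstack([change, last])

# Two sensors over five time slices
raw = np.array([[0, 0], [1, 0], [1, 0], [1, 1], [0, 1]])
F = changepoint_and_last(raw)
print(F.shape)  # (5, 4): change-point features followed by last-fired features
```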
In learning from imbalanced data, the overall classification accuracy is not considered an appropriate measure of performance. We therefore evaluate the models using the F-measure, which treats the correct classification of each class as equally important; it is calculated from the precision and recall scores. We are dealing with a multi-class classification problem, and therefore define the notions of true positives (TP), false negatives (FN), and false positives (FP) for each class separately. With a highly-skewed data distribution, the overall accuracy metric in Equation (23) is no longer sufficient, since it does not take into account differences in the frequency of activities. These measures are calculated as follows:
$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{N} TP_i}{\mathrm{Total}} \times 100\% \tag{23}$$

$$\mathrm{Precision} = \frac{1}{N}\sum_{i=1}^{N} \frac{TP_i}{TP_i + FP_i} \times 100\% \tag{24}$$

$$\mathrm{Recall} = \frac{1}{N}\sum_{i=1}^{N} \frac{TP_i}{TP_i + FN_i} \times 100\% \tag{25}$$

$$F\text{-}\mathrm{Measure} = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \tag{26}$$
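Equations (23)–(26) can be computed directly from a confusion matrix; the toy matrix below (not from the paper) illustrates why accuracy alone is misleading on imbalanced data:

```python
import numpy as np

def macro_scores(conf):
    """conf[i, j]: count of class-i samples predicted as class j.
    Returns accuracy, macro precision, macro recall, and F-measure
    as in Eqs. (23)-(26)."""
    conf = np.asarray(conf, float)
    tp = np.diag(conf)
    fp = conf.sum(axis=0) - tp        # predicted as i but not class i
    fn = conf.sum(axis=1) - tp        # class i but predicted otherwise
    acc = tp.sum() / conf.sum() * 100
    prec = np.mean(tp / (tp + fp)) * 100
    rec = np.mean(tp / (tp + fn)) * 100
    f = 2 * prec * rec / (prec + rec)
    return acc, prec, rec, f

# Imbalanced toy confusion matrix: the majority class dominates accuracy
conf = [[90, 10], [5, 5]]
acc, prec, rec, f = macro_scores(conf)
print(round(acc, 1), round(f, 1))  # → 86.4 66.9
```

The classifier looks strong by accuracy but much weaker by F-measure, because the minority class is poorly recalled.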

3.3. Results

In our experiments, the SVM algorithm was tested with the LibSVM implementation [31], used to implement the one-versus-one multiclass classifier [5]. We used the radial basis kernel function $K(x,y) = \exp\left(-\frac{\|x-y\|^{2}}{2\sigma^{2}}\right)$. Firstly, we optimized the SVM hyper-parameters (σ, C) for all training sets over the ranges (0.1–2) and [0.1, 1, 10, 100], respectively, to maximize the class accuracy under the leave-one-day-out cross-validation technique. The best parameter pairs (σopt, Copt) = (1.7, 1), (2, 1), (1.4, 1), and (1.2, 1) were used for the datasets TK26M, TK57M, TAP30F, and TAP80F, respectively. Then, locally, we optimized the cost parameter Ci adapted to each activity class by using the WSVM classifier with the common cost parameter fixed at C = 1; see Table 3, Table 4, Table 5 and Table 6.
Table 3. Selection of the weights wi using TK26M dataset.
| Activity | Id | Le | To | Sh | Sl | Br | Di | Dr |
|---|---|---|---|---|---|---|---|---|
| wi | 5 | 1 | 61 | 88 | 2 | 216 | 73 | 419 |
Table 4. Selection of the weights wi using TK57M dataset.
| Activity | Id | Le | Ea | To | Sho | B.t | Sha | Sl | Dre |
|---|---|---|---|---|---|---|---|---|---|
| wi | 4 | 1 | 32 | 50 | 63 | 118 | 179 | 2 | 107 |

| Activity | Me | Br | Lu | Di | Sn | Dri | Re |
|---|---|---|---|---|---|---|---|
| wi | 749 | 164 | 193 | 41 | 500 | 375 | 5 |
Table 5. Selection of the weights wi using TAP30F dataset.
| Activity | Id | Le | To | Ba | Gr | Dr | P.b | P.l | P.d |
|---|---|---|---|---|---|---|---|---|---|
| wi | 1 | 220 | 24 | 40 | 38 | 126 | 82 | 28 | 101 |

| Activity | P.s | P.b | W.d | Cl | D.l |
|---|---|---|---|---|---|
| wi | 131 | 128 | 307 | 96 | 73 |
Table 6. Selection of the weights wi using TAP80F dataset.
| Activity | Id | To | T.m | P.b | P.l | P.d | P.s | W.d | W.TV | L.m |
|---|---|---|---|---|---|---|---|---|---|---|
| wi | 1 | 30 | 92 | 38 | 21 | 36 | 72 | 53 | 32 | 17 |
Figure 2 and Figure 3 report the features selected by PCA and LDA for all datasets. A summary of the performance measures obtained for all classifiers is presented in Table 7; for the CRF results on these datasets, refer to [3,33,34]. ICA differs from PCA in that the low-dimensional signals do not necessarily correspond to the directions of maximum variance; we start with the first independent component and keep increasing the number of components until the cross-validation error is reduced.
After the selection of the best parameters, we evaluated the performance of different algorithms using appropriate metrics for imbalanced classification. The classification results for CRF, SVM, WSVM, PCA+WSVM, ICA+WSVM, and LDA+WSVM are summarized in Table 7 below.
Figure 2. Feature selection by Principal Component Analysis (PCA).
Figure 3. Feature selection by Linear Discriminant Analysis (LDA).
Table 7. Recall (Rec.), Precision (Prec.), F-measure (F), and Accuracy (Acc.) results for all methods. The values are percentages.
| Dataset | Classifier | Rec. | Prec. | F | Acc. |
|---|---|---|---|---|---|
| TK26M | CRF [3] | 70.8 | 74.4 | 72.5 | 95.6 |
|  | SVM | 61.8 | 73.3 | 67.0 | 95.5 |
|  | WSVM | 72.8 | 74.6 | 73.7 | 92.5 |
|  | PCA+WSVM | 71.5 | 71.5 | 71.5 | 91.2 |
|  | ICA+WSVM | 71.2 | 73.3 | 72.2 | 92.7 |
|  | LDA+WSVM | 77.0 | 78.4 | 77.7 | 93.5 |
| TK57M | CRF [33] | 30.0 | 36.0 | 33.0 | 78.0 |
|  | SVM | 35.6 | 34.9 | 35.2 | 80.8 |
|  | WSVM | 40.8 | 37.8 | 39.2 | 77.1 |
|  | PCA+WSVM | 36.5 | 34.2 | 35.3 | 76.9 |
|  | ICA+WSVM | 36.2 | 38.1 | 37.1 | 76.6 |
|  | LDA+WSVM | 42.3 | 39.8 | 41.0 | 77.2 |
| TAP30F | CRF [34] | 26.3 | 31.9 | 28.8 | 83.7 |
|  | SVM | 22.3 | 34.0 | 26.9 | 83.3 |
|  | WSVM | 30.8 | 30.6 | 30.7 | 23.8 |
|  | PCA+WSVM | 32.1 | 31.6 | 31.8 | 20.8 |
|  | ICA+WSVM | 30.4 | 28.7 | 29.5 | 21.7 |
|  | LDA+WSVM | 38.2 | 52.9 | 44.3 | 33.8 |
| TAP80F | CRF [34] | 27.1 | 29.5 | 28.2 | 77.2 |
|  | SVM | 15.2 | 30.0 | 20.1 | 75.6 |
|  | WSVM | 29.2 | 29.4 | 29.3 | 28.7 |
|  | PCA+WSVM | 29.6 | 29.4 | 29.5 | 22.4 |
|  | ICA+WSVM | 26.5 | 27.9 | 27.2 | 22.1 |
|  | LDA+WSVM | 38.7 | 45.7 | 41.9 | 28.7 |
This table shows that the LDA+WSVM method gives a clearly better F-measure performance, while the CRF and SVM methods perform better in terms of accuracy for all datasets. As can be noted in this table, LDA outperforms PCA and ICA for recognizing activities with a WSVM classifier on all datasets. The PCA+WSVM method improves the classification results compared to CRF, SVM, WSVM, and ICA+WSVM for the TAP30F and TAP80F datasets.
Figure 4 and Figure 5 give the classification results in terms of the accuracy measure for each activity with the WSVM, PCA+WSVM, ICA+WSVM, and LDA+WSVM methods.
In Figure 4, for the WSVM, PCA+WSVM, and LDA+WSVM models, the minority activities "Toileting" and "Showering" and the kitchen activities "Breakfast" and "Drink" are detected significantly better than with the other methods, confirming that LDA+WSVM is an effective method for recognizing activities. The majority activities are recognized well by all methods, while the "Idle" activity is recognized most accurately by the LDA+WSVM method.
Figure 4. Accuracy for each activity on TK26M dataset.
Figure 5. Accuracy for each activity on TAP80F dataset.
We can see in Figure 5 that the minority activities ("Toileting", "Washing dishes", "Watching TV", "Listen music", and the kitchen activities "Prep.Lunch" and "Prep.Snack") are better recognized with LDA-WSVM. The kitchen activities perform worst across all datasets; they are, in general, hard to recognize, but they are recognized better with LDA-WSVM than with the other methods.

3.4. Discussion

Based on the experiments carried out in this work, a number of conclusions can be drawn. Using experiments on large real-world datasets, we showed that the F-measure obtained on the TK26M dataset is better than on the other datasets for all recognition methods, because the TK57M, TAP30F, and TAP80F datasets include more activity classes. We suppose that the use of a handwritten diary in the TK57M house and of a PDA in the TAP30F and TAP80F houses for annotating data is less accurate than using the Bluetooth headset, as in the TK26M house. For the TK26M dataset, a Bluetooth headset was used that communicated with the same server the sensor data were logged on; this means the timestamps of the annotations were synchronized with the timestamps of the sensors. In TK57M, activity diaries were used, which is more error-prone because times might not always be written down correctly and the diaries have to be typed up afterwards.
In this section, we explain the differences in performance between the recognition methods on imbalanced datasets. Our experimental results show that the WSVM and LDA+WSVM methods work better for classifying activities; they consistently outperform the other methods in terms of the accuracy of the minority classes. In particular, LDA-WSVM is the best classification method for all datasets, because the LDA method is best adapted to reducing the features of the datasets while taking the discrimination between classes into consideration.
PCA-WSVM outperforms CRF, SVM, WSVM, and ICA-WSVM for the TAP30F and TAP80F datasets; on the other datasets, ICA-WSVM surpasses PCA-WSVM. We conclude that the PCA method is better adapted to feature extraction on datasets with large feature vectors.
A multiclass SVM classifier does not take the differences (costs) between the class distributions into consideration during the learning process; it optimizes, via cross-validation search, the same cost parameter C for all classes. Not considering the weights in the SVM formulation affects the classifiers' performance and favors the classification of the majority activities ("Idle", "Leaving", and "Sleeping"). Although WSVM, which includes the individual setting of the parameter C for each class, is significantly more effective than the CRF and SVM methods, it is not efficient compared to LDA+WSVM. The LDA method significantly improves the performance of the WSVM classifier; it follows that LDA-WSVM can be made more robust for classifying human activities.
The recognition of the minority activities in TK26M, such as "Toileting", "Showering", "Breakfast", "Dinner", and "Drink", is lower compared to the "Leaving" and "Sleeping" activities. This is mainly because the minority activities are less represented in the training dataset. However, the "Idle" activity and the three kitchen activities gave the worst results compared to the other activities, and most confusion occurs between the "Idle" activity and the kitchen activities. In particular, "Idle" is one of the most frequent activities but is usually not a very important activity to recognize; it might, therefore, be preferable to lose accuracy on this activity if that allows a better recognition of the minority classes.
The kitchen activities are food-related tasks; they are recognized worst by all methods because most instances of these activities were performed in the same location (the kitchen) using the same set of sensors. In other words, groups of similar activities are more separable if performed in different locations; for example, "Toileting" and "Showering" are more separable because they take place in two different locations in the TK26M dataset. Therefore, the location of the sensors is of great importance to the performance of the recognition system.

4. Conclusions

In this paper, we have proposed a combination of PCA, ICA, and LDA methods and a Weighted SVM prediction model to recognize activities of daily living from home environments using a network of binary sensors. The proposed scheme shows two merits:
(1)
After the feature extraction step with PCA, ICA, and LDA, the most significant components of the extracted feature set are obtained, the training set is reduced, and the prediction accuracy is improved.
(2)
The multi-class Weighted SVM classifier, as the latter processor, has good generalization performance in imbalanced human activity datasets.
Experimental results show that the LDA-WSVM learning method produces strong results for activity recognition. This model is more effective at classifying multiclass sensory data than techniques such as CRF, SVM, WSVM, PCA-WSVM, and ICA-WSVM. On all datasets, LDA-WSVM has the highest F-measure, while the CRF and SVM models produce high accuracy; this is because CRF and SVM are more prone to overfitting on a dominant class than the other methods. Finally, we observed that differences in the layout of the houses and in the way a dataset was annotated can greatly affect the performance of activity recognition models trained on it. In this work, we have used offline inference: the activities could only be inferred once a full day had passed. It would also be interesting to apply the LDA+WSVM method with online inference, which is significantly harder but necessary for specific applications.

Author Contributions

M’hamed Bilal Abidine: main writing, as well as analysis and improvement of the proposed approach; Belkacem Fergani: overall supervision of the work, review, and comments. Both authors have read and approved the final manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Tapia, E.M.; Intille, S.S.; Larson, K. Activity recognition in the home using simple and ubiquitous sensors. In Pervasive Computing, Proceedings of the Second International Conference, PERVASIVE 2004, Linz/Vienna, Austria, 21–23 April 2004; Springer: Berlin/Heidelberg, Germany, 2004; Lecture Notes in Computer Science, Volume 3001, pp. 158–175.
  2. Logan, B.; Healey, J.; Philipose, M.; Tapia, E.M.; Intille, S.S. A long-term evaluation of sensing modalities for activity recognition. In Proceedings of the 9th International Conference on Ubiquitous Computing (Ubicomp ’07), Innsbruck, Austria, 16–19 September 2007; pp. 483–500.
  3. Van Kasteren, T.L.M.; Noulas, A.; Englebienne, G.; Kröse, B.J. Accurate activity recognition in a home setting. In Proceedings of the 10th International Conference on Ubiquitous Computing (UbiComp ’08), Seoul, Korea, 21–24 September 2008; pp. 1–9.
  4. Fleury, A.; Vacher, M.; Noury, N. SVM-based multimodal classification of activities of daily living in health smart homes: Sensors, algorithms, and first experimental results. IEEE Trans. Inf. Technol. Biomed. 2010, 14, 274–283. [Google Scholar] [CrossRef] [PubMed]
  5. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 2000. [Google Scholar]
  6. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: New York, NY, USA, 2006. [Google Scholar]
  7. Guyon, I.; Gunn, S.; Nikravesh, M.; Zadeh, L. Feature Extraction: Foundations and Applications; Springer: Berlin/Heidelberg, Germany; New York, NY, USA, 2006. [Google Scholar]
  8. Jolliffe, I.T. Principal Component Analysis; Springer: New York, NY, USA, 2002. [Google Scholar]
  9. Comon, P. Independent component analysis—A new concept? Signal Process. 1994, 36, 287–314. [Google Scholar] [CrossRef]
  10. Hastie, T.; Tibshirani, R.; Friedman, J.H. The Elements of Statistical Learning; Springer: Heidelberg, Germany; New York, NY, USA, 2001. [Google Scholar]
  11. Chawla, N.V. Data mining for imbalanced datasets: An overview. In Data Mining and Knowledge Discovery Handbook; Springer: New York, NY, USA, 2010; pp. 875–886. [Google Scholar]
  12. Van Hulse, J.; Khoshgoftaar, T.M.; Napolitano, A. Experimental perspectives on learning from imbalanced data. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 935–942.
  13. Cao, H.; Nguyen, M.; Phua, C.; Krishnaswamy, S.; Li, X. An Integrated Framework for Human Activity Classification. In Proceedings of the 14th ACM International Conference on Ubiquitous Computing (UbiComp ’12), Pittsburgh, PA, USA, 5–8 September 2012; pp. 331–340.
  14. Abidine, M.B.; Fergani, B.; Clavier, L. Importance-Weighted the Imbalanced data for C-SVM Classifier to Human Activity Recognition. In Proceedings of the 8th International Workshop on Systems, Signal Processing and their Applications (WOSSPA’13), Algiers, Algeria, 12–15 May 2013; pp. 330–335.
  15. Chawla, N.; Bowyer, K.; Hall, L.; Kegelmeyer, P. SMOTE: Synthetic Minority Over-sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar]
  16. Vilarino, F.; Spyridonos, P.; Vitrià, J.; Radeva, P. Experiments with SVM and stratified sampling with an imbalanced problem: Detection of intestinal contractions. In Proceedings of the 3rd ICAPR, Bath, UK, 22–25 August 2005; Springer: Berlin/Heidelberg, Germany, 2005; Lecture Notes in Computer Science, Volume 3687, pp. 783–791.
  17. Akbani, R.; Kwek, S.; Japkowicz, N. Applying support vector machines to imbalanced datasets. In Machine Learning: ECML’04, Proceedings of the 15th European Conference on Machine Learning, Pisa, Italy, 20–24 September 2004; pp. 39–50.
  18. Huang, Y.M.; Du, S.X. Weighted support vector machine for classification with uneven training class sizes. In Proceedings of the 4th IEEE International Conference on Machine Learning and Cybernetics, Guangzhou, China, 18–21 August 2005; pp. 4365–4369.
  19. Osuna, E.; Freund, R.; Girosi, F. Support Vector Machines: Training and Applications; Technical Report A.I. Memo No. 1602; Massachusetts Institute of Technology: Cambridge, MA, USA, 1997. [Google Scholar]
  20. Vail, D.L.; Veloso, M.M.; Lafferty, J.D. Conditional Random Fields for Activity Recognition. In Proceedings of the 6th International Joint Conference on Autonomous Agents and Multiagent Systems (AAMAS 2007), Honolulu, HI, USA, 14–18 May 2007. [CrossRef]
  21. Van Kasteren, T.L.M. Datasets for Activity Recognition. Available online: http://sites.google.com/site/tim0306/ (accessed on 9 February 2012).
  22. Tapia, E.M. Activity Recognition in the Home Setting Using Simple and Ubiquitous Sensors. Available online: http://courses.media.mit.edu/2004fall/mas622j/04.projects/home/ (accessed on 3 April 2013).
  23. Abidine, M.B.; Fergani, B. Evaluating C-SVM, CRF and LDA classification for daily activity recognition. In Proceedings of the 2012 International Conference on Multimedia Computing and Systems (ICMCS), Tangier, Morocco, 10–12 May 2012; pp. 272–277.
  24. Banos, O.; Damas, M.; Pomares, H.; Prieto, A.; Rojas, I. Daily living activity recognition based on statistical feature quality group selection. Expert Syst. Appl. 2012, 39, 8013–8021. [Google Scholar] [CrossRef]
  25. Manosha Chathuramali, K.G.; Rodrigo, R. Faster Human Activity Recognition with SVM. In Proceedings of the 2012 IEEE International Conference on Advances in ICT for Emerging Regions (ICTer), Colombo, Sri Lanka, 12–15 December 2012; pp. 197–203.
  26. Palaniappan, A.; Bhargavi, R.; Vaidehi, V. Abnormal Human Activity Recognition Using SVM Based Approach. In Proceedings of the 2012 International Conference on Recent Trends in Information Technology (ICRTIT), Chennai, India, 19–21 April 2012; pp. 97–102.
  27. Hyvärinen, A.; Oja, E. Independent component analysis: Algorithms and applications. Neural Netw. 2000, 13, 411–430. [Google Scholar] [CrossRef]
  28. Veropoulos, K.; Campbell, C.; Cristianini, N. Controlling the Sensitivity of Support Vector Machines. In Proceedings of the International Joint Conference on AI, Stockholm, Sweden, 31 July–6 August 1999; pp. 55–60.
  29. Cao, P.; Zhao, D.; Zaiane, O.R. An Optimized Cost-Sensitive SVM for Imbalanced Data Learning. In Advances in Knowledge Discovery and Data Mining; Pei, J., Tseng, V.S., Cao, L., Motoda, H., Xu, G., Eds.; Springer: New York, NY, USA, 2013; pp. 280–292. [Google Scholar]
  30. Chew, H.G.; Crisp, D.J.; Bogner, R.; Lim, C.C. Target detection in radar imagery using support vector machines with training size biasing. In Proceedings of the 6th International Conference on Control, Automation, Robotics, and Vision (ICARCV), Singapore, Singapore, 5–8 December 2000; pp. 80–85.
  31. Chang, C.C.; Lin, C.J. LIBSVM: A Library for Support Vector Machines. Available online: http://www.csie.ntu.edu.tw/~cjlin/libsvm/ (accessed on 22 April 2010).
  32. Chen, P.H.; Lin, C.J.; Schölkopf, B. A tutorial on nu-Support Vector Machines. Appl. Stoch. Models Bus. Ind. 2005, 21, 111–136. [Google Scholar] [CrossRef]
  33. Van Kasteren, T.L.M.; Alemdar, H.; Ersoy, C. Effective Performance Metrics for Evaluating Activity Recognition Methods. In Proceedings of the ARCS 2011—24th International Conference on Architecture of Computing Systems, Como, Italy, 24–25 February 2011; p. 10.
  34. Van Kasteren, T.L.M.; Englebienne, G.; Kröse, B.J. An activity monitoring system for elderly care using generative and discriminative models. Pers. Ubiquitous Comput. 2010, 14, 489–498. [Google Scholar] [CrossRef]