In the first phase, we focus on detecting and predicting the actions of the elderly. In the second phase, we label actions as normal or abnormal and then detect and predict anomalies using a supervised learning algorithm.
3.3.1. Action Detection and Prediction Phase
3.3.1.1. Definitions
Action vs. activity: An action is a finer-grained concept than an activity, and each activity can include several actions. For example, “make some tea”, “make some coffee”, “make a sandwich”, and “wash the dishes” are separate actions, but all of them belong to the activity of meal preparation.
In this paper, we focus mainly on actions rather than activities.
Features: When an action happens, a set of environmental conditions and characteristics is associated with it; we refer to these as features. Time and location are well-known examples. Some features are specific to a single action, while others are shared by several actions. For example, using an electrical device such as a TV remote control relates only to the “watching TV” action, whereas location is shared by several actions such as “making coffee”, “making tea”, “washing the dishes”, and “making a sandwich”. We will show that selecting more detailed and dedicated features increases the chance of recognizing actions correctly.
3.3.1.2. Supervised Learning
In this paper, we explore a supervised learning approach to detect and predict the actions of smart home residents. For this purpose, we use the Random Forest Classifier (RFC), Gaussian Naive Bayes (GNB), Decision Tree (DT), and K-Nearest Neighbors (KNN) classifier models, since they are fast and easy to train.
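As an illustration of how the four model families could be trained side by side, the following minimal sketch uses scikit-learn. The toy rows, the numeric feature encoding, and the hyperparameters are assumptions made for this example only; they are not the dataset or settings used in this work.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic toy rows (assumption): [start_minute, duration_minutes, location_code]
X_train = [
    [480, 10, 0], [485, 12, 0], [490, 9, 0],      # morning, short, kitchen
    [1200, 90, 1], [1210, 80, 1], [1190, 85, 1],  # evening, long, living room
]
y_train = ["make tea", "make tea", "make tea", "watch TV", "watch TV", "watch TV"]

# Hyperparameters are illustrative defaults, not tuned values.
models = {
    "RFC": RandomForestClassifier(n_estimators=50, random_state=0),
    "GNB": GaussianNB(),
    "DT": DecisionTreeClassifier(random_state=0),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}

# Fit every model on the same rows and predict the action of one unseen sample.
predictions = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    predictions[name] = model.predict([[482, 11, 0]])[0]
```

All four estimators share the same `fit`/`predict` interface, which makes it straightforward to compare them on an identical feature set.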
Some features can be extracted from the dataset explicitly, while others can only be derived from it implicitly.
Initially, we used the duration of each action, taken from the dataset, for action detection and prediction. We also considered the start time and end time of each action as two further features.
Since most actions usually occur in specific locations, we added a location feature for each action. We divided the home environment into hypothetical regions, including kitchen, bathroom, bedroom, restroom, office, living room, and hallway, and assigned the corresponding location to each action.
We also considered the previous action, i.e., the action that precedes the current one, as another feature. Finally, we considered two further features related to the use of water and electrical equipment. For water usage, three codes were devised: one for actions that necessarily require water, one for actions that do not require water, and a third for actions for which we were not sure whether water is needed. Each action was then assigned the corresponding code. For power consumption, two codes were considered: one for actions that use an electrical device and one for actions that do not. Again, each action was assigned the corresponding code.
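The feature set described above can be assembled per record roughly as in the following sketch. The code values, the `ACTION_INFO` lookup table, and the function name `build_features` are hypothetical and chosen only for illustration; the actual codes and the action-to-location assignment are not reproduced here.

```python
# Hypothetical code values (assumption): three water codes, two power codes.
WATER_REQUIRED, WATER_NOT_REQUIRED, WATER_UNCERTAIN = 1, 0, 2
POWER_USED, POWER_NOT_USED = 1, 0

# Hypothetical per-action lookup: (location, water code, power code).
ACTION_INFO = {
    "make tea": ("kitchen", WATER_REQUIRED, POWER_USED),
    "wash the dishes": ("kitchen", WATER_REQUIRED, POWER_NOT_USED),
    "watch TV": ("living room", WATER_NOT_REQUIRED, POWER_USED),
    "make a sandwich": ("kitchen", WATER_UNCERTAIN, POWER_NOT_USED),
}


def build_features(records):
    """Turn (action, start_minute, end_minute) tuples, sorted by start time,
    into feature rows that include the implicit features."""
    rows = []
    previous = None  # the first record has no preceding action
    for action, start, end in records:
        location, water, power = ACTION_INFO[action]
        rows.append({
            "action": action,
            "start": start,
            "end": end,
            "duration": end - start,
            "location": location,
            "previous_action": previous,
            "water": water,
            "power": power,
        })
        previous = action
    return rows
```

For example, `build_features([("make tea", 480, 490), ("watch TV", 500, 560)])` yields two rows, the second of which carries `"make tea"` as its previous action.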
3.3.2. Anomaly Detection Phase
At this stage, to detect and predict anomalous actions, we used part of an eHealth dataset that records information about elderly people whose level of dependency changes over one year.
The features carried over from the previous phase are the start time, end time, and duration of the action. In addition, we consider the start day of the action, the interval between the end of the previous action and the start of the current action, and the action currently performed. User profiles are built from these features. With respect to a person's profile, an action is considered an anomaly if:
- Its duration is longer or shorter than normal.
- An abnormal delay occurs between consecutive actions.
- The action starts or ends at an unexpected time.
- An invalid action occurs before the current action.
Initially, we used several statistical methods to separate normal actions from abnormal ones and label them accordingly. Each method is discussed below.
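The profile features listed in this subsection (start day, start and end time, duration, and the interval since the previous action) could be derived from timestamped records as in the sketch below. The record layout, the field names, and the choice of encoding the start day as a weekday number are assumptions made for illustration.

```python
from datetime import datetime


def derive_profile_features(records):
    """records: list of (action, start, end) with datetime values, sorted by start."""
    rows = []
    prev_end = None
    prev_action = None
    for action, start, end in records:
        # Interval between the end of the previous action and this start, in minutes.
        gap = None if prev_end is None else (start - prev_end).total_seconds() / 60.0
        rows.append({
            "action": action,
            "start_day": start.weekday(),  # assumption: 0 = Monday ... 6 = Sunday
            "start_minute": start.hour * 60 + start.minute,
            "end_minute": end.hour * 60 + end.minute,
            "duration_min": (end - start).total_seconds() / 60.0,
            "gap_min": gap,
            "previous_action": prev_action,
        })
        prev_end, prev_action = end, action
    return rows
```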
3.3.2.1. Min-Max Range Based on First Season
In this method, because the dependency of the elderly remains constant during the first season (according to the information provided by the dataset owners), we chose the first season as the basis for determining the normal range of the features of each action. More precisely, we assumed that all actions performed in the first season, i.e., the data gathered in the first three months, are normal. Using the first-season records of each action, we calculate the normal range for all the features. Let the feature f of the jth sample of the ith action be represented as $x_{i,j}^{f}$; then the normal range of feature f for action i is
$$R_i^{f} = \left[\min_{j} x_{i,j}^{f},\ \max_{j} x_{i,j}^{f}\right],$$
where j runs over the first-season samples of action i.
Moreover, for each action we prepare the list of actions that may precede it. An action that precedes the given action only once or twice during the season is not added to the list. After determining the valid range for all features of each action, we reviewed all samples of the other seasons. If a sample had at least one feature outside its valid range, we labeled it as an anomaly; otherwise, we considered it a normal action.
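The labeling procedure of this subsection can be sketched as follows. The row layout (dictionaries with `action`, `previous_action`, and numeric feature keys) and all function names are hypothetical. The sketch also assumes that every action seen in later seasons appears in the first season, and that an unlisted preceding action counts as an out-of-range feature.

```python
from collections import Counter, defaultdict


def fit_min_max(first_season, features):
    """Per-action (min, max) range of each numeric feature over first-season rows."""
    ranges = {}
    for row in first_season:
        per_action = ranges.setdefault(row["action"], {})
        for f in features:
            lo, hi = per_action.get(f, (row[f], row[f]))
            per_action[f] = (min(lo, row[f]), max(hi, row[f]))
    return ranges


def fit_valid_predecessors(first_season, min_count=3):
    """Keep predecessors seen at least min_count times, i.e. drop those
    observed only once or twice before the action."""
    counts = defaultdict(Counter)
    for row in first_season:
        counts[row["action"]][row["previous_action"]] += 1
    return {a: {p for p, n in c.items() if n >= min_count} for a, c in counts.items()}


def label(row, ranges, predecessors, features):
    """Return 'anomaly' if any feature leaves its range or the predecessor is unlisted."""
    action = row["action"]
    out_of_range = any(
        not (ranges[action][f][0] <= row[f] <= ranges[action][f][1]) for f in features
    )
    bad_predecessor = row["previous_action"] not in predecessors[action]
    return "anomaly" if out_of_range or bad_predecessor else "normal"
```

The ranges and predecessor lists are fitted once on the first-season rows and then applied unchanged to the rows of the remaining seasons.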
3.3.2.2. Mean +/- 3 stdev Range Based on First Season
In this method, as in the previous one, we took the first season as the basis for extracting the normal intervals of the features of each action. Using all first-season samples, we extracted the valid interval for every feature f of each action i as
$$\left[\mu_i^{f} - 3\sigma_i^{f},\ \mu_i^{f} + 3\sigma_i^{f}\right],$$
where $\mu_i^{f}$ and $\sigma_i^{f}$ are the mean and standard deviation of feature f over the first-season samples of action i.
Then, we examine the samples of the other seasons: if all feature values of a sample are within the valid range, we label the action as normal; otherwise, we label it as abnormal.
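A minimal sketch of the mean +/- 3 stdev interval for a single feature of a single action is given below. It assumes the sample standard deviation (`statistics.stdev`) and an inclusive interval; the text does not state which estimator or boundary convention is used, and the function names are hypothetical.

```python
from statistics import mean, stdev


def three_sigma_range(values):
    """Valid interval (mean - 3*stdev, mean + 3*stdev) from first-season values."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation, needs >= 2 values
    return (m - 3 * s, m + 3 * s)


def is_normal(value, value_range):
    """True if the value lies inside the (inclusive) valid interval."""
    lo, hi = value_range
    return lo <= value <= hi
```

A sample from a later season would be labeled normal only if `is_normal` holds for every one of its features.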
3.3.2.3. Mean +/- 3 stdev Range Based on Total Year
In this method, instead of using just one season as the basis, the whole year is used to determine the normal range of the features. To label each sample, the valid ranges of the features of its action are calculated from all other records within the year. If all feature values of the sample are within the valid range, we label the action as normal; otherwise, we label it as abnormal.
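One way to read "all other records" is a leave-one-out scheme, in which the range used to label a record is computed from every record of the same action except that record. The sketch below follows this reading for a single feature; the interpretation, the use of the sample standard deviation, and the function name are assumptions.

```python
from statistics import mean, stdev


def label_leave_one_out(values, k=3.0):
    """Label each value against mean +/- k*stdev of all the other values."""
    labels = []
    for idx, v in enumerate(values):
        others = values[:idx] + values[idx + 1:]  # exclude the record being labeled
        m, s = mean(others), stdev(others)
        labels.append("normal" if m - k * s <= v <= m + k * s else "abnormal")
    return labels
```

Excluding the current record keeps an extreme value from widening the very range it is tested against.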
3.3.2.4. Inter-Quartile Range Based on First Season
In this method, the first season is used to calculate the normal range of the features. After calculating the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for all features of each action, the records of the remaining three seasons are labeled according to these intervals. The valid ranges are calculated with respect to the inter-quartile range of the values of each feature, i.e., IQR = Q3 - Q1, where Q1 and Q3 are the first and third quartiles of the feature values.
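The inter-quartile range rule can be sketched for one feature of one action as follows. The quartile convention (the default method of Python's `statistics.quantiles`) and the inclusive boundary handling are assumptions, since the text does not specify them; the function names are hypothetical.

```python
from statistics import quantiles


def iqr_range(values):
    """Valid interval (Q1 - 1.5*IQR, Q3 + 1.5*IQR) from first-season values."""
    q1, _median, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    return (q1 - 1.5 * iqr, q3 + 1.5 * iqr)


def is_within(value, value_range):
    """True if the value lies inside the valid interval (boundaries included)."""
    lo, hi = value_range
    return lo <= value <= hi
```

Compared with the min-max range, this interval is less sensitive to a few extreme first-season values, because the quartiles depend on the middle half of the data.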
3.3.2.5. Inter-Quartile Range Based on Total Year
In this method, to label each record, we calculate the range (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for all the features over the records of the whole year and label the current record accordingly.