The development of Road Information Acquisition Systems (RIASs) based on the Mobile Crowdsensing (MCS) paradigm has been widely studied for the last years. In that sense, most of the existing MCS-based RIASs focus on urban road networks and assume a car-based scenario. However, there exist a scarcity of approaches that pay attention to rural and country road networks. In that sense, forest paths are used for a wide range of recreational and sport activities by many different people and they can be also affected by different problems or obstacles blocking them. As a result, this work introduces SAMARITAN, a framework for rural-road network monitoring based on MCS. SAMARITAN analyzes the spatio-temporal trajectories from cyclists extracted from the fitness application Strava so as to uncover potential obstacles in a target road network. The framework has been evaluated in a real-world network of forest paths in the city of Cieza (Spain) showing quite promising results.
For the last decade, smartphones have been the center of the digital life in modern societies due to their growing popularity. As a result, they are now equipped with several sensors like GPS, accelerometer, microphone, and so forth.
This palette of sensors allows to capture a large amount of contextual information related to the phone’s holders and their surrounding environment . In that sense, this contextual information has been used in several domains like tourism , assistance living  or personal training . In addition to that, this has eased the development of the mobile crowdsensing (MCS) or human/people sensing paradigm. MCS allows to perceive large-scale phenomena that can not be detected at an individual level like the air pollution levels or the parking state of a city .
One of the most useful scenarios where MCS has been used is the deployment of innovative Road Information Acquisition Systems (RIAS). This type of systems report the condition of a road network like longitudinal and lateral roughness, friction, cracking or surface substance. This is instrumental information for road-maintenance operators. Traditionally, this information has been captured by means of infrastructure-based sensors like cameras or high-precision lasers . In that sense, MCS allows to enlarge the coverage of this type of systems beyond the location of the infrastructure-based sensors by using the data captured by the position and motion sensors of the drivers’ handheld devices [16,19,31]. The application domain of this MCS-based solutions restricts itself to urban road networks where different types of motor vehicles travels on .
However, existing literature has payed little attention to other types of road networks that are more common in a country environment like forest paths. These types of roads are widely used by people of all kinds for carrying out a large number of outdoor activities, such as cycling, running or going hiking. For that reason, they may suffer from condition problems that make them difficult to be properly used by visitors, like large obstacles (e.g. fallen trees) or landslides.
Due to their growing popularity, it becomes necessary the development of effective RIASs targeting rural or forest-related road networks. Nevertheless, existing solutions based on infrastructure sensors can not be applied in a cost-effective manner whereas MCS-based solutions rely on sensor-data analyses that focus on detecting fine-grained problems on asphalted roads. Nonetheless, this type of road is not the most common one in many rural environments.
Apart from that, we can also state the growing prominence of fitness apps, like Strava1 or Endomondo2 which operate on a large number of smartphones and wearable devices . These applications allows to endlessly collect different features from exercise activities on large spatial areas. This makes these apps suitable enablers for crowd-based methods in a wide range of domains beyond the smart heath scope .
For that reason, the present work introduces SAMARITAN, a system for foreSt pAth Monitoring based on collAboRatIve Trajectory dAta aNalysis. The goal of SAMARITAN is to provide a RIAS for forest paths by following a MCS approach. In particular, it focuses on detecting obstacles that seriously affect the transit of people across such paths like landslides or fallen trunks. To do so, different well-known algorithms from the spatio-temporal trajectory-data mining field have been used.
As Fig. 1 depicts, SAMARITAN firstly collects the spatio-temporal trajectories of cyclists within a region of interest via the fitness-app Strava. Then, a two-level segmentation of the trajectories is applied. This step profits from the segments of interest defined by the own users in the Strava platform. This way, SAMARITAN leverages the knowledge shared by cyclists who move around the monitored spatial region. From the segments extracted of each individual trajectory, a set of candidates where a problem related to the paths state might occur is extracted based on the detection of abnormal stop points. Finally, the candidates extracted at individual level are aggregated to conform the final group of affected segments.
All in all, bearing in mind the open challenges of rural RIAS, the salient contributions of SAMARITAN are the following,
– First of all, it is the first crowd-based RIAS that fully focuses on the rural environment and its particularities.
– Secondly, it makes use of data extracted from a fitness application so as to uncover the state of a road network with great detail. In that sense, such extracted data does not limit to the raw spatio-temporal trajectories from the target users, but also the community-based segments that allow to guide the detection of incidents within the network.
The remainder of the paper is structured as follows. Next, an overview the relevant related work is put forward in Section 2. Section 3 is devoted to describing in detail the logic structure and the processing stages of SAMARITAN. Then, Section 4 discusses the main results of the performed experiments. Finally, the main conclusions and the future work are summed up in Section 5.
In this section we review the most recent advances in MCS-based RAISs along with the usage of data extracted from fitness applications as an enabler of innovative services in different environments.
2.1.MCS-based road information acquisition systems
During the last years, the ubiquity of smart devices carried by drivers has fostered the development of many different approaches for road state monitoring. In that sense, one the first proposals for MCS-based RAIS was put forward in . In this work, authors designed a mechanism able to detect potholes and speedbumps via the analysis of accelerometer and GPS data. In  a privacy-preserving mechanism based on fog computing is defined so as to protect the road-condition reports sent by vehicles to the backend servers.
Nevertheless, a common feature of the aforementioned works is that they use onboard devices previously installed on vehicles instead of the drivers’ handheld devices. This seriously limits their feasibility.
As a result, the work in  proposed a mechanism that actually profits from a smartphone equipment to detect road-surface problems. However, each device operates independently in the detection task so there is not a real cooperation among users. This makes rather difficult to achieve a complete coverage of a large road network.
This way, in  authors proposed an holistic architecture for MCS-based road status monitoring coined CRATER. This mechanism relies on different features extracted from the accelerometer of the users’ smartphones so as to detect patholes and speedbumps via binary classifiers. Another interesting work is  where authors made use the crowdsensing platform SmartRoadSense, previously defined in , to perform a large-scale deployment of a RAIS as pilot study. Again, an accelerometer stream data is used to detect road-surface roughness. Besides, a time-series approach by means of the well-known algorithm Dynamic Time Warping for the detection of bumps or potholes is described in .
Another important research trend goes beyond the detection of single surface problems and intend to provide more general information about the road state. This is the case of  where authors follow a similar GPS and accelerometer fusion approach for road monitoring based on MCS. Unlike previous works they provide a more detailed solution to detect the general state of a road, not just surface problems like bumps or potholes. Similarly, a smartphone-based RAIS able to classify the quality of the road surface into five different levels ranging from good to terrible is put forward in .
Unlike these works, SAMARITAN focuses on rural roads that do not have the same problems as traditional urban roads. In our case, we detect obstacles that block certain rural paths. For that detection, we do not rely on measurements from the accelerometer sensor of smartphones like in the previous works. Instead of that, we analyze crowd-based GPS traces to detect abnormal stay points that might reflect some problems on certain path segments.
2.2.Analysis of fitness data
During the last years, many different applications have been developed by using the crowd-based data collected by sport and fitness applications.
One of the most interesting fields where this type of data has been applied is the analysis of human mobility flows in cities. For example, fitness-apps data is used in  to infer the usage patterns of certain recreational areas whereas in  a comprehensive study of mobility patterns based on Strava cycling data in Johannesburg (South Africa) is stated.
Moreover, an analysis based on Strava data of the correlation between certain characteristics of an urban road network and the volume of commute cycling is put forward in . In , authors extend this type of analysis by including meteorological data in order to study how weather factors actually affect the behaviour of cyclists. Moreover, the work in  correlates the air-pollution level of a city with the mobility patterns of commuting and non-commuting cycling activities by using data from Strava. Similarly, authors in  make use of this type of data to detect cycling frequency behaviours in a city and in , a data-fusion approach, including data from Strava, is proposed to detect the points of interest where cyclists move within a spatial zone.
In our work, we use cycling data from Strava in a completely different scope. We make use of the spatio-temporal GPS trajectories extracted from that platform like the aforementioned works. These trajectories represent the routes taken by Strava users. However, we analyze the collected raw trajectories to detect points in a rural road network with potential obstacles. This constitutes a novel usage of this type of fitness data beyond the mobility-pattern extraction mentioned above.
3.The SAMARITAN framework
In this section, we describe in detail the SAMARITAN framework. In that sense, Fig. 2 shows its key steps. As we can see, SAMARITAN follows a crowd-sensing approach where steps 2, 3 and 4 are executed in the contributors’ devices whereas step 5 is executed in a backend server. The following subsections describe in detail each of these steps.
3.1.Collection of the trajectories
The first stage of the framework pipeline focuses on collecting the individual spatio-temporal trajectories of people moving around the target area of interest as Fig. 2 shows. For that goal, we make use of the Application Programming Interface (API) of the Strava platform.
We should remark that SAMARITAN focuses on the trajectories generated by cyclists instead of other sports. The rationale of this filtering is that the detection of problems in the road network is done by the analysis of abnormal speed fluctuations of the incoming trajectories. The cycling trajectories usually have a speed range large enough to perform such an analysis with an acceptable confidence level.
In our scope, a trajectory from a cyclist (where is the set of target cyclists) represents the movement made by c during one route or trip with his bike. Therefore, a trajectory can be defined as follows,
A cyclist trajectory is a sequence of consecutive timestamped points of a cyclist c, , , where is the i-th location and timestamp of the trajectory so that .
All the trajectories collected from a particular cyclist c conform the set . In that sense, as Fig. 2 shows, the SAMARITAN framework performs a set of analytical steps over each set of trajectories for all the target cyclists . These steps are labelled as 2a, 2b, 3 and 4 in the figure.
3.2.Segmentation of the trajectories
When it comes to analyze spatiotemporal trajectories, one common pre-procesing step is the trajectory segmentation . This consists of dividing a trajectory into fragments by several criteria like time interval, spatial shape, semantic meaning. This allows to compress the incoming trajectory in a more simple format.
For this step, SAMARITAN makes use of the segments defined by the own Strava community. In that sense, a Strava segment is just a portion of a road.3 They are created by the own Strava users so that they can compare times marks at that particular road portion during their training.
Since each segment is defined as a line with a starting and ending point defined by coordinates that can be crawled via Strava API, it is possible to spatially join each trajectory with the set of community-based segments . Basically, this join takes each point and checks if it spatially fits into any of the line segments .
As a result of this join, a new segment-based trajectory is generated from each point-based trajectory . It just comprises the Strava segments that contain any point . This can be defined as follows,
A cyclist segment-trajectory is a sequence of segments, , , where is the i-th segment of the trajectory.
For example, in Fig. 1, the trajectory of cyclist 1, , gives rise to the trajectory . In case of the trajectory of cyclist 2, , it generates the trajectory . Therefore, we can see that the number of segments of each trajectory might be different in each case.
3.3.Detection of low-speed points
Apart from the segmentation described above, we also analyze each incoming raw trajectory so as to detect the parts where the cyclist c moved quite slowly (step 2b in Fig. 2).
The rationale of detecting these low-speed parts of a trajectory is that they might indicate the presence of certain obstacles during a cyclist’s trip. In these points, a cyclist usually needs to slow down and, in some cases, even get off his bike. This is reflected as a sudden drop in the speed profile of the trajectory. These low-speed points can be regarded as outliers in the speed time-series of a trajectory.
This speed time-series can be calculated from the sequence of timestamped points of a raw trajectory stated in Def. 2. In particular, for each pair of consecutive points , we compute its instantaneous speed as where refers to the Haversine distance between the two points. This gives raise to the associate speed time-series of the trajectory that is defined as follows,
A cyclist trajectory speed profile is a sequence of speed measurements from a trajectory, , , , where is the instantaneous speed at the i-th location of .
In order to detect the outliers from this profile, SAMARITAN makes applies the Generalized Extreme Student Deviation (GESD) test over the time-series . GESD is a simple but robust procedure to detect outliers in time series in many different situations based on the GESD test . In brief, the GESD infers whether a set of extreme observations of a time series are actually outliers based on their associated t-distribution.
In order to only retain the low-speed outliers, we discard those abnormal values that are above 0.5 m/s. As a result of this process, a set of abnormally low speed values are extracted.
Finally, we map each abnormal speed to its corresponding point . This gives raise to the set of points where such abnormal low speeds occurred.
It is worth-mentioning that the aforementioned procedure relies on GPS trajectories that are defined at a very fine granularity where the current location of the cyclist is captured every few seconds. The functionality of SAMARITAN would be rather limited if the GPS feed (e.g. the cyclist’s smartphone or smartwatch) was configured with a large sampling rate because, in that case, some low-speed points might not be detected.
However, high-intensity sports are better monitored when high sampling rates above 10 Hz are used . Furthermore, some manufacturers of GPS trackers in the sport field actually recommend a similar configuration for their devices . Consequently, it is sensible to expect that the potential contributors of SAMARITAN would generate fine-grained trajectories during its cycling activities in most of the cases.
3.4.Mapping of low-speed points and segments
Once we have uncovered the abnormal low-speed points of a trajectory, we need to detect whether these points occurred or not in any of the community-based segments of the trajectory. This is because SAMARITAN handles the points in a different manner depending on they fit or not into a Strava segment as we will see later.
Consequently, this step of SAMARITAN takes as input the Strava segments and the set of low-speed points of each incoming trajectory (see Fig. 2). Algorithm 1 summarizes the mapping processing perform at this stage of the framework.
As we can see from the pseudo-code, we just take each low-speed point in (line 2) and check whether it spatially fits into in any of the segments of (lines 4–6).
If the point fits into a segment then we perform a time-based analysis (lines 7–11 of Alg. 1). The idea of this analysis is that if a cyclist had to abruptly slow down one or more times within a segment then the time required to cover that segment would be meaningfully larger than the cyclist’s average for that particular segment. Otherwise, the detected low-speed points may be just noisy measurements.
To do so, we profit from a feature of the Strava API4 that allows to extract the historic time records of a cyclist for a given segment. Hence, we can compute the mean and the standard deviation of these time records for a cyclist c in any segment , .
Apart from that, we can calculate the actual time required by a cyclist to cover a segment s during a trajectory (). Given that value, we can estimate its z-score in the Gaussian Distribution (line 9). This score indicates how many standard deviations the mark is far away from the mean .
If this z-score is above a certain threshold () then we can conclude that the target cyclist has not only abruptly decelerated within the segment (because of the existence of low-speed points), but has also moved much lower than usual. As this can be regarded as a truly abnormal behaviour, the target low-speed point is included as part of the set of in-segment low-speed points (ISLSPs) for further analysis (line 11).
Otherwise, if the low-speed point does not fit into any segment we can not perform the aforementioned time analysis. This type of out-segment points are handled in a different manner. Thus, they are included in the complementary set of out-segment low-speed points (OSLSPs) . Consequently, we can see that the following two conditions hold,
Finally, these two sets are processed in a different way by SAMARITAN. This is because both sets represent completely different situations. Whilst the low-speed points in occurred in regions (segments) with a high volume of cyclists, include the low-speed points that occurred in spatial regions that might not be heavily traveled by cyclists.
3.5.Clustering of the out-segment low-speed points (OSLSPs)
As we have mentioned before, contains the OSLSPs of a trajectory , that is, the locations where the cyclist moved abnormally and suddenly slow in spatial areas not covered by any community-based segment .
In this case, we need to determine if these abrupt decelerations correspond to recent changes in the mobility behaviour of cyclist c or they are just part of his usual behaviour. This because a sudden change in the mobility profile of a cyclist in terms of new stop points might indicate the presence of recent obstacles in these points.
In order to determine if the OSLSPs are part of the usual mobility behavior of a cyclist or not, we follow a density-based clustering approach based on the well-known DBSCAN algorithm . This way, we collect all the OSLSPs from each cyclist c during the last hours. This gives raise to the set .
Based on the OSLSPs included in that dataset, we applied the procedure described in Algorithm 2. First of all, we apply the density-based clustering algorithm DBSCAN  to (line 2 of Alg. 2). The key steps of DBSCAN can be summarized as follows,
– For each OSLSP , the algorithm detects the other low-speed points located within the distance ϵ from it. This is its ϵ-neighborhood .
– If the number of points within that neighborhood is above the then the is considered a centroid and all the points in its ϵ-neighborhood belong to the same cluster.
The set of centroids obtained from this algorithm () indicates the locations where a cyclist c has abruptly slow down several times during their routes for the last hours. However, we need to check, for each location, if this is a usual behavior or a recent one.
To do so, we extract the minimum timestamp of all the points included in the ϵ-neighborhood () of a centroid (lines 3–4). If the difference between that timestamp and the current one () is below a certain threshold () (line 5), this indicates that all the points in the ϵ-neighborhood of target centroid have been generated quite recently. Therefore, this point is included as a representative out-segment low-speed point (r-OSLSP) in the set (line 6).
An illustrative example of this process is shown in Fig. 3. In this scenario, six different trajectories from the same cyclist c are collected, each one comprising a single OSLSP. Next, SAMARITAN executes Algorithm 2. In this case, comprises 2 different OSLSPs (from trajectories and ). As this number is equal to , the time interval calculation of the ϵ-neighborhood is performed. In this case, its is 700 time units. Since () is below then is included in the set.
A similar situation arises with trajectory and its low-speed point . However, in this case the of its ϵ-neighborhood takes 40 time units as value. This makes that the covered time interval of the neighborhood () is much longer than the . This excludes from as its coverage area includes stop points that occurred a long time ago.
At this point, we should indicate that it is true that GPS trajectories usually suffer from some inaccuracies due to signal-reception problems. They might cause that the path represented by a trajectory does not completely fit the actual path followed by the moving object. However, this type of error usually arises in scenarios where signal occlusion occurs like indoor environments or certain urban regions .
In that sense, the present framework focuses on an outdoor scenario that may reduce the incidence of this type of error. Moreover, the low-speed point extraction does not rely on the actual trace followed by a cyclist but on his speed evolution. It is true that inaccurate GPS trajectories might cause the system to wrongly infer false low-speed points or not to detect certain true ones. However, the inference of ISLSPs and OSLSPs depends on the latent speed of a trajectory at Strava segment level (for ISLSPs) or considering a ϵ-neighborhood (for OSLSPs). This segment or cluster matching procedure is similar to the map-matching step performed by many solutions for GPS trajectory processing in the urban and vehicular environment [4,6,21]. This allows to reduce the impact of these noisy points in the speed pattern of a trajectory. Hence, this makes it possible to detect that a cyclist has moved abnormally slow or not.
Finally, the and points individually generated by each cyclist are send to a back-end server to further processing them in an aggregated manner.
3.6.Detection of obstacles with in-segment low-speed points (ISLSPs)
Based on the procedure described in Section 3.4, the client-side of SAMARITAN was able to extract the set with the ISLSPs of each trajectory . Given this set, the back-end server of the framework can now detect if there is any anomaly in the cyclists’ mobility behavior in any of the Strava segments .
To do so, SAMARITAN follows a batch-based analysis (depicted as step 5a in Fig. 2). To begin with, it defines a time-based sliding window that collects all the sets generated by all cyclists during the last hours. The content of this sliding-windows is defined as the set . This set contains all the ISLSPs occurred in the most recent trajectories received by SAMARITAN.
Given the aforementioned dataset, SAMARITAN detects changes in the mobility behavior within each Strava segment by following the procedure described in Algorithm 3.
Basically, we select for each segment the ISLSPs that fit in it (lines 2–3 of the algorithm). Next, we launch an instance of the DBSCAN algorithm with the selected subset (line 4). The generated centroid are considered potential obstacles and aggregated to the set (line 5).Finally, Algorithm 3 is launched each time a new is generated by any cyclist .
An alternative approach for this step would have been the execution of a global instance of DBSCAN independent of any segment. Nonetheless, this had merged together ISLSPs of different segments. Since each Strava segment conceptually represents a particular road slice with different features it is necessary to analyze each segment independently.
All in all, we can see that the Strava segments provides a sparse spatial tessellation of the area under study. They allow to group together stop-points in spatial areas frequently crossed by cyclists.
3.7.Detection of obstacles with out-segment low-speed points (OSLSPs)
In Section 3.5 we put forward how uncover the r-OSLSPs from each cyclist under control. Therefore, we can use that information so as to detect a new set of obstacles apart from the ones uncovered by the procedure described in the previous section based on ISLSPs.
As in Section 3.6, we analyze the collected r-OSLSPs by making use of a batch-based approach. In particular, we gather the r-OSLSPs from all the cyclist in generated during the last hours by means of a time-based sliding window .
Next, each time is updated with a new r-OSLSP set, , DBSCAN is applied on the new sliding-window content. The two parameters of the algorithm ( and ϵ) are set with the same values that the instance of DBSCAN used to uncover the personal r-OSLSPs in Section 3.5.
The resulting set of centroids indicate spatial regions (not covered by a Strava segment) where several cyclist had to abruptly decelerate during, at least, the last hours. Therefore, this set of centroids is defined as the second type of potential obstacles detected by SAMARITAN.
We can see that, in order to process, the OSLSPs SAMARITAN follows a two-level DBSCAN clustering.
– In the first level, OSLSPs are clustered together for each individual cyclist as put forward in Section 3.5.
– In the second level, all the r-OSLSPs extracted in the first clustering level are aggregated and clustered again so as to come up with the final set of candidate locations of obstacles ().
This approach makes SAMARITAN able to be adapted to a client-server infrastructure where the first clustering level is executed in the cyclists’s personal devices whereas the second one is executed in a back-end server with the meaningful stops from the contributors.
Finally, the set of obstacles and are merged together as the final set of obstacles identified by SAMARITAN. In that sense, we should remark that the two sets are disjoined because reports obstacles within Strava segments whereas are obstacles in parts of the rural-road network not identified by any community-based platform.
3.8.Summary of the SAMARITAN pipeline
For the sake of clarity, here we sum up the key steps that compose the processing pipeline of the SAMARITAN framework as shown in Fig. 2.
First of all, the client-side SAMARITAN collects the spatio-temporal trajectories of its target cyclist moving around the spatial area under monitoring (step 1 in Fig. 2). Then, these trajectories are split based on the community-defined Strava segments and their low-speed points are uncovered (steps 2a and 2b). Next, these low-speed points are mapped based on the Strava segments (step 3). This gives raise to OSLSPs and ISLSPs. After that, the OSLSP are clustered so as to uncover the r-OSLSPs (step 4).
Finally, the central server of the framework merges together the ISLSP in each Strava segment to detect the potential obstacles within these segments (step 5a). At the same time, the r-OSLSPs are clustered with DBSCAN to identify regions with potential obstacles located outside any Strava segment (step 5b).
3.9.Data privacy aspects of SAMARITAN
Regarding the issues about the cyclists’ privacy when using SAMARITAN, it is true that the proposed solution relies on the analysis of the spatio-temporal trajectories from cyclists so as to detect forest-path incidents. However, the initial processing of the raw GPS traces is performed locally in the client side of the framework running in each cyclist’s mobile device. This client side only sends to the central server the low-speed points and that have been uncovered, not the whole spatio-temporal trace of the trajectory (see Sections 3.4 and 3.5).
Consequently, the location data sent to the central server is actually quite limited. In addition to that, the server only needs to store the low-speed points sent by the cyclist during a certain amount of time due to its batch-based computation of the aggregated points (see Sections 3.6 and 3.7). After and hours, which are the length of the time-based sliding windows used for such an aggregation, the cyclists’ reports can be removed from the server. Since users only report sparse data and it is not stored in the server for a long period of time, this would make it rather difficult to uncover private or personal information from the contributors, like their home location, by using solely the information contained in the SAMARITAN central server.
For the sake of completeness, the privacy-preserving solution stated in  focuses on a scenario where vehicles are endlessly reporting real-time data to certain intermediate entities called Roadside Units (RSUs). These units analyze the fine-grained data from vehicles and report possible alerts about road conditions to upper cloud servers. Authors focus on developing a mechanism to secure the communication channel between vehicles and RSUs as a large amount of private data flows through it. On the contrary, SAMARITAN does not require contributors to send real-time data to a central or intermediary server but certain candidate points where a road problem might occur. Therefore, the direct integration of the aforementioned privacy solution in SAMARITAN would not be possible.
3.10.Configuration of SAMARITAN
The proposed framework is able to control its sensibility to detect forest-path problems. This is because this detection mainly depends on two different sets of parameters, the lengths of the time-windows (, ) used to store the most recent low-speed points from all the cyclists and the DBSCAN parameters (, ϵ), which define the density of these points required to infer that there is a particular problem in a forest path.
These sets of parameters allow to adjust the behaviour of SAMARITAN with respect to the number of available cyclists using the service. Depending on the number of contributors, the SAMARITAN administrator can reduce or increase the length of the time windows by considering the storage capability of the central server.
In case of the number of contributors is high, the length of the time-windows can be reduced. This is because provided that there is a problem in a forest-path many cyclists would report abnormal low-speed points in a short period of time.
On the contrary, if the number of contributors is low, the administrator can reduce the density-parameters of the DBSCAN algorithm so as to increase the sensibility of the solution. It is true that this might lead to reporting some false positives but, in this case, the important goal is to detect as many obstacles as possible.
4.Evaluation of the framework
We have evaluated SAMARITAN in the city of Cieza at the southwest of Spain. This small city is surrounded by many forest paths that are very popular among local cyclists. Furthermore, the region was heavily affected by a cold front in September of 2019.5 This caused a lot of damage to the roads in the area.
In order to apply SAMARITAN in that region we focused on the city outskirts included the spatial bounding box defined by the latitude-longitude coordinates and .
The details of three datasets used for the present evalution are described next.
4.1.1.Strava segments under consideration
The target geographical area includes 36 Strava segments whose spatial distribution within the target bounding box is shown in Fig. 4. As we can see, these segments are not homogeneously distributed in the target area. This is because they are defined by the own Strava users. Therefore, they usually represent parts of the road that are interesting or event challenging for the cyclist perspective.
Besides, Table 1 indicates the minimum, average and maximum length of these segments. We can see that these segments cover a wide range of lengths from a few meters to almost 2000 meters.
|Min. segment length||92 m|
|Max. segment length||1922 m|
|Avg. segment length||673 m|
|Total segment length||24228 m|
Furthermore, Fig. 5 shows the length distribution among the collected Strava segments. From this distribution, we can see that most of the segments cover road parts of roughtly 317 meters.
This variety of lengths justifies the step take by SAMARITAN to identify obstacles within Strava segments described in Section 3.6. As we saw in that section, the ISLSPs of a particular segment are clustered by means of DBSCAN. This allows to cope with quite large segments as the framework is able to provide different tentative locations within a segment.
4.1.2.Spatio-temporal trajectories under study
Regarding the trajectories used as input by the framework, we have collected the data from 5 different cyclists during a six-month period from 01/06/2019 to 31/12/2019. This has given raise to 98 different spatio-temporal trajectories. The sampling rate of these trajectories varied from 1 s to 3 s.
In order to test the accuracy of SAMARITAN, eight particular locations within the target geographical area where a clear alteration of a forest path occurred were used as ground truth. They were identified as . Figure 6 shows the position of these obstacles. As we can see, seven of them fit into a Strava segment whereas one was outside any of them.
All these obstacles were caused by a three-day cold front that occurred during the 11st and 14th of September of 2019 in the southeast of Spain. This storm caused torrential downpours that seriously affected the spatial region considered by the use case. As a result, five of the target problems were caused by landslides, three by fallen trees and one by a broken pipe. All these obstacles were manually discovered by the authors during a three-day campaign from 27th to 29th of September. In that sense, this ground-truth information was not revealed to the 5 cyclists acting as contributors of SAMARITAN.
For the sake of clarity, Table 2 shows the relationship between the aforementioned obstacles and some detail of the segments comprising them. Furthermore, Fig. 7 shows some of these obstacles. As we can see, they were caused by landslides (obstacle ), fallen trees ( and ) and broken irrigation channels ().
|Segment name||Segment length||Obstacle id.|
|Bajada Senda del Moro ()||1500 m|
|Menu-Puente Abaram ()||1900 m|
|Subida Camino-Viejo ()||1100 m|
For the sake of clarity, Table 3 sums up the parameter settings in this use case.
As we can see, the parameters that define the time periods to analyze the low-speed points (, , ) are set to 360 hours (2 weeks). This is because it allows to collect enough data so as to detect obstacles in a more reliable manner.
|Low-speed points mapping (Section 3.4)||1.5|
|ϵ||OSLSP clustering (Section 3.5)||250 m|
|Obstacle detection with ISLSPs (Section 3.6)||360 h|
|Obstacle detection with r-OSLSPs (Section 3.7)||360 h|
This is because SAMARITAN relies on routes made by cyclist during their free time, so they can not be regarded as daily trips. This makes the required time periods to process data quite large.
4.3.Analysis of the results
Given the trajectories and segments described above, SAMARITAN reported 10 different potential obstacles as shown in Fig. 8.
From this set, seven of them (, , , , , and ) were located less than 150 meters of a ground truth alarm. In that sense, Table 4 shows the closest true obstacle for each of the generated obstacles.
|Inferred obstacle||Closest true obstacle||Distance|
In order to properly analyze these results, we study the speed behaviour of the trajectories in the three segments comprising seven of the true obstacles (, and ).
Furthermore, some of these obstacles were repaired by authorities during the time period of the present use case. This allowed us to split such speed profile in three different time intervals, 1) one including the trajectories before the occurrence of the obstacle event, 2) another including the trajectories in the time interval during which the obstacle was present and 3) a final time interval covering the trajectories after the forest-path problem was solved.
To set the date thresholds defining each interval, we used the average time marks of the five target cyclist in each segment depicted in Fig. 9. As we can see, there is a clear increment of the time marks in the three segments after the 8th of September of 2019. This consistent with the dates of the cold front affecting the region in that month.
After that, the marks behavior varies depending on the segment. In segment (Fig. 9a) the marks go back to the values before the cold front after roughly two months. In segment (Fig. 9c) we can see slight decrease of the marks around the 20th of October. Finally, segment does not exhibit any decrements after the cold front (Fig. 9e).
Besides, the speed distributions shown in Figs 9b,d&f confirm that the presence of obstacles clearly affects the speed behaviour of cyclists around the affected region.
Figure 10 shows the average speed evolution of the target trajectories in segments , and during these three time intervals. The figure also includes the location of the inferred and true obstacles with respect the length of the segment. For example, obstacles is at 630 m from the beginning of segment (see Fig. 10a).
The first thing to note is that there is a clear difference between the trajectories’ speed profile depending on the time period. This shows that the presence of obstacles meaningfully affects the mobility flows of cyclist moving around the area under control.
More in detail, we can see that the speed-profile after the obstacle events and before their repair are quite similar in segment (Fig. 10a), indicating that the forest path in that segment was fully recovered. However, the speed profiles before and after the obstacle event in segment (Fig. 10b) are quite different. In particular, cyclist have now to move much more slowly than before the obstacle event. Finally, in segment there was not any repair intervention. Hence, the speed profiles of the trajectories during and after the obstacle event are quite similar.
If we focus on the speed behavior of the trajectories during the obstacle presence (yellow lines in Fig. 10), we can clearly see a set of sudden speed drops around the obstacles location. This generates different ISLSPs from the trajectories moving along these segments. This allows to detect obstacles and in segment (Fig. 10a) and obstacles and in (Fig. 10b) with very high accuracy. A similar behavior is observed in segment with obstacles and (Fig. 10c).
However, obstacle is located in an part of segment where the forest path was at a very bad condition. This caused all the trajectories in the last part of to move, on average, quite slowly for around 400 meters. As a result, many different ISLSPs were generated in that last part of the segment. This caused a slight displacement of the SMARITAN alert () with respect the true obstacle.
Finally, SAMARITAN was not able to detect the true obstacle in an accurate manner (see Table 4). Since this obstacle is outside any segment it can only be detected by means of r-OSLSPs. According to Section 3.7, obstacles outside Strava segments are only generated when some close r-OSLSPs from the different cyclist are detected. From Fig. 11 we can see that only 2 out of 5 cyclist actually generated r-OSLSPs near . This is because this obstacle is located in an area with a quite low transit of cyclists. Therefore, the time window , in charge of gathering the most recent r-OSLSPs, does not contain enough points so that the DBSCAN instance is able to uncover meaningful clusters.
From this use case we can draw up some interesting findings.
First of all, SAMARITAN uses two types of crowdsensing data to detect forest-path problems, an internal and an external one. The former is the GPS trajectories generated by the cyclists acting as contributors that are explicitly processed by the client side of the framework. The latter is the road-segment statistics gathered from a third-party community service like Strava. In that sense, the use case has shown that the enrichment of a MCS architecture with crowd-based data from an external platform is a promising approach to extend the usage of these architectures to new domains such as the detection of road problems in rural regions.
Secondly, the evaluation of the framework has proved that, as many MCS architectures, the reliability and accuracy of the proposed solution strongly depends on the density of contributors. In that sense, road problems occurring within Strava segments are more likely to be discovered. This is because these segments are defined by the Strava users in parts of the road network covered by a large number of cyclists. This limitation is a side effect of relying on Strava as an input source.
However, this dependence allows SAMARITAN to leverage the data shared by users in Strava. As a result, it avoids a cold-start problem when it comes to retrieve historic speed-behaviour of cyclists of the segments in each new deployment. In that sense, if the Strava platform disappeared or restricted its access via API then it would be necessary to find an alternative public feed providing historic data about cyclist mobility in the target areas. In that sense, some spatial repositories, like OpenStreetMap, allows users to upload their own GPS trajectories.6 Another possible alternative would be the composition of the segment statistics directly by SAMARITAN. This could be done in an incremental manner as long as the system processes data from the contributors. However, this option would suffer from the same cold-start problem described above.
Finally, the crowd-based approach followed by SAMARITAN to collect fitness-related data limits its application to scenarios accomplishing certain requirements. Since the system relies on sudden and abrupt deceleration of the cyclists as initial step to fire the path-problem detection mechanism, SAMARITAN would not be a feasible solution in urban environments where cyclists usually face many different obstacles (e.g traffic lights, pedestrians and so forth) that may make them to abruptly stop. This would generate plenty of noisy data and the system would report a large number of false positives.
The endless enrichment of personal mobile contrivances with new sensing capabilities has enabled the development of many collaborative applications. In this context, MCS-based RIASs allow to monitor the state of large portions of a road network in a cost-effective manner. However, existing solutions focus on a rather limited scope as they assume that devices providing sensor data are used by car drivers.
In this context, the present work introduces SAMARITAN, a MCS-based RIAS that targets forest paths. This type of roads are very popular to do many different sports like cycling or hiking. However, they can also suffer from problems that affect their usual transit flow. These problems are caused by the presence of obstacles like fallen trees or landslides.
SAMARITAN profits from the growing popularity of fitness Internet applications to perform its path-monitoring task. In particular, it makes use of cyclist trajectories and community-based segments collected from Strava, a foremost sport application. The evaluation of the framework in a real-world scenario has shown that our solution has been able to accurately detect most of the target obstacles. However, as most solutions based on the MCS paradigm, its reliability depends on the density of contributors in the target area.
The key benefit of SAMARITAN with respect to other possible alternatives for forest-path monitoring is that it follows an opportunistic MCS approach. As a result, contributors do not need to explicitly notify the locations of the forest-path incidents because the system is able to do that based on their automatically-generated GPS trajectories. In that sense, potential alternatives where users must explicitly report the problems that they see in the paths would suffer from large usability problems. The most important one is that this type of approach would require users to stop halfway through their sport activity every time they spot an incident, report the problem and lastly resume their training. This would discourage the usage of these type of alternatives by many different potential contributors within the fitness field.
All in all, the present work would help rural administrations to better maintain their forest paths networks. It would allow these authorities to control large geographical areas at affordable cost provided that they are visited by enough cyclists.
Finally, future work will extend the framework to consider other contextual factors for obstacles detection. This way, the current weather conditions or the orography of the target region under study might be relevant features in order to asses whether an abnormal stop is caused by a road-condition problem or not.
Authors would like to thank the cyclists who kindly collaborated in the evaluation of the tool by giving up their GPS trajectories. Moreover, this work has been supported by the Fundación Séneca del Centro de Coordinación de la Investigación de la Región de Murcia under Project 20813/PI/18, and by the Spanish Ministry of Science, Innovation and Universities under grant RTC-2017-6389-5.
G. Alessandroni, L. Klopfenstein, S. Delpriori, M. Dromedari, G. Luchetti, B. Paolini, A. Seraghiti, E. Lattanzi, V. Freschi, A. Carini et al., Smartroadsense: Collaborative road surface condition monitoring, in: Proceedings of the UBICOMM, 2014, pp. 210–215.
I. Ayala, L. Mandow, M. Amor and L. Fuentes, A mobile and interactive multiobjective urban tourist route planning system, Journal of Ambient Intelligence and Smart Environments 9(1) (2017), 129–144. doi:10.3233/AIS-160413.
S. Basudan, X. Lin and K. Sankaranarayanan, A privacy-preserving vehicular crowdsensing-based road surface condition monitoring system using fog computing, IEEE Internet of Things Journal 4(3) (2017), 772–782. doi:10.1109/JIOT.2017.2666783.
L. Bayındır, A survey of people-centric sensing studies utilizing mobile phone sensors, Journal of Ambient Intelligence and Smart Environments 9(4) (2017), 421–448. doi:10.3233/AIS-170446.
C. Borcea, M. Talasila and R. Curtmola, Mobile Crowdsensing, CRC Press, 2016.
C. Chen, Y. Ding, X. Xie, S. Zhang, Z. Wang and L. Feng, TrajCompressor: An online map-matching-based trajectory compression framework leveraging vehicle heading direction and change, IEEE Transactions on Intelligent Transportation Systems 21(5) (2020), 2012–2028. doi:10.1109/TITS.2019.2910591.
J. Eriksson, L. Girod, B. Hull, R. Newton, S. Madden and H. Balakrishnan, The pothole patrol: Using a mobile sensor network for road surface monitoring, in: Proceedings of the 6th International Conference on Mobile Systems, Applications, and Services, MobiSys’08, Association for Computing Machinery, New York, NY, USA, 2008, pp. 29–39. ISBN 9781605581392. doi:10.1145/1378600.1378605.
M. Ester, A density-based algorithm for discovering clusters in large spatial databases with noise, in: Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining, AAAI Press, Portland, Oregon, 1996.
G. Griffin, K. Nordback, T. Götschi, E. Stolz and S. Kothuri, Transportation research circular E-C183: Monitoring bicyclist and pedestrian travel and behavior: Current research and practice, Transportation Research Board of the National Academies, Washington, DC, 2014.
G.P. Griffin and J. Jiao, Where does bicycling for health happen? Analysing volunteered geographic information through place and plexus, Journal of Transport & Health 2(2) (2015), 238–247, http://www.sciencedirect.com/science/article/pii/S2214140514001042. doi:10.1016/j.jth.2014.12.001.
F.E. Grubbs, Procedures for detecting outlying observations in samples, Technometrics 11(1) (1969), 1–21. doi:10.1080/00401706.1969.10490657.
H.H. Hochmair, E. Bardin and A. Ahmouda, Estimating bicycle trip volume for Miami-Dade county from Strava tracking data, Journal of Transport Geography 75 (2019), 58–69, http://www.sciencedirect.com/science/article/pii/S0966692318308639. doi:10.1016/j.jtrangeo.2019.01.013.
J. Hong, D.P. McArthur and J.L. Stewart, Can providing safe cycling infrastructure encourage people to cycle more when it rains? The use of crowdsourced cycling data (Strava), Transportation Research Part A: Policy and Practice 133 (2020), 109–121.
V. Janko, B. Cvetković, A. Gradišek, M. Luštrek, B. Štrumbelj and T. Kajtna, e-Gibalec: Mobile application to monitor and encourage physical activity in schoolchildren, Journal of Ambient Intelligence and Smart Environments 9(5) (2017), 595–609. doi:10.3233/AIS-170453.
P. Jonsson, J. Casselgren and B. Thörnberg, Road surface status classification using spectral analysis of NIR camera images, IEEE Sensors Journal 15(3) (2015), 1641–1656. doi:10.1109/JSEN.2014.2364854.
F. Kalim, J.P. Jeong and M.U. Ilyas, CRATER: A crowd sensing application to estimate road conditions, IEEE Access 4 (2016), 8317–8326. doi:10.1109/ACCESS.2016.2607719.
L.C. Klopfenstein, S. Delpriori, P. Polidori, A. Sergiacomi, M. Marcozzi, D. Boardman, P. Parfitt and A. Bogliolo, Mobile crowdsensing for road sustainability: Exploitability of publicly-sourced data, International Review of Applied Economics (2019), 1–22. doi:10.1080/02692171.2019.1646223.
A. Kulkarni, N. Mhalgi, S. Gurnani and N. Giri, Pothole detection system using machine learning on Android, International Journal of Emerging Technology and Advanced Engineering 4(7) (2014), 360–364.
X. Li and D.W. Goldberg, Toward a mobile crowdsensing system for road surface assessment, Computers, Environment and Urban Systems 69 (2018), 51–62, http://www.sciencedirect.com/science/article/pii/S0198971517301333. doi:10.1016/j.compenvurbsys.2017.12.005.
L.C. Lima, V.J.P. Amorim, I.M. Pereira, F.N. Ribeiro and R.A.R. Oliveira, Using crowdsourcing techniques and mobile devices for asphaltic pavement quality recognition, in: 2016 VI Brazilian Symposium on Computing Systems Engineering (SBESC), 2016, pp. 144–149. ISSN 2324-7894. doi:10.1109/SBESC.2016.029.
R. Mohamed, H. Aly and M. Youssef, Accurate real-time map matching for challenging environments, IEEE Transactions on Intelligent Transportation Systems 18(4) (2017), 847–857. doi:10.1109/TITS.2016.2591958.
W. Musakwa and K.M. Selala, Mapping cycling patterns and trends using Strava Metro data in the city of Johannesburg, South Africa, Data in Brief 9 (2016), 898–905, http://www.sciencedirect.com/science/article/pii/S235234091630662X. doi:10.1016/j.dib.2016.11.002.
F. Palumbo, C. Gallicchio, R. Pucci and A. Micheli, Human activity recognition using multisensor data fusion based on reservoir computing, Journal of Ambient Intelligence and Smart Environments 8(2) (2016), 87–107. doi:10.3233/AIS-160372.
E. Rampinini, G. Alberti, M. Fiorenza, M. Riggio, R. Sassi, T. Borges and A. Coutts, Accuracy of GPS devices for measuring high-intensity running in field-based team sports, International Journal of Sports Medicine 36(91) (2015), 49–53.
G. Romanillos, M.Z. Austwick, D. Ettema and J.D. Kruijf, Big data and cycling, Transport Reviews 36(1) (2016), 114–133. doi:10.1080/01441647.2015.1084067.
B. Rosner, Percentage points for a generalized ESD many-outlier procedure, Technometrics 25(2) (1983), 165–172. doi:10.1080/00401706.1983.10487848.
R. Sileryte, P. Nourian and S. van der Spek, Modelling spatial patterns of outdoor physical activities using mobile sports tracking application data, in: Geospatial Data in a Changing World, T. Sarjakoski, M.Y. Santos and L.T. Sarjakoski, eds, Springer International Publishing, Cham, 2016, pp. 179–197. ISBN 978-3-319-33783-8. doi:10.1007/978-3-319-33783-8_11.
G. Singh, D. Bansal, S. Sofat and N. Aggarwal, Smart patrolling: An efficient road surface monitoring using smartphone sensors and crowdsourcing, Pervasive and Mobile Computing 40 (2017), 71–88, http://www.sciencedirect.com/science/article/pii/S1574119216301262. doi:10.1016/j.pmcj.2017.06.002.
Y. Sun and A. Mobasheri, Utilizing crowdsourced data for studies of cycling and air pollution exposure: A case study using Strava data, International Journal of Environmental Research and Public Health 14(3) (2017), https://www.mdpi.com/1660-4601/14/3/274. doi:10.3390/ijerph14030274.
Team AMS, Understanding GPS tracking, Technical report, Australia, 2013.
X. Wang, J. Zhang, X. Tian, X. Gan, Y. Guan and X. Wang, Crowdsensing-based consensus incident report for road traffic acquisition, IEEE Transactions on Intelligent Transportation Systems 19(8) (2018), 2536–2547. doi:10.1109/TITS.2017.2750169.
Z. Yan and D. Chakraborty, Semantics in mobile sensing, Synthesis Lectures on the Semantic Web: Theory and Technology 4(1) (2014), 1–143. doi:10.2200/S00577ED1V01Y201404WBE008.
J. Zhang and M.F. Goodchild, Uncertainty in Geographical Information, CRC Press, 2002.
Y. Zheng, Trajectory data mining: An overview, ACM Trans. Intell. Syst. Technol. 6(3) (2015). doi:10.1145/2743025.