Fog-Computing-Based Approximate Spatial Keyword Queries With Numeric Attributes in IoV

Due to the popularity of onboard geographic devices, a large number of spatial–textual objects are generated in the Internet of Vehicles (IoV). This development calls for approximate spatial keyword queries with numeric attributes in IoV (A²SKIV), which take into account the locations, textual descriptions, and numeric attributes of spatial–textual objects. Because query processing involves a large number of objects, this article proposes using vehicles as fog-computing resources and introduces a network structure called FCV, on which the fog-based top-k A²SKIV query is formulated and explored. To support network distance pruning, textual semantic pruning, and numeric attribute pruning simultaneously, a two-level spatial–textual hybrid index, the STAG-tree, is designed. Based on the STAG-tree, an efficient top-k A²SKIV query processing algorithm is presented. Simulation results show that our STAG-based approach is about 1.87× (17.1×, resp.) faster in search time than the compared ILM (DBM, resp.) method and that our approach is scalable.

and caching capabilities to the edges of the network, and it facilitates localization decisions and rapid response.
As a kind of fog computing, vehicle fog computing (VFC) is considered a promising method for supporting applications in IoV; it uses vehicles as infrastructure to make full use of their communication and computing resources. In particular, VFC utilizes a large number of cooperative end-user clients or near-user edge devices to carry out large amounts of communication and computation [3], and it differs from other existing technologies in its proximity to end users, dense geographic distribution, and support for mobility [4], [5]. Recently, a new network structure named fog computing-based IoV (FC-IoV) [6] was proposed, which deploys fog servers at downtown intersections and accident-prone roads to enhance the computing and storage capabilities of the network edge.
Recently, much effort has been devoted to various issues in fog-based IoV, such as the optimal deployment and dimensionality (ODD) for autonomous driving [7] and reasonable and feasible real-time resource allocation [8]. However, little work has addressed processing the spatial-textual information generated in IoV to obtain information of interest to users. In real life, due to the popularity of on-board geographic devices, large numbers of spatial-textual objects are generated in IoV. To effectively process the massive data collected and obtain the information that users are interested in, spatial keyword queries (SKQs) have been proposed and discussed [9]–[13]; an SKQ uses a set of keywords and a spatial constraint to express a user's interest.
Existing work on SKQ processing can be divided into two categories: 1) SKQ in Euclidean space [11] and 2) SKQ in traffic networks [14]. Schemes for SKQ in traffic networks use the real traffic network distance rather than the Euclidean distance and thus better meet the requirements of real-time applications in IoV. Moreover, much previous work on SKQ requires exact keyword matching, which may return too few results because of the diversity of textual expressions; approximate SKQ (ASKQ) has therefore been explored recently. ASKQ can handle spelling errors and conventional spelling differences (for example, color versus colour), which occur frequently in real applications.
However, in many IoV applications, such as mobile e-commerce, items are generated with textual descriptions, various attributes, and spatial locations.
Correspondingly, a user's requirement could include a set of keywords, attribute-value pairs, a distance limitation, or the number k of results, for example, "oxford," "dictionary," publish year = 2018 & price = 1000, and k = 5 (i.e., the top-5 results). To capture such requirements, a spatial keyword search with numeric attributes is needed, that is, an approximate spatial keyword query with numeric attributes (A²SK) that considers locations, textual descriptions, and numeric attribute requirements simultaneously. Meanwhile, the more queries and objects are involved, the more complex query processing becomes, which makes efficient query processing and fast feedback on query results a challenge. To this end, in addition to efficient query processing methods, we also need to make full use of the communication and computing power available around query users in IoV.
To address the issues mentioned above, this article explores fog computing-based A²SK queries in traffic networks of IoV (A²SKIV), which poses three major challenges. First, query users and spatial-textual objects may be distributed over a large traffic network with millions of vertices and edges, so the first issue to be handled is how to efficiently calculate the network distances between queries and objects. Second, with millions of spatial-textual objects in IoV, a large number of keywords and attribute-value pairs must be considered; moreover, approximate rather than exact keyword matching is required, which makes A²SKIV search more complex. Third, since many users may issue queries simultaneously, the proposed matching method must be efficient enough to significantly reduce the cost of query processing.
To support network distance pruning, keyword pruning, and numeric attribute-value pruning simultaneously, a novel spatial-textual hybrid index structure is needed that accounts for both the relative invariance of the traffic network structure and the dynamic variation of spatial-textual objects and queries. First, we need a spatial index over the traffic network structure in IoV so that, given the positions of an object and a query, the network distance between them can be calculated quickly while keeping the storage space reasonable. Second, a textual and numeric index is required on the spatial-textual objects of each traffic network region (subgraph); to reduce space consumption, the textual and numeric information must be organized compactly. Moreover, efficient pruning rules are needed to discard the huge number of unqualified spatial-textual objects early.
To meet the requirements mentioned above, this article explores A²SKIV comprehensively. The main contributions of this article are as follows.
1) The A²SKIV problem is formulated. It differs from existing SKQ work in that it considers textual similarity, numeric similarity, and spatial proximity in traffic network space simultaneously.
2) A two-level spatial-textual hybrid index, the STAG-tree, is presented, together with several lemmas that prune a huge number of unrelated objects. A top-k A²SKIV query processing algorithm based on the STAG-tree index is designed. In addition, we discuss how to extend the proposed method to support numeric attributes with interval values.
3) Simulations on two traffic networks and their spatial-textual object sets are performed to evaluate the effectiveness of the proposed STAG-tree index and query processing algorithm.

The remainder of this article is organized as follows. Section II reviews the related work. Section III presents the system model and problem definitions. Section IV introduces the hybrid index in detail. The top-k A²SKIV query processing algorithm is proposed in Section V. Section VI discusses extending the method to support attributes with interval values. Section VII gives the experimental evaluation, and Section VIII concludes this article.

II. RELATED WORK
A. Fog Computing in IoV
In 2012, Cisco introduced the concept of fog computing. Since then, many efficient schemes have been proposed [15]–[20]. An object-cloud communication architecture [3] based on fog computing and an intelligent gateway was proposed. Later, Aazam and Huh [21] proposed a system called fog micro data center, in which the fog plays an important role in resource management, data filtering, preprocessing, data processing, and security. Meanwhile, Hou et al. [22] proposed the concept of VFC, which uses vehicles as infrastructure to take full advantage of their communication and computation resources. An intelligent VFC system combining parking assistance and intelligent parking was discussed in [7]. In particular, a VFC-aware vehicle reservation auction was designed to guide vehicles to available parking spaces with less effort during driving, while the fog capabilities of vehicles were used to offset their service costs through monetary rewards, thus supporting delay-sensitive computing services. Yu et al. [6] discussed the ODD of FC-IoV infrastructure for autonomous driving. Two architectural patterns, namely, a coupling pattern and a decoupling pattern, were proposed, and the ODD problem was transformed into two integer linear programming formulations to reduce the deployment cost. Such efforts improve the computing and storage capabilities of IoV and enable many applications. In edge-enabled networks, the geographic diversity of resources and heterogeneous hardware configurations must be carefully managed to ensure efficient resource utilization. Lamb and Agrawal [8] analyzed mobile edge computing in vehicular networks and introduced an architecture that evaluates available resources and allocates the most reasonable and feasible resources in real time.

B. SKQ Querying in Traffic Networks of IoV
To meet users' interests in IoV, much effort has been devoted to moving top-k SKQ processing, direction-aware SKQ processing, interactive top-k SKQ querying, keyword search on distributed graphs [23], why-not range-based skyline queries [24], and location-aware error-tolerant keyword search [25]. To accelerate the calculation of long road network distances, a Dijkstra-based multihop distance labeling scheme (DBM) was proposed [26]. Guo et al. [14] discussed distributed SKQ search on traffic networks and proposed a new distributed index, with which each machine evaluates search operations independently in a distributed manner. Gao et al. [27] discussed reverse top-k Boolean SKQ search in traffic networks and showed how to answer the query for an arbitrary k without precomputation. Zhao et al. [28] explored time-aware SKQ queries on traffic networks and proposed a novel TG index and several algorithms to process this type of query efficiently. To support mobile search and targeted location-aware advertising, an inverted index-based solution (ILM) was proposed to improve query performance [29]. Li et al. [30] studied intelligent augmented keyword search in real-life IoV and proposed a hybrid index called ASKTI, in which the traffic network structure, keywords, Boolean expressions, and spatial information of objects are organized so as to prune unqualified traffic network space as early as possible. Abeywickrama et al. [31] discussed how to efficiently process SKQ queries on traffic networks and proposed K-SPIN, a versatile framework that avoids keyword-separated indexes to reduce latency and avoid expensive operations.
Many similarity functions exist for approximate string matching, such as edit distance, Jaccard, and n-gram similarity [32]. To handle inconsistencies and errors in queries and data, Alsubaiee et al. [33] proposed an index structure that naturally extends tree-based spatial indexes with approximate keyword search capability. An approximate n-gram matching method was proposed [34], which uses long but approximate n-gram matches as the basis for pruning k-nearest-neighbor candidates. Zheng et al. [35] explored approximate keyword search in semantic trajectory databases and proposed a hybrid index called Giki, which consists of two components: an SQ-Tree part using n-grams and a K-Ref part using edit distance.
Although there are many effective query processing methods in IoV, most existing schemes suffer from the following limitations.
1) They focus on exact keyword matching while ignoring approximate keyword matching, which can handle the spelling errors and conventional spelling differences that often occur in practical applications.
2) They consider only keyword matching and attribute-value matching, ignoring spatial constraints.
3) They are limited to Euclidean space and cannot process queries in the traffic networks of IoV, a realistic application scenario.
This article fills this gap by developing a two-level spatial-textual hybrid index that overcomes the limitations mentioned above in IoV.

III. SYSTEM MODEL AND PROBLEM DEFINITIONS
This section first gives the system model and then formulates top-k A²SKIV queries. Table I lists the notations used in this article.

A. System Model
To meet the requirements of efficient query processing and fast feedback on query results, we adopt FCV, a fog computing-based hierarchical network structure with three layers that exploits the computing and storage capabilities of edge devices. Fig. 1 gives an overview of FCV and its service and application scenarios with moving and parked vehicles.
FCV considers four scenarios of vehicle behavior. Fog computing has the natural advantage of being closer to vehicles and mobile devices, thus avoiding the high latency of complex system responses and the service failures associated with routing requests to remote cloud servers. To address communication and computing power issues, FCV employs vehicles and mobile devices as infrastructure, making full use of their communication and computing resources. In addition, RSUs and fog devices are deployed, generally at intersections in the city center and along busy roads.
As shown in Fig. 1, the first layer of FCV is a cloud computing layer which includes cloud servers and gateways. In particular, the gateway communicates with other heterogeneous networks and can also send the filtered underlying data to cloud servers.
The second layer is the fog computing layer, which includes lightweight fog devices at network edges. Fog devices temporarily cache and process the raw data collected and upload the filtered data to the cloud servers for further processing. They can also store frequently accessed data for rapid response.
The third layer is the access layer, which includes RSUs, vehicles, and mobile devices. RSUs provide open service access points for fog computing vehicles and mobile devices. Note that although RSUs and fog devices are deployed at similar locations, we deploy them separately for deployment flexibility. RSUs, nearby vehicles, and mobile devices communicate wirelessly, exchanging information and collaborating on computing tasks, whereas RSUs communicate with fog devices via wired connections. The third layer covers four types of scenarios, as described below. Fig. 1(a) and (b) illustrate parked vehicles and mobile devices as infrastructure. A huge number of parked vehicles are scattered across the traffic network in IoV. These vehicles and mobile devices form a rich computing infrastructure with powerful computing resources and storage space; after joining FCV, they can act as small data centers that handle a variety of complex tasks. Fig. 1(c) and (d) illustrate moving vehicles as infrastructure. In urban areas, traffic is usually slow, and most vehicles travel very slowly after entering the urban area, especially during rush hours, so nearby moving vehicles and devices maintain good communication. Moving vehicles can continually exchange information by establishing new connections. When nearby moving vehicles join FCV, they can connect and collaborate with each other to complete tasks using local computing and communication resources.

B. Problem Definition

1) Traffic Network of IoV:
A traffic network of IoV is modeled as an undirected weighted graph $G = (V, E)$, where $V$ is a set of vertices and $E$ is a set of edges. A vertex $v \in V$ represents a road intersection or endpoint in the traffic network. An edge $e(v_i, v_j, l) \in E$ represents the road segment between two vertices $v_i$ and $v_j$ ($i \ne j$), where $l$ is the length of the road segment. Our model can be extended to directed weighted graphs, which represent one-way traffic, simply by allowing the length of $e(v_i, v_j)$ to differ from that of $e(v_j, v_i)$.
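To make the distance model concrete, the following is a minimal Python sketch of the network distance $D_N$ between two points lying on edges of $G$, using multi-source Dijkstra search. The point encoding `(u, v, length, offset)` and the helper names `build_graph`, `dijkstra`, and `network_distance` are hypothetical; in the STAG-tree, such distances are obtained through the G-tree rather than by a fresh search over the whole graph.

```python
import heapq
import math
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted graph from (u, v, length) triples."""
    graph = defaultdict(list)
    for u, v, length in edges:
        graph[u].append((v, length))
        graph[v].append((u, length))
    return graph


def dijkstra(graph, sources):
    """Multi-source Dijkstra; `sources` maps a vertex to its starting distance."""
    dist = dict(sources)
    heap = [(d, v) for v, d in sources.items()]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale entry: a shorter path to u was already found
        for v, length in graph[u]:
            nd = d + length
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def network_distance(graph, p, r):
    """Shortest network distance between points p and r lying on edges.

    Hypothetical encoding: a point (u, v, length, offset) lies on edge
    e(u, v, length) at distance `offset` from u.
    """
    u1, v1, len1, off1 = p
    u2, v2, len2, off2 = r
    # Leave p through either endpoint of its edge.
    dist = dijkstra(graph, {u1: off1, v1: len1 - off1})
    best = min(dist.get(u2, math.inf) + off2,
               dist.get(v2, math.inf) + (len2 - off2))
    if {u1, v1} == {u2, v2}:
        # Both points on the same edge: also consider moving along it directly.
        off2_from_u1 = off2 if u2 == u1 else len2 - off2
        best = min(best, abs(off1 - off2_from_u1))
    return best
```

For the directed variant mentioned above, `build_graph` would simply add each edge in its own direction only.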

2) Spatial-Textual Objects With Numeric Attributes and Approximate Spatial Keyword Queries:

Definition 1 [Spatial-Textual Objects With Numeric Attributes in Traffic Networks of IoV (Object for Short)]: An object $o$ is defined as $o = (o \cdot tags, o \cdot V, o \cdot L)$, where $o \cdot tags$ is a set of keywords describing $o$, $o \cdot V$ is a set of attribute-value pairs, and $o \cdot L$ is a spatial point on an edge of the traffic network. The number of attribute-value pairs in $o \cdot V$ is denoted by $n$.

Definition 2 [Approximate Spatial Keyword Queries With Numeric Attributes in IoV (A²SKIV)]: An A²SKIV query $q$ is defined as $q = (q \cdot W, q \cdot V, q \cdot L)$, where $q \cdot W$ is a set of query keywords, $q \cdot V$ is a set of user-given attribute-value pairs, and $q \cdot L$ is a spatial point on an edge of the traffic network. The number of attribute-value pairs in $q \cdot V$ is denoted by $m$.

3) Match Semantics: To measure the relevance between an A²SKIV query $q$ and an object $o$, three aspects should be considered: the textual distance, the numeric attribute distance, and the traffic network distance between $q$ and $o$.
Definition 3 (Keyword Mapping): For an A²SKIV query $q$ and an object $o$, the keyword mapping from $q$ to $o$, denoted $q.KM(o)$, is a set of keywords in which each keyword $w_i$ is the keyword of $o$ textually closest to the query keyword $q_i \in q \cdot W$ in terms of edit distance, i.e., $w_i = \arg\min_{w_j \in o.tags} \{d_{ed}(q_i, w_j)\}$.
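Definition 3 relies on the edit distance $d_{ed}$. Below is a minimal Python sketch of it, using the Wagner–Fischer dynamic program (the algorithm cited as [32] in Section IV), and of the keyword mapping $q.KM(o)$. The function names are hypothetical, and matching here is case-sensitive, so keywords are assumed to be normalized (e.g., lowercased) beforehand.

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance via the Wagner-Fischer DP, keeping one row at a time."""
    prev = list(range(len(t) + 1))  # distances from "" to each prefix of t
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1,               # delete cs
                curr[j - 1] + 1,           # insert ct
                prev[j - 1] + (cs != ct),  # substitute, free when equal
            ))
        prev = curr
    return prev[-1]


def keyword_mapping(query_keywords, object_tags):
    """q.KM(o): for each query keyword q_i, the tag of o closest in edit distance."""
    return [min(object_tags, key=lambda w: edit_distance(q_i, w))
            for q_i in query_keywords]


print(edit_distance("color", "colour"))  # 1
print(keyword_mapping(["theatre", "cofee"], ["theater", "coffee", "bread"]))
# ['theater', 'coffee']
```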
Definition 4 (Textual Distance): Given an A²SKIV query $q$ and an object $o$, we first calculate the sum of the edit distances between each keyword $w_i \in q.KM(o)$ and the corresponding query keyword $q_i \in q \cdot W$. To normalize this sum to the range [0, 1], the larger of $|q \cdot W|$ and $|o.tags|$, i.e., $\max\{|q \cdot W|, |o.tags|\}$, is also taken into account, which yields the textual distance $D_{td}(q, o)$ in (1).

Next, we discuss how to calculate the numeric distance between query $q$ and object $o$. The numeric attribute distance reflects how much the values of $q$ and $o$ differ under the same numeric attribute and is expressed as the magnitude of that difference.
For $q$ and $o$, the numeric distance $d_j$ under each numeric attribute $A_j$ ($1 \le j \le m$) is given in (2). We then normalize each numeric attribute distance to the range [0, 1] and combine the influence of all numeric attribute distances to obtain the total numeric distance between $q$ and $o$.
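The following Python sketch illustrates one way to compute a total numeric distance. It does not reproduce (2) and (3) exactly; instead, it assumes $d_j = |q.v_j - o.v_j|$ (the magnitude of the difference), normalizes it by $Max(A_j) - Min(A_j)$ with clipping to [0, 1], and combines the results as a weighted average using weights $\beta_j$ (Definition 5 constrains these to $1.0 \le \beta_j \le 10.0$). It follows the text in returning $+\infty$ when $o$ lacks a query attribute. All names are hypothetical.

```python
import math


def numeric_distance(query_attrs, obj_attrs, attr_min, attr_max, weights):
    """Sketch of D_nd(q, o): weighted mean of range-normalized differences.

    query_attrs / obj_attrs: {attribute: value}
    attr_min / attr_max: Min(A_j) / Max(A_j) over the whole object set O
    weights: assumed per-attribute weights beta_j
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for attr, q_val in query_attrs.items():
        if attr not in obj_attrs:
            return math.inf  # o lacks a query attribute, so D_nd(q, o) = +inf
        span = attr_max[attr] - attr_min[attr]
        d_j = abs(q_val - obj_attrs[attr]) / span if span > 0 else 0.0
        d_j = min(1.0, d_j)  # keep each normalized distance inside [0, 1]
        weighted_sum += weights[attr] * d_j
        weight_total += weights[attr]
    return weighted_sum / weight_total if weight_total else 0.0
```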
Definition 5 (Numeric Distance): For each query attribute $A_j \in q \cdot V$, let $Max(A_j)$ and $Min(A_j)$ be the maximum and minimum values of $A_j$ over all objects in the object set $O$, and let $1.0 \le \beta_j \le 10.0$. Letting $e_j = c_j + 1 \ge 1$, the numeric distance $D_{nd}(q, o)$ between $q$ and $o$ is defined in (3).

Travel distance is another aspect of query relevance; it is the length of the shortest path from query $q$ to object $o$, i.e., $D_N(q.l, o.l)$.
Definition 6 (Travel Distance): The sigmoid function changes rapidly for small arguments, which is consistent with the intuition that user satisfaction is generally more sensitive to travel distance when the distance is short. Therefore, we use the sigmoid function to normalize the travel distance $D_{tr}(q, o)$ to the range [0, 1], where $0 < \rho \le 1$ is the distance adjustment parameter.

Finally, we adopt the textual-numeric-spatial distance, which combines the spatial, textual, and numeric relevance between $q$ and $o$ by simple linear interpolation. In particular, the textual-numeric-spatial distance between $q$ and $o$ is a linear combination of the spatial, textual, and numeric distances between $q$ and $o$, weighted with parameters $\alpha$, $\beta$, and $\gamma$, respectively:
$$D_{tns}(q, o) = \alpha \cdot D_{tr}(q, o) + \beta \cdot D_{td}(q, o) + \gamma \cdot D_{nd}(q, o).$$
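Below is a minimal Python sketch of the travel-distance normalization and the linear combination $D_{tns}$. The sigmoid variant $2/(1 + e^{-\rho d}) - 1$ (equivalently $\tanh(\rho d / 2)$), which maps $[0, \infty)$ onto $[0, 1)$, is an assumption, as is the default $\rho = 0.5$. The explicit $+\infty$ check encodes the rule that an object lacking a query attribute is infinitely far, and it also avoids computing $0 \cdot \infty$ when $\gamma = 0$.

```python
import math


def travel_distance(network_dist: float, rho: float = 0.5) -> float:
    """Assumed sigmoid normalization of D_N(q.l, o.l) into [0, 1)."""
    if not 0 < rho <= 1:
        raise ValueError("rho must satisfy 0 < rho <= 1")
    return 2.0 / (1.0 + math.exp(-rho * network_dist)) - 1.0


def tns_distance(d_tr, d_td, d_nd, alpha, beta, gamma):
    """D_tns(q, o) = alpha * D_tr + beta * D_td + gamma * D_nd."""
    if math.isinf(d_nd):
        return math.inf  # object lacks a query attribute
    return alpha * d_tr + beta * d_td + gamma * d_nd
```

Because the normalized travel distance saturates toward 1, far objects differ little in $D_{tr}$, so their ranking is then decided mainly by the textual and numeric terms.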

C. Problem Statement
Using the textual-numeric-spatial distance $D_{tns}(q, o)$ to measure the combined proximity between query $q$ and object $o$, we can formally define the top-k A²SKIV query as follows.
Example 1: Fig. 2 illustrates an example of a top-k A²SKIV query on the traffic network in IoV, with ten spatial-textual objects and one query located on edges. Each object has a set of keywords and a set of attribute-value pairs describing it, and a spatial point on an edge of the traffic network giving its location. The query $q$ contains four items: 1) a set of query keywords {Theater, coffee}; 2) the attribute-value pairs "$A_1$ = 4.4 & $A_2$ = 45"; 3) a spatial point $q.l$ for its current location; and 4) a value k = 1, i.e., the top-1 related object is wanted. Here, $A_1$ = "rating" and $A_2$ = "pcc (per capita consumption)." We first consider $o_5$, $o_6$, $o_7$, and $o_9$, whose network distances from $q$ are the four shortest among all objects in $O$. Computing $D_{tns}(q, o_6)$ first and the other three in the same way, we get $D_{tns}(q, o_5) = 0.1649$, $D_{tns}(q, o_7) = 0.1682$, and $D_{tns}(q, o_9) = +\infty$. Note that $D_{nd}(q, o_9) = +\infty$ because $o_9$ does not have the query attribute $A_2$; thus, $D_{tns}(q, o_9) = +\infty$. Object $o_6$ is therefore the top-1 result object of $q$ at this moment, and the other objects can be evaluated similarly.
The following three sections present the detailed method for top-k A²SKIV query processing, which includes hybrid index construction, the design of the top-k A²SKIV processing scheme, and the extension of the index to support attributes with interval values. The top-k A²SKIV processing scheme consists of pruning techniques and query processing algorithms. Fig. 3 shows the query processing flowchart.

IV. HYBRID INDEX FOR A²SKIV QUERY PROCESSING
To improve query performance and prune as many irrelevant objects as possible for A²SKIV queries, we propose a novel two-level spatial-textual hybrid index structure, the STAG-tree, shown in Fig. 4, which supports network distance pruning, textual pruning, and numeric attribute pruning simultaneously. The STAG-tree also accounts for the relative invariance of the traffic network structure and the dynamic variation of objects and queries. Fig. 5 illustrates the flowchart for building the STAG-tree.

A. Build G-Tree Component
G-tree [37] is an assembly-based index that efficiently supports location-based queries on traffic networks in IoV. As mentioned before, a traffic network is modeled by an undirected weighted graph $G = (V, E)$, and the G-tree is constructed by graph partitioning. First, $G$ is set as the root of the G-tree. Then, $G$ is partitioned into $f$ equal-sized subgraphs $G_1, G_2, \ldots, G_f$, i.e., $|V_{G_1}|, |V_{G_2}|, \ldots, |V_{G_f}|$ are almost the same, and $G$ serves as the parent node of these subgraphs. A vertex $u \in V_{G_i}$ is called a border of $G_i$ if there exists an edge $(u, v) \in E$ with $v \notin V_{G_i}$, and $B_{G_i}$ denotes the border set of $G_i$. Thus, $G_i$ can be denoted by $G_i = (V_{G_i}, E_{G_i}, B_{G_i})$, where $V_{G_i}$, $E_{G_i}$, and $B_{G_i}$ denote the vertices, edges, and borders of $G_i$, which meet the following conditions: $\bigcup_{i=1}^{f} V_{G_i} = V$ and $V_{G_i} \cap V_{G_j} = \emptyset$ for $i \ne j$; $E_{G_i} = \{(u, v) \in E \mid u, v \in V_{G_i}\}$; and $B_{G_i} = \{u \in V_{G_i} \mid \exists (u, v) \in E,\ v \notin V_{G_i}\}$. Then, each subgraph $G_i$ is partitioned recursively until every subgraph has no more than $\tau$ vertices, where $f$ and $\tau$ are adjustable parameters. For example, as shown in Fig. 2, the traffic network $G_0$ is first divided into two subgraphs $G_1$ and $G_2$. Then, $G_1$ ($G_2$, resp.) is divided into $G_{11}$ and $G_{12}$ ($G_{21}$ and $G_{22}$, resp.). Assuming $f = 2$ and $\tau = 6$, the G-tree of the traffic network in Fig. 2 is obtained as shown in Fig. 4(a), where the numbers under the ID of each subgraph are the IDs of its borders.
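Once a partition is known, the border condition above can be checked directly. The Python sketch below assumes the partition is given as a hypothetical vertex-to-subgraph map (the G-tree itself obtains the partition by recursive balanced graph partitioning, which is not shown) and returns the border set $B_{G_i}$ of every subgraph; `border_sets` is a hypothetical name.

```python
def border_sets(edges, partition):
    """Return {subgraph_id: B_Gi} for a vertex partition of G.

    edges: iterable of (u, v, length) triples.
    partition: dict mapping each vertex to the id of its subgraph.
    A vertex u in G_i is a border if some edge (u, v) has v outside G_i.
    """
    # Start with an empty border set for every subgraph, even border-free ones.
    borders = {gid: set() for gid in set(partition.values())}
    for u, v, _length in edges:
        if partition[u] != partition[v]:  # the edge crosses two subgraphs
            borders[partition[u]].add(u)
            borders[partition[v]].add(v)
    return borders
```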
To accelerate shortest-path calculation, the G-tree keeps distance matrices (DMs), which store the shortest-path distance between each border-border pair for nonleaf nodes and between each border-vertex pair for leaf nodes. In particular, an efficient bottom-up method is adopted to accelerate the distance computation. In this way, the DMs of the G-tree in Fig. 4(a) are obtained; the DM of each subgraph (or graph) is given next to it. The total space complexity of the G-tree is $O(\log_2 f \cdot \sqrt{\tau} \cdot |V| + \log_f(|V|/\tau) \cdot \log_2^2 f \cdot |V|)$, where $|V|$ is the total number of vertices in graph $G$, $f$ is the fan-out of nonleaf G-tree nodes, and $\tau$ is the maximum number of vertices in each leaf node. Since $\log_2^2 f$, $\sqrt{\tau}$, and $\log_f(|V|/\tau)$ are small numbers, the size of the G-tree is scalable. Please refer to [37] for details.
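The DMs let a query avoid searching the whole graph: when $s$ and $t$ lie in different leaves, any $s$–$t$ path must leave through a border of $s$'s leaf and enter through a border of $t$'s leaf, so $d(s, t) = \min_{b_1, b_2} \{d(s, b_1) + d(b_1, b_2) + d(b_2, t)\}$. The Python sketch below applies this single assembly step to hypothetical precomputed tables; the actual G-tree assembles distances level by level along the tree path, which is not shown, and `assemble_distance` is a hypothetical name.

```python
import math


def assemble_distance(src_to_borders, border_to_border, borders_to_dst):
    """d(s, t) = min over b1, b2 of d(s, b1) + d(b1, b2) + d(b2, t).

    src_to_borders: {b1: d(s, b1)} for the borders b1 of leaf(s)   (leaf DM)
    border_to_border: {(b1, b2): d(b1, b2)}                         (upper DMs)
    borders_to_dst: {b2: d(b2, t)} for the borders b2 of leaf(t)    (leaf DM)
    """
    best = math.inf
    for b1, d1 in src_to_borders.items():
        for b2, d3 in borders_to_dst.items():
            d2 = border_to_border.get((b1, b2), math.inf)  # missing pair: unreachable
            best = min(best, d1 + d2 + d3)
    return best
```

The cost is $|B_{leaf(s)}| \cdot |B_{leaf(t)}|$ table lookups, which stays small because balanced partitioning keeps border sets small.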

B. Build Textual and Numeric Component
Second, as shown in Fig. 4(b), the dynamic part of the index, i.e., a textual and numeric index on the objects, is constructed.

T-Ref Part:
It is impractical to compute edit distances directly with the Wagner–Fischer algorithm [32] during query processing. Thus, for each leaf subgraph $G_i$, we construct a T-ref part that indexes the edit distances of the objects within $G_i$: we select a set of reference keywords $R(G_i) = \{w_r^{G_i}\}$ and index the edit distances between the keywords contained in the objects within $G_i$ and $R(G_i)$.
To construct the T-ref part for $G_i$, we divide the keywords contained in all objects within $G_i$ into $N$ clusters and select a reference keyword $w_{r_n}^{G_i}$ for each cluster so as to minimize the expected edit distance within each cluster. To this end, the k-means clustering algorithm is adopted to obtain each cluster and its reference keyword. Each object $o_i$ within $G_i$ is then indexed in a B+-tree by the key $y(o_i^n)$, which is calculated from the edit distance between its keyword $w_i^j$ and the reference keyword $w_{r_n}^{G_i}$ of its cluster, together with a constant $C$ that equals the maximum edit distance between the reference keyword of the cluster and the keywords belonging to the cluster. To facilitate edit distance calculation during query processing, we also compute and keep a distance lower limit $DL(w_{r_n}^{G_i})$ and a distance upper limit $DU(w_{r_n}^{G_i})$ for each cluster.

Example 2: Fig. 4(c) gives the T-ref part for subgraph $G_{12}$, where the keywords of the objects within $G_{12}$ are partitioned into three clusters whose reference keywords are "Theater," "coffee," and "bread," respectively.
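A simplified Python sketch of building the T-ref part of one leaf subgraph follows. It assumes the reference keywords are already chosen (the clustering step itself is not shown; on strings, a medoid-style variant of k-means is typically needed), assigns each keyword to its nearest reference keyword, records $DL$ and $DU$ per cluster, and keeps a sorted list as a stand-in for the B+-tree. The key $n \cdot stride + d_{ed}$ is an assumed iDistance-style layout that gives each cluster a disjoint key range, not necessarily the article's formula for $y(o_i^n)$; all names are hypothetical.

```python
import bisect


def edit_distance(s, t):
    """Levenshtein distance (Wagner-Fischer, one row at a time)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def build_tref(keywords_by_object, references):
    """Return (sorted key entries, {cluster n: (DL, DU)}, stride) for one leaf."""
    assigned = []  # (cluster n, d_ed to its reference, object id, keyword)
    for obj_id, words in keywords_by_object.items():
        for w in words:
            dists = [edit_distance(w, r) for r in references]
            n = dists.index(min(dists))  # nearest reference keyword
            assigned.append((n, dists[n], obj_id, w))
    bounds = {}
    for n, d, _, _ in assigned:
        lo, hi = bounds.get(n, (d, d))
        bounds[n] = (min(lo, d), max(hi, d))
    stride = 1 + max(hi for _, hi in bounds.values())  # keeps key ranges disjoint
    entries = sorted((n * stride + d, obj_id, w) for n, d, obj_id, w in assigned)
    return entries, bounds, stride


def range_scan(entries, lo_key, hi_key):
    """Entries with lo_key <= key <= hi_key (the B+-tree range scan analogue)."""
    keys = [key for key, _, _ in entries]
    return entries[bisect.bisect_left(keys, lo_key):bisect.bisect_right(keys, hi_key)]
```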

A-Ref Part:
The A-ref part facilitates numeric distance calculation for the objects in subgraphs. For each numeric attribute $A_k$ ($1 \le k \le n$) of the system, we use $[k-1, k)$ as the key range of the objects having attribute $A_k$. To map the attribute values of objects into these ranges, each object $o_i$ within $G_i$ is indexed in a B+-tree by the key $y(o_i^k)$, which is calculated from its attribute value as
$$y(o_i^k) = \frac{v_i^k}{M_k} + (k - 1),$$
where $v_i^k$ is the value of attribute $A_k$ of $o_i$ and $M_k$ normalizes the values of $A_k$ into [0, 1).

Example 3: Fig. 4(c) also gives the A-ref part for subgraph $G_{12}$, where the numeric attributes of the objects within $G_{12}$ are mapped into three key ranges, [0, 1), [1, 2), and [2, 3), respectively. For example, the value of attribute $A_2$ of $o_6$ is 45 and $M_2 = 100$, so $y(o_6^2) = 45/100 + 2 - 1 = 1.45$.

Recall that the traffic network is partitioned into equal-sized subgraphs while minimizing the number of border vertices, and the index part of each subgraph is constructed accordingly. To allocate the workload among the fog devices in the second layer of FCV, the STAG-tree index is partitioned into parts, each corresponding to a subgraph. Each fog server stores the index parts of the subgraph in which it resides and of the surrounding subgraphs.
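The A-ref key of Example 3 can be reproduced with the short Python sketch below, which again uses a sorted list in place of the B+-tree. Apart from $o_6$'s value 45 for $A_2$ and $M_2 = 100$, the attribute values and $M_1 = 5$ used for checking are hypothetical, and the check that values lie in $[0, M_k)$ is an assumption needed to keep each attribute inside its own range $[k - 1, k)$. The names `aref_key`, `build_aref`, and `attribute_range` are hypothetical.

```python
import bisect


def aref_key(k: int, value: float, m_k: float) -> float:
    """y(o_i^k) = value / M_k + (k - 1), which lies in [k - 1, k)."""
    if not 0 <= value < m_k:
        raise ValueError("value must lie in [0, M_k)")
    return value / m_k + (k - 1)


def build_aref(objects, m):
    """objects: {obj_id: {k: value}}; m: {k: M_k}. Returns sorted (key, obj_id)."""
    return sorted((aref_key(k, v, m[k]), obj_id)
                  for obj_id, attrs in objects.items()
                  for k, v in attrs.items())


def attribute_range(aref, k, lo, hi, m):
    """Ids of objects whose value of A_k lies in [lo, hi]."""
    keys = [key for key, _ in aref]
    i = bisect.bisect_left(keys, aref_key(k, lo, m[k]))
    j = bisect.bisect_right(keys, aref_key(k, hi, m[k]))
    return [obj_id for _, obj_id in aref[i:j]]


print(aref_key(2, 45, 100))  # 1.45, as in Example 3
```

Because each attribute owns a disjoint key interval, a single sorted structure can answer range conditions on any attribute with two binary searches.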

V. PROCESSING A²SKIV QUERIES IN IOV
This section introduces the top-k A²SKIV query processing method based on the STAG-tree index.

A. Pruning Techniques
First, several lemmas are introduced to efficiently prune the unrelated traffic network space and unqualified spatial-textual objects in IoV.
Lemma 1: Given a top-k A 2 SKIV query q = (q·W, q·V, q· L, k) and a subgraph G i , G i can be ignored if where o k is the kth nearest neighbor of q. Proof: For any object o in G i , we have Thus, we have As a result, any object o in G i cannot be a top-k result object. Thus, G i can be ignored.
Lemma 2: Given a top-k A²SKIV query $q = (q \cdot W, q \cdot V, q \cdot L, k)$ and a subgraph $G_i$, if $\forall q_j \in q \cdot W$, $q_j.signature \cap G_i.signature = \emptyset$, then $G_i$ can be ignored.
Proof: If $\forall q_j \in q \cdot W$, $q_j.signature \cap G_i.signature = \emptyset$, then for every query keyword $q_j$, no object in $G_i$ is textually similar to $q_j$. Hence, $G_i$ can be ignored.
Lemma 3: Given a top-k A²SKIV query $q = (q \cdot W, q \cdot V, q \cdot L, k)$ and a subgraph $G_i$, if $\exists A_j \in q \cdot V$ such that $Max(A_j)$ (or $Min(A_j)$) for $G_i$ equals $+\infty$, then $G_i$ can be ignored.
Proof: For subgraph $G_i$, if $\exists A_j \in q \cdot V$ whose $Max(A_j)$ (or $Min(A_j)$) for $G_i$ equals $+\infty$, then $G_i$ does not include any object containing attribute $A_j$, which means that the numeric distance $d_j$ for any object $o$ in $G_i$ equals $+\infty$. Thus, $D_{nd}(q, o) = +\infty$, which in turn makes $D_{tns}(q, o) = +\infty$. Hence, no object $o$ in $G_i$ can be a top-k result object, and $G_i$ can be ignored.
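Lemmas 2 and 3 reduce to cheap set and bound checks per subgraph. The Python sketch below is illustrative only: since signatures are not defined in this section, a keyword signature is assumed to be its set of character bigrams and a subgraph signature the union over its keywords; missing $Max$/$Min$ entries are treated as $+\infty$. All names are hypothetical.

```python
import math


def signature(word: str) -> set:
    """Hypothetical keyword signature: the set of lowercase character bigrams."""
    w = word.lower()
    return {w[i:i + 2] for i in range(len(w) - 1)}


def subgraph_signature(keywords) -> set:
    """Hypothetical G_i.signature: union of its keywords' signatures."""
    return set().union(*(signature(w) for w in keywords))


def lemma2_prunes(query_keywords, g_signature) -> bool:
    """Lemma 2: prune G_i if every q_j's signature is disjoint from G_i's."""
    return all(signature(q).isdisjoint(g_signature) for q in query_keywords)


def lemma3_prunes(query_attrs, g_max, g_min) -> bool:
    """Lemma 3: prune G_i if Max(A_j) or Min(A_j) is +inf for some query attribute."""
    return any(math.isinf(g_max.get(a, math.inf)) or math.isinf(g_min.get(a, math.inf))
               for a in query_attrs)
```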
Lower Bound Distance Computation: The pruning power of the above three lemmas is relatively limited. In order to further reduce unrelated subgraphs, for any subgraph (3). c) To calculate $D^{LB}_{tr}(q, G_i)$, we use the shortest network distance between $q$ and the border vertices of $G_i$ if $q$ does not belong to $G_i$; otherwise, a) The calculation of $D^{LB}_{nd}(q, G_i)$ and $D^{LB}_{tr}(q, G_i)$ for a leaf subgraph is the same as that for a nonleaf subgraph. b) The TA-ref index of leaf subgraph $G_i$ is used to calculate $D^{LB}_{tns}(q, G_i)$; the key step is calculating $d^{LB}_{ed}(q_i, w_i)$ for each query keyword $q_i \in q \cdot W$ and its best-matching keyword $w_i$ among the objects in $G_i$. We detail how to determine $w_i$ and its corresponding object $o_i$ as follows.

Calculating $d^{LB}_{ed}(q_i, w_i)$ for $q_i$: Since edit distance satisfies the triangle inequality, we use the edit distances between $q_i$ and the reference keywords of the T-ref part in the TA-ref index of $G_i$.
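First, the edit distance between $q_i$ and each reference keyword $w_{r_j}^{G_i}$ of $G_i$ is computed. If $DL(w_{r_j}^{G_i}) \le d_{ed}(q_i, w_{r_j}^{G_i}) \le DU(w_{r_j}^{G_i})$ holds for some reference keyword, let $d^{LB}_{ed}(q_i, w_i) = 0$, and the processing for $q_i$ completes.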
Otherwise, we choose $w_r = \arg\min_{0 \le j \le n} \{d_{ed}(q_i, w_{r_j}^{G_i})\}$ and its two bounding values $DL(w_r)$ and $DU(w_r)$, and let
$$d^{LB}_{ed}(q_i, w_i) = \max\{d_{ed}(q_i, w_r) - DU(w_r),\ DL(w_r) - d_{ed}(q_i, w_r)\}.$$
Then, using all the $d^{LB}_{ed}(q_i, w_i)$, $D^{LB}_{td}(q, G_i)$ can be obtained through (1).
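The triangle-inequality bound can be sketched in Python as follows. For a cluster whose keywords lie at edit distance between $DL$ and $DU$ from the reference keyword, the bound is 0 when $d_{ed}(q_i, w_r)$ falls inside $[DL, DU]$ and is the distance to that interval otherwise. As a conservative design choice, the hypothetical `subgraph_lower_bound` takes the minimum over all clusters of $G_i$ rather than only the cluster of the nearest reference keyword, which keeps it a valid bound for every keyword in $G_i$.

```python
def edit_distance(s, t):
    """Levenshtein distance (Wagner-Fischer, one row at a time)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


def cluster_lower_bound(d_q_ref: int, dl: int, du: int) -> int:
    """Lower bound on d_ed(q_i, w) for all w with DL <= d_ed(w, w_r) <= DU."""
    if d_q_ref > du:
        return d_q_ref - du  # d(q, w) >= d(q, w_r) - d(w, w_r) >= d(q, w_r) - DU
    if d_q_ref < dl:
        return dl - d_q_ref  # d(q, w) >= d(w, w_r) - d(q, w_r) >= DL - d(q, w_r)
    return 0


def subgraph_lower_bound(q_i, references, bounds):
    """Minimum of the per-cluster bounds; bounds maps cluster n to (DL, DU)."""
    return min(cluster_lower_bound(edit_distance(q_i, r), *bounds[n])
               for n, r in enumerate(references))
```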
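Lemma 4: Given a top-k A²SKIV query $q = (q \cdot W, q \cdot V, q \cdot L, k)$ and a subgraph $G_i$, if $D^{LB}_{tns}(q, G_i) > D_{tns}(q, o_k)$, where $o_k$ has the same meaning as in Lemma 1 and $D^{LB}_{tns}(q, G_i)$ is the lower bound of the textual-numeric-spatial distance between query $q$ and any object $o$ in $G_i$, then $G_i$ can be ignored.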
Proof: Since $D^{LB}_{tns}(q, G_i) > D_{tns}(q, o_k)$, for any object $o \in G_i$ we have $D_{tns}(q, o) \ge D^{LB}_{tns}(q, G_i) > D_{tns}(q, o_k)$. Hence, at least k objects are closer to $q$ than $o$ in textual-numeric-spatial distance, so $o$ cannot be a top-k result object, and $G_i$ can be ignored.

B. A²SKIV Query Processing Algorithm
We now present the A²SKIV query processing algorithm based on the STAG-tree index, called A²S²KG. It takes as input an STAG-tree $ST$ and an A²SKIV query $q = (q \cdot W, q \cdot V, q \cdot L, k)$ and outputs the result object set $S_{result}$. A²S²KG progressively accesses the nearest subgraphs and retrieves the most relevant objects. Finally, the k objects with the smallest textual-numeric-spatial distances $D_{tns}(q, o)$ form the query result set.
The detailed steps of the A²S²KG algorithm are shown in Algorithm 1. First, a min-heap HG is initialized to empty to organize the nodes (subgraphs) or objects to be visited. Moreover, a set $S_{result}$ keeps the result objects for query q, and a float $D_{tsk}$, initialized to $+\infty$, keeps the textual-numeric-spatial distance between query q and the current kth nearest neighbor. In particular, HG is an ordered structure, and $D^{LB}_{tns}(q, P_{node})$ is the key of a node (subgraph) $P_{node}$ in HG.
A²S²KG first locates the leaf node (subgraph) leaf(q) in which q lies. For each object o in leaf(q), it inserts o together with $D_{tns}(q, o)$ into heap HG and updates $D_{tsk}$ accordingly if $D_{tns}(q, o)$ is no larger than $D_{tsk}$ (lines 4-6). Then, it uses pointer $P_{node}$ to keep the uppermost visited node (subgraph) of ST, and variable $P_{LB}$ to keep the lower bound of the textual-numeric-spatial distance between query q and $P_{node}$, i.e., $D^{LB}_{tns}(q, P_{node})$. $P_{node}$ is set to leaf(q) and $P_{LB}$ to $D^{LB}_{tns}(q, P_{node})$ (line 7), and ST is then visited in a bottom-up manner (lines 8-23). If HG is empty, the Adjust function is called to move $P_{node}$ to its parent node and update $P_{LB}$ accordingly (line 10). The Adjust function also processes each unvisited child node of the new $P_{node}$.
Next, a tuple (c, dis) is popped from HG. Note that (c, dis) is the head element of HG, which is ordered by the (lower bound of the) textual-numeric-spatial distances of its elements from query q. If dis, the (lower bound of the) distance of head element c from query q, is larger than $P_{LB}$, the query answer may exist in the parent node of $P_{node}$, so the Adjust function is called to move $P_{node}$ to its …

The detailed steps of the Adjust function are shown in Algorithm 2. It first moves $P_{node}$ to its parent node (line 2). Then, for each unvisited child node s of $P_{node}$, the Gjudge function (Algorithm 3) is called to check whether s may contain result objects; if so, the lower bound of the textual-numeric-spatial distance between s and query q, i.e., $D^{LB}_{tns}(q, s)$, is calculated. Finally, $P_{LB}$, which keeps the minimum value of $D^{LB}_{tns}(q, s)$ over all child nodes of $P_{node}$, is returned.

The pseudocode of the Gjudge function is shown in Algorithm 3. For node s, Gjudge applies Lemmas 1-3 in turn to check whether s is a qualified subgraph; if it is not, s is safely pruned and 1, the upper limit of $D_{tns}(q, s)$, is returned. If s is not pruned, we calculate $D^{LB}_{tns}(q, s)$. If $D^{LB}_{tns}(q, s) \le D_{tns}(q, o_k)$, s is pushed together with $D^{LB}_{tns}(q, s)$ into HG ($D_{tsk}$ is updated accordingly) and $D^{LB}_{tns}(q, s)$ is returned; otherwise, s is pruned by Lemma 4 and 1 is returned, since s cannot contain any result object when $D^{LB}_{tns}(q, s) > D_{tns}(q, o_k)$. The Ojudge function proceeds similarly to Gjudge, so we omit its discussion owing to space limitations.

Time Complexity Analysis: Finally, we discuss the time complexity of the A²S²KG algorithm.
Given an object o and a top-k A²SKIV query $q = (q \cdot W, q \cdot V, q \cdot L, k)$, o is a candidate result object for q if: 1) $o \cdot V$ contains all the numeric attributes of $q \cdot V$, i.e., $\forall\, q \cdot V \cdot A_j$, $\exists\, o \cdot V \cdot A_i$ ($1 \le i \le m$) such that $o \cdot V \cdot A_i = q \cdot V \cdot A_j$, whose probability is denoted by $Pr_{AM}(o)$; and 2) $\exists\, q_j \in q \cdot W$ such that $q_j.signature \cap G_i.signature \neq \emptyset$, whose probability can be … . Here, $C^i_j$ is the number of combinations of taking i elements from a set of j.
For object o and query q, the exact value of $Pr_{KM}(o)$ is difficult to calculate, so we approximate it by the probability that o.tags contains at least one keyword in $q \cdot W$. Assume m and n are the numbers of keywords of o and q, respectively, and $m \ge n$. Thus, we have …

Since we use the G-tree to compute shortest-path distances, the time cost of computing $D_{tr}(q, o)$ is $O(\tau \log \tau + \log_f(|V|/\tau) \cdot \log_2^2 f \cdot |V|)$ [37]. To sum up, the time complexity of the A²S²KG algorithm is $O([k / Pr_{cand}(o)] \cdot (W_{dis}^2 + V_{dis}^2) + \tau \log \tau + \log_f(|V|/\tau) \cdot \log_2^2 f \cdot |V|)$.
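The core ideas of Algorithms 1-3 (a min-heap keyed by lower bounds, a running kth distance $D_{tsk}$, and Lemma 4-style pruning) follow the familiar best-first branch-and-bound pattern. The sketch below is a simplified top-down version over a generic tree, not the bottom-up G-tree traversal with $P_{node}$ and Adjust described above. `Node`, `best_first_topk`, and the `lower_bound`/`distance` callables (stand-ins for $D^{LB}_{tns}(q, \cdot)$ and $D_{tns}(q, \cdot)$) are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

@dataclass
class Node:
    """Hypothetical subgraph node: inner nodes have children, leaves hold objects."""
    name: str
    children: List["Node"] = field(default_factory=list)
    objects: List[str] = field(default_factory=list)

def best_first_topk(root: Node, k: int,
                    lower_bound: Callable[[Node], float],
                    distance: Callable[[str], float]) -> List[Tuple[float, str]]:
    """Return the k objects with the smallest distance, assuming lower_bound(n)
    never exceeds distance(o) for any object o below n (admissible bound)."""
    tie = itertools.count()                      # tie-breaker: payloads never compared
    hg = [(lower_bound(root), next(tie), root)]  # min-heap playing the role of HG
    best: List[float] = []                       # negated k smallest distances seen

    def d_tsk() -> float:
        # Current kth smallest known distance (+inf until k objects are known).
        return -best[0] if len(best) == k else float("inf")

    result: List[Tuple[float, str]] = []
    while hg and len(result) < k:
        key, _, item = heapq.heappop(hg)
        if isinstance(item, str):                # an object, keyed by its exact distance
            result.append((key, item))
            continue
        for obj in item.objects:
            d = distance(obj)
            if d <= d_tsk():
                heapq.heappush(hg, (d, next(tie), obj))
                heapq.heappush(best, -d)
                if len(best) > k:
                    heapq.heappop(best)          # drop the largest of k + 1 distances
        for child in item.children:
            lb = lower_bound(child)
            if lb <= d_tsk():                    # otherwise pruned, as in Lemma 4
                heapq.heappush(hg, (lb, next(tie), child))
    return result
```

Because every bound is assumed not to exceed the distances of the objects below it, objects leave the heap in nondecreasing distance order, so the first k objects popped form the answer; a subgraph is skipped once its bound exceeds the kth smallest distance seen so far.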

VI. SUPPORT ATTRIBUTES WITH INTERVAL VALUES
In real applications, the attribute value of an object is not necessarily a single value; it is often an interval of values (e.g., business hours). In this section, we extend the STAG-tree index to handle attributes whose values are intervals.

A. Modification of Numeric Distance Calculation
First, we modify the calculation of the numeric distance between query $q = (q \cdot W, q \cdot V, q \cdot L, k)$ and object $o = (o.tags, o \cdot V, o \cdot L)$ for objects whose attributes have interval values (IV for short). Recall that the numeric attribute distance refers to the degree of difference between the values of query q and object o for the same numeric attribute.
For query q and object o, the numeric distance $d_k$ between q and o for an interval-valued attribute $A_k$ can be expressed as follows: … Then, using (3), we normalize each noninfinite numeric attribute distance to the range [0, 1] and combine all the noninfinite numeric attribute distances to calculate the total numeric distance between q and o. Note that if any query attribute does not exist in $o \cdot V$, the numeric distance between q and o, i.e., $D_{nd}(q, o)$, equals $+\infty$.
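Since the per-attribute formula for interval values is elided above, the sketch below uses an assumed definition to show the overall flow: the distance between two intervals is the gap between them (0 when they overlap), it is divided by a per-attribute value span as a stand-in for the normalization in (3), and the total numeric distance is taken as the mean of the per-attribute values. Only the $+\infty$ rule for a missing query attribute comes from the text; `interval_gap`, `numeric_distance`, and `spans` are hypothetical.

```python
import math
from typing import Dict, Tuple

Interval = Tuple[float, float]  # (low, high); a point value v is written (v, v)

def interval_gap(x: Interval, y: Interval) -> float:
    """Assumed d_k: 0 if the intervals overlap, otherwise the gap between them."""
    return max(0.0, max(x[0], y[0]) - min(x[1], y[1]))

def numeric_distance(query: Dict[str, Interval], obj: Dict[str, Interval],
                     spans: Dict[str, float]) -> float:
    """Hypothetical D_nd(q, o): mean of per-attribute gaps normalized to [0, 1]."""
    if any(attr not in obj for attr in query):
        return math.inf  # a query attribute is missing from o.V
    total = 0.0
    for attr, q_iv in query.items():
        gap = interval_gap(q_iv, obj[attr])
        total += min(gap / spans[attr], 1.0)  # normalize and clamp to [0, 1]
    return total / len(query)
```

For example, a query price range of [10, 20] against an object priced [25, 30] with a span of 100 contributes 0.05, while overlapping business hours contribute 0.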

B. Index Modification
Next, we discuss extending the STAG-tree index to support queries and objects with interval-valued attributes. In particular, the A-ref part needs to be modified to accommodate the interval values of numeric attributes of objects and queries.
In Section IV-A, for each numeric attribute $A_k$ ($1 \le k \le n$), we use $[k-1, k)$ to represent the attribute-value range of $A_k$. Moreover, we map $o_i.V_k$, the value of attribute $A_k$ for object $o_i$ within $G_i$, to the value range of $A_k$ by the key …

Example 4: As shown in Fig. 6, we add a fourth numeric attribute, $A_4$ (business hours), for the objects in the system, and the value range of $A_4$ is [3, 4). Assume that the business hours for $o_6$ …
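The key that maps $o_i.V_k$ into $[k-1, k)$ is elided above. One natural choice, assumed here and not taken from Section IV-A, is min-max scaling of the value into attribute $A_k$'s unit slot; for an interval value, both endpoints are mapped, so the object occupies a sub-range of the slot (e.g., business hours under $A_4$ land in [3, 4)). The bounds `v_min`/`v_max` and the names `attribute_key` and `interval_key` are hypothetical.

```python
from typing import Tuple

def attribute_key(k: int, v: float, v_min: float, v_max: float) -> float:
    """Assumed key for value v of attribute A_k (1-based): maps [v_min, v_max]
    onto [k - 1, k); the maximum value is nudged just below k."""
    if not v_min <= v <= v_max or v_min >= v_max:
        raise ValueError("need v_min <= v <= v_max and v_min < v_max")
    frac = (v - v_min) / (v_max - v_min)
    return (k - 1) + min(frac, 1.0 - 1e-9)  # keep the key inside [k - 1, k)

def interval_key(k: int, iv: Tuple[float, float],
                 v_min: float, v_max: float) -> Tuple[float, float]:
    """Map both endpoints of an interval value into A_k's slot."""
    return (attribute_key(k, iv[0], v_min, v_max),
            attribute_key(k, iv[1], v_min, v_max))
```

Under these assumptions, business hours of 9:00-18:00 on a 0-24 scale map to the sub-range (3.375, 3.75) of [3, 4).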

VII. PERFORMANCE EVALUATION
A. Experimental Settings
1) Data Sets: We use two data sets, Florida (FL for short) and California (CAL for short), to test the performance of the proposed methods. FL and CAL consist of the traffic networks, users, and points of interest (POIs) of Florida and California, respectively. The object information for CAL comes from the Geographic Names Information System in the United States (geonames.usgs.gov/domestic). Each object includes an object ID, a textual description, and a location within the road traffic network. For the FL data set, we use objects extracted from Twitter (www.twitter.com); each object includes an object ID, a Twitter message, a publication time, and a location in the Florida transportation network. Detailed information on FL and CAL is shown in Table II.
2) Queries: To evaluate the performance of A²SKIV query processing, we generate a set of queries, including locations, keywords, and attribute-value pairs. The keywords and attributes of A²SKIV queries are also obtained from Twitter. In addition, attribute values are randomly selected from the range 1 to 1000. The number of query keywords ranges from 1 to 5 and the number of query attributes from 1 to 4, each with a default value of 2.
3) Algorithms: Our STAG-tree-based method (STAG for short) is compared with two baseline methods, DBM [26] and ILM [29], in terms of memory consumption and processing time. Specifically, DBM is based on Dijkstra's algorithm. Starting from query q, DBM performs network expansion for candidate objects and calculates the textual-numeric-spatial distance $D_{tns}(q, o)$ of each encountered object o from query q. To accelerate the calculation of long road network distances, a multihop distance labeling scheme is adopted. ILM is an inverted-list-based scheme. For each keyword w, let $S_w$ be the set of n-grams [32] contained in w. Thus, for object o, we have $S_o = \bigcup_{w \in o.tags} S_w$. For each n-gram $\zeta$, a list $l_\zeta$ containing the IDs of objects whose n-grams include $\zeta$ can be obtained. For each query keyword $q_i \in q \cdot W$, $S_{q_i}$ is computed, and then, using the heap algorithm [38], the object lists $l_\zeta$ (for each $\zeta \in S_{q_i}$) are merged to obtain a new list $l_{q_i}$ of objects for $q_i$, sorted in descending order of $|S_{q_i} \cap S_o|$. Thus, objects sharing no common n-gram with q can be safely pruned. Similarly, for each query attribute $A_i$, there is a list $l_{A_i}$ containing the IDs of objects that have attribute $A_i$, so objects that do not contain all the attributes $A_i \in q \cdot V$ are ignored.
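The n-gram filtering used by the ILM baseline can be sketched as follows. This is an illustration of the idea described above, not ILM's published implementation: it assumes bigrams (n = 2) without padding, uses Python's `heapq.merge` as the heap-based merge of the sorted lists $l_\zeta$, and counts $|S_{q_i} \cap S_o|$ from runs of equal IDs. The function names are hypothetical.

```python
import heapq
from collections import defaultdict
from itertools import groupby
from typing import Dict, List, Set

def ngrams(word: str, n: int = 2) -> Set[str]:
    """S_w: the set of n-grams of a keyword (no padding, an assumption)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)} or {word}

def build_lists(objects: Dict[int, List[str]], n: int = 2) -> Dict[str, List[int]]:
    """l_zeta: sorted IDs of objects whose n-gram set S_o contains zeta."""
    lists: Dict[str, Set[int]] = defaultdict(set)
    for oid, tags in objects.items():
        for tag in tags:
            for g in ngrams(tag, n):
                lists[g].add(oid)
    return {g: sorted(ids) for g, ids in lists.items()}

def candidates(q_kw: str, lists: Dict[str, List[int]], n: int = 2) -> List[int]:
    """l_{q_i}: objects sharing at least one n-gram with q_kw, by descending overlap."""
    merged = heapq.merge(*(lists.get(g, []) for g in ngrams(q_kw, n)))
    # Merged IDs are sorted, so an object's shared n-grams form one run.
    overlap = {oid: len(list(run)) for oid, run in groupby(merged)}
    return sorted(overlap, key=lambda oid: (-overlap[oid], oid))

def filter_by_attributes(ids: List[int], attr_lists: Dict[str, List[int]],
                         q_attrs: List[str]) -> List[int]:
    """Drop objects missing any query attribute (intersection of the l_{A_i})."""
    keep = set.intersection(*(set(attr_lists.get(a, ())) for a in q_attrs))
    return [oid for oid in ids if oid in keep]
```

For example, with objects tagged "coffee", "toffee", and "tea", the misspelled query keyword "coffe" shares four bigrams with "coffee" and three with "toffee", so those two objects are returned in that order and "tea" is pruned.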

B. Efficiency Measurement
This section evaluates the performance of the three methods by varying the object cardinality, the number of query results (k), the number of query keywords, the number of query attributes, and the values of the preference parameters (α, β, and γ). The memory space for query processing is also studied. The main parameters and their values are shown in Table III.

1) Memory Consumption: The memory consumption of the three methods is shown in Fig. 7; it increases with the number of objects, since more objects take up more storage space. Generally speaking, STAG and ILM consume more memory than DBM on both the FL and CAL data sets. ILM builds an inverted list for each keyword and attribute, so multiple copies of object IDs are stored across the inverted lists. STAG builds the T-ref and A-ref parts to keep the textual and numeric information of each object.
2) Effect of |D|: The running time of the methods with respect to the number of objects in the system is shown in Fig. 8. STAG outperforms its competitors: on average, the STAG-based approach is about 1.87× (17.1×, resp.) faster in processing time than ILM (DBM, resp.). This is because STAG can prune large numbers of unpromising objects based on network distance, textual similarity, and attribute similarity simultaneously. Fig. 8 also shows that the running time of all three methods increases with the object cardinality, which is natural since more related objects need to be considered when there are more objects in IoV. Moreover, STAG and ILM scale much better than DBM on both FL and CAL, because DBM checks objects in the order in which they are encountered, whereas STAG and ILM organize objects according to keywords and attributes. Therefore, objects that do not contain all query attributes (or do not have any keyword similar to any query keyword) can be pruned safely, which makes both approaches more scalable than DBM.

3) Effect of k:
We evaluate the effect of k (the number of results wanted) on the running time of STAG, ILM, and DBM. Fig. 9 shows that STAG significantly outperforms ILM and DBM, since it uses the STAG-tree index to prune large parts of unqualified objects. DBM performs the worst because it examines objects in the order in which they are encountered and computes their textual-numeric-spatial distance values. As k varies, all methods incur higher cost with larger k, because more related objects need to be examined; for STAG, however, the increase is not obvious owing to its effective pruning scheme.

4) Effect of |q·W|:
We also evaluate query performance as the number of query keywords, |q · W|, varies. Fig. 10 shows that the running time of all methods increases with |q · W|. For STAG and ILM, the reason is that an object with any keyword similar to any query keyword may be one of the query results, so more qualified objects need to be considered with larger |q · W|. The processing time of DBM increases slightly with larger |q · W|, because more computation is required to calculate the textual-numeric-spatial distance values of objects. Not surprisingly, STAG achieves the best performance of the three methods. For example, STAG requires only 36.4% (6.3%, resp.) of the processing time of ILM (DBM, resp.) when |q · W| equals 3 on the FL data set.

5) Effect of |q·V|: We now evaluate the impact of the number of query attributes, |q · V|, on the performance of the three schemes. Fig. 11 shows that all methods incur less processing time with more query attributes. The reason is twofold. On the one hand, a candidate object must contain all query attributes, so the more query attributes there are, the fewer eligible objects remain. On the other hand, calculating numeric distance values for candidate objects becomes slightly more costly with more query attributes. Overall, the former outweighs the latter, so the total processing cost decreases as the number of query attributes increases. Notably, the decrease is more pronounced for STAG than for its competitors owing to its strong pruning ability, i.e., most cells (subgraphs) in the STAG-tree can be ignored as more attributes are queried.
6) Effect of α, β, and γ: Parameters α and β control the importance of textual and numeric similarity between queries and objects, respectively. When α or β is varied separately, it has no consistent effect on query processing performance. Therefore, we do not report efficiency results for varying α and β, and only show the impact of γ on query efficiency. Note that varying γ means varying the sum of α and β. Fig. 12 gives the query performance of the three methods for different γ values. Again, STAG significantly outperforms its competitors: on average, it incurs only 38.0% (6.5%, resp.) of the query time of ILM (DBM, resp.) on the CAL data set. As γ varies, all three methods incur lower cost with larger γ, since a larger γ makes spatial proximity between the query and objects more important; thus, candidate objects tend to lie within a more concentrated range, and fewer relevant objects need to be considered.

VIII. CONCLUSION
This article formulated and solved the fog-computing-based A²SKIV query in IoV. A fog-based network structure, FCV, is adopted to improve query processing efficiency and reduce query response time. To handle A²SKIV queries, a two-level hybrid index, STAG-tree, is proposed: its first level is a G-tree, which accelerates the calculation of network distances between objects and the query, and its second level is a textual and numeric component, which efficiently organizes the information of objects within the subgraphs of the traffic network in IoV. In addition, several lemmas are presented to prune large numbers of unqualified spatial-textual objects, and an efficient top-k A²SKIV query processing algorithm is presented. The effectiveness of the proposed index and query processing algorithm is verified by extensive experiments using real and composite data sets. The results also show that the proposed scheme is effective in applications such as mobile search and targeted location-aware advertising in IoV.