
# Probabilistic data structures in smart city: Survey, applications, challenges, and research directions

#### Abstract

With the arrival of technologies such as IoT and the Cloud, the number of data sources has grown exponentially. The use and processing of this data has motivated and given rise to many other domains. The concept of a smart city has likewise evolved from using this data for decision-making in various aspects of daily life and for improving traditional systems. In smart cities, various technologies work collaboratively, including devices used for data collection, processing, storage, retrieval, analysis, and decision-making. Big data storage, retrieval, and analysis play a vital role in smart city applications. Traditional data processing approaches face many challenges when dealing with such voluminous, high-speed data, including semi-structured or unstructured data, data privacy, security, and real-time response requirements. Probabilistic Data Structures (PDS) have evolved as a potential solution for many smart city applications that must handle big data with real-time response. PDS have been used in many smart city domains, including healthcare, transportation, the environment, energy, and industry. The goal of this paper is to provide a comprehensive review of PDS and their applications in smart city domains. The prominent domains of the smart city are explored in detail: origin, current research status, challenges, and existing applications of PDS, along with research gaps and future directions. The foremost aim is to provide a detailed survey of PDS in smart cities, together with the research opportunities in these domains, for readers and researchers who want to explore this field.

## 1.Introduction

The early 1970s were a period of acceleration in computing, and “data” was the new term. Relational databases evolved through the 1980s and 1990s. The internet and IoT are now sources of unstructured, semi-structured, and structured data. In the 1990s, most cities had slow internet access or none at all. These were ordinary cities, generally defined as human settlements that make no use of the latest technology. The processing, storage, and analysis that this data requires go beyond existing human and technical infrastructure, and a huge volume and variety of unstructured data and untapped information is spread across networks. The core of Apache Hadoop is the Hadoop Distributed File System (HDFS), which provides storage, and the MapReduce programming model, which provides processing [4]. As data grows day by day, governments and the public sector require technologies and expertise to access it efficiently and in a well-structured manner, and so to convert urban cities into smart cities. Research interest in smart cities has increased continuously, and it is economically justified by the progress of state-of-the-art technologies [21]. Inventions like the internet, the cloud, and IoT have increased the speed of data generation and introduced variation in the data. With the evolution of the internet and the use of new technologies, the smart city concept affects towns and cities around the world. At the same time, rapid population growth is creating challenges for governments and their services, and smart cities are the most widely accepted way to improve metropolitan living conditions in such situations [146]. The emergence of the smart city has the potential to manage all these issues (Fig. 1), but rapid urbanization remains a major challenge worldwide. Because movement from rural to metropolitan areas is unavoidable, cities continue to face challenges in transport, health, air quality, agriculture, and many other areas [87]. Rapid population growth in metropolitan cities brings issues such as pollution, pressure on health services, overburdened public and private sectors, and traffic [126]. These problems require ingenious solutions that draw on human effort, ingenuity, and collaboration among different stakeholders [117].

##### Fig. 1.

Smart city.

In a recent survey, the United Nations estimated that the world’s population will grow by 32% from 2015 to 2050 and that the metropolitan population will grow by 63%. Several researchers also stated that by 2030 more than 60% of the population will be living in cities, with the southeast United States experiencing the greatest growth [74,275]. The extensive and rapid growth of Information and Communication Technology (ICT) has provided possible solutions and mechanisms for various problems in metropolitan areas [13], and it can be used as a tool [267] to increase the effectiveness of the city system. In this regard, human thinking also needs to change towards a smart mindset in order to raise the standard of living sustainably in the smart city. Furthermore, remote monitoring and management systems are being used to improve energy efficiency in smart cities [316].

Since the advent of the coronavirus and its declaration as a worldwide pandemic, all public and private sectors have been affected. The virus has disrupted daily routines and activities such as education, communication, daily movement, and labor. Some public and private organizations have worked with 50% of their workforce on alternate days. Many sectors, such as IT companies and schools, have adopted Work From Home (WFH). Even doctors have run their outpatient departments (OPD) and diagnosed patients online. Many of the problems that this pandemic creates for daily life can be addressed through the smart city. Jasim et al. [129] made a wise decision to help society and expand the resources available to individuals in areas such as healthcare, communication, transportation, education, and many others.

The Government of India (GoI) has a long-term vision for smart cities. GoI defines smart cities as a mission to conduct monetary growth and improve the quality of life by allowing the development of regional zones and utilizing technology that directs to smart results. They also designed a workbook “Making a City Smart: Learnings from the Smart Cities Mission” with activities in each section to help cities plan their smart city journeys. Creating a smart city incorporates lessons from the smart city mission to clarify what, why, and how a smart city works. Consolidated at the national level, 100 smart cities have proposed to undertake 5,151 projects worth Rs. 2,05,018 crores in 5 years from their selected dates. New financial innovations are built on investment plans. The distribution is estimated to be from a variety of sources as follows: 6480 projects worth Rs. 185,905 Cr are tendered, 5845 projects worth Rs. 157,369 are in work order stages and 3145 projects worth Rs. 53,256 Cr are completed. The cities like Gwalior, Thiruvananthapuram, Satna, Udaipur, New Town Kolkata, Bhubaneshwar, and Ahmedabad are covered under various smart city projects [183].

### 1.1.Role of big data in smart city

Big Data is commonly characterized by velocity, volume, and variety: the speed at which data is generated, the amount of data generated, and the structure of the data. Big data has a number of definitions and “V” principles that make big data processing clear and accurate [118]. Storing, processing, and analyzing big data requires solutions that go beyond human effort and traditional infrastructure and techniques. With the advent of new technologies such as IoT, ICT, and sensors, big data systems are becoming more efficient within sophisticated data infrastructures [170].

##### Fig. 2.

Influence of big data in smart cities.

Big data has a strong influence on smart cities, particularly in recognizing patterns in, analyzing, and processing data collected from various IoT devices (Fig. 2). Every city is evolving towards being a smart city. These cities combine basic needs with high technology for a carefree and simple lifestyle. Big data analysis is the key to extracting important information from such huge data sets [7]. The data collected from the sensors, devices, and gateways installed across a smart city is used to make better decisions. This growth in data requires efficient storage and handling, which is a big challenge for both academia and industry [96]. Big data has different effects in different parts of the city.

The protection of the general public: The security and privacy of citizens are the main concerns in smart cities. To protect citizens from mishaps within the city, various analyses of geographical data can be carried out. The underlying data can be recorded through Closed-Circuit Televisions (CCTVs) and sensors installed on street lights or traffic lights. This generates an enormous amount of data, and storage and processing capacity must expand significantly if that data is to make the city a considerably safer place.

Urbanism: Cities are investing heavily in becoming smart cities. The effective use of data can reveal the current needs of the city and help identify areas that require development and improvement. As a result, cities can direct investment to the areas where it is needed.

Transportation: Both public and private transportation have been affected by COVID-19. Most people now rely on their own means of travel to get to work and elsewhere. Congestion on the roads is increasing, so the risk of accidents and other incidents has also increased. To manage road traffic, the system requires large volumes of well-structured data. Large data sets will also help reduce risks.

Sustainable Growth: This is also the main part of a smart city. For sustainable growth, a huge amount of past and present data is required. Likewise, storing this data is one of the challenges for the smart city. This data is updated on a daily basis. Data is a special factor in determining the impact of improvement in the city.

Infrastructure: To maintain sustainability, smart city infrastructure needs to improve consistently. This requires data that is collected and used well in order to improve or maintain the infrastructure.

So, the role of big data in smart cities is significant for efficient and quick results. Big data is the brain of the smart city. The main challenge is the lack of awareness of how to use this data to create smart solutions that fulfill the requirements of citizens, who are the main stakeholders of smart cities. To improve the standard of living, developers therefore need new data processing tools and techniques. Deterministic data structures incur high computational, time, and space costs on data of this scale. Probabilistic Data Structures are one of the key solutions for smart storage and searching of data in real-time scenarios.

Traditional methods can produce accurate results, but they do so at a cost in space and time that is not acceptable for massive amounts of data. Because of this huge and unbounded growth, traditional data structures are being replaced by PDS for storage and retrieval. A PDS gives an approximate answer, which may or may not be the exact answer but is close to it. Because PDSs have probabilistic components, they are efficient at reducing the time and space costs. PDS also play a key role in big data processing, in storing unstructured data, in fast retrieval, and in making approximate predictions. These data structures are built on hash functions [246]. PDS are used for membership queries, frequency estimation, similarity search, and cardinality estimation. Low memory requirements and good processing speed are two distinctive features of PDS [136].

This paper’s extensive review identifies the key challenges faced when handling big data in smart cities. PDS and their variants provide key solutions to these challenges and also open new possibilities. The paper discusses the importance of big data in smart cities, the generations (Section 2), architecture (Section 4), and detailed applications (Section 7) of smart cities, smart city projects running worldwide (Section 5), and commonly used PDS (Section 6). The survey found that big data has a strong influence on smart cities. PDS have evolved as a potential solution for many smart city applications that must handle big data with real-time response, and they have been used in many smart city domains, including healthcare, transportation, the environment, energy, and industry. This paper thoroughly investigates the prominent domains of smart cities, including the origin, current research status, and existing applications of PDS, as well as research gaps and challenges (Section 10).

## 2.Generations of smart cities

To improve the standard of living, there is a shift from rural to urban settlement. Governments and citizens are paying increasing attention to technologies and new startups in smart cities. The concept of a smart city for urban transformation has changed radically over the years. The generations of smart cities are summarized in Table 1. Based on analysis and study, several researchers have divided the technical advancement of smart cities into generations 1.0–5.0 (Fig. 3) [140,265]:

##### Fig. 3.

Generation of smart city [140].

##### Table 1

Generation of smart city

| Parameters | Generation 1.0 | Generation 2.0 | Generation 3.0 | Generation 4.0 | Generation 5.0 |
| --- | --- | --- | --- | --- | --- |
| Year | 1974–2000 | 2000–2010 | 2011–2018 | 2018–2020 | 2021–future |
| Main Objective | Improving efficiency of city administration | To address certain issues such as pollution growth, healthcare, and transportation | Public health and safety, practical intelligence, and data analysis | Aggregate procedure and the challenge of incorporating resolutions | Human interaction with the AI system |
| Focus on | Technological pressures and the influential role of big corporations, like CISCO and IBM | New technologies, exploring various options for enriching the grade of life in cities | Urban development, participate in the modern city building program | Understand the opportunities and boundaries of latest technologies and value the influence that smart city technologies | Evaluate all factors of life and the inconsistent claims of further metropolis stakeholders |
| Data/Information Sharing | Within an area | Between two machines and limited | Machine to Machines with high range | Data Sharing on Cloud | Human Interaction with AI |
| Key Technologies | Technology Driven, Urban Big Data | Technology Enabled, Sensors, Networks | Citizen Co-creation, Digital Technology, IoT, 5G | Cloud Computing (CC), Sidewalk Labs (Toronto and Google) | AI, Robotics, IoT, 6G |
| Limitations | Data Exchange | Lack of technologies usages | Privacy and Security | Data management | New and untested technologies |

Generation 1.0: – The first, technology-driven generation of smart cities emerged when technology vendors began implementing their own solutions in cities. Because of the influential role of big corporations like IBM and Cisco, this generation is criticized for being driven by technology concerns. Technology developers were encouraged to deploy their own solutions in cities to improve the efficiency of public and private services.

Generation 2.0: – Issues that concern citizens indirectly, such as healthcare, transport, air quality, and water quality, came to the fore, and technology and tools were developed to address them. Citizen participation in city decision-making was negligible [276]. Quality of life and local administration were measured with modern technology, and cities presented agendas and schemes that supported the adoption of modern technology.

Generation 3.0: – The modern city represents a whole connected ecosystem that integrates ICT into the smart city. Modern city-building programs were organized with the participation of the local public. In this period, the government acted as a facilitator providing services to citizens. Citizens are able to put forward their thoughts and innovative ideas [251], and urban space is designed for users and their engagement.

Generation 4.0: – By adopting the Industry 4.0 transformation, the benefits of smart cities are expected to outweigh their costs [298]. This generation combines the most useful parts of the earlier ones: the technology of generation 1.0, the individual performance of 2.0, and the engagement of 3.0. Smart City Generation 4.0 is inspired by Industrial Revolution 4.0 and develops new technologies, with control exercised over the use of existing resources and infrastructure.

Generation 5.0: – This generation introduces cognitive computing for developing cities and is based purely on Artificial Intelligence (AI). Systems learn by themselves from past and present knowledge and reflect real-time changes such as interests and barriers [228]. Each public and private service can be handled by an independent agent, which gives fast and efficient results. Generation 5.0 focuses mainly on decision-making for urban development using behavioral analysis and AI [264].

## 3.Ranking of smart city

Worldwide, there are different ranking systems based on distinct parameters. The most popular of these are the “Liveable City Index (LCI)” [204], “intelligent cities” [292], “sustainable cities” [28], “global cities”, and “competitiveness cities” [266].

The Smart City Index is used for ranking by research organizations such as the Smart City Observatory, in collaboration with IMD competitiveness [44]. Technologies like AI and IoT may be used to manage smart city infrastructure such as transportation, traffic, and street lighting. Muhammad A.R. Tariq et al. [266] try to identify the preferences of people living under different city ranking systems; the top five cities of the last three years are given in Table 2. Another ranking comes from Juniper Research, an analyst firm based in the UK, which ranks smart cities on factors such as transportation, energy, healthcare, and connectivity between urban areas. Shanghai, Seoul, Barcelona, Beijing, and New York are its top five global cities. These cities work mainly on real-time data, which helps in managing assets and future-proofing them. They also provide 5G in the downtown area and 99% fiber coverage across the city. To fulfil the needs of residents, these smart cities use the “Citizen Cloud App”, which relies on technologies like AI, Cloud Computing, and Big Data that fall under Smart City Generation 5.0 [20].

##### Table 2

Top five smart cities in last three years

| Year | Smart Cities |
| --- | --- |
| 2021 | Singapore, Zurich, Oslo, Taipei City, Lausanne |
| 2020 | Singapore, Helsinki, Zurich, Auckland, Oslo |
| 2019 | Singapore, Zurich, Oslo, Geneva, Copenhagen |

## 4.Architecture of smart city

Smart city development includes the integration and implementation of digital and IoT technologies. IoT provides essential elements of smart cities such as data production, data management, and application management. A number of smart city architectures have been proposed over time [226]. Fig. 4 shows the most generic architecture, which has four layers. All four layers are integrated with security modules because the data is sensitive.

Sensing layer: – The bottom-most layer of the architecture is the sensing layer, or data collection layer, and it is built from IoT devices. It uses devices such as actuators, Zigbee nodes, and Radio Frequency Identification (RFID) sensors to sense parameters like humidity, temperature, and pressure. Data collection from mobile devices places a heavy burden on the sensing layer, which sits at the base of the structure. This layer captures real-time data from sensors [26,192].

Transmission layer: – Using a variety of communication technologies, the transmission layer carries data to the higher layers. This layer acts as the backbone of any smart city architecture. It comprises communication networks such as 3G, 4G, LTE, the internet, and satellite links. Fifth-generation (5G) telecommunication is embedded in base stations to carry huge volumes of wireless traffic [100].

Data management layer: – This layer resides between the transmission and application layers and is the brain of a smart city. The functions of this layer are deception, editing, analysis, storage, and decision-making [195]. The stored information in this layer is used to provide services to various applications in the top layer. The primary role of this layer is to preserve data integrity, data purification, expansion, and optimization [285]. As the final function of the data management layer, the conclusions obtained are transferred to the application layer for proper use.

##### Fig. 4.

Layered architecture of a generic smart city [243].

Application layer: – The application layer connects the data management layer with urban residents. It is the topmost layer of the architecture and provides assistance to users. It runs applications that use IoT, for example smart homes, grid distribution, smart transport, weather forecasting, and intelligent health [317]. As this layer is directly connected to end users, user satisfaction may increase as the services provided improve.

## 5.Projects contributions related to smart city

Research on building smart cities has been carried out worldwide by various international organizations, universities, and businesses, and it is from this work that the concept of the smart city emerged. Various countries, like the US and China, have also carried out research in intelligent urban design [156]. The smart city supports metropolitan planning and oversight through ICT, IoT, CC, etc. [157]. The evaluation method proposed by IBM focuses strongly on building standards and relevant standards. The various initiatives, contributions, achievements, and projects in smart cities so far are listed in Table 3 [106].

##### Table 3

Projects contributions related to smart city

| Year | Project/Achievement | Technology/Contribution | Location |
| --- | --- | --- | --- |
| 1974 | A Cluster Analysis of Los Angeles [139] | Urban Big data | Los Angeles |
| 1994 | A Virtual digital City – De Digitale Stad (DDS) [227] | Internet Use | Amsterdam |
| 2005 | Research on smart cities [2] | Spent $25 m | Cisco |
| 2008 | Smarter Planet Project [206] | Sensors, networks and analysis of urban issues | IBM |
| 2009 | Smarter Cities Campaign [189] | Spent $50 m | IBM |
| 2009 | Smart Grid [110] | Provide funds | American Recovery and Reinvestment Act (ARRA) |
| 2009 | Smart Meters [32] | 80% of consumers by 2020 | EU Electricity Directive required |
| 2010 | Yokohama Smart City Project (YSCP) [70] | Infrastructure, Next Generation Energy, and Social Systems | Ministry of Economy, Trade and Industry, a Japanese government organization |
| 2011 | Competition of 200 applicants for smart city [47] | 24 cities are winners | IBM |
| 2011 | Expo World Congress [45] | 50 countries attended | Barcelona |
| 2012 | Public transit, parking and street lighting [27] | Data-driven urban systems | Barcelona |
| 2013 | Smart London Board [268] | Digital Technology | Mayor of London |
| 2014 | 103 pilot smart cities [304] | Second batch | China |
| 2014 | Wien Framework Strategy [209] | Launch smart city until 2025 | Vienna City Council |
| 2015 | 100 Smart Cities Mission [9] | Indian Cities | GoI |
| 2016 | Smart Cities Challenge [116] | Columbus won $50 m | US Dept of Transportation |
| 2017 | 5G testbeds [133] | Trials programme | UK government |
| 2017 | Launched smart city blueprint [53] | Blueprint | Hong Kong |
| 2018 | Smart Waterfront area [289] | Sidewalk Labs | Toronto and Google |
| 2018 | Smarter London [166] | Upgrade 2013 plans | London |
| 2018 | Motion Index ranked [63] | Top 3 cities (New York, London and Paris) | IESE Business School Cities |
| 2018 | Award as Smart city [290] | Smart City Expo World Congress | Singapore |
| 2019 | Cellular Vehicle to Everything [250] | C-V2X standard | Ford Commitment |
| 2019 | Data Privacy implications [167] | Sidewalk Labs | Toronto |
| 2019 | Global Smart Cities Alliance [92] | World Economic Forum as secretariat | G20 |
| 2019 | 5G testbeds [315] | New York and Salt Lake City | US Federal Communications Commission |
| 2020 | $4.2bn smart city in northern Hanoi [174] | Expected to be complete in 2028 | Vietnamese |
| 2030 | By 2030 the number of cities increases [194] | 43 cities with population more than 10 million | World survey |
| 2050 | By 2050 live in cities [194] | 70% population expected | World survey |

## 6.Probabilistic data structure

The exponential increase in data production has been most evident in the last decade, due to the emergence of ICT, IoT, etc. Traditional methods can produce accurate results, but they do so at a cost in space and time that is not acceptable for massive amounts of data. Because of this huge and unbounded growth, traditional data structures are being replaced by PDS for storage and retrieval. A PDS gives an approximate answer, which may or may not be the exact answer but is close to it. Because PDSs have probabilistic components, they are efficient at reducing the time and space costs. PDS also play a key role in big data processing, in storing unstructured data, in fast retrieval, and in making approximate predictions. In general, there are thirteen types of PDS (Fig. 5). This paper discusses four of them, because these are widely used by researchers in their designs and developments and are very useful for handling and storing large amounts of data efficiently. These data structures are built on hash functions [246]. PDS are used for membership queries, frequency estimation, similarity search, and cardinality estimation. Low memory requirements and good processing speed are two distinctive features of PDS [136].

##### Fig. 5.

Overview of PDS [246].

### 6.1.Bloom filter

In 1970, Burton H. Bloom introduced the Bloom Filter (BF) [36]. The BF is a highly space-efficient probabilistic data structure. It is used to reduce space requirements and is an effective way to query the membership of an item in a large set. The BF consists of an array of m bits, all initialized to 0, and k hash functions. Each hash function maps an element to one position in the array, so hashing an element with all k functions yields the k positions used to insert or query it.

To insert an element into the BF, the k hash functions are computed and the bit at each resulting index is set to 1 (Fig. 6). A query checks whether an element is a member of the set by inspecting the bits at the same k indexes. If all of them are set, the answer is “maybe”; if at least one bit is not set, the answer is “definitely not”.

Properties of BF:

• False negatives are impossible, but a false positive is possible when all the queried locations happen to have been set to 1 by other elements (in the standard BF).

• The query time for an element is O(k), where k is the number of hash functions.

• The union and intersection of two BFs that share the same size and hash functions are computed with the bitwise OR and AND operations, and the result has the same size as the inputs.
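The insertion and query steps described for the BF can be illustrated with a minimal Python sketch. It is not taken from the surveyed works: the class name `BloomFilter`, the method names, and the way the k positions are derived (double hashing over a single SHA-256 digest) are assumptions made only for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: an m-bit array plus k hash-derived positions per item."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = [0] * m  # all bits start at 0

    def _positions(self, item: str):
        # Assumption: k hash functions are simulated by double hashing
        # two 64-bit halves of one SHA-256 digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # True means "maybe"; False means "definitely not".
        return all(self.bits[pos] for pos in self._positions(item))

    def union(self, other: "BloomFilter") -> "BloomFilter":
        # Bitwise OR; only meaningful when both filters share m and k.
        if (self.m, self.k) != (other.m, other.k):
            raise ValueError("filters must share m and k")
        merged = BloomFilter(self.m, self.k)
        merged.bits = [a | b for a, b in zip(self.bits, other.bits)]
        return merged


# Hypothetical usage: remember which sensor IDs have already reported.
sensors = BloomFilter(m=1024, k=3)
for sensor_id in ("s-001", "s-002", "s-003"):
    sensors.add(sensor_id)
print(sensors.might_contain("s-002"))  # True: inserted items are never missed
```

A query for an ID that was never inserted usually returns `False`, but it can return `True` when other insertions happen to have set all k of its positions; this is the false-positive case listed in the properties.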

##### Fig. 6.

Insertion in bloom filter [246].

### 6.2.Count min sketch

Count-min sketch (CMS) was introduced by G. Cormode and S. Muthukrishnan in 2003 [64]. It is a probabilistic streaming data structure that uses sub-linear space [244]. Like the BF, it depends on hashing. The main structural difference is that CMS uses a 2-dimensional array of counters to summarize a data set, while the BF uses a 1-dimensional bit array to represent hashed data.

##### Fig. 7.

Count min sketch [246].

The basic data structure of CMS is a 2-D array of d × w counters, where w is the number of counters per row, which bounds the range of each hash function's output, and d is the number of pairwise-independent hash functions $h_1, \dots, h_d$, one per row. To update the count of an item, compute its position in each row with the d hash functions and increment those d counters. To query an item, read the same d counters; the minimum of these values is the estimated frequency of the item (Fig. 7). Because of collisions the estimate can exceed the true count, but it never falls below it. The space used by CMS is d × w counters. With appropriate values of d and w, CMS gives efficient results with very small errors: a larger w reduces the size of the overestimate, and more hash functions (a larger d) reduce the probability of a large error.

Properties of Count-Min sketch:

• Two sketches built with the same dimensions and hash functions can be merged (union) by adding them cell by cell.

• The query time is O(d), where d is the number of hash functions.

• The relative accuracy is better for items that appear many times (heavy hitters) than for rare items.

• CMS also has various applications like compressed sensing, stream processing, frequency tracking, etc.

### 6.3.Locality sensitive hashing

In 1998, Indyk and Motwani introduced Locality Sensitive Hashing (LSH) [123]. LSH works on the principle of mapping high-dimensional data into a low-dimensional space. The hash functions are selected carefully so that similar items have a higher chance of collision in the same hash bucket. LSH has three phases: a preprocessing phase in which different measures are used for mapping the data, a second phase in which hash tables are created, and a final phase in which these hash tables are used to recognize near-identical items (Fig. 8). Similar items are located in the same bucket, so the whole data set is organized into buckets [104].

##### Fig. 8.

Locality-sensitive hashing framework [246].

An LSH family lshF() is defined for a metric space $\mathcal{M}=(M,d)$, a threshold $R>0$, an approximation factor $c>1$, and probabilities $P_1$ and $P_2$. The family $F$ is a set of functions $h: M \to S$ that map elements of the metric space to buckets $s \in S$ [287]. The following conditions must be satisfied for any two points $p,q \in M$ and a hash function $h$ chosen uniformly at random from $F$:

• if $d(p,q) \le R$, then $h(p)=h(q)$ (i.e., p and q collide) with probability at least $P_1$,

• if $d(p,q) \ge cR$, then $h(p)=h(q)$ with probability at most $P_2$.

A family is interesting when $P_1 > P_2$. Such a family lshF() is called $(R, cR, P_1, P_2)$-sensitive.

Properties of Locality Sensitive Hashing:

• Similar items are more likely to be hashed to the same bucket than dissimilar items, so LSH hashes the items a number of times to raise the chance that similar items collide at least once.

• Cosine similarity and Hamming distance are examples of measures that have LSH functions.

• LSH functions are not available for every standard measure: some measures commonly used in data retrieval, such as the overlap and Dice coefficients, have no LSH family [54].
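As one concrete instance of an LSH family, the following Python sketch uses random hyperplanes, a well-known scheme for cosine similarity. It is offered as an illustration rather than as the method of the cited works; the function names, the seed, and the sample vectors are assumptions.

```python
import random
from collections import defaultdict


def make_hyperplanes(num_bits: int, dim: int, seed: int = 0):
    # Each hyperplane is a random Gaussian vector; one bit per hyperplane.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    # Bit i records on which side of hyperplane i the vector lies.
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def build_buckets(vectors, planes):
    # Vectors with identical signatures share a bucket.
    buckets = defaultdict(list)
    for name, vec in vectors.items():
        buckets[signature(vec, planes)].append(name)
    return buckets


# Hypothetical usage: group sensor readings that point in similar directions.
planes = make_hyperplanes(num_bits=8, dim=3)
readings = {
    "r1": [1.0, 2.0, 3.0],
    "r2": [2.0, 4.0, 6.0],     # same direction as r1
    "r3": [-1.0, -2.0, -3.0],  # opposite direction
}
buckets = build_buckets(readings, planes)
```

Two vectors that point in the same direction, such as `r1` and `r2`, always receive the same signature, while the smaller the angle between two vectors, the more signature bits they are expected to share. Candidate neighbours are then searched only within a bucket instead of across the whole data set.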

### 6.4.Quotient filter

Michael Bender et al. proposed the quotient filter (QF) in 2011 [30]. It uses little memory to test whether an element is a member of a set, and it supports insertion, deletion, and membership queries on the set. QF uses a single hash function to generate a fingerprint $f_p(x)$ of p bits. To insert an element, the remainder $f_r = f_p(x) \bmod 2^r$ and the quotient $f_q = \lfloor f_p(x) / 2^r \rfloor$ are calculated, where $f_q$ is the index of the bucket and $f_r$ is the value stored in that bucket. A query returns either “probably yes” or “definitely not”: there is some probability that the filter reports an element as present when it is actually absent (Fig. 9). The false-positive rate decreases as the fingerprint size increases, at the cost of more storage, so there is a trade-off between storage and false positives [82].

##### Fig. 9.

Quotient filter [89].

Properties of Quotient Filter:

• QF is an Approximate Membership Query (AMQ) filter, which is used to speed up answers in storage systems.

• Insertion, deletion, and updating are allowed in QF, resulting in the large usage of proxy databases.

• There is no need to re-hash the original key for merging and re-sizing of QF.
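The quotient/remainder split used by QF can be illustrated with a short sketch. It is a simplification under stated assumptions: SHA-256 stands in for the single hash function, and a dictionary of lists replaces the compact slot array with metadata bits that a real quotient filter uses. The names `fingerprint`, `split_fingerprint`, and `SimplifiedQuotientTable` are hypothetical.

```python
import hashlib


def fingerprint(key, p):
    """Return a p-bit fingerprint of key (assumes p <= 64; SHA-256 is an arbitrary choice)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") & ((1 << p) - 1)


def split_fingerprint(f, r):
    """Split fingerprint f into (quotient, remainder) using the r low-order bits."""
    quotient = f >> r                # floor(f / 2**r): index of the bucket
    remainder = f & ((1 << r) - 1)   # f mod 2**r: value stored in that bucket
    return quotient, remainder


class SimplifiedQuotientTable:
    """Didactic stand-in for a quotient filter: quotient -> list of remainders."""

    def __init__(self, q, r):
        self.q, self.r = q, r
        self.buckets = {}

    def _split(self, key):
        return split_fingerprint(fingerprint(key, self.q + self.r), self.r)

    def add(self, key):
        fq, fr = self._split(key)
        self.buckets.setdefault(fq, []).append(fr)

    def might_contain(self, key):
        # True means "probably yes"; False means "definitely not".
        fq, fr = self._split(key)
        return fr in self.buckets.get(fq, [])

    def delete(self, key):
        fq, fr = self._split(key)
        slot = self.buckets.get(fq, [])
        if fr in slot:
            slot.remove(fr)
            if not slot:
                del self.buckets[fq]


table = SimplifiedQuotientTable(q=8, r=8)
table.add("patient-42")
print(table.might_contain("patient-42"))
```

Because only the fingerprint is stored, two different keys with the same fingerprint are indistinguishable, which is the source of false positives. Because the quotient and remainder together reproduce the full fingerprint, the structure can be resized or merged without re-hashing the original keys.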

PDSs are used in many real-time applications like preserving patient data in healthcare, traffic control, energy-saving, and also for SEM and SAM, and many more. In the next section, the PDS and their applications in smart cities and big data will be explored in a more detailed manner.

## 7.Smart city application

A smart city needs to focus mainly on activities such as healthcare, traffic, and street lighting. Intelligent technologies such as ICT, IoT, and sensors, together with data analysis, are the key requirements for improving citizens’ standard of living. The important applications of a smart city (Fig. 10) are smart healthcare, smart traffic management, smart emergency systems, smart street lights, virtual power plants in the smart grid, etc.

People have many reasons, such as job opportunities and education, to move from one city to another, even during an epidemic like COVID-19. In smart cities, data is collected and analyzed through IoT devices such as sensors and actuators [102]. This data is then used to improve social services, infrastructure, and decision-making. This information plays an essential role in real-time applications and services in urban areas [202].

##### Fig. 10.

Applications of smart city.

### 7.1.Smart healthcare

Healthcare is an important service for the growth and development of any smart city. In recent years, smart healthcare has emerged with the growth of ICT. It is a process of managing one’s health with the help of doctors, either in person or virtually [112]. The traditional medical system needs to be transformed into smart healthcare in order to become more efficient and convenient for patients. With the emergence of technologies such as ICT, IoT, and AI, the healthcare industry has moved from traditional face-to-face doctor-patient interaction to remote health monitoring [113]. IBM (Armonk, NY, USA) came up with the idea of smart healthcare in 2009 [269]. The transformation into smart healthcare faces several challenges: healthcare systems and equipment are under enormous pressure; healthcare data is growing exponentially; decision-making must be informed with detailed information; nurses need additional knowledge; legal issues must be consolidated and resolved; patients have to run from one department to another to collect reports; and diagnostics and measurements must be accurate and precise. As the population grows and demands more health services, these challenges intensify. The majority of industrialized nations face serious difficulties regarding the quality and cost of numerous healthcare and wellbeing services [3]. Due to limited resources, some cities are deprived of proper healthcare services. In view of this, there is a need for a new system: smart healthcare. In smart healthcare, devices, the internet, and IoT connect people with each other and help manage healthcare activities and resources. Smart healthcare is one of the highest levels of knowledge building in the medical field [108]. It takes advantage of advanced technologies and devices such as wearables, mobile internet, and the IoT to collect information dynamically and to link healthcare institutions. Smart healthcare can also reduce the workload at the information desk and help patients and their wards.

#### 7.1.1.Evolution of healthcare

Various authors and researchers categorize the healthcare system into different generations (Fig. 11), from Healthcare 1.0 to Healthcare 5.0. The functional evolution of the healthcare system is discussed below:

Healthcare 1.0: – The first generation, ‘Healthcare 1.0’, was introduced between 1970 and 1990. Due to limited digital resources, it was restricted to paper documentation. It mainly focused on improving the efficiency of health services and reducing paperwork. The revolution transformed the home remedy system and the untrained physicians who provided paternal care into a more sophisticated, intelligent, and data-driven system that can be called the “medical complex”. In the 1830s, the British government started piping water to homes after a plague was caused by drinking polluted water [114,171]. Shortly thereafter, the scientific vaccine theory was established [48]. In the 19th century, integrated measures for sanitation, infection control, vaccination, and epidemiological surveys created a better environment for healthy living.

##### Fig. 11.

Health generation.

Healthcare 2.0: – The era of Healthcare 2.0 was between 1991 and 2005, and its main focus was integration with digital technology. Industrial machinery kept working and changing. In the 20th century, the automobile industry introduced the concept of mass production to increase the output of cheap products [120], and the healthcare system followed the same path. At the end of the 19th century, a few large pharmaceutical companies were formed [214]. A few years later, various antibiotics were introduced with the advent of mass-industrial manufacturing technology [299]. At that time, medical education also gave importance to both clinical training and basic science education [93]. Hospitals expanded and were staffed by more specialists, and doctors were trained to deal with more patients with complex conditions. The main focus is on building a part of healthcare 2.0 [58]. The second generation of healthcare aimed at improving productivity and data sharing. Information sharing was not limited to within the organization but extended to a group of other healthcare providers. This generation was oriented toward responding to the symptoms, illness, and needs of the individual. Information was shared with other organizations with privacy and security. It was electrical energy-oriented.

Healthcare 3.0: – The era of Healthcare 3.0 was between 2006 and 2015. In this generation, Electronic Health Records (EHRs) were introduced, which help doctors access patient records through a cloud gateway. It was a step towards creating a value-based model based on Telecommunication and Information Communication Technology (TICT), which provides database enhancement and additional data efficiency to prevent medical-related problems [1]. The advent of microcontrollers in the 1980s allowed for the production of small computers and fast-tracking environments, as well as large data storage [127,255]. With advanced computer technology, tomography moved from single images to reconstructed images, so doctors could diagnose ulcers with more information and detect diseases earlier. After diagnosing a disease, doctors provide evidence-based medicine [85], drawing on additional information gathered from e-libraries using fast computer technologies. Healthcare 3.0 focused on providing emergency care and was able to ensure preventive care before the onset of illness or symptoms of the disease. The internet changed learning because much medical literature became available in e-libraries. In Healthcare 3.0, the major concern is the use of technologies such as BDA and IoT-based wearable devices along with advanced e-medical record databases.

Healthcare 4.0: – From 2016 to 2020, the era of Healthcare 4.0 introduced patient-centric healthcare services with the advent of new technologies and IoT devices [261]. This generation is inspired by ‘Industry 4.0’ and establishes personalized healthcare platforms and augmented virtualization [271]. It focuses on smart devices and on empowering data analytics with ML, DL, AI, and IoT for the detection of diseases [151]. It is the successor of Healthcare 1.0, 2.0, and 3.0. Patients get medicine from suppliers through their websites and also get medical assistance through blogs [240]. Patient records are shared with healthcare professionals via an e-Health record over the cloud or on the LAN, where many patients and healthcare workers can be connected. This helps physicians access patient records anywhere and communicate with fellow doctors for better treatment. However, data sharing has introduced new challenges such as authentication, security, and authority [280]. This generation brings new hands and a new brain, which include robots, mini-laboratories, wearable devices, and 3D printers. Every device works faster and more efficiently; illnesses can be quickly diagnosed using a drop of blood; custom-designed surgery of body joints can be performed; and bone frameworks can be prepared using 3D printing [280]. Healthcare 4.0 also involves technologies such as robotic surgery, CC, CPS, information security, and many more.

Healthcare 5.0: – The era of Healthcare 5.0 began in 2020 and is ongoing. This generation includes AI features such as robot nurses, smart IoT devices, and 6G network speed [186]. With latency (10–100 ms) and reliability (99.9999%) [150], and based on ultra-high accuracy for remote connections [314], 6G communication addresses reliability and latency issues in smart cities. Why not 5G? According to [141], 5G will fall short of meeting future demands for big data connections such as holographic communications (e.g., 3D video conferencing), games, and telesurgery. This generation introduces device-to-device, machine-to-machine, and human-to-machine communication. Human-machine cooperation and participation improve the diagnosis system and deliver fast results. This system is more secure than its predecessors: a blockchain-based architecture for healthcare applications automatically collects data and removes unreliable systems from external companies. Because the network is decentralized, the system has no single point of failure [150].

The evolution of the healthcare system is summarized in Table 4.

##### Table 4

Evolution of healthcare

| Parameters | Healthcare 1.0 | Healthcare 2.0 | Healthcare 3.0 | Healthcare 4.0 | Healthcare 5.0 |
|---|---|---|---|---|---|
| Era duration | 1970–1990 | 1991–2005 | 2006–2015 | 2016–2020 | 2020–till date |
| Objective | Reduce paperwork | Productivity and data sharing | Come up with patient-centric solutions | Provide real-time tracking and response solutions | High accuracy in diagnosing diseases and analyzing huge amounts of data |
| Feature | Modular computing systems have emerged from the health sector | Develop EHR to provide a better view for physicians | Combine data with networked EHR; use microcontrollers (small computers), which facilitate speedy computation | Combine with real-time information collection; improve and increase AI use | Highly integrated, efficient sensors that help to monitor, collect, and diagnose diseases |
| Focus | Simple automation | Connectivity with other organizations | Interactivity with patients | Integrated real-time monitoring, diagnostics with AI support | No data loss; machine-to-machine and device-to-device communication |
| Tool | Machine tool | Digital technology | Computer, digitization, and the internet | IoT, CC, Big Data, Robotics, AI | Human-robot collaboration, sustainability, globalization |
| Limitation | Limited functionality | Shared limited information | Different levels are used within the community for limited interaction | Does not achieve full customer satisfaction; new and untested technology | Runs only in a smart city; integrating modern software and technologies makes decision-making a complex process |

Facilities in Smart Healthcare: – The main participants in smart healthcare are research institutes and hospitals, patients, and doctors. Smart healthcare has many benefits, such as disease prevention and monitoring, diagnosis and treatment, hospital management, health decision-making, and medical research. According to [108], ICT, IoT, AI, 5G internet, big data, and modern biotechnology are the foundations of smart healthcare. Doctors use Laboratory Information Management Systems, Electronic Medical Records, etc. to manage health data [207].

#### 7.1.2.Challenges in smart healthcare

Trusted Communication: – Many medical devices experience network failures, which is unacceptable when real-time data access is required. It is challenging to maintain connections on mobile devices such as wearables while a patient moves around, crosses boundaries, and changes coverage areas. On mobile devices, switching the network to the strongest available signal also affects data generation. Different amounts of data are generated while switching networks, depending on the roaming relationship between the SIM provider and the location of the feed. As there are many subcategories of mobile connectivity, the network type and location should also be matched to the value, speed, audio, and video required by the device.

Cyber security: – Cyber security is one of the challenges when we talk about smart healthcare. As it requires internet access and IoT devices, the data may be stolen or attackers may attack data or modify the data. So the first priority of the healthcare system is to ensure the privacy of patients and their data. There are some private IoT-based networks (e.g. APNs, VPNs, and IPsec protocol) that are available, which create private areas only accessible by authorized users or devices.

Scalable Platforms: – A scalable platform is required for the smooth functioning of a smart healthcare system. To be effective, a smart healthcare system must integrate seamlessly with patients and their big data, so that authorized professionals, physicians, and patients can easily use the devices and the system for remote monitoring.

Cost: – Meeting the requirements of the healthcare system must also be cost-effective. The healthcare system should take less time for decision-making and information gathering, so it requires new tools and techniques for efficient storage and retrieval.

Data Availability: – The availability of data is facing some issues like resource management and device identification. Systems don’t need any redundancy; they only need consistent data. So it requires identifying the resources and storage management with new technologies in healthcare systems.

Data Security: – While accessing the data, it is mandatory that only the authorized user can access the concerned data. It still needs to be improved.

Unique Identification: – In the healthcare system, it is required to uniquely identify patients by their doctor and vice versa. This is required for providing or getting the best and correct treatment.

Privacy Issue: – This is one of the major issues with big data and IoT devices in a smart city. Researchers have proposed many tools and techniques, but privacy protection still needs to be improved.

Device Communication: – Machine-to-machine, device-to-machine, and human-to-machine communication is challenging. Patients need quick access to their records, medical tests, and medical reports. A 5G/6G network is required for smooth data communication. It must be cost-effective and also requires tools and algorithms for privacy and data exchange.

Data Integrity: – Ensuring the integrity of healthcare data is also challenging because it is important for providers to use it in making decisions about patient care. It is also required for information exchange between doctors and patients.

#### 7.1.3.Application of PDS in smart healthcare

Due to the rapid growth in the population and an epidemic, hospitals are overburdened and overcrowded. Patients struggle to find the doctor’s office and the pre-body check-up center. Patients and their guardians also have difficulty understanding medical terms, and they need someone’s help to get the proper information. For that, a proper healthcare system is required that uses new technologies such as sensors, IoT, and ICT. On the one hand, the IoT has benefited patients, physicians, hospitals, and health insurance companies. On the other hand, the challenges have increased, because sensors and IoT devices produce massive amounts of data. To handle this data, the following applications of PDS may be used in smart healthcare systems:

Data Processing: – The data generated by devices is homogeneous or heterogeneous, semi-structured or unstructured, which will result in the traffic load on switches. The storage capacity of switches is limited, so the performance of routing may be compromised. A scheme ‘BloomStore: Dynamic Bloom Filter-based Secure rule-space management’ is proposed for Software Defined Networking (SDN) [245]. The scheme uses two self-dependent hash functions for security checks and also manages the network resources to handle the traffic in data. Smart healthcare systems provide system-assistance analysis for dedicated medical care to patients. The security of this data is in danger. The data security features are improved to prevent unauthorized access to information in healthcare systems. BF is also useful in data transmission elimination [300].

Data Sharing: – In Verifiable Multi-Key Searchable Encryption (VMKSE), the Garbled Bloom filter is used to support authentic search results and secure data sharing for multiple users [260]. This scheme supports single-keyword search. The authors also compare their scheme with a state-of-the-art solution to evaluate its effectiveness. This scheme helps in doctor-patient data sharing. The multi-keyword search verification mechanism is introduced based on pseudo-random and it is IoT-cloud-enabled for healthcare data systems [283]. The mechanism also takes care of authentication and advanced encryption with advanced privacy of data. BF is used in a secure two-dimensional calculation protocol, to compare a unit of characters and record [277].

Data Security and Privacy-Preserving System: – The use of IoT in smart cities plays a big role in smart healthcare. A huge amount of data is exchanged between machines, humans, and devices. This big data can be encrypted and then stored in the cloud so that only authenticated users can access it. There are also privacy issues. A scheme for efficiently sharing data is proposed to address this issue; the authors use attribute-based encryption with an attribute BF to control access. When data is transferred or sensitive information is shared, many privacy concerns may arise. Xu et al. propose a privacy-preserving scheme using BF for sharing patient health information [293]. The authors use an encrypted search method that allows numerical search over encrypted data. The BF and a message verification code are used to filter patient data and check the accuracy of search results. Liu et al. design cooperative privacy preservation for wearable devices that ensures authenticity and controls data access in the context of space and time information [162]. In the space-aware case, they use MinHash-based authentication, and in the time-aware case, attribute-based encryption is applied. They adopt BF to determine the existence of sensitive data in storage without exposing secret information, and positive and negative filters are used for secure data interaction. Seham A., et al. [11] introduce a system for infection control that uses blockchain for privacy preservation. In this system, one leader elected by the authority updates two BFs, one for infected users and the other for close-contact users, which reduces the storage space. The block size of the COVID-19 status ledger with and without using BF is also compared (Fig. 12).

##### Fig. 12.

Status ledger’s block size with and without using BF. [11].
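The idea of keeping one BF for infected users and another for close-contact users can be sketched as follows. This is a generic Bloom filter under stated assumptions, not the construction from [11]: the filter size, the number of hash functions, the use of SHA-256, and the names `BloomFilter`, `infected`, `close_contacts`, and `status` are all illustrative.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: a bit array probed at k hashed positions per item."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item):
        # Derive k positions by salting the item with the hash index (assumed scheme).
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False is definite; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Two filters, mirroring the infected / close-contact split described above.
infected = BloomFilter()
close_contacts = BloomFilter()


def status(user_id):
    """Look up a (hypothetical) user identifier in both filters."""
    if infected.might_contain(user_id):
        return "possibly infected"
    if close_contacts.might_contain(user_id):
        return "possibly close contact"
    return "not listed"


infected.add("user-17")
close_contacts.add("user-23")
print(status("user-17"), "|", status("user-99"))
```

Each filter occupies a fixed 128 bytes here regardless of how many identifiers are added, which illustrates why storing filters instead of identifier lists can reduce block size, at the cost of occasional false positives.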

Fast Response and Congestion control: – The number of connected devices in the network (such as traffic lights, vehicles, laptops, and smartphones) is increasing exponentially. For resource sharing and effective communication, all devices with different configurations have to be integrated into the same network. Due to the rapid growth of these devices, the traffic in the network also increases, which makes traffic patterns difficult to predict. To handle these issues, G. S. Aujla et al. propose “Blockchain-as-a-Service for Software Defined Network (SDN) in Smart City Applications” [23]. Multiple etiquette firmware can be integrated into a single network using the SDN controller [22].

In “Software-defined Content Dissemination Scheme for Internet of Healthcare Vehicles in COVID-19 like Scenarios”, [111] introduces a software-defined way to find the right content dissemination channel in the healthcare ecosystem. The Internet of Healthcare Vehicles (IoHV) is an emerging concept that depends on smart transport, including ambulances and additional healthcare vehicles (for example, for immediate COVID-19 testing). Deployed across a smart city setup, the IoHV is helpful for immediate COVID-19 testing and contact tracing wherever internet connectivity is possible. All healthcare vehicles are connected to each other through different types of links: vehicle-to-pedestrian (V2P), vehicle-to-vehicle (V2V), infrastructure-to-infrastructure (I2I), vehicle-to-infrastructure (V2I), and cellular links [147]. To overcome congestion and improve response, the global information of devices is stored in a deletable BF in the proposed framework. Real-world implementation faces many ambient intelligence challenges. Some of those related to healthcare are listed below:

Adopting Advanced healthcare technology: Almost all medical devices are connected to IoT devices. Management systems for appointments, patient administration, laboratory information, etc. are now handled by ML and AI. So it is necessary to develop a connected network of healthcare leaders, clinics, manufacturers, and software development companies, which enables new business models and also helps in adopting new technologies.

Integration between healthcare services: The massive amount of data is generated while using medical devices and AI-integrated applications. Many top healthcare companies lack data management systems and new architecture.

Rising Healthcare cost: The rising cost of healthcare is always a serious issue; it includes the manufacturing cost of healthcare vehicles and the cost of disease detection and testing processes. Due to this, many patients skip lab tests and regular follow-ups.

Payment Verification: – In this digital world, and especially during an epidemic, most transactions are done digitally and payment verification is required. In this regard, Pratim et al. propose a lightweight blockchain-based payment verification scheme using BF for IoT-assisted e-healthcare [223]. If BLWN simply asks for the full location of a small set of blocks/topics, there may be a chance for privacy leaks as the full area can peer into BLWNs’ assets. It may allow the full site to call for Denial of Services (DoS) and the insignificant liaison service for funds available from BLWN. Therefore, it may be completely overwhelmed by the great difficulty in its ability to make a computer while crashing its system.

Real Time Analysis/Support: – Wearable devices are mainly used for real-time Electronic Health Record (EHR) collection. Encrypted search is required for medical institutions to find targeted EHRs. In this regard, Yuan et al. [259] propose a scheme in which medical institutions can search and access EHRs in the cloud. They improve the search accuracy and the privacy of users in EHRs. To improve search efficiency, the Cuckoo filter is used, which also allows data owners to modify (insert and delete) their EHRs in the cloud.
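The reason a Cuckoo filter suits records that owners insert and delete can be seen in a small sketch. It shows a generic cuckoo filter with partial-key cuckoo hashing, not the encrypted-search scheme of [259]; the bucket count, bucket size, 8-bit fingerprints, SHA-256, and the record key format are assumptions made for illustration.

```python
import hashlib
import random


class CuckooFilter:
    """Minimal cuckoo filter storing short fingerprints in two candidate buckets."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=100, seed=0):
        # num_buckets must be a power of two so the XOR below maps back and forth.
        self.n = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self.rng = random.Random(seed)

    @staticmethod
    def _hash(data):
        return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")

    def _fingerprint(self, item):
        return self._hash(b"fp:" + item.encode("utf-8")) % 255 + 1  # 1..255

    def _index1(self, item):
        return self._hash(item.encode("utf-8")) % self.n

    def _alt_index(self, index, fp):
        # The alternate bucket depends only on the current bucket and the fingerprint.
        return (index ^ self._hash(bytes([fp]))) % self.n

    def insert(self, item):
        fp = self._fingerprint(item)
        i1 = self._index1(item)
        i2 = self._alt_index(i1, fp)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self.rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            victim = self.rng.randrange(len(self.buckets[i]))
            fp, self.buckets[i][victim] = self.buckets[i][victim], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter is considered full; the last evicted fingerprint is dropped

    def contains(self, item):
        fp = self._fingerprint(item)
        i1 = self._index1(item)
        return fp in self.buckets[i1] or fp in self.buckets[self._alt_index(i1, fp)]

    def delete(self, item):
        fp = self._fingerprint(item)
        i1 = self._index1(item)
        for i in (i1, self._alt_index(i1, fp)):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


cf = CuckooFilter()
cf.insert("ehr:alice:2021-03")            # hypothetical record key
print(cf.contains("ehr:alice:2021-03"))
cf.delete("ehr:alice:2021-03")
print(cf.contains("ehr:alice:2021-03"))
```

Unlike a plain Bloom filter, deletion here removes one stored fingerprint without disturbing other entries, which is the property that allows owners to update their records. Deleting an item that was never inserted should be avoided, since it could remove a colliding fingerprint.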

### 7.2.Smart transportation

For a long time, horses and camels were used for moving from one place to another. With the advent of new technologies and smart transportation systems, the world has entered the next phase of mobility, namely smart transport [180]. “Moving smarter is not our future – it is already our present”, says Lisa Jerram, senior analyst at Pike Research; the aim is to keep travel easy as cities become more populous and face potential budget crises from building new infrastructure, as is the case in Europe, North America, and Japan [38]. According to [274], the share of the world’s population living in urban areas increased from 29% to 50% between 1950 and 2008, and it is expected to increase to 70% by 2050.

#### 7.2.1.Application of smart transportation

To meet the increased demand of citizens, the easy option is to provide them with better transport services. As a result, the supply of automobiles increases, and with it traffic congestion. The development of smart transport systems has attracted a lot of attention as a way to address traffic congestion and rapid urban growth [71]. The implementation of smart technologies has been identified as the key factor in gaining intelligence and sustainability [107,233]. Several researchers have also shown how this sustainability achieves environmental and economic efficiency [71,115]. Therefore, sustainable transportation is very important in today’s society. The various applications of smart transportation are discussed as follows:

Smart Street Light: – A smart city also needs to upgrade road street lights to smart street lights. These help reduce energy consumption by dimming the lights. The saved energy can then be used for other services such as pollution monitoring, weather updates, and GPS, and can also help in signalling available parking in the nearby area. Smart street lights vary in their features and requirements and involve a combination of cameras and sensors. The data collected by the sensors and cameras can either be processed locally, if the street light has a computing device, or be propagated through the network. These devices can detect movement, which enables dynamic lighting and dimming [102]. Chen et al. present a system that controls street lights using a TX2 device and also helps in updating the parking status for users (Fig. 13).

##### Fig. 13.

Smart street light scenario [59].

Intelligent Transportation System (ITS): – Today, state-of-the-art transport systems are heavily influenced by Machine Learning (ML) and Deep Reinforcement Learning (DRL) based strategies to detect autonomous vehicles, dispatch passengers in a safe manner, and ensure the safety of vehicles. ITS uses various advanced technologies such as sensors, IoT devices, ICT, and many more [302]. These IoT devices and sensors generate a huge amount of data, which contributes to the concept of intelligent cities and the future of ITS [312]. Techniques such as AI, ML, and especially DRL play an integral role in precisely monitoring and measuring real-time traffic flow data in urban areas [301,305].

The various applications of ITS discussed by [302] include the intelligent highway in Britain, which reduces traffic and the accident rate, and the CRITER system in Lyon, France, which offers transportation workers a schematic plan like a map and also predicts bottleneck points. In Japan, electronic toll collection (ETC) measures the physical characteristics of vehicles and checks and deducts charges automatically if they are in ETC. It is useful for avoiding illegal entry into the city. Including IoT in ITS builds a system in which road facilities, vehicles, and management equipment communicate without barriers. The Global Positioning System (GPS) is replaced by Radio Frequency Identification (RFID) in the IoT to become the Smart Transportation System (STS). The authors also discuss services for passengers on public transport, such as service range, charging, security control, and administration.

Availability of Parking: – With the rapid growth of vehicles in the city, it is tricky for drivers to find available parking. This dilemma is seen as an opportunity to increase the efficiency of parking facilities, thereby decreasing road accidents and reducing the time taken to find free space in a smart city. The troubles related to parking and traffic congestion could be solved if drivers were aware in advance of the availability of parking in the area and surrounding areas [39]. A smart and automated system that can detect empty parking spaces can reduce search time by finding out where parking is available and passing valid information to drivers. Maria et al. proposed an image processing system that takes video as input from a drone and feeds it into a frame extraction block [176]. These frames are then preprocessed to reduce complexity. These systems may be improved if the availability of data and the techniques to manage this data are improved or updated with new technologies.

Street lights can also play an important role in detecting empty parking spaces in open environments. Traditional (sensor-based) parking occupancy systems are more expensive, as demonstrated by [59]. They use the NVIDIA Jetson TX2, an embedded artificial intelligence supercomputer with NVIDIA’s Compute Unified Device Architecture (CUDA), which has high power efficiency. This system works both day and night with an on-off street parking smart control system. Parking space is detected by marker-based image processing using the onboard camera of an Unmanned Aerial Vehicle (UAV) [67].

Traffic Control: – Authorities face chaos when trying to manage traffic as the number of vehicles on the roads increases. Because of a lack of human resources, authorities are moving toward smart traffic control to manage the city’s traffic. To reduce congestion in the context of VANETs, robots can play a key role. The aim of smart robots is to give information to avoid ideal roads and manage traffic congestion in urban areas. To detect illegal traffic behaviour or traffic violations, the system uses street cameras [66]. But it is not possible to install street cameras everywhere in the city. Modern cars with video storage cameras have been introduced to control traffic violations in the city. These cars capture videos in the city and report any violations to the authorities. Rathore et al. propose a system that detects the car in front and the road lines using the Single Shot MultiBox Detector (SSD) and the Hough transform for self-driving. A violation detection algorithm is designed for the smart fog device to identify driving violations such as U-turns and driving over central dividers [222]. Steve Mazur has also presented a traffic control system (Fig. 14) in [180].

##### Fig. 14.

Smart traffic control [180].

Automated Toll Collection: – To decrease the fuel consumption used in automobiles, the use of the cream road is required. The government and road contractors are building new highways and flyovers in the cities, and contractors are installing toll plazas on those roads to recover their expenses. Motorists and commuters spend valuable time at the toll plazas paying the toll. Due to this, parking problems, traffic congestion, and pollution are increasing near the toll plazas. Commuters also face delays, which increase their travel time [34]. Automatic toll collection is receiving growing attention from both governments and researchers. The main component of this automated toll collection system is the RFID tag, which is installed on the windshield of vehicles. To collect the required amount, the vehicles pass through the sensor system before the tollgate [134]. Regular users may also have the facility of prepaid smart cards, so that traffic at tollgates can be avoided. M.A. Berlin et al. propose an alert-message-based toll collection system using a smart Road Side Unit (RSU) [34]. This system also helps stop payment violations by sending an alert message if any vehicle violates the toll payment. The system is completely unmanned and barrier-free, which also helps handle traffic in rush hours [34]. The use of RFID tags not only saves time but also eliminates corruption at toll plazas [8].

Smart Mobility: – Smart mobility is also a main concern in smart transportation, as it is consistent with the development of a sustainable world [101]. In particular, ICT either drives a smart mobility initiative or makes it a complete failure [31]. Smart mobility provides solutions to users through new technologies such as IoT and ICT. Some tourists use apps to plan their journey, but they get limited information and priorities for travel recommendations [205]. Smart mobility also helps citizens to move freely in the vicinity of a smart city, and it helps to improve the traffic control system by giving access to other routes in emergencies or traffic jams. Intelligent navigators facilitate routing and navigation for essential services such as ambulances and government services. In the coming years, smart mobility will transform into a mobility-as-a-service paradigm, such as car-sharing [205].

#### 7.2.2.Benefits of smart transportation

Figure 15 presents a picture of smart transportation in the city. Smart transportation has many benefits, some of which are discussed below.

##### Fig. 15.

Smart transportation [180].

Smart Transportation is safer: – In smart transport, the integration of ML with IoT and 5G helps in reducing traffic and road accidents. IoT devices, cameras, and other safety devices help in monitoring the traffic situation and informing users of it to improve road safety.

Smart Transportation is better managed: – Smart transportation assists the public administration by allowing it to monitor the performance of road safety and traffic. It also gives information on critical sources of problems and tracks where maintenance is required.

Smart Transport is very effective: – Better management of resources in smart transport gives more efficient results. With quality data, it is easy to identify the areas where improvement is required. It also provides better-quality filling rates.

Smart Transportation is more cost-effective: – Smart transportation also helps in reducing cost by providing the shortest routes, giving information about available facilities with approximate distance and price, and much more. Commuters also benefit if they get affordable public transport as compared to hiring a private cab.

Smart Parking Management: – With smart parking, the driver or car owner no longer faces the problem of finding an available parking space. The system provides this information by collecting real-time data from connected devices and sensors.

Smart Traffic Management: – With a smart traffic management system, users can get information about congestion on the road, so that they can plan their journey accordingly.

Smart Transportation provides instant information: – A smart transport system can also provide instant information about issues in the city, traffic congestion, and problem areas using traffic management centers. They also ensure public safety and provide information on affordable insurance plans.

Integrated Ticket Systems: – Smart transportation also provides diverse services to citizens by offering an intelligent ticket system in some local services.

#### 7.2.3.Challenges in smart transportation

The large number of vehicles in major cities around the world has posed major transportation and stability challenges, such as air pollution, traffic congestion, and energy problems [284]. The following are some challenges to smart transportation.

Security: – Vulnerability to cyberattacks is one of the biggest fears among smart city dwellers. Cyberattacks are becoming more common as the world’s connectivity to the internet increases. If communication is not secure, the data flowing through the smart transport management system may be hacked or used to steal a vehicle. The security of data used in toll collection systems is also at risk. When searching for an available parking space, users share their location, which may also create problems for user security.

Data Privacy: – User data must not be retrieved without the user’s knowledge, and the user’s personal identity must not be identified or traced. Data privacy is a main concern in smart transport. Under the new law, data processing must have a legitimate basis.

Supply Chain: – Epidemics such as Covid-19 affect the global supply chain. Transportation may face various problems during such an epidemic, and as a result many businesses are affected. When drivers are ill and move from one region to another, they may pose a risk to public health.

Environmental Problem: – With the rapid growth of the automobile industry, traffic on the roads also increases. This may affect air quality and water pollution in nearby residential areas. The environmental problems caused by IoT devices are currently serious and need to be addressed urgently [180].

Health Concern: – Health is also one of the challenges for smart transportation. If the transport system is not connected to hospitals, this may cause major problems, and people may lose their lives in road accidents. The system needs to improve its on-road service and properly notify the parties concerned (hospitals and police stations).

#### 7.2.4.Applications of PDS in smart transport

Traffic on roads in both rural and urban areas is increasing day by day. The traffic needs to be managed smartly and the traffic load distributed by making the transport system smart. This includes smart parking, smart street lights, violation checks on the roads, traffic control systems, etc.

Data Dissemination: – Data dissemination means that statistical data is transmitted or distributed to end-users. The Pursuing a Pub/Sub Internet (PURSUIT) project uses BF to store path information in source routing. The main scenario of this project is general data dissemination [16].
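As a rough illustration of how a Bloom filter can carry path information in source routing, the following minimal Python sketch ORs per-link bit masks into a packet header and lets each node forward on every outgoing link whose mask is fully contained in that header. The filter width, the number of hash positions, the link names, and the SHA-256-based hashing are illustrative assumptions and are not taken from the PURSUIT project [16].

```python
import hashlib

M_BITS = 256    # assumed width of the in-packet filter (hypothetical)
K_HASHES = 5    # assumed number of bit positions per link (hypothetical)


def link_mask(link_name: str, m: int = M_BITS, k: int = K_HASHES) -> int:
    """Map a link name to an m-bit mask with at most k bits set."""
    mask = 0
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{link_name}".encode()).digest()
        mask |= 1 << (int.from_bytes(digest[:8], "big") % m)
    return mask


def encode_path(links) -> int:
    """Build the packet header by OR-ing the masks of all links on the path."""
    header = 0
    for link in links:
        header |= link_mask(link)
    return header


def forward(header: int, outgoing_links) -> list:
    """Return the outgoing links whose mask is fully set in the header.

    Links on the encoded path are always returned (no false negatives);
    other links may occasionally match (false positives).
    """
    return [l for l in outgoing_links if header & link_mask(l) == link_mask(l)]


# Hypothetical topology: the source encodes the path A -> B -> C.
header = encode_path(["A->B", "B->C"])
print(forward(header, ["A->B", "A->D"]))  # always contains "A->B"
```

The header has a fixed size regardless of path length, which is the main appeal of this encoding; the price is that a false positive causes a packet to be forwarded on an extra link.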

Privacy Preserving: – IoT devices and sensors are used to collect and exchange a huge amount of data to improve the transport system. As cloud computing has progressed, more sensitive information (such as vehicle registration number, chassis number, insurance details, etc.) has been released to the cloud. The most reliable way to protect data privacy is to encrypt the data before extracting it [253]. Enabling keyword searches directly over encrypted data is a desirable way to make the best use of encrypted data. Wang et al. have proposed a new scheme for searching with multiple keywords (compound keywords) in random search [282]. Unlike most existing keyword search schemes, it eliminates the requirement for a predefined keyword dictionary.

### 7.3.Smart environment

Many developed cities suffer from poor air quality as population and industry grow rapidly. The increasing acceptance of smart transport data in smart cities around the world has provided unprecedented opportunities to improve air quality management in transportation [284]. Government agencies and residents are increasingly concerned with air quality, which affects a wide range of human environments and human development. The most common methods of predicting air pollution utilize low-level simulations. These standards produce disappointing results, which has led to the study of aspects influencing the measurement of air pollution based on the overall structure of the building. Estimating air quality using atmospheric scattering standards is time-consuming. Modeling combined with testing is a new development for measuring air pollution and greenhouse gases in an intelligent environment. Normally, most houses in a smart city use solar power and wind turbines for green energy [140]. Liu et al. suggested a Long Short-Term Memory (LSTM) model, assisted by a Stacked Auto-Encoder (SAE), to predict air quality when planning the smart environment in smart cities [165]. LSTM is used to evaluate air quality forecasts in smart cities. The internal components that occur due to air pollution are removed by optimizing SAE. The model shows a total error rate of 0.46 and a class accuracy of 91.22%, but it still needs to be improved.

Jovanovska et al. proposed an air quality system based on IoT and cloud computing [130]. They visualize and control air pollution using mobile applications. Sulfur dioxide (SO2), ozone (O3), nitrogen dioxide (NO2), and, most importantly, PM10 and PM2.5 are common indicators of contamination that cause health risks such as heart and lung diseases. Improving air quality is therefore a worthwhile effort for everyone, for the weather and for the health of every citizen.

#### 7.3.1.Challenges in smart environment

The major challenges for the environment are water pollution, air quality, and radiation. Achieving sustainable growth while maintaining a healthy society requires proper vigilance across the world. With the development of IoT and smart sensors, Smart Environmental Monitoring (SEM) has emerged in recent years as the system for environmental monitoring [273]. Figure 16 shows various environmental issues such as temperature, radiation, dust, humidity, ultraviolet signal, etc. To establish the system, Ullo et al. used a WSN, which provides an interface between smart sensors and IoT devices [273].

##### Fig. 16.

Challenges in smart environment.

Air quality Monitoring: – Due to the rapid increase in traffic and industry, air pollution is one of the primary concerns of our epoch. The earth is becoming increasingly polluted due to emissions of harmful gases such as CO, NO2, SO2, and CO2. These toxic gases cannot be predicted because they are dissolved in the air. The air quality therefore needs to be checked, and an IoT-based tool is required for that. An IoT device can collect and analyze the data to predict whether the air quality is good or bad. Sensors connected to a Raspberry Pi or Arduino and IoT devices can monitor the local air quality [175]. Dhingra et al. developed a mobile application, “IoT-Mobair”, which is used to monitor and detect air pollution in the area concerned [76]. This application has various features, such as air quality, daily forecasts, health-related tips and risks, air quality map generation, etc. However, when dealing with big data generated from sensors, the application faces computational complexity problems. For that reason, Dhingra et al. suggest using fog computing instead of cloud computing. The IoT is a global system of “smart devices” that can sense and communicate with the environment and interact with users and other applications. Qian et al. found that, due to low sensitivity and low accuracy, the existing monitoring systems do not work well and also require laboratory analysis [218]. In air pollution monitoring the data is highly correlated, so these systems produce a lot of obsolete information. To reduce the data delivery cost and to alleviate data neglect, Qian et al. propose a system called ‘Content-centric IoT-based Air pollution Monitoring (CIAM)’. In CIAM, the content method is used to compile and integrate air pollution data.

Water quality Monitoring: – Monitoring water quality is important in determining water safety and the related public health [256]. Water quality parameters are determined by physical, chemical, and biological factors. Bacterial contamination, turbidity, dissolved oxygen, dispersion, free chlorine, and pH are typical water quality parameters [215,219]. Various research papers have studied intelligent water pollution control systems using ML, IoT, and smart sensors. Water pollution in a lake can be predicted using an ML-based neural network that analyzes the sensed image [160]. Water has been classified as pure or polluted using ML methods and IoT devices [61]. The prediction of water quality parameters using AI and neural networks, and of the amounts of sulfate or chloride present in water, has been studied [220]. The use of SVM to separate the pollutants in water, the analysis of big data, and the problems faced during the separation of water pollution were discussed in [46]. AI-SVM is a classification system used for real-time monitoring and testing of water and for separating drinking water from non-drinking water [41,128]. Video-based monitoring of water quality and pollution, which uses IoT video surveillance and ML tools to separate dirty and clean water, was investigated in [208]. Another feature-based model was suggested to predict the quality of water before use and to help in analyzing the water [311]. Different ML models were used to test the concentration of chlorophyll-A in pond water and were also recommended for real-time water management systems [163].

Agriculture Monitoring System: – The growth of industrial and robust agricultural production methods has accelerated to ensure the quality and quantity needed to meet the growing demand for food [249]. In “smart or green agriculture”, Smart Environment Monitoring (SEM) plays an important role, as agriculture is a relevant growth factor for any nation [210]. It also helps in product development and sustainable growth to handle major challenges in the agriculture sector [196,239]. Ullo et al. refer to smart agriculture scenarios (Fig. 17) in which the SEM system acts as a real smart agricultural monitoring system. In the agriculture sector, various factors are very important for achieving sustainable production, such as water level, water pollution level, moisture analysis, soil health, etc. These features are included in the smart agricultural monitoring system, which is monitored and controlled using IoT devices, smart sensors for capturing agricultural data, and a WSN to transmit data to the cloud [273].

##### Fig. 17.

Smart agriculture monitoring system using IoT devices and sensors.

In the new agricultural era, there is a growing market for IoT that offers several creative solutions. Various studies on Smart Agricultural Monitoring (SAM) systems are discussed here, covering fertilizer control, crop monitoring measures, pest control, etc. Kumar et al. propose a plant growth monitoring system, ‘gCrop’, using IoT, ML, and WSN [148]. They use a 3rd-degree regression model and provide predictions with 98% accuracy, at a high computational complexity. Shinde and Pathak et al. performed a crop quality test to monitor the quality of paddy rice using Synthetic Aperture Radar (SAR) data [242]. In the rice quality test, SVMs were used with a limited sample size and back distribution features. The land and its size play an important role in checking whether the growth level of different crop species is satisfactory. To measure the leaf index, Hosseini et al. propose a system with a Gaussian process model [91] and SVM as the ML method, and report 89% with a limited sample size [119]. To determine the level of fertilizer, pesticides, and water used for plant irrigation, an expert system using AI was developed [78] using the Naive Bayes method [17] and ML on sensor data taken from agriculture. UAVs are used [77] to investigate crop quality tests [230] and soil health for phenological data of soybean crops [49]. Smart farming [42], pest monitoring [164], and crop monitoring [311] are important among the various uses of SEM systems. Weather and the environment also affect the health and growth of plants. Ullo et al. propose a technique that checks the condition of the soil, moisture, air and water quality, temperature, etc. in the context of SEM using IoT devices, AI, and smart sensors [273]. Data analysis is performed while smart agriculture provides estimation, assisted protection, decision making, and storage management [249]. The data is in motion while these techniques for smart agriculture are performed, and various challenges are faced by both farmers and researchers. Some of them are addressed below.

#### 7.3.2.Challenges in smart agriculture

To increase food production, farmers will face many challenges. Production will need to increase by 70% by the year 2050 [122]. Various challenges in agriculture are discussed below (Fig. 18).

##### Fig. 18.

Challenges in smart agriculture.

Irrigation management: – One of the objectives of an irrigation system is to calculate the water requirement of crops from collected data and to manage water flow without human intervention. Irrigation systems use distributed sensors to monitor different soils, water bodies, vegetation, and microscopic elements. Climate is one of the most important variables in estimating agricultural water requirements. Farmers can adjust their irrigation systems in a variety of ways according to soil and weather conditions [216]. The entire farm can be tracked and managed, and the weather forecast, from almost anywhere. IoT will help in developing new irrigation infrastructure. Smart IoT-operated irrigation systems use sensors embedded in the field to monitor soil structures, climate, and agricultural irrigation conditions.

Soil management: – Soil monitoring is one of the most challenging agricultural activities for both businesses and farmers. Soil management involves various soil parameters such as pH and humidity, which can be measured with IoT sensors. It helps in finding the right kind of plants and in identifying the fertilizer requirements of the soil. Crop production can be affected by soil testing due to a number of environmental concerns. The process and patterns of farming can easily be understood if these types of problems are well defined. Based on the findings of a soil survey study, crop production may improve and fertilization practices can be promoted among farmers [37]. Moisture and humidity sensors can monitor the moisture in the soil, and IoT technology identifies contaminated soil and shields the field from over-fertilization and crop damage. Through soil management, agricultural productivity and quality may increase, pollution can be avoided, and input costs may be reduced.

Climate management: – Climate has a profound effect on crop production. With an IoT-enabled weather forecast system, farmers can determine the best time to plant, irrigate, and harvest. With the help of remote sensors installed in the field, farmers can learn about natural conditions such as humidity, soil moisture, and air temperature. To maximize the yield, farmers should, on the basis of historical results, properly prepare for the irrigation season and for harvesting and marketing the crop. By reviewing and updating the collected data, farmers can take immediate steps to ensure a safe crop yield. Many factors must be combined to establish and maintain a good plant environment under stringent limits on airflow, temperature, CO2, and O2 levels. This can be achieved with IoT-enabled systems, in which data can be exchanged between intelligent sensors and devices for advanced decisions [288].

Accurate farming: – The traditional method of increasing yields and preserving crops was based on physical examination. Any issue found was resolved by trial and error, often only after a serious incident on the farm. Farmers face many different challenges, such as water scarcity, floods, lack of suitable planting space, and cost control. Productivity can be improved with the use of IoT in agriculture. Using the gathered information, farmers can organize their farming activities, including which seeds to sow, what crop yields to expect, when to harvest, and how much fertilizer to use. For example, accounting for the natural soil diversity of a field is an accurate agricultural practice: if the soil in a certain area holds more water, plants can be planted more densely and irrigation can be used sparingly. Alternatively, if the site is used for grazing, it can support more cattle than an equivalent area with a lower level of the soil.

Nutritional management: – Just as the human body needs proper nutrition to grow, plants require accurate nutrition. Nutrients help to produce the best yield when given in the right amounts and at the right times. Too many or too few nutrients for plants will affect the environment. For example, too much phosphorus, ammonia, or nitrogen may reduce water levels. When growing in one place, selecting the best crop cycle helps to balance soil fertility. Nutrition and technology are essential for achieving sustainable agriculture while reducing environmental degradation and economic costs [6].

Garbage Management: – The wastage of water, soil, and seeds is common during farming. This needs to be controlled in an intelligent manner. For garbage collection, smart trash cans can be built using IoT sensors, which can smartly sense and collect the garbage. These smart trash cans read, store, and transmit the collected data related to network disposal. Garbage management can be carried out with the help of smart and systematic algorithms [10].

Livestock monitoring: – Livestock plays an important role in agriculture, so the animals need attention, proper care, timely feeding, etc. Providing enough food for the world’s people through growing agricultural production is an increasing worldwide issue. As a result, livestock management on farms is crucial to survival. New technologies such as IoT are important for improving the quality and quantity of agricultural products. They also improve the quality of livestock by allowing farmers to make data-driven decisions. To monitor the livestock’s welfare remotely and identify their habitats, cloud-based technologies are used with power communication sensors [50]. Farmers can use connected sensors to check the health condition of livestock, such as respiratory rate, digestion, blood pressure, heart rate, and other vital signs, day and night. However, the data flow between these sensors and smart devices may be interrupted or tampered with, so it needs to be secure and well managed to give efficient results.

Farm Management System (FMS): – Smart farming promotes productivity while minimizing environmental impact, but it is only possible with the help of an FMS [97]. With WSN and GSM in an FMS, farmers can track the entire farm and capture data with a small controller [199]. Using sensors and smart devices in the field, the identifier provides farmers with appropriate awareness of soil, fertility, and weather. Data collection and storage, monitoring, and analysis of farm operations can be automated using an IoT-based farm management system. It can also help in managing agricultural budgets and business operations. The irrigation scheme helps in protecting the farms from animals and pests. However, automatic irrigation systems can also increase water consumption [105,137].

Tracing and tracking: – Satyanarayana et al. [236] develop structures to remotely track soil structure and its status in accordance with the needs of plant culture. Different agricultural areas and locations are tracked using GPS devices and wireless network connections. Real-time data processing is tracked and approved by connecting WSN and ZigBee to other devices such as the Central Monitoring Station (CMoS), GSM, and GPRS. The GPS also enables the farmer to take action based on notifications sent to the farm manager through SMS or MMS. It is often used in agriculture to detect precise location and control capacity, despite its high operating and maintenance costs.

Plant management: – The growth of the crop is the most important concern in farming. Farmers use good seeds, organic fertilizers, proper watering, etc. for the best results. To protect crops from insects, they use chemical medicines and fertilizers, which may later affect the human body. An intelligent system is therefore required that protects the crop from insects without affecting the human body. This can be done through plant management, which involves monitoring and recording the welfare of the crop. Plants and their diseases can be detected using RFID chips and IoT sensors. Farmers can process the data remotely and take necessary steps such as keeping insects away from plants. The production of rice for a specific country has been studied with a Chinese monitoring station using SVM [152]. Farmers can also reduce risk and plan their farming practices by using an effective calculation strategy demonstrated for coffee fruit [68].

Water Management: – The major challenge in greenhouses is water management, i.e. determining how much water is required [281]. Intelligent sensors are installed to control water waste and are operated using a variety of IoT techniques. Automated drip irrigation is used to control soil moisture during irrigation and to store water in greenhouses. Farmers can check the water level in a water tank from their Android phones. With IoT devices and sensors, the whole of water management is automated; for example, the motor is started and stopped automatically according to the water level. Due to over-irrigation in conventional irrigation systems, up to 50% of water is lost [241]. A Smart Irrigation System (SIS) addresses this issue. It helps farmers to avoid water wastage and to improve the quality of their crops through timely irrigation. SIS also transmits information about the field to the farmer using temperature and soil sensors. Farmers may also plan and modify their irrigation according to local weather information. For the Water Distribution System (WDS), a blockchain-based architecture in MATLAB, ‘WDSchain’, is proposed [172]. Various consensus mechanisms are used for security, and the results show that a trade-off is required between data validation and system complexity (Fig. 19).

##### Fig. 19.

Water distribution systems chain architecture [172].

Blockchain with Agricultural IoT: – To improve agricultural intelligence, data-driven technologies can be given a secure data storage system. Data collection is often very costly, whereas inventory, agricultural contracts, and information about farm conditions can be provided from a reliable source. By developing trust between providers and consumers and establishing a reliable food supply chain, blockchain technology helps in tracking food and in making timely payments to stakeholders [191].

To monitor the farm and collect data remotely, smart sensors and cameras are used. To adapt to the current condition of agricultural land, farmers can use IoT devices just as they use smartphones, anywhere in the world. New technologies such as IoT have the potential to increase global productivity and reduce the cost of crop production. The agricultural sector can be restructured to face the challenges on the farm by providing different IoT-based tools and techniques. A massive amount of unstructured data is also generated, and PDS has the potential to solve issues such as handling big data with real-time response.

#### 7.3.3.Applications of PDS in smart environment

Energy efficiency and traffic awareness: – To improve energy efficiency and traffic awareness, Yousef et al. propose a scheme for water pollution monitoring in underwater WSNs [297]. BF is used in the preprocessing step to reduce the number of transmissions and eliminate redundancy in order to save precious energy. This type of project is mainly used in deep water such as seas and oceans. Mahmoud et al. use BF to protect customer identity and preserve the privacy of transferred data in WDS [173]. They also suggest optimal BF parameters: 2000 bits and 7 hash functions for 200 customers. In WDSchain, BF is used to match the authentication of network nodes in proof-of-authentication (PoAuth).
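The reported parameters (200 customers, 2000 bits, 7 hash functions) can be checked against the standard Bloom filter approximations, where the false positive rate is about (1 − e^(−kn/m))^k and the number of hash functions that minimizes it is (m/n)·ln 2. The following minimal Python sketch evaluates both and builds a small filter with those parameters. The SHA-256-based hashing and the customer identifiers are illustrative assumptions and are not the construction used in [173].

```python
import hashlib
import math


def false_positive_rate(n: int, m: int, k: int) -> float:
    """Standard approximation of the BF false positive rate."""
    return (1.0 - math.exp(-k * n / m)) ** k


def optimal_hash_count(n: int, m: int) -> float:
    """Number of hash functions that minimizes the false positive rate."""
    return (m / n) * math.log(2)


class BloomFilter:
    def __init__(self, m: int, k: int):
        self.m, self.k = m, k
        self.bits = bytearray(m)  # one byte per bit, for clarity only

    def _positions(self, item: str):
        # Assumption: k positions derived from salted SHA-256 digests.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item: str) -> bool:
        return all(self.bits[pos] for pos in self._positions(item))


N, M, K = 200, 2000, 7  # parameters reported in the text
bf = BloomFilter(M, K)
for i in range(N):
    bf.add(f"customer-{i}")  # hypothetical identifiers

print(round(optimal_hash_count(N, M), 2))      # 6.93
print(round(false_positive_rate(N, M, K), 4))  # 0.0082
```

With 10 bits per customer, the optimum of about 6.93 hash functions rounds to 7, which is consistent with the suggested configuration and gives a false positive rate below 1%.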

High-performance communication networks: – IoT systems contain many sensors, routers, actuators, and base stations that need to communicate with each other and send millions of data items, which must be delivered over high-performance communication networks. IoT devices with limited power resources or limited integration areas pose an important research challenge. To increase the performance of IoT communication networks, Alassery et al. propose an efficient mechanism based on BF [12]. BF is used to store the routing information after aggregating packets, in order to send received packets to upstream routers.

Query Optimization: – Air monitoring systems have been developed in the big data domain, which poses major challenges to data analysis. Peng et al. proposed a scheme for query optimization of air quality big data using a BF index [211]. The efficiency of querying data collected in the air quality monitoring system is improved. They create a Hive data repository in the Optimized Row Columnar (ORC) file format with the Row Group Index (RGI). In ORC, the BF uses Thomas Wang’s 64-bit hash function for basic data types and the Murmur3 64-bit hash algorithm for string and binary types.
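The general idea behind a BF index over row groups can be sketched as follows: one filter is built per row group for an indexed column, and an equality query scans only the groups whose filter may contain the requested value. This is a minimal Python sketch under stated assumptions. The group size, filter parameters, column names, and SHA-256-based hashing are illustrative (ORC itself uses the hash functions named above), and the code does not reproduce the implementation of [211].

```python
import hashlib


class BloomFilter:
    def __init__(self, m_bits: int = 1024, k: int = 4):
        self.m, self.k = m_bits, k
        self.bits = 0  # Python int used as a bit set

    def _positions(self, value):
        data = str(value)
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{data}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, value) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value) -> bool:
        return all(self.bits >> pos & 1 for pos in self._positions(value))


def build_index(rows, column, group_size):
    """Split rows into groups and attach one Bloom filter per group."""
    groups = []
    for start in range(0, len(rows), group_size):
        chunk = rows[start:start + group_size]
        bf = BloomFilter()
        for row in chunk:
            bf.add(row[column])
        groups.append((bf, chunk))
    return groups


def query_equals(groups, column, value):
    """Return matching rows and the number of row groups actually scanned."""
    hits, scanned = [], 0
    for bf, chunk in groups:
        if not bf.might_contain(value):
            continue  # the group definitely does not contain the value
        scanned += 1
        hits.extend(r for r in chunk if r[column] == value)
    return hits, scanned


# Hypothetical air quality records: 100 stations, 10 rows per group.
rows = [{"station": f"S{i}", "pm25": i} for i in range(100)]
groups = build_index(rows, "station", group_size=10)
hits, scanned = query_equals(groups, "station", "S42")
print(hits, scanned)
```

Because a Bloom filter has no false negatives, skipping a group never loses a matching row; a false positive only costs an unnecessary scan of that group.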

Privacy-preserving Data Aggregation and Analysis: – A data aggregation algorithm using BF is proposed for privacy-preserving data analysis in mobile crowdsensing systems [212].

### 7.4.Smart industry

Industrial growth plays an important role in world development. When the word ‘industry’ comes to mind, most people probably think of a big, noisy place. The growth of industry has increased since the 18th century. Due to a lack of technologies, industry faces various challenges in production, distribution, supply chain management, etc. [279]. By the year 2050, technology will have progressed to the level of autonomy [80]. Industries are categorized into three economic sectors: primary, secondary, and tertiary [24].

Primary industry: – This sector is concerned with the general public, meaning the sale and supply of products. It is place dependent because the raw material is extracted from the earth. Its operation for economic growth revolves around minerals, earth water, vegetation, etc. Examples of this sector are farming, mining, and fishing, which extract raw materials such as coal, food, corn, etc. There are two types of industries under the primary sector: the genetic industry and the extractive industry.

Secondary industry: – After the primary industries collect raw material, the secondary industries use this material for construction and for manufacturing products. These industries make products such as steel for automobiles, textiles for clothes, wood for furniture, etc. Heavy machinery is used in the production plant to manufacture these products, and manpower is also required for packaging and distribution. Examples of this sector are consumer goods, craft & fashion, construction, manufacturing, etc. There are two types of industries under this sector: heavy industry and light industry.

Tertiary industry: – The product manufactured from raw material in the above sector is now ready for use by consumers. Tertiary industries generally do not make any products but only provide services to consumers and to local industries and markets. The main features of this sector are discussion, experiences, access, etc. The financial and education sectors are two groups of tertiary sectors: one for making money (financial) and the other non-profit (education). Examples of this sector are banking, education, administration, medicine, finance, insurance, transportation, retail, wholesale, real estate, hotels, police, defense services, media, information technology, and so on. There are three types of industries under the tertiary sector: telecommunication, professional, and franchises. These industries help to grow the Gross Domestic Product (GDP) of the country.

#### 7.4.1.Industry evolution

In the Industrial Revolution, an economy based on farming and handicrafts changed into one dominated by industry and machine manufacturing. These changes fundamentally transformed society in terms of living and working styles. This process began in Britain in the 18th century and spread all over the world. The production of manufactured goods and the use of natural resources have increased since these technological changes [294]. The industrial evolution is categorized in Fig. 20 and summarized in Table 5.

##### Fig. 20.

Industry 1.0–5.0.

##### Table 5

Evolution of smart industry

| Parameters | Industry 1.0 | Industry 2.0 | Industry 3.0 | Industry 4.0 | Industry 5.0 |
|---|---|---|---|---|---|
| Year | 1760–1850 | 1880–1973 | 1989–2013 | 2011–2020 | 2021–future |
| Main objective | Replacing manual labor, mechanization, water power, steam power | Upgradation of resources, mass production, assembly line, mechanization | Computer-aided automation, CAD/CAM, inter-connecting the world | Computerized automation, sensor robotic manufacturing with A.I. | A.I. anticipates human needs and plans resources, synergetic co-production and bio-upgradation |
| Focus on | Textile manufacture, iron industry, steam power, machine tools, chemicals, cement, gas lighting, glass making, agriculture, transportation and many more | Iron, steel, rail, electrification, machine tools, paper making, petroleum, chemical, maritime technology, rubber, bicycles, automobile, engines and turbines, telecommunications and trendy business management | Semiconductor industry, digital circuits, programmable integrated circuits, telecommunication, wireless communication, renewable energy sector, automation of all production industries | All types of industries, such as the primary, secondary and tertiary sectors, with intelligent systems | With the use of IoT devices, 3D objects are manufactured with 3D techniques (like 3D printing or additive manufacturing) |
| Key technologies | Machine tools, water power and steam power | Electrical power, telephones, internal combustion engine, railroad networks, gas, telegraph, sewage and water supply | Robots, programmable integrated circuits, Internet, industry automation | IoT, big data, augmented reality, simulation, cloud computing, cyber security, autonomous robots | Co-bots, skill transfer systems, bionic enhancements, personalized bio-engineering |
| Mathematics tool | Linear programming, geometry | Differential equation, linear equation, geometry | Integral equation, linear programming, logical controller | Optimization techniques, network theory | Multi-layer neural network, deep neural network |
| Energy source | Coal, steam | Fossil fuels | Hybrid fuels (renewable, fossil, nuclear) | Renewable electricity | Renewable electricity |
| Achievements | Transportation, employment, sustained growth, agriculture development | Electrical power grid, telephones, telegraph, internal combustion engines | Telecommunication, renewable energy, automated industries, robots | Fully automated systems, artificial intelligence systems in industry applications that work in uncertain situations | 3D view of objects made with additive manufacturing |
| Limitations | Pollution, takes maximum time | Maximum cost to consume electrical power | Automated systems would not work in uncertain situations | Data in the cloud needs improved security and privacy. Expert systems are not yet developed for industries | New and untested technologies |

Industry 1.0: – The first industrial revolution, built around mechanization, took place in the years 1760–1850 [225]. It used steam, coal, and water power to mechanize the manufacturing process. Machine-based production raised the rate of production eightfold compared with conventional methods [270]. It also raised the standard of living by producing various goods in massive amounts. Human productivity was increased by the use of steam power [83]. Machines, steam power, and water power played an important role in the growth of the first industrial revolution. The sectors that developed in this revolution include textile manufacture, chemicals, paper machines, cement, the iron industry, gas lighting, transportation, agriculture, railways, and many more [291].

Industry 2.0: – This revolution, associated with industrialization, took place in the years 1880–1973 [131]. With the adoption of new technologies such as the telephone, electric power, sewage systems, and the internal combustion engine, this revolution enabled mass production. The sectors that developed in Industry 2.0 through technology are steel, paper making, petroleum, automobiles, fertilizer, iron, chemicals, rubber, electrification, machine tools, telecommunications, and many more. This revolution occurred mainly in America, Britain, and Germany [193].

Industry 3.0: – The industrial revolution ‘Industry 3.0’ took place in the years 1989–2013. Production was automated and carried out without human intervention, so the production sector grew in the engineering field. The automation was fully computerized, which increased the efficiency and reliability of industrial systems. However, automation also affected employment: as technology advanced, many industries started using robots and reducing manpower. Industrial robots are designed with programmable integrated circuits and give accurate and efficient results. Robots can perform painting, welding, testing, labeling, etc. The International Federation of Robotics estimates that 1.64 million robots are used in industry worldwide [51].

Industry 4.0: – The industrial revolution ‘Industry 4.0’ was first proposed by the German government in 2011. In this revolutionary trend, computerization is used in manufacturing [149]. Cyber-physical systems were developed in which all systems communicate using IoT devices, cloud computing, and machine learning [272,278]. These IoT devices help ‘Industry 4.0’ in providing services and also in manufacturing. Information is exchanged through the Industrial IoT (IIoT) [190]. The key components of this revolution include IoT, cloud computing, big data, cyber security, cognitive computing, etc. [286]. Germany initiated Industry 4.0 as “smart manufacturing for the future” [155]. Like the previous revolutions, this one emerged with the aim of achieving mass production and increasing productivity using innovative technologies [25,213].

Industry 5.0: – This revolution was declared by the European Commission after discussions with various funding agencies and organizations in Research and Innovation workshops in January 2021. This industry emphasizes innovation and research in the service of humanity. It uses the blockchain concept to integrate the data generated by different industries. It plays a key role in achieving social goals such as employment, standard of living, and development [43].

Industry 5.0 is not entirely new; it is an upgraded version of Industry 4.0. As technology has grown, the use of artificial intelligence in industry has also improved. Combining human capabilities with computers and robot workers gives efficient and effective results [279]. This industry proposes the use of 3D techniques (such as 3D printing or additive manufacturing for creating 3D objects) together with IoT [80].

#### 7.4.2.Challenges in smart industry

Data Security: – With the rapid growth of technologies in the smart industry, a huge amount of data moves between IoT devices. Keeping this data secure and protected from unauthorized access is a major challenge for the industry.

Data Management: – This is also a major challenge when a huge amount of data is flowing over the network and between devices. Data needs to be well structured so that it can be accessed easily and yield efficient results.

Storage: – Storing industrial big data is very difficult for both users and developers. Data generated by various sources, such as IoT devices, is scattered and unfiltered, so it is not easy to store.

#### 7.4.3.Applications of PDS in smart industry

Remove ambiguity: – The data generated in the smart industry is ambiguous. Wang et al. proposed “Fingerprint Summary”, a time- and space-efficient technique for cluster data de-duplication that keeps a BF in each node to reduce data duplication. To detect and remove duplicate data efficiently, Che et al. [57] proposed a new data structure, the ‘Improved Streaming Quotient Filter (ISQF)’.

Validation: – BF is used for data validation, where it reduces memory consumption and bypasses unnecessary comparisons in the validation process [109]. This makes the process space-efficient.
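
As a minimal sketch of the de-duplication idea above, the following Python example keeps a small BF of records already seen and forwards only records that the filter has not encountered. It is not the Fingerprint Summary or ISQF design from the cited works; the class `BloomFilter`, the function `deduplicate`, the filter size, and the sample sensor records are all hypothetical choices made for illustration.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter; probe positions come from double hashing a SHA-256 digest."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def deduplicate(stream, num_bits: int = 8192, num_hashes: int = 4):
    """Yield only records the filter has not seen (assumed sizes, for illustration)."""
    seen = BloomFilter(num_bits, num_hashes)
    for record in stream:
        if record in seen:
            continue  # probably a duplicate: skip any further comparison
        seen.add(record)
        yield record


# Hypothetical IoT readings containing repeated records.
readings = [
    "sensor-1:20.5",
    "sensor-2:19.8",
    "sensor-1:20.5",
    "sensor-3:21.0",
    "sensor-2:19.8",
]
unique_readings = list(deduplicate(readings))
```

Because a BF can return false positives but never false negatives, this sketch never forwards a duplicate, at the cost of occasionally dropping a record that was in fact new.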

### 7.5.Smart energy

Smart energy plays an important role in solving various past, present, and future issues in areas such as healthcare, agriculture, the environment, sustainable development, and many more [79]. In the last few years, energy-saving systems have already been developed in various cities and buildings, and many studies on energy efficiency have been carried out [142]. Instead of the term “smart grid”, a broader approach is to use the terms “Smart Energy” or “Smart Energy Systems” (Fig. 21).

The energy system also faces various challenges such as stability, energy efficiency, cost control, operational efficiency, environmental issues, and service management [90]. By enabling smart energy management, big data analytics provides new opportunities to deal with these challenges [18]. Innovative storage solutions and distributed resources have been proposed for efficient power transmission, clean power generation, dynamic power distribution, and rational electricity consumption [188]. The smart grid achieves energy transmission and data collection at the same time by integrating the energy and information flows [187]. The primary focus of smart grids is on electricity-related sectors such as cooling, heating, electricity, transportation, industry, and buildings. It also provides affordable, achievable solutions for sustainable and renewable energy [168]. In the last few years, energy consumption has increased due to the huge growth in industry and the rapid increase in population, which in turn has increased data generation. With the emergence of ICT, energy systems are being digitized [309]. The massive amount of data generated from energy-related sources needs to be well structured for efficient and fast results [177]. Renewable energy such as wind and solar power is still in its developing stage and has brought challenges like energy security and the adoption of new technologies [262]. Industries use more than one-third of the total electrical energy consumed by the country for production, construction, and mining. To gather valuable information, they have therefore collected large databases, and this data is being used to raise the standard of living [231].

##### Fig. 21.

Smart energy.

The smart grid collects a variety of big data, such as device status, electricity consumption, and user interaction. To process this big dataset, various techniques like analysis, optimization [310], clustering [232], classification [307] and forecasting [88] have been applied. With accurate prediction of electricity demand and consumption, the operation and generation of power can be optimized in real time and the pricing mechanism can be developed effectively. With the use of big data analytics, the smart grid gives customers more control over their energy use, supplies economical and reliable energy, responds quickly to the demand for electricity, quickly detects and recovers from failures, and offers many more facilities [124]. It also supports decision-making for customers, producers, operators, and regulators in the smart grid [308]. Energy management uses emerging technologies like ICT (Fig. 22).

The involvement of big data has changed the patterns of energy consumption and production. Energy big data is characterized by 3Es and 4Vs: energy, empathy, and exchange, together with value, velocity, volume, and variety [308].

##### Fig. 22.

Energy management (energy cloud) [197].

#### 7.5.1.Challenges in smart energy

Energy big data brings various opportunities as well as challenges. Some of the challenges are listed here:

Effective collection of energy big data: It is very difficult to collect data from energy resources in a way that supports efficient and effective decision-making and quick responses.

Storage management of energy big data: Managing the storage of energy big data is also one of the major challenges in providing better services to customers.

Mining and analysis of energy big data: Cleaning, mining, and analyzing large volumes of raw data is a very difficult task.

Lack of effective and efficient decision making: Decision making is one of the important key points of a smart energy system, and it needs to be improved by incorporating efficient techniques like PDS.

Privacy preserving of energy big data: Securing energy big data is one of the major concerns. With the use of IoT devices, a huge amount of data flows without any security measures.

To address the challenges discussed above, efficient and effective tools and techniques like PDS are required.

#### 7.5.2.Applications of PDS in smart energy

Privacy Preserving: Zhang et al. proposed a mechanism, Cuckoo-RPL (Routing Protocol for Low-Power and Lossy Networks), to defend the Advanced Metering Infrastructure (AMI) network from blackhole attacks. Cuckoo-RPL is also useful for defending against other attacks, such as version number and gray hole attacks [303].

Network traffic management: Chaudhary et al. designed an “SDN-enabled multi-attribute-based secure communication protocol” for communication between entities in the Smart Grid environment. They use a cuckoo filter for fast forwarding of data [56].

Load Balancing: Debnath et al. present ‘BloomFlash’, a scheme for flash storage devices. This scheme also achieves load balancing of elements across the BF components, which shows that the BF is useful in load balancing techniques [72].
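
To illustrate why a cuckoo filter suits the membership lookups mentioned above, the sketch below implements a small cuckoo filter with insert, lookup, and delete using partial-key cuckoo hashing. It is an illustrative assumption, not the Cuckoo-RPL mechanism or the protocol of Chaudhary et al.; the class name, bucket sizes, 8-bit fingerprints, and meter identifiers are hypothetical.

```python
import hashlib
import random


def _hash(data: bytes) -> int:
    """64-bit integer derived from SHA-256 (any uniform hash would do)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


class CuckooFilter:
    """Small cuckoo filter storing 8-bit fingerprints in fixed-size buckets."""

    def __init__(self, num_buckets=64, bucket_size=4, max_kicks=200, seed=0):
        if num_buckets <= 0 or num_buckets & (num_buckets - 1):
            raise ValueError("num_buckets must be a power of two")
        self.num_buckets = num_buckets
        self.bucket_size = bucket_size
        self.max_kicks = max_kicks
        self.buckets = [[] for _ in range(num_buckets)]
        self._rng = random.Random(seed)

    def _fingerprint(self, item: str) -> int:
        return _hash(b"fp:" + item.encode("utf-8")) % 255 + 1  # 1..255

    def _index(self, item: str) -> int:
        return _hash(item.encode("utf-8")) % self.num_buckets

    def _alt_index(self, index: int, fp: int) -> int:
        # XOR with the fingerprint hash; applying it twice returns the original index.
        return (index ^ _hash(fp.to_bytes(1, "big"))) % self.num_buckets

    def _candidates(self, item: str):
        fp = self._fingerprint(item)
        i1 = self._index(item)
        return fp, i1, self._alt_index(i1, fp)

    def insert(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        # Both buckets are full: evict a resident fingerprint and relocate it.
        i = self._rng.choice((i1, i2))
        for _ in range(self.max_kicks):
            slot = self._rng.randrange(self.bucket_size)
            fp, self.buckets[i][slot] = self.buckets[i][slot], fp
            i = self._alt_index(i, fp)
            if len(self.buckets[i]) < self.bucket_size:
                self.buckets[i].append(fp)
                return True
        return False  # filter treated as full; the last evicted fingerprint is dropped

    def __contains__(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        return fp in self.buckets[i1] or fp in self.buckets[i2]

    def delete(self, item: str) -> bool:
        fp, i1, i2 = self._candidates(item)
        for i in (i1, i2):
            if fp in self.buckets[i]:
                self.buckets[i].remove(fp)
                return True
        return False


# Hypothetical allow-list of smart meter identifiers.
allowed = CuckooFilter()
inserted = [allowed.insert(m) for m in ("meter-001", "meter-002", "meter-003")]
known_before_delete = "meter-002" in allowed
deleted = allowed.delete("meter-002")
```

Unlike a plain BF, this structure supports deletion, and each lookup inspects at most two buckets, which is what makes it attractive for fast forwarding decisions.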

### 7.6.Smart governance

Smart governance means accessing government services freely and in a better manner through various free data projects. With the emergence of ICT in smart cities, smart governance can improve the services provided by the cities [159]. Developing the businesses of the individual smart economies can play an important role in providing a collaborative platform. A smart economy also emphasizes economic competitiveness in the development of the city and the competitive edge of economic and human activity [143].

Smart Economy: – A smart economy, which aims for success with high social welfare, sustainability, and resource efficiency, is based on innovative technologies. It also helps improve the quality of life by adopting innovative ideas and fostering new entrepreneurs and start-ups, which increase competitiveness and productivity [263]. The whole system is based on technologies and ICT for urban planning and economic advancement. For social benefit, a smart economy is also expected to deliver more products and services without compromising on energy and pollution [144].

Smart Government: – Smart government is a timely demand of the 21st century, and smart governance is its key tool. The key pillar of smart governance is the use of technology, and public administration is required to update itself as technology emerges. No one can oppose adopting smart governance, as it is the new face of public administration, governance, and the political process [237]. E-government, which adopts electronic processes in the administrative and political systems, is the starting point of smart government. Developed countries and their researchers, academicians, politicians, and practitioners increasingly use modern technologies in public administration for smart governance systems [182]. The role of smart government is presented in Fig. 23.

##### Fig. 23.

Role of smart government.

An intelligent network is created in the governance sector; it is directly connected to the internet, so people can communicate with each other even in remote areas. Better communication with real-time objects in the intelligent network is achieved through the distributed network [234]. Because it is not an automated decision-maker, it is not artificial intelligence: it only connects people in order to gather information, make decisions, and use them in the future. Due to the rapid growth of IoT devices and digital applications, a lot of data is produced. The sources of big data are social networking websites, mobile phones, daily household appliances, various private and government websites, and smart devices being used by various researchers.

The solutions to these challenges, problems, and threats are big data-driven technologies [75] like PDS. With proper use of big data, government agencies can take sound, forward-looking decisions and identify criminals and corruption. Generating and managing knowledge is a major responsibility of the government [5].

#### 7.6.1.Challenges in smart governance

Adopting Big data technologies: Some developing countries are already adopting big data technologies in smart governance. However, many face problems due to poor handling and management, lack of knowledge, cost, and the available data being either unstructured or semi-structured.

Data Privacy: Much confidential data is uploaded by people and used by public administration. So, it needs to be protected from malicious users.

Applications of PDS in Smart Governance: We have not found any existing use of PDS in smart governance, so there is considerable scope for applying PDS here. PDS is an efficient data structure for managing data storage and is also compatible with new technologies. PDS also helps in protecting data.

### 7.7.Smart society

A smart society aims to promote the satisfaction and well-being of metropolitan residents. In this sense, a smart society includes a large number of smart elements: people, infrastructure, education, living, water and waste management systems, and many more [243]. Society 4.0 faces challenges with sharing information and related knowledge, while Society 5.0 faces increased process complexity and the need to assure sustainability, owing to the massive amount of data combined with environmental and human physical investigation. The major challenge with big data is to take real-time decisions [94].

#### 7.7.1.Society evolution

Society coexists with nature. According to ethnographers, society began with the birth of humans as the hunting society (Society 1.0). By 13,000 BC, settlements had been firmly established and irrigation techniques had been developed; this is known as the agrarian society (Society 2.0). At the end of the 18th century, steam locomotives were invented and mass production started; this was the industrial society (Society 3.0). In the latter part of the 20th century, computers were invented and the distribution of information began; this is the information society (Society 4.0). At the beginning of the 21st century, the “super-smart society” was introduced, known as Society 5.0. The social evolution is categorized and summarized below (Fig. 24).

##### Fig. 24.

Society evolution 1.0–5.0 [95].

Society 1.0: – This society’s evolution begins with the birth of humans. It is also called the hunting society. People used simple tools to fulfill their daily needs, including food, and changed their habits according to the availability of resources [138].

Society 2.0: – This social evolution was introduced in 13,000 BC with newly developing agricultural techniques, and it is also known as the agrarian society. It grew out of the earlier society through the advancement of technology and demographic changes [29]. In Mesopotamia, hand-made pottery and the cultivation of barley and wheat were found at early stages [14].

Society 3.0: – This societal revolution began at the end of the 18th century, following the development of modern physics, the law of gravity, and the invention of the steam engine, and is also called the Industrial Society. This society changed the face of the earth forever. As already discussed, the industrial revolutions are among the growing fields in terms of the economy as well as academia [254]. This society also built relationships across transportation, the environment, society, and many more.

Society 4.0: – This is the Information Society, initially planned in 1972 in Japan. This society aims for a new era following the post-industrial revolution in the year 1985 [179]. Here the production of information promotes human creativity and the transition and development of society.

Society 5.0: – This revolution was introduced at the beginning of the 21st century with the vision of a “super-smart society”. Because it is human-centered, this society provides solutions to many social problems through technology. It also improves the quality of life, increases the use of robots, and is environment-friendly [14].

#### 7.7.2.Applications of smart society

The smart society is directly concerned with citizens and daily lives. The main applications are discussed below (Fig. 25).

Smart Home/Houses: – Homes equipped with technologies that increase quality of life and independence are called “smart homes” [73]. In a smart home, appliances such as televisions, air conditioners, smart fans, and lights are connected with IoT devices to deliver services effectively. To provide these services efficiently, the Smart Home Reasoning System (SHRS) plays an important role in making decisions [181]. The primary objectives of a smart home are reducing environmental emissions, managing energy, and increasing home automation [229].

Smart Living/People: – Smart living gives citizens new opportunities to increase their standard of living. It needs to follow an inclusive strategic approach across all age groups and demographics [203]. It provides solutions that are controllable, productive, sustainable, economical, and efficient. Smart living is changing people’s lives with the emergence of new technologies. Yan et al. propose an architecture that controls home lighting using a Bluetooth-based Android smartphone [295]. Another objective of smart living is that people are safer in their homes and, if they face any problem, can call for help independently [153].

Smart Buildings: – A smart building is defined as one that offers efficient energy management and a convenient, comfortable environment at reasonable investment, and that is designed to provide service and management [169]. A smart building also includes automated processes like security, heating, lighting, air conditioning, ventilation, and many more [221]. Developing a smart building requires future-proof devices, an IT-skilled team, and a robust wireless infrastructure. The basic security components of a smart building include a CCTV system, access control, an intrusion system, gate automation, etc. [184].

##### Fig. 25.

Applications of smart society.

Smart Education: – With the advancement of technologies, everything may be interconnected and instrumented with AI [313]. Smart education is an emerging area that needs attention from both researchers and academics [135]. Various smart education projects [121] have been carried out in recent years; the first was carried out in Malaysia in 1997 [52]. Smart education [62] faces certain issues like accessing student knowledge, comparing behavioral patterns of a student, data integration, data mining, detecting effective and emotional state of the student, and many more [60].

#### 7.7.3.Challenges in smart society

Efficiency: – All resources used in a smart society, such as battery power, memory storage, and communication bandwidth, are limited. These limits directly affect efficiency, and their impact grows with the rapid growth in data.

Heterogeneous data: – The data generated in the super-smart city is heterogeneous, which leads to challenges in processing, analyzing, and mining data. Getting adequate information from big data in heterogeneous information networks is also a big challenge.

Privacy preserving: – As smart technologies spread in a smart society, security and privacy become the main concern. Challenges arise in all aspects, such as access control, authentication, and policy enforcement.

Applications of PDS in Smart Society: We have not found any existing use of PDS in the smart society, so there is considerable scope for applying PDS here. In view of the above challenges, many studies show that PDS and its variants give efficient results in memory and storage management and in handling heterogeneous data. PDSs also provide security to the data.

### 7.8.Smart sustainability

The idea of sustainability was introduced in 1987. Sustainability means meeting today’s needs without sacrificing the ability of future generations to fulfill their own social, economic, and environmental requirements [19]. Emerging technologies and digital governance are also part of smart sustainability [178]. It also requires local government to balance technology, policy, and management [235]. The main pillars of sustainability are environmental, economic, and social. Smart sustainability has the potential to solve the problems of urban areas [296]. Governance plays an important role in sustainable development [84]. However, stakeholders and policymakers lack practical research knowledge for planning and implementing sustainable development. To address this, new technologies like ICT, IoT, and the cloud are being used in smart cities. Compared with sustainable cities, the smart city framework focuses on modern technology and creativity, depending on the data-driven identification of the dynamic changes in the broadcast relationship [258]. To achieve the required level of sustainability, the data coming from various sources in smart cities needs to be well structured. The aspects measured by IoT technologies in smart sustainable cities are air and water quality, green urban areas, tourism and culture, energy, digital transformation, legality, and security (Fig. 26). Sustainable economic advancement includes all the factors that are included in a smart city: green building, smart education, social responsibility, water management, sustainable energy, smart health, smart governance, natural resource management, sustainable transportation, and waste management (Fig. 27).

##### Fig. 26.

Smart sustainable.

##### Fig. 27.

Smart sustainable city.

#### 7.8.1.Challenges in smart sustainability

Collection of data: – Developing sustainability requires a huge amount of data to be collected in one place.

Storage management: – Managing this data properly is a big challenge.

Privacy and Security: – The security and privacy of this data are also a main concern.

Applications of PDS in Smart Sustainability: We have not found any existing use of PDS in smart city sustainability, so there is considerable scope for applying PDS here. PDS and its variants are very effective for fast data collection and efficient storage management. They can also secure data movement in smart sustainability.

## 8.Comprehensive analysis of PDS in smart city

In the smart city, data and information are the gateway to immediate competitive benefits. Today, billions of people are accessing and releasing huge amounts of data via the internet and social networks. This growth in data requires efficient storage, and handling the data is a big challenge for both academia and industry [96]. To improve the efficiency of data access and testing, the monthly or annual data produced by various companies, hospitals, institutions, and forests is stored at data centers [118]. The variants of data structures in PDSs are important for big data and live streaming systems. A BF is a probabilistic randomized data structure, introduced by Burton Bloom, for efficiently storing information about static sets to support membership queries [36]. Presently, BF is widely used in many networking and security algorithms [201]. A study of the existing uses of PDS in smart cities shows that many researchers have used BF in healthcare for efficient storage of patient data, privacy preservation, etc. in various applications. Some of the existing applications of PDS in a smart city are summarized below (Table 6).
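
The trade-off between memory and false positives in a BF can be made concrete with the standard sizing formulas: for n stored items and a target false positive rate p, the bit-array size is m = -n ln p / (ln 2)^2 and the number of hash functions is k = (m/n) ln 2. The sketch below evaluates them in Python; the function names and the example workload of one million items are assumptions for illustration, not figures from the surveyed works.

```python
import math


def bloom_parameters(n_items: int, target_fpr: float):
    """Return (m_bits, k_hashes) from the standard Bloom filter sizing formulas."""
    if n_items <= 0 or not 0.0 < target_fpr < 1.0:
        raise ValueError("need n_items > 0 and 0 < target_fpr < 1")
    ln2 = math.log(2)
    m_bits = math.ceil(-n_items * math.log(target_fpr) / ln2 ** 2)
    k_hashes = max(1, round(m_bits / n_items * ln2))
    return m_bits, k_hashes


def expected_fpr(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Approximate false positive rate after n_items insertions."""
    return (1.0 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


# Hypothetical workload: one million records at a 1% target false positive rate.
m_bits, k_hashes = bloom_parameters(1_000_000, 0.01)
approx_fpr = expected_fpr(m_bits, k_hashes, 1_000_000)
megabytes = m_bits / 8 / 1_000_000
```

Under these assumptions the filter needs a little under 10 bits per item with 7 hash functions, which comes to roughly 1.2 MB for one million records regardless of how large each record is.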

##### Table 6

Existing use of PDS in smart city

| Author/Year | Area | PDS | Description | Contribution of PDS | Limitation |
|---|---|---|---|---|---|
| D Liu et al. [161]/2022 | Smart Cities | Counting Bloom Filter (CBF) | A distributed and secure data storage scheme is proposed for blockchain-enabled edge computing. | CBF is used where the storage check has failed; it then recognizes the data dynamically and locates erroneous data. | It is tough to calculate the value of the counter, and it also increases the memory overhead. |
| CongPu et al. [217]/2022 | General* | Bloom Filter (BF) | Two mechanisms, liteSAD and proDIO, are proposed to investigate the sybil attack. | BF is used to reduce the processing time and memory cost. | BF is not removed from this proposed scheme. |
| G. S. Aujla et al. [111]/2021 | Healthcare Vehicles in COVID | Deletable Bloom Filter (DBF) | DBF is used to overcome congestion and improve responsiveness. DBF maintains the global information of the flow tables and edge devices. | DBF facilitates the interoperability of network devices. | If more than one item has the same bit index in DBF, a collision occurs; issues of fault tolerance and flow table management. |
| Seham A., et al. [11]/2021 | Healthcare | Bloom Filter (BF) | An infection control system is introduced that uses blockchain for privacy preservation. In this system, one leader elected by the authority updates two BFs, one for infected users and the other for close-contact users. | Two BFs are used for infected and suspected users, which reduces the storage space. | A major drawback of using BFs is that there is no function for deleting data. |
| Heiko et al. [40]/2021 | Smart City | Bloom Filter (BF) | A fully decentralized overlay network is proposed for trust and reliability issues. | BF is used to increase the privacy of clients. | Data security and privacy issues for overlay users. |
| Alshdadi et al. [15]/2021 | Smart Vehicle (Transport) | Bloom Filter (BF) | A system is proposed to minimize cyber attacks; it also increases the security of smart vehicles. The system is an IoT-based Cyber-Physical System. | BF is used for authenticating the vehicle IDs. | No proper data management. |
| S Bhatia et al. [35]/2021 | Healthcare | Morton Filter (MF) (advanced Cuckoo filter) | A technique is proposed to secure patients' personal data, using the cloud for the electronic transfer of patient records. | Morton filter improves security and throughput compared with existing mechanisms. | It uses underloaded buckets and many sparse buckets that are combined into a block so that data is stored more densely. |
| V Leithardt et al. [154]/2021 | Smart Transportation | Bloom Filter (BF) | A system is designed to provide security and privacy for the data used in License Plate Recognition (LPR) in the smart city. It also improves the performance of blockchain-based storage. | BF is used to maintain the user’s privacy and also to resist attacks from third parties. | Blockchain may fail due to shortcomings in engineering requirements and the lack of a standard. |
| K Wang et al. [283]/2021 | Healthcare | Bloom Filter (BF) | A system is proposed that adds privacy to a searchable encryption method for patient data. | BF is used for searching the values, which are stored in a verification table. | Use of multiset hash. |
| F. Alassery et al. [12]/2021 | Smart Environment | Bloom Filter (BF) | A mechanism is proposed for fast packet delivery in IoT using BF. The size of routing information is reduced by using aggregation. | BF is used in all sensor nodes for collecting routing data. | Battery life is affected as the size of the BF increases. |
##### Table 6

(Continued)

| Author/Year | Area | PDS | Description | Contribution of PDS | Limitation |
|---|---|---|---|---|---|
| Soleymani et al. [252]/2021 | Smart City, Smart Transportation | Quotient Filter (QF) | A scheme is proposed for privacy preservation and message authentication of vehicle nodes. | QF maintains the authorization of vehicles in VANET. | Security needs more focus. |
| D.S. Jean Michel et al. [69]/2021 | Smart Grid, Smart Energy | Cuckoo Filter (CF) | This project presents the analysis and storage management of the smart grid’s big data. | CF is used to store and access the smart grid’s data. | Security needs to be improved. |
| C Kalalas et al. [132]/2020 | Smart Transportation | Cuckoo Filter (CF) | A scheme for vehicle authentication that extends 5G-AKA (authentication and key agreement) is proposed. CF is used to improve the space efficiency. | CF is used to achieve authentication of multiple vehicles at a time within the space requirement. | No road side unit for broadcast to adjacent vehicles for message verification. |
| PP Ray et al. [223]/2020 | E-Healthcare | Bloom Filter (BF) | A blockchain- and IoT-based scheme is proposed to simplify the payment verification process in real-life healthcare applications. | BF is used in privacy-preserving Bitcoin transactions. | Raises the problem of resource-constrained tools; limited due to new technology. |
| Su, Yuan, et al. [259]/2020 | Electronic Health Records (EHRs) | Cuckoo Filter (CF) | An authorized certificateless conjunctive keyword search on encrypted EHRs is proposed. | Improves search efficiency and allows data owners to flexibly manage (insert and delete) their EHRs in the cloud. | The space of hash tables in the cuckoo filter becomes smaller because false positives cannot be avoided. |
| Singh A et al. [245]/2020 | Smart Devices, Smart Grid | Bloom Filter (BF) | A scheme is proposed to handle data traffic by managing network resources. It also performs security checks for the secure transfer of data using double hashing. | BF is used for storage management. | Does not consider the impact on quality-of-service (QoS). |
| B Peng et al. [211]/2020 | Air Quality, Smart Environment | Bloom Filter (BF) | A query-optimized method is proposed for storing Optimized Row Columnar (ORC) format data for air quality, based on a row group index and a BF index. | BF is used for indexing. | Adjusting the number of hash functions and the bit set length is required for best efficiency. |
| Che et al. [57]/2020 | Industry, General* | Bloom Filter (BF), Quotient Filter (QF) | A new data structure, the ‘Improved Streaming Quotient Filter (ISQF)’, is proposed to detect and delete duplicate data. | ISQF is used to store the signatures of elements in a data stream and provides a nearly zero error rate. | Needs to handle conceptual data drift. |
| S. Garg et al. [98]/2020 | Internet of Vehicle | Count-min Sketch (CMS), Bloom Filter (BF), Quotient Filter (QF), HyperLogLog (HLL) | A scheme is proposed for the Software-Defined Internet of Vehicle (SD-IoV) to manage data traffic, detect anomalies in suspicious nodes, and check cardinality using PDS. | CMS is used for traffic management, BF for anomaly detection, QF for fast and efficient storage of nodes, and HLL to measure the cardinality of each flow passing through a switch. | Sensitive information can be compromised using attacks. |
| B. Charyyev et al. [55]/2020 | Smart Home | Locality Sensitive Hashing (LSH) | A method is proposed to analyze the voice and use the network traffic of a smart speaker to fingerprint the voice command. | LSH is used to analyze the voice command for smart home speaker assistance. | Traffic flow classification. |

| Author/Year | Area | PDS | Description | Contribution of PDS | Limitation |
| --- | --- | --- | --- | --- | --- |
| N. Giatrakos et al. [103]/2020 | Smart City | Locality Sensitive Hashing (LSH) | A technique is proposed to provide a direct way for the accuracy of bandwidth during the outlier detection procedure. The authors also elaborate on the applicability of their technique in smart city applications. | LSH is used during outlier detection for examining the operational mode. | Not able to detect network-level attacks. |
| S. Kulkarni [145]/2020 | General* | Count-min Sketch (CMS) | Analyzes the various methods of data streaming. | CMS is one of the useful sketches for checking the number of occurrences of standard things. | No implementation proof. |
| F. Peng et al. [212]/2019 | Smart City, Smart Environment | Bloom Filter (BF) | Proposes a scheme to achieve data privacy that optimizes the local differential privacy algorithm in mobile crowdsensing systems; a data aggregation algorithm is proposed for data analysis. | BF is used to remove noisy data and reduce the number of participants in a task. | Difficulty obtaining meaningful statistics because of the large BF size. |
| A. Islam et al. [125]/2019 | Healthcare | Bloom Filter (BF) | A blockchain-based scheme is proposed to protect the healthcare system from cyber threats. | BF is used to reduce data transmission when authenticating users. | As the number of users and cases increases, processing and validation time also increases. |
| Xu et al. [293]/2019 | E-Healthcare | Variant Bloom Filter (VBF) | This study addresses data sharing in e-healthcare systems to help the cloud achieve privacy protection. BF and a message verification code are used to protect healthcare data. | VBF is used with a message authentication code to classify Personal Health Information (PHI) files. | Data deletion is difficult and a false positive rate may exist. |
| Mahmoud et al. [173]/2019 | Smart Environment (Water Distribution) | Bloom Filter (BF) | A blockchain-based technique for smart meter data aggregation in water distribution systems is proposed. | BF is used to identify the customer. | Data loss due to data tampering; requires high data integrity. |
| T Zhang et al. [303]/2019 | Smart Meter, Smart Grid | Cuckoo Filter (CF) | Studies and proposes a new blackhole attack that bypasses existing defense mechanisms and, to protect Advanced Metering Infrastructure (AMI) from this attack, a new technique, 'Cuckoo-RPL' (Routing Protocol for Low-Power and Lossy Networks), based on the cuckoo filter. | CF is used to create a hash table that stores all the legal members of the AMI network. | Considers only external attacks, not internal attacks. |
| S. Garg et al. [99]/2019 | Smart Transport | Quotient Filter (QF) | Proposes a technique to secure Vehicular Ad hoc Network (VANET) communication using QF. | QF is used to check whether a node has entered the network and to check for any attack initiation in the network. | Security needs more focus. |
| Ni et al. [198]/2019 | Smart Parking | Cuckoo Filter (CF) | Proposes a secure and privacy-preserving parking protocol using two-factor authentication for self-driving vehicles. | CF is used to protect the user's location privacy. | No security against cyber attacks. |

| Author/Year | Area | PDS | Description | Contribution of PDS | Limitation |
| --- | --- | --- | --- | --- | --- |
| Liu, Hong, et al. [162]/2018 | Healthcare | Bloom Filter (BF), MinHash | A scheme is proposed for privacy preservation of wearable devices, data access control, and unique authentication in smart healthcare systems. | BF is used for data efficiency without disclosing private information; MinHash is used for privacy-preserving authentication to find similar data fields without using the personal information of different patients. | Some issues in big data analysis, prediction, and intelligent inference. |
| Dong Zheng et al. [306]/2018 | Smart Healthcare | Bloom Filter (BF) | A scheme for sharing medical data efficiently is proposed. Attribute-based encryption is used for user privacy. | BF is used to control access by hiding all attributes. | Fails in ciphertext verification on the cloud. |
| Mahmood A et al. [65]/2017 | Smart Transportation | Cuckoo Filter (CF) | A hardware-independent privacy-preserving scheme is proposed for Vehicular Ad-hoc Networks (VANETs). | CF improves authentication efficiency in the batch message verification phase. | No signature authentication. |
| W Song et al. [253]/2017 | General* | Bloom Filter (BF) | Proposes a secure and efficient scheme that provides privacy when retrieving large amounts of encrypted cloud data. | BF is used in the retrieval algorithm for tree indexing. | No risk evaluation; security risk from collusion attacks. |
| Zhang et al. [300]/2016 | Smart Transportation (RFID) | Bloom Filter (BF) | A mechanism is proposed that reduces the data transmission rate during the identification process to improve efficiency and accuracy. | BF is used to increase efficiency by reducing the transmission rate during the identification process. | More hash functions are required to reduce the false positive rate. |
| E Yousef et al. [297]/2016 | Smart Environment (Water Pollution) | Bloom Filter (BF), Counting Bloom Filter (CBF) | An energy-efficient scheme for monitoring water pollution is proposed. | BF is used to save energy by reducing the transmission rate. | Privacy needs improvement; data management. |
| Amadeo M et al. [16]/2016 | Smart City | Bloom Filter (BF) | Proposes an Information Centric Networking (ICN) model that improves data dissemination. | BF is used for storing path information in source routing. | Practical deployment of ICN. |
| A Goyal et al. [109]/2016 | Industry, General* | Bloom Filter (BF) | Proposes a space-efficient algorithm using BF and a de-normalized schema to validate data across two databases (RDBMS and NoSQL) for decision making and providing accurate information. | BF is used to check set membership with the is_member function. | Small probability of false positives. |
| B Wang et al. [282]/2014 | General* | Bloom Filter (BF), Locality Sensitive Hashing (LSH) | Proposes a scheme to overcome the problem of multi-keyword fuzzy search over encrypted data. | An LSH function is used in the BF to construct the file index and provide an efficient solution. | Not able to represent identical bi-grams (used for keyword construction). |
| G. Li et al. [158]/2014 | General* | Count-min Sketch (CMS) | A scheme is proposed for anomaly detection in Wireless Sensor Networks (WSNs) using CMS. | CMS is used to summarize the data. | Yet to be implemented. |
| Vatsalan D et al. [277]/2013 | Healthcare, Government | Bloom Filter (BF) | Proposes a record linkage technique between databases and organizations that also provides privacy for records. | BF is used for record matching. | Not able to deal with re-identification attacks. |

| Author/Year | Area | PDS | Description | Contribution of PDS | Limitation |
| --- | --- | --- | --- | --- | --- |
| Beretka et al. [33]/2013 | Smart Energy | Locality Sensitive Hashing (LSH) | Proposes an algorithm to raise power quality through distributed local generation. | LSH is used as feature sets, which are extracted from load data using auto-encoders. | The user requires prior training of the auto-encoder. |
| Durham et al. [81]/2010 | Healthcare | Bloom Filter (BF) | A mechanism is proposed for matching patient records using a string comparison method to integrate each record with the corresponding patient. | BF is used for approximate matching with a patient medical record. | Too many hash functions for each field. |

*: – May be applicable in smart city applications.

## 9.Generational data management

A massive amount of data flows through a smart city. This data must be managed well so that it can be used for both personal and civic data services. The use of Web 3.0 for data management is also important. Islam et al. proposed a blockchain-based mechanism using a Bloom filter that provides protection from cyber threats in the healthcare system [125]. In this regard, Liu et al. proposed a blockchain-based distributed data storage scheme enabled by edge computing. When a storage check fails, the counting Bloom filter is used to locate the erroneous data and to realize data dynamics [161]. Similarly, Nie et al. proposed a secure and privacy-preserving blockchain-based data-sharing scheme. For secure profile matching, the 'key-policy attribute-based encryption' algorithm is used, and a Bloom filter with hash functions is designed to verify the authenticity of the ciphertext [200]. Singh et al. proposed a secure, energy-efficient, and blockchain-enabled framework for smart parking in a sustainable city environment. For secure communication of parking zone data, the Elliptic Curve Cryptography (ECC) algorithm is used at the transport layer to encrypt and decrypt the data [247].
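
To illustrate why a counting Bloom filter suits data that changes over time, the following is a minimal sketch of the generic data structure, not the scheme of any paper cited above. The class name, the parameters `num_counters` and `num_hashes`, and the block identifiers are assumptions made for this example. Replacing each bit with a small counter is what allows items to be removed again.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant that keeps counters instead of bits,
    so items can be removed as well as added (illustrative sketch)."""

    def __init__(self, num_counters=1024, num_hashes=4):
        # Both sizes are arbitrary illustrative defaults.
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive one position per hash index by salting a SHA-256 digest.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Refuse to remove items that are definitely absent; decrementing
        # their positions would corrupt the counters of other items.
        if item not in self:
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True

    def __contains__(self, item):
        # False means "definitely absent"; True means "probably present".
        return all(self.counters[pos] > 0 for pos in self._positions(item))


# Hypothetical storage block identifiers.
cbf = CountingBloomFilter()
cbf.add("block-17")
cbf.add("block-42")
cbf.remove("block-17")
print("block-42" in cbf)  # an added item is never reported as absent
```

A plain Bloom filter could not support the `remove` step, because clearing a bit might also erase evidence of other items that hash to the same position.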

## 10.Research opportunities and challenges in smart city

As discussed in Section 1.1, big data analysis, retrieval, and processing are highly important from the perspective of smart cities. Storage and retrieval of large volumes of unstructured data, especially when responses are required in real time, remain a significant challenge for researchers. From the extensive literature review (LR) carried out, some of the identified research areas are:

Filtering and processing of sensor data: – The data generated by various sensors and wearable devices has limitations related to security, privacy, ethics, data format, user acceptance, and big data concerns. There are also incompatibility issues between data and information. The information collected by these devices may contain noise. Sensor data may also be corrupted by signal artifacts such as missing values and noise, which significantly reduce performance. This issue needs to be addressed before the data is used for any further analysis.

Difficulty of monitoring users' social networking data: – It is difficult to monitor users' data on social networking sites. In smart healthcare, doctors cannot rely on data from social media. Still, a system named emotional healthcare has been presented for detecting psychological disturbance in patients. Techniques used to detect depressive and stressful content include Convolutional Neural Networks (CNN), Recurrent Neural Networks (RNN), and Bi-directional Long Short-Term Memory (Bi-LSTM). Recommendation systems have also been proposed that send patients a text message based on the results obtained while they were monitored.

Lack of efficient computational models and data structures: – The expansion of massive data has generated large and complicated data sets. Traditional methods for storing and retrieving data increase the computation overhead in big data processing and are therefore unable to meet users' requirements. Big data handling requires advanced data models that produce results in minimum time with minimum computation overhead. One of the basic problems of massive data is to design efficient computational models and data structures for solving these problems.
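
As a rough illustration of the space and accuracy trade-off that such data structures offer, the standard sizing relations for a Bloom filter (BF) are sketched below. The symbols are introduced here for illustration and are not part of the original text: $n$ is the number of stored items, $m$ the number of bits, $k$ the number of hash functions, and $p$ the false positive probability, under the usual assumption of independent and uniform hash functions.

$$
p \approx \left(1 - e^{-kn/m}\right)^{k}, \qquad
k_{\mathrm{opt}} = \frac{m}{n}\,\ln 2, \qquad
m = -\frac{n \ln p}{(\ln 2)^{2}}
$$

Under these assumptions, storing $n = 10^{6}$ items with $p = 0.01$ needs about $m \approx 9.6 \times 10^{6}$ bits (roughly 1.2 MB) and $k \approx 7$ hash functions, regardless of how large each individual item is.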

The need for energy-efficient data processing: – The rising use of IoT devices paved the way for the smart city. Energy management has become a major issue, as IoT devices constantly consume large amounts of energy. These concerns must be considered in order to establish an effective mechanism. Efficient use of power is essential to a sustainable city. In addition, IoT devices produce large amounts of data that need to be processed optimally. Another challenge in securing smart cities is data sharing and access control.

Data storage and processing: – Smart cities depend on robust storage and processing of data in the information world. Smart city applications continuously generate huge amounts of data from different sources. Existing traditional methods are insufficient to manage this volume of data: their processing speed is restricted and expanding storage is costly. To overcome this issue, efficient computational models and data structures are required.

Volume of data: – This challenge is difficult to quantify because data sets are typically very big, on the order of hundreds of terabytes or more. Traditional storage systems such as Relational Database Management Systems (RDBMS) and newer big data technologies such as Hadoop were developed to efficiently handle the data that must be kept and processed. New data structures are also required to handle this data and produce results at run time.

Real-time response: – The Response Time (RT) in the smart city is very important in terms of service, results, data transmission, and so on. RT reflects the fact that data transmission and business data infrastructure come at an elevated cost and should operate with minimal delay. Various techniques have been used, depending on the situation and the difficulty of the analysis. In smart cities, where responses must be given after processing huge amounts of data or while handling streaming data, traditional techniques and approaches are not efficient. New data handling techniques and data structures are therefore required to provide results in real time.

Variety of Data: – The variety of data sources in the smart city is also a major issue. Smart city data is available in different data sets for different applications and in different formats such as audio, video, images, and text. The data collected from this variety of sources is ambiguous, unstructured, or semi-structured. It needs to be organized in a well-structured manner for effective and efficient results. Traditional methods are not sufficient, or do not even provide accurate results, in real-time scenarios. For this data management, a new kind of data structure such as PDS is required.

Searching and retrieving a data item from big data: – Efficiently searching for and retrieving the appropriate data for review, in the petabyte and exabyte ranges and in a variety of formats, is a major challenge. In applications with associated deadlines, this challenge becomes even more demanding.
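
One common way a PDS helps with this search problem is to place a Bloom filter in front of a slow store, so that most lookups for absent keys never reach the store. The sketch below is a minimal illustration under stated assumptions: the class, the `lookup` helper, the sensor keys, and the dictionary standing in for a slow store are all hypothetical, and the sizing uses the standard Bloom filter formulas rather than anything specified in this survey.

```python
import hashlib
import math


class BloomFilter:
    """Minimal Bloom filter sized for a target false positive rate."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing: m = -n ln p / (ln 2)^2 and k = (m / n) ln 2.
        ln2 = math.log(2)
        self.num_bits = max(
            1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: position_i = h1 + i * h2, taken from one digest.
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def lookup(key, bloom, slow_store):
    """Consult the filter first; query the slow store only if the key may exist."""
    if not bloom.might_contain(key):
        return None  # definitely absent, so the expensive lookup is skipped
    return slow_store.get(key)  # may still be None on a false positive


# A dictionary stands in for a large, slow data store (hypothetical data).
store = {f"sensor-{i}": i for i in range(100)}
bloom = BloomFilter(expected_items=1000)
for key in store:
    bloom.add(key)
print(lookup("sensor-5", bloom, store))
```

The filter never misses a key that was added, so correctness is preserved; the only cost of a false positive is one unnecessary query to the slow store.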

Stream Processing (Data Collection and Distribution Analysis): – Streaming big data in smart cities is a big challenge. A huge amount of data flows through the smart city, and this data must be processed and analyzed. When raw data from sources such as vehicles and roads, geolocation sensors, social media, and weather is combined, streaming may cause issues such as long delays before results are available and access problems. There are also some problems in development, as it is still dependent. Various decisions and predictions, for example about traffic or future power consumption, are based on this type of data. Some traditional systems have been proposed, but with various limitations. A new mechanism needs to be designed for efficient and effective results.
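
As a hedged illustration of how a PDS can summarize such a stream in fixed memory, the following sketch implements a basic Count-Min Sketch (CMS) for approximate frequency counting. The class, its `width` and `depth` defaults, and the road segment identifiers are assumptions made for this example; they are not taken from any system surveyed here.

```python
import hashlib


class CountMinSketch:
    """Fixed-size summary of item frequencies in a stream (illustrative sketch)."""

    def __init__(self, width=2048, depth=4):
        # depth rows of width counters; both values are illustrative defaults.
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _columns(self, item):
        # One column per row, derived by salting a SHA-256 digest with the row.
        for row in range(self.depth):
            digest = hashlib.sha256(f"{row}:{item}".encode("utf-8")).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def update(self, item, count=1):
        for row, col in self._columns(item):
            self.table[row][col] += count

    def estimate(self, item):
        # Collisions can only inflate a counter, so the minimum over the rows
        # is the tightest available estimate and never undercounts.
        return min(self.table[row][col] for row, col in self._columns(item))


# Hypothetical stream of road segment readings from traffic sensors.
stream = ["seg-A", "seg-B", "seg-A", "seg-C", "seg-A", "seg-B"]
cms = CountMinSketch()
for segment in stream:
    cms.update(segment)
print(cms.estimate("seg-A"))  # at least 3, the true count
```

Memory use depends only on `width` and `depth`, not on the number of distinct items, which is what makes this kind of summary attractive for unbounded streams.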

Integration of Heterogeneous Data Sources (Diversity Consolidation): – The data coming from different sources is not in proper sequence and may be in different formats or ambiguous. Operations such as validation, authentication, updating, and alteration are difficult to apply to these data sets. This data needs to be well structured and organized.

Natural Text Analysis and Communication (social media analysis): – The analysis of natural text from social networks, available through mobile devices such as smartphones, is also challenging. This data is used for monitoring the behavior and emotions of citizens in real life. Location information from various social networking sites is not in a proper format. The data, which includes comments and statements about users' feelings, thoughts, interests, and relations, is integrated with sensor data. The reliability of this data is also a big challenge.

Ambient Intelligence issues/challenges (specific to the current generation of smart city domains): – The conversion from rural to urban areas and from urban areas to smart cities is growing rapidly. This also increases the usage and deployment of smart technologies in everything and everywhere in the city. As a result, smart cities may face some ambient intelligence issues. The main concerns in smart cities are AI and privacy when combined with automation and autonomous systems [257]. This may also create some design trade-offs, such as:

Human control vs. automation: Autonomous driving systems may fail to recognize a speed limit sign or be fooled by scam stickers on the road [86]. An autonomous car (Uber) was involved in a deadly accident with a woman walking at night [185]. Hard behavior is also one of the problems in autonomous systems. Examples are customer conversations with fully automated call centers and online shopping without human involvement. AI-based behavior lacks traceability and transparency and produces incomprehensible decisions.

Privacy vs. smartness: There is also a trade-off between privacy and intelligence. To provide smartness, the data should be provided to smart services. For this, blockchain is used, which is also cost-effective.

Infrastructure: To improve citizens' standard of living, sensor technology is used for gathering and analyzing information. These sensors generally collect data such as air quality, crime rates, and rush hour statistics. Installing these types of sensors involves a complicated and costly infrastructure [224].

Hackers vs. security: The threat level to security has also increased as the use of sensors and IoT technology has expanded.

Being socially inclusive: Programs like 'smart transit' are a great idea for a bustling city that needs real-time updates. They may also raise some issues: some people in smart cities cannot afford to take transit, not all elderly people use smart mobile devices or apps, and it is unclear how technology can be made usable and reachable for these groups of people.

## 11.Discussion and conclusion

In this paper, we have discussed the role of big data in Section 1.1, where we emphasize the importance of big data in smart cities. In Section 2, the generations of smart cities have been discussed. Many researchers have discussed various architectures, and the most appropriate architecture for the smart city is elaborated in Section 4. Los Angeles made the first contribution to smart city projects in 1974 by analyzing urban big data. Section 5 lists various projects to date as well as future plans regarding smart cities. The role of big data in smart cities is crucial. Probabilistic Data Structures (PDS) have been discussed (Section 6) as a key solution for many applications of smart cities. This paper also emphasizes the various applications of smart cities, such as smart healthcare, smart transportation, smart environment, agriculture, smart governance and economy, smart society, people, education, and smart sustainability (Section 7). After going through various proposed techniques in the area of smart cities, it has been found that big data has an influence on the smart city. Generated data is inconsistent, semi-structured, or unstructured; retrieval and storage management of data are inefficient; and privacy and security are major concerns. In the smart city, the collection of data is itself a big challenge. The data collected using IoT devices, records (medical history), social media, and web pages is too large, and much of it is redundant and unstructured. Various monitoring systems in smart cities also have issues in analyzing and representing this big data with low dimensions. Many researchers have put sincere effort into extracting information from a huge amount of knowledge databases. The main challenge in this effort is that there is no standard approach to efficiently map and keep big data on consistent data structures. Existing tools and techniques cannot work efficiently and satisfactorily in data management. For storing and processing data for optimal recovery and exploration, data structures like PDS are an adequate standard to use. To conclude this paper, after reviewing related work, we have listed the existing use and scope of PDS in various applications of smart cities in Table 7.

##### Table 7

Domain specific approaches of PDS in smart city

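| PDS | Smart Healthcare | Smart Transport | Smart Environment | Smart Industry | Smart Energy | Smart Governance | Smart Society | Smart Sustainability |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BF | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ |
| CMS | ⋆ | ✓ | ✗ | ✗ | ✓ | ✗ | ✓ | ⋆ |
| LSH | ✓ | ⋆ | ✓ | ✓ | ✓ | ⋆ | ✓ | ⋆ |
| QF | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | ⋆ | ✗ |
| CF | ✓ | ✓ | ✓ | ⋆ | ✓ | ✗ | ⋆ | ✗ |
| HLL | ✗ | ✓ | ✗ | ✗ | ⋆ | ✗ | ⋆ | ⋆ |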

⋆: – May be Applicable.

After analyzing the existing research, a few research opportunities and challenges have been identified for the future (Section 10). We will try to address some of these research challenges. The goal of this paper is to provide a comprehensive review of PDS and its applications in the domains of smart cities. The foremost aim of this paper is to provide a detailed survey of PDS in smart cities for readers and researchers who want to explore this field, along with the research opportunities in the domains.

None to report.

## Appendices

### Appendix Acronyms

The acronyms used in this paper are listed in Table 8.

##### Table 8

Acronyms used in the survey and their definitions

| Acronym | Definition | Acronym | Definition |
| --- | --- | --- | --- |
| AI | Artificial Intelligence | ORC | Optimized Row Columnar |
| AMI | Advanced Metering Infrastructure | PHI | Personal Health Information |
| AMQ | Approximate Member Query | PoAuth | Proof-of-Authentication |
| APN | Access Point Name | PURSUIT | Pursuing a Pub/Sub Internet |
| ARRA | American Recovery and Reinvestment Act | QF | Quotient Filter |
| BF | Bloom Filter | RDBMS | Relational Database Management Systems |
| Bi-LSTM | Bi-directional Long Short-Term Memory | RFID | Radio Frequency Identification |
| CBF | Counting Bloom Filter | RGI | Row Group Index |
| CC | Cloud Computing | RNN | Recurrent Neural Networks |
| CCTV | Closed Circuit Television | RPL | Routing Protocol for Low-Power and Lossy Networks |
| CF | Cuckoo Filter | RSU | Road Side Unit |
| CIAM | Content-centric IoT-based Air pollution Monitoring | RT | Response Time |
| CMS | Count Min Sketch | SAE | Stacked Auto-Encoder |
| CMoS | Central Monitoring Station | SAM | Smart Agricultural Monitoring |
| CNN | Convolutional Neural Network | SAR | Synthetic Aperture Radar |
| COVID-19 | Coronavirus Disease-19 | SDN | Software Defined Networking |
| DDS | Digitale Stad | SEM | Smart Environment Monitoring |
| DoS | Denial of Services | SHRS | Smart Home Reasoning System |
| DRL | Dynamic Range Learning | SIS | Smart Irrigation System |
| EHRs | Electronic Health Records | SQL | Structured Query Language |
| FMS | Farm Management System | SSD | Single Shot MultiBox Detector |
| GDP | Gross Domestic Product | STS | Smart Transportation System |
| GoI | Government of India | SVM | Support Vector Machine |
| GPS | Global Positioning System | UAV | Unmanned Aerial Vehicle |
| HDFS | Hadoop Distributed File System | VANETs | Vehicular Ad-hoc Networks |
| ICT | Information and Communication Technology | VBF | Variant Bloom Filter |
| IIoT | Industrial Internet of Things | VPN | Virtual Private Network |
| IoHV | Internet Healthcare Vehicle | VMKSE | Verifiable Multi-Key Searchable Encryption |
| IoT | Internet of Things | WDS | Water Distribution System |
| ISQF | Improved Streaming Quotient Filter | WFH | Work From Home |
| ITS | Intelligent Transportation System | WSNs | Wireless Sensor Networks |
| LRP | License Plate Recognition | YSCP | Yokohama Smart City Project |
| LSH | Locality Sensitive Hashing | 3G, 4G | Third, Fourth Generations |
| LSTM | Long Short-Term Memory | 5G | Fifth Generations |
| LTE | Long-Term Evolution | 6G | Sixth Generations |