Latest IEEE Big Data Projects:
TN Tech World (TNTW) is a pioneer in developing the latest IEEE Big Data Projects online for CSE, IT, and MCA students. TN Tech World offers customized projects from the 2023 IEEE Big Data Project list online. Java Big Data Projects are developed using the Eclipse IDE with MySQL as the database. TN Tech World (TNTW) provides the Latest IEEE Big Data Projects online for CSE, IT, B.E., B.Tech., M.E., M.Tech., M.S., M.Sc., MCA, B.Sc., BCA, and Diploma students.
TN Tech World (TNTW) brings you the widest variety of the latest IEEE Big Data Projects online for CSE students, researchers, and engineers. Get big data project topics and ideas for study and research. Our list of big data projects is updated every month to add the latest IEEE Big Data Projects for CSE students as per the latest technologies. Keep visiting this page for the updated list of the Latest IEEE Big Data Projects for CSE that use data mining to deliver various functionalities.
Big Data:
Big Data is a term used to describe collections of data that are huge in size and still growing exponentially with time. Examples of big data analytics sources include stock exchanges, social media sites, and similar high-volume systems. Big data refers to massive, complex, and high-velocity datasets; it is also the fuel that powers much of the progress in AI decision-making. In addition, big data can be explored and analyzed for information and insights.
Advantages:
- Customer acquisition and retention
- Focused and targeted promotions
- Identification of potential risks
- Faster innovation
- Better management of complex supplier networks
- Cost optimization
- Improved operational efficiency
Latest IEEE Big Data Projects List:
Abstract
Clustering techniques have been widely adopted in many real-world data analysis applications, such as customer behavior analysis, targeted marketing, and digital forensics. With the explosion of data in today's big data era, a major trend for handling clustering over large-scale datasets is to outsource it to public cloud platforms, because cloud computing offers not only reliable services with performance guarantees but also savings on in-house IT infrastructure. However, as datasets used for clustering may contain sensitive information (e.g., patient health information, commercial data, and behavioral data), directly outsourcing them to public cloud servers inevitably raises privacy concerns. In this paper, we propose a practical privacy-preserving K-means clustering scheme that can be efficiently outsourced to cloud servers. Our scheme allows cloud servers to perform clustering directly over encrypted datasets, while achieving computational complexity and accuracy comparable to clustering over unencrypted data. We also investigate the secure integration of MapReduce into our scheme, which makes it well suited to the cloud computing environment. Thorough security analysis and numerical analysis demonstrate the performance of our scheme in terms of security and efficiency. Experimental evaluation over a dataset of 5 million objects further validates the practical performance of our scheme.
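The encrypted, outsourced protocol is specific to the paper, but the computation it outsources is the standard K-means iteration. Below is a minimal plain-text Java sketch of that iteration, given only as an illustration of the underlying clustering step; class, method, and variable names are our own and this is not the paper's privacy-preserving scheme.

```java
import java.util.Arrays;
import java.util.Random;

/** Minimal plain-text K-means (Lloyd's iteration); illustrative only. */
public class KMeansSketch {

    public static double[][] cluster(double[][] points, int k, int iterations) {
        Random rnd = new Random(42);
        double[][] centers = new double[k][];
        for (int i = 0; i < k; i++) {
            centers[i] = points[rnd.nextInt(points.length)].clone(); // random initial centers
        }
        int[] assignment = new int[points.length];
        for (int it = 0; it < iterations; it++) {
            // Assignment step: attach each point to its nearest center.
            for (int p = 0; p < points.length; p++) {
                int best = 0;
                double bestDist = Double.MAX_VALUE;
                for (int c = 0; c < k; c++) {
                    double d = squaredDistance(points[p], centers[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                assignment[p] = best;
            }
            // Update step: recompute each center as the mean of its members.
            double[][] sums = new double[k][points[0].length];
            int[] counts = new int[k];
            for (int p = 0; p < points.length; p++) {
                counts[assignment[p]]++;
                for (int d = 0; d < points[p].length; d++) {
                    sums[assignment[p]][d] += points[p][d];
                }
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] > 0) {
                    for (int d = 0; d < sums[c].length; d++) {
                        centers[c][d] = sums[c][d] / counts[c];
                    }
                }
            }
        }
        return centers;
    }

    private static double squaredDistance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    public static void main(String[] args) {
        double[][] data = {{1, 1}, {1.2, 0.8}, {5, 5}, {5.1, 4.9}};
        System.out.println(Arrays.deepToString(cluster(data, 2, 10)));
    }
}
```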
Abstract
With the globalization of service, organizations continuously produce large volumes of data that need to be analysed across geo-dispersed locations. The traditional centralized approach of moving all data to a single cluster is inefficient or infeasible due to limitations such as the scarcity of wide-area bandwidth and the low-latency requirements of data processing. Processing big data across geo-distributed datacenters has therefore gained popularity in recent years. However, managing distributed MapReduce computations across geo-distributed datacenters poses a number of technical challenges: how to allocate data among a selection of geo-distributed datacenters to reduce the communication cost, how to determine the Virtual Machine (VM) provisioning strategy that offers high performance and low cost, and what criteria should be used to select a datacenter as the final reducer for big data analytics jobs. In this paper, these challenges are addressed by balancing bandwidth cost, storage cost, computing cost, migration cost, and latency cost between the two MapReduce phases across datacenters. We formulate this complex cost optimization problem for data movement, resource provisioning, and reducer selection as a joint stochastic integer nonlinear optimization problem that minimizes the five cost factors simultaneously. The Lyapunov framework is integrated into our study, and an efficient online algorithm that minimizes the long-term time-averaged operation cost is further designed. Theoretical analysis shows that our online algorithm can provide a near-optimal solution with a provable gap and can guarantee that data processing is completed within pre-defined bounded delays. Experiments on the WorldCup98 website trace validate the theoretical analysis and demonstrate that our approach is close to the offline optimum and superior to some representative approaches.
Abstract
Secure data deduplication can significantly reduce the communication and storage overheads in cloud storage services and has potential applications in our big-data-driven society. Existing data deduplication schemes are generally designed to either resist brute-force attacks or ensure efficiency and data availability, but not both. We are also not aware of any existing scheme that achieves accountability, in the sense of reducing duplicate information disclosure (e.g., determining whether the plaintexts of two encrypted messages are identical). In this paper, we investigate a three-tier cross-domain architecture and propose an efficient and privacy-preserving big data deduplication scheme for cloud storage (hereafter referred to as EPCDD). EPCDD achieves both privacy preservation and data availability, and resists brute-force attacks. In addition, we take accountability into consideration to offer better privacy assurances than existing schemes. We then demonstrate that EPCDD outperforms existing competing schemes in terms of computation, communication, and storage overheads. In addition, the time complexity of duplicate search in EPCDD is logarithmic.
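EPCDD's cryptographic construction is not reproduced here, but the logarithmic duplicate search it mentions can be pictured with an ordered index keyed by a deterministic tag. The Java sketch below is a plain, non-cryptographic illustration using SHA-256 tags in a TreeMap; all names are hypothetical, and it assumes Java 17+ for java.util.HexFormat.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.HexFormat;
import java.util.TreeMap;

/** Illustrative duplicate check: deterministic tags in an ordered map give O(log n) search. */
public class DedupIndexSketch {
    private final TreeMap<String, String> index = new TreeMap<>(); // tag -> storage location

    private static String tag(byte[] data) throws Exception {
        // A deterministic tag; EPCDD derives tags cryptographically, this sketch only uses SHA-256.
        byte[] h = MessageDigest.getInstance("SHA-256").digest(data);
        return HexFormat.of().formatHex(h);
    }

    /** Returns the existing location if the data is a duplicate, otherwise stores it. */
    public String store(byte[] data, String location) throws Exception {
        String t = tag(data);
        String existing = index.get(t);        // logarithmic lookup in the ordered index
        if (existing != null) return existing; // duplicate: keep a single copy
        index.put(t, location);
        return location;
    }

    public static void main(String[] args) throws Exception {
        DedupIndexSketch dedup = new DedupIndexSketch();
        byte[] record = "patient-record-42".getBytes(StandardCharsets.UTF_8);
        System.out.println(dedup.store(record, "block-001"));
        System.out.println(dedup.store(record, "block-002")); // prints block-001: deduplicated
    }
}
```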
Abstract
The new generations of mobile devices have high processing power and storage, but they lag behind in terms of software systems for big data storage and processing. Hadoop is a scalable platform that provides distributed storage and computational capabilities on clusters of commodity hardware. Building Hadoop on a mobile network enables the devices to run data intensive computing applications without direct knowledge of underlying distributed systems complexities. However, these applications have severe energy and reliability constraints (e.g., caused by unexpected device failures or topology changes in a dynamic network). As mobile devices are more susceptible to unauthorized access, when compared to traditional servers, security is also a concern for sensitive data. Hence, it is paramount to consider reliability, energy efficiency and security for such applications. The MDFS (Mobile Distributed File System) [1] addresses these issues for big data processing in mobile clouds. We have developed the Hadoop MapReduce framework over MDFS and have studied its performance by varying input workloads in a real heterogeneous mobile cluster. Our evaluation shows that the implementation addresses all constraints in processing large amounts of data in mobile clouds. Thus, our system is a viable solution to meet the growing demands of data processing in a mobile environment.
Abstract
The skyline operator has attracted considerable attention recently due to its broad applications. However, computing a skyline is challenging today since we have to deal with big data. For data-intensive applications, the MapReduce framework has been widely used. In this paper, we propose the efficient parallel algorithm SKY-MR+ for processing skyline queries using MapReduce. We first build a quadtree-based histogram for space partitioning by deciding judiciously whether to split each leaf node based on the benefit of splitting in terms of the estimated execution time. In addition, we apply the dominance power filtering method to effectively prune non-skyline points in advance. We next partition the data based on the regions divided by the quadtree and compute candidate skyline points for each partition using MapReduce. Finally, we check whether each skyline candidate point is actually a skyline point in every partition using MapReduce. We also develop workload balancing methods to make the estimated execution times of all available machines similar. Experiments comparing SKY-MR+ with state-of-the-art MapReduce algorithms confirm its effectiveness as well as its scalability.
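At the core of any skyline algorithm, including SKY-MR+, is the dominance test between points. The following is a small single-machine Java sketch of that test and a naive skyline computation (assuming minimization on all dimensions); it omits the paper's quadtree partitioning, filtering, and MapReduce stages, and the example data are made up.

```java
import java.util.ArrayList;
import java.util.List;

/** Naive skyline computation via pairwise dominance checks (minimization on all dimensions). */
public class SkylineSketch {

    /** p dominates q if p is no worse in every dimension and strictly better in at least one. */
    static boolean dominates(double[] p, double[] q) {
        boolean strictlyBetter = false;
        for (int i = 0; i < p.length; i++) {
            if (p[i] > q[i]) return false;
            if (p[i] < q[i]) strictlyBetter = true;
        }
        return strictlyBetter;
    }

    static List<double[]> skyline(List<double[]> points) {
        List<double[]> result = new ArrayList<>();
        for (double[] candidate : points) {
            boolean dominated = false;
            for (double[] other : points) {
                if (other != candidate && dominates(other, candidate)) { dominated = true; break; }
            }
            if (!dominated) result.add(candidate);
        }
        return result;
    }

    public static void main(String[] args) {
        List<double[]> hotels = List.of(
                new double[]{100, 2.0},  // (price, distance)
                new double[]{80, 3.5},
                new double[]{120, 1.0},
                new double[]{150, 3.0}); // dominated by the first point
        skyline(hotels).forEach(p -> System.out.println(p[0] + ", " + p[1]));
    }
}
```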
Abstract
Traditional parallel algorithms for mining frequent itemsets aim to balance load by equally partitioning data among a group of computing nodes. We start this study by uncovering a serious performance problem of existing parallel Frequent Itemset Mining algorithms: given a large dataset, the data partitioning strategies in existing solutions suffer high communication and mining overhead induced by redundant transactions transmitted among computing nodes. We address this problem by developing a data partitioning approach called FiDoop-DP using the MapReduce programming model. The overarching goal of FiDoop-DP is to boost the performance of parallel Frequent Itemset Mining on Hadoop clusters. At the heart of FiDoop-DP is a Voronoi diagram-based data partitioning technique, which exploits correlations among transactions. Incorporating a similarity metric and the Locality-Sensitive Hashing technique, FiDoop-DP places highly similar transactions into the same data partition to improve locality without creating an excessive number of redundant transactions. We implement FiDoop-DP on a 24-node Hadoop cluster, driven by a wide range of datasets created by the IBM Quest Market-Basket Synthetic Data Generator. Experimental results reveal that FiDoop-DP is conducive to reducing network and computing loads by virtue of eliminating redundant transactions on Hadoop nodes. FiDoop-DP improves the performance of an existing parallel frequent-pattern scheme by up to 31 percent, with an average of 18 percent.
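FiDoop-DP's full partitioning pipeline is not reproduced here, but its use of a similarity metric with Locality-Sensitive Hashing can be illustrated with MinHash, a common LSH family for set similarity. The Java sketch below estimates the Jaccard similarity between two transactions (item-ID sets) from their MinHash signatures; the hash parameters and class names are our own and not taken from the paper.

```java
import java.util.Random;
import java.util.Set;

/** MinHash signatures approximate Jaccard similarity between transactions (item-ID sets). */
public class MinHashSketch {
    private final int[] seedsA, seedsB;
    private static final long PRIME = 2_147_483_647L; // Mersenne prime used as hash modulus

    public MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        seedsA = new int[numHashes];
        seedsB = new int[numHashes];
        for (int i = 0; i < numHashes; i++) {
            seedsA[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1);
            seedsB[i] = rnd.nextInt(Integer.MAX_VALUE);
        }
    }

    /** One minimum hash value per hash function over the transaction's item IDs. */
    public long[] signature(Set<Integer> transaction) {
        long[] sig = new long[seedsA.length];
        for (int i = 0; i < sig.length; i++) {
            long min = Long.MAX_VALUE;
            for (int item : transaction) {
                long h = (seedsA[i] * (long) item + seedsB[i]) % PRIME;
                if (h < min) min = h;
            }
            sig[i] = min;
        }
        return sig;
    }

    /** The fraction of matching signature positions estimates the Jaccard similarity. */
    public static double estimatedSimilarity(long[] s1, long[] s2) {
        int matches = 0;
        for (int i = 0; i < s1.length; i++) if (s1[i] == s2[i]) matches++;
        return (double) matches / s1.length;
    }

    public static void main(String[] args) {
        MinHashSketch mh = new MinHashSketch(128, 7);
        long[] a = mh.signature(Set.of(1, 2, 3, 4, 5));
        long[] b = mh.signature(Set.of(1, 2, 3, 4, 6));
        System.out.println(estimatedSimilarity(a, b)); // close to the true Jaccard 4/6 ≈ 0.67
    }
}
```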
Abstract
Privacy has become a considerable issue as applications of big data grow dramatically in cloud computing. These emerging technologies have improved or changed service models and enhanced application performance from various perspectives. However, the remarkably growing volume of data has also resulted in many practical challenges; the execution time of data encryption is one of the serious issues during data processing and transmission. Many current applications abandon data encryption in order to reach an acceptable performance level, despite the accompanying privacy concerns. In this paper, we concentrate on privacy and propose a novel data encryption approach called the Dynamic Data Encryption Strategy (D2ES). Our proposed approach aims to selectively encrypt data and to use privacy classification methods under timing constraints. It is designed to maximize the scope of privacy protection by using a selective encryption strategy within the required execution time. The performance of D2ES has been evaluated in our experiments, which provide evidence of the privacy enhancement.
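D2ES defines its own privacy classification and selection strategy; as a rough, hedged illustration of the general idea of selective encryption under a timing constraint, the Java sketch below greedily encrypts the items with the highest privacy weight per unit of encryption time until a time budget is exhausted. The item names, weights, and timings are hypothetical and the selection rule is ours, not the paper's.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Illustrative planner: encrypt the most privacy-sensitive items first within a time budget. */
public class SelectiveEncryptionSketch {
    record DataItem(String id, double privacyWeight, double encryptionTimeMs) {}

    static List<DataItem> plan(List<DataItem> items, double timeBudgetMs) {
        List<DataItem> sorted = new ArrayList<>(items);
        // Greedy: highest privacy gain per millisecond of encryption time first.
        sorted.sort(Comparator.comparingDouble(
                (DataItem d) -> d.privacyWeight() / d.encryptionTimeMs()).reversed());
        List<DataItem> selected = new ArrayList<>();
        double used = 0;
        for (DataItem d : sorted) {
            if (used + d.encryptionTimeMs() <= timeBudgetMs) {
                selected.add(d);
                used += d.encryptionTimeMs();
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<DataItem> items = List.of(
                new DataItem("medical-history", 10.0, 30),
                new DataItem("location-trace", 6.0, 10),
                new DataItem("clickstream", 1.0, 50));
        plan(items, 45).forEach(d -> System.out.println("encrypt " + d.id()));
    }
}
```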
Abstract
Traditional clustering methods are unable to cope with the exploding volume of data that the world is currently facing. As a solution to this problem, research has intensified in the direction of parallel clustering methods. Although there is a variety of parallel programming models, the MapReduce paradigm is considered the most prominent model for large-scale data processing problems, including clustering. This paper introduces a new parallel design of a recently proposed heuristic for hard clustering using the MapReduce programming model. In this heuristic, clustering is performed by efficiently partitioning large categorical data sets according to the relational analysis approach. The proposed design, called PMR-Transitive, is a single-scan, parameter-free heuristic that determines the number of clusters automatically. Experimental results on real-life and synthetic data sets demonstrate that PMR-Transitive produces good quality results.
Abstract
In big-data-driven traffic flow prediction systems, the robustness of prediction performance depends on accuracy and timeliness. This paper presents a new MapReduce-based nearest neighbor (NN) approach for traffic flow prediction using correlation analysis (TFPC) on a Hadoop platform. In particular, we develop a real-time prediction system with two key modules, i.e., offline distributed training (ODT) and online parallel prediction (OPP). Moreover, we build a parallel k-nearest neighbor optimization classifier, which incorporates correlation information among traffic flows into the classification process. Finally, we propose a novel prediction calculation method, combining the current data observed in OPP and the classification results obtained from large-scale historical data in ODT, to generate traffic flow predictions in real time. The empirical study on real-world traffic flow big data using the leave-one-out cross validation method shows that TFPC significantly outperforms four state-of-the-art prediction approaches, i.e., autoregressive integrated moving average, Naïve Bayes, multilayer perceptron neural networks, and NN regression, in terms of accuracy, which improves by up to 90.07% in the best case, with an average mean absolute percent error of 5.53%. In addition, it displays excellent speedup, scaleup, and sizeup.
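The distributed ODT/OPP pipeline is specific to the paper, but the prediction step rests on k-nearest-neighbor regression over historical flow data. A minimal single-machine Java sketch of that step is shown below; the feature layout (a short window of recent flows predicting the next value) and all names are assumptions made only for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Minimal k-nearest-neighbor regression over historical flow vectors (single machine). */
public class KnnFlowPredictor {
    record Sample(double[] recentFlows, double nextFlow) {}

    static double predict(Sample[] history, double[] current, int k) {
        // Rank historical samples by Euclidean distance to the current observation.
        Sample[] sorted = history.clone();
        Arrays.sort(sorted, Comparator.comparingDouble(
                (Sample s) -> distance(s.recentFlows(), current)));
        // Average the "next flow" value of the k closest neighbors.
        double sum = 0;
        int neighbors = Math.min(k, sorted.length);
        for (int i = 0; i < neighbors; i++) sum += sorted[i].nextFlow();
        return sum / neighbors;
    }

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        Sample[] history = {
                new Sample(new double[]{120, 130, 140}, 150),
                new Sample(new double[]{80, 85, 90}, 95),
                new Sample(new double[]{118, 128, 138}, 148)
        };
        System.out.println(predict(history, new double[]{119, 129, 139}, 2)); // ≈ 149
    }
}
```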
Abstract
Deduplication based on attribute-based encryption can be well used in eHealth systems to save storage space and share medical records. However, the excessive computation costs of existing schemes lead to inefficient deduplication. In addition, frequent changes of clients' attributes weaken the forward secrecy of data, and thus how to achieve attribute revocation in deduplication is a problem that remains to be solved. In this paper, we propose a variant of the attribute-based encryption scheme that supports efficient deduplication and attribute revocation for eHealth systems. Specifically, an efficient deduplication protocol based on the nature of prime numbers is used to alleviate the computation burden on the private cloud, and attribute revocation is realized by updating the attribute agent key and the ciphertext. Moreover, outsourced decryption is introduced to reduce the computation overhead on clients. The security analysis argues that the proposed scheme meets the desired security requirements, and the experimental results indicate the excellent performance of the proposed scheme while realizing deduplication and attribute revocation.
Abstract
In this paper, we propose a novel secure role re-encryption system (SRRS), which is based on convergent encryption and a role re-encryption algorithm to prevent private data leakage in the cloud; it also achieves authorized deduplication and satisfies dynamic privilege updating and revoking. Meanwhile, our system supports ownership checking and achieves proof of ownership for authorized users efficiently. Specifically, we introduce a management center to handle authorization requests and establish a role authorized tree (RAT) mapping the relationship between roles and keys. With the convergent encryption algorithm and the role re-encryption technique, it is guaranteed that only an authorized user who holds the corresponding role re-encryption key can access the specific file, without any data leakage. Through role re-encryption key updating and revoking, our system achieves dynamic updating of authorized users' privileges. Furthermore, we exploit dynamic count filters (DCF) to implement data updating and improve the efficiency of ownership verification. We conduct a security analysis and simulation experiments to demonstrate the security and efficiency of our proposed system.
Abstract
As an important fuzzy clustering technique in data mining and pattern recognition, the possibilistic c-means algorithm (PCM) has been widely used in image analysis and knowledge discovery. However, it is difficult for PCM to produce good results when clustering big data, especially heterogeneous data, since it was initially designed only for small structured datasets. To tackle this problem, the paper proposes a high-order PCM algorithm (HOPCM) for big data clustering by optimizing the objective function in the tensor space. Further, we design a distributed HOPCM method based on MapReduce for very large amounts of heterogeneous data. Finally, we devise a privacy-preserving HOPCM algorithm (PPHOPCM) to protect private data in the cloud by applying the BGV encryption scheme to HOPCM. In PPHOPCM, the functions for updating the membership matrix and clustering centers are approximated as polynomial functions to support the secure computing of the BGV scheme. Experimental results indicate that PPHOPCM can effectively cluster large amounts of heterogeneous data using cloud computing without disclosing private data.
Abstract
The mission of subspace clustering is to find hidden clusters that exist in different subspaces within a dataset. In recent years, with the exponential growth of data size and dimensionality, traditional subspace clustering algorithms have become inefficient as well as ineffective at extracting knowledge in the big data environment, resulting in an emergent need for efficient parallel, distributed subspace clustering algorithms that can handle large multi-dimensional data at an acceptable computational cost. In this paper, we introduce MR-Mafia, a parallel MAFIA subspace clustering algorithm based on MapReduce. The algorithm takes advantage of MapReduce's data partitioning and task parallelism and achieves a good tradeoff between the cost of disk accesses and the communication cost. The experimental results show near-linear speedups and demonstrate the high scalability and broad application prospects of the proposed algorithm.
Abstract
The personal health record (PHR) is a patient-centric model of health information exchange, which greatly facilitates the storage, access, and sharing of personal health information. In order to share valuable resources and reduce operational cost, PHR service providers would like to store PHR applications and health information data in the cloud. Private health information may be exposed to unauthorized organizations or individuals, since patients lose physical control over their health information. Ciphertext-policy attribute-based signcryption is a promising solution for designing a cloud-assisted PHR secure sharing system: it provides fine-grained access control, confidentiality, authenticity, and sender privacy for PHR data. However, a large number of pairing and modular exponentiation computations bring heavy computational overhead during the designcryption process. In order to reconcile the conflict between high computational overhead and low efficiency in the designcryption process, an outsourcing scheme is proposed in this paper. In our scheme, the heavy computations are outsourced to a ciphertext transformation server, leaving only a small computational overhead for the PHR user. At the same time, the extra communication overhead in our scheme is tolerable. Furthermore, theoretical analysis is provided, and the desired security properties, including confidentiality, unforgeability, and verifiability, have been proved formally in the random oracle model. Experimental evaluation indicates that the proposed scheme is practical and feasible.
Abstract
Users store vast amounts of sensitive data on big data platforms. Sharing sensitive data will help enterprises reduce the cost of providing users with personalized services and provide value-added data services. However, secure data sharing is problematic. This paper proposes a framework for secure sensitive data sharing on a big data platform, covering secure data delivery, storage, usage, and destruction on a semi-trusted big data sharing platform. We present a proxy re-encryption algorithm based on heterogeneous ciphertext transformation and a user process protection method based on a virtual machine monitor, which together support the realization of the system functions. The framework protects the security of users' sensitive data effectively and shares these data safely. At the same time, data owners retain complete control of their own data in a sound environment for modern Internet information security.
Abstract
Due to the complexity and volume, outsourcing ciphertexts to a cloud is deemed to be one of the most effective approaches for big data storage and access. Nevertheless, verifying the access legitimacy of a user and securely updating a ciphertext in the cloud based on a new access policy designated by the data owner are two critical challenges to make cloud-based big data storage practical and effective. Traditional approaches either completely ignore the issue of access policy update or delegate the update to a third party authority; but in practice, access policy update is important for enhancing security and dealing with the dynamism caused by user join and leave activities. In this paper, we propose a secure and verifiable access control scheme based on the NTRU cryptosystem for big data storage in clouds. We first propose a new NTRU decryption algorithm to overcome the decryption failures of the original NTRU, and then detail our scheme and analyze its correctness, security strengths, and computational efficiency. Our scheme allows the cloud server to efficiently update the ciphertext when a new access policy is specified by the data owner, who is also able to validate the update to counter against cheating behaviors of the cloud. It also enables (i) the data owner and eligible users to effectively verify the legitimacy of a user for accessing the data, and (ii) a user to validate the information provided by other users for correct plaintext recovery. Rigorous analysis indicates that our scheme can prevent eligible users from cheating and resist various attacks such as the collusion attack.
Abstract
With the advent of cloud computing, secured data deduplication has gained a lot of popularity. Many techniques have been proposed in the literature of this ongoing research area. Among these techniques, the Message Locked Encryption (MLE) scheme is often mentioned. Researchers have introduced MLE based protocols which provide secured deduplication of data, where the data is generally in text form. As a result, multimedia data such as images and video, which are larger in size compared to text files, have not been given much attention. Applying secured data deduplication to such data files could significantly reduce the cost and space required for their storage. In this paper we present a secure deduplication scheme for near identical (NI) images using the Dual Integrity Convergent Encryption (DICE) protocol, which is a variant of the MLE based scheme. In the proposed scheme, an image is decomposed into blocks and the DICE protocol is applied on each block separately rather than on the entire image. As a result, the blocks that are common between two or more NI images are stored only once at the cloud. We provide detailed analyses on the theoretical, experimental and security aspects of the proposed scheme.
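The DICE protocol includes additional integrity checks, but its block-level convergent-encryption idea can be pictured simply: each block is encrypted under a key derived from its own hash, so identical blocks produce identical ciphertexts that the cloud can deduplicate. The Java sketch below illustrates this with SHA-256 and AES from the standard javax.crypto API; it deliberately uses deterministic ECB mode for clarity and is a toy illustration, not the paper's scheme or a production-grade construction.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.SecretKeySpec;

/** Per-block convergent encryption: key = H(block), so identical blocks yield identical ciphertexts. */
public class ConvergentBlockSketch {

    static byte[] encryptBlock(byte[] block) throws Exception {
        // Derive the key from the block content itself (convergent / message-locked encryption).
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(block);
        SecretKeySpec key = new SecretKeySpec(Arrays.copyOf(digest, 16), "AES");
        // ECB keeps encryption deterministic for this sketch; real schemes add integrity checks.
        Cipher cipher = Cipher.getInstance("AES/ECB/PKCS5Padding");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        return cipher.doFinal(block);
    }

    public static void main(String[] args) throws Exception {
        byte[] blockA = "image-block-0001".getBytes(StandardCharsets.UTF_8);
        byte[] blockB = "image-block-0001".getBytes(StandardCharsets.UTF_8);
        // Equal ciphertexts let the cloud detect and store the shared block only once.
        System.out.println(Arrays.equals(encryptBlock(blockA), encryptBlock(blockB))); // true
    }
}
```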
Abstract
The identification of social media communities has recently been of major concern, since users participating in such communities can contribute to viral marketing campaigns. In this work, we focus on users’ communication considering personality as a key characteristic for identifying communicative networks i.e., networks with high information flows. We describe the Twitter Personality based Communicative Communities Extraction (T-PCCE) system that identifies the most communicative communities in a Twitter network graph considering users’ personality. We then expand existing approaches in users’ personality extraction by aggregating data that represent several aspects of user behavior using machine learning techniques. We use an existing modularity based community detection algorithm and we extend it by inserting a post-processing step that eliminates graph edges based on users’ personality. The effectiveness of our approach is demonstrated by sampling the Twitter graph and comparing the communication strength of the extracted communities with and without considering the personality factor. We define several metrics to count the strength of communication within each community. Our algorithmic framework and the subsequent implementation employ the cloud infrastructure and use the MapReduce Programming Environment. Our results show that the T-PCCE system creates the most communicative communities.
Abstract
In the era of big data, many users and companies have started to move their data to cloud storage to simplify data management and reduce data maintenance cost. However, security and privacy issues become major concerns because third-party cloud service providers are not always trustworthy. Although data contents can be protected by encryption, the access patterns, which contain important information, are still exposed to clouds or malicious attackers. In this paper, we apply the ORAM algorithm to enable privacy-preserving access to big data deployed in distributed file systems built upon hundreds or thousands of servers in a single or multiple geo-distributed cloud sites. Since the ORAM algorithm would lead to serious access load imbalance among storage servers, we study a data placement problem to achieve a load-balanced storage system with improved availability and responsiveness. Due to the NP-hardness of this problem, we propose a low-complexity algorithm that can handle large problem sizes with respect to big data. Extensive simulations show that our proposed algorithm finds results close to the optimal solution and significantly outperforms a random data placement algorithm.
Abstract
The digitalization of mental health records and psychotherapy notes has made individual mental health data more readily accessible to a wide range of users including patients, psychiatrists, researchers, statisticians, and data scientists. However, increased accessibility of highly sensitive mental records threatens the privacy and confidentiality of psychiatric patients. The objective of this study is to examine privacy concerns in mental health research and develop a privacy preserving data analysis approach to address these concerns. In this paper, we demonstrate the key inadequacies of the existing privacy protection approaches applicable to use of mental health records and psychotherapy notes in records based research. We then develop a privacy-preserving data analysis approach that enables researchers to protect the privacy of people with mental illness once granted access to mental health records. Furthermore, we choose a demonstration project to show the use of the proposed approach. This paper concludes by suggesting practical implications for mental health researchers and future research in the field of privacy-preserving data analytics.
Abstract
Spurred by service computing and cloud computing, an increasing number of services are emerging on the Internet. As a result, service-relevant data have become too big to be effectively processed by traditional approaches. In view of this challenge, a clustering-based collaborative filtering approach is proposed in this paper, which aims at recruiting similar services in the same clusters to recommend services collaboratively. Technically, this approach is enacted in two stages. In the first stage, the available services are divided logically into small-scale clusters for further processing. In the second stage, a collaborative filtering algorithm is applied within one of the clusters. Since the number of services in a cluster is much smaller than the total number of services available on the web, this is expected to reduce the online execution time of collaborative filtering. Finally, several experiments are conducted to verify the validity of the approach on a real data set of 6,225 mashup services collected from Programmable Web.
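The clustering stage aside, the second stage is standard user-based collaborative filtering, whose key ingredient is a similarity measure between users' rating vectors. The Java sketch below computes cosine similarity over co-rated services; the service names and ratings are made up for illustration and are not from the paper's data set.

```java
import java.util.Map;

/** User-based collaborative filtering in miniature: cosine similarity over service ratings. */
public class UserBasedCfSketch {

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) dot += e.getValue() * other; // only co-rated services contribute
            normA += e.getValue() * e.getValue();
        }
        for (double v : b.values()) normB += v * v;
        return (normA == 0 || normB == 0) ? 0 : dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Double> alice = Map.of("mapService", 5.0, "weatherApi", 3.0);
        Map<String, Double> bob = Map.of("mapService", 4.0, "weatherApi", 3.5, "geocodeApi", 5.0);
        // Within a cluster, neighbors with high similarity drive the recommendation score.
        System.out.printf("similarity(alice, bob) = %.3f%n", cosine(alice, bob));
    }
}
```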
Abstract
The explosive growth of demands on big data processing imposes a heavy burden on computation, storage, and communication in data centers, which hence incurs considerable operational expenditure to data center providers. Therefore, cost minimization has become an emergent issue for the upcoming big data era. Different from conventional cloud services, one of the main features of big data services is the tight coupling between data and computation as computation tasks can be conducted only when the corresponding data are available. As a result, three factors, i.e., task assignment, data placement, and data movement, deeply influence the operational expenditure of data centers. In this paper, we are motivated to study the cost minimization problem via a joint optimization of these three factors for big data services in geo-distributed data centers. To describe the task completion time with the consideration of both data transmission and computation, we propose a 2-D Markov chain and derive the average task completion time in closed-form. Furthermore, we model the problem as a mixed-integer nonlinear programming and propose an efficient solution to linearize it. The high efficiency of our proposal is validated by extensive simulation-based studies.
Abstract
Service recommender systems have been shown to be valuable tools for providing appropriate recommendations to users. In the last decade, the number of customers, services, and the amount of online information have grown rapidly, creating a big data analysis problem for service recommender systems. Consequently, traditional service recommender systems often suffer from scalability and inefficiency problems when processing or analysing such large-scale data. Moreover, most existing service recommender systems present the same ratings and rankings of services to different users without considering diverse users' preferences, and therefore fail to meet users' personalized requirements. In this paper, we propose a Keyword-Aware Service Recommendation method, named KASR, to address the above challenges. It aims at presenting a personalized service recommendation list and recommending the most appropriate services to users effectively. Specifically, keywords are used to indicate users' preferences, and a user-based Collaborative Filtering algorithm is adopted to generate appropriate recommendations. To improve its scalability and efficiency in a big data environment, KASR is implemented on Hadoop, a widely adopted distributed computing platform using the MapReduce parallel processing paradigm. Finally, extensive experiments are conducted on real-world data sets, and the results demonstrate that KASR significantly improves the accuracy and scalability of service recommender systems over existing approaches.
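KASR's full pipeline runs on Hadoop, but the keyword-aware idea can be illustrated locally: the active user's preference keywords are compared with those of previous users, and more similar users contribute more to the recommendation. The sketch below uses simple Jaccard overlap as a stand-in keyword similarity; the keyword sets are hypothetical and the measure is chosen for illustration rather than taken from the paper.

```java
import java.util.HashSet;
import java.util.Set;

/** Keyword preference similarity for keyword-aware recommendation, here as plain Jaccard overlap. */
public class KeywordSimilaritySketch {

    static double jaccard(Set<String> a, Set<String> b) {
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        return union.isEmpty() ? 0 : (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        Set<String> activeUser = Set.of("wifi", "pool", "breakfast");
        Set<String> pastUser = Set.of("wifi", "breakfast", "parking");
        // Higher overlap of preference keywords -> the past user's ratings weigh more.
        System.out.println(jaccard(activeUser, pastUser)); // 0.5
    }
}
```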
Abstract
Cloud computing opens a new era in IT, as it can provide various elastic and scalable IT services in a pay-as-you-go fashion, where users can avoid huge capital investments in their own IT infrastructure. Under this model, users of cloud storage services no longer physically maintain direct control over their data, which makes data security one of the major concerns of using the cloud. Existing research already allows data integrity to be verified without possession of the actual data file. When the verification is done by a trusted third party, this verification process is also called data auditing, and this third party is called an auditor. However, existing schemes suffer from several common drawbacks. First, a necessary authorization/authentication process is missing between the auditor and the cloud service provider, i.e., anyone can challenge the cloud service provider for a proof of integrity of a certain file, which potentially puts the quality of the so-called 'auditing-as-a-service' at risk. Second, although some recent work based on the BLS signature can already support fully dynamic data updates, it only supports updates with fixed-sized blocks as the basic unit, which we call coarse-grained updates. As a result, every small update causes re-computation and updating of the authenticator for an entire file block, which in turn causes higher storage and communication overheads. In this paper, we provide a formal analysis of the possible types of fine-grained data updates and propose a scheme that can fully support authorized auditing and fine-grained update requests. Based on our scheme, we also propose an enhancement that can dramatically reduce communication overheads for verifying small updates.
Abstract
Due to the high volume and velocity of big data, it is an effective option to store big data in the cloud, as the cloud has the capability of storing big data and processing a high volume of user access requests. Attribute-Based Encryption (ABE) is a promising technique to ensure the end-to-end security of big data in the cloud. However, policy updating has always been a challenging issue when ABE is used to construct access control schemes. A trivial implementation is to let data owners retrieve the data, re-encrypt it under the new access policy, and then send it back to the cloud. This method incurs a high communication overhead and a heavy computation burden on data owners. In this paper, we propose a novel scheme that enables efficient access control with dynamic policy updating for big data in the cloud. We focus on developing an outsourced policy updating method for ABE systems. Our method can avoid the transmission of encrypted data and minimize the computation work of data owners by making use of the previously encrypted data under the old access policies. Moreover, we also design policy updating algorithms for different types of access policies. The analysis shows that our scheme is correct, complete, secure, and efficient.
Abstract
Big data has emerged as a new era of information generation and processing. Big data applications are expected to provide a lot of benefits and convenience to our lives. Cloud computing is a popular infrastructure that has the resources for big data processing. As the number of mobile devices is fast increasing, mobile cloud computing is becoming an important part of many big data applications. In this article, we propose a novel MapReduce-based framework to process geo-dispersed big data in mobile cloud architecture. The proposed framework supports simple as well as complex operations on geo-dispersed big data, and uses various data aggregation schemes to satisfy different application requirements.
Abstract
Cloud computing offers a new way of service provision by re-arranging various resources over the Internet. The most important and popular cloud service is data storage. In order to preserve the privacy of data holders, data are often stored in cloud in an encrypted form. However, encrypted data introduce new challenges for cloud data deduplication, which becomes crucial for big data storage and processing in cloud. Traditional deduplication schemes cannot work on encrypted data. Existing solutions of encrypted data deduplication suffer from security weakness. They cannot flexibly support data access control and revocation. Therefore, few of them can be readily deployed in practice. In this paper, we propose a scheme to deduplicate encrypted data stored in cloud based on ownership challenge and proxy re-encryption. It integrates cloud data deduplication with access control. We evaluate its performance based on extensive analysis and computer simulations. The results show the superior efficiency and effectiveness of the scheme for potential practical deployment, especially for big data deduplication in cloud storage.
Abstract
One of the key objectives in accident data analysis is to identify the main factors associated with road and traffic accidents. However, the heterogeneous nature of road accident data makes this analysis difficult. Data segmentation has been used widely to overcome this heterogeneity. In this paper, we propose a framework that uses the K-modes clustering technique as a preliminary task for the segmentation of 11,574 road accidents on the road network of Dehradun (India) between 2009 and 2014 (both included). Next, association rule mining is used to identify the various circumstances associated with the occurrence of an accident, both for the entire data set (EDS) and for the clusters identified by the K-modes clustering algorithm. The findings of the cluster-based analysis and the entire-data-set analysis are then compared. The results reveal that the combination of K-modes clustering and association rule mining is very promising, as it produces important information that would remain hidden if no segmentation were performed prior to generating association rules. Further, a trend analysis has also been performed for each cluster and for the EDS accidents, which finds different trends in different clusters, whereas a positive trend is shown by the EDS. The trend analysis also shows that prior segmentation of accident data is very important before analysis.
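K-modes differs from k-means mainly in its dissimilarity measure: instead of Euclidean distance, it counts mismatched categorical attributes between a record and a cluster mode. The short Java sketch below shows only that measure on hypothetical accident attributes; the full clustering loop and the association-rule mining step are omitted.

```java
/** K-modes dissimilarity: count of mismatched categorical attributes between a record and a mode. */
public class KModesDistanceSketch {

    static int mismatchDistance(String[] record, String[] mode) {
        int mismatches = 0;
        for (int i = 0; i < record.length; i++) {
            if (!record[i].equals(mode[i])) mismatches++; // categorical values either match or they don't
        }
        return mismatches;
    }

    public static void main(String[] args) {
        // Hypothetical accident attributes: road type, lighting, weather, severity.
        String[] accident = {"highway", "dark", "rain", "severe"};
        String[] clusterMode = {"highway", "daylight", "rain", "minor"};
        System.out.println(mismatchDistance(accident, clusterMode)); // 2
    }
}
```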
Abstract
There is a growing trend of attacks on database privacy due to the great value of the private information stored in big data sets. The public's privacy is under threat as adversaries continuously crack popular targets such as bank accounts. We observe that existing models such as K-anonymity group records based on quasi-identifiers, which harms data utility considerably. Motivated by this, we propose a sensitive attribute-based privacy model. Our model is early work on grouping records based on sensitive attributes instead of the quasi-identifiers popular in existing models. Random shuffling is used to maximize the information entropy inside a group while the marginal distribution remains the same before and after shuffling; therefore, our method maintains better data utility than existing models. We have conducted extensive experiments which confirm that our model can achieve a satisfying privacy level without sacrificing data utility, while guaranteeing higher efficiency.
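The shuffling idea is easy to picture: within a group, the sensitive values are randomly permuted, so the published marginal distribution is unchanged while the link between an individual row and its sensitive value is broken. The Java sketch below illustrates just that step with a hypothetical list of diagnoses; it is not the paper's complete model.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Shuffle sensitive values inside a group: same multiset of values, row linkage broken. */
public class GroupShuffleSketch {
    public static void main(String[] args) {
        List<String> diagnoses = Arrays.asList("flu", "diabetes", "flu", "cancer", "asthma");
        List<String> published = new ArrayList<>(diagnoses);
        Collections.shuffle(published, new Random(123)); // marginal distribution is preserved
        System.out.println("original:  " + diagnoses);
        System.out.println("published: " + published);
    }
}
```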
Abstract
Given a set of files that show a certain degree of similarity, we consider a novel problem of performing data redundancy elimination across a set of distributed worker nodes in a shared-nothing in-memory big data analytics system. The redundancy elimination scheme is designed to be: (i) space-efficient: the total space needed to store the files is minimized, and (ii) access-isolated: data shuffling among servers is also minimized. In this paper, we first show that finding an access-efficient and space-optimal solution is an NP-hard problem. Following this, we present file partitioning algorithms that locate access-efficient solutions incrementally with minimal (polynomial) algorithm time complexity. Our experimental verification on multiple data sets confirms that the proposed file partitioning solution achieves a compression ratio close to the optimal compression performance achieved by a centralized solution.
IEEE Enabling Efficient User Revocation in Identity-Based Cloud Storage Auditing for Shared Big Data
Abstract
Cloud storage auditing schemes for shared data refer to checking the integrity of cloud data shared by a group of users. User revocation is commonly supported in such schemes, as users may be subject to group membership changes for various reasons. Previously, the computational overhead for user revocation in such schemes was linear with the total number of file blocks possessed by a revoked user. The overhead, however, may become a heavy burden because of the sheer amount of shared cloud data. Thus, how to reduce the computational overhead caused by user revocations becomes a key research challenge for achieving practical cloud data auditing. In this paper, we propose a novel storage auditing scheme that achieves highly efficient user revocation independent of the total number of file blocks possessed by the revoked user in the cloud. This is achieved by exploring a novel strategy for key generation and a new private key update technique. Using this strategy and technique, we realize user revocation by just updating the non-revoked group users' private keys rather than the authenticators of the revoked user. The integrity auditing of the revoked user's data can still be correctly performed when the authenticators are not updated. Meanwhile, the proposed scheme is based on identity-based cryptography, which eliminates the complicated certificate management of traditional Public Key Infrastructure (PKI) systems. The security and efficiency of the proposed scheme are validated via both analysis and experimental results.
Abstract
This paper presents a result analysis of the K-Medoid algorithm, implemented on a Hadoop cluster using the MapReduce concept. MapReduce is a programming model that enables the processing of huge datasets in parallel on a large number of devices, and it is especially well suited to constant or moderately changing sets of data. MapReduce is regarded as the framework of "big data". The MapReduce model allows systematic and rapid organization of large-scale data on a cluster of compute nodes. One of the primary concerns in Hadoop is how to minimize the completion length (i.e., makespan) of a set of MapReduce jobs. For various applications, such as word count, grep, terasort, and the parallel K-Medoid clustering algorithm, it has been observed that as the number of nodes increases, the execution time decreases. In this paper, we evaluated MapReduce applications and found that as the number of nodes increases, the completion time decreases.
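The word-count application mentioned above is the canonical MapReduce example: the mapper emits (word, 1) pairs and the reducer sums the counts for each word. The Java sketch below shows the mapper and reducer classes against the standard org.apache.hadoop.mapreduce API; the job driver and cluster configuration are omitted, and it assumes the Hadoop MapReduce client libraries are on the classpath.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

/** Classic Hadoop word-count job: the mapper emits (word, 1), the reducer sums the counts. */
public class WordCountSketch {

    public static class TokenizerMapper
            extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer tokens = new StringTokenizer(value.toString());
            while (tokens.hasMoreTokens()) {
                word.set(tokens.nextToken());
                context.write(word, ONE); // one record per word occurrence
            }
        }
    }

    public static class SumReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) sum += v.get(); // aggregate counts shuffled to this key
            context.write(key, new IntWritable(sum));
        }
    }
}
```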
Abstract
The advancements in computer systems and networks have created a new environment for criminal acts, widely known as cybercrime. Cybercrime incidents are occurrences of particular criminal offences that pose a serious threat to the global economy, safety, and the well-being of society. This paper offers a comprehensive understanding of cybercrime incidents and their corresponding offences, combining a series of approaches reported in the relevant literature. Initially, this paper reviews and identifies the features of cybercrime incidents and their respective elements, and proposes a combinatorial incident description schema. The schema provides the opportunity to systematically combine various elements, or cybercrime characteristics. Additionally, a comprehensive list of cybercrime-related offences is put forward. The offences are ordered in a two-level classification system based on specific criteria to assist in better classification and correlation of their respective incidents. This enables a thorough understanding of the repeating and underlying criminal activities. The proposed system can serve as a common reference, overcoming obstacles deriving from misconceptions about cybercrimes with cross-border activities. The proposed schema can be extended with a list of recommended actions, corresponding measures, and effective policies that match the offence type and subsequently a particular incident. This matching will enable better monitoring and handling, and will help moderate cybercrime incident occurrences. The ultimate objective is to incorporate the schema-based description of cybercrime elements into a complete incident management system with standard operating procedures and protocols.
Abstract
Cloud computing plays a significant role in the big data era, since it can provide dynamic, scalable virtual resource services via the Internet. However, how to enhance the security level of cloud computing is a challenging issue that urgently needs to be tackled. In this paper, we focus on data security in cloud computing and present an attribute-based proxy re-encryption scheme with keyword search (ABPRE-KS) to provide flexible and secure data sharing among users in the cloud. In our scheme, a user's access privileges are described by an access structure consisting of several attributes, while ciphertexts are labeled with several target attributes. A delegator can transform the original ciphertexts into proxy ciphertexts encrypted under the delegatee's attributes without leaking any sensitive information to the cloud server. Besides, a delegatee is allowed to issue a search request on the ciphertexts if his credentials satisfy the access policy. Security analysis shows that ABPRE-KS is confidential and keyword semantically secure under the BDBH assumption.
Abstract
Spam has become the platform of choice used by cyber-criminals to spread malicious payloads such as viruses and trojans. In this paper, we consider the problem of early detection of spam campaigns. Collaborative spam detection techniques can deal with large scale e-mail data contributed by multiple sources; however, they have the well-known problem of requiring disclosure of e-mail content. Distance-preserving hashes are one of the common solutions used for preserving the privacy of e-mail content while enabling message classification for spam detection. However, distance-preserving hashes are not scalable, thus making large-scale collaborative solutions difficult to implement. As a solution, we propose Spamdoop, a Big Data privacy-preserving collaborative spam detection platform built on top of a standard Map Reduce facility. Spamdoop uses a highly parallel encoding technique that enables the detection of spam campaigns in competitive times. We evaluate our system’s performance using a huge synthetic spam base and show that our technique performs favorably against the creation and delivery overhead of current spam generation tools.
Abstract
Question and Answer (Q&A) systems play a vital role in our daily life for information and knowledge sharing. Users post questions and pick questions to answer in the system. Due to the rapidly growing user population and the number of questions, it is unlikely for a user to stumble upon, by chance, a question that (s)he can answer. Also, altruism does not encourage all users to provide answers, not to mention high quality answers with a short answer wait time. The primary objective of this paper is to improve the performance of Q&A systems by actively forwarding questions to users who are capable of and willing to answer them. To this end, we have designed and implemented SocialQ&A, an online social network based Q&A system. SocialQ&A leverages the social network properties of common interest and the mutual-trust friend relationship to identify, through friendship, the users who are most likely to answer an asker's question, and to enhance user security. We also improve SocialQ&A with security and efficiency enhancements by protecting user privacy and identities, and by retrieving answers automatically for recurrent questions. We describe the architecture and algorithms, and conducted comprehensive large-scale simulations to evaluate SocialQ&A in comparison with other methods. Our results suggest that social networks can be leveraged to improve the answer quality and the asker's waiting time. We also implemented a real prototype of SocialQ&A, and analyzed the Q&A behavior of real users and questions from a small-scale real-world SocialQ&A system.
2023 IEEE Big Data Projects | 2023 IEEE Big Data Projects for CSE | 2023 IEEE Big Data Projects for ISE | 2023 IEEE Big Data Projects for EEE | 2023 IEEE Big Data Projects for ECE | final year 2023 IEEE Big Data Projects | final year 2023 IEEE Big Data Projects for CSE | final year 2023 IEEE Big Data Projects for ISE | final year 2023 IEEE Big Data Projects for EEE | final year 2023 IEEE Big Data Projects for ECE | Top 2023 IEEE Big Data Projects | Top 2023 IEEE Big Data Projects for CSE | Top 2023 IEEE Big Data Projects for ISE | Top 2023 IEEE Big Data Projects for EEE | Top 2023 IEEE Big Data Projects for ECE | Latest 2023 IEEE Big Data Projects | Latest 2023 IEEE Big Data Projects for CSE | Latest 2023 IEEE Big Data Projects for ISE | Latest 2023 IEEE Big Data Projects for EEE | Latest 2023 IEEE Big Data Projects for ECE | 2023 IEEE Android Big Data for M-Tech | 2023 IEEE Big Data Projects for BE | 2023 IEEE Big Data Projects for MCA | 2023 IEEE Big Data Projects for Diploma | 2023 IEEE Big Data Projects for BCA
2023 IEEE Big Data Projects online | 2023 CSE IEEE Big Data Projects online | 2023 ISE IEEE Big Data Projects | EEE 2023 IEEE Big Data Projects Online| 2023 ECE IEEE Big Data Projects Online | final year 2023 IEEE Big Data Projects online | final year 2023 IEEE Big Data Projects online for CSE | final year 2023 IEEE Big Data Projects online for ISE | final year 2023 IEEE Big Data Projects online for EEE | final year 2023 IEEE Big Data Projects online for ECE | Top 2023 IEEE Big Data Projects online | Top 2023 IEEE Big Data Projects online for CSE | Top 2023 IEEE Big Data Projects online for ISE | Top 2023 IEEE Big Data Projects online for EEE | Top 2023 IEEE Big Data Projects online for ECE | Latest 2023 IEEE Big Data Projects online | Latest 2023 IEEE Big Data Projects online for CSE | Latest 2023 IEEE Big Data Projects online for ISE | Latest 2023 IEEE Big Data Projects online for EEE | Latest 2023 IEEE Big Data Projects online for ECE | 2023 IEEE Big Data Projects online for M-Tech | 2023 IEEE Big Data Projects online for BE | 2023 IEEE Big Data Projects online for MCA | 2023 IEEE Big Data Projects online for Diploma | 2023 IEEE Big Data Projects online for BCA |
We’re providing the best 2023 Final Year Latest IEEE Big Data Projects online for CSE students, from which you can draw inspiration for your own projects. We encourage finishing numerous projects to master the various capabilities and functionalities of Big Data. We have provided projects of various skill levels so you may select according to your competence. Use the guide and follow along based on your knowledge and expertise. Whether you are a beginner, intermediate, or advanced learner, we have 2023 Final Year Latest IEEE Big Data Projects online for CSE students at all levels.