Xinhua Dong, Ruixuan Li, Wanwan Zhou, Dongjie Liao, and Shuoyi Zhao
(School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074, China)
Abstract In this paper, we survey data security and privacy problems created by cloud storage applications and propose a cloud storage security architecture. We discuss state-of-the-art techniques for ensuring the privacy and security of data stored in the cloud. We discuss policies for access control and data integrity, availability, and privacy. We also discuss several key solutions proposed in current literature and point out future research directions.
Keywords cloud storage; cloud computing; data security; privacy-preserving
With the recent development of information technology, various types of cloud storage platforms have appeared. These platforms are often convenient, scalable, and cost-effective, so cloud computing services are widely used. Amazon's EC2/S3, Google's MapReduce/AppEngine, Microsoft's Azure, IBM's Blue Cloud, and Salesforce's CRM are well-known cloud service platforms. In China, the cloud platforms of Sinochem Group and the Wuxi and Dongying municipal governments have appeared. At the same time, the national strategy for cloud storage and utility computing has been developing. Utility computing service models based on cloud storage have many new characteristics. New models based on cloud storage use a variety of techniques designed to tightly manage resources and provide users with flexible services. These new service models are likely to have a knock-on effect and cause significant changes within the industry. They will give rise to many new security issues that need to be addressed. At present, cloud storage is only really applied in scenarios where a high level of security is not required. Privacy and security are significant obstacles in the development of utility computing based on cloud storage. Security vulnerabilities of cloud service providers such as Amazon, Google, and Microsoft are widely publicized, and much attention has been drawn to these vulnerabilities. In [1], cloud security and privacy issues are detailed. The Cloud Security Alliance (CSA) has made recommendations for solving problems in cloud computing applications. In [2], the European Network and Information Security Agency (ENISA) detailed the risks to data and benefits of data security in cloud computing applications. In [3], EMC Corporation's RSA Security Division analyzed the most basic security issues in the cloud infrastructure. Domestic Chinese counterparts have also widely discussed the issue of cloud security [4]. Satisfactory solutions to cloud security and privacy problems will be a strong driving force for the overall development of cloud storage services. Solving these problems is also theoretically and practically important for promoting the national digital infrastructure and national information security.
Cloud computing involves the development of distributed processing, parallel processing, and grid computing. With cloud computing, huge computing programs are automatically split into smaller subroutines via the network. Processing and analysis are handed to multiple servers in a large system, and results are returned to the user. The network service provider can process massive amounts of information in seconds, a capability that is equal to that of powerful supercomputer networks.
Cloud storage is an extension of cloud computing and draws on the large variety of storage devices found in networks. Through the use of application software and clustering, grid technology, distributed file systems, or other functions, cloud storage makes these storage devices work together to provide service access and data storage as a whole system. When the cloud processing core requires large-scale data storage and management, a large number of storage devices need to be configured, and a cloud storage system needs to be set up. Thus, cloud storage is a cloud computing system in which data storage and management are at the core.
Cloud storage systems differ from traditional storage systems in that they are designed for multiple types of online storage services. Traditional storage systems are designed for specific applications, such as high-performance computing or transaction processing. In terms of performance metrics, cloud storage services are primarily measured by data security, reliability, and efficiency. Cloud storage systems are suitable for large-scale applications, a wide range of services, and complex network environments. In terms of data management, cloud storage systems provide traditional file access similar to POSIX, but cloud storage also supports mass data management and public services.
Cloud storage is effective, flexible, low-cost, and easy to manage. Cloud storage can be used in either enterprise applications or personal applications, depending on the type of service and user orientation.
In enterprises, cloud storage is used for storage space leasing, remote data backup, disaster recovery, and video surveillance. With an IDC data center, operators can lease storage space to enterprises and institutions that do not wish to purchase mass-storage devices. A high-performance, high-capacity cloud storage system and remote data backup software also allow an operator to help enterprises and institutions build their own remote-backup and disaster-recovery systems as well as remote real-time video surveillance and playback systems.
Cloud storage applications for individuals include network disks, online document editing, and online games. Network disks are used to upload and download files when a user is storing and backing up personal data over the internet. With online document editing, a user can simply access a web page, such as Google Docs, to edit, manage, and transmit documentation. Cloud computing and storage can also be used to build a huge game server cluster so that all players are managed as a game server group, and gaming becomes more exciting.
Because cloud storage is large-scale, complex, and dynamic, it creates many new security and privacy problems. These problems include unauthorized access to data, threats to confidentiality, threats to data integrity, unavailability of data, and lack of privacy.
To prevent unauthorized access to the storage system, a provider needs to confirm the user's identity and verify whether that user has permission to access resources or perform certain operations. The user submits an access request to the cloud storage provider through storage access interfaces.
When using a cloud storage service, a user uploads their local data to the storage server and downloads the data when they need it. In this process, data passes through a public cloud, private cloud, and internet transmission line. Data may be stolen or altered when stored in the server. In this case, the confidentiality of the user's data is compromised.
In both traditional and cloud storage environments, data integrity is remotely verified to prevent data tampering and counterfeiting. However, in a cloud environment, users do not have absolute control over their data, and there is a greater need to verify the integrity of the data. When the user updates their data in the cloud, the server must promptly update the data. Therefore, real-time verification of integrity is essential.
Preventing data from being lost and ensuring the sustained, effective use of data are important responsibilities of a cloud storage provider. This requires the provision of strategies that ensure data availability, backup, and recovery.
Users often disclose personal information, such as credit card numbers or user names, when purchasing storage services. This information needs to be protected. A user's digital identity, certificates, access, and operation records also need to be protected. A storage service provider needs mechanisms to guarantee the privacy of user information.
In [5], a secure cloud storage architecture is proposed. This architecture comprises a business application layer, an application interface layer, a platform software layer, and an infrastructure layer (Fig. 1). These layers provide information service management, statistical analysis, and a variety of safety measures.
In the access layer, an authorized user can log on to the cloud storage system via a standard public application interface to use cloud storage services. Different cloud storage providers have different types of access methods. The application interface layer is the SaaS layer of cloud computing services. Different cloud storage providers can develop different application interfaces and provide different application services based on the type of business. However, security problems may arise in access control and single sign-on (SSO). The infrastructure management layer is the core of the cloud. It includes clusters, distributed file systems, and grid computing and allows cooperation between storage devices in the cloud. Content distribution systems and encryption of stored data ensure that data in the cloud is not accessed by unauthorized users. Various data-backup and disaster-recovery measures ensure that data stored in the cloud is not lost. In this layer, there may be problems with data confidentiality, integrity, availability, and intrusion. The data-storage layer is the fundamental part of cloud storage. In the data-storage layer, huge amounts of data are managed in a unified way. The data-storage layer is also used for virtual management of storage, monitoring hardware, and fixing faults. In this layer, security problems arise in virtualization and in the firewall.

▲Figure 1. Architecture for ensuring security and privacy in a cloud storage system.
For a user uploading their data to a cloud storage system, privacy mainly depends on whether the cloud storage provider supports encryption, differential privacy protection, mandatory data destruction protocols, or data privacy management. However, cloud storage providers may also snoop on a user's data or analyze the privacy preferences of users. To fully protect user privacy, data needs to be encrypted, anonymized, or scrambled before being uploaded to the cloud.
Existing technologies for controlling access to data, encrypting data, and ensuring data integrity and availability have been transplanted to cloud storage. Experts have proposed different models for ensuring data security in different cloud systems. The key technologies in these models relate to access control, data confidentiality, data integrity authentication, and data availability.
To control access to encrypted data, the owner of the data maintains encryption keys and manually sends them to users who want access to the data. This has become a bottleneck in the cloud storage environment. In [6], a new cryptographic access control scheme, called attribute-based access control for cloud storage (AB-ACCS), was proposed. Each user's private key is labeled with a set of attributes, and data is encrypted with an attribute condition so that the user can only decrypt the data if their attributes satisfy the data's condition. In [7], the authors consider the complexity of fine-grained access control for a large number of users in the cloud and propose a secure and efficient revocation scheme based on a modified ciphertext-policy attribute-based encryption (CP-ABE) algorithm. This algorithm is used to establish fine-grained access control in which user revocation is based on Shamir's secret sharing. With single sign-on (SSO), any authorized user can log in to the cloud storage system through a standard common application interface.
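The secret-sharing primitive mentioned above can be illustrated with a minimal sketch. This is not the revocation scheme of [7]; it only shows plain Shamir (t, n) secret sharing over a prime field. The function names, the field size, and the share numbering are assumptions made for illustration.

```python
import secrets

# Illustrative field size (a Mersenne prime); a real scheme fixes its own parameters.
PRIME = 2**127 - 1


def split_secret(secret, threshold, num_shares):
    """Split `secret` into `num_shares` points; any `threshold` of them recover it."""
    if not 0 <= secret < PRIME:
        raise ValueError("secret must lie in the field")
    if not 1 <= threshold <= num_shares:
        raise ValueError("need 1 <= threshold <= num_shares")
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation modulo PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def recover_secret(shares):
    """Recover the constant term by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


shares = split_secret(123456789, threshold=3, num_shares=5)
print(recover_secret(shares[:3]) == 123456789)
```

Any three of the five shares reconstruct the secret, while fewer than three reveal nothing about it. This threshold property is what makes the primitive attractive for distributing key material.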
In a cloud storage environment, an internal administrator's privileged mode is potentially a serious threat to private user data. To guarantee data privacy when administrator privileged mode is used, a variety of protection methods have been proposed. Attribute-based encryption (ABE) includes key-policy attribute-based encryption (KP-ABE) [8] as well as CP-ABE [9]. In ABE, decryption rules are contained in the encryption algorithm, and frequent distribution of keys for ciphertext access control is unnecessary. When the access control policy is changed, the data owner must encrypt the data again. In [10], a method based on proxy re-encryption is proposed. A semi-trusted agent with a proxy key can re-encrypt a ciphertext; however, the agent cannot gain the corresponding plaintext or compute the decryption key of either party in the authorization process [11]. In [12], a fully homomorphic encryption (FHE) mechanism is proposed. FHE permits specific algebraic operations on ciphertext, and the result is still encrypted. That is to say, retrieval and comparison of the encrypted data yield the correct results, but the data is not decrypted at any point in the process. The FHE scheme requires a huge amount of computation and is not always easy to implement with existing technology.
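The idea of computing on ciphertext can be illustrated with a toy example. The sketch below is not the FHE mechanism of [12]. It uses textbook RSA with tiny, insecure parameters, which is only partially (multiplicatively) homomorphic, to show that an operation on ciphertexts can yield the encryption of the corresponding operation on plaintexts. All names and parameters are assumptions chosen for illustration.

```python
# Toy textbook RSA parameters (far too small and unpadded for real use).
P, Q = 61, 53
N = P * Q                # public modulus, 3233
PHI = (P - 1) * (Q - 1)  # 3120
E = 17                   # public exponent
D = pow(E, -1, PHI)      # private exponent (modular inverse of E)


def encrypt(m):
    """Encrypt an integer 0 <= m < N with the public key."""
    return pow(m, E, N)


def decrypt(c):
    """Decrypt a ciphertext with the private key."""
    return pow(c, D, N)


def multiply_encrypted(c1, c2):
    """Multiply two ciphertexts; the server never sees the plaintexts."""
    return (c1 * c2) % N


c = multiply_encrypted(encrypt(7), encrypt(6))
print(decrypt(c))  # the product of the plaintexts, as long as it is below N
```

A fully homomorphic scheme supports both addition and multiplication on ciphertext, which is what allows arbitrary computation and also what makes it so much more expensive.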
In [13], proofs of retrievability (POR) are proposed. A POR scheme allows an archive or backup service (prover) to produce a concise proof that a user (verifier) can retrieve a target file. The POR model verifies data integrity without the user having to download the files themselves. A drawback of the proposed POR scheme is that the target file requires processing prior to being stored with the prover. This step creates computational overhead and increases the storage requirements on the prover. In addition, the POR scheme is based on static files, so the scope of its application is smaller. In [14], a flexible distributed scheme based on POR is proposed. This scheme assumes that operations are dynamic and data integrity is publicly verified. With a Merkle Hash Tree (MHT), the new scheme supports secure, efficient updating, deleting, and appending of data blocks. The improved scheme supports public data-integrity verification and authorized third-party integrity verification.
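How a Merkle hash tree lets a verifier check one data block without downloading the whole file can be sketched as follows. This is a generic illustration, not the scheme of [14]. The function names, the use of SHA-256, the leaf and node prefixes, and the rule of duplicating the last node on odd-sized levels are all assumptions.

```python
import hashlib


def _h(data):
    return hashlib.sha256(data).digest()


def build_levels(blocks):
    """Return all tree levels, leaf hashes first and the root level last."""
    if not blocks:
        raise ValueError("at least one block is required")
    level = [_h(b"\x00" + b) for b in blocks]  # prefix separates leaves from nodes
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:
            level = level + [level[-1]]  # pair the odd node out with itself
        level = [_h(b"\x01" + level[i] + level[i + 1])
                 for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def make_proof(levels, index):
    """Collect the sibling hashes on the path from leaf `index` to the root."""
    proof = []
    for level in levels[:-1]:
        sibling = index ^ 1
        if sibling >= len(level):
            sibling = index  # this node was paired with itself
        proof.append((level[sibling], sibling < index))
        index //= 2
    return proof


def verify_proof(block, proof, root):
    """Recompute the root from one block and its sibling path."""
    node = _h(b"\x00" + block)
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = _h(b"\x01" + pair)
    return node == root


blocks = [b"block-0", b"block-1", b"block-2", b"block-3", b"block-4"]
levels = build_levels(blocks)
root = levels[-1][0]
print(verify_proof(blocks[4], make_proof(levels, 4), root))
```

The verifier only needs to keep the root. The proof grows logarithmically with the number of blocks, and updating a block changes only the hashes on its path to the root, which is why the structure suits dynamic data.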
A variety of data backup and disaster recovery measures guarantee that data stored in the cloud is not lost. These measures ensure that cloud storage is secure and stable. To counter theft of legacy data, the United States Department of Defense proposes reset and special processing [15]. In [16], a new scheme called Safe Vanish is proposed. This scheme prevents hopping attacks by extending the length of key shares and significantly increasing the cost of mounting an attack. The authors of [16] also propose using a public key cryptosystem to protect against sniffing.
Access control technology for cloud storage tends to be fine-grained, and this creates large overhead. Data confidentiality is ensured by a variety of encryption methods, and confidentiality is considered in relation to access control. A data integrity verification scheme eliminates user concern throughout the data storage process. Researchers have paid attention to availability but are now also paying attention to security throughout the storage process. Security mechanisms create efficiency problems, and the trade-off between security and efficiency needs to be further researched.
In a cloud storage system, privacy can be lost because of data outsourcing and service leasing. User data is stored in the cloud environment and is managed by the cloud storage provider. The security of this data depends on the level of technology used by the service providers.
In [17], a system called Airavat is proposed. This system is based on MapReduce and is designed to provide strong security and privacy for distributed computations on sensitive data. Mandatory access control and differential privacy are integrated in a novel way. In [18], the author proposes privacy management across the whole data lifecycle and uses a mandatory data destruction protocol to control user data. Dissolver is a prototype system based on the Xen virtual machine monitor and the CHAOS system [18]. It ensures that the user's text data only exists in a private operating space and the user's key only exists in the memory space of the virtual machine monitor. Data in the memory and the user's key are destroyed at a time specified by the user.
The system ensures the server-side privacy of user data throughout the data's lifecycle. In [19], a cloud storage framework is proposed to ensure data privacy and security. This framework has a multitree structure for indexing. An extirpation-based key derivation algorithm (EKDA) is used for key management, and discrete logarithm-based search on encrypted keyword (DLSEK) is used for data sharing and ciphertext retrieval. Lazy revocation is incorporated into the framework to deal with changes in user access rights and dynamic data operations. In [20], a mechanism with differential privacy is incorporated in the MapReduce computation model to analyze service efficiency and security of the mass data. A decision-tree generation algorithm is also incorporated into the computational model. Together, these measures satisfy ε-differential privacy.
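To make ε-differential privacy concrete, the sketch below applies the standard Laplace mechanism to a counting query. It is a generic illustration and not the mechanism of [17] or [20]. The function names are assumptions, and the sketch ignores the floating-point subtleties that a production implementation would have to handle.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variables."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for r in records if predicate(r))
    # Adding or removing one record changes a count by at most 1 (sensitivity 1),
    # so Laplace noise with scale 1/epsilon gives epsilon-differential privacy.
    return true_count + laplace_noise(1.0 / epsilon, rng)


ages = [23, 35, 41, 29, 62, 57, 33, 48]
print(private_count(ages, lambda a: a >= 40, epsilon=0.5))
```

A smaller ε adds more noise and therefore gives stronger privacy at the cost of accuracy, which is the efficiency and security trade-off that such mechanisms have to balance.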
Generally speaking, users distrust or only partly trust the cloud storage environment because as storage “tenants,” they lack complete control over their data. A service provider has the potential to violate the privacy of user data, so the data needs to be processed before being uploaded to the cloud. Typically, data is encrypted, obfuscated, or anonymized before being uploaded.
Encrypting data negatively affects the processing of the data. Improving the speed and efficiency of ciphertext processing and retrieval is the focus of current research. In [21]-[23], the authors have done extended research on privacy preservation in the cloud and propose ciphertext retrieval solutions. In [24], a computable encryption scheme based on vector and matrix calculations (CESVMC) is proposed. In this scheme, cloud data is divided into two main categories: string and numeric. Encrypted strings can be retrieved using fuzzy retrieval, and the four basic arithmetic operations can be performed on numeric data.
Anonymization technology includes k-anonymity, L-diversity, and T-closeness. K-anonymity guarantees that each record is hidden within a group of at least k records [25]. This means that the probability of recognizing the individual does not exceed 1/k. The level of privacy depends on the size of k. The statistical characteristics of the data are retained as much as possible; however, k-anonymity places no constraint on the sensitive attributes themselves. An attacker could therefore mount a consistency attack or background-knowledge attack to confirm a link between sensitive data and personal data. This would constitute a breach of privacy. L-diversity ensures that each group's sensitive attributes have at least L different values [26]. This means that an attacker has a maximum probability of 1/L of recognizing a user's sensitive information. T-closeness is based on L-diversity [27]. In T-closeness, the distribution of the sensitive attribute is taken into account, and the difference between the distribution of the sensitive attribute within each group and its distribution in the whole data set does not exceed T. Anonymization technology is mainly used for database privacy, location privacy, and trajectory privacy, but we propose applying it to cloud storage privacy.
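The difference between the first two guarantees can be checked mechanically. The following sketch groups records by their quasi-identifiers and tests whether a table is k-anonymous and whether it is L-diverse. The function names, the column names, and the sample table are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def group_by_quasi_identifiers(rows, quasi_ids):
    """Group rows that share the same quasi-identifier values."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[q] for q in quasi_ids)].append(row)
    return groups


def is_k_anonymous(rows, quasi_ids, k):
    """Every group must contain at least k rows."""
    groups = group_by_quasi_identifiers(rows, quasi_ids)
    return all(len(g) >= k for g in groups.values())


def is_l_diverse(rows, quasi_ids, sensitive, l):
    """Every group must contain at least l distinct sensitive values."""
    groups = group_by_quasi_identifiers(rows, quasi_ids)
    return all(len({r[sensitive] for r in g}) >= l for g in groups.values())


# Hypothetical generalized table: ZIP prefix and age range are quasi-identifiers.
table = [
    {"zip": "4300*", "age": "20-29", "disease": "flu"},
    {"zip": "4300*", "age": "20-29", "disease": "flu"},
    {"zip": "4301*", "age": "30-39", "disease": "asthma"},
    {"zip": "4301*", "age": "30-39", "disease": "diabetes"},
]
print(is_k_anonymous(table, ["zip", "age"], k=2))
print(is_l_diverse(table, ["zip", "age"], "disease", l=2))
```

The sample table is 2-anonymous but not 2-diverse: everyone in the first group has the same disease, so an attacker who knows a person's ZIP prefix and age range learns the sensitive value even though the individual record cannot be singled out.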
In [28], a privacy manager that scrambles user data in the client is proposed. The privacy manager protects and monitors privacy according to the user's preferences. The privacy-preserving method in [24] supports data dyeing based on the normal cloud model in [10]. This method can be used to protect documents, images, videos, software, and other types of data. It also involves much less computation than traditional encryption or decryption. In [29], a novel privacy-preserving data-perturbation algorithm, NETPA, is proposed for clustering. The original data set is perturbed by replacing the value of the main neighboring attributes in each data object with the average attribute value of the data objects in its k-nearest neighborhood. This perturbation strategy is used to maintain stable k-nearest neighbor relations in the original data. NETPA effectively stops privacy breaches. In [30], a novel data privacy protection mechanism based on partitioning and classification was proposed. The mechanism partitions the original data into a small, locally deployed block and a large, remotely deployed block. Then, data dyeing and data encryption are used according to the different security requirements of the data. This safeguards the privacy of data in the cloud, increases flexibility, and reduces overhead.
Much attention has been focused on safeguarding data at the storage provider's side; however, dynamic privacy needs at the user side have largely been ignored. Encryption, access control strategies, and other security mechanisms generally safeguard the privacy of data. There are many factors that allow privacy breaches, and user privacy requirements vary widely. Traditional authentication and security management strategies are insufficient for data stored in the cloud.
In this paper, we have described an architecture that ensures data privacy and security in cloud storage. We have also discussed access control and data integrity, confidentiality, availability, and privacy technologies. Cloud storage systems are moving towards unlimited bandwidth, capacity, and processing power, and data must be securely accessible anytime and anywhere. Because of changing demands, existing technology cannot ensure the privacy and security of data stored in the cloud. Further research needs to be done on scalability of secure storage and secure storage management in a large, complex cloud.
Storage devices are provided by a number of different service providers and shared by a large number of users. Frequent equipment deployment, data operations, and data access make the cloud a dynamic environment. Cloud data storage and management therefore need to be not only safe but also highly scalable.
Intentional breaches of privacy call for dynamic countermeasures in a real-time cloud storage system. With frequent changes in computer technology, the means of breaching data privacy are constantly changing; consequently, security requirements are also changing. Privacy-preservation strategies need to be constantly devised for cloud storage systems.
An optimal balance must also be struck between security and availability in a cloud storage system. Security and availability are in tension, and increasing security often decreases availability. The foremost requirement of the data owner is security, and access limitations need to be imposed on applications. Further research is needed into the effects of security on data availability.