
An Optimized English Text Watermarking Method Based on Natural Language Processing Techniques

2021-12-15 08:11:02 Fahd Al-Wesabi
Computers, Materials & Continua, 2021, Issue 11

Fahd N.Al-Wesabi

1 Department of Computer Science, King Khalid University, Muhayel Aseer, Kingdom of Saudi Arabia

2 Faculty of Computer and IT, Sana'a University, Sana'a, Yemen

Abstract: In this paper, a text analysis-based approach, RTADZWA (Reliable Text Analysis and Digital Zero-Watermarking Approach), is proposed for transferring and receiving authentic English text via the Internet. RTADZWA uses the second-level order alphanumeric mechanism of the hidden Markov model as a natural language processing technique to analyze English text and extract features of the interrelationships between the contexts of the text. The extracted features serve as watermark information, which is later validated against the attacked English text to detect any tampering that occurred. RTADZWA integrates text analysis and text zero-watermarking techniques to improve the performance, accuracy, capacity, and robustness of approaches previously proposed in the literature. The approach embeds and detects the watermark logically, without altering the original text document. RTADZWA has been implemented in PHP using the VS Code IDE. Experimental and simulation results on standard datasets of varying lengths show that the proposed approach achieves high robustness and better detection accuracy against common random insertion, reorder, and deletion attacks. Comparison results with baseline approaches also show the advantages of the proposed approach.

Keywords: Text analysis; NLP; hidden Markov model; zero-watermarking; content authentication; tampering detection

1 Introduction

For the research community, the reliability and security of text data exchanged through the Internet is a promising and challenging field. In communication technologies, content authentication and automated verification of text integrity in different languages and formats are of great significance. Numerous applications, for instance e-banking and e-commerce, make secure information transfer via the Internet especially difficult. Much of the digital media transferred over the Internet is in text form, which is very susceptible to attacks on its content, structure, grammar, and semantics during online transmission. During the transfer process, malicious attackers can tamper with such digital content and thus change it [1].

For information security, many algorithms and techniques are available for purposes such as content authentication, integrity verification, tampering detection, owner identification, access control, and copyright protection.

To overcome these issues, steganography and automated watermarking methods are commonly used. Digital watermarking (DWM) is a technique by which watermark information can be embedded into various digital media such as text, binary images, audio, and video [2,3]. A fine-grained text watermarking procedure has been proposed based on replacing white spaces and Latin symbols with homoglyph characters [4].

Several conventional text watermarking methods and solutions have been proposed [5,6] and categorized into classes such as linguistic, structural, image-based, and format-based [7]. To insert the watermark information into the document, most of these solutions require certain modifications to the original digital text. Zero-watermarking is a newer technique, based on smart algorithms, that embeds the watermark information without any alteration to the original digital material. This technique can also generate the watermark data from the contents of the given digital context itself [1,7-9].

Limited research has focused on appropriate solutions for verifying the credibility of critical digital media online [10-12]. The verification of digital text and the identification of fraud have gained great research attention. In addition, text watermarking studies over the last decade have concentrated on copyright protection. However, less attention has been paid to integrity verification, tampering detection, and content authentication, because text content is natural-language dependent [13].

The most common challenge in this area is proposing the most appropriate approaches and strategies for dissimilar formats and materials, especially in the Arabic and English languages [14,15]. Therefore, content authentication, integrity verification, and tampering detection of sensitive text are major issues in different systems that need critical solutions.

Examples of such sensitive digital text content include the interactive Arabic Holy Qur'an, online eChecks, tests, and marks. Characteristics of the Arabic alphabet, such as diacritics, lengthened letters, and extra symbols, make it simple to modify the key meaning of a text through basic changes such as rearranging diacritics [16]. The hidden Markov model (HMM) is one of the most popular soft computing and natural language processing (NLP) techniques supporting text analysis.

The author proposes a reliable approach, RTADZWA (Reliable Text Analysis and Digital Zero-Watermarking Approach), for transferring and receiving authentic English text via the Internet. The approach uses a second-order alphanumeric mechanism of the Markov model for content authentication and tampering detection of English text transmitted via the Internet. It combines zero-watermarking with the Markov model as an NLP technique. The second-order alphanumeric mechanism is used for text analysis to extract the interrelationships between the contents of the given English text and to generate a watermark key. The generated watermark is embedded logically in the original English context without any modification to the original text or effect on its size. After the text is transmitted via the Internet, the embedded watermark is used to detect any tampering with the received English text and to determine whether it is authentic.

The primary objective of RTADZWA is to achieve high content-authentication accuracy and sensitive detection of tampering attacks on English text, which has gained great importance and needs more security and protection on the Internet.

The remainder of the article is structured as follows. Section 2 reviews existing work. Section 3 describes the proposed approach (RTADZWA). The simulation and implementation are presented in Section 4, the results are discussed in Section 5, and Section 6 concludes the article.

2 Related Work

According to the processing domain of NLP and text watermarking, the existing text watermarking methods and solutions reviewed in this paper are classified into linguistic, structural, and zero-watermarking methods [1,7,13].

2.1 Linguistic Methods

Natural language is the foundation of linguistic text watermarking approaches. These methods embed the watermark by changing the semantic and syntactic essence of the plain text [1].

A text watermarking method based on the available word spaces has been suggested to enhance the capacity and imperceptibility of Arabic text [17]. In this method, each word space is used to hide a Boolean bit, 0 or 1, which physically modifies the original text.
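To make the contrast with zero-watermarking concrete, the toy sketch below hides bits in word spaces. The coding used here (one space for 0, two spaces for 1) is an assumption for illustration only and is not necessarily the scheme of [17]; the function names are hypothetical. The point is that the marked text differs physically from the original.

```python
import re


def hide_bits(text: str, bits: str) -> str:
    """Toy word-space watermark: one space hides 0, two spaces hide 1 (assumed coding)."""
    words = text.split()
    if len(bits) > len(words) - 1:
        raise ValueError("not enough word spaces for the watermark bits")
    out = [words[0]]
    for i, word in enumerate(words[1:]):
        # Gaps beyond the watermark length keep a single space (bit 0).
        gap = "  " if i < len(bits) and bits[i] == "1" else " "
        out.append(gap + word)
    return "".join(out)


def read_bits(marked: str, n_bits: int) -> str:
    """Recover the first n_bits by measuring the width of each word gap."""
    gaps = re.findall(r" +", marked)
    return "".join("1" if len(g) == 2 else "0" for g in gaps[:n_bits])


marked = hide_bits("data is sent online", "101")
print(repr(marked))  # 'data  is sent  online'
print(read_bits(marked, 3))  # 101
```

Because the marked string is longer than the original, any such scheme alters the document, which is exactly what zero-watermarking avoids.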

A text steganography technique has been proposed to hide information in Arabic text [18]. This approach considers the presence of Harakat (Arabic diacritics) such as Kasra, Fatha, and Damma, as well as reversed Fatha, to cover the message.

An invisible watermarking method based on Kashida marks [19] has been proposed for document security and authentication, exploiting the frequent recurrence of characters. The method uses a predetermined watermark key, placing a Kashida for bit 1 and omitting it for bit 0.

A text steganography method has been proposed that uses Kashida extensions together with the 'moon' and 'sun' characters to write digital contents in the Arabic language [20]. In this method, Kashida characters are placed alongside Arabic characters to determine which secret bits are held by specific characters. Four cases of Kashida characters are included: moon characters representing '00'; sun characters representing '01'; sun characters representing '10'; and moon characters representing '11'.

2.2 Structural Methods

Structural methods are content-dependent: the structure of the original text is altered to hide the watermark data.

A text steganographic approach based on multilingual Unicode characters [21] has been suggested to hide details in English scripts by exploiting English Unicode alphabet characters that are also used in other languages. Thirteen letters of the English alphabet were chosen for this approach. Two bits are embedded at a time: the ASCII code of a letter embeds 00, while its multilingual Unicode counterparts embed 01, 10, and 11.

A text watermarking algorithm based on Unicode extended characters has been used to secure textual contents from malicious attacks [22]. The algorithm consists of three main steps: watermark generation, embedding, and extraction. Watermark embedding relies on predefined coding tables, while scrambling techniques are used in generation and extraction so that the watermark key remains secure.

A method that addresses substitution attacks by preserving the positions of words in the text document has been proposed [23]. It depends on manipulating word transitions in the text document. Text-based watermarking approaches that authenticate Chinese text documents by combining sentence properties have also been suggested [24,25]. In these methods, a Chinese text is first split into a group of sentences, and a semantic code is obtained for each word. The distribution of semantic codes influences sentence entropy.

2.3 Zero-Watermarking Methods

A zero-watermarking method based on the Hurst exponent and the nullity of frames has been proposed to preserve the privacy of individuals [26]. For watermark embedding, two steps are used to evaluate the unvoiced frames. The approach integrates an individual's identity without causing any noticeable distortion in the medical signals.

A zero-watermarking method was proposed to resolve security issues of English text documents, such as content verification and copyright protection [27]. A zero-watermarking approach based on a Markov model has been suggested for authenticating the content of English text [28,29]. In this approach, the probability characteristics of the English text are extracted as secure watermark information and stored in order to confirm the validity of the attacked text document. The approach provides security against popular text attacks, with a watermark distortion rate greater than one for all known attacks. For copyright protection of English text, a conventional watermarking approach [30] has been suggested based on the occurrence rate of non-vowel ASCII letters and words.

3 The Proposed Approach

This paper proposes a novel, reliable approach that integrates NLP and text zero-watermarking techniques, with no need to embed extra information such as a watermark key or to modify the original text in any way. The second-order alphanumeric mechanism of the Markov model is used as the NLP technique to analyze the contents of English text and extract the interrelationship features of these contents.

The main contributions of our approach, RTADZWA, can be summarized as follows:

· Unlike previous work, in which watermarking affects the text, its content, and its size, RTADZWA embeds the watermark logically without any effect on the text, content, or size.

· In RTADZWA, watermarking does not need any external information, because the watermark key is produced by analyzing the text, extracting the relationships within the content itself, and using them as the watermark.

· RTADZWA is highly sensitive to any simple modification of the text and meaning of English text, which is considered a complex text. The three contributions above have so far been achieved, to some extent, only for images and not for text; this is the vital contribution of this paper.

· In addition, RTADZWA can effectively determine the location where tampering occurred. This feature can be considered an advantage over hash-function methods.

3.1 Watermark Generation and Embedding Procedure

This subsection involves three sub-procedures: pre-processing, text analysis and watermark generation, and watermark embedding, as illustrated in Fig. 1.

Figure 1: RTADZWA zero-watermark processes

3.1.1 Pre-processing Procedure

Pre-processing the original English text is a key step in both the watermark generation and extraction processes. It converts capital letters to lower case and removes extra spaces and newlines, and it directly influences tampering detection accuracy and watermark robustness. The original English text (OET) is required as input to the pre-processing process.
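A minimal sketch of the pre-processing step is shown below. It assumes that "removing extra spaces and newlines" means collapsing every whitespace run into a single space; the function name `preprocess` is hypothetical, and punctuation is left untouched because the text does not mention removing it.

```python
import re


def preprocess(oet: str) -> str:
    """Return the pre-processed English text (PET) for an original text (OET).

    Folds capitals to lower case, then collapses runs of spaces, tabs and
    newlines into one space (assumed interpretation of "extra spaces").
    """
    lowered = oet.lower()
    collapsed = re.sub(r"\s+", " ", lowered)
    return collapsed.strip()


print(preprocess("The  Quick\nBrown   Fox."))  # the quick brown fox.
```

Since both generation and extraction run this same function, two texts that differ only in case or spacing produce the same watermark.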

3.1.2 Text Analysis and Watermark Generation Procedure

This procedure includes two sub-processes: building the Markov matrix, and text analysis and watermark generation.

· Building a Markov matrix is the starting point of English text analysis and watermark generation using the Markov model. A Markov matrix representing the possible states and transitions in a given text is constructed without repetitions. In the RTADZWA approach, each unique pair of alphanumeric characters within the given English text represents a present state, and each unique alphanumeric character represents a transition in the Markov matrix. While building the Markov matrix, the proposed algorithm initializes all transition values to zero; these cells are later used to keep track of the number of times that the ith pair of alphanumeric characters is followed by the jth alphanumeric character within the given English text document.

The Markov matrix construction algorithm is shown in Algorithm 1.

Algorithm 1: Algorithm of building Markov matrix using RTADZWA

where OET is the original English text, PET is the pre-processed English text, a2_mm is the states-and-transitions matrix with zero values in all cells, ps refers to the present state, and ns refers to the next state.
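Algorithm 1 itself is not reproduced here, so the following is only a sketch of the matrix-building step under stated assumptions: every character of the PET (including single spaces and punctuation) counts as a symbol, a present state `ps` is any two-character window that has a successor, and the next state `ns` is the following character. The dictionary-of-dictionaries layout and the name `build_markov_matrix` are hypothetical.

```python
def build_markov_matrix(pet: str) -> dict[str, dict[str, int]]:
    """Build a zero-initialised second-order matrix (a2_mm) for a PET string.

    Rows are the unique two-character present states (ps) that are followed
    by another character; columns are the unique next characters (ns).
    Every cell starts at zero and is filled in later by the analysis step.
    """
    states: list[str] = []
    symbols: list[str] = []
    for i in range(len(pet) - 2):
        ps, ns = pet[i:i + 2], pet[i + 2]
        if ps not in states:  # keep first-seen order, no repetitions
            states.append(ps)
        if ns not in symbols:
            symbols.append(ns)
    return {ps: {ns: 0 for ns in symbols} for ps in states}


print(build_markov_matrix("abab"))  # {'ab': {'a': 0, 'b': 0}, 'ba': {'a': 0, 'b': 0}}
```

A dense dictionary mirrors the "all cells start at zero" description; for long texts a sparse structure would use far less memory.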

· Text analysis and watermark generation: after the Markov matrix is constructed, natural language processing and text analysis are performed to find the interrelationships between contexts of the given English text and to generate watermark patterns. In this algorithm, the number of occurrences of each possible next-state transition for each present state (pair of alphanumeric characters) is counted and converted into transition probabilities by Eq. (1).

where n is the total number of states, and i is the ith present state (pair of alphanumeric characters).
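For reference, the standard maximum-likelihood estimate of a Markov transition probability is given below; it is assumed here to be what Eq. (1) computes. $N_{i,j}$ (an assumed symbol) is the number of times the $i$th pair of alphanumeric characters is followed by the $j$th state, and $n$ is the total number of states, as defined above.

```latex
P_{i,j} = \frac{N_{i,j}}{\sum_{k=1}^{n} N_{i,k}},
\qquad 0 \le P_{i,j} \le 1,
\qquad \sum_{j=1}^{n} P_{i,j} = 1 .
```

For example, in the pre-processed text "the theme" the pair "he" is followed once by a space and once by "m", so each of those transitions has probability 1/2.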

The following English example demonstrates how the method models the transition from the present state to the next state.

When the second-level alphanumeric mechanism of the hidden Markov model is used, each pair of alphanumeric characters is a present state. The text is processed as it is read, and the relationship between the present and next states is calculated. The available transitions for the above English sample are shown in Fig. 2.

Figure 2: English text sample representation based on RTADZWA

Fig. 3 illustrates the analysis results for the given English sample and represents each state and its transitions based on the second-level alphanumeric mechanism of the Markov model.

Figure 3: English text analysis and watermark generation based on RTADZWA

The text analysis and watermark generation algorithm is formally presented in Algorithm 2, where ppa is the previous unique pair of alphanumeric characters and cpa is the current unique pair of alphanumeric characters.

Algorithm 2: Watermark generation algorithm of RTADZWA
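The counting-and-normalising idea behind the generation step can be sketched as follows. This is not Algorithm 2 itself; it assumes overlapping character pairs and treats the next character as the next state. Because consecutive overlapping pairs ("th" then "he") are determined by one new character, counting pair-to-character transitions gives the same probabilities as counting ppa-to-cpa transitions. The function name is hypothetical.

```python
from collections import defaultdict


def generate_watermark_matrix(pet: str) -> dict[str, dict[str, float]]:
    """Second-order alphanumeric analysis of a pre-processed text (PET).

    Counts how often each two-character present state is followed by each
    next character, then turns every row of counts into probabilities.
    Only observed (non-zero) transitions are stored.
    """
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for i in range(len(pet) - 2):
        ps, ns = pet[i:i + 2], pet[i + 2]  # present state, next state
        counts[ps][ns] += 1
    probabilities: dict[str, dict[str, float]] = {}
    for ps, row in counts.items():
        total = sum(row.values())  # number of transitions leaving ps
        probabilities[ps] = {ns: c / total for ns, c in row.items()}
    return probabilities


matrix = generate_watermark_matrix("the theme")
print(matrix["he"])  # {' ': 0.5, 'm': 0.5}
```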

3.1.3 Watermark Embedding Procedure

In this method, watermark embedding takes place logically, without any change to the original text. After features are extracted from the given English text, the watermark key is embedded logically by identifying all non-zero values in the Markov chain matrix. All these non-zero values are sequentially concatenated to form the original watermark key pattern, WMPO, as defined in Eq. (2) and Fig. 4.

Figure 4: The generated original watermark key pattern WMPO using RTADZWA

The watermark embedding algorithm of RTADZWA is formally presented in Algorithm 3.

Algorithm 3: Algorithm of watermark embedding using RTADZWA

where a2_WMPO is the original watermark pattern.
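The concatenation step can be sketched as below, assuming the matrix is stored as nested dictionaries of probabilities in first-seen state order. Formatting every value with a fixed number of decimals (four here, an assumed choice) keeps the concatenated string unambiguous, since every probability in [0, 1] then occupies exactly six characters. The function name is hypothetical; note that nothing is written back to the text.

```python
def embed_watermark(matrix: dict[str, dict[str, float]], digits: int = 4) -> str:
    """Form the original watermark pattern (a2_WMPO) from a probability matrix.

    Non-zero cells are visited row by row in the matrix's stored order and
    their values are concatenated as fixed-width decimal strings.
    """
    return "".join(
        f"{p:.{digits}f}"
        for row in matrix.values()
        for p in row.values()
        if p != 0  # zero cells carry no watermark information
    )


matrix = {"th": {"e": 1.0}, "he": {" ": 0.5, "m": 0.5}, "xx": {"y": 0.0}}
print(embed_watermark(matrix))  # 1.00000.50000.5000
```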

3.2 Watermark Extracting and Detecting Procedures

This procedure consists of two key algorithms: watermark extraction and watermark detection. The watermark pattern a2_EWMA is extracted from the received, attacked English text (AETP) and matched against a2_WMPO by the detection algorithm. AETP is required as input to run this algorithm. Hence, the watermark generation algorithm must be performed to obtain the watermark pattern for AETP, as presented in Fig. 5.

Figure 5: Zero-watermark extraction and detection procedures of RTADZWA

3.2.1 Watermark Extraction Procedure

AETP is provided as input to this algorithm, and a2_WMPA is its core output, as presented in Algorithm 4.

Algorithm 4: Algorithm of watermark extraction based on RTADZWA

where AETP is the pre-processed attacked English text and a2_EWMA is the attacked watermark key pattern.
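Because extraction simply repeats generation on the received text, it can be sketched end to end as below. The same assumptions as in the earlier steps apply (lower-casing and whitespace collapsing, overlapping character pairs, four-decimal fixed-width concatenation); all function names are hypothetical.

```python
import re
from collections import defaultdict


def preprocess(text: str) -> str:
    # Lower-case and collapse whitespace runs, as in the pre-processing step.
    return re.sub(r"\s+", " ", text.lower()).strip()


def transition_matrix(pet: str) -> dict[str, dict[str, float]]:
    # Count pair -> next-character transitions, then normalise each row.
    counts: dict[str, dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for i in range(len(pet) - 2):
        counts[pet[i:i + 2]][pet[i + 2]] += 1
    return {ps: {ns: c / sum(row.values()) for ns, c in row.items()}
            for ps, row in counts.items()}


def watermark_pattern(matrix: dict[str, dict[str, float]]) -> str:
    # Concatenate non-zero probabilities in first-seen order (fixed 4 decimals).
    return "".join(f"{p:.4f}" for row in matrix.values() for p in row.values() if p)


def extract_watermark(attacked_text: str) -> str:
    """Return the attacked pattern (a2_EWMA) by re-running watermark generation."""
    return watermark_pattern(transition_matrix(preprocess(attacked_text)))


print(extract_watermark("The  Theme"))  # 1.00000.50000.50001.00001.00001.0000
```

Differences in case or spacing alone leave the extracted pattern unchanged, so only changes to the character sequence itself are flagged.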

3.2.2 Watermark Detection Algorithm

a2_WMPA and a2_WMPO are the inputs required to run this algorithm. The core output is the status of the given English text, which is either authentic or tampered. Watermark detection is performed in two sub-steps:

■ Main matching between a2_WMPO and a2_EWMA is performed. If the two watermark patterns are identical, the message "English text contents are authentic and no tampering occurred" is displayed. Otherwise, the message "This English text document is tampered and not authentic" is displayed, and the process continues to the next step.

■ Secondary matching compares the transition status of each state across the entire generated watermark pattern; that is, the a2_EWMA transitions of each state are compared with the corresponding transitions of a2_WMPO, as given by Eqs. (3) and (4).

where a2_PMRT is the tampering detection accuracy rate at the transition level (0 < a2_PMRT ≤ 1),

where a2_PMRS is the tampering detection accuracy rate at the state level (0 < a2_PMRS ≤ 100).

The weight of each state in the Markov matrix is determined from its corresponding rate, as shown in Eq. (5).

where a2_PMRS is the total matching value at the ith state level.

The final a2_PMR between AETP and AET is computed by Eq. (6).

The watermark distortion rate, denoted a2_WDR, measures the amount of tampering on the contents of the English text and is calculated by Eq. (7).

The watermark detection algorithm is formally presented in Algorithm 5.
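Eqs. (3) to (7) are not reproduced in the text, so the sketch below uses assumed formulas purely to illustrate the two-step structure: an exact main match, then a per-state secondary match that also localizes the tampered states. The assumed forms are a transition score of 1 − |P_original − P_attacked|, a state score equal to 100 times the mean transition score, uniform state weights, and WDR = 100 − PMR. Comparing matrices rather than concatenated strings is our choice, since it keeps the per-state information needed for localization. All names are hypothetical.

```python
def detect_tampering(original: dict[str, dict[str, float]],
                     attacked: dict[str, dict[str, float]]) -> dict:
    """Compare original (a2_WMPO) and attacked (a2_EWMA) transition matrices.

    ASSUMED formulas, for illustration only:
      transition match  PMRT = 1 - |P_original - P_attacked|   (in [0, 1])
      state match       PMRS = 100 * mean(PMRT over the state's transitions)
      state weight      1 / (number of states), i.e. uniform
      overall           PMR = sum(weight * PMRS),  WDR = 100 - PMR
    """
    # Main matching: identical matrices give identical watermark patterns.
    if original == attacked:
        return {"status": "authentic", "pmr": 100.0, "wdr": 0.0,
                "tampered_states": []}
    # Secondary matching over every state seen in either matrix.
    states = list(original) + [s for s in attacked if s not in original]
    pmrs: dict[str, float] = {}
    for s in states:
        row_o, row_a = original.get(s, {}), attacked.get(s, {})
        symbols = set(row_o) | set(row_a)
        pmrt = [1 - abs(row_o.get(j, 0.0) - row_a.get(j, 0.0)) for j in symbols]
        pmrs[s] = 100 * sum(pmrt) / len(pmrt)
    weight = 1 / len(states)
    pmr = sum(weight * v for v in pmrs.values())
    return {"status": "tampered", "pmr": pmr, "wdr": 100 - pmr,
            "tampered_states": [s for s, v in pmrs.items() if v < 100]}


original = {"th": {"e": 1.0}, "he": {" ": 0.5, "m": 0.5}}
attacked = {"th": {"e": 1.0}, "he": {" ": 1.0}}
result = detect_tampering(original, attacked)
print(result["status"], result["pmr"], result["tampered_states"])  # tampered 75.0 ['he']
```

The `tampered_states` list shows how a state-level comparison can point to where a change happened, which a single hash of the whole text cannot do.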

The results of watermark extraction and detection are illustrated in Fig. 6.

4 Implementation and Simulation

A variety of implementation and simulation experiments are conducted to test the accuracy of RTADZWA's output and its tampering detection. This section outlines the implementation and experimental setting, the experimental conditions, the standard dataset scenarios, and the discussion.

4.1 Simulation and Implementation Environment

Self-developed software was built to evaluate the efficiency of RTADZWA. The implementation environment is: CPU: Intel Core i7-4650U @ 2.3 GHz; RAM: 8.0 GB; OS: Windows 10 64-bit; programming language: PHP, with the VS Code IDE.

Figure 6: Results of watermark extraction and detection using RTADZWA

4.2 RTADZWA Simulation and Experiment Findings

The performance of RTADZWA is measured by its tampering detection accuracy under illegal attacks.
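The experiments below apply insertion, deletion, and reorder attacks at volumes of 10%, 20%, and 50%. The exact attack generator is not described, so the following is only a hypothetical sketch that assumes attacks operate on words and that the volume is the fraction of words affected; a fixed seed makes each scenario reproducible.

```python
import random


def attack(words: list[str], kind: str, volume: float, seed: int = 0) -> list[str]:
    """Hypothetical word-level attack at a given volume (fraction of words)."""
    rng = random.Random(seed)
    k = max(1, round(volume * len(words)))  # number of words affected
    out = list(words)
    if kind == "insertion":
        # Insert k words sampled from the text at random positions.
        for _ in range(k):
            out.insert(rng.randrange(len(out) + 1), rng.choice(words))
    elif kind == "deletion":
        # Delete k distinct positions, highest index first so indices stay valid.
        for idx in sorted(rng.sample(range(len(out)), k), reverse=True):
            del out[idx]
    elif kind == "reorder":
        # Shuffle the words found at k randomly chosen positions among themselves.
        idxs = rng.sample(range(len(out)), k)
        picked = [out[i] for i in idxs]
        rng.shuffle(picked)
        for i, w in zip(idxs, picked):
            out[i] = w
    else:
        raise ValueError(f"unknown attack kind: {kind}")
    return out


words = [f"w{i}" for i in range(20)]
print(len(attack(words, "deletion", 0.10)))  # 18
```

Feeding `" ".join(attack(...))` through watermark extraction and detection would then give one data point per attack type, volume, and dataset.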

4.2.1 RTADZWA Experiment Under Small (10%) Attack Volumes

The tampering detection accuracy of RTADZWA under a 10% attack volume, for all attacks and all dataset sizes, is illustrated in Fig. 7 and discussed below.

Figure 7: Performance evaluation of RTADZWA under 10% volume of all attacks

As Fig. 7 shows, under a 10% attack volume the best tampering detection rate is obtained in all deletion attack scenarios. This means that RTADZWA is very sensitive to small volumes of deletion attack.

4.2.2 RTADZWA Experiment Under Mid (20%) Attack Volumes

As the results in Fig. 8 show, under a 20% attack volume RTADZWA gives the best performance in all deletion attack scenarios, as it did under a 10% attack volume.

Figure 8: Performance evaluation of RTADZWA under 20% volume of all attacks

4.2.3 RTADZWA Experiment Under High (50%) Attack Volumes

As the results in Fig. 9 show, under a 50% attack volume RTADZWA gives the best performance in all deletion attack scenarios for the very small and medium datasets. However, for the small and large datasets, RTADZWA is more sensitive to reorder attacks.

Figure 9: Performance evaluation of RTADZWA under 50% volume of all attacks

4.2.4 RTADZWA Simulation and Experiment Findings Under All Attack Volumes


To evaluate the performance of RTADZWA, multiple experimental scenarios were run for all attack types and volumes, as shown in Tab. 1.

Table 1: Performance assessment of RTADZWA under all attack volumes

The results in Tab. 1 and Fig. 10 show that RTADZWA gives sensitive tampering detection results for all attacks that may have been carried out on the structure, semantics, and syntax of the English text. Comparing attack types, the results show that tampering detection is most sensitive to insertion attacks in all attack volume scenarios.

5 Comparison and Result Discussion

5.1 Baseline Approaches

The performance results of RTADZWA are critically analyzed and compared with the baseline approaches UZWAMW and HNLPZWA, and the effects of the major factors, i.e., dataset size, attack type, and attack volume, are discussed to determine which approach performs best. The baseline approaches and their objectives are stated in Tab. 2.

Figure 10: RTADZWA performance under all volumes of various attacks

Table 2: Compared baseline approaches

5.2 Comparative Results

5.2.1 Results of Dataset Impact

This section tests the impact of dataset size on watermark reliability against all attack types and volumes. Tab. 3 compares this effect for RTADZWA, HNLPZWA, and UZWAMW.

Table 3: Detection accuracy comparison based on English text size

The comparative results in Tab. 3 and Fig. 11 reflect the performance of RTADZWA. They show that, for RTADZWA, the dataset sizes leading to the best performance are, in order, ASST, ALST, AMST, and AHMST. This means that performance increased with increasing text length and decreased with decreasing text length. The results also show that RTADZWA outperforms both HNLPZWA and UZWAMW in watermark robustness under all dataset-size scenarios.

Figure 11: English text size-based comparison of tampering detection impact

5.2.2 Results of Attack Type Impact

Tab. 4 compares the effect of different attack types on the tampering detection accuracy of RTADZWA, HNLPZWA, and UZWAMW across all dataset sizes and attack volume scenarios. Across attack types, RTADZWA and the baseline HNLPZWA and UZWAMW approaches are least affected by insertion attacks, because deletion and reorder attacks involve both insertion and deletion tampering at the same time.

Table 4: Detection impact comparison based on attack type

Overall, the comparative results in Tab. 4 and Fig. 12 show that RTADZWA outperforms the baseline HNLPZWA and UZWAMW approaches, with a higher performance rate and a lower effect of attack type. This indicates that RTADZWA is well suited for content authentication and tampering detection of English text under all attack types.

Figure 12: Attack type-based comparison of tampering detection effect

5.2.3 Results of Attack Rates Impact

Tab. 5 compares the effects of multiple attack volumes on tampering detection performance across dataset sizes and volume scenarios for RTADZWA, HNLPZWA, and UZWAMW.

Table 5: Detection accuracy comparison based on attack rates

Tab. 5 and Fig. 13 show how performance is influenced by low, mid, and high attack volumes. Under mid and high attack volumes, the baseline HNLPZWA and UZWAMW approaches are strongly affected, whereas RTADZWA is only slightly affected. In Fig. 11, it can be seen that tampering detection accuracy increases as the attack volume increases. For low, mid, and high volumes of all attacks, RTADZWA outperforms the baseline HNLPZWA and UZWAMW approaches. This indicates that RTADZWA is suitable for content authentication and tampering detection of English text under all volumes of all attacks.

Figure 13: Attack rate-based comparison of tampering detection accuracy

6 Conclusion

Based on the second-level order alphanumeric mechanism of the hidden Markov model, a novel hybrid approach of natural language processing and zero-watermarking, abbreviated as RTADZWA, has been developed. The core aim of RTADZWA is content authentication and tampering detection of English text transmitted via the Internet. RTADZWA is implemented in the PHP programming language using the VS Code IDE. Simulations and experiments are performed on various standard datasets under different volumes of insertion, deletion, and reorder attacks. RTADZWA has been compared with the HNLPZWA and UZWAMW approaches. The comparison results show that RTADZWA outperforms both baselines in general performance, covering watermark capacity, watermark robustness, and tampering detection accuracy, under all scenarios of attack types and volumes. In future work, the author intends to improve performance using a higher-level alphanumeric mechanism of the Markov model.

Funding Statement: The author extends his appreciation to the Deanship of Scientific Research at King Khalid University for funding this work under Grant Number (R.G.P.2/25/42), received by Fahd N. Al-Wesabi. www.kku.edu.sa.

Conflicts of Interest: The author declares that he has no conflicts of interest to report regarding the present study.
