
Data Extraction Must Be Ethical

2022-03-07 13:39:06 Julius Černiauskas, Yun Tian
英語世界 (The World of English), 2022, Issue 10

By Julius Černiauskas; translated by Yun Tian

近來,互聯網正經歷著與18 世紀早期“采金熱”類似的現象,特別是在數據提取方面。數據因其巨大的價值而被某些分析師稱為“新石油”。數據領域仍然對大大小小的參與者開放,但這也導致了若干不專業的行為,甚至有人設法獲取有密碼保護的數據。

2盡管許多網站確實包含IP禁令等防御措施,但由于競爭加劇和各種經濟因素,網絡爬蟲和服務器之間的無形沖突仍在持續,并愈演愈烈。盡管大多數人很樂意利用億客行、谷歌購物、PriceGrabber 和天巡網等聚合網站的低價優勢,但人們并沒有意識到上述沖突正發生在不同的電商平臺之間。

符合道德的網頁數據抓取:目的的重要性

3使用工具的目的有好有壞,網頁數據抓取也不例外。一種相當常見的情況是以營銷為目的抓取個人數據。數億用戶通過電商平臺上的服務協議條款同意公開他們的數據,無論他們是否意識到了這一操作。然而,數據遭泄露的問題在于,這些數據由社交媒體機構提取,卻為僵尸網站所用。這類網站在未經用戶許可的情況下創建個人資料,并羅列出個人的詳細信息。

4結果,網頁數據抓取的負面新聞越來越多,這使得公眾對自身數據價值和隱私的認識有所提高。網頁數據抓取本身并沒有什么不道德的,因為它不過是把人們通常需要手動操作的活動自動化了。主要的區別在于,網頁數據抓取使用機器人程序,在極短時間內爬取大量網站、提取海量信息,從而實現更大規模的信息搜集。

5提取公開的數據需要代理。簡單來說,代理是網絡爬蟲和服務器之間的中介。使用代理可以將數據請求均勻地分配到服務器,這樣能確保以合理的速率請求數據,也可保證請求方匿名。

不道德抓取的后果

6不道德抓取所采用的數據提取方式可能損害個人隱私,導致服務器過載。

7盡管很多網站試圖通過IP禁令來防止不道德抓取,但這漸漸變得徒勞,因為使用了代理,而且這些代理能夠模擬人類行為來規避服務器問題。這最終可能導致服務器過載(使在線企業耗費資金)、互聯網透明度降低、公眾在隱私問題上的不信任加重。

網頁數據抓取道德規范是必要的

8網頁數據抓取大有裨益,但這有賴于有自由且透明的互聯網可用。我確信,如果我們能遵循一些準則,使局面對每個人都公平,那么網頁數據抓取將有益于整個科技領域:

1. 只抓取公開的網頁

2. 研究目標網站的法律文件以確定你依照法律是否接受其服務條款。如果接受,確定自己是否不會違背

3. 合理請求數據以保證服務器功能不受損害(DDoS 攻擊)

4. 尊重源網站對所獲得的任何數據的隱私保護

5. 使用以合乎道德的手段獲取的代理

并非所有代理都是平等的

9眾所周知,當今正在運行的某些代理,其獲取方式并不道德。許多代理通常是人們從下載到個人設備里的應用程序中獲取的。很難確定這些用戶是否意識到了他們的設備正在被使用。但可以肯定的是,如果用戶同意了具有誤導性或是容易混淆的服務條款,從而不情愿地將個人設備變成住宅代理網絡中的參與者,那么將這類程序用作代理一定是不道德的。

合乎道德的做法能提升公平性與責任心

10現代網頁數據抓取的某些方面缺乏明確性,需要道德規范來為行業帶來秩序。如果業內人士能夠就專業的網頁數據抓取方法達成共識,這將有助于維護一個公平、開放、自由的網絡環境,使企業與消費者雙贏。關于數據抓取在各行各業所能發揮的最大潛能,我們對此的了解仍處在早期階段,所以讓我們抓住這個大好時機,以最合乎道德的方式來推動創新、促進發展。 □

The internet is currently undergoing a phenomenon similar to the gold rushes of the early eighteenth century, specifically when it comes to data extraction. With data now dubbed the “new oil” by some analysts because of its value, the field is still open to small and large players alike, which has led to some unprofessional activities that extend as far as the acquisition of password-protected data.

While many websites do contain defensive measures such as IP bans, the invisible conflicts between scrapers¹ and servers are ongoing and gaining in intensity, due to increased competition and economic factors. Most people don’t realise these conflicts are taking place between e-commerce stores, although they are happily taking advantage of the low prices found on aggregator websites² like Expedia, Google Shopping, PriceGrabber and Skyscanner.

1 scraper: a program or script that automatically collects information from the World Wide Web according to a set of rules. “Scraping” and “crawling” in the text below both refer to collecting data from the web.
2 aggregator website: a website that collects popular content from other websites, then sorts and groups the related links to form its own content.

Ethical web scraping: the importance of intention

Tools can be used for positive and negative purposes, and web scraping is no exception. A fairly common scenario is the scraping of personal data for marketing purposes. Hundreds of millions of users agree to release their data through terms of service agreements on e-commerce sites, whether they realise it or not. The issue with the exposed data, however, is that it has been extracted by social media agencies and used by now-defunct websites that created profiles and listed personal details without user permission.

As a result, web scraping is attracting more and more negative press, which has made the public more aware of the value and privacy of their data. There is nothing inherently unethical about web scraping, as it simply automates activities that people often do manually. The main difference is scale: web scraping uses bots to crawl numerous websites and extract huge amounts of information in seconds.

Extracting publicly available data requires proxies³. In short, proxies act as intermediaries between the web scraper and the web server. Employing proxies allows data requests to be distributed evenly to the web server, which ensures that data is requested at a fair rate and keeps the requesting party anonymous.

3 proxy: a special network service that lets a client connect to a server through it.
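
As a rough illustration of the even distribution described above, the Python sketch below pairs each request with a proxy in round-robin order. The proxy addresses, the URLs and the `assign_proxies` helper are all hypothetical and chosen only for this example; no network request is made, and a real setup would depend on the proxy provider.

```python
from collections import Counter
from itertools import cycle

# Hypothetical proxy endpoints (addresses from a range reserved for documentation).
PROXIES = [
    "http://203.0.113.10:8080",
    "http://203.0.113.11:8080",
    "http://203.0.113.12:8080",
]


def assign_proxies(urls, proxies):
    """Pair each URL with a proxy, cycling through the proxies in order."""
    if not proxies:
        raise ValueError("at least one proxy is required")
    rotation = cycle(proxies)
    return [(url, next(rotation)) for url in urls]


# Hypothetical target pages; nothing is fetched here.
urls = [f"https://example.com/page/{i}" for i in range(6)]
plan = assign_proxies(urls, PROXIES)

# Count how many requests each proxy would carry.
load = Counter(proxy for _, proxy in plan)
for proxy, count in load.items():
    print(proxy, count)
```

With six URLs and three proxies, each proxy is assigned the same number of requests, which is the "even" part of the idea. Rotation alone says nothing about how fast requests are sent, so it does not by itself guarantee a fair rate.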

The consequences of unethical scraping

Unethical scraping uses data extraction in a way that may compromise⁴ privacy and result in server overload.

4 compromise: to endanger or damage.

While many websites try to prevent it through IP bans, this is becoming futile⁵ due to the use of proxies and their ability to circumvent⁶ server restrictions by simulating human behaviour. The end results can be server overloads that cost online businesses money, reduced internet transparency and greater public distrust over privacy issues.

5 futile: pointless; producing no result.
6 circumvent: to evade (a rule or restriction).

A web scraping code of ethics is necessary

Web scraping has many benefits that depend upon the availability of a free and transparent internet. I believe it would benefit the entire tech space if we adopted a few guidelines to make the landscape fair for everyone:

1. Scrape publicly available web pages only

2. Study the target website’s legal documents to determine whether you are legally accepting its terms of service and, if you are, whether you can avoid breaching them

3. Make reasonable requests for data so that server function is not compromised (as it would be in a DDoS attack⁷)

4. Respect privacy concerns of source websites with regard to any data obtained

5. Make use of proxies procured in an ethical manner

7 DDoS attack: distributed denial-of-service attack, a type of cyberattack that aims to exhaust the target machine’s network and system resources so that it is overloaded and can no longer serve legitimate users.
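
The third guideline, making reasonable requests, can be sketched in code as a minimum delay between requests to the same host. The Python example below is only one possible approach: the `PoliteThrottle` class is hypothetical, and the two-second interval is an assumed value rather than a recommendation from the article.

```python
import time
from urllib.parse import urlsplit


class PoliteThrottle:
    """Keep requests to the same host at least min_interval seconds apart."""

    def __init__(self, min_interval=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval  # assumed value; tune it per site
        self._clock = clock               # injectable so the class can be tested
        self._sleep = sleep
        self._last_request = {}           # host -> time of the latest request

    def wait(self, url):
        """Pause if the host was contacted too recently; return the delay used."""
        host = urlsplit(url).netloc
        now = self._clock()
        last = self._last_request.get(host)
        delay = 0.0
        if last is not None:
            delay = max(0.0, self.min_interval - (now - last))
        if delay > 0:
            self._sleep(delay)
        # Record the moment the request is actually allowed to go out.
        self._last_request[host] = now + delay
        return delay
```

A scraper would call `throttle.wait(url)` immediately before each request. Requests to different hosts are tracked separately, so slowing down for one site does not hold up the others.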

Not all proxies are equal

It is commonly known that some proxies operating today are not ethically sourced, with many obtained through applications that people have downloaded to their devices. Whether these individuals are aware that their devices are being used is difficult to ascertain. What is certain is that using such devices as proxies is unethical when their owners consented to misleading or confusing terms of service that turned their devices, against their will, into participants in a residential proxy network.

Ethical practices lead to increased fairness and accountability

Some aspects of modern web scraping lack clarity, and a code of ethics is needed to bring order to the industry. If those in the industry can agree on a professional approach to web scraping, it will help to maintain a fair, open and free internet that benefits both businesses and consumers. We are still in the early stages of discovering the full potential of data scraping in different industries, so let’s take advantage of this golden opportunity to drive innovation and create growth in the most ethical way possible. ■
