
Clustering and Data Analysis

2018-05-14 13:16:44 丁立人
留學 (Liuxue), 2018, Issue 19
Keywords: mathematics

1. Introduction

Clustering is the process of sorting objects, elements or data into groups according to their similarity or dissimilarity. This paper explains the topological foundation of clustering and several approaches to it.

2. Definition

In a set of data, a cluster is a group of elements that are more similar to each other than to elements in other clusters. We can place these elements in a metric space and measure the similarity between two of them by a "distance": the smaller the distance, the more similar the elements. Given a set X, a metric on X is a function d : X × X → ℝ such that

1. d(x, y) ≥ 0 for all x, y ∈ X, and d(x, y) = 0 iff x = y.

2. d(x, y) = d(y, x) for all x, y ∈ X.

3. d(x, y) ≤ d(x, z) + d(z, y) for all x, y, z ∈ X.

A pair (X, d) is called a metric space. To form a cluster, we first define a relation ∼R by

x ∼R x′ iff d(x, x′) ≤ 2R

where R ∈ ℝ and R ≥ 0. This says that the two elements are similar at threshold R. The relation ∼R is reflexive and symmetric but not necessarily transitive, so we extend it to an equivalence relation ≈R by chaining: x ≈R y if there exists a sequence of elements x0, …, xn such that x = x0 ∼R x1, …, xn−1 ∼R xn = y.
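The chained relation above can be computed directly on a finite data set. The following is a minimal sketch, assuming the points live in the Euclidean plane and using a union-find structure; the function name `clusters_at_threshold` and the sample points are illustrative choices, not part of the original paper.

```python
from math import dist  # Euclidean distance, Python 3.8+

def clusters_at_threshold(points, R):
    """Return the equivalence classes (as lists of point indices)
    obtained by chaining the relation d(x, x') <= 2R."""
    n = len(points)
    parent = list(range(n))  # union-find forest: each point starts alone

    def find(i):
        # Follow parent links to the representative, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Join every pair of points that is similar at threshold R.
    for i in range(n):
        for j in range(i + 1, n):
            if dist(points[i], points[j]) <= 2 * R:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri

    # Collect the indices that share a representative.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical sample data: three points in a chain and one far away.
points = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (10.0, 0.0)]
print(clusters_at_threshold(points, 0.5))  # [[0, 1, 2], [3]]
```

Points 0 and 2 are at distance 2, which exceeds 2R = 1, yet they land in the same cluster because point 1 links them. This is exactly the effect of chaining.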

The equivalence classes form a partition of the whole set, and each class is a cluster: its elements are more similar to each other than to elements outside the class. Different distance functions suit different types of input data. Data that can be quantified can be placed in ℝⁿ, where the distance between two elements can be calculated directly. If the data cannot be quantified, then for C elements a symmetric C × C matrix can be built, with some function chosen to determine the similarity of each pair.
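As a rough illustration of both cases, the sketch below builds a pairwise matrix for numeric data using the Euclidean distance, and for non-numeric data using the Hamming distance (the number of attributes on which two records differ). The choice of Hamming distance, the helper names and the sample records are assumptions made for this example; the paper does not say which function it has in mind.

```python
from math import dist  # Euclidean distance, Python 3.8+

def distance_matrix(items, d):
    """Build the len(items) x len(items) matrix of pairwise values d(a, b)."""
    return [[d(a, b) for b in items] for a in items]

def hamming(a, b):
    """Count the positions at which two equal-length tuples differ."""
    return sum(x != y for x, y in zip(a, b))

# Hypothetical quantifiable data: points in the plane.
numeric = [(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]
# Hypothetical non-quantifiable data: (colour, size) records.
categorical = [("red", "small"), ("red", "large"), ("blue", "large")]

D_num = distance_matrix(numeric, dist)
D_cat = distance_matrix(categorical, hamming)

for row in D_cat:
    print(row)
```

Both matrices are symmetric with zeros on the diagonal, as the first two metric axioms require.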

3. Clustering and data analysis

Clustering is one of the most important tasks in data analysis, because both the clusters and the process by which they form can reveal important information and underlying patterns that other methods may not provide.

4. Clustering algorithms

All clustering methods divide elements into groups whose members are similar to each other according to some similarity criterion.

4.1 Hierarchical Clustering

When we try to form clusters, we find that different thresholds R produce clusters of different sizes. If the threshold is 0, each cluster contains only one element. As R increases, elements become connected and multiple clusters join into one. Informally, hierarchical clustering is the process of finding such a hierarchy of clusters within a set of elements. A dendrogram shows the hierarchy intuitively (Figure 1.1): each horizontal segment represents components being connected.

A bottom-up hierarchy is called agglomerative clustering. We start from R = 0, where the number of connected components, and hence the number of clusters, equals the number of individual points (Figure 1.2). As R increases, points start to become connected (Figure 1.3). Finally, all elements of the data set are included in one cluster (Figure 1.1).
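The bottom-up process can be sketched by recording each value of R at which two components join, which is the information a dendrogram displays. This is a minimal sketch assuming Euclidean points and the rule that two points connect when their distance is at most 2R; the function name `merge_thresholds` and the sample points are hypothetical.

```python
from math import dist  # Euclidean distance, Python 3.8+

def merge_thresholds(points):
    """Return a list of (R, clusters_remaining) pairs, one per merge,
    in the order the merges happen as R grows from 0."""
    n = len(points)
    parent = list(range(n))  # union-find forest: n singleton clusters

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Process pairs from closest to farthest, so merges occur in order of R.
    pairs = sorted(
        (dist(points[i], points[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

    merges = []
    for d, i, j in pairs:
        ri, rj = find(i), find(j)
        if ri != rj:  # this pair connects two different components
            parent[rj] = ri
            # The pair becomes connected once 2R reaches d, i.e. R = d / 2.
            merges.append((d / 2, n - len(merges) - 1))
    return merges

# Hypothetical sample data on a line.
points = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0), (7.0, 0.0)]
print(merge_thresholds(points))  # [(0.5, 3), (1.0, 2), (2.0, 1)]
```

With four points there are exactly three merges, and the last one leaves a single cluster containing the whole data set.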

4.2 K-means Clustering

K-means clustering is one of the most popular flat clustering algorithms. Unlike hierarchical clustering, which follows the clusters across all thresholds, flat clustering focuses on finding a single suitable value of R.
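The paper does not spell out the k-means procedure, so the sketch below follows the standard Lloyd iteration: assign each point to its nearest centroid, move each centroid to the mean of its points, and repeat until the assignment stops changing. Note that in this standard form the user supplies the number of clusters k rather than a threshold. Taking the first k points as initial centroids is a simplifying assumption for reproducibility; practical implementations usually initialise randomly.

```python
from math import dist  # Euclidean distance, Python 3.8+

def kmeans(points, k, iterations=100):
    """Cluster points into k groups; return (labels, centroids)."""
    centroids = [points[i] for i in range(k)]  # naive initialisation (assumption)
    labels = None
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for each point.
        new_labels = [
            min(range(k), key=lambda c: dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # converged: nothing moved
            break
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(coord) / len(members) for coord in zip(*members)
                )
    return labels, centroids

# Hypothetical sample data: two well-separated pairs of points.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
labels, centroids = kmeans(points, 2)
print(labels)     # [0, 0, 1, 1]
print(centroids)  # [(0.0, 0.5), (10.0, 10.5)]
```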

4.3 Which one is better?

It is hard to say which method is better, since each has its own advantages.

5. Clustering in Data Analysis Examples

The example I use is the relation between GDP per capita and fertility rate.

In our data set, some countries have populations too small for data to be available, so their entries are missing. These entries should be filtered out first. Since all the data are real numbers, we can map them into a Euclidean space. Many points lie near the x-axis, and others lie near a fertility rate of 2. This shows that many countries with low GDP per capita have higher fertility rates, while countries with relatively higher GDP per capita have fertility rates around 2 (Figure 3.1).

As the threshold increases, three clusters form. Writing F for the fertility rate and G for GDP per capita, the first cluster has F between 4 and 5 and G under $5,000; the second is located at the bottom-left corner of the graph, with F around 2 and G roughly around $10,000; the third has G from $30,000 to $50,000 and F around 2. In the first cluster, the Republic of the Congo, Ethiopia, Iraq and South Sudan suffer from poverty or war and have a high fertility rate with a low income level. The second cluster includes countries such as China and Russia, which have been growing rapidly in recent years. The third cluster consists mostly of more developed countries (MDCs), including the UK, France and Canada. These countries are all highly developed, and most of them have a fertility rate below 2. The pattern of these three clusters is supporting evidence for the theory of demographic transition.

Figure 3.3 gives almost exactly the partition into developing and developed countries.

6. Conclusion

Clustering is a very effective method in data analysis. I believe its power is shown in the demographics example, in which clustering revealed three groups of countries, each at a different stage of the demographic transition.

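丁立人

Age: 17

City: Beijing

Grade: 12

Intended majors: mathematics, computer science

During the month I spent at summer school, I found that applied topology is completely different from the mathematics I had studied in middle and high school. Applied topology and linear algebra, one of its foundational subjects, were a huge challenge for me. What impressed me most were clustering algorithms, which group data with similar features into corresponding clusters, and transformations of spaces, in which a function maps one vector space to another. In going from a basic understanding to being able to write this paper, my progress was by no means limited to knowledge of applied topology: I also developed the ability to do independent research and gained some appreciation of the more rigorous logic of higher mathematics.

In the paper, I mainly describe the connection between clustering algorithms and topology, and use an example from demography to present one clustering algorithm.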
