1. Normalize the data. Rescale the dataset so that every feature lies in the range [0, 1]; this keeps the features on a comparable scale when distances are computed. A quick sanity check follows the code below.
```python
from sklearn.preprocessing import MinMaxScaler

# Min-max scaling maps each feature to the [0, 1] range
scaler = MinMaxScaler()
dataset = scaler.fit_transform(dataset)
```
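After scaling, each feature should span exactly [0, 1]. A minimal check, assuming `dataset` is now a NumPy array:
```python
# Per-feature minima and maxima after min-max scaling
print(dataset.min(axis=0))  # expected: all zeros
print(dataset.max(axis=0))  # expected: all ones
```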
2. Perform hierarchical clustering. Run an agglomerative (hierarchical) clustering on the precomputed pairwise distance matrix, merging clusters with the "complete" linkage method. A dendrogram sketch for inspecting the merges is shown after the code.
```python
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import pairwise_distances

# Precompute the pairwise distance matrix, then cluster with complete linkage
# (older scikit-learn versions use affinity="precomputed" instead of metric=)
distances = pairwise_distances(dataset)
hcl = AgglomerativeClustering(n_clusters=4, metric="precomputed", linkage="complete")
hcl_labels = hcl.fit_predict(distances)
```
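If you want to see how the clusters merge, a dendrogram is a common way to inspect complete-linkage clustering. A minimal sketch using SciPy, assuming `dataset` is the scaled feature matrix from step 1:
```python
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Build the complete-linkage merge tree and draw the dendrogram
merge_tree = linkage(dataset, method="complete")
dendrogram(merge_tree)
plt.title("Complete-linkage dendrogram")
plt.show()
```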
3. Perform PCA. Project the scaled data into a 2D Cartesian space so the clusters can be plotted; the variance retained by the two components is checked right after the code.
```python
from sklearn.decomposition import PCA

# Reduce the scaled features to 2 principal components for plotting
pca = PCA(n_components=2)
dataset_2d = pca.fit_transform(dataset)
```
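To see how much of the original variance the 2D projection keeps, you can read the fitted PCA's `explained_variance_ratio_` attribute:
```python
# Fraction of the total variance captured by each of the 2 components
print(pca.explained_variance_ratio_)
print(pca.explained_variance_ratio_.sum())
```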
4. Perform the final k-means clustering. Run k-means on the two PCA components, again with 4 clusters; an elbow-method sketch after the code shows one way to verify that number.
```python
from sklearn.cluster import KMeans

# Cluster the 2D PCA projection into 4 groups
kmeans = KMeans(n_clusters=4)
kmeans_labels = kmeans.fit_predict(dataset_2d)
```
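If you want to check the cluster count rather than fix it at 4, the elbow method is one common heuristic: run k-means for several values of k and look for the bend in the inertia curve. An illustrative sketch (not part of the original recipe):
```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans

# Inertia (within-cluster sum of squares) for k = 1..9
inertias = [KMeans(n_clusters=k, n_init=10).fit(dataset_2d).inertia_ for k in range(1, 10)]
plt.plot(range(1, 10), inertias, marker="o")
plt.xlabel("number of clusters k")
plt.ylabel("inertia")
plt.show()
```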
5. Plot the clusters. Make a scatter plot of the 2D PCA projection, colouring each point by its k-means cluster label so the 4 clusters are visible; a variant with a per-cluster legend follows the code.
```python
import matplotlib.pyplot as plt

# Colour each 2D point by its k-means cluster label
plt.scatter(dataset_2d[:, 0], dataset_2d[:, 1], c=kmeans_labels)
plt.show()
```
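If you prefer a legend entry per cluster instead of a single colour-mapped call, a small loop over the labels works too (a sketch, assuming `dataset_2d` and `kmeans_labels` from the previous steps):
```python
import matplotlib.pyplot as plt

# One scatter call per cluster so each gets its own legend entry
for label in range(4):
    mask = kmeans_labels == label
    plt.scatter(dataset_2d[mask, 0], dataset_2d[mask, 1], label=f"cluster {label}")
plt.legend()
plt.show()
```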
This displays the point map of the four clusters.