2019-01-30

Hi everyone, I am going to share a case study about market analysis and conjoint analysis. From this example, you can learn how to read basic statistical output and how to run a cluster analysis. The dataset is golf.csv. If you need it, please contact me and I will be happy to share it with you. If you like this post, remember to give it a thumbs up!

The dataset contains information about golf courses, such as their size, layout, and estimated construction and maintenance costs. Through this example, we can learn how to inspect summary statistics and perform a simple cluster analysis to see how the courses group together.

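# IPython/Jupyter magic: switch to the folder that holds golf.csv.
%cd /Users/shimonyagrawal/Desktop

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np

# Load the dataset from the current working directory.
golf = pd.read_csv('golf.csv')

# Why will courseID not be relevant in a clustering model?
# Drop it by name; the positional form drop('courseID', 1) is rejected by pandas 2.0 and later.
golf = golf.drop(columns='courseID')
golf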

courseID is not relevant to clustering because it is only a unique identifier. It will not yield any useful result in the analysis, since it is just a number that labels each response from the golf course vendors and says nothing about the course itself.
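To make the point about the identifier concrete, here is a minimal sketch of how an ID column would distort the Euclidean distances that k-means relies on. The two courses and their values are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical example: two courses with identical features but different IDs.
# Column order: [courseID, land_obstacles, water_obstacles, tunnel_shots]
course_a = np.array([1.0, 10.0, 3.0, 3.0])
course_b = np.array([250.0, 10.0, 3.0, 3.0])

# With the ID included, the courses look far apart purely because of the ID.
dist_with_id = np.linalg.norm(course_a - course_b)

# Without the ID, the identical courses are at distance zero, as expected.
dist_without_id = np.linalg.norm(course_a[1:] - course_b[1:])

print(dist_with_id, dist_without_id)
```

Dropping the column before clustering avoids this artificial separation.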

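| | elevation | square_feet | est_playing_time | land_obstacles | water_obstacles | tunnel_shots | est_construction_cost | est_maintenance_cost | average_hole_length | average_hole_width |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.64 | 21037.18 | 43.35 | 10.0 | 3.0 | 3.0 | 103082.72 | 7261.22 | 18.99 | 3.90 |
| 1 | 6.58 | 23646.44 | 42.30 | 10.0 | 4.0 | 3.0 | 91637.93 | 6553.91 | 21.35 | 2.49 |
| 2 | 11.08 | 20012.28 | 41.43 | 9.0 | 3.0 | 3.0 | 107049.47 | 5847.06 | 19.09 | 2.63 |
| 3 | 9.91 | 20761.90 | 46.04 | 10.0 | 4.0 | 3.0 | 101799.55 | 8876.01 | 19.36 | 3.51 |
| 4 | 11.99 | 19818.75 | 44.82 | 7.0 | 6.0 | 4.0 | 94731.84 | 8445.70 | 16.81 | 2.67 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 245 | 10.45 | 23963.73 | 51.46 | 8.0 | 4.0 | 3.0 | 99027.52 | 9333.81 | 20.39 | 2.63 |
| 246 | 13.77 | 23337.77 | 51.39 | 9.0 | 3.0 | 4.0 | 61096.93 | 7864.50 | 17.06 | 2.93 |
| 247 | 7.01 | 23951.83 | 41.96 | 7.0 | 3.0 | 2.0 | 106438.63 | 2745.81 | 18.13 | 2.53 |
| 248 | 8.70 | 23850.69 | 45.10 | 6.0 | 3.0 | 3.0 | 98163.76 | 7955.81 | 20.32 | 3.79 |
| 249 | 10.26 | 26820.41 | 47.58 | 8.0 | 5.0 | 3.0 | 92221.42 | 10027.75 | 20.28 | 2.66 |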

##Call the describe() function on your dataset.
golf.describe()

The describe() function returns descriptive statistics for each numeric variable (count, mean, standard deviation, minimum, quartiles and maximum), which summarise the distribution of the variables in the dataset. Summary statistics condense a large amount of data into a few numbers. Using them, the analyst can spot possible outliers (for example, a maximum far above the 75% quartile) and see where the data is skewed (for example, a mean that sits well away from the median).
| | elevation | square_feet | est_playing_time | land_obstacles | water_obstacles | tunnel_shots | est_construction_cost | est_maintenance_cost | average_hole_length | average_hole_width |
|---|---|---|---|---|---|---|---|---|---|---|
| count | 250.00000 | 250.000000 | 250.000000 | 250.000 | 250.000000 | 250.000000 | 250.000000 | 250.000000 | 250.000000 | 250.000000 |
| mean | 10.90348 | 22052.677600 | 44.916120 | 7.840 | 3.968000 | 2.944000 | 94956.160200 | 7779.288160 | 19.757280 | 2.964680 |
| std | 2.52390 | 2708.177478 | 5.001146 | 1.544 | 0.775488 | 0.598579 | 11656.524492 | 1990.536582 | 1.750693 | 0.459777 |
| min | 2.92000 | 14357.480000 | 31.630000 | 3.000 | 2.000000 | 2.000000 | 61096.930000 | 2682.460000 | 14.690000 | 1.640000 |
| 25% | 9.45250 | 20162.532500 | 41.430000 | 7.000 | 3.000000 | 3.000000 | 86997.525000 | 6527.242500 | 18.560000 | 2.632500 |
| 50% | 11.08500 | 22030.715000 | 44.780000 | 8.000 | 4.000000 | 3.000000 | 94727.880000 | 7760.000000 | 19.780000 | 2.990000 |
| 75% | 12.70500 | 23974.867500 | 48.157500 | 9.000 | 4.000000 | 3.000000 | 102375.640000 | 8935.887500 | 20.925000 | 3.300000 |
| max | 17.77000 | 29712.520000 | 58.020000 | 14.000 | 6.000000 | 4.000000 | 126247.720000 | 12589.800000 | 25.490000 | 4.210000 |
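The remark about spotting outliers and skew from summary statistics can be turned into a quick check. The sketch below is one possible approach, not part of the original notebook: it reports pandas' sample skewness per column and counts values outside the common 1.5 × IQR fences (that rule is a convention assumed here). The helper name summarize_skew_and_outliers and the toy values are hypothetical.

```python
import pandas as pd

def summarize_skew_and_outliers(df: pd.DataFrame) -> pd.DataFrame:
    """Per-column skewness and number of values outside the 1.5 * IQR fences."""
    q1 = df.quantile(0.25)
    q3 = df.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr
    upper = q3 + 1.5 * iqr
    # Compare every value with its own column's fences, then count per column.
    outside = (df.lt(lower, axis="columns") | df.gt(upper, axis="columns")).sum()
    return pd.DataFrame({"skew": df.skew(), "n_outliers": outside})

# Hypothetical toy data: one unusually large course, obstacles evenly spread.
toy = pd.DataFrame({
    "square_feet": [20000.0, 21000.0, 22000.0, 23000.0, 60000.0],
    "land_obstacles": [6.0, 7.0, 8.0, 9.0, 10.0],
})

report = summarize_skew_and_outliers(toy)
print(report)
```

On the real data the same helper could be called as summarize_skew_and_outliers(golf), assuming every column of golf is numeric.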
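##Build a k-means model.
from sklearn.cluster import KMeans

# golf_normalize is not defined anywhere in this post; it is presumably a
# normalized (rescaled) copy of golf created in a step that was left out.
kmeans_model = KMeans(n_clusters = 3, random_state = 101)
kmeans_model.fit(golf_normalize)

# Attach each row's cluster label to the original, unscaled data.
cluster_labels = kmeans_model.labels_
golf_cluster = golf.assign(Cluster = cluster_labels)

# Average size and cost per cluster.
grouped = golf_cluster.groupby(['Cluster'])
grouped.agg({
    'square_feet': 'mean',
    'est_construction_cost': 'mean',
    'est_maintenance_cost': 'mean'}).round(2)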
| Cluster | square_feet | est_construction_cost | est_maintenance_cost |
|---|---|---|---|
| 0 | 24754.41 | 84712.52 | 7551.36 |
| 1 | 19985.26 | 105199.56 | 7382.96 |
| 2 | 22149.62 | 92833.78 | 8187.60 |
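The k-means code above fits on golf_normalize, which this post never defines, so the scaling step has to be guessed. Scaling matters because k-means uses Euclidean distance, and columns such as est_construction_cost (around 100,000) would otherwise swamp columns such as tunnel_shots (2 to 4). The sketch below assumes z-score scaling with scikit-learn's StandardScaler (the author may have used something else, so the cluster means in the post may not reproduce exactly) and runs the same scale, fit, assign and group-by sequence on a small made-up table. The toy values and the names toy, scaled, model and toy_cluster are hypothetical.

```python
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical toy data: three small, expensive courses and three large, cheaper ones.
toy = pd.DataFrame({
    "square_feet": [19000.0, 19500.0, 20000.0, 25000.0, 25500.0, 26000.0],
    "est_construction_cost": [105000.0, 104000.0, 106000.0, 84000.0, 85000.0, 83000.0],
})

# Assumed scaling step: give every column mean 0 and unit variance.
scaled = StandardScaler().fit_transform(toy)

# Cluster the scaled values; n_init is set explicitly so behavior does not
# depend on the scikit-learn version's default.
model = KMeans(n_clusters=2, n_init=10, random_state=101).fit(scaled)

# labels_ has one entry per input row, in row order, so it can be attached directly.
toy_cluster = toy.assign(Cluster=model.labels_)

# Report the means in the original (unscaled) units, as the post does.
cluster_means = toy_cluster.groupby("Cluster").mean().round(2)
print(cluster_means)
```

For the real data, the analogous scaling line would be something like golf_normalize = StandardScaler().fit_transform(golf), again assuming this is the kind of normalization that was intended.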
golf_cluster.head()
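| | elevation | square_feet | est_playing_time | land_obstacles | water_obstacles | tunnel_shots | est_construction_cost | est_maintenance_cost | average_hole_length | average_hole_width | Cluster |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 11.64 | 21037.18 | 43.35 | 10.0 | 3.0 | 3.0 | 103082.72 | 7261.22 | 18.99 | 3.90 | 1 |
| 1 | 6.58 | 23646.44 | 42.30 | 10.0 | 4.0 | 3.0 | 91637.93 | 6553.91 | 21.35 | 2.49 | 2 |
| 2 | 11.08 | 20012.28 | 41.43 | 9.0 | 3.0 | 3.0 | 107049.47 | 5847.06 | 19.09 | 2.63 | 1 |
| 3 | 9.91 | 20761.90 | 46.04 | 10.0 | 4.0 | 3.0 | 101799.55 | 8876.01 | 19.36 | 3.51 | 1 |
| 4 | 11.99 | 19818.75 | 44.82 | 7.0 | 6.0 | 4.0 | 94731.84 | 8445.70 | 16.81 | 2.67 | 1 |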

This completes the clustering. To summarize, the per-cluster means computed above with groupby() and agg() show the following: cluster 0 has the largest courses (mean 24754.41 square feet) and the lowest estimated construction cost (84712.52); cluster 1 has the smallest courses (19985.26 square feet) and the highest estimated construction cost (105199.56); cluster 2 falls in between on both, but has the highest estimated maintenance cost (8187.60).


If you like this post, remember to give it a thumbs up!




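All replies
2019-2-3 11:49:16
Hello, if your request for help has not been resolved, please post your requirements on the project exchange board, where users can help you faster and more professionally: https://bbs.pinggu.org/prj/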

2019-9-28 00:40:44

o

If you have any other questions, please reply to this post!