An example of market analysis and conjoint analysis in Python

2019-01-30

Hi everyone, I am going to share a case study on market analysis and conjoint analysis. From this example, you can learn to read basic statistical output and to use cluster analysis. The dataset is golf.csv. If you need this dataset, please contact me and I will be happy to share it with you. If you like this post, remember to give it a thumbs up!

The dataset contains information about golf courses, such as their layout, estimated construction cost, and estimated maintenance cost. Through this example, we can learn how to inspect summary statistics and how to run a simple cluster analysis and examine the resulting clusters.

%cd /Users/shimonyagrawal/Desktop

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

import numpy as np

golf = pd.read_csv('golf.csv')

#Why will courseID not be relevant in a clustering model?

golf = golf.drop(columns='courseID')

golf

The course ID is not relevant in clustering because it is just a unique identifier. This variable will not yield any useful result in the analysis, since it is only a number that labels each response from the golf course vendors. It carries no information about the course itself.

	elevation	square_feet	est_playing_time	land_obstacles	water_obstacles	tunnel_shots	est_construction_cost	est_maintenance_cost	average_hole_length	average_hole_width
0	11.64	21037.18	43.35	10.0	3.0	3.0	103082.72	7261.22	18.99	3.90
1	6.58	23646.44	42.30	10.0	4.0	3.0	91637.93	6553.91	21.35	2.49
2	11.08	20012.28	41.43	9.0	3.0	3.0	107049.47	5847.06	19.09	2.63
3	9.91	20761.90	46.04	10.0	4.0	3.0	101799.55	8876.01	19.36	3.51
4	11.99	19818.75	44.82	7.0	6.0	4.0	94731.84	8445.70	16.81	2.67
...	...	...	...	...	...	...	...	...	...	...
245	10.45	23963.73	51.46	8.0	4.0	3.0	99027.52	9333.81	20.39	2.63
246	13.77	23337.77	51.39	9.0	3.0	4.0	61096.93	7864.50	17.06	2.93
247	7.01	23951.83	41.96	7.0	3.0	2.0	106438.63	2745.81	18.13	2.53
248	8.70	23850.69	45.10	6.0	3.0	3.0	98163.76	7955.81	20.32	3.79
249	10.26	26820.41	47.58	8.0	5.0	3.0	92221.42	10027.75	20.28	2.66
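
As a rough illustration of the point about identifiers, the sketch below flags columns that look like row labels (integer-typed, with every value distinct) and drops them before modeling. The `demo` table and its values are made up for illustration and stand in for golf.csv; the "integer and unique" rule is only a heuristic, not a general test.

```python
import pandas as pd

# Hypothetical mini-table standing in for golf.csv (values are made up).
demo = pd.DataFrame({
    "courseID": [1001, 1002, 1003, 1004],
    "square_feet": [21000.0, 23600.0, 20000.0, 20800.0],
    "tunnel_shots": [3, 3, 3, 4],
})

# Heuristic: an integer column in which every value is distinct
# behaves like a row label rather than a measurement.
id_like = [
    col for col in demo.columns
    if pd.api.types.is_integer_dtype(demo[col]) and demo[col].is_unique
]

# Keep only the measurement columns for clustering.
features = demo.drop(columns=id_like)

print(id_like)
print(features.columns.tolist())
```

If an identifier were left in, its arbitrary numeric values would contribute to the distances that k-means computes, even though they say nothing about the course.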

##Call the describe() function on your dataset.
golf.describe()

The describe() function returns descriptive statistics that summarise the distribution of each variable in the dataset. Summary statistics give a quantitative view of the data, which helps to simplify a large amount of data. By comparing the mean with the median (the 50% row) and the min and max with the quartiles, the analyst can spot possible outliers and see where the data is skewed.

	elevation	square_feet	est_playing_time	land_obstacles	water_obstacles	tunnel_shots	est_construction_cost	est_maintenance_cost	average_hole_length	average_hole_width
count	250.00000	250.000000	250.000000	250.000	250.000000	250.000000	250.000000	250.000000	250.000000	250.000000
mean	10.90348	22052.677600	44.916120	7.840	3.968000	2.944000	94956.160200	7779.288160	19.757280	2.964680
std	2.52390	2708.177478	5.001146	1.544	0.775488	0.598579	11656.524492	1990.536582	1.750693	0.459777
min	2.92000	14357.480000	31.630000	3.000	2.000000	2.000000	61096.930000	2682.460000	14.690000	1.640000
25%	9.45250	20162.532500	41.430000	7.000	3.000000	3.000000	86997.525000	6527.242500	18.560000	2.632500
50%	11.08500	22030.715000	44.780000	8.000	4.000000	3.000000	94727.880000	7760.000000	19.780000	2.990000
75%	12.70500	23974.867500	48.157500	9.000	4.000000	3.000000	102375.640000	8935.887500	20.925000	3.300000
max	17.77000	29712.520000	58.020000	14.000	6.000000	4.000000	126247.720000	12589.800000	25.490000	4.210000
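
One way to turn the summary statistics into an explicit outlier and skew check is sketched below. It uses the common 1.5 × IQR rule, which is an assumption of this sketch and not something the original analysis states. The `demo` frame is randomly generated stand-in data that borrows two column names from the dataset; it is not golf.csv.

```python
import numpy as np
import pandas as pd

# Randomly generated stand-in data (NOT the real golf.csv values).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "est_construction_cost": rng.normal(95000, 11000, 250),
    "est_maintenance_cost": rng.normal(7800, 2000, 250),
})

summary = demo.describe()

# Interquartile range per column, taken from the describe() rows.
q1 = summary.loc["25%"]
q3 = summary.loc["75%"]
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged as possible outliers.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outlier_counts = ((demo < lower) | (demo > upper)).sum()

# Sample skewness: near 0 for symmetric data, positive for a long right tail.
skewness = demo.skew()

print(outlier_counts)
print(skewness.round(2))
```

With the real data, the same lines can be run on `golf` directly by replacing `demo`.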

##Build a k-means model.
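from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
# golf_normalize is not defined in the original post. Z-score standardization
# is assumed here; the outputs shown below may differ if another scaling was used.
golf_normalize = StandardScaler().fit_transform(golf)
kmeans_model = KMeans(n_clusters = 3, random_state = 101)
kmeans_model.fit(golf_normalize)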
cluster_labels = kmeans_model.labels_
golf_cluster = golf.assign(Cluster = cluster_labels)
grouped = golf_cluster.groupby(['Cluster'])
grouped.agg({
'square_feet': 'mean',
'est_construction_cost' : 'mean',
'est_maintenance_cost' : 'mean'}).round(2)

Cluster	square_feet	est_construction_cost	est_maintenance_cost
0	24754.41	84712.52	7551.36
1	19985.26	105199.56	7382.96
2	22149.62	92833.78	8187.60
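
The model above fixes n_clusters = 3 without showing how that number was chosen. One common check, sketched here as an assumption and not as the method used in this analysis, is to compare silhouette scores across several values of k. The data is three synthetic blobs generated for illustration only.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic data: three well-separated 2-D blobs (illustration only).
rng = np.random.default_rng(101)
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centers])

# Standardize so every feature contributes equally to the distances.
X_scaled = StandardScaler().fit_transform(X)

# Fit k-means for several k and record the mean silhouette score of each.
scores = {}
for k in range(2, 7):
    model = KMeans(n_clusters=k, n_init=10, random_state=101)
    labels = model.fit_predict(X_scaled)
    scores[k] = silhouette_score(X_scaled, labels)

# A higher silhouette score means tighter, better separated clusters.
best_k = max(scores, key=scores.get)

for k, score in scores.items():
    print(k, round(score, 3))
print("best k:", best_k)
```

On the golf data, `X_scaled` would be replaced by the normalized feature matrix passed to `kmeans_model.fit`.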

golf_cluster.head()

	elevation	square_feet	est_playing_time	land_obstacles	water_obstacles	tunnel_shots	est_construction_cost	est_maintenance_cost	average_hole_length	average_hole_width	Cluster
0	11.64	21037.18	43.35	10.0	3.0	3.0	103082.72	7261.22	18.99	3.90	1
1	6.58	23646.44	42.30	10.0	4.0	3.0	91637.93	6553.91	21.35	2.49	2
2	11.08	20012.28	41.43	9.0	3.0	3.0	107049.47	5847.06	19.09	2.63	1
3	9.91	20761.90	46.04	10.0	4.0	3.0	101799.55	8876.01	19.36	3.51	1
4	11.99	19818.75	44.82	7.0	6.0	4.0	94731.84	8445.70	16.81	2.67	1
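
Once each row carries a Cluster label, a scatter plot is a quick way to see how the groups separate on two variables. The sketch below uses matplotlib, which is imported at the top of this post, on randomly generated stand-in data with random cluster labels. Neither the values nor the labels come from golf.csv, so the picture only demonstrates the plotting pattern.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Randomly generated stand-in for golf_cluster (NOT real data or real clusters).
rng = np.random.default_rng(101)
demo = pd.DataFrame({
    "square_feet": rng.normal(22000, 2700, 90),
    "est_construction_cost": rng.normal(95000, 11600, 90),
    "Cluster": rng.integers(0, 3, 90),
})

fig, ax = plt.subplots(figsize=(6, 4))

# Draw one scatter series per cluster so each gets its own colour and legend entry.
for cluster_id, group in demo.groupby("Cluster"):
    ax.scatter(
        group["square_feet"],
        group["est_construction_cost"],
        label=f"Cluster {cluster_id}",
        alpha=0.7,
    )

ax.set_xlabel("square_feet")
ax.set_ylabel("est_construction_cost")
ax.legend()
```

With the real results, `demo` would be replaced by `golf_cluster`. In a script, add `plt.show()` at the end to display the figure.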

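This completes the clustering. Finally, the cluster analysis should be summarized using the cluster means shown above. Cluster 0 contains the largest courses (about 24,754 square feet on average) with the lowest estimated construction cost (about 84,713). Cluster 1 contains the smallest courses (about 19,985 square feet) with the highest estimated construction cost (about 105,200). Cluster 2 falls in between on both measures but has the highest estimated maintenance cost (about 8,188).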