Property Assessment from Scraped Data

Steps undertaken

  • Data acquisition
    • scrape
  • Prepare
    • clean
    • segment
    • dropna
  • Concatenate all DataFrames
  • Do assessments
  • Handle outliers
  • Visualize
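The Prepare and Concatenate steps above can be sketched per source as follows. This is a minimal illustration, not the author's code: the raw frames hold made-up values, it assumes scraped prices arrive as text such as "KSh 25,000,000", and `prepare` is a hypothetical helper. Each frame is cleaned and filtered with `dropna` before the frames are stacked.

```python
import pandas as pd

# Hypothetical raw scrape: prices and sizes arrive as text (an assumption).
raw_a = pd.DataFrame({
    "desc": ["3 Bedroom House", "4 Bedroom Villa", "2 Bedroom Flat"],
    "size": ["3", "4", None],
    "value": ["KSh 25,000,000", "Price on request", "KSh 12,000,000"],
})
raw_b = pd.DataFrame({
    "desc": ["5 Bedroom Townhouse"],
    "size": ["5"],
    "value": ["KSh 52,000,000"],
})


def prepare(frame):
    """Clean one scraped frame: parse numbers, then drop incomplete rows."""
    out = frame.copy()
    # Keep only digits and the decimal point ("KSh 25,000,000" -> "25000000").
    digits = out["value"].str.replace(r"[^0-9.]", "", regex=True)
    # Unparseable or empty strings (e.g. "Price on request") become NaN.
    out["value"] = pd.to_numeric(digits, errors="coerce")
    out["size"] = pd.to_numeric(out["size"], errors="coerce")
    # dropna: discard listings missing either size or value.
    return out.dropna(subset=["size", "value"])


# Clean each source separately, then stack them with a fresh 0..n-1 index.
df = pd.concat([prepare(raw_a), prepare(raw_b)], ignore_index=True)
print(df)
```

Cleaning each source before stacking keeps site-specific parsing rules in one place per source.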
In [10]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

sns.set_style('darkgrid')

%matplotlib inline

Data Acquisition

Load data and concatenate

In [76]:
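# Load one CSV per scraped source
dfj = pd.read_csv('df_jumia.csv')
dfp = pd.read_csv('df_p24.csv')
dfb = pd.read_csv('df_brK.csv')

# Stack the sources; each file keeps its own row labels, so labels repeat across files
df = pd.concat([dfj, dfp, dfb])
# Drop the unnamed column, most likely the index saved when the CSVs were written
df.drop('Unnamed: 0', axis=1, inplace=True)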

df.head()
Out[76]:
  | desc | location | size | value
0 | 5 Bedroom Tonwhouse Corner House in Lavington | MASADUKU LINE, Lavington, Nairobi, Nairobi | 14.0 | 52000000.0
1 | Ambassadorial Palace in Spring Valley | peponi road, Spring Valley, Nairobi, Nairobi | 11.0 | 280000000.0
2 | Spring Valley Ambassadorial Palace | Spring Valley, Nairobi, Nairobi | 11.0 | 290000000.0
3 | 10 Bedroom House in Karen- 3KE1375839 | Hardy, Karen, Nairobi, Nairobi | 10.0 | 150000000.0
4 | 10 Bedroom House For Sale In Karen (Kenya) - 3... | Karen, Nairobi, Nairobi | 10.0 | 150000000.0
In [3]:
len(df)
Out[3]:
3371
In [4]:
df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 3371 entries, 0 to 519
Data columns (total 4 columns):
desc        3362 non-null object
location    3299 non-null object
size        3360 non-null float64
value       3349 non-null float64
dtypes: float64(2), object(2)
memory usage: 131.7+ KB
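The `df.info()` output reports 3371 entries while the index runs only from 0 to 519: `pd.concat` kept each file's own row labels, so the same label occurs in several rows. The toy frames below (made-up values) sketch why that matters for label-based `drop`, and two ways around it.

```python
import pandas as pd

# Two hypothetical frames, each with its own default 0..n-1 index.
a = pd.DataFrame({"size": [5, 11], "value": [52e6, 280e6]})
b = pd.DataFrame({"size": [50, 4], "value": [300e6, 20e6]})

stacked = pd.concat([a, b])  # keeps the original labels: 0, 1, 0, 1
print(stacked.index.is_unique)  # False

# drop(0) removes *every* row labelled 0, one from each source frame.
print(len(stacked.drop(0)))  # 2, not 3

# A boolean mask selects rows by content, so repeated labels do not matter.
filtered = stacked[stacked["size"] != 50]
print(len(filtered))  # 3

# Alternatively, renumber the rows while concatenating.
renumbered = pd.concat([a, b], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
```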
In [77]:
# df['size'] = pd.to_numeric(df['size'])
# df = df.fillna(0)

# # df['value(Million)'] = df['value'].apply(lambda x: x/1000000)
# df.head()
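The commented-out cell above would fill gaps with 0, whereas the step list calls for `dropna`. A small sketch on hypothetical listings shows how that choice changes a summary statistic; the `value_million` column name is illustrative, standing in for the commented-out `value(Million)` idea.

```python
import numpy as np
import pandas as pd

# Hypothetical listings with gaps, similar to the missing values df.info() reports.
listings = pd.DataFrame({
    "size": [5.0, np.nan, 4.0, 3.0],
    "value": [52e6, 18e6, np.nan, 9e6],
})

# fillna(0) turns a missing price into a zero-shilling listing.
print(listings.fillna(0)["value"].median())  # 13500000.0

# dropna keeps only rows where both size and value are known.
complete = listings.dropna(subset=["size", "value"]).copy()
complete["value_million"] = complete["value"] / 1e6
print(complete["value"].median())  # 30500000.0
```

Zero-filled prices pull the median down, so dropping incomplete rows is usually the safer default for price statistics.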

Explore the data

In [11]:
# `height` sets the figure size (it replaced the older `size` argument in seaborn 0.9)
sns.jointplot(x='size', y='value', data=df, height=15)
Out[11]:
[Figure: joint plot of value against size, with marginal distributions]

Drop Outliers

  • 50-bedroom house
  • 4-bedroom house valued at KSh 3,200,000,000 (3.2B)
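The two outliers above were picked out by inspection. As a possible complement, not the method used in this notebook, the interquartile-range (IQR) rule flags values far outside the middle half of the data. The sketch assumes the conventional multiplier `k=1.5`, uses made-up prices, and `iqr_outliers` is a hypothetical helper.

```python
import pandas as pd


def iqr_outliers(series, k=1.5):
    """Return a boolean mask that is True outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Hypothetical prices in KSh; the last one mimics the 3.2B listing.
prices = pd.Series([9e6, 12e6, 15e6, 18e6, 25e6, 30e6, 3.2e9])
mask = iqr_outliers(prices)
print(prices[mask])
```

Because prices grow with size, applying the rule to the whole `value` column could flag legitimate large homes; running it within each `size` group, or on value per unit of size, may be more appropriate.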
In [34]:
df[df['size']==50]
Out[34]:
  | desc | location | size | value
0 | Two houses on 3/4 acre each 25 bedrooms on thi... | Gigiri road , Gigiri, Nairobi | 50.0 | 300000000.0
In [35]:
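# Filter by content: index label 0 repeats across the source files, so drop(0) would remove several rows
df = df[df['size'] != 50]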
In [44]:
df[df['value']==3200000000.0]
Out[44]:
desc location size value
1065 4 bedrooms all ensuite in 3. 6 acres prime land Lower Kabete Kitisuru, Spring Valley, Nairobi 4.0 3.200000e+09
In [45]:
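# Filter by value rather than by index label, which is not guaranteed unique after concat
df = df[df['value'] != 3200000000.0]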
In [59]:
sns.jointplot(x='size', y='value', data=df)
Out[59]:
[Figure: joint plot of value against size after dropping the two outliers]
In [61]:
sns.swarmplot(x='size',y='value',data=df,size=10)
Out[61]:
[Figure: swarm plot of value for each size]
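The swarm plot shows how values spread within each size. A numeric companion is the listing count and median value per size; the median is less affected by any extreme listings that remain. The data below is hypothetical, and the `median_value` column names are illustrative.

```python
import pandas as pd

# Hypothetical cleaned listings.
df = pd.DataFrame({
    "size": [3, 3, 3, 4, 4, 5],
    "value": [10e6, 12e6, 30e6, 20e6, 24e6, 52e6],
})

# Count and median value for each size.
summary = (
    df.groupby("size")["value"]
      .agg(["count", "median"])
      .rename(columns={"median": "median_value"})
)
summary["median_value_million"] = summary["median_value"] / 1e6
print(summary)
```

Showing the count alongside the median flags sizes backed by only a handful of listings, where the median is less reliable.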