Welcome to Bukit Vista: Database Cleansing Project
November 10, 2021
By Lukman and Adri (Data Engineer Interns – Product Development Chapter)
Hi, Vistans. We are both new Visterns, and this is a ‘welcoming’ project for us. We were told that the database has been a problem for a long time, and we have seen it ourselves. So we did what we could. We hope you enjoy our journey. Happy reading!
Well, as we all know, Bukit Vista is a property management startup. We manage many kinds of properties, from guesthouses and villas to hotels. Today, we want to know what kind of property attracts the most visitors. Is it a property close to the beach? Or a property close to the crowds?
Yep. This must be answered before we make a decision about a property. We can’t assume, right? At Bukit Vista we have to use data! Therefore, we need to know how many properties and units Bukit Vista actually manages. According to the database, as of November 2021 Bukit Vista has established partnerships with more than 190 properties consisting of more than 1000 units. But have we ever asked whether these numbers are correct? How many properties does Bukit Vista actually manage?
Don’t believe us? How about Villa “ABC”?
What happened at Villa “ABC” is very confusing for us. We certainly can’t allow this kind of thing to happen. We must be able to inspire delight in our guests as well as our partners. We must not confuse our guests when they make a booking, and we must not confuse our partners about how many units they actually have.
This concerns not only the Product Development Chapter but also several other chapters, such as CM and Marketing. We need to meet with several people to raise awareness of the existing problems and build a common understanding so that similar problems do not happen again. Through this work, we hope to determine numbers of properties, units, and listings linked to units that match the actual situation.
One more thing before we get started. So you don’t get confused: a property is a residential building, usually a guesthouse, villa, or hotel. A unit is the smallest part of a property that can be sold. A listing is how we market the property on the online travel agents that we usually use.
Remember that our journey started with Villa “ABC”. How is it possible that Villa “ABC”, which actually has only 4 bedroom units, appears to have 9? This happens because there are extra units representing 2 and 3 bedrooms, even though they should point to the same 4 bedrooms. From there we also learned that some listings do not point to an actual unit. It didn’t stop there: we also found several properties that were duplicates of other properties. So, instead of being able to do an analysis, we found that we don’t even know how many properties and units Bukit Vista actually manages.
Our idea started from a very simple query that compared the number of bedrooms recorded for each property with the number of bedrooms across its units. Oops! We failed fast! The comparison is unreliable because we cannot trust that the existing unit bedroom counts are correct. So we first have to correct the bedroom values of the units through BIGRR and some simple queries. Of course, this follows CM’s instructions, and the numbers must be valid!
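To make the comparison concrete, here is a minimal sketch in Python. It assumes the query compared each property's recorded bedroom count with the sum of bedrooms across its units. The field names (`property_id`, `bedrooms`), the sample rows, and the function `find_bedroom_mismatches` are all hypothetical; the real tables and the actual query are not shown in this post.

```python
from collections import defaultdict

# Hypothetical sample rows (not real Bukit Vista data).
properties = [
    {"property_id": 1, "name": "Villa ABC", "bedrooms": 4},
    {"property_id": 2, "name": "Guesthouse XYZ", "bedrooms": 2},
]
units = [
    # Four real single-bedroom units of Villa ABC
    {"unit_id": 10, "property_id": 1, "bedrooms": 1},
    {"unit_id": 11, "property_id": 1, "bedrooms": 1},
    {"unit_id": 12, "property_id": 1, "bedrooms": 1},
    {"unit_id": 13, "property_id": 1, "bedrooms": 1},
    # Extra "units" that are really combinations of the same four bedrooms
    {"unit_id": 14, "property_id": 1, "bedrooms": 2},
    {"unit_id": 15, "property_id": 1, "bedrooms": 3},
    # Guesthouse XYZ is consistent
    {"unit_id": 20, "property_id": 2, "bedrooms": 1},
    {"unit_id": 21, "property_id": 2, "bedrooms": 1},
]


def find_bedroom_mismatches(properties, units):
    """Return (name, property_bedrooms, unit_bedroom_total) for each mismatch."""
    totals = defaultdict(int)
    for unit in units:
        totals[unit["property_id"]] += unit["bedrooms"]
    return [
        (p["name"], p["bedrooms"], totals[p["property_id"]])
        for p in properties
        if totals[p["property_id"]] != p["bedrooms"]
    ]


mismatches = find_bedroom_mismatches(properties, units)
print(mismatches)
```

A check like this is only as good as the unit bedroom values it sums, which is why those values have to be corrected first.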
We grew day by day as we brainstormed and paired with seniors, and we found one important thing: the faulty units share the same characteristics. That helps us detect which ones they are.
That’s right! We start with the keywords below:
Bedrooms or bedrooms
Rooms or rooms
BR or br
BookingCom or Booking.com
Extra or extra
Rollaway or rollaway
This is very helpful for us, although some faulty units are still not detected by the keywords above. We must not miss them, so we still have to check each property carefully!
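The keyword check above can be sketched in a few lines of Python. This is only an illustration, assuming the keywords are matched case-insensitively against unit names. The sample names and the function `flag_suspect_units` are hypothetical, and the actual script may work differently.

```python
import re

# Keywords from the list above, matched case-insensitively.
# The lookarounds require that no letter sits directly before or after the
# keyword, so "3BR" matches "br" but "Breeze" does not, and "Bedrooms" is
# reported as "bedrooms" rather than also as "rooms".
KEYWORD_PATTERN = re.compile(
    r"(?<![A-Za-z])(?:bedrooms|rooms|br|booking\.?com|extra|rollaway)(?![A-Za-z])",
    re.IGNORECASE,
)


def flag_suspect_units(unit_names):
    """Return {unit_name: [matched keywords]} for names containing any keyword."""
    flagged = {}
    for name in unit_names:
        hits = [hit.lower() for hit in KEYWORD_PATTERN.findall(name)]
        if hits:
            flagged[name] = hits
    return flagged


# Hypothetical unit names, for illustration only.
sample_names = [
    "Villa ABC 2 Bedrooms",
    "Villa ABC Room 1",
    "Villa ABC 3BR BookingCom",
    "Villa ABC Extra Bed",
    "Ubud Breeze Suite",
]
suspects = flag_suspect_units(sample_names)
for name, hits in suspects.items():
    print(name, hits)
```

As noted above, a keyword filter like this only produces candidates. Faulty units whose names contain none of the keywords still need a manual check.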
All right. We went from a simple query to a long Python script that is ready to run. Run the script!
The way this script works is quite simple. Rather than describe it in words, we will let the picture below explain it.
And this is the result:
The number of data
Finally! We ran the script on the production database, writing the results to a table with a different name. After a short review by CM, we were ready to rename the table, keeping its structure. It’s live now!
Oh yes, one last thing that is no less important in this project: we will create a guideline on how to assign listings to multiple units instead of creating new units. We will also write guidelines on how to name properties and units so that the same errors do not occur in the future.
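The "write to a differently named table, review, then rename" step can be illustrated with a small sketch. It uses Python's built-in `sqlite3` module and an in-memory database purely as a stand-in, since this post does not say which database engine runs in production. The table names (`units`, `units_cleaned`, `units_backup`) and the function `swap_in_cleaned_table` are hypothetical.

```python
import sqlite3


def swap_in_cleaned_table(conn, live, cleaned, backup):
    """Rename `live` to `backup`, then `cleaned` to `live`, in one transaction."""
    conn.execute("BEGIN")
    try:
        conn.execute(f'ALTER TABLE "{live}" RENAME TO "{backup}"')
        conn.execute(f'ALTER TABLE "{cleaned}" RENAME TO "{live}"')
        conn.execute("COMMIT")
    except sqlite3.Error:
        # Undo both renames if either one fails.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT above fully controls the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE units (unit_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE units_cleaned (unit_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO units VALUES (?, ?)",
    [(1, "Villa ABC 1"), (2, "Villa ABC 2 Bedrooms")],
)
conn.execute("INSERT INTO units_cleaned VALUES (1, 'Villa ABC 1')")

swap_in_cleaned_table(conn, "units", "units_cleaned", "units_backup")

live_count = conn.execute("SELECT COUNT(*) FROM units").fetchone()[0]
backup_count = conn.execute("SELECT COUNT(*) FROM units_backup").fetchone()[0]
print(live_count, backup_count)
```

Keeping the old table under a backup name is one design choice among several. It makes it possible to compare or restore the previous data if the review missed something.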
Cool. The script has been executed and the table has been renamed. That means our work is already live on Bukit Vista’s guest portal. It is a great achievement for us. Now we know that, as of this moment, Bukit Vista has collaborated with 196 properties with a total of 1004 units, and that 2112 listings are linked to units. This is certainly good for Bukit Vista, because we can inspire delight in our partners and guests. We also hope that our partners can inspire guest delight through their properties, and that guests can inspire delight in themselves, their friends, and their families, so that they will be happy to make reservations with Bukit Vista again. Now that we have accurate data, we can process it and build a business strategy from there.
Okay, we think we’ve done a pretty good job cleaning up the property, unit, and listing-to-unit tables. But we can’t seem to stop. We still have another problem: standardizing both property and unit names. Maybe we can fix that another time! At least with neat property and unit data, we can analyze the properties and units. This is our first project and we were very happy to do it. Sharing and pairing, getting to know other chapters, and a very supportive environment helped us finish it. We can’t wait to make another impact.
The question is still the same: “What kind of property do they want to book?”