Analyzing Airbnb data from Madrid

Thales M. Meier
Dec 23, 2021

Introduction

I'd take a wild guess and say that you have probably used the Airbnb app at least once, whether for a short weekend getaway or to plan a long trip abroad. But in case you haven't, I'm sure you've heard of the startup that took the entire lodging business by storm.

Airbnb's logo

Founded in 2008 and based in San Francisco, California, Airbnb is already considered the largest company in the lodging sector, with the peculiarity of not owning a single rental property. Instead, its revenue comes from the commission taken on every successful rental throughout the world, and its market capitalization exceeded a hundred billion US dollars as of December 2021.

The Inside Airbnb website is an initiative that collects and shares Airbnb data for many cities around the world, including all rental listings and their prices, reviews, location, owner, neighbourhood, type of rental, days available during the year, minimum nights per stay, and much more.

For this project, I chose to analyze Madrid, the capital of Spain and a city I hold very dear, as I lived there for a year and a half during an exchange program.

About Madrid

Together with Barcelona, the Spanish capital is one of the two biggest tourist destinations in Spain, and the revenue from tourism makes it the country's third largest source of income, behind only the industrial and business sectors.

Visitors to Spain come mostly from Europe and the United States. In 2019 they totaled 83.7 million tourists, making Spain the second most visited country in the world that year.

La Puerta del Sol — Madrid's most renowned square

In Madrid, you definitely won't run out of options when planning your trip: there is something for almost every taste, from historic religious sites and some of the world's most famous museums to some of the country's most traditional (and controversial) events, such as Spanish bullfighting.

Corrida de Toros (Spanish Bullfighting) in the Plaza de Toros de Las Ventas — Madrid

The Data

As I mentioned earlier, all the data was gathered from the Inside Airbnb website, with the latest entries dating from November 7, 2021. For this analysis, I'll mostly be using the pandas and Matplotlib libraries in Python 3.10.

The .csv file contains a total of 17,634 entries (rows) and 17 attributes (columns). The main attributes we'll be working with are neighbourhood, location (latitude and longitude), price, minimum nights, availability throughout the year, room type and number of reviews.
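A minimal sketch of the loading step, assuming pandas and the column names of Inside Airbnb's summary `listings.csv` file (`neighbourhood`, `latitude`, `longitude`, `price`, `minimum_nights`, `availability_365`, `room_type`, `number_of_reviews`). This is not the notebook's actual code, and the three rows below are made up so that the snippet runs without downloading the real file.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for the real listings.csv;
# the column names are assumed to match Inside Airbnb's summary file
SAMPLE_CSV = """id,neighbourhood,latitude,longitude,room_type,price,minimum_nights,number_of_reviews,availability_365
1,Embajadores,40.4087,-3.7010,Entire home/apt,75,2,40,180
2,Recoletos,40.4237,-3.6883,Entire home/apt,120,3,12,90
3,Zofío,40.3839,-3.7213,Private room,30,1,5,365
"""

# With the real data this would be pd.read_csv("listings.csv")
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

rows, cols = df.shape
print(f"{rows} entries (rows) and {cols} attributes (columns)")

# Keep only the attributes used in the analysis
main_cols = ["neighbourhood", "latitude", "longitude", "price",
             "minimum_nights", "availability_365", "room_type", "number_of_reviews"]
df = df[main_cols]
```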

The dataset has only a few missing values, and none of them are in the attributes listed above, so they won't affect this analysis.
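Checking for missing values takes a single call to `DataFrame.isnull().sum()`. The sketch below uses a small hypothetical DataFrame in which only a review-date column (`last_review`, an assumed name) has gaps, mirroring the situation described above.

```python
import pandas as pd

# Hypothetical sample: only the review-date column has gaps
df = pd.DataFrame({
    "neighbourhood": ["Embajadores", "Sol", "Palacio"],
    "price": [75, 90, 60],
    "minimum_nights": [2, 1, 3],
    "last_review": ["2021-10-01", None, None],
})

# Missing values per column, most affected first
missing = df.isnull().sum().sort_values(ascending=False)
print(missing)

# The columns used in the analysis must be complete
analysis_cols = ["neighbourhood", "price", "minimum_nights"]
assert missing[analysis_cols].sum() == 0, "an analysis column has gaps"
```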

Dealing with outliers

A major concern when working with any kind of data is analyzing the distribution of each variable and checking whether outliers are present.

Outliers aren't always data-entry errors or typos; they may be real values that simply don't fit with the rest of the data.

Plotting histograms made it easy to spot some outliers, mostly in price and minimum nights, as we can see below.

Histogram plot for the original price and minimum nights variables
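A sketch of how such histograms could be drawn with Matplotlib. The seven listings are invented, with one extreme value in each column, and the 30-bin setting is an arbitrary choice rather than the notebook's.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical listings: most values are small, one per column is extreme
df = pd.DataFrame({
    "price": [45, 60, 75, 80, 95, 110, 10000],
    "minimum_nights": [1, 1, 2, 2, 3, 5, 1125],
})

fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, col in zip(axes, ["price", "minimum_nights"]):
    # A long, nearly empty right tail is the visual sign of outliers
    ax.hist(df[col], bins=30)
    ax.set_title(col)
    ax.set_xlabel(col)
    ax.set_ylabel("number of listings")
fig.tight_layout()
# In a notebook the figure renders automatically; in a plain script, call plt.show()
```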

The outliers are easy to identify in both distributions: most values are concentrated at lower prices and fewer minimum nights. A closer look also shows that the extreme values are hardly plausible: having to rent a property for at least a thousand nights, or paying more than 10,000 EUR for a single night. As I mentioned earlier, they could be genuine, but they clearly don't fit with the rest of the data.
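Summary statistics make the same point numerically. In this sketch, on the same kind of invented data as the histograms, the maximum of each column sits far above its 75th percentile, which is exactly the long tail the plots show.

```python
import pandas as pd

# Hypothetical listings with one extreme value per column
df = pd.DataFrame({
    "price": [45, 60, 75, 80, 95, 110, 10000],
    "minimum_nights": [1, 1, 2, 2, 3, 5, 1125],
})

# count, mean, std, min, 25%, 50%, 75% and max for each column
summary = df.describe()
print(summary)

# How many times larger the maximum is than the 75th percentile
tail_ratio = summary.loc["max"] / summary.loc["75%"]
print(tail_ratio.round(1))
```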

I then cleaned the data by removing those outliers. From a strictly statistical point of view, the cut should have been larger; however, I chose reasonable limits instead (300 EUR per night and 30 minimum nights) as the cleaning thresholds. This left about 91% of the original dataset, now without those outliers.
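The filtering step might look like the sketch below. The data is hypothetical, and treating 300 EUR and 30 nights as inclusive upper bounds is an assumption, since the text doesn't say whether listings exactly at the limits were kept. For contrast, the snippet also computes the common 1.5 * IQR upper fence, a standard rule of thumb and not necessarily the criterion the author had in mind; on skewed data like this it falls below the hand-picked 300 EUR and would therefore cut more.

```python
import pandas as pd

# Hypothetical listings: one exceeds the price limit, one the minimum-nights limit
df = pd.DataFrame({
    "price": [45, 60, 75, 80, 95, 110, 250, 10000],
    "minimum_nights": [1, 1, 2, 2, 3, 40, 2, 1],
})

# Limits from the text; inclusive bounds are an assumption
MAX_PRICE = 300
MAX_MIN_NIGHTS = 30

mask = (df["price"] <= MAX_PRICE) & (df["minimum_nights"] <= MAX_MIN_NIGHTS)
clean = df[mask]
kept = len(clean) / len(df)
print(f"kept {kept:.0%} of the listings")

# For contrast: the 1.5 * IQR rule derives an upper fence from the data itself
q1, q3 = df["price"].quantile([0.25, 0.75])
iqr_fence = q3 + 1.5 * (q3 - q1)
print(f"1.5 * IQR upper fence for price: {iqr_fence:.2f} EUR")
```

Here 6 of the 8 listings survive the fixed thresholds, while the IQR fence lands at about 256 EUR.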

Exploring our Data

If you're planning a trip to Madrid, you can expect to find an Airbnb property for an average price of 80.03 EUR per night.

Entire homes or apartments are the most common type of property in Madrid: with nearly 10,000 listings, they account for about 60% of all the city's listings, as we can see below.

The amount of listings for each room type in Madrid
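Counting room types is a job for `value_counts`; passing `normalize=True` returns proportions instead of counts. The six listings below are hypothetical, while the labels follow Inside Airbnb's room-type categories.

```python
import pandas as pd

# Hypothetical cleaned listings
clean = pd.DataFrame({"room_type": [
    "Entire home/apt", "Entire home/apt", "Entire home/apt",
    "Private room", "Private room", "Shared room",
]})

counts = clean["room_type"].value_counts()
# Share of each room type as a percentage, rounded to one decimal
shares = clean["room_type"].value_counts(normalize=True).mul(100).round(1)
print(pd.DataFrame({"listings": counts, "share_%": shares}))
```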

As for the minimum number of nights, you can expect a mean of 3.8 (effectively 4, since nights come in whole numbers). However, the most common value is 1 night, followed by 2 nights.
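The gap between the mean and the most common value is easy to reproduce: a few long minimum stays pull the mean up, while `mode()` reports the most frequent value. A sketch on invented data:

```python
import pandas as pd

# Hypothetical minimum-night requirements; one long stay skews the mean
clean = pd.DataFrame({"minimum_nights": [1, 1, 1, 2, 2, 3, 5, 15]})

mean_nights = clean["minimum_nights"].mean()
mode_nights = clean["minimum_nights"].mode().iloc[0]    # most frequent value
top_values = clean["minimum_nights"].value_counts().head(2)

print(f"mean: {mean_nights:.2f} nights, mode: {mode_nights} night(s)")
print(top_values)
```

Here the mean is 3.75 nights even though most listings ask for only 1 or 2.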

About the neighbourhoods

Of the 128 neighbourhoods in the dataset, the ones with the most listings are Embajadores (2,033 listings), Universidad, Palacio, Sol and Justicia.

We can also compare the neighbourhoods by mean price, and the result wasn't much of a surprise. The most expensive neighbourhood is Recoletos, with an average price of 115.42 EUR per night, about 44% more than Madrid's average.
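Ranking neighbourhoods by mean price is a `groupby` followed by a sort, and the premium over the city is the ratio of the two averages minus one. With the article's figures, 115.42 / 80.03 ≈ 1.44, i.e. about 44% above average. The rows below are hypothetical; note that the city average is taken over all listings, not over the neighbourhood means.

```python
import pandas as pd

# Hypothetical cleaned listings
clean = pd.DataFrame({
    "neighbourhood": ["Recoletos", "Recoletos", "Sol", "Sol", "Zofío", "Zofío"],
    "price": [110, 130, 90, 70, 30, 30],
})

city_avg = clean["price"].mean()  # average over all listings
by_hood = clean.groupby("neighbourhood")["price"].mean().sort_values(ascending=False)

priciest = by_hood.index[0]
premium = (by_hood.iloc[0] / city_avg - 1) * 100  # percent above the city average
print(f"{priciest}: {by_hood.iloc[0]:.2f} EUR, {premium:.0f}% above the city average")
```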

It isn't surprising, because Recoletos, part of the Salamanca district, has the highest price per square meter in all of Spain.

Paseo de Recoletos

According to 2021 data from Telemadrid.es, one square meter in Recoletos is worth a whopping 8,683 EUR, 2.36 times the average for the entire city, 3,673 EUR (Idealista.com).

You might, however, want to save money during your stay in Madrid. In that case, the best choice would be to look for a place in Zofío (29.73 EUR) or Los Rosales (33.71 EUR), the two neighbourhoods with the lowest average prices.
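The cheapest neighbourhoods come from the same grouped means, using `nsmallest`. Adding a listing count next to the mean is a small extra (not something the text reports) that helps judge whether an average rests on enough listings to be meaningful. Hypothetical data again:

```python
import pandas as pd

# Hypothetical cleaned listings
clean = pd.DataFrame({
    "neighbourhood": ["Zofío", "Zofío", "Los Rosales", "Los Rosales", "Sol", "Sol"],
    "price": [25, 35, 30, 38, 85, 95],
})

# Mean price and number of listings per neighbourhood
stats = clean.groupby("neighbourhood")["price"].agg(["mean", "count"])
cheapest = stats.nsmallest(2, "mean")
print(cheapest)
```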

Conclusion

Madrid is definitely a great city, with lots to do and explore, and the Airbnb app can help you find the best place to spend a couple of nights there.

If you're planning a trip to Madrid, the average prices give a pretty decent overview of how much you'll probably spend on lodging.

Thank you so much for reading this article, I hope you enjoyed it! You can find me on LinkedIn or on GitHub.

You can access the entire study in the Google Colab notebook, available here.

Thales M. Meier

I’m a Cargo Pilot in the Brazilian Air Force, where I’ve had the opportunity to develop a few Operations Research studies, applying MCDM (multi-criteria decision making) to real-world problems.