Performance Analysis of Weather Data using Machine Learning

  

Weather and Climate Observations

Weather data is essential for determining the climate of a region, which depends on a number of factors. Air pressure, temperature, and humidity at various altitudes all influence the formation and advancement of storm systems, the amount of precipitation an area gets, and the number of cloudy days. Over time, these influences affect the environment on a local, international, and global scale.

Why is performance analysis of weather data important?

The value of weather data analytics in human life is immense. Accurate weather forecasting benefits agriculture and tourism, and helps in preparing for natural disasters such as floods and droughts. Weather forecasting also has considerable economic value for news organisations, government agencies, and industrial agriculture.

Performance analysis of meteorological data:

We can use weather and climate datasets to better understand and forecast effects on shipping and logistics processes, to plan the shipment of temperature-sensitive goods, and to select potential destinations more efficiently to increase production. For this analysis, I have used one such weather dataset from Kaggle.

More about the weather dataset:

👉 The dataset has hourly temperature readings recorded over roughly 10 years, from 2006-04-01 00:00:00.000 +0200 to 2016-09-09 23:00:00.000 +0200. It corresponds to Finland, a country in Northern Europe.
👉 It has 12 attributes, including the Formatted Date, apparent temperature, and humidity used in this analysis.
👉 It holds 96,453 entries of data across those 12 attributes.

This blog shows how to transform raw data into information and then convert it into knowledge. It walks through cleaning the data and analysing it to test the given hypothesis: "Do the apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming?" In other words, we need to find out whether the average apparent temperature for a given month, say April, and the average humidity for that month have increased from 2006 to 2016. This monthly analysis has to be done for all 12 months over the 10-year period. The visualization is performed using several Python libraries.

Requirements:

👉 NumPy
👉 Pandas
👉 Matplotlib or Seaborn library

Steps for Performance Analysis using Python libraries:

👉Import the required libraries

👉 Input the dataset into a dataframe 


👉 Analyse the type of each attribute

👉 Convert the Formatted Date attribute to a datetime object for easier analysis


👉 Set the index of the dataframe to Formatted Date. Since hourly data is provided, resampling it to monthly data makes the analysis more manageable.
👉 Resampling means changing the frequency of your time-series observations.
👉 There are two types of resampling: upsampling (to a higher frequency) and downsampling (to a lower frequency). Going from hourly to monthly data is downsampling.
👉 The average apparent temperature and humidity are computed using the mean() function. Note that "MS" denotes month start.
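
The steps above can be sketched roughly as follows. This is a minimal sketch rather than the original notebook code: the column names `Apparent Temperature (C)` and `Humidity` are assumptions about the CSV header, and a small synthetic frame with made-up values stands in for the real file, which would normally be loaded with `pd.read_csv`.

```python
import pandas as pd

DATE_COL = "Formatted Date"  # attribute name taken from the steps above
# The two value column names are assumptions about the CSV header.
VALUE_COLS = ["Apparent Temperature (C)", "Humidity"]


def monthly_means(df: pd.DataFrame) -> pd.DataFrame:
    """Downsample hourly readings to month-start ("MS") averages."""
    out = df.copy()
    # utc=True converts the "+0200"-style offsets to one common timezone.
    out[DATE_COL] = pd.to_datetime(out[DATE_COL], utc=True)
    out = out.set_index(DATE_COL).sort_index()
    return out[VALUE_COLS].resample("MS").mean()


# Synthetic stand-in for the real data (made-up values, three months of hours).
hours = pd.date_range("2006-04-01 00:00", "2006-06-30 23:00", freq="h")
demo = pd.DataFrame(
    {
        DATE_COL: hours.strftime("%Y-%m-%d %H:%M:%S.000 +0200"),
        "Apparent Temperature (C)": 10.0,
        "Humidity": 0.5,
    }
)

print(demo.dtypes)  # the date column starts out as text, not datetime
monthly = monthly_means(demo)
print(monthly)
```

Because the timestamps are converted to UTC, the first two local hours of April fall into March, so the result starts with a short leading March row. Whether to keep UTC or convert back to local time is a design choice that depends on how the month boundaries should be defined.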

Visualization:
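
One way to draw such a plot with Matplotlib is sketched below. The column names and the synthetic seasonal values are assumptions made so that the example runs on its own; with the real data, the resampled monthly dataframe would be passed to the hypothetical helper `plot_monthly` instead.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt


def plot_monthly(monthly: pd.DataFrame):
    """Draw one line per column against the month-start index."""
    fig, ax = plt.subplots(figsize=(14, 6))
    for col in monthly.columns:
        ax.plot(monthly.index, monthly[col], label=col)
    ax.set_xlabel("Month")
    ax.set_title("Monthly mean apparent temperature and humidity")
    ax.legend()
    return fig, ax


# Synthetic monthly series (made-up values) standing in for the resampled data.
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS")
month = idx.month.to_numpy()
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": 10 + 12 * np.sin(2 * np.pi * (month - 4) / 12),
        "Humidity": 0.7 + 0.1 * np.cos(2 * np.pi * (month - 1) / 12),
    },
    index=idx,
)

fig, ax = plot_monthly(monthly)
# In an interactive session, plt.show() would display the figure.
```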


We can conclude from the above plot that the humidity remained almost constant over the years. The average apparent temperature also stays about the same, since the yearly peaks lie at nearly the same level.

Plotting the variation in Apparent Temperature and Humidity for the month of April every year:
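
A minimal sketch of this single-month comparison is given below. It assumes a monthly dataframe indexed by month-start dates; the helper name `select_month`, the column names, and the random values are all illustrative assumptions, not the original data or code.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt


def select_month(monthly: pd.DataFrame, month: int) -> pd.DataFrame:
    """Keep only the rows for one calendar month (1 = January ... 12 = December)."""
    return monthly[monthly.index.month == month]


# Synthetic monthly averages (made-up values) standing in for the real ones.
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS")
rng = np.random.default_rng(0)
monthly = pd.DataFrame(
    {
        "Apparent Temperature (C)": rng.normal(10, 3, len(idx)),
        "Humidity": rng.uniform(0.6, 0.8, len(idx)),
    },
    index=idx,
)

april = select_month(monthly, 4)  # one row per year, April only

fig, ax = plt.subplots(figsize=(10, 5))
for col in april.columns:
    ax.plot(april.index.year, april[col], marker="o", label=col)
ax.set_xlabel("Year")
ax.set_title("April averages by year")
ax.legend()
```

Calling `select_month` with each value from 1 to 12 repeats the same comparison for every month of the year.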

There is no change in average humidity. The average apparent temperature rose in 2009, dropped again in 2010, rose slightly in 2011, dropped significantly in 2015, and rose again in 2016. Overall, there is a sharp rise in temperature after 2010 and a fall after 2014.

Heatmap Representation:

A Heatmap is a two-dimensional graphical representation of data where the individual values that are contained in a matrix are represented as colors.
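
As an illustration, the sketch below builds a year-by-month matrix of average apparent temperature and colours it with Seaborn's `heatmap`. Which matrix the original heatmap showed is not stated, so this layout is an assumption, and the values are synthetic.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import seaborn as sns

# Synthetic monthly averages (made-up seasonal curve plus noise).
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS")
rng = np.random.default_rng(1)
month = idx.month.to_numpy()
temps = 10 + 12 * np.sin(2 * np.pi * (month - 4) / 12) + rng.normal(0, 1, len(idx))

# Reshape the series into a matrix: one row per year, one column per month.
table = pd.DataFrame({"year": idx.year, "month": idx.month, "temp": temps})
matrix = table.pivot(index="year", columns="month", values="temp")

fig, ax = plt.subplots(figsize=(10, 6))
sns.heatmap(matrix, cmap="coolwarm", ax=ax)  # each cell's colour encodes its value
ax.set_title("Average apparent temperature by year and month")
```

Reading down a column of this matrix compares the same month across years, which matches the monthly comparison the hypothesis asks for.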

A histogram of the values gives further visual insight into how they are distributed.


Summarizing Humidity vs Temperature with Seaborn's relplot() function:
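
A relational plot of this kind can be sketched with Seaborn's `relplot` as shown below. The column names and the synthetic relationship between the two variables are assumptions chosen only so the example runs on its own.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import seaborn as sns

# Synthetic sample: a made-up relationship purely to give the plot some shape.
rng = np.random.default_rng(2)
temp = rng.uniform(-5, 30, 200)
humidity = np.clip(0.9 - 0.01 * temp + rng.normal(0, 0.05, 200), 0, 1)
sample = pd.DataFrame({"Apparent Temperature (C)": temp, "Humidity": humidity})

# relplot returns a FacetGrid; kind="scatter" draws one point per observation.
grid = sns.relplot(
    data=sample,
    x="Apparent Temperature (C)",
    y="Humidity",
    kind="scatter",
)
grid.set_axis_labels("Apparent Temperature (C)", "Humidity")
```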



Conclusion:

Thus, the performance analysis of weather data provided better insight into the factors that affect the climate and environment. Without doubt, global warming is harming the atmosphere and affecting different environmental criteria. From this analysis, it can be concluded that there has been a slight increase in temperature over the ten years covered by the dataset, while humidity has remained almost the same throughout.
