Data Exploration 101
- durlabhinc2
- Mar 24, 2025
- 2 min read
Updated: Apr 28, 2025
In this blog, we will look at how a typical data analysis journey begins with data exploration, a step widely used by professional and casual researchers alike. Let's start with a question: 'When did we actually start exploring data?' I personally believe we started as soon as we were born, thanks to our natural intelligence, which I will elaborate on in another blog.
In this post, I will elaborate on 'Data Exploration' and why it is such an important stepping stone in the data analysis journey. To do so, I will use historical data that is readily available and should interest readers from almost any background, since it relates to the financial market. There's always some news about the S&P 500 index, and we often hear that the index went up or down by a certain percentage. That is what prompted me to check the history of the percentage change within a day, or intra-day change. Even though I am not from a financial background, I have always been interested in finding a pattern, trend, or story told by a data-set of any kind. It turns out that a simple data table showing 'Open', 'Close', 'Low', 'High', and 'Volume' for each day since the S&P 500 index was launched on March 4, 1957 is readily available through Yahoo Finance. And just like that, data exploration begins.
Interesting enough! Let's find an answer to another question: 'How many times has the percentage change fallen between +1% and +2%, and so on?' This can be checked quickly using a histogram. For example, I plotted a histogram of the intra-day percentage change over the last 20 years, using 100 equally spaced bins, as shown in the figure below.
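For readers who like to see the counting spelled out, here is a minimal sketch of what sits behind such a histogram. It is not the code used for the figure. It assumes that 'intra-day percentage change' means (Close - Open) / Open × 100, and the function names and the handful of (open, close) pairs are made up for illustration; they are not real S&P 500 quotes.

```python
from typing import List, Sequence, Tuple


def intraday_pct_change(rows: Sequence[Tuple[float, float]]) -> List[float]:
    """Percentage move from open to close for each (open, close) pair.

    Assumption: intra-day change = (close - open) / open * 100.
    """
    return [(close - open_) / open_ * 100.0 for open_, close in rows]


def histogram_counts(values: Sequence[float], bins: int = 100):
    """Count values in `bins` equal-width bins spanning min(values)..max(values)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    counts = [0] * bins
    for v in values:
        # The maximum value would land one past the end, so clamp it to the last bin.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    edges = [lo + i * width for i in range(bins + 1)]
    return edges, counts


def count_in_range(values: Sequence[float], low: float, high: float) -> int:
    """How many values fall in the half-open range [low, high)."""
    return sum(1 for v in values if low <= v < high)


# Hypothetical (open, close) pairs, for illustration only.
sample_rows = [
    (100.0, 101.5),
    (101.5, 101.0),
    (101.0, 102.2),
    (102.2, 99.0),
    (99.0, 99.4),
]

changes = intraday_pct_change(sample_rows)
edges, counts = histogram_counts(changes, bins=5)  # 5 bins for 5 points; the post uses 100
between_1_and_2 = count_in_range(changes, 1.0, 2.0)
print(counts, between_1_and_2)
```

With real data, the same list of changes could be handed to a plotting library to draw the bars; the counting itself stays the same.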

Looking at the visualization, I can quickly comment that most of the time the percentage change is concentrated around some mean value, with some spread. It is quite interesting to observe that the percentage change has reached extreme values, but only rarely. That's Data Exploration 101!
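To put rough numbers on 'mean', 'spread', and 'extreme', one option is the sketch below. It uses the population standard deviation as the measure of spread and treats anything more than three standard deviations from the mean as extreme. Both choices are my assumptions for illustration, as are the function name and the sample values, which are not real market data.

```python
import statistics
from typing import List, Sequence, Tuple


def summarize(changes: Sequence[float], k: float = 3.0) -> Tuple[float, float, List[float]]:
    """Return the mean, the population standard deviation, and the values
    lying more than k standard deviations away from the mean.

    Assumption: k = 3 is just a common rule of thumb for 'extreme'.
    """
    mean = statistics.fmean(changes)
    spread = statistics.pstdev(changes)
    extremes = [c for c in changes if abs(c - mean) > k * spread]
    return mean, spread, extremes


# Hypothetical daily percentage changes: mostly small moves plus one large drop.
sample_changes = [0.2, -0.1, 0.4, -0.3, 0.1, 0.0, -0.2, 0.3, -0.4, 0.1, 0.2, -9.0]

mean, spread, extremes = summarize(sample_changes)
print(f"mean={mean:.3f}%  spread={spread:.3f}%  extremes={extremes}")
```

Note that a single large move pulls the mean and inflates the spread, so a fixed cut-off like this is only a first look at the tails.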
I am aware that I have raised more questions than I have answered. Stay tuned, and I will explain further details such as the legend and the different lines in the histogram above. In the next blog, I'll go deeper and talk about the extreme values or outliers, asymmetry, and other surprising stories this specific data-set has to tell.
Please feel free to add your comments, questions, or suggestions.

Looks like track to success 🙌
Very informative read!
Nice .. keep it up 👍🏻