By Tim Graves
Spreadsheets. Comma separated files. Databases. Endless columns of data scrolling away vertically and horizontally.
I bet you love them, eh?
In my last blog post I talked about how Artificial Intelligence has been saving me time with long textual documents.
Now I want to focus the labour-saving beam of AI onto data.
Strange as it might sound, I think that Artificial Intelligence is beginning to point us towards a more natural way of relating to data: a dynamic that is, ironically, much more human.
ChatGPT Plus
My walkthrough will use ChatGPT Plus, which is a monthly subscription service.
Most of you won’t be paying for the subscription, but my examples are still worth following, because:
• The University plans to install Microsoft Copilot on campus machines. Copilot will bring AI features to MS Office packages, including Excel.
• The paid features of ChatGPT will inevitably be rolled out to all users at some point.
• There are already free alternatives that allow you to experiment with similar features (e.g. https://julius.ai).
Consider my examples as a taster of how our interactions with data might soon be changing.
Walkthrough
For my example I have chosen a 45,000-line CSV file that lists daily entrances to the University of Sussex Library since 2010.
First, I upload it to ChatGPT and supply a basic explanation of the data.
Me: “This file contains data in two columns: the first gives a day in the format dd/mm/yyyy; the second gives the total number of entrances to the University of Sussex Library.”
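For readers who like to see the mechanics, here is a minimal Python sketch of how a file with this two-column layout could be read. The sample rows are invented, and the absence of a header row and the name `read_entrances` are assumptions made for illustration; this is not a description of what ChatGPT actually runs.

```python
import csv
import io
from datetime import date, datetime

# Invented sample in the layout described above: dd/mm/yyyy, entrances.
# Assumption: no header row. The real file may differ.
SAMPLE_CSV = """01/10/2023,3120
02/10/2023,4875
03/10/2023,5012
"""

def read_entrances(text):
    """Parse 'dd/mm/yyyy,count' rows into a {date: entrances} mapping."""
    entrances = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        day = datetime.strptime(row[0].strip(), "%d/%m/%Y").date()
        # Assumption: if a date appears more than once, its counts are added.
        entrances[day] = entrances.get(day, 0) + int(row[1])
    return entrances

entrances = read_entrances(SAMPLE_CSV)
print(entrances[date(2023, 10, 2)])  # prints 4875
```

Stating the date format up front matters: 02/10/2023 is 2 October in this file, but a tool that guessed mm/dd/yyyy would read it as 10 February.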
To warm up, I can do the standard tasks that I would normally handle in Excel.
I can get overall totals.
Me: “Show me a table that gives the total entrances for each year”
AI dutifully draws me such a table:
I can ask for simple charts.
Me: “Draw a line graph of entrances in 2022: grouped by week”
ChatGPT provides:
I can ask for graphs that I would usually find more fiddly to create in Excel.
Me: “Show me a line chart that has one line for the entrances by day in April 2023 and another line showing the entrances by day in October 2023.”
ChatGPT draws:
So far, I could have done these tasks in Excel or any other statistical package (with a bit more column selection, right-clicking and menu-trawling).
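The same goes for a few lines of code. As a rough sketch of the first request (total entrances for each year), the grouping might look like this in Python. The figures are made up and the function name `totals_by_year` is hypothetical.

```python
from collections import defaultdict
from datetime import date

# Made-up figures for illustration only, not the real library data.
entrances = {
    date(2022, 3, 1): 4000,
    date(2022, 3, 2): 4200,
    date(2023, 3, 1): 4500,
    date(2023, 3, 2): 4700,
}

def totals_by_year(daily):
    """Add up the daily counts for each calendar year."""
    totals = defaultdict(int)
    for day, count in daily.items():
        totals[day.year] += count
    return dict(sorted(totals.items()))

yearly = totals_by_year(entrances)
for year, total in yearly.items():
    print(year, total)
# prints:
# 2022 8200
# 2023 9200
```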
But what if I wanted a chart that included data that wasn’t included in my original file?
Let’s suppose I am interested in activity by the day of the week. My original file holds only dates: it says nothing about which of them fell on a Monday, a Tuesday or a Wednesday.
What am I to do?
Me: “Show me a heat map for 2022. The X axis should show the day of the week, and the Y axis the month”.
Because ChatGPT is a very large model trained on vast amounts of text from the Internet and beyond, and because it can write and run code against my file, it can easily work out the day of the week for each date (and create a heatmap, which I always find annoying in Excel).
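The day of the week is never stored in the file, but it can be derived from each date with ordinary calendar arithmetic. The sketch below builds the grid of numbers that sits behind a month-by-weekday heatmap. The sample values are invented, `heatmap_grid` is a hypothetical name, and summing the entrances in each cell (rather than averaging them) is an assumption.

```python
import calendar
from collections import defaultdict
from datetime import date

# Invented sample values, not the real library data.
daily = {
    date(2022, 1, 3): 100,   # a Monday
    date(2022, 1, 4): 120,   # a Tuesday
    date(2022, 1, 10): 140,  # another Monday
    date(2022, 2, 1): 90,    # a Tuesday
}

def heatmap_grid(daily, year):
    """Return {month: [total for Mon..Sun]} for one year."""
    grid = defaultdict(lambda: [0] * 7)
    for day, count in daily.items():
        if day.year == year:
            # date.weekday() derives the weekday from the date: Monday is 0.
            grid[day.month][day.weekday()] += count
    return dict(grid)

grid = heatmap_grid(daily, 2022)

# Print the grid as a plain text table: one row per month, one column per weekday.
print("     " + " ".join(f"{name:>5}" for name in calendar.day_abbr))
for month in sorted(grid):
    cells = " ".join(f"{total:>5}" for total in grid[month])
    print(f"{calendar.month_abbr[month]:<5}{cells}")
```

A plotting library would then colour each cell according to its value; the grid is the part that needs information the file does not contain.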
So, my interaction with the data is changing into something much more human than my usual experience of squinting at spreadsheets.
It is becoming a dialogue.
And because it is a dialogue, I can get straight to the point and ask the questions that brought me to the data in the first place, as if I were talking to a superhuman expert on library entrances.
Me: “What was the busiest week of 2023?”
AI: “The busiest week of 2023 was the week ending on May 14th, with a total of 27,836 entrances to the University of Sussex Library”.
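An answer like this can be checked by hand. The sketch below is one way to do it, assuming weeks run Monday to Sunday (14 May 2023 was a Sunday, which fits the "week ending" wording) and that a week belongs to the year in which it ends. The numbers are invented and `busiest_week` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date, timedelta

# Invented sample values, not the real library data.
daily = {
    date(2023, 5, 8): 4100,   # Monday
    date(2023, 5, 9): 4300,   # Tuesday
    date(2023, 5, 14): 1500,  # Sunday of the same week
    date(2023, 5, 15): 3900,  # Monday of the following week
    date(2023, 5, 16): 3800,
}

def busiest_week(daily, year):
    """Return (week_ending_sunday, total) for the busiest Monday-Sunday week."""
    totals = defaultdict(int)
    for day, count in daily.items():
        # Move forward to the Sunday that closes this day's week.
        week_end = day + timedelta(days=6 - day.weekday())
        if week_end.year == year:  # assumption: a week counts for the year it ends in
            totals[week_end] += count
    return max(totals.items(), key=lambda item: item[1])

week_end, total = busiest_week(daily, 2023)
print(week_end.strftime("%d/%m/%Y"), total)  # prints 14/05/2023 9900
```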
And,
Me: “Looking across all years, what week of the year tends to be busiest?”
AI: “Across all years in the dataset, the busiest week of the year tends to be Week 19. This week typically falls in May”.
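"Week 19" only means something once a numbering scheme is fixed. The sketch below assumes ISO 8601 week numbers (weeks start on Monday) and takes "tends to be busiest" to mean the highest average weekly total across years. Both are assumptions, the data is invented, and `busiest_week_number` is a hypothetical name.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Invented sample values, not the real library data.
daily = {
    date(2022, 5, 9): 4000,   # ISO week 19 of 2022
    date(2022, 5, 10): 4200,  # ISO week 19 of 2022
    date(2022, 8, 1): 900,    # ISO week 31 of 2022
    date(2023, 5, 8): 4400,   # ISO week 19 of 2023
    date(2023, 8, 7): 1100,   # ISO week 32 of 2023
}

def busiest_week_number(daily):
    """Return the ISO week number with the highest average weekly total."""
    # Step 1: total entrances for each individual week of each year.
    weekly = defaultdict(int)
    for day, count in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weekly[(iso_year, iso_week)] += count
    # Step 2: collect those weekly totals under their week number.
    by_week = defaultdict(list)
    for (_, iso_week), total in weekly.items():
        by_week[iso_week].append(total)
    # Step 3: average across years and pick the largest.
    averages = {week: mean(totals) for week, totals in by_week.items()}
    return max(averages, key=averages.get)

print(busiest_week_number(daily))  # prints 19
```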
I can also ask AI to make predictions based on patterns in the data.
Me: “How many entrances would you predict for the first week of October in 2024?”
AI: “Based on the average number of entrances for the first week of October across all available years in the dataset, the predicted total number of entrances for the first week of October in 2024 would be approximately 19,874”.
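Note what the AI says it did: it averaged the same week across earlier years. That is a simple baseline rather than a forecasting model, and it can be sketched in a few lines. Here "the first week of October" is taken to mean 1 to 7 October, which is an assumption; the figures are invented and the function name is hypothetical.

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Invented sample values, not the real library data.
daily = {
    date(2021, 10, 1): 3000,
    date(2021, 10, 4): 3200,
    date(2021, 10, 8): 9999,  # outside 1-7 October, so ignored
    date(2022, 10, 3): 3500,
    date(2022, 10, 5): 3300,
}

def predict_first_week_of_october(daily):
    """Average the 1-7 October totals of every year in the data."""
    totals = defaultdict(int)
    for day, count in daily.items():
        if day.month == 10 and day.day <= 7:
            totals[day.year] += count
    return mean(list(totals.values()))

print(predict_first_week_of_october(daily))  # prints 6500
```

An average like this treats every year as equally typical, so an unusual year in the data pulls the prediction towards itself.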
I can even ask AI to help me interpret the information, using knowledge from outside the confines of the source data file.
For example, when I look at a chart comparing September across all years, I notice a large drop in 2020, so I ask the AI to hazard an explanation (although I’m not obliged to believe it).
Me: “What might account for the reduction in 2020?”
AI: “The reduction in entrances to the University of Sussex Library in September 2020, as compared to previous years, can be attributed to several factors related to the COVID-19 pandemic and its effects on university operations and student behaviour”.
In Summary
By using Artificial Intelligence to talk directly with a dataset, you can explore the information it holds through conversation.
I am hoping that this holds out the promise of a future where we won’t spend so much time sorting, filtering, and graphing columns of numbers.
Since all that any of us really want from a file of raw data is the answer to human questions, what could be more instinctive than transforming that relationship into a conversation?
Note on GDPR
I am always cautious when feeding data into any AI.
ChatGPT assures us that it aligns with GDPR in its processing of data:
“When ChatGPT analyses datasets provided by users, it does not store, remember, or “learn” from this data in a way that would affect future interactions with other users. Each interaction with ChatGPT is stateless, meaning that it doesn’t retain personal data from one interaction to the next within these live sessions”.
It also adds, though:
“In practice, when using ChatGPT or any AI model for analysing datasets that include personal data, organizations should ensure that they are doing so in a manner compliant with GDPR”.
As the field of AI data analysis is so close to the cutting edge and evolving far faster than the law of any nation, I play it safe and never put in any identifiable data that might come back to haunt me as a GDPR concern.