Tuesday, 26 September 2017

Big Data. Deep History. Oh, brave new world.


I’m quite taken with this article on two fronts:

First, in light of this week’s discussion topic on Open Access and the linking of data into larger datasets, through this project we will witness the ways in which existing, disparate data are aggregated and employed in an attempt to yield greater insights. In this article, pulling together an array of datasets is conveyed as a technical challenge – one that involves digitization and establishing the right synthesis of column names and data types. Yet, recall the variation in results on Scottish castles from two different “comprehensive” databases; considering the subjective, dynamic nature of data and the relational practices of the researchers, it seems apparent to me that they will encounter – if they choose to – deeper differences in: methods of classification; areas of research attention; and theoretical, temporal, and personal perspectives of the researchers, among others. If I do an aggregate search in this database for “Hopi Yellow”, who decided where yellow ended and orange began? Did they do it the same way?
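
To make that worry concrete, here is a minimal sketch in Python. Everything in it is invented for illustration: the project names, the column names, and the hue cutoffs are my assumptions, not anything taken from the article or from a real database. It shows that renaming columns and coercing data types can go perfectly well while two sherds of identical hue still land on opposite sides of the yellow/orange line.

```python
# Hypothetical classification rules: each project draws the yellow/orange
# boundary at a different hue angle (degrees). These values are invented.
CUTOFFS = {"project_a": 40, "project_b": 45}


def classify(hue, project):
    """Label a sherd the way the given (hypothetical) project would have."""
    return "yellow" if hue >= CUTOFFS[project] else "orange"


def harmonize(record, project, column_map):
    """Rename columns to a shared schema and coerce the hue to a float.

    The source project is kept on each record as a small piece of paradata.
    """
    out = {shared: record[local] for local, shared in column_map.items()}
    out["hue"] = float(out["hue"])
    out["source"] = project
    return out


# Two invented records describing sherds of the same measured hue (42).
project_a = [{"SiteNo": "A1", "Colour": classify(42, "project_a"), "Hue": "42"}]
project_b = [{"site_id": "B1", "color_class": classify(42, "project_b"), "hue_deg": 42}]

merged = [
    harmonize(r, "project_a", {"SiteNo": "site", "Colour": "color", "Hue": "hue"})
    for r in project_a
] + [
    harmonize(r, "project_b", {"site_id": "site", "color_class": "color", "hue_deg": "hue"})
    for r in project_b
]

# The aggregate search: the schema is unified, but only one sherd is found.
yellow_hits = [r["site"] for r in merged if r["color"] == "yellow"]
print(yellow_hits)
```

The merge itself succeeds; what the search for "yellow" returns depends entirely on decisions made before the data ever reached the shared database, which is why keeping the `source` field (or richer paradata) matters.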

I’m hopeful the interdisciplinary constitution of the project team will generate theoretical and practical differences that don’t permit unspoken assumptions; however, I’m as much interested in their process as their project outcome. Perhaps they’d be willing to publish paradata associated with the database’s creation.

The second element of this article that interests me is the positioning of Big Data as an opportunity for better Digital Archaeology. This is not just an article – it's a press release developed by the university’s communications department. The headline is not about the place(s) or people(s) that form part of the project scope; it’s about Big Data. That’s all you need to know! Big Data is the point – it’s an end in itself. Given the intended purposes of the article, it’s unsurprising that the project is pitched as a chance to crunch bigger numbers, be more “streamlined”, and help website visitors (the public?) easily access data.

Friends, this is not just a database. This is a KNOWLEDGE DISCOVERY SYSTEM.

These are uncritical rallying cries for the benefits of Big Data. It’s interesting to see this article, at this time, in light of the discrepancy between utopian conceptualizations of Big Data from a few years back and the messy, uneven realities of data sharing now.

--

I’m re-reading this before posting, and realizing I’m coming across as quite pessimistic. That’s not my intent. I’m not going to reject Big Data as an idea because I find it off-putting or because it doesn’t live up to the idealized hype. Whatever the shortcomings in practical implementation, if bigger data sets can help us tell better stories about the past, I’m excited for that. If we want to realize these benefits, I suggest we need to be attuned to the contingent processes of knowledge generation, whether those of an archaeologist 75 years ago or of a professor at INSITE: Centre for Business Management and Analytics today.


Trevor

4 comments:

Joanna said...

I haven't had a chance to read the article yet, Trevor, so I hope to comment in full later. Your post reminded me to share the example I spoke about in class, in case anyone was curious. It shows how "where the numbers come from" (as well as validation) matters even in the most scientific areas of the field, and for all data:

https://www.worldcat.org/title/the-measure-and-mismeasure-of-the-tibia-implications-for-stature-estimation/oclc/91817338&referer=brief_results

Amedeo Sghinolfi said...

As you said, the creation of databases and then the processing of this great amount of data could help us in reconstructing settlement patterns, pottery style distributions, etc., but paradata is a big issue. Speaking from my own experience, certain repositories that have not been digitized yet should be carefully checked; as Joanna pointed out, artefacts might have been classified using incorrect methods.
As regards the emphasis on Big Data, as we saw during Theory classes, we may still be at the top of the "Big data parabola".
Lastly, the press release underlines that this system will help us understand today's migrations. Are we sure about that? I doubt that an analysis of the Southwestern US around 800 AD can help us understand processes that are taking place in a completely different socio-economic context.

i-ing the past said...

Apropos of this discussion, an article appeared in the Guardian today:

https://www.theguardian.com/science/2017/oct/02/archaeology-and-blockchain-a-social-science-data-revolution?

The article is about the potential of Blockchain (think Bitcoin, but as secure data repositories) to help preserve archaeological and cultural heritage data. As is always the case for these kinds of articles, the promise is there without the peril... as you've all noted, how are the paradata and variable thesauri negotiated? But, given that very few facilities can generate their own digital archives and databases... a common lexicon COULD allow more parts of the record to be commonly archived and daisy-chained - or block-chained - together. It does underscore how solutions for managing digital information do not have to mirror analogue ways of managing information. WHAT counts as important and "mine-able" data, though, will need to be addressed in a very analogue, broad conversation kind of way, I suspect.

Jeff Grieve said...

Hi Trevor,

Thanks for the post. In addition to the nuances that you and others have raised, I would like to highlight the perils of our collective and persistent use of the term "Big Data". Like Culture, Community, and Digital Archaeology, the term will mean different things to different people in different contexts. I didn't take your post as overly pessimistic, but I do think you are right to challenge us to critically evaluate what we expect to gain from "Big Data" and to follow through clearly with our public articulation of those objectives. I think that "Big Data" is an overused techie marketing phrase ... similar to "the Internet of Everything" ... that we should avoid, if at all possible, because of its ambiguity.