Part 5: Archiving and archive research
Digital archives are goldmines for the online investigator.
In Part 1, about reverse image search, I used an archived copy of a TikTok post to show you the fake video that purported to show Russian paratroopers invading Ukraine.
That is because when people are caught spreading falsehoods, they often try to delete the traces. When they do, it’s essential that we’ve already archived the proof.
But not only that. When we’re working with controversial content, it’s also a common problem – from a research perspective – that the tech platforms might remove it.
Then there’s the problem of link rot. Even in the best of research circumstances, where no one is editing or deleting any specific content, websites still change their structures, catalogue systems or even names. That can make links stop working even if you’ve saved them, unless you’ve also stored the pages themselves in a permanent archive.
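If you have a long list of saved links, you don’t have to check them one by one in the browser. Below is a minimal Python sketch, assuming the Wayback Machine’s public availability API at archive.org/wayback/available. It builds a lookup request for one link and reads the closest archived copy out of the JSON reply. The function names (build_availability_url, closest_snapshot, lookup) are made up for illustration, and the reply format may change, so treat this as a starting point rather than a finished tool.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine availability API.
AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def build_availability_url(page_url, timestamp=None):
    """Build the query URL for one link (hypothetical helper)."""
    params = {"url": page_url}
    if timestamp:
        # Optional YYYYMMDD[hhmmss] hint: ask for the capture closest to it.
        params["timestamp"] = timestamp
    return AVAILABILITY_ENDPOINT + "?" + urlencode(params)


def closest_snapshot(response):
    """Return (archive_url, timestamp) from a parsed reply, or None."""
    closest = response.get("archived_snapshots", {}).get("closest")
    if not closest or not closest.get("available"):
        return None
    return closest["url"], closest["timestamp"]


def lookup(page_url, timestamp=None):
    """Network call: ask the archive whether it holds a copy of page_url."""
    with urlopen(build_availability_url(page_url, timestamp), timeout=30) as reply:
        return closest_snapshot(json.load(reply))
```

Calling lookup("example.com/old-page", "20200101") would ask for the capture closest to 1 January 2020, and it returns None if the archive reports no copy. The URL building and the parsing are kept apart from the network call so that they can be checked offline.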
So archiving is incredibly important for our own sake. But we can also use what other people have archived for our own investigations. And this is what I really love about open source intelligence (OSINT): there is usually no need to keep secrets. We’re working with cases that are already out in the open, and the more people who cooperate to gather good source material and help develop new tools and methods, the better for everyone.
We’re going to start by looking at the biggest archive, which is also the one most commonly used for these purposes.
(Live demonstration of archive.org and the Wayback Machine extension)
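What the Wayback Machine calendar shows in the browser can also be requested as data. The sketch below assumes the Wayback Machine’s CDX server at web.archive.org/cdx/search/cdx and its url, output, fl, from and to parameters. The helper names are hypothetical. It lists the captures of one address together with their HTTP status codes, then looks for the first point where a working capture (200) is followed by a missing one (4xx). Status codes are only a hint, because some platforms serve a “content removed” page with status 200, so always open the captures on either side and check with your own eyes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: the Wayback Machine CDX server.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(page_url, start=None, end=None):
    """Build a capture-list query; start/end are YYYY[MMDDhhmmss] strings."""
    params = {"url": page_url, "output": "json", "fl": "timestamp,statuscode"}
    if start:
        params["from"] = start
    if end:
        params["to"] = end
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_captures(rows):
    """Turn CDX JSON rows (first row is the header) into sorted pairs."""
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    ts, code = header.index("timestamp"), header.index("statuscode")
    return sorted((row[ts], row[code]) for row in data)


def removal_window(captures):
    """Return (last_ok, first_missing) around the first 200 -> 4xx change."""
    last_ok = None
    for timestamp, status in captures:
        if status == "200":
            last_ok = timestamp
        elif last_ok is not None and status.startswith("4"):
            return last_ok, timestamp
    return None  # no such change found in these captures


def fetch_captures(page_url, start=None, end=None):
    """Network call: download and parse the capture list for page_url."""
    with urlopen(build_cdx_url(page_url, start, end), timeout=60) as reply:
        return parse_captures(json.load(reply))
```

Running removal_window(fetch_captures(some_url)) gives two timestamps in the YYYYMMDDhhmmss format, or None. The content changed somewhere between them, so the more often a page was captured, the narrower that window will be.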
But there are other archives too, depending on what you’re looking for. For example, if you’re investigating very controversial platforms, it’s possible that archive.org has excluded their content. Such is the case with the anonymous forum 4chan.org. It was the birthplace of much of modern internet culture, but in 2016 it became notorious as a hub for far-right extremism. In 2017, it was also where the QAnon conspiracy movement emerged.
Today, 4chan is one of the websites excluded from archive.org, as they don’t want to risk archiving illegal content. That is a problem for me as a researcher who has worked a lot with 4chan. But luckily, there are other, lesser-known archives that have specialized in 4chan content because of this.
They’re much harder to navigate. Different archives often hold very different content, which means that you might have to search them all if you’re looking for a specific post or comment. But that way, it’s still possible to research the birth of the QAnon movement, even though the traces of those first years might’ve been removed from the mainstream platforms. This is very important for us as journalists, but also for future historians when they tell the story of such a movement.
In the reference list, you’ll find two more detailed guides to archive.org and the Wayback Machine, made by the organization itself.
This is the last part of this online course. Usually we have a live Q&A afterwards, where we get together and discuss the exercises. So hopefully I’ll see you there!
But now, it's time for you to expand the memory of the internet.
Exercise 1: This video was removed by Twitter. Find it via archive.org and work out approximately when it was removed. https://twitter.com/realDonaldTrump/status/1346928882595885058
Exercise 2: Pick a website or social media post of your own choice and archive it via the Wayback Machine extension. If you tick the “Outlinks” box, you’ll be able to navigate the archived version as if it were a normal website. Otherwise, it’ll be more like a screenshot.
Exercise 3: (Warning: 4chan is notorious for its often disturbing content – if you don’t feel comfortable researching the forum, it’s absolutely OK to skip this exercise)
The QAnon movement was formed around a mysterious figure called Q, who posted “Q drops” – vague but spectacular claims – to 4chan. The first one was posted to the Politically Incorrect board (/pol/) and read:
“HRC extradition already in motion effective yesterday with several countries in case of cross border run. Passport approved to be flagged effective 10/30 @ 12:01am. Expect massive riots organized in defiance and others fleeing the US to occur. US M’s will conduct the operation while NG activated. Proof check: Locate a NG member and ask if activated for duty 10/30 across most major cities.”
Could you find the original post via a 4chan archive? When exactly was it posted?
References:
The Internet Archive – https://archive.org
The Wayback Machine Extension – https://chromewebstore.google.com/detail/wayback-machine/fpnmgdkabkmnadcjpehmlllkndpkmiak
In-depth guides to the Wayback Machine here, to the Internet Archive as a whole here