Experiments on Fake News detection and prevention

Screen grab of the Event Registry home page (http://eventregistry.org/)

News is the cornerstone of civilisation

In his book Sapiens, Yuval Noah Harari argues that language evolved as an evolutionary response to Homo sapiens' need to share information about one another. Living within tribes was vital for humans, since we were never able to survive alone.

Thus, it was news that fuelled the growth of language, the critical evolutionary development that set Sapiens on the path leading to the present day.

What happens when people read fake news?

We are programmed to react to news — that is the primary power that information has on human minds.

Possible algorithms for fighting fake news

Unless we understand the psychology of online news consumption, we won’t be able to find a cure for what The New York Times calls a “digital virus.”

#1 — The signature pattern based approach

Treating fake news as a virus gives us a clue to the kind of algorithm that can help detect sites that spread it. Much like antivirus software, the system learns signatures from known samples and matches new content against them.

  1. Perform text pattern analysis on a massive database of sample fake news.
  2. Create a very fast pattern recognition engine that is trained on the sample fake news.
  3. Use the trained engine to identify incoming fake news that shows similar patterns and flag it for human editors.
  4. Learn from the correct matches so that, over time, the engine becomes increasingly accurate at flagging potential fake news.
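
The four steps above can be sketched in a few lines of Python. This is only a minimal illustration, not the engine the article has in mind: it assumes that a "pattern" is a word three-gram (a shingle), and the class name `SignatureEngine`, its methods, and the `threshold` value are all hypothetical choices made for the example.

```python
import re
from collections import Counter


def shingles(text, n=3):
    """Return the set of word n-grams (shingles) found in a piece of text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}


class SignatureEngine:
    """Toy signature matcher that learns shingles from known fake samples."""

    def __init__(self, n=3, threshold=0.2):
        self.n = n
        self.threshold = threshold  # assumed cut-off; would need tuning on real data
        self.signatures = Counter()

    def train(self, fake_samples):
        # Steps 1 and 2: extract patterns from the sample fake news.
        for text in fake_samples:
            self.signatures.update(shingles(text, self.n))

    def score(self, text):
        # Fraction of the article's shingles that match a known signature.
        found = shingles(text, self.n)
        if not found:
            return 0.0
        hits = sum(1 for s in found if s in self.signatures)
        return hits / len(found)

    def flag(self, text):
        # Step 3: flag for a human editor; this is a hint, not a verdict.
        return self.score(text) >= self.threshold

    def confirm(self, text):
        # Step 4: an editor confirmed the match, so learn its patterns too.
        self.signatures.update(shingles(text, self.n))
```

An editor-facing tool would call `flag` on each incoming article and call `confirm` whenever the editor agrees with the flag, so that the signature set grows with every correct match.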

#2 — The Sentiment based approach

All fake news is written to provoke one of several clear sentiments, as outlined in Melissa Zimdars’ seminal research document. These are usually anger, fear, hatred, sadness, or excitement.
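
As a rough sketch of how such sentiments could be measured, the snippet below counts emotionally loaded words per category. The lexicon `EMOTION_LEXICON` is a tiny hypothetical word list invented for this example, and the `threshold` is an assumed value; a real system would rely on a curated emotion lexicon or a trained classifier.

```python
import re
from collections import Counter

# Hypothetical toy lexicon mapping trigger words to the sentiments named above.
EMOTION_LEXICON = {
    "outrage": "anger", "furious": "anger", "betrayed": "anger",
    "terrifying": "fear", "threat": "fear", "danger": "fear",
    "disgusting": "hatred", "evil": "hatred",
    "tragic": "sadness", "heartbreaking": "sadness",
    "shocking": "excitement", "unbelievable": "excitement",
}


def emotion_profile(text):
    """Return (hits per emotion, share of words that are emotionally loaded)."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(EMOTION_LEXICON[w] for w in words if w in EMOTION_LEXICON)
    intensity = sum(counts.values()) / len(words) if words else 0.0
    return counts, intensity


def is_emotionally_loaded(text, threshold=0.15):
    # threshold is an assumed value, chosen only for illustration
    _, intensity = emotion_profile(text)
    return intensity >= threshold
```

A high intensity score does not prove that an article is fake; it is one signal that can be combined with the other approaches before an item reaches a human editor.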

#3 — Using Event Registry

Event Registry is the single biggest repository of real-time news and event analysis in the world. It exposes APIs for real-time queries of news and events around the world from authenticated sources. A story that none of those sources report is a natural candidate for closer inspection.
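
One way to use such a repository is to check whether a suspect headline is corroborated by trusted sources. The sketch below deliberately does not call the real Event Registry API, since its client and response format are not described here. Instead, `fetch_articles` is a hypothetical stand-in: any function that takes a list of keywords and returns dictionaries with `source` and `title` keys. The stopword list and the `min_overlap` value are likewise assumptions.

```python
import re

# Small assumed stopword list, only for the purposes of this example.
STOPWORDS = {"the", "a", "an", "of", "in", "on", "to", "and", "for", "is"}


def keywords(headline):
    """Lower-cased content words of a headline."""
    words = re.findall(r"[a-z0-9]+", headline.lower())
    return {w for w in words if w not in STOPWORDS}


def corroborating_sources(headline, fetch_articles, trusted_sources, min_overlap=0.5):
    """Return the trusted sources that carry an article matching the headline."""
    wanted = keywords(headline)
    if not wanted:
        return set()
    matches = set()
    # fetch_articles is a hypothetical wrapper around a news query service.
    for article in fetch_articles(sorted(wanted)):
        if article["source"] not in trusted_sources:
            continue
        overlap = len(wanted & keywords(article["title"])) / len(wanted)
        if overlap >= min_overlap:
            matches.add(article["source"])
    return matches
```

If the returned set is empty, no trusted source is reporting the story, and the item can be queued for human review.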

#4 — The domain and whois source monitor

Initial research quickly indicated that fake news originates from clusters of geographical areas. Deep IP monitoring and checking the origin of network data at large scale can reveal the clusters on the web from which fake news originates.
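
A minimal sketch of the clustering idea follows. It assumes the hosting IP address of each suspect domain has already been resolved (the DNS and whois lookups themselves are not shown), and it treats a shared /24 IPv4 network as a "cluster". Both the prefix length and the `min_size` cut-off are assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict


def cluster_by_network(domain_ips, prefix_len=24):
    """Group domains by the IPv4 network that their hosting address falls in."""
    clusters = defaultdict(list)
    for domain, ip in domain_ips.items():
        # strict=False lets us pass a host address and get its enclosing network.
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        clusters[str(network)].append(domain)
    return clusters


def suspicious_clusters(domain_ips, min_size=2, prefix_len=24):
    """Return only the networks that host at least min_size flagged domains."""
    clusters = cluster_by_network(domain_ips, prefix_len)
    return {
        net: sorted(domains)
        for net, domains in clusters.items()
        if len(domains) >= min_size
    }
```

The same grouping could be applied to whois fields such as registrant country or registrar, as long as those values are available for each domain.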

Infrastructure & Implementation

  1. A seed database of fake news sites that can be constantly monitored, edited and added to.
  2. A massive content crawler to crawl the sites and create a training repository or database of fake news content.
  3. An NLP and text-parsing engine that will provide the intelligence layer needed to trigger the algorithms.
  4. An API for consumers to use the fake news detection system.
  5. An alerting system and web front-end for editors to monitor and manage the entire infrastructure.
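
To make the relationship between these components concrete, here is a small skeleton that wires a seed database, a pluggable detector, and an alert queue together. It is a sketch under stated assumptions: the class names, the `check` method, and the `threshold` are hypothetical, the crawler (component 2) is left out because it needs network access, and `detector` can be any function that maps article text to a score between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import Callable
from urllib.parse import urlparse


@dataclass
class SeedDatabase:
    """Component 1: an editable set of known fake news sites."""
    sites: set = field(default_factory=set)

    def add(self, site):
        self.sites.add(site.lower())

    def remove(self, site):
        self.sites.discard(site.lower())


@dataclass
class DetectionService:
    """Components 3 to 5: score content, answer API calls, queue alerts."""
    seeds: SeedDatabase
    detector: Callable[[str], float]  # hypothetical scoring function
    threshold: float = 0.5            # assumed cut-off for flagging
    alerts: list = field(default_factory=list)

    def check(self, url, text):
        """Entry point that a consumer-facing API could expose."""
        domain = urlparse(url).netloc.lower()
        score = self.detector(text)
        flagged = domain in self.seeds.sites or score >= self.threshold
        result = {"url": url, "score": score, "flagged": flagged}
        if flagged:
            # Editors would review this queue through the web front-end.
            self.alerts.append(result)
        return result
```

Any of the detection approaches described earlier could be plugged in as `detector`, which keeps the infrastructure independent of the algorithm that happens to be in use.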
