Pollsters famously got the result of the 2017 general election wrong, underestimating the size of the Labour surge which saw Theresa May's slender majority slashed to a knife-edge. Can a York-based data analyst using a different method do better than the pollsters this time? STEPHEN LEWIS reports

Forget what the polls are telling you. They suggest that the Conservatives are clinging stubbornly to a lead of 10 or 11 points over Labour, and are heading for a solid majority.

Wrong, says York-based data analyst Steve Brewer. His analysis of more than 50 million tweets since the election campaign began suggests something different.

Yes, the Tories will be the single largest party with 289 seats, he predicts. But they'll be 37 seats short of an overall majority. Labour will actually get more votes than the Conservatives, he believes - though, because of our first-past-the-post system and the way constituency boundaries are drawn, they'll end up with slightly fewer seats: 286 to the Tories' 289.

In Scotland, meanwhile, the SNP will be big winners - taking 50 per cent of the vote share and many votes that previously went to Labour.

And the Lib Dems? Well, they'll pick up one extra seat, leaving them on 13 overall, he predicts. But Lib Dem leader Jo Swinson could well lose her Scottish seat to the SNP. And they'll fail to win the hotly-contested York Outer seat. In a major surprise, Steve predicts that Labour's Anna Perrett will win here, with the incumbent Conservative MP pushed into third place behind the Lib Dems.

Bold predictions. So what does he base them on?

Data. Lots of it.

Steve runs a company called Text Mining Solutions from a desk at the Hiscox business club. His business uses a sophisticated computer algorithm to 'mine' information in written documents and social media posts to look for certain key words, phrases and features. These are then analysed to generate unbiased statistics about opinions, issues, popular trends - and who or what is being talked about.

A former chemist, forensic scientist and account manager at FERA (the Food and Environment Research Agency), the 55-year-old married dad of two set up his company in York in 2011. He now has an international client base, which includes Coca-Cola (he used his text mining algorithm to help the company comb through research papers to identify potential natural sweeteners), the Body Shop ... and the University of Sheffield, with which he is working on an analysis of the complete carbon footprint of a traditional Christmas dinner.

But he had never let his algorithm loose on an election - until now.

He sees it as a real chance to showcase what his company can do. "The algorithm is so versatile and can be applied to so many situations," he says. Including, it turns out, tracking how political parties are doing...

It can do more than just measure which party is being most talked about, however. His model can trawl through vast amounts of Twitter mentions and then identify positive or negative 'weightings' attached to mentions of political leaders or parties - 'I hate the Tories' would be negative, for example, while 'Vote Boris' would be positive.
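For readers curious about what such a 'weighting' might look like in practice, here is a minimal sketch in Python. It is not Steve's algorithm, which the article does not describe in detail: the word lists, the party aliases and the function name `score_tweet` are all invented for illustration, and a real system would need to handle sarcasm, negation and much else.

```python
import re

# Hypothetical toy lexicon: these words are illustrative only.
POSITIVE = {"vote", "love", "support", "back"}
NEGATIVE = {"hate", "liar", "lies"}

# Hypothetical aliases mapping a mention to a party.
ALIASES = {
    "tories": "Conservative",
    "conservatives": "Conservative",
    "boris": "Conservative",
    "labour": "Labour",
    "corbyn": "Labour",
}


def score_tweet(text):
    """Return {party: weighting} for one tweet: +1 positive, -1 negative, 0 neutral."""
    words = re.findall(r"[a-z']+", text.lower())
    weighting = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    # Clamp to -1, 0 or +1 so a single tweet cannot dominate the totals.
    weighting = max(-1, min(1, weighting))
    parties = {ALIASES[w] for w in words if w in ALIASES}
    return {party: weighting for party in parties}


print(score_tweet("I hate the Tories"))  # {'Conservative': -1}
print(score_tweet("Vote Boris"))         # {'Conservative': 1}
```

The two examples from the text come out as expected: the first is scored as negative for the Conservatives, the second as positive.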

So his election predictions are based not only on what and who is being talked about on Twitter - but also on the way in which they are being talked about and attitudes towards them.

That is what enables him to generate his predictions about vote share - and from that, about the number of seats which will be won or lost.

So how confident is he that he's got it right?

Extrapolating from vote share to number of seats won is a bit of a 'black art', he admits. But he says he has one huge advantage over traditional pollsters - the sheer amount of data he can process. He focusses on Twitter, because it is at the moment the only large social media platform that allows you to download content for analysis.

Since the election campaign began, he has been processing 1.5 million tweets every day - a total of well over 50 million and counting.

No traditional pollster relying on interviews conducted by telephone can process anything like that much data - which makes his findings more reliable, he believes.

He accepts that his model may not be entirely accurate, however. Its biggest potential weakness is that the people who use Twitter may not be entirely representative of the voting population as a whole.

"Certain demographics are not fully represented - for example many elderly people don't tweet," he admits. Those who do tweet are perhaps a bit younger than the voting population as a whole, and might veer more towards Labour, the Greens or the Liberal Democrats.

His model certainly predicts accurately how the Twitterati will vote, he says. Whether that is representative of the voting population as a whole remains to be seen. "I'm very excited to see what is going to happen!" he admits.

And if his predictions are wrong? Well, the pollsters got it wrong in 2017, he says, so he'll be in good company. "And we'll be able to learn from it and make adjustments for next time..."

STEVE'S PREDICTIONS

Overall winner

Steve's model predicts a hung Parliament, with Labour getting the biggest vote share but the Conservatives winning most seats: 289 to Labour's 286. The Conservatives will, however, be 37 seats short of a majority.

According to his predictions, the make-up of the House of Commons after the election will be:

  • Conservative 289 (down 29 seats on last time, making them the 'biggest loser')
  • Labour 286 (up 24)
  • Lib Dems 13 (up one)
  • SNP 41 (up 6)
  • Plaid Cymru 3 (down one)
  • Northern Ireland seats 18

The Greens will lose their only seat (that of Caroline Lucas) and the Brexit Party will win no seats, he predicts.
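The seat arithmetic can be checked in a few lines of Python. This sketch simply adds up the figures listed above; the only numbers not taken from the article are the 650 seats in the House of Commons and the resulting threshold of 326 for an overall majority.

```python
# Seat predictions exactly as listed in the article.
predicted_seats = {
    "Conservative": 289,
    "Labour": 286,
    "Lib Dems": 13,
    "SNP": 41,
    "Plaid Cymru": 3,
    "Northern Ireland seats": 18,
}

TOTAL_SEATS = 650                          # size of the House of Commons
majority_threshold = TOTAL_SEATS // 2 + 1  # 326 seats for an overall majority

total = sum(predicted_seats.values())
largest = max(predicted_seats, key=predicted_seats.get)
shortfall = majority_threshold - predicted_seats[largest]

print(total)               # 650
print(largest, shortfall)  # Conservative 37
```

The predicted seats account for all 650 constituencies, and the largest party falls 37 short of the 326 needed, matching the figure quoted.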

York Outer

Predictions of who will win York Outer are based on mentions on Twitter of the four candidates (Keith Aspden for the Liberal Democrats, Anna Perrett for Labour, Julian Sturdy for the Conservatives and Independent Scott Marmion) and attitudes expressed towards them.

In a major surprise, Steve's model predicts that Conservative incumbent Julian Sturdy will be pushed into third place. Labour's Anna Perrett will win the seat, with more than 50 per cent of the vote, with the Lib Dems in second following a late surge. Steve's analysis shows that the Conservatives began strongly in York Outer but faded over time, while Labour and the Lib Dems remained strong throughout.

York Central

No surprises here. Steve's model predicts Labour's Rachael Maskell retaining her seat with 60 per cent of the vote, with the Conservatives second, the Greens third and the Lib Dems squeezed.

Big name casualties?

Steve's model predicts that Jo Swinson could well lose her East Dunbartonshire constituency to a resurgent SNP.

Other political 'big beasts' predicted to lose their seats include:

  • Green MP Caroline Lucas, who will lose to Labour in Brighton Pavilion
  • Former Conservative leader Iain Duncan Smith, who will lose his Chingford and Woodford Green constituency to Labour
  • Lib Dem Alistair Carmichael, who will lose his Orkney and Shetland seat to the SNP

The trends which will decide the election

Text Mining Solutions graphic showing the key trends which will decide the election. The thickness and height of the line shows how much a topic is being discussed. Health is green, Brexit ochre

The health service and Brexit are the two key issues which have been consistently important to voters throughout the campaign, Steve's analysis reveals. In the early days of the campaign, Brexit dominated the chatter, but more recently, the health service has taken over as the most important issue. Other key issues, measured by how much they have been talked about on Twitter, have included transport, the economy, trust, public services, employment, crime/justice and education. There was a brief surge in interest in 'religion' towards the end of November, which Steve puts down to discussions of antisemitism and Islamophobia. The environment, immigration and equality scored low as issues that mattered to voters.

Trust

Throughout the campaign, Steve's model measured trust in politicians by picking out terms that indicate trust or distrust (such as trust itself, truth, honesty, integrity, lies etc) and looking at how these were associated with particular parties and/or politicians. The model showed high levels of distrust of both the Conservative and Labour parties. It also revealed that students distrust the Liberal Democrats - no real surprise there...
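One simple way to picture this kind of association is to count how often trust or distrust words appear in tweets that also mention a party. The Python sketch below does just that. It is an illustration under stated assumptions, not the model itself: the term lists, the party tokens and the function name `trust_counts` are hypothetical, and the sketch ignores negation (so "can't trust" would wrongly count as trust).

```python
import re
from collections import Counter

# Hypothetical term lists, loosely based on the examples given in the article.
TRUST_TERMS = {"trust", "truth", "honesty", "honest", "integrity"}
DISTRUST_TERMS = {"distrust", "lies", "liar", "dishonest"}
PARTIES = {"conservatives", "labour", "libdems"}  # hypothetical tokens


def trust_counts(tweets):
    """Count tweets that pair a party mention with a trust or distrust term."""
    counts = Counter()
    for text in tweets:
        words = set(re.findall(r"[a-z]+", text.lower()))
        for party in PARTIES & words:
            if words & TRUST_TERMS:
                counts[(party, "trust")] += 1
            if words & DISTRUST_TERMS:
                counts[(party, "distrust")] += 1
    return counts


# Made-up sample tweets, for illustration only.
sample = [
    "More lies from Labour",
    "Labour lies again",
    "I admire the honesty of the Conservatives",
]
counts = trust_counts(sample)
print(counts[("labour", "distrust")])      # 2
print(counts[("conservatives", "trust")])  # 1
```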

How the system works

Steve's algorithm 'mines' tweets, hashtags and mentions on Twitter, looking for key words and phrases plus the names of politicians and political parties. The model then produces graphs and flow charts revealing the people and issues being most talked about over time. The data can then be broken down further revealing, on a scale of one to five, whether the people tweeting have strong negative or positive views towards the people or things they are tweeting about.

Throughout the campaign, Steve's model has been processing something like 1.5 million tweets every day.

"By using the various facets of a tweet, such as the text itself, user descriptions and mentions, the algorithm has been able to classify each tweet according to the indicated political preference of the tweeter," Steve says. "It is the detailed and thorough analysis of tweets that allows Text Mining Solutions to then complete a vote share calculation with accurate results."
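How classified tweets might be turned into a vote share is not spelled out, but one plausible approach is sketched below in Python. It assumes, purely for illustration, that each tweet has already been labelled with a party, that each tweeter counts as one 'vote' for the party most of their tweets indicate, and that the share is each party's percentage of those votes. The function name `vote_share` and the sample data are invented.

```python
from collections import Counter, defaultdict


def vote_share(classified_tweets):
    """Turn (user, party) pairs, one per tweet, into percentage vote shares.

    Each user counts once, under the party most of their tweets indicate.
    """
    per_user = defaultdict(Counter)
    for user, party in classified_tweets:
        per_user[user][party] += 1
    votes = Counter(c.most_common(1)[0][0] for c in per_user.values())
    total = sum(votes.values())
    return {party: 100 * n / total for party, n in votes.items()}


# Made-up sample: four tweeters, six tweets.
sample = [
    ("ann", "Labour"), ("ann", "Labour"), ("ann", "Conservative"),
    ("bob", "Conservative"),
    ("cat", "Labour"),
    ("dan", "Lib Dem"),
]
shares = vote_share(sample)
print(shares)  # {'Labour': 50.0, 'Conservative': 25.0, 'Lib Dem': 25.0}
```

Counting tweeters rather than tweets stops a handful of very active accounts from swamping the result, though it does nothing to correct for the demographic skew of Twitter users that Steve acknowledges.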

Steve has been writing an election blog throughout the campaign. You can read it here: www.textminingsolutions.co.uk/news/