Artificial intelligence helps news editors to pick punchy headlines and optimise the front page

Article

09.06.2020

VTT

Artificial intelligence can help journalists to choose the best headline and position for their news stories. This was the conclusion reached by VTT Technical Research Centre of Finland’s experts after analysing readers’ behaviour in electronic media and experimenting with predictive tools. One of the tools, called the Headline Machine, is now being trialled at Kaleva Media.

The most popular articles in online newspapers are easy to spot with web analytics. They attract more clicks than other stories and hold readers’ attention for longer – but why? Every reporter and editor can probably offer a theory, but it is still difficult to predict the effect of a story’s headline, pictures, topic, quality and positioning.

Artificial intelligence experts from VTT Technical Research Centre of Finland (VTT) decided to focus on the role of the headline and began to study how good an indicator of an article’s popularity its headline actually is. They partnered up with Kaleva Media, which publishes 15 local newspapers in Northern Finland. The new tool developed by VTT’s experts – named the Headline Machine – was put to the test against Kaleva Media’s editorial team.

The test consisted of 80 previously published news headlines that were divided into four categories on the basis of their popularity among readers. The Headline Machine was pitted against five seasoned journalists. The journalists identified the correct category for 20 of the 80 headlines, while the Headline Machine scored 50 out of 80.
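To put the two scores on the same scale, the arithmetic can be sketched in a few lines of Python. The function and variable names are illustrative only. The sketch treats the journalists' 20 as a score out of the same 80 headlines, and the chance baseline assumes the headlines were spread evenly across the four categories, which the test description does not state.

```python
def accuracy(correct: int, total: int) -> float:
    """Share of headlines placed in the right category."""
    return correct / total


TOTAL_HEADLINES = 80
NUM_CATEGORIES = 4

journalists = accuracy(20, TOTAL_HEADLINES)
headline_machine = accuracy(50, TOTAL_HEADLINES)
# Assumption: equal category sizes, so random guessing gets 1 in 4 right.
chance_baseline = 1 / NUM_CATEGORIES

print(f"Journalists: {journalists:.1%}")            # 25.0%
print(f"Headline Machine: {headline_machine:.1%}")  # 62.5%
print(f"Chance baseline: {chance_baseline:.1%}")    # 25.0%
```

Under that even-split assumption, the journalists' score matches what random guessing would achieve, while the Headline Machine gets five headlines in eight right.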

“I find this to be an encouraging sign that there is definitely scope for more research. Our editors at different newspapers have already begun to use the machine on a trial basis. The Headline Machine is an exciting first step towards journalists being able to analyse the effectiveness of a headline even before the article is published”, says Kaleva Media’s Head of Data Business Heidi Kananen.

Previously published articles provide a learning platform for the Headline Machine

VTT’s research scientists drew on international studies and interviews with Kaleva Media’s staff to establish the criteria for a successful headline and to design the Headline Machine around them. The chosen criteria were the number of clicks and the time that an average reader spends on the article. Four different impact categories were identified based on these parameters: powerful, ineffective, intriguing and interesting.

An intriguing headline is one that attracts a lot of clicks but does not hold readers’ attention, while an interesting headline persuades the few readers who do click on it to stay on the page. A powerful headline succeeds on both fronts, an ineffective one on neither.
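The four categories amount to a two-by-two grid over clicks and reading time, which can be sketched as follows. This is a minimal illustration, not VTT's implementation: the function name, the threshold parameters and the example numbers are all hypothetical, since the article does not say how the line between "many" and "few" clicks or "long" and "short" reading time was drawn.

```python
def impact_category(clicks: float, reading_time: float,
                    click_threshold: float, time_threshold: float) -> str:
    """Map click count and average reading time to an impact category.

    The thresholds are hypothetical cut-offs chosen by the caller.
    """
    many_clicks = clicks >= click_threshold
    long_read = reading_time >= time_threshold
    if many_clicks and long_read:
        return "powerful"      # attracts clicks and holds attention
    if many_clicks:
        return "intriguing"    # attracts clicks, does not hold attention
    if long_read:
        return "interesting"   # few clicks, but readers stay
    return "ineffective"       # neither


# Made-up numbers: many clicks, short reading time.
print(impact_category(1200, 15, click_threshold=500, time_threshold=40))
```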

The next step was to build a neural network model capable of analysing headline options for features such as the number of words and proper names as well as ratios between different parts of speech. In addition to these features, the Headline Machine also factors in various metadata such as the time of publication and the chosen media platform. The features, the metadata and the actual wording of the headline determine its impact category, i.e. how successful the headline is likely to be.
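Two of the simpler headline features and the metadata mentioned above can be sketched in Python. Everything here is an assumption made for illustration: the class and function names, the sample headline, and above all the proper-name count, which uses a crude capitalisation heuristic rather than real linguistic analysis. Ratios between parts of speech would need a part-of-speech tagger, and the wording itself would be encoded by the language model, so neither is shown.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class HeadlineFeatures:
    word_count: int
    proper_name_count: int
    publication_hour: int  # metadata: time of publication
    platform: str          # metadata: chosen media platform


def extract_features(headline: str, published_at: datetime,
                     platform: str) -> HeadlineFeatures:
    """Collect simple surface features and metadata for one headline."""
    words = headline.split()
    # Crude heuristic (an assumption): treat capitalised words after the
    # first word as proper names. A real system would use a tagger.
    proper_names = sum(1 for word in words[1:] if word[:1].isupper())
    return HeadlineFeatures(
        word_count=len(words),
        proper_name_count=proper_names,
        publication_hour=published_at.hour,
        platform=platform,
    )


# Made-up headline and metadata, for illustration only.
features = extract_features(
    "Oulu opens new bridge after Kaleva report",
    datetime(2020, 6, 9, 8, 30),
    "web",
)
print(features)
```

The heuristic skips the first word because it is capitalised regardless, so a place name at the start of a headline goes uncounted; that is one reason a proper tagger would be preferable.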

“We used Kaleva Media’s previously published articles, including 7,000 headlines whose effectiveness had already been measured with web analytics software, to teach the Headline Machine about picking impact categories”, explains Senior Scientist Sari Järvinen from VTT.

Fluency in natural languages – including Finnish – is a must

“What makes artificial intelligence so fascinating is the fact that these models can perform tasks and calculations that would be far too complex or time-consuming for humans. Media companies are already well versed in data visualisation, and it will probably only be a year or two before we begin to see the influence of machine learning in content production and marketing. For this to happen, however, it is crucial for these models to be fluent in natural languages”, says Heidi Kananen from Kaleva Media.

The Headline Machine is based on a popular neural network model called BERT (Bidirectional Encoder Representations from Transformers), to which a research team at the University of Turku had taught Finnish. The database used to teach the BERT model was even more extensive than that of Kaleva Media.

“Thanks to BERT, the Headline Machine has learned, for example, to recognise the coronavirus as an interesting topic this spring, despite the fact that the machine was designed on the basis of last spring’s articles. This is because the BERT model is capable of understanding the rudimentary meanings of words”, Järvinen explains.

This neural network model was approximately 60% successful in predicting the impact of Kaleva Media’s test articles, but Järvinen believes that an even higher level of accuracy is achievable with more teaching. There is a limit, however.

“The popularity of an article is also highly dependent on, for example, the positioning of the story on a page”, says Principal Scientist Asta Bäck from VTT. Her project group focused on the effect of positioning and built a separate machine intelligence model for analysing this aspect.

Repositioning after publication leads to optimal layout

It is hardly surprising that news articles placed at the top of the page get the most attention. Knowing this is not helpful, however, when the aim is to keep attracting readers to certain stories without holding other stories off the prime spot for too long.

“Our model predicts how many clicks an article is likely to get in different positions on a page. This analysis cannot be performed until after a story has been published, as the clicks that have already happened are the best indication of the story’s future popularity”, Bäck explains.

The project group taught the model with data captured from an online newspaper’s front page at five-minute intervals. The data included the front-page articles in order of popularity and their number of clicks, as well as characteristics such as each article’s age, headline and category.

“We were able to see very quickly that the popularity of stories is not just about their positioning but also the topic. In our test, readers who were interested in politics were able to find the relevant articles regardless of where they were on the page, while entertainment news only got clicks in prominent places.”