29 Mar Author Vectors: Google Knows Who Wrote Which Articles via @bill_slawski
Does Google care about who created specific content on the web, and do they use that information for purposes such as ranking pages on the web?
We can’t be certain of that, but Google has filed patents about authors and provided ways for content creators to indicate that they have published something somewhere.
My Interest in Authors
I was interested in authorship long before I became involved in SEO and saw it appearing in search-related patents.
One of my favorite writers, and among the most famous writers in the English language, is William Shakespeare, who wrote many plays that are still frequently performed today, such as “Hamlet”, “Macbeth”, and “The Tempest.”
Shakespeare coined many phrases that have become part of the English Language, like “All that glitters is not gold.”
But there is no firm documentation that Shakespeare was truly the author of the plays and poems that he is so well known for.
Rumors have circulated for years that others, such as playwright Christopher Marlowe, were the actual authors of the works attributed to Shakespeare.
Back when I was an English major in college, we studied the writing of many different authors and the styles that they used when they wrote.
Part of our task as students was to know the quirks and idiosyncrasies of how those authors wrote well enough so that we could recognize something they wrote when we saw it, without their names attached to it.
You can start recognizing how each author writes after reading enough of their works.
Authors we studied in English classes and some examples of their writing include:
Thomas Carlyle
A 19th-century Scottish author who wrote about philosophy and history, from a work called “Sartor Resartus”:
“Considering our present advanced state of culture, and how the Torch of Science has now been brandished and borne about, with more or less effect, for five thousand years and upwards; how, in these times especially, not only the Torch still burns, and perhaps more fiercely than ever, but innumerable Rushlights, and Sulphur-matches, kindled thereat, are also glancing in every direction, so that not the smallest cranny or dog-hole in Nature or Art can remain unilluminated,—it might strike the reflective mind with some surprise that hitherto little or nothing of a fundamental character, whether in the way of Philosophy or History, has been written on the subject of Clothes.”
Ernest Hemingway
An American novelist, known for his easy-to-read prose, from “The Old Man and the Sea”:
“He was an old man who fished alone in a skiff in the Gulf Stream and he had gone eighty-four days now without taking a fish. In the first forty days a boy had been with him. But after forty days without a fish the boy’s parents had told him that the old man was now definitely and finally salao, which is the worst form of unlucky, and the boy had gone at their orders in another boat which caught three good fish the first week.”
William Faulkner
An American novelist, known for his long sentences written in a stream of consciousness manner, from “The Sound and the Fury”:
“When the shadow of the sash appeared on the curtains it was between seven and eight o’ clock and then I was in time again, hearing the watch. It was Grandfather’s and when Father gave it to me he said I give you the mausoleum of all hope and desire; it’s rather excruciating-ly apt that you will use it to gain the reducto absurdum of all human experience which can fit your individual needs no better than it fitted his or his father’s. I give it to you not that you may remember time, but that you might forget it now and then for a moment and not spend all your breath trying to conquer it. Because no battle is ever won he said. They are not even fought. The field only reveals to man his own folly and despair, and victory is an illusion of philosophers and fools.”
Google’s Interest in Authors
I wrote about an Agent Rank Patent in 2007, which described reputation scores that would potentially boost rankings for pages based upon the identity of authors or editors or commentators or reviewers on pages.
Later, when the social network Google+ was around, Google introduced authorship markup which allowed authors to link content to their Google+ profiles.
When I first went into SEO, I had no idea that the people at Google would be as interested in authors as I was, but I learned by looking at their patents that they are.
Here’s a brief history of some of the processes and algorithms they used when looking at authors of content:
Agent Rank & Reputation Scores to Boost Rankings Based on Agents Associated with Pages
Back in 2007, I wrote a post for Search Engine Land about the Agent Rank patent.
Under the original version of Agent Rank, all of the people involved in the creation of content on a page (author, publisher, editor, or reviewers) could digitally sign the content on a page.
The reputation scores of those agents could potentially boost the ranking of that content.
That Agent Rank patent was updated a couple of times with continuation patents, but there is no sign that it was ever released or implemented.
The inventors behind the patent are still at Google.
It’s possible that Agent Rank was an influence on the implementation of Authorship Markup at Google.
We don’t know that for certain.
Authorship Markup at Google+
Authorship markup was implemented using Google+ profiles and could influence the rankings of content created by people whom you may have been connected to in Google+.
Google did file a couple of patents related to Authorship Markup.
I wrote about them in Google Authorship Markup Patent Applications Published.
Search Engine Land published a detailed look at authorship markup, how it was used, and how it met its end, in the post It’s Over: The Rise & Fall Of Google Authorship For Search Results.
The Question of What May Have Replaced Authorship Markup Is Raised
A couple of years after Google announced that it was no longer using authorship markup, Google spokespeople made a further announcement.
They said it was OK for site owners to remove any authorship markup they had published because:
“We don’t use authorship markup anymore. We are too smart.”
We weren’t provided more details than that.
This was reported in the post Google: It Is Now Safe To Remove Authorship Markup, We Don’t Use It Anymore.
Exactly what has replaced authorship markup?
Google Quality Rater’s Guidelines Mention Content Creators’ Reputations
Google has been publishing links to their quality rater’s guidelines as they have been updated, giving us a look at what they are telling human raters about the content that they evaluate.
The latest version of the guidelines had a section that focused upon creator reputation, which reminded me of the reputation scores we saw mentioned in the Agent Rank patent.
You can read more about those in Google Quality Rater’s Guidelines: Google’s New Creator Reputation: Guide For Site Owners & Creators
According to that post, and the quality rater’s guidelines, the creator of content on pages still seems to be something that Google is interested in trying to understand.
Author Reputation at Google
Google has mentioned author information in the posts I linked to above and in a few other patents.
I wanted to share some additional articles about the topic to provide some information about its history, since this post is intended to add something new to the topic.
I am adding a couple of articles that provide more details about the history of author information from sites, and one that tells us that it is not something that Google uses in ranking pages.
But, I am calling that into question with this post.
Author reputation at Google is a topic that is frequently discussed in the SEO industry and there are many different views. Here are some more:
- Why Author Reputation Matters More Than Ever for Search
- The Three Pillars of SEO: Authority, Relevance, and Trust
- Google: We Do Not Rank Websites Based on Author Reputation
A New Google Patent on Author Vectors to Understand Who Wrote What
Google was granted a patent this March on the topic of text classification, using a neural network approach.
It reminds me of a patent I recently wrote about in a post I called Google Using Website Representation Vectors to Classify with Expertise and Authority.
The website representation vectors patent described using neural networks to classify websites based upon features found on those sites into different industries and levels of expertise.
This author vectors patent starts by telling us what text classification systems do:
“Text classification systems can classify pieces of electronic text, e.g., electronic documents. For example, text classification systems can classify a piece of text as relating to one or more of a set of predetermined topics. Some text classification systems receive as input features of the piece of text and use the features to generate the classification for the piece of text.”
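As a rough illustration of what "features in, classification out" means, here is a minimal sketch. It is not the method from the patent: the topics, the keyword lists, and the function names are all made up for this example, and a real system would learn its features rather than use a hand-written list.

```python
from collections import Counter

# Hypothetical topics and keyword lists, invented for this illustration.
TOPIC_KEYWORDS = {
    "fishing": {"fish", "boat", "skiff", "stream"},
    "clothing": {"clothes", "tailor", "coat", "cloth"},
}

def extract_features(text):
    """Turn a piece of text into bag-of-words counts (the 'features')."""
    words = [w.strip(".,;:!?\"'").lower() for w in text.split()]
    return Counter(w for w in words if w)

def classify(text):
    """Return the topic whose keywords appear most often, or None if none appear."""
    features = extract_features(text)
    scores = {
        topic: sum(features[w] for w in keywords)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None
```

Calling `classify("An old man fished alone in a skiff")` would pick the made-up "fishing" topic because "skiff" is on its keyword list.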
The patent also describes how neural networks work:
“Neural networks are machine learning models that employ one or more layers of models to generate an output, e.g., a classification, for a received input. Some neural networks include one or more hidden layers in addition to an output layer. The output of each hidden layer is used as input to the next layer in the network, i.e., the next hidden layer or the output layer of the network. Each layer of the network generates an output from a received input in accordance with current values of a respective set of parameters.”
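The layered structure described in that passage can be sketched in a few lines. This is only a toy forward pass with made-up weights, meant to show how the output of one layer becomes the input of the next; the layer sizes, parameter values, and function names are assumptions for illustration and do not come from the patent.

```python
import math

def relu(x):
    """Activation used in the hidden layer of this toy example."""
    return max(0.0, x)

def sigmoid(x):
    """Squashes the final score into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-x))

def layer_forward(inputs, weights, biases, activation):
    """One layer: weighted sum of the inputs plus a bias, then an activation."""
    outputs = []
    for row, bias in zip(weights, biases):
        total = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(total))
    return outputs

def network_forward(inputs, layers):
    """Feed each layer's output into the next layer, as the patent describes."""
    values = inputs
    for weights, biases, activation in layers:
        values = layer_forward(values, weights, biases, activation)
    return values

# Made-up parameters: 2 inputs -> 2 hidden units -> 1 output.
LAYERS = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], relu),
    ([[1.0, 1.0]], [0.0], sigmoid),
]
```

Training a network means adjusting those weights and biases (the "current values of a respective set of parameters") until the outputs are useful; the sketch above only shows how an input flows through fixed parameters.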
How Does the Process in This Patent Work?
It starts with obtaining a set of sequences of words. That set includes a number of first sequences of words.
For each of those first sequences of words, there is a second sequence of words that follows it.
Each first sequence of words and each second sequence of words has been classified as being authored by a particular author.
A neural network system could then be trained on those sequences to determine an author vector, which characterizes that particular author.
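To make the "first sequence" and "second sequence" idea more concrete, here is a minimal sketch of how such pairs might be prepared from text already attributed to an author. The window lengths, the whitespace tokenization, and the function names are assumptions for illustration; this covers only the data preparation step, not the neural network training.

```python
def make_training_pairs(words, first_len=3, second_len=1):
    """Slide over an author's words, pairing each first sequence
    with the second sequence of words that follows it."""
    pairs = []
    last_start = len(words) - first_len - second_len
    for i in range(last_start + 1):
        first = words[i:i + first_len]
        second = words[i + first_len:i + first_len + second_len]
        pairs.append((first, second))
    return pairs

def build_dataset(texts_by_author, first_len=3, second_len=1):
    """Label every (first, second) pair with the author it came from."""
    examples = []
    for author, texts in texts_by_author.items():
        for text in texts:
            words = text.split()  # naive whitespace tokenization
            for first, second in make_training_pairs(words, first_len, second_len):
                examples.append((author, first, second))
    return examples
```

For the words "he was an old man", this produces two pairs: ("he was an", "old") and ("was an old", "man"). A network trained to predict the second sequence from the first, given a per-author vector as extra input, would be pushed to store in that vector whatever makes one author's word choices differ from another's.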
The patent lists these advantages of following its processes:
- An author vector that effectively characterizes an author can be generated from a text written by the author without that text being labeled.
- Once generated, the author vector can characterize different properties of the author depending on the context of the use of the author vector.
- By clustering the author vectors, clusters of authors that have similar communication styles and, in some implementations, personality types can be effectively generated.
- Once generated, the author vectors and, optionally, the clusters can be effectively used for a variety of purposes.
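Grouping authors by how similar their vectors are can be sketched as follows. This is a simple greedy grouping by cosine similarity, swapped in here for illustration; the article does not say which clustering method Google would use, and the vectors, author names, and threshold below are made up.

```python
import math

def cosine_similarity(a, b):
    """Similarity of two vectors by angle: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def group_similar_authors(author_vectors, threshold=0.9):
    """Greedy grouping: place each author in the first cluster whose
    first member is similar enough, otherwise start a new cluster."""
    clusters = []
    for name, vector in author_vectors.items():
        for cluster in clusters:
            representative = author_vectors[cluster[0]]
            if cosine_similarity(vector, representative) >= threshold:
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

# Made-up 3-dimensional vectors; real author vectors would be learned
# by the neural network and would have many more dimensions.
AUTHOR_VECTORS = {
    "author_a": [0.9, 0.1, 0.0],
    "author_b": [0.8, 0.2, 0.1],
    "author_c": [0.0, 0.1, 0.9],
}
```

With these example values, `author_a` and `author_b` point in nearly the same direction and end up in one group, while `author_c` starts its own.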
This patent can be found at:
Generating author vectors
Inventors: Brian Patrick Strope and Quoc V. Le
Assignee: Google LLC
US Patent: 10,599,770
Granted: March 24, 2020
Filed: May 29, 2018
Abstract
“Methods, systems, and apparatus, including computer programs encoded on computer storage media, for generating author vectors.
One of the methods includes obtaining a set of sequences of words, the set of sequences of words comprising a plurality of first sequences of words and, for each first sequence of words, a respective second sequence of words that follows the first sequence of words, wherein each first sequence of words and each second sequence of words has been classified as being authored by a first author; and training a neural network system on the first sequences and the second sequences to determine an author vector for the first author, wherein the author vector characterizes the first author.”
In my examples of text above from Thomas Carlyle, Ernest Hemingway, and William Faulkner, it is fairly easy to tell who wrote each one, and to guess what other content they wrote might be like.
To a degree, that is the point of this patent.
Google can use neural networks to learn about and understand the styles of authors and to be able to tell them apart.
The patent tells us:
“The author vector generated by the author vector system for a given author is a vector of numeric values that characterizes the author.
In particular, depending on the context of the use of the author vector, the author vector can characterize one or more of the communication style of the author, the author’s personality type, the author’s likelihood of selecting certain content items, and other characteristics of the author.”
This patent might look at content written by a particular author that might consist of:
- A sentence.
- A paragraph.
- A collection of multiple paragraphs.
- A search query.
- Another collection of multiple natural language words.
Takeaways Regarding This Author Vectors Process
Google has been looking at collecting data about authors who create content.
It has also come out with a number of approaches that could:
- Generate things such as reputation scores.
- Boost content under an approach such as authorship markup for people who might be connected to other people in a social network such as Google+.
Additionally, Google has been exploring the use of neural networks to develop approaches that might:
- Understand the context of words in queries better.
- Classify websites better.
- Now more easily understand who the authors of content might be.
Not every author is William Shakespeare, but we don’t really know who William Shakespeare actually was.
Different authors can have different writing styles and different levels of expertise and interest in different topics.
Google is telling us with this new patent on author vectors that they may be able to identify the authors of unlabeled content.
Is this new approach one that has replaced the authorship markup?
At least one Google representative was telling us that there was no longer a need for authorship markup and that Google was smart enough to tell who authored what content.
That was in 2016.
This author vector patent was filed with the USPTO in 2018.
We have no idea when it might have been developed.
We also aren’t quite sure how Google might use author vectors, if ever.
But now we know that Google might be better at identifying who the authors of content might be.