How to improve content using natural language processing

Author: Ruth Burr Reedy, vice president of strategy at the marketing agency UpBuild, which specializes in technical SEO and web analytics.

In this article, we discuss how Google uses natural language processing (NLP) technology to understand content, and how you can use this knowledge to optimize text for both search engines and people.

Relationships between entities, words and the way users search

To understand what content is about, Google spends a lot of time, energy and money on technologies such as Neural Matching and natural language processing.

This goes hand in hand with search becoming more conversational. People often search for something without knowing exactly what they need, and Google wants them to find it anyway. That is why the company invests heavily in understanding the relationships between entities and between words, as well as the way people use words when they search.

For example, a user notices the "soap opera effect" on their TV but does not know what it is called. They simply want to know what is happening to their device.

In this case, the user might search for something like [why does my TV picture look strange].

Thanks to Neural Matching, Google understands that one possible answer to this query is the "soap opera effect". As a result, the search engine can return an appropriate result and meet the user's need.

Understanding salience

The main task of natural language processing (NLP) is to understand language and extract the important information from it.

Salience, content and entities

Determining salience means figuring out how strongly the analyzed text fragment is tied to a particular entity. At this stage of its development, Google is very good at extracting entities from content. Entities are mostly nouns, both proper and common: people, places and things.

When determining salience, Google tries to work out how these entities relate to each other, what the page is about, and how well it matches a given topic.

Natural Language Processing (NLP) APIs

Several natural language processing APIs are currently freely available, for example:

Editor's note: when using these APIs, keep in mind that not every feature supports Russian. Both tools focus primarily on English-language content.

Whether these companies use the same APIs in their own products is not known. But anyone interested can use them.

To do this, copy a fragment of your content and see which entities Google can extract from it and how salient the search engine considers each entity relative to the fragment as a whole. In other words, you see how closely the content matches its stated topic.

Google assigns each entity a salience score from 0 to 1: the closer the score is to 1, the more central that entity is to the content.

For example, 0.9 is a very strong result, while 0.01 shows that the content has some connection to the entity, but only a weak one.
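To make the 0-to-1 scale concrete, here is a minimal Python sketch that ranks entities by salience. It assumes a response shaped like the JSON returned by the Cloud Natural Language API's entity analysis, that is, an `entities` list whose items carry `name`, `type` and `salience`. The sample values are invented for illustration and are not real API output, and `rank_by_salience` is a hypothetical helper.

```python
from __future__ import annotations

# Hypothetical, hand-written response in the shape of an entity-analysis result.
# The salience values are illustrative only, not real API output.
sample_response = {
    "entities": [
        {"name": "cookies", "type": "CONSUMER_GOOD", "salience": 0.62},
        {"name": "chocolate chips", "type": "OTHER", "salience": 0.21},
        {"name": "oven", "type": "OTHER", "salience": 0.04},
        {"name": "grandmother", "type": "PERSON", "salience": 0.01},
    ],
    "language": "en",
}


def rank_by_salience(response: dict) -> list[tuple[str, float]]:
    """Return (entity name, salience) pairs, most salient first."""
    entities = response.get("entities", [])
    pairs = [(e["name"], float(e.get("salience", 0.0))) for e in entities]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Print a simple salience table for the sample text.
for name, salience in rank_by_salience(sample_response):
    print(f"{name:<16} {salience:.2f}")
```

Sorting by salience puts the entity the text is most "about" first. Entities near 0.01, such as the passing mention of a grandmother here, are only loosely tied to the content.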

In our opinion, SEO specialists need to understand that salience is the future of related keywords. For example, when optimizing content for the query [chocolate chip cookies], we would also look at variants such as [chocolate chip cookie recipe], [chocolate chips] and so on. Keyword variations and TF-IDF are older methods of working out what a piece of content is about.
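For contrast with the entity-based approach, the sketch below computes classic TF-IDF over three tiny hypothetical documents. Here TF-IDF is term frequency multiplied by the logarithm of inverse document frequency, which is one of several common variants. The sketch only illustrates the older technique and is not how Google scores content. Note that a word appearing in every document, such as "cookies" here, scores zero: TF-IDF measures how distinctive a word is, not which entities it relates to.

```python
from __future__ import annotations

import math
from collections import Counter


def tf_idf(documents: list[list[str]]) -> list[dict[str, float]]:
    """Score each term in each document as tf * log(N / df)."""
    n_docs = len(documents)
    # Document frequency: in how many documents each term appears at least once.
    doc_freq: Counter[str] = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical mini-corpus, already lowercased and tokenized on whitespace.
docs = [
    "chocolate chip cookies recipe with butter and sugar".split(),
    "oatmeal cookies recipe with butter".split(),
    "how to clear browser cookies".split(),
]
scores = tf_idf(docs)
print(sorted(scores[0].items(), key=lambda kv: kv[1], reverse=True))
```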

Instead, we need to understand which entities are involved and how Google sees the relationships between them, because Google expects content that is salient for one entity to also contain certain other entities.

Engaging an expert: the best way to create relevant content

For example, a chocolate chip cookie recipe should contain words such as "butter", "flour" and "sugar".
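You can approximate an expert's checklist in code. The sketch below assumes a hand-written, hypothetical list of terms that a cookie recipe is expected to mention, and it reports which of those terms a draft lacks. It uses plain word matching, so it is only a rough proxy for entity coverage and does not reproduce how Google links entities.

```python
from __future__ import annotations

import re

# Hypothetical checklist of related terms an expert would expect in a cookie recipe.
EXPECTED_TERMS = {"butter", "flour", "sugar", "chocolate chips", "egg"}


def missing_terms(text: str, expected: set[str]) -> set[str]:
    """Return the expected terms (singular or with a trailing 's') absent from text."""
    # Normalize to lowercase words separated by single spaces.
    normalized = " ".join(re.findall(r"[a-z]+", text.lower()))
    return {
        term for term in expected
        if re.search(rf"\b{re.escape(term)}s?\b", normalized) is None
    }


draft = "Cream the butter and sugar, then fold in the chocolate chips."
print(sorted(missing_terms(draft, EXPECTED_TERMS)))
```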

This is easy if you have a cookie recipe at hand and know what it should contain. And this is a new trend we are beginning to see in SEO: the best way to create content that is relevant (salient) to a specific topic is to bring in an expert on the subject.

A person with deep knowledge of the topic will naturally include related terms, because they know what matters and what does not.

It is time for SEO specialists to start investing in content and in experts, so that they can create the deep, relevant and meaningful content that everyone needs.

How to use the API for SEO

One possible application is optimizing pages that rank for a topic, but only on the second page of search results.

In this situation, Google generally understands that the page is relevant to a particular topic, but it is not sure that the page is good, authoritative content. In other words, the signal is there, but it is weak.

In this case, you can take the content, run it through Google's API or another natural language processing tool, and see which entities it extracts and which relationships it identifies between them.

Sometimes you will find that the text needs disambiguation. Back to the chocolate chip cookies: in English, the word "cookies" can mean either baked goods or browser cookies (the small files websites store on your device). In other words, the word has several meanings.

If you see that the natural language processing tool cannot correctly identify your entity, think about how to resolve the ambiguity.
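One way to check which sense a tool picked is to look at the entity's metadata. In the Cloud Natural Language API, entity metadata may include a `wikipedia_url` when the entity is linked to a known concept. The sketch below assumes a response of that shape. The sample data, the `resolved_sense` helper and the intended URL are hypothetical.

```python
from __future__ import annotations

# Hypothetical entity-analysis result in which "cookies" was linked to the
# browser-cookie sense instead of the baked-goods sense.
sample_response = {
    "entities": [
        {"name": "cookies", "type": "OTHER", "salience": 0.48,
         "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/HTTP_cookie"}},
        {"name": "chocolate", "type": "CONSUMER_GOOD", "salience": 0.20,
         "metadata": {"wikipedia_url": "https://en.wikipedia.org/wiki/Chocolate"}},
    ]
}


def resolved_sense(response: dict, name: str) -> str | None:
    """Return the Wikipedia URL the named entity was linked to, if any."""
    for entity in response.get("entities", []):
        if entity["name"].lower() == name.lower():
            return entity.get("metadata", {}).get("wikipedia_url")
    return None


# The sense we actually meant (an assumption for this example).
INTENDED = "https://en.wikipedia.org/wiki/Cookie"
sense = resolved_sense(sample_response, "cookies")
if sense != INTENDED:
    print(f"Ambiguity: 'cookies' was linked to {sense}, expected {INTENDED}")
```

If the linked URL points to the wrong concept, adding surrounding entities such as "oven", "dough" or "baking" is one way to steer the interpretation.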

In many cases, the API shows that the document relates to the specific topic, but the entity's salience is low. In this situation, you simply need to work on the content so that Google can more easily extract the entities and relate them to one another.
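The diagnosis described above can be sketched as a simple check: find the target entity in the analysis results, see where it ranks, and compare its salience to a threshold. The 0.3 threshold, the `diagnose` helper and the sample response are hypothetical, since the article does not give a cut-off value.

```python
from __future__ import annotations

# Hypothetical threshold below which we treat the target entity as weakly covered.
LOW_SALIENCE = 0.3

# Hypothetical analysis of a page meant to be about chocolate chip cookies.
sample_analysis = {
    "entities": [
        {"name": "baking tips", "salience": 0.45},
        {"name": "oven", "salience": 0.30},
        {"name": "chocolate chip cookies", "salience": 0.12},
    ]
}


def diagnose(response: dict, target: str) -> str:
    """Report whether the target entity was extracted and how salient it is."""
    ranked = sorted(response.get("entities", []),
                    key=lambda e: e.get("salience", 0.0), reverse=True)
    for rank, entity in enumerate(ranked, start=1):
        if entity["name"].lower() == target.lower():
            s = entity.get("salience", 0.0)
            if s < LOW_SALIENCE:
                return f"'{target}' found (rank {rank}) but salience is low: {s:.2f}"
            return f"'{target}' is well covered (rank {rank}, salience {s:.2f})"
    return f"'{target}' was not extracted at all"


print(diagnose(sample_analysis, "chocolate chip cookies"))
```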

This brings us to the second important point: we can now create content for people and machines at the same time. The days when keywords had to be added for Google and hidden from users are long gone.

Now you can create content for Google that is also easier for users to read, because the principles of readability for machines and for people are converging.

Tips for making content more readable for humans and machines

While preparing this article, we asked a number of content creators to share tips on writing better, clearer texts that are easy to read and understand.

We then selected the tips that also work from the perspective of NLP systems.

As mentioned above, natural language processing is the process by which Google tries to understand how entities relate to each other within a given piece of text.

  • Short and simple sentences

Write simply. Avoid flowery language.

  • One idea per sentence

If your sentences are packed with subordinate clauses and pronouns, users will find the text hard to follow.

It also makes it harder for machines to parse your content.
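A quick way to apply the first two tips is to flag overly long sentences before publishing. This sketch splits text on sentence-ending punctuation and reports sentences above a word limit. The 20-word limit is an arbitrary, hypothetical choice, and the naive splitter will mishandle abbreviations such as "e.g.".

```python
from __future__ import annotations

import re

MAX_WORDS = 20  # hypothetical limit; tune it to your own style guide


def long_sentences(text: str, max_words: int = MAX_WORDS) -> list[str]:
    """Return sentences whose whitespace-separated word count exceeds max_words."""
    # Split after ., ! or ? followed by whitespace (a naive sentence splitter).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [s for s in sentences if len(s.split()) > max_words]


sample_text = (
    "Preheat the oven. "
    "When the butter has softened enough that you can press a finger into it "
    "without much effort, which usually takes about an hour on the counter, "
    "cream it with the sugar."
)
for sentence in long_sentences(sample_text):
    print("Consider splitting:", sentence)
```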

  • Connect questions with answers

If you pose a question, give the answer right away rather than preceding it with 500 words of text.

Taken together, these three readability tips come down to reducing the distance between semantic entities.

If you want a natural language processing system to understand that two entities in your content are closely related, place them close to each other in the sentence.

Remove anything superfluous and reduce the number of semantic leaps that search engine crawlers have to make between entities to understand how they interact. As a result, you get more readable content that is also easier for robots to parse and understand.
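The idea of "distance between entities" can be approximated, very roughly, as the number of words separating two mentions. The sketch below computes the smallest word gap between two phrases. It is a surface-level heuristic built on a hypothetical `word_distance` helper, not the way an NLP system actually parses relationships.

```python
from __future__ import annotations

import re


def word_distance(text: str, first: str, second: str) -> int | None:
    """Smallest number of word positions between mentions of two phrases."""
    words = re.findall(r"[a-z0-9]+", text.lower())

    def positions(phrase: str) -> list[int]:
        # Start indices where the phrase's words appear consecutively.
        target = phrase.lower().split()
        n = len(target)
        return [i for i in range(len(words) - n + 1) if words[i:i + n] == target]

    a, b = positions(first), positions(second)
    if not a or not b:
        return None  # at least one phrase is not mentioned
    return min(abs(i - j) for i in a for j in b)


far = ("Chocolate chip cookies are a classic, and over the years many bakers have "
       "argued about the best approach, but most agree the ideal oven temperature "
       "is 180 degrees.")
near = "Bake chocolate chip cookies at 180 degrees."
print(word_distance(far, "cookies", "180"), word_distance(near, "cookies", "180"))
```

The tighter sentence brings "cookies" and the temperature two words apart instead of more than twenty, which is the kind of reduction the tips above aim for.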

  • Specifics first, then nuances

Take, for example, the question "At what temperature should you bake cookies?" In reality, the answer varies depending on the result you want. However, an answer like "it depends" helps no one.

Imagine a user asks this question through Google voice search and hears that answer. It does not help them, even though it is true that the temperature can vary.

So, for better readability, give specific numbers first (for example, a temperature range of "180-200 °C") and then explain the nuances.

This answer is much better: it contains concrete figures and shortens the distance between question and answer.
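The "specifics first" rule can be turned into a rough automated check: does the first sentence of an answer contain a concrete figure? The sketch below treats any digit as a "specific", which is a deliberately crude, hypothetical heuristic, and applies it to the cookie-temperature example.

```python
import re


def leads_with_specifics(answer: str) -> bool:
    """True if the answer's first sentence contains at least one digit."""
    first_sentence = re.split(r"(?<=[.!?])\s+", answer.strip(), maxsplit=1)[0]
    return re.search(r"\d", first_sentence) is not None


vague = "It depends on the result you want. For crisp cookies, use 200 °C."
specific = "Bake at 180-200 °C. Lower temperatures give softer cookies."
print(leads_with_specifics(vague), leads_with_specifics(specific))
```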

  • Don't ramble

Get to the point quickly. Identify the main entity and the main theme of your content, and only then go into detail. Well-structured content is easier for everyone to understand.

  • Avoid jargon

Jargon is hard to understand. Also avoid repetition and rare words. The less often a word is used, the less likely Google is to understand its semantic relationships with other entities.

Be brief and specific. Cut all jargon. All of this, again, helps reduce the distance between semantic entities and makes them easier to parse.

  • Organize information to match the user's journey

Think about what information the user may need at each step along the way.

  • Highlight subtopics

Use subheadings for this. It is a basic tip, but many people still ignore it. If you won't do it for your users, do it for the machines.

  • Use formatted lists

Bulleted and numbered lists also make text easier to take in. Setting content apart as a list also makes it easier for robots to parse.

If you think many of these tips echo the recommendations for featured snippets, you are right. Winning featured snippets is a good indicator that you are creating content that a robot can find, parse, understand and extract.

So if you are working on getting your site into featured snippets, you are already doing many of the things described above.

  • Grammar and spelling matter too

These things matter to users. Not to every user, but they matter. They also matter to search engines.

Grammar, spelling and punctuation are very simple signals for machines. Google addresses this in its Search Quality Rater Guidelines. In particular, it states that well-written, well-structured, grammatically correct text without spelling errors may indicate authoritative content. This does not mean such content will immediately rank higher, but shortcomings in this area can harm a site.

Use NLP tools to improve your content

These tools help you understand how readable, understandable and relevant your content is. With them, you can create better materials for users.
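Readability can also be scored with classic formulas. The sketch below implements the Flesch Reading Ease formula, 206.835 - 1.015 × (words / sentences) - 84.6 × (syllables / words), where higher scores mean easier text. The syllable counter is a simple vowel-group heuristic that will miscount some words, so treat the result as an approximation rather than the output of any particular NLP tool.

```python
import re


def count_syllables(word: str) -> int:
    """Approximate syllables as groups of consecutive vowels (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Drop a likely silent final 'e' (as in "bake"), but keep "-le" endings.
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease score; higher means easier to read."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


simple = "Bake the cookies. Let them cool. Eat them warm."
complex_text = ("Notwithstanding considerable methodological heterogeneity, contemporary "
                "computational linguistics increasingly operationalizes semantic salience "
                "through probabilistic representations.")
print(round(flesch_reading_ease(simple), 1), round(flesch_reading_ease(complex_text), 1))
```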


