Almost 50 gigabytes of data stolen from Yandex services were recently shared online. The company is trying to downplay the leak, but the source code, distributed via torrent, can reveal a lot about how its services, and the web search engine in particular, actually work.

One of the most interesting, and potentially damaging, facets of the leak is the source code of the Yandex search engine, namely the ranking factors its algorithm uses to order results for user search queries. The leak lists 1,922 unique ranking factors, the majority of which are marked as “deprecated” and have likely been replaced in more recent versions of the Yandex code. The first ranking factor listed is “PAGE_RANK”, a clear reference to PageRank, the link-analysis algorithm at the core of Google’s original ranking system. As for Yandex’s own web search, the leaked algorithm seems to favor pages that aren’t too old, attract a lot of organic traffic (i.e., unique visitors), have optimized code, and are hosted on reliable servers; Wikipedia pages also appear to get a boost.

The Yandex leak certainly offers SEO professionals a lot of information about how a world-class search engine actually works, but its security implications appear limited. Shestakov said that no personal data is involved and that the few API keys in the code were likely used only for testing.

Yandex’s official press release about the incident said the leaked code fragments are “outdated and differ from the version currently used” by its services, while some of the published fragments “were never actually used in operations.” The company is still investigating the incident, which appears to be politically motivated, and says it will take all possible measures to improve its management oversight and prevent future leaks.