Shogi software has radically advanced its capabilities through machine learning

I am engaged in the development of shogi software. As you know, a Go-playing computer program has repeatedly defeated the world's top professional human players since 2015, making big news. From a researcher's point of view, however, we were not particularly surprised.

Nevertheless, the pace of progress was phenomenal: something that would have taken 20 years by conventional methods was achieved in only one year by the new approach.

Now let me briefly describe the history of how the AI in these game programs was developed. In the earliest days, shogi software was built with fully hand-crafted evaluation functions – functions that quantify how good or bad a position is.

For example, we assigned 100 points to a 歩 (Pawn) and 800 points to a 銀 (Silver), and we also wrote down knowledge and standard moves as explicit rules – for example, that it is very valuable for a 歩 to sit just below a 金 (Gold). It was very time-consuming work, but such an approach was the best we could do while computer capabilities were still limited.
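To make this concrete, here is a minimal Python sketch of such a hand-crafted evaluation function. The Pawn and Silver values are the ones mentioned above; everything else – the remaining piece values, the board representation, and the size of the "pawn below gold" bonus – is an illustrative assumption, not a value from any real engine.

```python
# A minimal sketch of a hand-crafted evaluation function of the kind
# described above. The Pawn and Silver values follow the article's
# examples; the other piece values, the board representation, and the
# 50-point "pawn below gold" bonus are illustrative assumptions.

PIECE_VALUES = {"pawn": 100, "lance": 300, "knight": 400,
                "silver": 800, "gold": 900, "bishop": 1300, "rook": 1500}

def evaluate(pieces):
    """pieces: list of (kind, owner, row, col), owner in {'me', 'opponent'}.
    Returns a score from 'me''s point of view: positive means better."""
    occupied = {(p[2], p[3]): p for p in pieces}
    score = 0
    for kind, owner, row, col in pieces:
        sign = 1 if owner == "me" else -1
        score += sign * PIECE_VALUES.get(kind, 0)
        # Hand-written positional rule: a pawn sitting just below a
        # friendly gold earns a bonus ('below' depends on the owner's
        # direction of play; smaller row numbers are farther from 'me').
        if kind == "pawn":
            above = (row - 1, col) if owner == "me" else (row + 1, col)
            neighbor = occupied.get(above)
            if neighbor and neighbor[0] == "gold" and neighbor[1] == owner:
                score += sign * 50
    return score

# My pawn directly below my gold, against a lone enemy silver:
print(evaluate([("pawn", "me", 6, 5), ("gold", "me", 5, 5),
                ("silver", "opponent", 2, 3)]))  # 100 + 50 + 900 - 800 = 250
```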

However, there is a limit to how long human beings can keep entering rules correctly and without contradiction. Hence, as the next step, we eliminated the manual work with a statistical approach: we collected records of games played by human beings and studied their statistics; in other words, we quantified the tendencies of good moves.

Taking this concept further, we introduced machine learning to extract deeper knowledge from human game records. Using the records of professional players, and by comparing the moves that were actually played with the moves that were not, the machine could learn to prefer the moves professionals actually chose. It is generally difficult to explain why the moves adopted by high-ranking professionals are good. But rather than judging each move as good or bad ourselves, we simply let the machine mimic them in large quantities. As a result of this machine learning, the skill level of computer shogi was drastically enhanced. What we call AI today is, in its programming, broadly the same as these computer players.
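As an illustration of this comparison of played and unplayed moves, the following Python sketch nudges a simple linear scoring model so that the professional's move outscores each alternative. It is a simplified stand-in for the real thing, with placeholder feature vectors rather than the millions of features a real engine would use.

```python
# A minimal sketch of learning from professional records as described
# above: for each recorded position, nudge a linear scoring model so the
# move the professional actually played outscores each move that was not
# played. The feature dictionaries are assumed placeholders.

import math
import random

def move_score(weights, features):
    """Linear score of a move, given its feature vector as a dict."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def train(records, learning_rate=0.01, epochs=5):
    """records: list of (played_move_features, [unplayed_move_features, ...])."""
    weights = {}
    for _ in range(epochs):
        random.shuffle(records)
        for played, unplayed in records:
            for other in unplayed:
                # Logistic probability that the model prefers the played move.
                diff = move_score(weights, played) - move_score(weights, other)
                p = 1.0 / (1.0 + math.exp(-diff))
                # Gradient step: raise the played move's score,
                # lower the unplayed move's score.
                for f, v in played.items():
                    weights[f] = weights.get(f, 0.0) + learning_rate * (1 - p) * v
                for f, v in other.items():
                    weights[f] = weights.get(f, 0.0) - learning_rate * (1 - p) * v
    return weights
```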

Computer shogi players that have grown powerful enough in this way can produce high-level games by playing against each other, without human help. They can now learn by accumulating knowledge of the form "I won the game by doing this" across a wide variety of positions. You could say that, in a sense, AI follows the same process as human beings who improve their shogi skills through study and training.
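One way to picture this accumulation of "I won by doing this" is the following sketch, which assumes a hypothetical play_one_game function; it is a simplification for illustration, not any engine's actual training loop.

```python
# A minimal sketch of that accumulation step. After each self-play game,
# every (position, move) pair chosen by the winner has its score raised
# and the loser's choices are penalized. play_one_game is a hypothetical
# placeholder for a full engine playing against a copy of itself.

def reinforce_from_game(policy_scores, moves, winner):
    """moves: [(position_key, move, player), ...]; winner: 'sente' or 'gote'.
    policy_scores: {(position_key, move): score}, accumulated across games."""
    for position_key, move, player in moves:
        delta = 1 if player == winner else -1
        key = (position_key, move)
        policy_scores[key] = policy_scores.get(key, 0) + delta

# Hypothetical training loop:
# scores = {}
# for _ in range(100_000):
#     moves, winner = play_one_game()   # assumed self-play function
#     reinforce_from_game(scores, moves, winner)
```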

What differs a little from human beings is that computers have an overwhelming ability to examine possible positions exhaustively. In many games, not only shogi, it is vital to find a future situation in which you hold the advantage; AI thoroughly examines positions 10 or 20 moves ahead, at a rate of more than one million positions per second, to find the best move. By combining this look-ahead ability with a precise, balanced judgment of how good or bad each position is, AI can produce wisdom exceeding human ability.
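The skeleton of this look-ahead can be sketched as a plain fixed-depth search. The helper functions here are assumed placeholders, and real engines rely on alpha-beta pruning and many other refinements to reach such speeds.

```python
# A minimal sketch of the look-ahead: a plain negamax search that
# examines every move sequence to a fixed depth and scores the leaves
# with an evaluation function. legal_moves, apply_move, and evaluate
# are assumed placeholders, not a real shogi implementation.

def negamax(position, depth, legal_moves, apply_move, evaluate):
    """Returns (best_score, best_move) from the side to move's viewpoint;
    evaluate(position) must also score from the side to move's viewpoint."""
    if depth == 0:
        return evaluate(position), None
    best_score, best_move = float("-inf"), None
    for move in legal_moves(position):
        child = apply_move(position, move)
        # The opponent's best outcome is our worst, hence the negation.
        score = -negamax(child, depth - 1, legal_moves, apply_move, evaluate)[0]
        if score > best_score:
            best_score, best_move = score, move
    return best_score, best_move
```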

It is vital to design the overall problem-solving system

These days we hear the assertion that a singularity – a point at which computers surpass human ability and reach a level human beings can no longer foresee – will arrive in the future and pose a threat. However, I do not agree with this threat theory.

It is certain that computers have already surpassed us completely at examining vast volumes of data and possibilities instantly and selecting the best outcome, and such skills will only continue to improve.

However, as was the case in shogi AI development, AI will not learn anything merely by reading a large volume of human records; someone must tell it what is desirable and what it should aim for. Only by giving it the guidance that certain human moves are good and should be mimicked did the AI learn strong moves in response.

I believe this mechanism is a structure common to all AI technologies, not only those associated with games.

For example, consider route-navigation applications. If you merely let a computer read every timetable of every route, it is nothing more than a dictionary. Only when you pose the problem – say, that the route allowing you to travel in the shortest time is the best solution – will the computer give you such an answer.
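As a sketch of this "shortest time" formulation, here is Dijkstra's algorithm run over a small made-up network whose edge weights are travel times in minutes.

```python
# A minimal sketch of the "shortest travel time" problem posed above:
# Dijkstra's algorithm over a graph of stations, where edge weights
# are travel times in minutes. The network data is a made-up example.

import heapq

def shortest_time(graph, start, goal):
    """graph: {station: [(neighbor, minutes), ...]}. Returns (minutes, route)."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        minutes, station, route = heapq.heappop(queue)
        if station == goal:
            return minutes, route
        if station in visited:
            continue
        visited.add(station)
        for neighbor, cost in graph.get(station, []):
            if neighbor not in visited:
                heapq.heappush(queue, (minutes + cost, neighbor, route + [neighbor]))
    return float("inf"), []

network = {"A": [("B", 4), ("C", 10)], "B": [("D", 7)],
           "C": [("D", 2)], "D": []}
print(shortest_time(network, "A", "D"))  # (11, ['A', 'B', 'D'])
```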

But do people only want to travel in the shortest time? What would it take to make people feel comfortable moving from one place to another?

Now let us load the computer with history data of how people have actually moved. We might then see, for example, that many people use a station that is off the shortest route. The computer has no idea why. Perhaps the route has no stairs to climb when transferring, or perhaps the station has a standing soba shop serving delicious soba. But if we adopt the definition "a route selected by many people is probably a comfortable route", then AI can learn good routes from the history data. In other words, by formulating the desirable route in a form that learning can use, we create a situation where AI can truly exhibit its full power.
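Here is one way such a definition could be put into code, using a made-up trip history; the popularity count stands in for whatever measure a real system would use.

```python
# A minimal sketch of the definition above: score each leg of a route
# by how often it appears in observed trips, so routes that many people
# chose score higher. The trip history is a made-up example.

from collections import Counter

def edge_popularity(trips):
    """trips: list of routes, each a list of station names."""
    counts = Counter()
    for route in trips:
        for a, b in zip(route, route[1:]):
            counts[(a, b)] += 1
    return counts

def comfort_score(route, counts):
    """Higher when more people have traveled the same legs."""
    return sum(counts[(a, b)] for a, b in zip(route, route[1:]))

history = [["A", "C", "D"], ["A", "C", "D"], ["A", "B", "D"]]
counts = edge_popularity(history)
print(comfort_score(["A", "C", "D"], counts))  # 4: the popular detour
print(comfort_score(["A", "B", "D"], counts))  # 2: the shorter route
```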

To test this idea, let us recommend the route found by this route-search AI to someone about to travel. If the person actually takes the recommended route, that suggests the route was a comfortable one. If, on the other hand, the person takes a different route, the AI's selection counts as a poor result. In this way, the AI can improve its accuracy further by receiving a reward for its solutions to the comfortable-route problem.
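A sketch of this feedback loop might look like the following; the simple exponential-average update is an assumption for illustration, not a specific production algorithm.

```python
# A minimal sketch of the feedback loop above: treat "the user actually
# took the recommended route" as a reward and update the route's
# estimated comfort. The update rule is an illustrative assumption.

def update_score(scores, route, followed, learning_rate=0.1):
    """scores: {route_key: estimated comfort in [0, 1]}.
    followed: True if the user took the recommended route."""
    key = tuple(route)
    reward = 1.0 if followed else 0.0
    old = scores.get(key, 0.5)          # start from a neutral prior
    scores[key] = old + learning_rate * (reward - old)
    return scores

scores = {}
update_score(scores, ["A", "C", "D"], followed=True)
update_score(scores, ["A", "C", "D"], followed=True)
update_score(scores, ["A", "B", "D"], followed=False)
print(scores)  # the followed route's score rises, the ignored one falls
```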

Furthermore, depending on the season, people may wish to walk for a while and get off one station before their destination on a nice day, or take a bus to the station on a rainy day. To select the best answer to the comfortable-route problem, then, it becomes necessary to feed the AI data such as season and weather and let it learn from them.
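Extending the previous sketch, one could keep a separate estimate for each combination of route and weather; again, this is only an illustration.

```python
# A minimal sketch of adding such context: keep a separate comfort
# estimate per (route, weather) pair, so the walking route can win on
# sunny days and the bus route on rainy ones. The update rule reuses
# the assumed exponential average from the previous sketch.

def contextual_update(scores, route, weather, followed, learning_rate=0.1):
    key = (tuple(route), weather)
    reward = 1.0 if followed else 0.0
    old = scores.get(key, 0.5)
    scores[key] = old + learning_rate * (reward - old)

def best_route(scores, candidates, weather):
    """Pick the candidate with the highest estimate for this weather."""
    return max(candidates, key=lambda r: scores.get((tuple(r), weather), 0.5))
```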

After all, it is human beings who teach AI what is desirable, how to view the problem, and how to solve it. Designing the entire system of such a problem-solving process is critical. Whether we let AI run out of control or keep it governed under control is a problem for human beings.

It is the role of human beings to decide what AI should learn and what it should optimize

Lately, AI is being used more and more in our daily lives. For instance, when you shop or read news on the Internet, AI learns which products or articles you tend to select and displays similar ones with higher priority on your screen. Many people find this convenient, I believe.
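As a toy illustration of this kind of personalization, the following sketch counts the categories a user has clicked and ranks new items accordingly; the items and categories are made up.

```python
# A minimal sketch of the preference learning described above: count
# the categories of items a user clicks, then rank new items so that
# familiar categories appear first. The data are made-up examples.

from collections import Counter

def rank_items(click_history, items):
    """click_history: list of clicked categories; items: [(title, category)].
    Returns items sorted so often-clicked categories come first."""
    preference = Counter(click_history)
    return sorted(items, key=lambda item: preference[item[1]], reverse=True)

clicks = ["sports", "sports", "economy"]
news = [("Election results", "politics"),
        ("Pennant race heats up", "sports"),
        ("Stock market update", "economy")]
print(rank_items(clicks, news))  # sports first, then economy, then politics
```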

On the other hand, some people caution that this builds a wall around you, like an octopus trap: it is so comfortable to be surrounded by information you like that you never try to come out.

If you feel that such an octopus trap is keeping you from seeing the wider world, then it is necessary to raise that as a problem to be solved. Doing so may give system developers hints on what new assignment to present to the AI.

Or, by intentionally searching for different kinds of news once in a while, you might tell the AI yourself that the world you wish to know is a different one. In this way, you can take part in the AI's learning from the user side, rather than leaving everything to system developers.

For this purpose, it is necessary to understand the mechanisms of AI as much as possible: how it works and what kind of learning it does.

In the EU, ethical guidelines are being drawn up requiring that AI systems give understandable explanations of their decisions.

For example, when an AI rejects someone who applied for a job, the company using the AI for recruiting is expected to give the applicant a persuasive explanation. Why was that result produced? Was the reason the applicant's career history, or a level of motivation estimated by some form of measurement? Was there any unfair discrimination in the reasoning? And so on.

This is of course a human-rights issue, but it is also about preventing the AI's computation from becoming a black box. In other words, we must avoid hiding the process: what data were read, what problem was posed before learning, and how the solution was reached.

This is a very important point; if the process becomes a black box, we may indeed end up in the world the singularity threat theory describes.

What matters for all of us is not simply to leap at the convenience and simplicity created by AI capabilities that surpass our own, but to step back and review the larger framework that contains that convenience and simplicity.

What do we really want to do, what problem does that pose, what should we let the AI learn, and what should it optimize? These are the things we human beings should think about.

Did you really want to enter the octopus trap created by AI? Did you want AI to decide which job you should take? The ability to think these things through properly is a form of literacy, and being equipped with it may be a vital strength for living in the future.

Shogi AI has developed this far by setting strength as its goal, but the world is now full of strong shogi programs. So we are now studying AI that can enrich the post-game review, and AI that lets a human lose comfortably. We are shifting from the pursuit of strength to the problem of how users can play comfortably. Perhaps nothing demonstrates better than these games that AI is a tool for human beings.

* The information contained herein is current as of May 2019.
* The contents of articles on Meiji.net are based on the personal ideas and opinions of the author and do not indicate the official opinion of Meiji University.

Information noted in the articles and videos, such as positions and affiliations, is current at the time of production.