Hitachi has developed new AI technology designed to come up with solutions to contentious scientific or economic issues based on big data.
The technology analyzes text data on issues that are subject to debate and presents, in English, reasons and grounds for either affirmative or negative opinions on those issues.
The technology focuses on values such as health, economics and public safety, which are considered important to people and communities when expressing opinions. It uses correlations between those values and relevant issues in society to identify reasons and grounds with a high degree of reliability from among large volumes of news articles.
By using multiple viewpoints, it is able to present reasons and grounds without bias toward a single perspective.
The technology could be applied to future systems that analyze the contents of company documents, published reports or electronic medical records in order to form opinions and generate data to support decision making.
In 2014, Hitachi developed a technology that extracts specified information from electronic medical records (e.g., illnesses and affected areas) with a high degree of accuracy*.
Building on this technology, Hitachi has now developed a new technology that analyzes large volumes of news articles about a given topic and presents, in English, highly reliable reasons and grounds for opinions.
Details of the technology developed are as given below.
(1) Created “Value Dictionary” as a standard for identifying reasons and grounds for opinions
When giving reasons or grounds for opinions on a question that is subject to debate, it is assumed that people use their own respective viewpoints. Hitachi focused on values such as health, economics and public safety, which are considered important to people and communities, and created a “Value Dictionary” that systematically organizes those values based on a database*2 that records affirmative and negative opinions regarding a large number of discussion topics. Specifically, a list of values that serve as a basis for decision making by people or communities was prepared, and the system extracts words demonstrating a strong relationship to those values based on their frequency of use in the database, designating those words as either “positive” or “negative” in relation to those values. Furthermore, the values and relevant words were systematically arranged by assigning a score according to “importance” based on the frequency of use. For example, in the case of the value “Health,” the relations with words such as “exercise,” which is positive, and “disease” and “obesity,” which are negative, were systematically arranged.
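The dictionary-building step described above can be illustrated with a minimal sketch. It is not Hitachi's implementation: the toy snippets, the stopword list, the `min_count` threshold and the normalization of "importance" to the most frequent word are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy stand-in for the debate database: (value, polarity, text).
DEBATE_SNIPPETS = [
    ("Health", "positive", "regular exercise improves fitness"),
    ("Health", "positive", "exercise lowers stress"),
    ("Health", "negative", "obesity raises the risk of disease"),
    ("Health", "negative", "disease spreads in crowded housing"),
    ("Health", "negative", "obesity is linked to heart disease"),
]
STOPWORDS = {"the", "of", "in", "to", "a", "an"}


def build_value_dictionary(snippets, min_count=2):
    """Map each value to words with a polarity and a frequency-based importance."""
    counts = defaultdict(lambda: {"positive": Counter(), "negative": Counter()})
    for value, polarity, text in snippets:
        for word in text.lower().split():
            if word not in STOPWORDS:
                counts[value][polarity][word] += 1

    dictionary = {}
    for value, by_polarity in counts.items():
        totals = by_polarity["positive"] + by_polarity["negative"]
        top = max(totals.values())
        entries = {}
        for word, total in totals.items():
            if total < min_count:
                continue  # too rare to count as strongly related (assumed cutoff)
            positive = by_polarity["positive"][word]
            negative = by_polarity["negative"][word]
            entries[word] = {
                "polarity": "positive" if positive >= negative else "negative",
                "importance": total / top,  # assumed normalization
            }
        dictionary[value] = entries
    return dictionary


value_dictionary = build_value_dictionary(DEBATE_SNIPPETS)
```

With these toy snippets, only "exercise," "disease" and "obesity" pass the frequency cutoff for the value "Health," which mirrors the example given in the text.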
(2) Created metadata by identifying correlations between issues and their values from huge volumes of text data
From among the various sentences used in large volumes of news articles, the system identifies the types of values associated with the issues mentioned, and creates a database expressing whether those issues have positive or negative effects on those values. For example, from an article stating that “Noise is harmful to health,” it is determined that the issue of “noise” has the negative effect of suppressing the value “Health,” and this information is stored in the database. Using this method, the system created approximately 250 million metadata entries (issue – value correlation data) from around 9.7 million news articles.
The system uses this huge volume of metadata as well as the Value Dictionary outlined in (1) above to select multiple values with strong correlations with a given topic from among the many news articles. By searching for sentences in all of the news articles that contain one of these values, the system extracts sentences that could potentially serve as reasons or grounds for agreement or disagreement with the topic in question.
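As a rough illustration of the metadata step, the sketch below turns sentences into (issue, value, effect) records, picks the values most often linked to a topic, and then collects sentences that mention those values. The cue phrases, the keyword-to-value table and the sample sentences are hypothetical. The actual system presumably relies on far richer language analysis than pattern matching.

```python
import re
from collections import Counter

# Hypothetical cue phrases and value keywords (assumptions for illustration).
NEGATIVE_CUES = ("is harmful to", "damages", "threatens")
POSITIVE_CUES = ("is good for", "improves", "promotes")
VALUES = {"health": "Health", "economy": "Economics", "safety": "Public safety"}

SENTENCES = [
    "Noise is harmful to health.",
    "Noise damages the economy.",
    "Noise threatens health.",
    "Exercise improves health.",
    "The weather was mild.",
]


def extract_metadata(sentence):
    """Return (issue, value, effect) for an 'issue <cue> value' sentence, else None."""
    text = sentence.lower().rstrip(".")
    for effect, cues in (("negative", NEGATIVE_CUES), ("positive", POSITIVE_CUES)):
        for cue in cues:
            match = re.match(rf"(?:the )?(.+?) {re.escape(cue)} (?:the )?(\w+)", text)
            if match and match.group(2) in VALUES:
                return (match.group(1), VALUES[match.group(2)], effect)
    return None


def top_values(metadata, topic, k=2):
    """Select the k values most frequently correlated with the topic."""
    counts = Counter(value for issue, value, _ in metadata if issue == topic)
    return [value for value, _ in counts.most_common(k)]


def candidate_sentences(sentences, values):
    """Keep sentences that mention a keyword of one of the selected values."""
    keywords = {word for word, value in VALUES.items() if value in values}
    return [s for s in sentences if any(k in s.lower() for k in keywords)]


metadata = []
for sentence in SENTENCES:
    record = extract_metadata(sentence)
    if record is not None:
        metadata.append(record)

noise_values = top_values(metadata, "noise")
health_candidates = candidate_sentences(SENTENCES, ["Health"])
```

Here `metadata[0]` corresponds to the example in the text: the issue "noise" has a negative effect on the value "Health."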
(3) Calculated reliability of the extracted sentences
The sentences extracted using the Value Dictionary (1) and the Metadata (2) are scored based on the source of the quote, the numerical evidence and the rhetorical expressions in order to estimate whether the sentences have a strong correlation with the specified topic and value. By processing all of the sentences that could potentially serve as reasons or grounds for opinions, and evaluating scores, it is possible to select and present reliable grounds.
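The scoring idea can be sketched as a simple feature-based score. The three features follow the text (source of the quote, numerical evidence, rhetorical expressions), but the cue lists, the weights and the sample sentences are invented for illustration and are not taken from Hitachi's system.

```python
import re

# Hypothetical cue lists and weights (assumptions, not published values).
ATTRIBUTION_CUES = ("according to", "reported that", "announced")
RHETORICAL_CUES = ("therefore", "because", "as a result")
WEIGHTS = {"source": 1.0, "numbers": 1.0, "rhetoric": 0.5}


def reliability_score(sentence):
    """Sum the weights of the evidence features present in the sentence."""
    text = sentence.lower()
    features = {
        "source": any(cue in text for cue in ATTRIBUTION_CUES),
        "numbers": bool(re.search(r"\d", text)),
        "rhetoric": any(cue in text for cue in RHETORICAL_CUES),
    }
    return sum(WEIGHTS[name] for name, present in features.items() if present)


def select_grounds(sentences, top_n=2):
    """Return the top_n highest-scoring sentences as candidate grounds."""
    return sorted(sentences, key=reliability_score, reverse=True)[:top_n]


# Invented sample sentences, used only to exercise the scoring function.
SAMPLES = [
    "Noise is unpleasant.",
    "According to the city report, noise complaints rose 12 percent.",
    "Residents sleep less because traffic noise continues at night.",
]
best = select_grounds(SAMPLES, top_n=1)
```

A weighted sum is only one possible design. It keeps each feature's contribution easy to inspect, which helps when tuning which kinds of evidence should count most.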
(4) Constructed architecture to realize asynchronous distributed processing of multiple algorithms
In order to increase processing speed and present responses within a designated time period, Hitachi constructed an architecture to realize asynchronous distributed processing of multiple algorithms in the various processes, from the analysis of the main topic to the selection of values, the article search and the presentation of reasons and grounds for opinions. This architecture executes parallel distributed processing of the algorithms while asynchronously passing results on to the next process, so that the desired grounds can be extracted within the specified period of time.
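The following sketch shows the general pattern of asynchronous stages working against a deadline, using Python's `asyncio` on a single machine. It is not the distributed architecture described above: the stage functions, the simulated delays and the placeholder score are hypothetical, and real distributed processing would span multiple machines.

```python
import asyncio

# Hypothetical per-value search latencies in seconds (illustrative only).
SEARCH_DELAY = {"Health": 0.01, "Economics": 0.01, "Public safety": 1.0}


async def search_articles(value, queue):
    """Simulated article search for one value; pushes a sentence downstream."""
    await asyncio.sleep(SEARCH_DELAY[value])
    await queue.put(f"sentence related to {value}")


async def score_sentences(queue, results):
    """Scores sentences as they arrive, without waiting for all searches."""
    while True:
        sentence = await queue.get()
        results.append((sentence, len(sentence)))  # placeholder score
        queue.task_done()


async def find_grounds(values, deadline_s):
    """Run searches concurrently and return whatever is scored by the deadline."""
    queue = asyncio.Queue()
    results = []
    scorer = asyncio.create_task(score_sentences(queue, results))
    searches = [asyncio.create_task(search_articles(v, queue)) for v in values]

    async def run_to_completion():
        await asyncio.gather(*searches)
        await queue.join()

    try:
        await asyncio.wait_for(run_to_completion(), timeout=deadline_s)
    except asyncio.TimeoutError:
        pass  # deadline reached: keep the partial results
    finally:
        for task in (scorer, *searches):
            task.cancel()
        await asyncio.gather(scorer, *searches, return_exceptions=True)
    return results


grounds = asyncio.run(find_grounds(list(SEARCH_DELAY), deadline_s=0.2))
```

In this run the slow "Public safety" search misses the 0.2-second deadline, so only the two faster results are returned. Trading completeness for a bounded response time is the point of the pattern.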
Process of forming reasons and grounds
These technologies were developed with the cooperation of the Inui-Okazaki Laboratory, Graduate School of Information Sciences, Tohoku University (President: Susumu Satomi). By combining these four technologies, Hitachi has developed a technology that analyzes huge volumes of text data and presents reasons and grounds for either affirmative or negative opinions on given topics. In the future, Hitachi will continue research and development aimed at achieving artificial intelligence that will enable logical dialogue between humans and computers.
The results outlined above are scheduled to be presented at ACL-IJCNLP 2015 (53rd Annual Meeting of the Association for Computational Linguistics and 7th International Joint Conference on Natural Language Processing), an international conference to be held in China during July 26-31, 2015.