Reposted from: Hohhot Daily
As one bottleneck after another has been broken through, the success of AI technology in lip-reading recognition has become a reality.
Many problems remain to be solved
However, Yan Huaizhi also said that China's AI lip-reading technology is still in its infancy, and there is a long way to go before artificial intelligence can recognize lip movements accurately.
Language itself is highly complex. Of all the sounds involved in human speech, only about 30% are directly shaped by the lips; the remaining 70% are dental, lingual and guttural sounds that are difficult to distinguish with the naked eye or even with machine vision. Moreover, differences in tone of voice, dialect, linked speech and accent, and even a beard covering the mouth, all cause subtle changes in mouth shape, and these subtle changes seriously affect how well artificial intelligence can recognize and interpret lip movements.
From a technical point of view, the environments in which artificial intelligence captures lip movements are usually complex, which makes accurate recognition very difficult. With current technology, recognition of long sentences and complex sentence patterns is still unsatisfactory, to say nothing of recognition across multiple scenes or lip reading when several people appear in the same image.
Yan Huaizhi said that only by solving the above problems can AI achieve a breakthrough in lip reading and move towards a mature stage of development.
Human languages differ from one another in many ways. Can AI read lips in every language?
Yan Huaizhi explained that most successful AI lip-reading systems so far have been limited to English, because most AI models have been trained on English data. In terms of the technical framework, however, models for different languages are trained in essentially the same way, or at least with the same kinds of techniques.
Of course, adapting lip reading to different languages requires some adjustments. On the one hand, data in the target language must be selected for targeted training. On the other hand, the AI model itself needs to be adjusted, for example by incorporating time masking, optimizing the language model and tuning the hyperparameters.
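Of the adjustments listed above, time masking is the easiest to illustrate. The article does not say how it is applied, so the following is only a minimal sketch of the general idea: during training, one random contiguous run of video frames is hidden so that the model has to rely on the surrounding context. The function name `time_mask`, its parameters and the choice of zero as the fill value are assumptions made for illustration, not details from the article.

```python
import random


def time_mask(frames, max_mask_len, rng=None, fill=0.0):
    """Return a copy of `frames` with one random contiguous span replaced by `fill`.

    `frames` is a list of per-frame feature vectors (lists of floats).
    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()
    masked = [list(frame) for frame in frames]  # copy, leave the input untouched
    n = len(masked)
    if n == 0 or max_mask_len <= 0:
        return masked

    # Span length is between 0 and max_mask_len frames, capped by the clip length.
    span = rng.randint(0, min(max_mask_len, n))
    # Pick a start so that the span fits inside the clip.
    start = rng.randint(0, n - span)
    for t in range(start, start + span):
        masked[t] = [fill] * len(masked[t])
    return masked


# Example: a 10-frame clip with 2 features per frame, masking at most 3 frames.
clip = [[1.0, 2.0] for _ in range(10)]
augmented = time_mask(clip, max_mask_len=3, rng=random.Random(0))
```

Passing a seeded `random.Random` makes the augmentation reproducible, which is convenient when comparing training runs.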
In addition, the same language is spoken with different mouth shapes, and even similar mouth shapes may carry completely different meanings. A mature AI lip-reading system therefore needs a large amount of lip-feature sample data covering as many application scenarios and types of speaker as possible. This improves the generalization ability of the trained model and its recognition accuracy across different mouth shapes and different languages.
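The ambiguity described above can be made concrete with a small sketch. It assumes a deliberately simplified, hypothetical table that maps a few English sounds to the lip shape a camera would see; the table, the word list and the function names are illustrative only and do not come from the article. The sounds "p", "b" and "m" are all made by closing both lips, so words that differ only in those sounds look the same.

```python
# Hypothetical, heavily simplified mapping from sounds to visible lip shapes.
SOUND_TO_LIP_SHAPE = {
    "p": "lips-closed", "b": "lips-closed", "m": "lips-closed",
    "f": "lip-on-teeth", "v": "lip-on-teeth",
    "t": "tongue-behind-teeth", "d": "tongue-behind-teeth",
    "ae": "mouth-open",
}

# Illustrative mini-lexicon: each word is spelled out as a list of sounds.
LEXICON = {
    "pat": ["p", "ae", "t"],
    "bat": ["b", "ae", "t"],
    "mat": ["m", "ae", "t"],
    "fat": ["f", "ae", "t"],
}


def lip_shape_sequence(sounds):
    """Translate a list of sounds into the sequence of lip shapes a camera sees."""
    return tuple(SOUND_TO_LIP_SHAPE[s] for s in sounds)


def group_by_lip_shape(lexicon):
    """Group words whose visible lip-shape sequences are identical."""
    groups = {}
    for word, sounds in lexicon.items():
        groups.setdefault(lip_shape_sequence(sounds), []).append(word)
    return groups


groups = group_by_lip_shape(LEXICON)
# Words that cannot be told apart from lip shape alone.
ambiguous = [sorted(words) for words in groups.values() if len(words) > 1]
print(ambiguous)
```

In this toy lexicon "pat", "bat" and "mat" fall into one group, while "fat" stays separate. Telling the grouped words apart requires context, which is one reason broad and varied training data matters.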
A double-edged technology in urgent need of regulation
Despite these difficulties, more and more AI companies have entered the AI lip-reading field and plan to invest in it more deeply. The major AI players have so far chosen different directions, which fall roughly into lip-reading data, lip-reading video recognition, lip-reading understanding and so on.
Yan Huaizhi also said that several areas of AI lip-reading technology have now achieved initial breakthroughs, that integration across the full industry chain is in prospect, and that industrial clusters are gradually taking shape.
In terms of application scenarios, AI lip reading has begun to emerge in social welfare and public safety. Judging from the current strategies of the major players and the development of related technologies, AI lip reading is expected to find broad application in identity recognition, national security, intelligent systems and other areas. "Given the huge potential demand in public welfare, public safety and national security, and the strong push from the rapid development of AI technology, AI lip reading can be expected to spread quickly and widely in the near future, and its industrial prospects are very promising," Yan Huaizhi said.
Of course, the application of technology is a double-edged sword. Many people worry that AI lip reading will expose the private content of their conversations, whether they are speaking in public, whispering or talking to themselves. The thought that simply opening one's mouth could let others steal one's words is frightening on reflection.
Yan Huaizhi said that this worry is not unfounded. Privacy leaks from AI lip reading may result from malicious lip reading. They may also arise when an AI lip-reading system is used legitimately but its data is poorly protected in storage and use, so that the data is stolen or abused and personal rights and interests are harmed. Moreover, because such leaks involve the content of a person's conversations and are clearly tied to specific individuals, they may be more harmful than ordinary disclosures of personal information.
Yan Huaizhi therefore suggested that, to protect privacy, relevant laws and regulations should be strengthened at the management level, the application scenarios, scope and purposes of AI lip reading should be strictly regulated and restricted, and supervision and punishment of malicious use of the technology should be increased. At the technical level, it is also necessary to strengthen the security protection of AI lip-reading systems, improve their recognition accuracy by technical means, prevent abuse of the technology, and effectively safeguard the content of users' conversations.