PCMAN – It seems that voice recognition alone is not enough to embrace the future. Microsoft has started exploring ways to bring human-like abilities to respond to stimuli into its machine learning systems. Machine learning allows devices to respond to a user’s voice, keep track of individuals moving around the house, unlock the door through voice identification alone, and, furthermore, identify users’ emotions. Microsoft is preparing to add all of these capabilities to Project Oxford, a set of cloud-based machine learning services introduced in May 2015 at Microsoft’s Build conference.
A set of artificial intelligence capabilities has become part of the newly introduced features under Microsoft’s Technology and Research division. The new APIs include:
- Emotion Recognition
Part of an Azure-based service, this feature processes facial images portraying different human emotions. It can categorize the emotions of anyone visible in an image, and it could be used to apply metadata to images, for example to identify whether a group of people is happy or sad, or to gather data on people’s reactions to specific events, displays, or marketing messages.
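To make the metadata idea concrete, a client could pick the dominant emotion from the per-face scores such a service returns. Below is a minimal sketch in Python; the response shape, field names, and score values are illustrative assumptions, not the service’s confirmed schema:

```python
def dominant_emotion(face_result):
    """Return the highest-scoring emotion label for one detected face."""
    scores = face_result["scores"]
    return max(scores, key=scores.get)

# Illustrative per-face result (hypothetical values, not real API output):
sample = {
    "faceRectangle": {"left": 68, "top": 97, "width": 64, "height": 97},
    "scores": {"anger": 0.01, "happiness": 0.92, "neutral": 0.05, "sadness": 0.02},
}
print(dominant_emotion(sample))  # prints "happiness"
```

An application tagging photos could store this label as metadata alongside each detected face.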
- Spell Check
A web API-based spell checker that can be integrated into any mobile or cloud application. It recognizes misspellings, capitalization mistakes, contextual spelling errors, and other text problems. Because it resides in the cloud, it never has to be updated on the client side; the service keeps pace with shifts in spelling and continually improves its recommendations.
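A client of such a spell-check API typically receives a list of flagged tokens with ranked suggestions and applies them to the original text. Here is a hedged sketch; the `flaggedTokens` and `suggestions` field names are assumptions modeled on the JSON that spell-check web services commonly return:

```python
def apply_corrections(text, flagged_tokens):
    """Replace each flagged token with its top-ranked suggestion."""
    corrected = text
    for tok in flagged_tokens:
        best = tok["suggestions"][0]["suggestion"]  # suggestions assumed sorted by score
        corrected = corrected.replace(tok["token"], best, 1)
    return corrected

# Illustrative response fragment (hypothetical shape, not confirmed API output):
response = {
    "flaggedTokens": [
        {"token": "worlld", "suggestions": [{"suggestion": "world", "score": 0.98}]},
    ]
}
print(apply_corrections("hello worlld", response["flaggedTokens"]))  # prints "hello world"
```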
- Video Processing
Built on the technology behind Microsoft’s Hyperlapse, this tool can process chunks of video to identify and track distinct faces and movements. With these detection capabilities, the machine-learning algorithm behind the service can edit video according to particular parameters, including stabilizing clips to remove excessive camera movement.
- Speaker Recognition
Besides turning speech into text, the new speaker recognition feature allows applications to perform an identity check on the person speaking. This could be used to detect a change in who is using an application and to force additional authentication measures for heightened security.
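The "force additional authentication" step could be a simple gate on the verification result. The sketch below is an assumption about how such a decision might look; the `result` and `confidence` fields are hypothetical, not the service’s actual response schema:

```python
def needs_extra_auth(verification, min_confidence="Normal"):
    """Decide whether to force an additional authentication step
    based on a (hypothetical) speaker-verification result."""
    levels = {"Low": 0, "Normal": 1, "High": 2}
    accepted = verification.get("result") == "Accept"
    confident = levels.get(verification.get("confidence"), 0) >= levels[min_confidence]
    return not (accepted and confident)

print(needs_extra_auth({"result": "Accept", "confidence": "High"}))  # prints False
print(needs_extra_auth({"result": "Reject", "confidence": "High"}))  # prints True
```

An app might run this check periodically during a session, prompting for a password only when the speaker no longer matches.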
- Custom Recognition Intelligence Services (CRIS)
CRIS is a tool that allows developers to build speech recognition services tailored to the environment and the users of an application. When an application is used in surroundings with loud background noise, for example, CRIS improves speech recognition quality so that converting voice to text remains reliable.
These technologies make it seem as if the future is not far away, and we are hopeful these features will be used for the greater good.