Location: Beijing
Group Overview
Researchers in the Intelligent Multimedia (IM) Group are turning their ideas into reality in areas such as computer vision, image/video understanding, pattern recognition, machine learning, and cloud media. Projects aim to design next-generation intelligent image/video systems and to advance the state of the art in multimedia-related research. Current research directions include image/video analysis, deep learning, human understanding, and scene understanding.
Roles & Responsibilities
We are looking for a highly motivated intern to work on speech research projects, e.g., keyword spotting, denoising, and text to speech. You will work closely with FTEs to develop novel algorithms for speech-related work. The technologies you develop may ship in future Microsoft products such as Microsoft Cognitive Services, Office Media (Stream, Teams, PPT), and Azure Media Analytics Services. More specifically, we focus on the investigation and design of keyword spotting, with the goal of a fast, lightweight, and high-quality solution.
Required Qualifications:
· MS/PhD student in Computer Science, Software Engineering, Electrical Engineering, or a related technical field
· Background in machine learning, deep learning, speech, or multimedia processing, especially speech recognition
· Good programming skills and familiarity with PyTorch
· Good communication skills and excellent teamwork
· Approval from your advisor
Preferred Qualifications:
· Familiarity with both traditional and recent NLP techniques, e.g., LSTM, Transformer, BERT
· Project experience in keyword spotting or ASR
Required Internship Duration:
Able to commit to an internship of at least 6 months.
Application Process
Send your CV/resume in English (and in Chinese as well, if applicable) in PDF, Word, TXT, or HTML format to zhiyzh@microsoft.com, and note Name_Intern_School_Grade.