Senior Design Project

Project short-name: YouTalkWeSign (https://youtalkwesign.com)

Final Report

Abdurrezak Efe, Yasin Erdoğdu, Enes Kavak, Cihangir Mercan

Supervisor: Hamdi Dibeklioğlu
Jury Members: Varol Akman, Mustafa Özdal

Contents

1. Introduction
2. System Overview
3. Final Status
4. Final Architecture and Design
   4.1 GUI Tier
   4.2 Application Logic Tier
   4.3 Storage Tier
5. Tools and Technologies Used
6. Impact of Engineering Solutions
   6.1 Global Impact
   6.2 Social Impact
   6.3 Economic Impact
7. Contemporary Issues
8. References

1. Introduction

YouTalkWeSign is a web application designed to serve people with hearing impairments or deafness. Through an on-screen avatar, the app translates the spoken words in YouTube videos into sign language. To use it, the user replaces the word "youtube" in the video address with "youtalkwesign"; our website is therefore https://www.youtalkwesign.com/. For example, when a user requests https://www.youtalkwesign.com/watch?v=oeydhbngsz0, we generate the sign language from the text transcript of the video and display it in the bottom right corner of the video, so that the video and the sign language avatar play simultaneously.

2. System Overview

YouTalkWeSign has an attractive, user-friendly interface that is easy to use. A trending videos link is available in the website's sidebar, so users can see what is trending among deaf people. Users can also search for a video by keyword, just as they do on YouTube. In addition, users can register with a username, a valid email address and a password in order to keep track of their watch history and hearted videos.

For some videos, content providers upload subtitles along with the video. For others, YouTube provides speech-to-text subtitles. For these two types of videos, we simply get the speech text from YouTube and translate it into sign language, so users can watch the video with the corresponding translation instantly.

However, some videos have no subtitles at all. For these videos, we first convert the speech to text and then translate the text into sign language. We aim to make this conversion as fast as possible because we do not want our users to wait long.
For this purpose, we also intend to strengthen our server with a more powerful CPU.

During translation, the speed of our translator avatar changes dynamically. The requested video may have different speech intensities in different time periods, so the avatar speed is adjusted accordingly.

3. Final Status

Our project is live at https://youtalkwesign.com and at http://ytws.io/. All features are available for users to explore. These features are:

- Users can watch any YouTube video with sign language translation by replacing the "youtube" part before the ".com" with "youtalkwesign".
- Users can change video settings such as sound and full screen on our website, just as they do on YouTube.
- Users can pause the video. When the video is paused, the avatar is paused as well; the two play and pause simultaneously.
- Users can show and hide the subtitle under the avatar with the icon next to the play/pause button.
- Users can change the background of the avatar.
- Users can drag and drop the avatar.
- Users can see the trending videos, ranked by view count, on our website (from the sidebar).
- Users can search for a video by keyword, just as they do on YouTube (from the navbar). While searching, they are not distracted from the video they are currently playing: the search results appear further down the same page, because our application is a one-page application. In the search results, videos marked with a green tick have subtitles available, so their conversion is instant. Videos marked with a red X have no subtitles, so they need speech-to-text conversion first.
  If a user clicks on a video with a red X symbol, we need to convert the speech to text first. During this period, the user sees this screen:
- Users can register on our website with a username, a valid email address and a password in order to keep track of their watch history and hearted videos (from the button in the navbar). After a successful registration, the system redirects the user to the home page as a logged-in user.
- Registered users can log in to the website with their username and password at any time and become authenticated.
- Authenticated users can see their watch history.
- Authenticated users can heart a video, see their hearted videos, and remove the heart from a hearted video.

Hearted list of the user with username "sa":
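
The first feature in the list above, replacing "youtube" with "youtalkwesign" in the video address, can be illustrated with a minimal Java sketch. The class and method names are hypothetical and are not taken from the project code; the sketch only assumes that the path and the query string (including the "v" parameter) are carried over unchanged, as in the example address from the introduction.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: maps a YouTube watch URL to the matching YouTalkWeSign URL. */
final class WatchUrlRewriter {

    private static final String YOUTUBE = "youtube.com";
    private static final String YTWS = "youtalkwesign.com";

    /** Replaces the "youtube" host with "youtalkwesign", keeping path and query. */
    static String toYouTalkWeSign(String youtubeUrl) {
        try {
            URI uri = new URI(youtubeUrl);
            String host = uri.getHost();
            if (host == null || !(host.equals(YOUTUBE) || host.endsWith("." + YOUTUBE))) {
                throw new IllegalArgumentException("Not a YouTube URL: " + youtubeUrl);
            }
            // Keep any prefix such as "www." and swap only the domain name.
            String prefix = host.substring(0, host.length() - YOUTUBE.length());
            return new URI("https", null, prefix + YTWS, -1,
                    uri.getPath(), uri.getQuery(), null).toString();
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Malformed URL: " + youtubeUrl, e);
        }
    }

    /** Extracts the value of the "v" query parameter (the video id), or null if absent. */
    static String videoId(String watchUrl) {
        String query = URI.create(watchUrl).getQuery();
        if (query == null) {
            return null;
        }
        for (String pair : query.split("&")) {
            if (pair.startsWith("v=")) {
                return pair.substring(2);
            }
        }
        return null;
    }

    public static void main(String[] args) {
        String url = toYouTalkWeSign("https://www.youtube.com/watch?v=oeydhbngsz0");
        System.out.println(url);          // https://www.youtalkwesign.com/watch?v=oeydhbngsz0
        System.out.println(videoId(url)); // oeydhbngsz0
    }
}
```

In practice the user performs this substitution by hand in the browser's address bar; the sketch only shows that the mapping is a pure host swap, so the video id reaches our server untouched.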

4. Final Architecture and Design

For YouTalkWeSign, we chose the three-tier architectural style, which organizes our subsystems into three layers: GUI, Application Logic and Storage. YouTalkWeSign's 3-tier architecture is shown below. (Full resolution: https://i.imgur.com/krmbqld.jpg)

4.1 GUI Tier

The GUI tier consists of HTML/CSS pages that are powered by JavaScript and the Thymeleaf Java template engine. For the design of the web pages, Bootstrap v4 beta is used.
Therefore, our website is fully responsive and has a consistent design. In addition, to make the pages dynamic, we rely heavily on jQuery. The icons used in our website come from the Font Awesome library.

4.2 Application Logic Tier

Our application logic tier consists of Controller/Service pairs. A request (GET or POST) from the user first arrives at the related controller, which handles it. The controller then calls its service class to get the job done. (Full resolution: https://i.imgur.com/rlheupd.jpg)

In this package:

- AboutUsController is responsible for showing our project's information to the user when About Us is clicked.
- HistoryController is responsible for keeping the user's watched videos in the database.
- LoginController is responsible for authentication.
- RegisterController is responsible for user registration.
- MainController is responsible for displaying the video with the corresponding sign language translation. It is the most important controller: its job starts with getting the transcript of the video, then it pairs the words spoken in the video with their signs, and finally it drives the GUI so that the video and the avatar play simultaneously.
- HeartedController is responsible for keeping the user's hearted videos in the database.
- TrendingController determines the most watched videos on our website and keeps them in the database.
- SearchController is responsible for returning the search results for a keyword.

4.3 Storage Tier

In this tier, every video has a transcript in addition to its YouTube id, title and thumbnail image URL. A transcript consists of a list of Text objects. Each Text object has a start time, a duration measured from that time, and the text spoken during this time interval. Each Text object also has a list of words, which keeps the words along with their sign video URLs. In addition, the users' relations with the videos (their histories and hearted videos) are kept in the database.
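
The transcript structure described above can be sketched in plain Java as follows. This is an illustration under stated assumptions, not the project's actual entity classes: the class, field and method names are hypothetical, times are assumed to be in seconds, and the in-memory map stands in for the word database. The tools section of this report notes that words are reduced to their simple forms before lookup; the plural-stripping fallback below is only a naive stand-in for that step.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of the storage model: a transcript is a list of timed Text entries. */
final class TranscriptSketch {

    /** One spoken word and the URL of its sign video (null when no sign is stored). */
    static final class Word {
        final String value;
        final String signVideoUrl;

        Word(String value, String signVideoUrl) {
            this.value = value;
            this.signVideoUrl = signVideoUrl;
        }
    }

    /** A transcript entry: start time and duration (assumed seconds), text, and its words. */
    static final class Text {
        final double start;
        final double duration;
        final String text;
        final List<Word> words;

        Text(double start, double duration, String text, List<Word> words) {
            this.start = start;
            this.duration = duration;
            this.text = text;
            this.words = words;
        }

        /** True when the playback time falls inside [start, start + duration). */
        boolean covers(double seconds) {
            return seconds >= start && seconds < start + duration;
        }
    }

    /** Looks a token up in the word -> sign URL map, retrying without a trailing "s". */
    static String lookupSign(String token, Map<String, String> signUrls) {
        String url = signUrls.get(token);
        if (url == null && token.endsWith("s")) {
            // Naive assumption: "schools" -> "school". A real system needs proper stemming.
            url = signUrls.get(token.substring(0, token.length() - 1));
        }
        return url;
    }

    /** Builds a Text entry by splitting the spoken text into words and attaching sign URLs. */
    static Text buildText(double start, double duration, String text,
                          Map<String, String> signUrls) {
        List<Word> words = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!token.isEmpty()) {
                words.add(new Word(token, lookupSign(token, signUrls)));
            }
        }
        return new Text(start, duration, text, Collections.unmodifiableList(words));
    }

    /** Returns the entry spoken at the given playback time, or null if none covers it. */
    static Text activeAt(List<Text> transcript, double seconds) {
        for (Text entry : transcript) {
            if (entry.covers(seconds)) {
                return entry;
            }
        }
        return null;
    }
}
```

A lookup such as activeAt is one way the player could decide which signs the avatar should show at the current video time; how the project actually synchronizes the two is not specified at this level of detail.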

Example Transcript:

5. Tools and Technologies Used

For the front-end of the web application, we use:

- Thymeleaf (Java template engine)
- Bootstrap v4 (for a responsive and good-looking design)
- jQuery v3.2.1 (for being a one-page app with AJAX calls)

For the back-end of the web application, we use Java technologies:

- Spring Boot (auto-configured version of Spring MVC)
- Spring Security (JDBC authentication and authorization)
- Spring Data JPA (database operations with MySQL: login, register, history, hearted and trending videos)

For the implementation of the web application, we use:

- Eclipse IDE, Oxygen version
- Apache Maven (for dependency management)

Our server is a DigitalOcean droplet. We deploy our application on Apache Tomcat 8.

YouTube transcripts are in XML format. To parse them, we use JAXB.

We have a database of about 6,000 words. These words are stored in their simple forms. For example, if the word "schools" occurs in a YouTube video, our database has no entry for "schools" but does have one for "school". Therefore, we need to convert the words that occur in a YouTube video to their simple forms.

For our search feature, we use the YouTube Search API.

For showing alerts as toast notifications, we use the toastr library: https://github.com/codeseven/toastr

Users can set the sound level of a video. For this slider, we use: https://github.com/seiyria/bootstrap-slider

For the right-click context menu on the avatar, we use: https://github.com/dgoguerra/bootstrap-menu

For converting the YouTube video from MP4 format to a WAV file, we use:

- FFmpeg software
- Jave (Java Audio Video Encoder) library

For generating transcripts for videos without subtitles, we use:

- Google Cloud Speech-to-Text API
- Google Cloud Platform

For adding a cartoon effect to the avatar video frames, we use:

- OpenCV library for Python
- Pillow library for Python

For concatenating the video frames back into a video, we use:

- MoviePy library for Python

6. Impact of Engineering Solutions

6.1 Global Impact

The main target of the YouTalkWeSign project is to serve people with hearing impairments and deafness. There are 466 million people with disabling hearing loss around the world [1]. These people mostly use sign languages to communicate with each other and with other people. There are many different sign languages, specific to world languages such as English, Portuguese, Arabic and Turkish; American Sign Language (ASL) is among the most common of them. Many applications serve these people with sign language, yet none of them has achieved a large global impact among ASL users. YouTalkWeSign aims to have a global impact by creating a usable, common and efficient platform.

YouTalkWeSign is basically a web application that converts speech or subtitles to sign language. It works with YouTube, one of the most widely used social platforms in the world, where people from everywhere can watch and upload videos. Therefore, YouTalkWeSign can have a large global impact on people from different cultures. With the YouTalkWeSign web application, people can watch YouTube videos with an avatar that translates them into sign language. It currently works only for English, but it could be extended to videos in any language in the future. Such updates would increase the global impact of YouTalkWeSign.

6.2 Social Impact

YouTalkWeSign is a web application for sign language translation that works with YouTube. Owing to this, it has an indisputable impact on people's social lives. By translating the speech or subtitles of a video into sign language, it enables people with hearing impairments from all over the world to watch any video with sign language translation.

Most of these people are not aware of most YouTube videos. There are various types of videos on YouTube, such as news, TV shows, movies, talks and tutorials, and most of them have no subtitles. Even when subtitles exist, people with hearing impairments prefer not to read subtitles while watching the video at the same time. With YouTalkWeSign, they will be able to watch any video with sign language translation. YouTalkWeSign can make any video understandable and easy to watch for them.

6.3 Economic Impact

YouTalkWeSign can have an economic impact by increasing the number of video views and by making advertisement videos more accessible. By making any YouTube video understandable for people with hearing impairments, YouTalkWeSign can increase the view counts of videos on YouTube. Moreover, YouTube hosts many advertisement videos, and YouTalkWeSign can convey them to more people. As a result, YouTalkWeSign can increase the number of customers reached by advertisement videos, and thereby profits and earnings.

7. Contemporary Issues

English was selected as the language of YouTalkWeSign since it is the most used language in the world. The sign language we use is American Sign Language (ASL): according to Ethnologue's 2013 report, about 500K people with hearing disabilities know ASL, making it the fourth most common sign language [2]. Our tool builds on YouTube because YouTube is the largest online video streaming service and provides transcripts for many videos, which makes our service faster and more reliable. Since our tool works with YouTube, it requires an internet connection. Problems may occur when the number of users increases and transcript generation for the requested videos becomes a bottleneck. Nevertheless, once we generate a transcript for a video, we store the signature and the transcript for the next time.

8. References

[1] World Health Organization, "Deafness and hearing loss." [Online]. Available: http://www.who.int/en/news-room/fact-sheets/detail/deafness-and-hearing-loss. [Accessed: 03-May-2018].
[2] Ethnologue. [Online]. Available: https://web.archive.org/web/20131126034146/http://www.ethnologue.com/subgroups/deaf-sign-language. [Accessed: 03-May-2018].