Abstract

With the proliferation of social media platforms, anonymous discussions, and easy online access, reports of offensive content have caused serious concern among both authorities and research communities. Although there is extensive research on identifying textual offensive language in online content, the dynamic discourse of social media, as well as the emergence of new forms of offensive language, especially in a multilingual setting, calls for further research on the issue. In this work, we tackled Tasks A, B, and C of the Offensive Language Challenge at SemEval2020. We handled offensive language in five languages: English, Greek, Danish, Arabic, and Turkish. Specifically, we pre-processed all provided datasets and developed an appropriate strategy to handle Tasks A, B, and C, which identify the presence or absence, the type, and the target of offensive language in social media. For this purpose, we used the OLID2019 and OLID2020 datasets and generated new datasets, which we made publicly available. We also used the provided unsupervised machine learning implementation for automatically annotated datasets, together with the online Google translation tools, to create new datasets.