Sarach Tuomchomtam. Computational demographic and personality recognition on anonymous social media. Doctoral Degree (Computer Science). Kasetsart University. Office of the University Library. Kasetsart University, 2021.
Computational demographic and personality recognition on anonymous social media
Abstract:
Demographics and personality are the core aspects that define a person's characteristics and behavioral patterns. In recent years, computational user recognition research has been gaining attention due to the rise of online social media and the human behavioral datasets it produces. However, the data collection process requires extensive user involvement, such as filling out a survey or granting explicit permission for profile access. In addition, existing prediction methods have focused on human-designed features and have not utilized rich user-generated content, which is the most direct way for people to express themselves. Therefore, this thesis presents a study on computational demographic and personality recognition of anonymous users on online social media based on their public descriptions and content. Our study covers the extraction and prediction of seven private attributes: gender identity, age group, residential area, education level, political affiliation, religious belief, and personality type. To extract these attributes, we first identify the potential values and patterns of the attributes on the platform. Then, we use the patterns to extract the attributes from public user descriptions in relevant communities. Next, we translate the descriptions of each attribute into one unified format. For the prediction, we propose several feature sets, mainly community activity and hybrid features. Then, we measure the performance of several prediction models using the extracted attributes as labels. Experimental results are promising for both the extraction and prediction of the seven attributes. Our framework extracted the attributes of 45,751 unique users from Reddit, a social media website of user-aggregated content. We also found that our proposed feature sets outperform those from previous work on personality prediction, with an F1 score of 64.4%.
Demographic prediction scores are 98.1% for residential area, 94.7% for education level, 92.1% for gender identity, 91.5% for political affiliation, 60.6% for religious belief, and 52.0% for age group. Although much research has focused on large platforms such as Facebook and Twitter, we have shown that Reddit is also a promising source for demographic and personality studies.
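The extraction step described in the abstract (matching patterns against public user descriptions, then translating the matches into one unified format) might look roughly like the Python sketch below. The abstract does not give the actual patterns or label vocabulary, so everything here is an assumption made for illustration: the regular expressions, the age bins, the two-value gender mapping, and the names `extract_attributes` and `age_group` are hypothetical and are not taken from the thesis.

```python
import re

# Hypothetical patterns (assumptions): an "age + M/F" tag such as "25M",
# and a four-letter personality type code such as "INTJ".
AGE_GENDER = re.compile(r"\b(\d{2})\s*([MF])\b", re.IGNORECASE)
TYPE_CODE = re.compile(r"\b([EI][NS][FT][JP])\b", re.IGNORECASE)

# Illustrative mapping only; the thesis's gender identity labels are not given.
GENDER_LABELS = {"m": "male", "f": "female"}


def age_group(age: int) -> str:
    """Map a raw age to a coarse bin (bins chosen for illustration)."""
    if age < 20:
        return "under-20"
    if age < 30:
        return "20-29"
    if age < 40:
        return "30-39"
    return "40-plus"


def extract_attributes(description: str) -> dict:
    """Extract attributes from a public user description and
    normalize them into one unified label format."""
    attrs = {}
    match = AGE_GENDER.search(description)
    if match:
        attrs["age_group"] = age_group(int(match.group(1)))
        attrs["gender"] = GENDER_LABELS[match.group(2).lower()]
    match = TYPE_CODE.search(description)
    if match:
        attrs["personality"] = match.group(1).upper()
    return attrs


if __name__ == "__main__":
    print(extract_attributes("25M | INTJ"))
    print(extract_attributes("infp, loves hiking"))
```

In a pipeline of the kind the abstract outlines, labels produced this way would then serve as ground truth for training and evaluating the prediction models; the sketch covers only the pattern matching and normalization.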
Kasetsart University. Office of the University Library