The 2019 Bubbles Hackathon

Final presentations: 3 pm, 30 October 2019, Posner Hall 151

Hackathon Groups

Lost in Translation
mjkern, mkornyev, stembunk, andrewh1, a.k.a., The Cognitive Bottlenecks
As the world becomes increasingly connected via the internet, we communicate across a multitude of cultures and languages. As a result, translation is happening on an unprecedented scale and, while translation is known to be imperfect, it is difficult to quantify how much information is lost and what effect this has on readers. We will study the translation archives of the fiction writing community found on SCP to discover how much entropy is lost in crowd-sourced translation. We will also compare the upvotes on these texts to see how community members respond to the translations and the originals. This investigation will give us insight into the information lost in translations from various languages and help us measure the importance of such losses.

Comparative Entropy of Information Dissemination in Academic Disciplines
ahemanth, merlebac, sasyed, mbuchman, salexand, a.k.a., line-earthers
We know that each field of study has a formal mechanism for information dissemination, but very little is known about how these mechanisms compare in terms of entropy. This limits our understanding of the differences between writing for various academic fields and the ways in which these fields approach research. To learn more about how journals differ, we will analyze papers from several different subjects using entropy over a rolling window; entropy evolution between introduction, body, and conclusion; and topic modelling. Our conclusions will make it easier to identify trends across different fields of research.
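As a rough illustration of entropy over a rolling window, the sketch below computes the Shannon entropy of the token distribution in each window of a tokenized text. The function names, the window size, and the step size are assumptions made for this example, not the team's actual implementation.

```python
import math
from collections import Counter


def shannon_entropy(tokens):
    """Shannon entropy, in bits, of the empirical distribution of tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def rolling_entropy(tokens, window=50, step=10):
    """Entropy of each `window`-token slice, advancing `step` tokens at a time.

    `window` and `step` are illustrative defaults, not values from the project.
    A text shorter than one window yields a single entropy value.
    """
    if not tokens:
        return []
    if len(tokens) < window:
        return [shannon_entropy(tokens)]
    return [
        shannon_entropy(tokens[start:start + window])
        for start in range(0, len(tokens) - window + 1, step)
    ]
```

Comparing the resulting series for the introduction, body, and conclusion of a paper is one way to look at how entropy evolves across sections.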

Exploring student accuracy in estimating the distances between academic buildings
asindhwa, edenting, mrfuhrma, jfeldgoi, a.k.a., The Campus Cartographers
We thought that individual students would be really inaccurate in their mental maps of campus, but that through “wisdom of the crowds” they would be accurate, and that we would be able to discern differences in accuracy based on students’ majors, gender, and whether they were surveyed indoors or outdoors. To test this, we had students estimate direct path distances between a set of buildings on campus, compared their estimates to the actual distances, and examined the answers both aggregated and stratified by major, gender, and survey location (e.g. outdoors).
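A minimal sketch of the comparison between individual and crowd accuracy, assuming the estimates for one pair of buildings are collected in a list. The function name and the choice of absolute error are assumptions made for illustration.

```python
from statistics import mean, median


def crowd_vs_individual_error(estimates, actual):
    """Compare the typical individual error with the error of the aggregate.

    `estimates` holds the individual distance guesses (hypothetical data) and
    `actual` is the true distance, in the same unit.
    """
    individual_errors = [abs(e - actual) for e in estimates]
    return {
        "mean_individual_error": mean(individual_errors),
        "crowd_mean_error": abs(mean(estimates) - actual),
        "crowd_median_error": abs(median(estimates) - actual),
    }
```

If the crowd is "wise", the crowd errors should be noticeably smaller than the mean individual error; the same function can be applied to each subgroup (major, gender, survey location) separately.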

Data Science Behind Box Office Prediction
jzou1, shiqic, hakyungk, xinyuh, a.k.a., IMDb Assure
We hear a lot about which movies topped the box office each year, but we know little about the behind-the-scenes dynamics that make a movie profitable. For instance, do famous actors and actresses in the cast always attract a large audience? At what time of the year do people most like to go to the movies? Since movies are such a popular mode of entertainment, we think it would be interesting to figure out what features make up a good movie and build a box office prediction model for future movies.

Given a dataset of 5000 movies from TMDb, we will start with basic data pre-processing and feature engineering to figure out what major factors may contribute to the variance in movie revenues. First, we will do exploratory data analysis and visualization on relevant predictor variables and then, having defined movie revenue as the response variable that we want to predict, we will analyze the significance of its correlation with each predictor. Beyond the given data, we assume social context may also play a role in determining movie revenue. For example, we will check whether trends in revenue correspond to trends in national economic growth. Moreover, we will look at the Social Context Index (SCI)’s overlapping percentage between movie keywords and news headline keywords (top 600 before the year of movie release) extracted from the ABC news headlines dataset, and test its significance in affecting the box office outcome.
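The overlapping percentage is not defined precisely here. One plausible reading, sketched below as an assumption, is the share of a movie's keywords that also appear among the top headline keywords; the function name and the normalization (lowercasing, trimming whitespace) are illustrative choices.

```python
def keyword_overlap_percentage(movie_keywords, headline_keywords):
    """Percentage of distinct movie keywords also found in the headline keywords.

    This is one assumed definition of the overlap, not necessarily the one the
    team used. Keywords are compared case-insensitively after trimming.
    """
    movie = {k.strip().lower() for k in movie_keywords if k.strip()}
    headlines = {k.strip().lower() for k in headline_keywords if k.strip()}
    if not movie:
        return 0.0
    return 100.0 * len(movie & headlines) / len(movie)
```

For example, a movie tagged with four keywords, two of which appear in the headline list, would score 50 under this definition.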

By correlating the news headlines with the box office outcome, we are effectively trying to understand the effect of social context on people's interest in movie topics. This bears on “Innovation Dynamics”, which can give movie producers insight into their future creative efforts and the creative economy.

Using the variables found to be significant as predictors, we will select the best statistical model built on these features by cross-validation. This will not only let us predict the revenues of upcoming movies, but also help directors and producers determine which attributes to focus on in order to increase their revenue.
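Model selection by cross-validation can be sketched as follows, using only the standard library. The two candidate models (a constant mean predictor and a one-variable least-squares line), the single numeric predictor, and all function names are assumptions chosen to keep the example small; the project's real models and features are not specified here.

```python
import random
from statistics import mean


def fit_mean(xs, ys):
    """Baseline model: always predict the training mean of the response."""
    avg = mean(ys)
    return lambda x: avg


def fit_line(xs, ys):
    """Ordinary least squares with one predictor."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = y_bar - slope * x_bar
    return lambda x: intercept + slope * x


def cross_val_mse(fit, xs, ys, k=5, seed=0):
    """Mean squared error averaged over k folds (requires len(xs) >= k)."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    fold_errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        fold_errors.append(mean([(model(xs[i]) - ys[i]) ** 2 for i in held_out]))
    return mean(fold_errors)
```

The candidate with the lowest cross-validated error would be preferred, since that error is measured on data the model did not see during fitting.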

How People Behave When Leaving Online Reviews and Ratings
yujunz, xiangtic, rjlane, sijings
Based on the Grammar and Online Product Reviews dataset, we want to answer the question "How do people behave when leaving online reviews and ratings?" In a traditional shopping experience, people make decisions based on their physical interactions with the items (looking at, smelling, tasting, and feeling them). In online shopping, however, many of these factors are lost. People may therefore depend more on other customers' reviews and ratings of a product to make purchase decisions. In our project, we want to look at how online reviews and ratings are given, to help more people make better choices when reading them. Specifically, we want to look at factors such as correct punctuation usage, number of spelling errors, and review length. We want to discover how these factors reflect customers’ attitudes towards products of different quality and category (e.g. whether expensive products get better quality reviews; whether negative reviews tend to be shorter than positive ones; whether people seem to be more involved in writing bad reviews than good ones; whether people give few reviews for standardized products such as toilet paper). Moreover, we also want to look at how people's behavior evolves over time with the advance of technology.

How have the expectations of interviewers changed over time?
pasthana, wardak, dpratomo, joohyeuy, mtsang1
We are familiar with the types of behavioral interview questions being asked, but we know little about how these questions changed over time. This causes candidates to be unsure of what answer the interviewer is looking for. Using the 6,207 behavioral interview questions posted on Glassdoor, we will utilize text analysis and topic modeling to choose ten categories of questions and examine changes in the types of questions asked over time. This will allow us to gain a better understanding of what employers are seeking in prospective employees and how their expectations have changed over time.

Predicting a Singer’s Political Party Based on Lyrics and Genre
nvembu, gshandar, anishkrishnan
We know very little about the correlation between specific genres and the political party of the singer, and we want to be able to figure out whether a singer is a Democrat or a Republican just by knowing the lyrics of a song and its genre. To do this, we will first train the algorithm on the top 20 songs from the top Country, Rap, Pop, R&B, and Alternative genres using gradient descent optimization and natural language processing. This will allow us to classify a singer’s political party from the lyrics and genre in the test data.

Social Norms in Starbucks
nluong, keviny1, jrao1, seoyeonc, swong1, fsitu
Starbucks, as a company, pioneered the modern coffeehouse and brought unique cultural changes to people’s experiences with coffee. One of these is the use of Starbucks' unique terminology for drink sizes. We hope to investigate the social norm of Starbucks' terminology, and how the culture of the chain reinforces unique behavior that would not normally be seen in typical coffee shops. Our plan is to survey customers at three different Starbucks locations (Craig, Forbes, and Fifth) and record a variety of data, from information on their orders and order sizes to information on their identity. We aim to investigate possible associations between the variables in our research and the usage of unique Starbucks terminology. Through our research, we hope to find what factors play into why one adopts Starbucks’ social norms.

Wisdom of the crowds using qualitative and quantitative measures
efinnera, jslomka, ukommine, asandova, jbajaj, gferrant, a.k.a., The Jellibellies
Wisdom of crowds is a well-documented phenomenon: large groups of people are collectively smarter than individual experts when it comes to problem solving. We aim to compare the wisdom of crowds on quantitative questions to that on qualitative questions and to see if being able to see others' answers prevents convergence on the correct answer (i.e. if early incorrect answers irreparably anchor subsequent responses). We hypothesize that individual responses will converge in the qualitative setting but not in the quantitative setting. To test our hypothesis, we used a 2x2 study design in which participants either guessed the number of jelly beans in a jar or were asked whether long distance relationships work. The participants in these conditions were split into blind and transparent conditions. In the blind condition, participants filled out an individual survey recording their answers. In the transparent condition, participants entered their answers in a spreadsheet where they could see all previous responses. According to our preliminary analysis, the data seem to suggest convergence in the blind conditions, and to a greater extent in the quantitative condition than in the qualitative condition.

Which visual elements of an Instagram post result in the highest engagement across various content categories?
madelinm, hyejos, kyoungby, denisel1
Instagram has become one of the most used platforms for people to share pictures, interact with others, and promote businesses. This study seeks to analyze which elements of an Instagram post elicit the highest levels of engagement across various content categories, to provide insight to users. We hypothesized that more vibrant photos (high in Hue, Saturation and Brightness) receive the most engagement in general. We therefore analyzed HSB, number of likes, and comments of posts from this year across different categories of content, using R and Excel and parsing the HTML source pages. We expect to find an increase in engagement for posts with higher HSB, which would mean Instagram users are more likely to like, comment, and follow for bright, saturated photos. For Creator categories such as influencers, sports, celebrities, food, animal, and photography accounts, we scraped the ten most recent posts to examine what type of content users are being exposed to and engaging with. This analysis will not only provide insight into what type of content is influencing the online community today, but will also allow Instagram influencers and account owners to tailor their posts to certain audiences to generate the engagement they desire.
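The analysis above was done in R and Excel. Purely as an illustration of what an HSB measurement involves, the Python sketch below averages hue, saturation, and brightness over a list of RGB pixels using the standard library's `colorsys` module. The function name and the pixel-list input are assumptions, and loading pixels from an actual image is left out.

```python
import colorsys


def mean_hsb(pixels):
    """Average hue, saturation, and brightness (each in 0-1) of 8-bit RGB pixels.

    `pixels` is an iterable of (r, g, b) tuples with values 0-255. Hue is
    averaged arithmetically, which is a simplification because hue is circular.
    """
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in pixels]
    if not hsv:
        raise ValueError("pixels must not be empty")
    n = len(hsv)
    return tuple(sum(channel) / n for channel in zip(*hsv))
```

Per-post values like these could then be set against likes and comments to look for the hypothesized relationship.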

How does an individual’s demographics influence their ethical decision-making in the presence of competing values?
nbellant, shrutim, sananth, cby, jebritto, aileeny, shiwanip, a.k.a., The Super Cool Group
Decisions, both life-changing and relatively routine, are often laced with ethical ambiguity. This study aims to investigate the influential factors behind people’s ethically-inclined decisions. The rationale behind this study was the widespread perception that people’s decision-making is strongly influenced by their religion, SES, gender, and age, since people from different backgrounds hold different sets of values which in turn influence their decision-making. We hypothesize that an individual's background will have a statistically significant effect on their decision-making as a result of their values. To investigate this question, we created a two-part survey on Qualtrics which first collected participants' demographics, and then presented participants with hard decisions involving different values. After running the initial analysis of the data collected through the survey, we found that men prioritize different values than women in ethical decision-making.

Country Music as an American Caricature
nmattson, jmeyers1, jli6, rhianeb, a.k.a., Cowboy Babies
We know very little about how much influence country music carries as a reflection of citizens of the United States. This causes a problem as themes in country music may perpetuate a caricature of American culture. We will analyze the top country songs and artists over the past five decades in order to determine whether an American stereotype has been established through the genre and the people who claim to represent this lifestyle.

Electronic Cigarette Use and Habits in College Students
mjaeb, recasey, ocs, mdrisco, tdirks
In 2019, new medical reports described the harms of e-cigarettes, Juuls, and vapes, and new legislation was introduced to disincentivize the use of such products. To see how Carnegie Mellon students' nicotine habits have formed and are affected by current events, we surveyed the student body. We sent out a Google survey form with questions covering how students began to use these products and how frequently they use them, concluding with their views on quitting and, for those who wanted to quit, their struggles with the process.

How to make the most successful movie at any given time in history?
mingyuax, mcliu, jfung1, smwilkin, kyip, a.k.a., The Anime Baby Boys
Film and movies reflect people's interests in the past, present, and future. We thought that the popularity of movie genres reflects the general public's interest and trends in news events around the world. We did an analysis of past movie data such as genre, runtime and ratings to see the changes in movie trends over time. We compared such trends to societal trends and real life events by using the GDELT Project, hoping to find a correlation between the two.