On Thursday night, I headed down to the beautiful Bell House, in the Gowanus section of Brooklyn, for this month's Secret Science Club lecture featuring economist and data scientist Dr Seth Stephens-Davidowitz, University of Pennsylvania Wharton School lecturer, former Google data analyst, and author of Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are.
Dr Stephens-Davidowitz opened his lecture by noting that, over the past eighty years, if researchers wanted to judge people's beliefs, they would conduct an opinion survey. The problem with that is that people lie to surveys. In a survey concerning sex and condom use, the participants lied about frequency of sex and use of prophylactics: extrapolating from survey data, the women's answers would indicate that 1.1 billion condoms would be used per year. According to the male respondents, 1.6 billion condoms would be used per year. Sales figures indicated that only about six hundred million condoms were sold the year the study was conducted. If frequency of sex had been extrapolated from the study and correlated with condom sales, there would have been far more pregnancies expected that year.
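To put the size of that gap in perspective, here is a quick back-of-the-envelope sketch in Python. It only divides the three figures quoted above; the variable and function names are my own, and "about six hundred million" is treated as exactly 600 million for the sake of the arithmetic.

```python
# Figures as quoted in the lecture recap (condoms per year).
implied_by_women = 1.1e9   # extrapolated from women's survey answers
implied_by_men = 1.6e9     # extrapolated from men's survey answers
actual_sales = 6.0e8       # assumption: "about six hundred million" taken as exact


def overstatement(implied: float, actual: float) -> float:
    """Return how many times larger the survey-implied figure is than actual sales."""
    return implied / actual


women_factor = overstatement(implied_by_women, actual_sales)
men_factor = overstatement(implied_by_men, actual_sales)

print(f"women: {women_factor:.2f}x, men: {men_factor:.2f}x")
# prints "women: 1.83x, men: 2.67x"
```

In other words, taken at face value, the survey answers imply somewhere between roughly two and three times as many condoms as were sold.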
People lie to surveys, and they also 'mess with' surveys for various reasons- in one recent survey, when asked the question 'What color is a red ball?', six percent of respondents answered 'green' and six percent answered 'undecided'. Screwing with surveys is a particular problem when teens are the participants. In a survey formulated to determine if adopted teens were more likely to drink than those who lived with their biological parents, more than half of the respondents who claimed to be adopted had not been. Dr Stephens-Davidowitz noted that it's fun to mess with surveys.
Internet searches are a different matter- Dr Stephens-Davidowitz likened Google searches to 'digital truth serum'. People are comfortable telling their browsers things that they don't tell other people. According to Google Trends, searches for porn were more common than searches for weather reports among twenty percent of men and four percent of women. Search engines give users incentives to tell the truth... if one wants the results one wants, one must use the proper search terms to find them.
Dr Stephens-Davidowitz noted that Google searches revealed a lot of secret racism that was missed by polls. A map of racist Google searches correlates uncomfortably with a map of support for Donald Trump's candidacy. Dr Stephens-Davidowitz noted that the true divide when it comes to frequency of racist searches targeting African-Americans is not North vs South, but East vs West, with a higher percentage of easterners using racist search terms. It is pretty safe to say that racism is the number one factor in the ascendancy of Trump.
In another recent survey concerning sexuality, the highest percentage of men claiming that they are gay was in Rhode Island, with 4.8% of respondents answering in the affirmative. The state with the lowest percentage of affirmative responses was Mississippi, with 2.7%. In contrast, the percentage of Google users searching for gay porn was 5.3% in RI and 4.8% in MS. The numbers are similar everywhere. Among women, the search term 'Is my husband gay?' occurs ten times more frequently than 'Is my husband cheating?' and eight times more frequently than 'Is my husband depressed?' This query is most common down South.
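The 'similar everywhere' point is easier to see with the Rhode Island and Mississippi numbers side by side. This is just a rough sketch using the four percentages quoted above; the dictionary names are my own, and the two measures (survey self-reports and search shares) are not strictly the same quantity, so treat the comparison as illustrative only.

```python
# Percentages as quoted in the recap; the names are illustrative.
survey_pct = {"RI": 4.8, "MS": 2.7}   # men identifying as gay in the survey
search_pct = {"RI": 5.3, "MS": 4.8}   # share of searches for gay porn

# Spread between the highest and lowest state under each measure.
survey_spread = survey_pct["RI"] - survey_pct["MS"]
search_spread = search_pct["RI"] - search_pct["MS"]

print(f"survey spread: {survey_spread:.1f} points")
print(f"search spread: {search_spread:.1f} points")
# prints "survey spread: 2.1 points" then "search spread: 0.5 points"
```

The state-to-state gap shrinks from about two percentage points in the survey to half a point in the search data.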
Another survey indicated that there is a self-induced abortion crisis. With more and more states restricting legal abortions, desperate girls and women are using the search term 'How do I perform an abortion myself?' This search term exploded around 2011, just as the crackdown occurred.
In India, the top way to complete the search term 'My husband wants me to' is 'breastfeed him'.
Dr Stephens-Davidowitz noted that, if Google just confirmed analysts' suspicions, it wouldn't be so revolutionary, but the unexpected results of trend analysis revealed secrets that the researchers didn't suspect. He wryly noted that this was a BIG WIN for science. In the case of the Indian Google search terms, none of the experts knew about the breastfeeding fetish.
If Google is 'digital truth serum', Dr Stephens-Davidowitz said, Facebook is a 'digital brag to friends about how good one's life is'. People are even more dishonest on Facebook than they are on surveys. While the National Enquirer sells more copies than The Atlantic, on Facebook the latter publication gets mentioned with forty-five times the frequency of the former. The top terms women use to describe their husbands on Facebook are: 'best', 'best friend', 'amazing', 'greatest', and 'so cute'. On Google, the top terms women use in searches regarding their husbands are: 'gay', 'jerk', 'amazing', 'annoying', and 'mean'. Dr Stephens-Davidowitz gave us a good piece of advice- knowing the truth is better than not knowing, so don't compare your Google searches to others' Facebook posts.
Learning of our biases can also be helpful. Parents are twice as likely to use search terms such as 'gifted' and 'genius' when describing their sons, while they are more likely to use the queries 'is my daughter overweight?' or 'is my daughter ugly?' than to ask that of their sons. While it is difficult to ask racist internet searchers not to be racist, it isn't that difficult to tell parents not to be biased.
In the immediate aftermath of the San Bernardino mass shooting, the top Google search was 'kill Muslims'. There was an explosion of anti-Muslim rage, with other popular terms being 'I hate Muslims', 'Muslims are evil', and 'Muslims terrorism'. As searches using such terms rise, hate crimes rise. In the aftermath of the shooting, President Obama delivered a speech from the Oval Office in which he implored Americans:
“Just as it is the responsibility of Muslims around the world to root out misguided ideas that lead to radicalization, it is the responsibility of all Americans — of every faith — to reject discrimination.”
While the speech was well received by pundits, minute by minute the anti-Muslim searches skyrocketed. The media consensus was 'Nice job, Obama', while the search engines revealed rage and backlash. Later in the speech, President Obama noted:
“Muslim Americans are our friends and our neighbors, our co-workers, our sports heroes—and, yes, they are our men and women in uniform who are willing to die in defense of our country. We have to remember that.”
Once the speech engaged people's curiosity, the internet rage-fest calmed a bit. In a speech to a Baltimore mosque congregation, President Obama doubled down on his appeal to people's curiosity about Muslim-Americans:
“Generations of Muslim Americans helped to build our nation. They were part of the flow of immigrants who became farmers and merchants. They built America’s first mosque, surprisingly enough, in North Dakota. America’s oldest surviving mosque is in Iowa. The first Islamic center in New York City was built in the 1890s. Muslim Americans worked on Henry Ford’s assembly line, cranking out cars. A Muslim American designed the skyscrapers of Chicago.”
Rage and violence are important issues, but insane people were usually not studied... when people make crazy Google searches, what enrages them? Conversely, what calms them down? With search engine analytics, studying an angry mob is now a science, so a more effective approach to addressing violence can be formulated.
Dr Stephens-Davidowitz then opened up the floor to a lengthy Q&A session. Some bastard in the audience raised the specter of Rule 34 and, while the Good Doctor (shockingly, to me) wasn't familiar with the term, he assured us that it exists. In answer to one query, he made sure to note that data is neither good nor evil- the users choose to use it for good or ill... investigators use data to solve crimes, scammers use data to fleece consumers. Using data, corporations can target small sets of the population with advertisements. In answer to a question about people's ability to stop being honest on Google, he noted that, right after Edward Snowden's leak, embarrassing searches (including searches for 'Nickelback') slowed down. Regarding elections, Google searches are getting better, but it is still hard to predict elections. Politics being a sensitive area, searches tend to be bad- models must be based on data, not on people's responses to direct questions. Data gives us a deeper and richer view of people than the surface view that surveys provide.
In answer to a question about how individuals can use information to combat corporate dominance, Dr Stephens-Davidowitz did note that consumers can use internet searches to seek out lower priced goods, but that Big Data overall makes corporations more powerful. Google knows truths about you before your family does.
In 2004, Google users tended to be students or intellectuals, so searches about science were more popular by percentage of searchers. Now, the internet has a much broader user base. There has always been unseemly behavior, which Dr Stephens-Davidowitz described as 'a dark element of anonymous people doing horrible things'. Early on, internet searches concerning suicide often elicited deplorables urging 'do it', while later interventions in the search algorithms altered results to refer users to suicide prevention hotlines. Searches regarding 'child abuse' are more ambiguous, as older kids often do post-abuse searches, which can result in interventions by officials. When asked about a breakdown of internet users by age, Dr Stephens-Davidowitz noted that this can't be done, referring to Peter Steiner's famous 1993 New Yorker cartoon: 'On the Internet, nobody knows you're a dog.'
When asked about the strangest American habit that he's learned about, Dr Stephens-Davidowitz noted that people google Google. He indicated that using Google Trends is a powerful way to put public data to use.
Another questioner asked him how to spot fake news, and Stephens-Davidowitz noted that conspiracy theories were popular long before the Sandy Hook Massacre.
In answer to another question, Dr Stephens-Davidowitz noted that, while internet searches tend to correlate with offline activity, there can be holes in the dataset that don't play out- while searches for 'God' tend to correlate to the Bible Belt, the top result for Google searches for the word (at 2%) was related to the God of War video game franchise.
In order to mess up the data, one would have to be subtle- yahoos using Yahoo are at a disadvantage because searching for oppositional reasons merely indicates interest. There are pitfalls- one can cherry-pick data, and use of one strident word can have a disproportionate effect.
Dr Stephens-Davidowitz ended by addressing the ethical issues of data analytics, and whether companies such as Google should intervene when troublesome searches are made... does Google know when someone's doing something bad? He noted that a lot of people have horrible thoughts, but don't follow through on them. On the question of whether suicidal ideation correlates with suicide rates, he indicated that, while he was aware of 3.5 million searches about suicide, only four thousand of the individuals followed through with killing themselves. While he encountered some disturbing revelations, such as the extent and effects of racism, he also encountered hopeful revelations- people's searches can verify some of the suspicions but allay other ones. For instance, while people are insecure about their own shortcomings, they are usually more forgiving of those of their partners.
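For a sense of scale on the suicide figures, here is a rough sketch of the ratio, using only the two numbers quoted above. The variable names are my own, and the comparison is loose by nature: one number counts searches while the other counts people, and a single person may search many times.

```python
# Figures as quoted in the recap; names are illustrative.
searches_about_suicide = 3_500_000
deaths_by_suicide = 4_000

# Share of searches that corresponds to a death, and its inverse.
deaths_per_search = deaths_by_suicide / searches_about_suicide
searches_per_death = searches_about_suicide / deaths_by_suicide

print(f"{deaths_per_search:.2%} of searches")
print(f"about 1 death per {searches_per_death:.0f} searches")
# prints "0.11% of searches" then "about 1 death per 875 searches"
```

That is roughly one death for every 875 searches, which is the arithmetic behind the argument that a troubling search is a poor predictor of action.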
The lecture was thought-provoking and entertaining- for someone who probably spends too much time on the internet, it was a nice overview of what really goes on in this crazy Series of Tubes. Here's a short media appearance by Dr Stephens-Davidowitz on the topic of 'Internet Truth Serum':
For more substantive media, here's a broad selection of appearances by the good doctor.