Data is an abstraction of a physical activity. When describing data, we are measuring one element that may actually have multiple variables that influence its outcome.
On Monday, the LSU Tigers will play the Clemson Tigers for the College Football National Championship. Before, during, and after the game, reporters, fans, and announcers will compare many metrics. They will discuss turnovers, first downs, penalties, etc., but the most common statistic (beyond the score) will be offensive yards. Offensive yardage represents many things: the quality of the offensive line (or its lack of execution), each coach’s play-calling, and the quality of the quarterback’s and receivers’ play. For example, Joe Burrow’s highlight against Georgia is recorded as a 71-yard pass and that is all. The duration of the play, etc., are compressed into one small data point.
1st & 10 at LSU 20
(3:57 – 3rd) Joe Burrow pass complete to Justin Jefferson for 71 yds to the Geo 9 for a 1ST down
So, when you are watching the game, remember the announcers often describe an action by a single variable, one which is influenced by many things. And for some items, “data” fails to describe the variables that create these memorable moments.
We have all seen or heard this quote from Peter Drucker.
The focus on performance is a byproduct of a data-rich world. Deploying “the internet of everything” provides the ability to improve system performance at a greater degree of granularity, if we can all agree upon the desired outcome.
As a fan of slapstick/physical comedy, I always enjoyed this skit. Lucille and Vivian are unable to keep up with their chocolate wrapping assignment. They eventually “hide the evidence” that the system is failing, as their confidence turns to panic. (The manager actually created a perverse incentive, i.e., no unwrapped chocolates. To avoid being fired, they do a worse job than they would have by being truthful about their work, or than if the manager had stayed to observe whether they were performing as expected.)
The manager saw the chocolates were gone. She was delighted, but did not understand the system’s real performance. One could argue that her measurement tools were weak, but her eyesight was sufficient to let her believe that no other testing was necessary: the objective was met, with no unwrapped chocolates in the other room. Lucille and Vivian do not confront the manager. Their mouths are full of chocolates, and so they silently agree to be overworked yet again.
So, when examining ways to manage performance measurements, industrial processing does a good job of discussing flow charts, etc., but it may not necessarily capture the ingenuity of the workbench! And this is where the second Drucker quote serves as a useful counterpoint.
But there may be a better quote… “just remember performance measures are like a box of chocolates.”
Summary: “Everybody Lies” (http://sethsd.com/everybodylies) was an enjoyable, fascinating book describing how understanding metadata about internet searches can provide information concerning people’s “true feelings, emotions, or opinions”. The book assumes people are more honest when they are anonymously seeking information. Reviewing those searches in aggregate provides information that social scientists may be unable to collect in other formats.
The Main Arguments
Researchers struggle to understand people’s behaviors, needs and their true opinions. In Part I, Data, Big and Small, the author outlines the need to frame social science research based on understanding big and small data. Using his grandmother’s dating advice was a great example of using Big Data (page 25). But there are cautions here, for we can pick and choose what observations we use in making those conclusions.
People will “lie” to researchers for many reasons, such as not expressing their true feelings to avoid judgement by the researcher. In this case, the use of internet searches, often done in private, can provide a way to better estimate broad trends concerning how people understand the world. The main section of the book, Part II, the Powers of Big Data, illustrates the disconnect researchers face when researching topics such as Sex, Hate and Prejudice, Internet, Child Abuse and Abortion, Facebook and Customers. Each topic gets an introduction concerning what people have studied, and how using internet search information can confirm, deny, or provide new insights into the topic.
Throughout the book, there were cautionary tales that having more data may not generate more (or more useful) information, and that not every belief can be quantified through the data. He criticizes as useless those studies that would find that “most Knicks fans live in the New York area”. In Part III, “Big Data: Handle with Care”, the author begins the real discussion: big data can be a boon to good governance and addressing social needs. But the real caveat is that such needs may not be in everyone’s self-interest. There are also concerns that having more data could introduce more errors, such as the curse of dimensionality, where the odds of finding a correlation between two elements increase simply because there is more data in which to find a possible correlation.
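The point about more data producing more chance correlations can be illustrated with a small simulation. This is only a sketch and is not taken from the book: the function names, the sample size of 20 observations, and the 0.5 cutoff are all assumptions chosen for illustration. Every variable is random noise, so any "strong" correlation that appears is a coincidence.

```python
import itertools
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def count_spurious_pairs(n_vars, n_obs=20, threshold=0.5, seed=0):
    """Count pairs of unrelated noise variables whose |r| exceeds threshold."""
    rng = random.Random(seed)
    # Each variable is independent noise; no real relationship exists.
    variables = [[rng.gauss(0, 1) for _ in range(n_obs)] for _ in range(n_vars)]
    return sum(
        1
        for a, b in itertools.combinations(variables, 2)
        if abs(pearson(a, b)) > threshold
    )


if __name__ == "__main__":
    for n_vars in (5, 20, 80):
        n_pairs = n_vars * (n_vars - 1) // 2
        print(n_vars, "variables,", n_pairs, "pairs,",
              count_spurious_pairs(n_vars), "chance correlations")
```

The number of pairs grows roughly with the square of the number of variables, so the number of places where a coincidence can show up grows much faster than the data set itself.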
Methodology, Evidence, and Context
The book is not analytically oriented, but the charts and tables were helpful in illustrating how we “lie to ourselves” when we consider our public disclosures (Facebook posts) compared to our private searches. I went to Google Trends to test a few searches, and it is a useful proxy concerning people’s interest in a topic by time and geography. The book presented, and footnoted, many studies, showing the author’s thoroughness, and would be a useful first document for additional research on some of these topic areas.
The book’s context and layout were very accessible, and the stories engaging. While I would have enjoyed seeing even more tables, charts, etc., such would have reduced the effectiveness of the work (and I could look them up with the references!). There are some graphics in the TED Talk, which I found very helpful.
I enjoyed the comparison between himself and his brother regarding baseball. I am not a baseball fan, but my father loved football. Cultural references shape our experiences in ways we did not understand when we were children, but these items influence our adult tastes and desires.
I thought the best part of the whole piece was Chapter 8, Mo Data, Mo Problems? What We Shouldn’t Do (especially after the A/B testing sections; it is scary that we are so easy to manipulate!). With more data comes the assumption that “we” can do more. But does more data mean we have more actionable items, or do we simply have more confusion when making choices? The author mentions the movie Minority Report. When discussed in this context, the original story written by Philip K. Dick is even more horrific, as other PreCogs pick up the story at different points. Based on concerns with big data, there exist more ethical challenges that remain to be addressed concerning ownership of our physical and online identities.
Finally, I liked the honesty of the “conclusion challenge”, especially after mentioning how Freakonomics influenced his professional interest in data research. Seth, if we ever meet, I will buy the first round in celebration of your success in writing such an accessible, fun, and most importantly, insightful book.
The older I get, the more I see this message as true. It is easy to assume we are all experts. For a researcher, this is not a good attitude. We all know the one way to do any research activity (process, data, approach, etc.), but in doing so, we often forget the joy that comes from learning something new. It is in that learning, based on recommendations, comments, critiques, etc., that we grow as researchers. But it is in teaching others that we learn more.
As a young researcher working at the Port of Long Beach, I answered requests generated from the port staff. (As my time in Long Beach occurred before the Internet became the “knowledge search tool”, I had to understand what people needed and why they needed the information!) After plenty of “This is not what I need”, “I wanted it like the report you did a year ago”, and “How much do you spend on data purchases”, I realized that it was not only about understanding their “question”, but about knowing what intelligence they needed. So, I asked questions about their request (sometimes the light bulb takes a while to come on…). Surprisingly, the more time I took to question the requester, the better the research (more timely, focused, etc.) became. (There was a great discussion on the importance of questions by Hal Gregersen on “The One You Feed” Podcast.)
Disclaimer: The following assumes these are internally generated questions. While the same approach could be used for evaluating service consulting requests, there exist other program elements one would add beyond these questions.
The questions fall into four broad categories: Institutional, Skills, Costs, and Review. The Institutional category links the inquiry to the organization’s goals and values. One could argue these are the most important to know, for they outline what is expected, but I would argue they are not the only thing to assess. The Skills category is a self-determination about your ability to provide the answer, while Costs outline what (if any) additional resources may be needed. Finally, the last category is Review, i.e., what can I do better/different in my current work activities based on this request. (Rearranging the 4 categories results in RISC, an appropriate reminder of the possible consequences of bad/misinformed research.)
Institutional: The objective is to provide timely intelligence to support the organization’s mission. In many ways, knowing the right answer to the wrong question does not help anyone, and researchers must guard against their own biases concerning what they think someone needs. I had to learn to ask the following questions:
Who needs this,
Who asked the question,
When do they expect an answer,
What are their expected outcomes (and by when),
Can you repeat their inquiry back to them in a clear, concise manner,
Will this require an internal review, and if so, who would do that work,
Will this intelligence be used internally or externally,
Who will review this work,
How important is this request when compared to other requests,
In what format do they want the report (chart, text, etc.),
Is this question related to a legal request, requiring documentation or following specific guidance,
Will this require a presentation/training on my part when completed,
What level of confidence are they willing to accept, which can range from a rough guess to a high degree of confidence?
Skills: In many ways, this is the hardest category to consider, for one must be honest. Without this assessment, the researcher may needlessly expose themselves to having their work deemed less than acceptable over time. Some questions may include:
Do I have the time,
Do I have access to the data to complete the task,
Do I have the software/skills to complete the task,
Do I want to do this research,
What happens if I don’t do this,
Is this like previous questions I (or others) have answered in the past,
Can you repeat their inquiry back to them in a clear, concise manner,
Can someone else answer this question better than me,
Do I have the domain knowledge to understand the topic,
Do I need a collaborator,
Do I need some training to answer this question?
Costs: Sometimes there are costs associated with doing business research. Not all data is accessible in the format one needs, nor, as people believe, is all information “free” on the internet. The researcher must understand the resource costs, but these may matter little to the person who generated the inquiry!
Do I need to purchase data/information services,
Do I need to get a license or right to access the data,
Do I need to purchase software or hardware,
Do I need to hire a consultant because I do not have the skills, time, or energy to complete this project in the anticipated format,
Can I legally share this data, or does it have to be summarized, etc.,
Do I need to pay for training to respond to this request?
Review: After the work is delivered, sometimes it is helpful to review with the inquirer to understand how your research met their needs. And for any professional researcher, this is an ongoing query regarding “do I have the right knowledge to do my appointed tasks”. These questions may include discussions such as:
Will I be asked similar questions in the future,
Do you want yourself/others to access this information directly without asking me,
Do you or others need training to access the data directly,
Do you or I need more domain knowledge,
Did the information satisfy our organization’s needs?
So, what did I do once I better understood internal needs?
After a while, I started to see that most questions centered around “who was doing what where” and “were they successful”. Knowing most questions focused on certain topics, it was easy to incorporate those queries into my ongoing data/market research activities. Ultimately, this led to the development of the Port’s first maritime data mart by integrating PIERS into Oracle with many long-forgotten programs (such as Paradox and Brio). The data mart, using various scripts, generated quarterly market reports for Senior Staff. The information also supported specialized research studies for current or potential clients of the port concerning market patterns.
But people do not “understand the value of information”, something every researcher laments. When I was at the Port of Long Beach, Don Wylie, my boss, instructed me to include on every report “the data was developed by the Trade and Maritime Services using PIERS data”. The following year, there was no debate concerning renewing the PIERS data purchase, nor the value that the Trade Office provided.
In sum, asking the right questions, through a structured approach, can illuminate everyone’s expectations. This should result in more successful projects, while demonstrating the value of a robust internal research mission.
We talk about others being a legend in their own mind, although we like to think we are “Masters of Our Domain”. When it comes to data and analysis, that domain may not be a physical space, but the information and intelligence one manages/controls. For example, my background has focused on ports, transportation and freight movements, resulting in my domain knowledge regarding international trade.
But there is more than simply being the Master of One’s Domain to be a solid researcher. One has to know how domain knowledge can shape a research question.
Let’s look at this exchange from “Monty Python and the Holy Grail”, where the bridgekeeper asks three questions. One of the questions is fairly complicated. The King asks for clarification, based on the domain knowledge gained earlier in the film from two soldiers who possess specialized knowledge of swallows.
The question concerning the average airspeed velocity of an unladen swallow may only interest researchers examining the physics of avian flight (or Monty Python fans here and here). But having learned something about swallows earlier, the King knew enough about the domain to ask for clarification (in this case, to delay), by asking about another data attribute.
Regarding the query, the question of the average airspeed concerns a specific data element, but the second question was based on another attribute, namely the type of swallow. For most researchers, knowing that extra bit of information may make the difference between good research and great research, or in this case, who lives or dies. So, there remains a benefit to being the domain master, as King Arthur reminded Bedevere as they crossed the bridge, but only if one learns not only new data but how to apply that information.
There is the old nursery rhyme about how a kingdom is lost because a horseshoe falls off. The poem refers to paying attention to little things that can make a difference, as the causal relationship of minor things failing can evolve into major problems (the Space Shuttle Columbia is but one of many examples). While one could argue its importance to military logistics or other more mundane tasks (such as learning the basics when mastering any skill), the same logic could be applied not only to the development of data but to data applications.
In the age of “Big Data”, we see where more information can provide insights that were unavailable just five years ago. The use of Artificial Intelligence and Machine Learning will transform how we collect, manage and process data, providing insights that will assist researchers and decision makers. However, the causal relationships between collecting/using data and any unintended consequences remain.
For example, one could argue that I represent three people. There is a physical me, who eats, sleeps, and walks around, and a legal me, who signs legal documents and has financial interests. There is also an emerging digital me, who lives and works in a virtual world. My information is collected, processed, and analyzed, as I become “a product” sold to others. In many ways, the data collected from millions of digital actions are creating better horseshoe nails for business, governments and others, but will this lead us to lose the kingdom of our individualism?
As a researcher, I have often heard people lament, “We studied this in the past and nothing was done”, or “Why are we not using this approach”, or some variation concerning the fact that data and information are not being used after being developed, purchased, or studied. The problem is that we think, using our crystal ball, we have built a masterpiece, and wonder why people don’t adopt our insights. We often forget that this “knowledge” could be slow to be adopted by others for many reasons.
Failure of adoption:
The first is simply the WHY? Sometimes when doing research we understand more about the question than the person who needs the answer. So while we prepare our work, we forget our client will only use what they can understand with some level of confidence. How often have we seen a more senior person misspeak based on information not properly summarized for them?
Secondly, there remains the ever-consuming “tyranny of the urgent”: the research is needed in a timely manner, but not beyond the “now”. The reasons range from staff turnover, policy change, and new leadership to findings that were not what was expected, among a thousand others. Furthermore, data is perishable, something that is often forgotten by the researcher, but not the client.
Thirdly, the experts may not agree with your opinions. My wife is a fan of Downton Abbey, and during season 3, Sybil Branson died after childbirth. The tragedy was that two doctors were arguing over her treatment, and the older doctor told the other that he was not to interfere. In many ways, we can find people with good intentions failing to achieve an expected outcome because they are using older models from the past. They remain uncommitted to learning, and without the application of new information, their working knowledge could, and does, fail to provide actionable insights, or even provides the wrong information. Presenting this expert with new information may only lead them to become more entrenched in their position.
Finally, our research may not actually answer the question being asked!
For the research community, the ghost of people not adopting our great ideas haunts the adoption of our “great efforts”. But we must understand what the client may do with the research once it has been delivered, which may depend upon how we communicate before, during and after the research process!
Recently, Teen Vogue highlighted how crossing the U.S. border remains a daily reality for many people. The author focused on four stories: travel for work, school, shopping and family, although there are other reasons, such as medical or tourism. The article cited a Bureau of Transportation Statistics website which compiled the number, mode, and location of crossings into the United States. BTS does not report illegal crossings; that is Customs and Border Protection’s information.
So in 2018, the largest U.S. crossings with Mexico are shown on the following map
So, there are a few crossings in Southern California, El Paso, and the lower Rio Grande Valley with the largest passenger crossings, mostly by automobile or walking. The largest bus crossings are in Laredo, but the largest pedestrian and automobile crossings occurred in San Ysidro and El Paso.
No one travels across the border just to say, “Well, I was in so-and-so” (well, unless you are interested in joining the Travelers’ Century Club). Sometimes, we forget that for each data point, there is a reason why someone wanted to enter the United States. But for local and national groups, it is just as important to know the total number and location of how people entered the United States to assist in planning infrastructure and operational needs or to quantify the border’s economic contribution.
How the graph was created: Downloaded a custom table from the BTS Border Crossing Database, converted the file into Excel to fix the geography, and then imported it into Tableau.
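For readers who prefer a script to Excel, the totaling step behind a map like this could be sketched as follows. This is not the workflow used for the graph: the column names (Port Name, Border, Measure, Value) are an assumption about how a custom BTS download might be laid out, and the rows are hypothetical placeholders, not real BTS figures.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows with placeholder values, NOT real BTS figures.
SAMPLE_CSV = """Port Name,Border,Measure,Value
Port A,US-Mexico Border,Pedestrians,100
Port A,US-Mexico Border,Personal Vehicle Passengers,250
Port B,US-Mexico Border,Pedestrians,40
Port B,US-Mexico Border,Bus Passengers,60
Port C,US-Canada Border,Pedestrians,5
"""


def total_by_port(csv_text, border):
    """Sum crossings per port for one border, largest total first."""
    totals = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["Border"] == border:
            totals[row["Port Name"]] += int(row["Value"])
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for port, total in total_by_port(SAMPLE_CSV, "US-Mexico Border"):
        print(port, total)
```

Grouping by Measure instead of Port Name would give the split by mode (pedestrian, bus, automobile) described above.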
Related databases: The BTS information does not include passenger flights, which are reported here
There is some information on commercial freight traffic, such as trains and trucks, which is presented below for 2018. For a fuller comparison of commercial freight transportation across the borders, see TransBorder Freight Data.
I am reminded of this Peanuts cartoon. Linus tells Lucy not to count the snowflakes for he already knows the answer. (We can argue that maybe Linus is a seasoned researcher, but I’m sure he was outside earlier doing what Lucy was doing.)
And what is counting, but simply reciting a sequence such as “1, 2, 3, …”? I could type any number on a keyboard and generate data. When we look at data, sometimes we can get so absorbed in knowing a number that we forget why knowing a number matters. What benefit is it for Linus or Lucy to know the number of snowflakes? No one measures a single snowflake, but rather snowflakes, as the aggregate matters. For example, one snowflake weighs almost nothing, but too many can collapse a roof.
There remains a need to count and observe the world, and I am guilty of looking for data when I do research. So, statistical and data approaches are warranted to make sure we have sound information to make a decision. While it is easy to measure snowflakes or other actions, sometimes transportation and economic data is not as clearly observed. Understanding what is needed to be known helps us see the world before we go outside to count snowflakes.