Xavier Lee Smucker
Rewriting
I really enjoy crafting words. Beyond this blog and Facebook status updates (believe it or not, I sometimes agonize over them), academic papers and my dissertation are my primary writing outlets. To me, the tedious aspects of writing are the elements tangential to the act itself, like structuring the paper and formatting the document. One of my favorite writing-related activities is to rewrite a passage, seeing how much more concise I can make it. Not only does this channel my competitiveness in a productive way, it also allows me the freedom to write initial drafts thinking more about the ideas than how to express them.
Here are a couple of examples from a methodological paper about work I did at the Consulting Center with the Pennsylvania Fish & Boat Commission. Disregard the symbols; they are a byproduct of writing in LaTeX. I make no claim that this is great writing, but I think you’ll have to admit that the rewrites are improvements over the originals.
Sample 1 Original (90 words):
What may seem as an arbitrary level of specificity for the computation of these elements (for instance, why calculate $g_{m}$ only for each mode, but $b_{jkl}$ for each combination of geostrata, month, and day type but not mode) is a consequence of our method of calculation which is intuitive but requires a lot of data. In some cases, there was simply not enough data to estimate every combination and somewhat subjective decisions had to be made about which factors to consider. A more satisfactory method would involve a model-based regression approach.
Sample 1 Rewrite (79 words):
What may seem an arbitrary level of specificity for the computation of these elements (for instance, why calculate $g_{m}$ only for each mode, but $b_{jkl}$ for each combination of geostrata, month, and day type but not mode) is a consequence of our estimation method which is intuitive but data-intensive. In some cases, we lacked sufficient data and somewhat subjective decisions had to be made about which factors to consider. A more satisfactory method would utilize a model-based regression approach.
Sample 2 Original (116 words):
As an illustrative example, suppose only 5\% of agent wait time throughout the day is spent during the 800 hour, while 10\% is spent during the 1000 hour. In this case, if there was an equal amount of effort being expended during the two hours, twice as much would be captured in the 1000 hour because of the extra time interview agents were spending during that hour. Thus, to standardize these interview effort quantities we would divide by 0.05 any effort obtained based on an interview during the 800 hour and by 0.10 any effort obtained during the 1000 hour. Note that these new values have no units and are used simply to standardize the wait times to to be used to produce the distribution.
Sample 2 Rewrite (95 words):
For example, suppose only 5\% of agent wait time is spent during the 800 hour, while 10\% occurs from 1000-1100. If an equal amount of angler effort was expended in each hour, twice as much would be captured in the 1000 hour because of the extra time agents spent conducting interviews during that hour. Thus, to standardize these interview-elicited effort quantities we would divide by $0.05$ any interview-elicited effort obtained in an interview during the 800 hour and by $0.10$ any effort obtained during the 1000 hour. Note that these standardized values are unitless and are used only to produce the distribution.
Posted in Grad School, PhD, Statistics, Writing | Tags: Writing
Predicting the Number of Snowfalls
Someone told my mother-in-law that the day of the month on which the first snow falls predicts the number of snows that season. For instance, this year the first snowfall in these parts was on November 21, so this model would predict 21 distinct snow dumps.
To test this, I tried to keep track of the number of snows during the 2008/2009 season. This generated some controversy within the Smucker household surrounding the definition of a snowfall. I argued that if it snowed and then stopped and then snowed again, this constituted two distinct snowfalls even if these events happened during the same afternoon. Amy thought I was liberal at times in my snowfall quantity assignments.
At any rate, I counted 27 snowfalls this season, the last one being on April 7. It is obvious, then, that the proposed model is faulty. Instead, we can predict the number of snowfalls by taking the day of the first snowfall and adding to it one less than the day of the final snowfall.
This revised model fits the data perfectly.
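As a rough sketch, the two models can be written out in Python. The function names are made up for illustration; the dates and the count of 27 are the ones reported above.

```python
def original_model(first_snow_day: int) -> int:
    """Predict the number of snowfalls as the day of the month of the first snow."""
    return first_snow_day


def revised_model(first_snow_day: int, last_snow_day: int) -> int:
    """Day of the first snowfall plus one less than the day of the final snowfall."""
    return first_snow_day + (last_snow_day - 1)


observed = 27  # snowfalls counted during the 2008/2009 season

print(original_model(21))    # first snow on November 21 -> 21
print(revised_model(21, 7))  # last snow on April 7 -> 27
```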
Posted in Statistics
The Baby Name Wizard
There is a ton of information packed into this baby name chart. I post this in honor of the fact that we must choose a name for our impending son and I think we have.
Testimony Times
In conservative Mennonite churches – at least in those with which I am familiar – once or twice a year in anticipation of Communion there is a service dedicated to the testimonies of church members. For me, it is often encouraging because I get to hear how God is working in my Christian brothers’ and sisters’ lives.
Now, one may or may not be interested in predicting, at the outset of the service, how long it will last. To do this, you need two pieces of information: 1) How many people will testify; and 2) How long their testimonies will take, on average.
Several weeks ago on April 5 we had a testimony service at my church. I decided to do a little data collection and note the length of time from the beginning of one testimony to the next. I collected this data for probably 80% of the people, skipping perhaps the first 15 (all elderly) and the last 10 to 15. So I could have a biased sample in that I only included a portion of the older people, but we will assume that all is well.
Using Minitab, I explored the data a bit. First, a histogram:

And now some summary statistics (times in seconds):
N: 90
Mean: 45.56
Median: 35
Standard Deviation: 36.21
Standard Error of Mean: 3.82
Min: 5
Max: 170
Q1: 20
Q3: 55
Several interesting things here. First of all, the mean is almost smack dab on 45 seconds. Thus, as a decent estimate of how long the testimonies will take, count the number of people who will give testimonies and multiply by 0.75 minutes.
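Here is a minimal Python sketch of that rule of thumb, using the summary statistics above. The function name and the example of 100 people testifying are hypothetical. The standard error line simply re-derives the reported 3.82 from the standard deviation and N.

```python
import math

N = 90            # testimonies timed
MEAN_SEC = 45.56  # mean length in seconds
SD_SEC = 36.21    # standard deviation in seconds


def estimated_minutes(num_testifiers: int, mean_sec: float = MEAN_SEC) -> float:
    """Rough length of the testimonies: number of people times mean length."""
    return num_testifiers * mean_sec / 60


# Standard error of the mean: SD / sqrt(N)
se = SD_SEC / math.sqrt(N)
print(round(se, 2))  # 3.82

# Hypothetical service with 100 people testifying
print(round(estimated_minutes(100), 1))  # 75.9 minutes
```

Since the sample skipped some of the first and last speakers, this is best treated as a ballpark figure.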
I think it’s interesting that the maximum is under 3 minutes. No extended discourses from anyone. You can see from the histogram that most of the testimonies were less than 1 minute, with relatively few longer than that. Oh, and there was one uncertain data point. I probably should have thrown it out, but I counted it as 2:25 when it possibly was 1:25, for what it’s worth.
Posted in Christianity, Statistics
Flossing Principles
I’m proud of Amy, for she has become a disciplined flosser since we’ve been married. I have followed in her footsteps to some degree, but I have imposed certain principles which govern this part of my oral hygiene:
I floss every other day, with the following exceptions:
- I don’t floss on weekends
- I don’t floss if I have one or more canker sores
- I don’t floss if I’m sick
- I don’t floss on vacation
To me, this places reasonable limits upon the inconvenience of flossing, with the added bonus of allowing minor indulgence when I’m not feeling altogether well.
Farming Skills
I was sitting in our weekly meeting at the consulting center, and my boss got to talking about students and how some of them do not meet his expectations. He said a senior faculty member told him once that farm kids do the best in the program (which I assume was referring to applied statistics), because they come in with a very practical problem-solving mindset.
On the farm, you have to solve the problem at hand with what you have, and such a skill can be translated into a setting such as applied statistics, where you might have some messy data and a method that sort-of-but-doesn’t-quite fit. So you evaluate what is important and necessary in the data analysis and find a way to make it work.
Posted in Statistics
Problems with Building the Kingdom
I’ve been thinking, recently, about building the kingdom of God and how that relates to me, right now. Frankly, I don’t feel that I am doing my part on a day-to-day basis here in State College. Mostly, it’s because of my spiritual apathy. But I believe there are also a couple of aggravating factors which you might call the “feet in two Christian worlds” (FITCW) problem and the “your main Christian world is far away” (YMCWIFA) problem. Unfortunately, these have followed me from Oregon State to Penn State.
The FITCW problem presents itself when the spiritual energy you might focus on your church is split between two receivers, in our case our home church and the Christian grad students group. The YMCWIFA problem is closely related and is posed when you go to church far from where you live and/or go to school or work. These things work in tandem to dampen the sacred scintillation that might prod me into ongoing, everyday work for the kingdom.
Perhaps a more basic issue brought up by these two problems is a lack of immediate community. Our church is far away so we are unable to experience life with them on a daily basis. The Christian grad student group is close by, but because we don’t fully plug in it has little chance to be the solution.
Now does that sound like an excuse?
The next thing to think about is how to defeat this, since I have a year-and-a-half or so left.
Posted in Christianity, Grad School, Personal
Obscure Scrabble Words
My in-laws are not big fans of the Scrabble Dictionary, with its massive list of arcane words. If the probability that a word comes up in everyday conversation approaches zero (e.g. zoeal), or if it is a shortened form of another word (e.g. za), or if it is a blatant perversion of a well-known word (e.g. luv), then it probably won’t be popular with them.
Consequently, they would like this, from Andrew Gelman. He decries the acceptability of foreign words like qat and xu, but as a commenter points out, where do you draw this line, since many English words are borrowed from other languages?
I think the Scrabble Dictionary gets a bad rap. It is simply reflective of the words which are considered, by one dictionary or another, to be valid English words. To craft an alternate list of admissible words would be hugely confusing, unless clear guidelines could be established which would easily distinguish “Scrabble words” from others (akin to the rule that no proper nouns are permitted).
I do think that in a “friendly” game, a reasonable rule would be that a player should be able to honestly define any word that is laid. Or, alternatively, decide on a dictionary and allow people to look up their words before they play them. The problem with the latter rule is that it renders infeasible the concept of the Challenge, which is a really interesting part of the game.
Bad Dreams
Over the years, I’ve had a recurring dream which invariably includes these elements:
- I’m a college student
- I’ve registered for a class but missed most or all of it
- The final exam is rapidly approaching
If you’ve been a student who cares about school, you realize this is horrifying.
Remarkably, it seems that this periodic nightmare has updated itself. Instead of playing the role of delinquent student, I am now teaching a class for which I have somehow shirked duty.
My guess is that in both cases a deep-seated fear of failure has exerted some influence on my subconscious.
Posted in Academia, Education, Grad School, Personal, PhD | Tags: Nightmares



