Monday, December 26, 2011

Calculate created files' size

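Quick note to self on calculating the cumulative size of regular files created in a certain timeframe (strictly speaking, -ctime tests the time of the last inode status change, not the creation time):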
find . -type f -a \( ! -ctime +6 \) -a -ctime +5 \
       -a -xdev -a -exec du -b \{\} \; |
cut -f1 -s |
( sz=0; while read fsz; do sz=$((${sz}+${fsz})); done; echo ${sz} )
The example above adds up the sizes of files changed between 5 and 6 days ago. Mac OS X users should remove the -b option.
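
A possible variation on the pipeline above, assuming GNU find: its -printf '%s\n' action prints each file's size in bytes, so du and cut are not needed. The helper names sum_sizes and changed_files_size are made up for this sketch, and awk does the adding:

```shell
# Hypothetical helper: add up byte counts read from stdin, one per line.
sum_sizes() {
    # %.0f keeps large totals from being printed in exponent notation.
    awk '{ total += $1 } END { printf "%.0f\n", total }'
}

# Hypothetical wrapper using the same time window as the original command.
# Assumes GNU find; BSD/Mac OS X find has no -printf action.
changed_files_size() {
    find "${1:-.}" -xdev -type f -ctime +5 ! -ctime +6 -printf '%s\n' | sum_sizes
}
```

Calling changed_files_size with a directory argument prints a single byte total; with no argument it scans the current directory.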

Wednesday, June 15, 2011

To ask or not to ask

While Marina waits for eel to be skinned in a Chinatown shop, Dennis contemplates by the store's frog tank:

- I wonder if Marina may cook that frog...

- (me) Ask her.

- Oh, no, she will cook it!

Friday, April 1, 2011

Questions about nuclear power industry impact

It would be interesting to learn about studies answering some questions I did not find answers to:
  1. What is the environmental and human impact of the whole supply chain for nuclear energy, not just the power plants? That would include warehousing, transportation, waste disposal.

  2. Most statistics about industrial influence on human life focus on immediate mortality and length of life. It would be interesting to have some sort of measurement of life quality for accident victims. That may include their disease history and community metrics. For example, my current perception is that victims of nuclear accidents will receive more per-person support from the US government than those suffering from the coal industry's impact. True or not?

  3. How many of the impacted people lost a bodily function - sight, hearing, olfaction, limb function?

  4. How many people had to take medicine for life or generally be under medical supervision for life?

  5. What is the percentage of people who gained full employment after an incident?

  6. For how long were they able to sustain that employment?

  7. How about their average income? Is there a statistically significant change?

  8. Did their living space square footage change?

  9. Were they required to relocate and how far?

Any references to materials suitable for the general public are appreciated. Other questions that measure life quality would also be interesting to hear.

Seth's Blog post: The triumph of coal marketing

The Seth's Blog post "The triumph of coal marketing" raises interesting questions. Mr. Godin positions the questions as marketing ones. Overall, I understood the general flow of thought in his post as:

(a) The graphic lets one experience the perception bias against the nuclear power industry. (b) It is nothing but the coal industry's marketing that moved public perception against the nuclear power industry. (c) Marketing is powerful.

The (a) and (c) statements are quite agreeable to me, albeit a bit trivial. The (b) statement seems to be a careless public disservice - and here is why I think so.

After asking myself which issues would make me emotionally side with coal or nuclear energy, I found the following (completely unscientifically, just sifting through my memory):

  • Nuclear energy installations' ability to produce less frequent but large events, while the coal industry seems to produce more small events or continuous damage. I believe it is a natural property of our minds to emphasize large, infrequent events over an almost routine stream of small ones.

  • The highly emotional association of nuclear technology with its first use as a weapon in Japan. Quite the reverse (even though centuries ago), coal was initially used to keep houses warm. No wonder coal has had a more positive image from the beginning, versus the negative one for nuclear power. Modern nuclear arms issues and fears do not help the positive image either.

  • Traditional distrust of many government-provided reports. Such distrust may have diverse roots. For example, for the population of the former Soviet Union, it may stem from a long history of fabrications, half-truths and secrecy throughout the government. For the American population, it may come from the overall secrecy of the original use of nuclear technology by the military. The distrust is also fueled by questionable reporting practices. For example, "The triumph of coal marketing" blog post is based on an original article at the Next Big Future blog. That article in turn bases its coal and nuclear industry numbers on a 10-year-old ExternE publication, a 12-year-old IAEA publication, an unspecified source for WHO data on yearly deaths, a broken link to Metal/Nonmetal fatalities, a non-specific (not separating coal vs. other) document at the US Department of Labor, a WHO Chernobyl study (single event, not industry-wide), and other resources not related to coal or nuclear power. So, there are about 10 years of missing data about nuclear energy related deaths and 10 years of non-applicable data about mining. Even though the data may be right in the ballpark and fits the purpose of the original article just fine, its extraction for Mr. Godin's post is bad enough to make this reporting either sensational, fuel for that same distrust, or both (no wonder that chart unsettles a lot of people). Such quality of "reporting" comes from all sides of the debate, which does not help either.

  • The perceived choice of living proximity to a source of trouble seems to differ between the nuclear and coal industries. The coal industry's dangerous locations are perceived to be mostly coal mines at known locations. Many people would not think twice about living near those. The nuclear industry's main perceived troublemakers are energy plants - right in everyone's backyard, or at least in a metropolitan area. There also seems to be a perception that a nuclear plant may come to a neighborhood and there is nothing people can do about it. Clearly, associating coal industry dangers with coal distribution (transportation) or power plants is not the first thought - people do not consider those a close-proximity danger as much as they do coal mines. That said, I know nobody who would be thrilled to live by a coal power plant, either. The point is that fewer of them seem to be built, and with less buzz surrounding them.

  • The perception that long-term deaths (a.k.a. shortened life) and the overall impact caused by the nuclear industry are more plentiful and more impactful, whatever that means. I notice little awareness of the specific long-term effects on life expectancy for either coal or nuclear energy. There is enough awareness of the generic long-term effects of both. So most people I know will claim that negative side effects or events of either technology potentially shorten an individual's lifespan in general. However, I can think of only a handful of people I know who would contemplate that the average lifespan change is about the same for people impacted by accidents in both industries. Most people will perceive radioactive incidents to have a much bigger impact on life expectancy and quality - both stronger and longer. From what I understand, that comes from a few sources:

    1. The medical fallout from the Hiroshima and Nagasaki bombings (as in the last sentence of the RERF FAQ)
    2. General high-school physics knowledge that some isotopes have long half-lives.
    3. Mass evacuations and relocations (proper or not) which happened throughout the nuclear industry history.
    4. Government and journalistic emphasis on nuclear energy and radioactivity events in general. That seems to come from the attention-seeking behavior of both groups, which goes after the novelty of a subject and the subject's large events.

I did not see enough coal industry marketing to attribute my mindset to it. Unless it is "invisible marketing".

Based on the thinking above, I can only see the "marketing theory" as a conspiracy theory. Mr. Godin says "it was advertising, or perhaps deliberate story telling". For completeness, he brings up "the stories we tell", yet still converges on marketing. To me it sounds like a conspiracy theory. Here is why:

  • The "marketing theory" does not seem to explain more of the evidence than a mainstream story.
  • It employs a fallacy to jump to a conclusion.
  • There is no logic offered whatsoever to support the main thesis: "any time reality doesn't match your expectations, it means that marketing was involved".
  • No credible whistle-blowers or examples are named at all, let alone enough to support the scale of the claim.
  • The "marketing theory" is hard to falsify in this application.

Here are some reasons I can think of for people to promote conspiracy theories:

  1. if people have no personal interest in promoting a conspiracy theory:
    1. they may be dumb - clearly not Mr. Godin's case
    2. they may be bored - also an unlikely case for the busy businessman
    3. they act mindlessly
  2. if people do have a personal interest in promoting a conspiracy theory:
    1. they may represent special interest groups. Mr. Godin gives neither representations nor disclaimers; however, I dismiss that thought as improbable (*).
    2. they seek cheap publicity, often riding on a wave of some public concern.

As one may guess, my bets are on scenarios 1.3 and 2.2 for Mr. Godin's post. Both are unfortunate, as the resulting post creates negative societal value, i.e. a disservice. It is quite disheartening to see it from such a prominent public figure, especially assuming that many readers will not care about the intricacies of the intentions.

Some needed closing comments and disclaimers:

I do not belong to any special-interest party in the debate other than being a concerned citizen. I do believe that the data used is correct in general and that nuclear energy should have its place in the future. I do agree that marketing is a powerful tool, which may be used for good and bad. I do not like it when people exploit a community's trouble without aiding the troubled community in the process.

My contributions in this post are completely unscientific. Rather, they are personal observations and speculations, much like Mr. Godin's original post.


* Also, without the relevant disclaimer, a post like that would likely violate FTC rules if the author received compensation for it.

Monday, February 7, 2011

About notes on document-oriented NoSQL by Curt Monash

I feel that a couple of observations need to be added to the Notes on document-oriented NoSQL by Curt Monash.

MongoDB does not store JSON documents, but rather JSON-style documents - specifically BSON (http://bsonspec.org/). That has important performance benefits for mostly numeric flexible-schema stores (read - health and social statistics, finance). Effectively, the data does not need to be serialized into and out of a character stream between application objects and the store.

That also allows MongoDB to manage storage as a set of memory-mapped files, so that the DB server has little need for, and little overhead from, managing data persistence on disk. A side effect of the efficiency of memory-mapped files is that objects are capped in size. I believe the current limit is 4MB, but do not quote me on that.

Many RDBMS implementations can have explicit (foreign keys) and implicit (joins) references between data items. That allows one to build an arbitrary, albeit complex, data graph and have it persisted in the data's meta-data or at least somewhere between an application and the DB, for example in queries, views, and stored procedures.

BSON, like JSON, represents inherently acyclic data graphs - effectively directed trees. It has no built-in mechanism to keep a record of any relationships in data except for containment below the object level. That seems to be consistent with MongoDB's philosophy of disclaiming any significant responsibility for meta-data. If schema management is not in the DB engine, then why should meta-schema be in the DB engine? That is a blessing and a curse, as one needs to use a proprietary format to persist the structural information in the data store.

In XML-based stores one has the option of using the family of XML-related standards to record and query edges of a data graph. One can check the schema validity of an XML document. I doubt MarkLogic does it natively today; however, it is very conceivable to have XPath references from inside one document into the innards of another and have the relationship followed as part of a query. This is a bit more than a "true document" notion, as it is a cross-document relationship.

Same thing - it is a blessing and a curse, as it brings to the table a character serialization penalty and an easy way to make a very convoluted data design. And I do not even want to go into the performance issues of a random graph traversal.