Push comes to shove

Have finally thrown up my hands and begun to prepare my home desktop computer for Windows 7, and like all donkeys I’m doing it because of the carrot and the stick. The stick is the fact that our school’s computers, including the ones that drive the projectors in class and the laptops I sometimes have to borrow, are now all running Windows 7, and I regularly embarrass myself by trying to figure out how to do things like eject my USB drive or search the hard disk.

The carrot is that Windows 7 is supposed to provide support for Unicode 3.0, or whatever the latest and greatest is. It’s just too inconvenient without the Unified Canadian Aboriginal Syllabics code block; this is something I really need. Just kidding. What I really want are the CJK Unified Ideographs extensions, which require Unicode 6.0 or better. Got my fingers crossed on this one.

Posted in Software | Comments Off on Push comes to shove

Fun with Google translate

Excerpt from a recent English-to-Chinese translation homework assignment:

The size of a single sheet of papyrus was not constant in ancient times. For most non-literary documents (letters, accounts, receipts, etc.) a single sheet was sufficient; for longer texts, especially literary ones, sheets were stuck together and made into a roll. Rolls have been found measuring as much as 45 yards long. It was usual to write on only one side. Writing on both sides of the sheets of a papyrus roll is quite rare, probably because the delicate material can be torn so easily.

I regularly check assignments against Google Translate, and the result this time was especially entertaining:

紙莎草紙的單頁的大小是不恆定在古代。對於大多數非文學文檔(信件,帳目,收據等)在一張紙上就足夠了;對於較長的文本,尤其是文學的,片材粘貼在一起而製成的輥。羅爾斯已發現測量多達45碼長。這是平常只在一側寫。寫在紙莎草卷的紙張的兩面是相當罕見的,可能是因為精緻的材料可以如此輕易地撕開。

I especially like the 45-yard-long Rolls Royce.

Posted in Translation | Comments Off on Fun with Google translate

Congratulations to Lydia!

My MA student Lydia Lin successfully completed her thesis defense on Friday (4/26); good job! Now to get those revisions in.

Posted in School | Comments Off on Congratulations to Lydia!

UTF-8 and utf8: Another piece of the puzzle

I have been trying to parse some Google search results and then save the results in MySQL. Naturally my script was in Perl, and naturally this was not as straightforward as it looked. The most irritating part of the problem was trying to handle the hex-encoded Unicode in the URLs returned by Google search. Some of it was just normal stuff like ?, &, = and so forth. Some of it, however, was not; it was Chinese Unicode characters (eeeek!). This was mostly from BBS or blog searches, and a nasty long mess it was.

The normal stuff is easily handled with URI::Escape; just uri_unescape($href) and you’re done.

Unicode was a different story. Here is what I finally wound up doing:

use Encode qw(decode encode);
use URI::Escape;

…..

$href = uri_unescape($href);                      # first pass: handles the ordinary escapes (?, &, =, ...)
if ($href =~ s/(%.*)//) {                         # chop off whatever is still hex-encoded
    $href .= decode('UTF-8', uri_unescape($1));   # second pass: unescape it and decode as strict UTF-8
}

I first apply uri_unescape to the whole string, and this catches the ordinary stuff. Then I chop off the bit that didn’t turn into normal characters and do it again, wrapping the result in decode('UTF-8', $dehexedunicode).

Two odd things. First, I cannot just do the decode-unescape thing all at once and get the normal characters and the Unicode characters at the same time. When I try that, it simply returns the original hexed Unicode string. You have to get rid of all the regular unhexed stuff before you can proceed; so far, my regex does this; I guess the weird stuff is always at the end of the string or something.

Second, you must do decode('UTF-8', $dehexedunicode) to get a result which you can insert into MySQL. NOTICE the capitals and the hyphen. As the Encode package pod explains,

utf8 = UTF8
and
utf-8 = utf_8 = UTF-8 = UTF_8

so the difference is between hyphenated UTF-8 (strict) and unhyphenated utf8 (loose).

At first, instead of trying to get the function right AND stuff it into MySQL in one go, I just grabbed the text and wrote it to a file (on a Windows XP machine). For this, I used utf8, and this worked fine; the characters were reconstituted from their freeze-dried hexedness and showed up in the file without problem. But when I applied this proven technique to MySQL (5.5.8), the characters were immediately discombobulated into a primeval morass. Only UTF-8 and its aliases will do for the son of Monty. It’s just a strict sort of program.
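For the record, here is roughly what the MySQL end looks like. This is only a minimal sketch, assuming DBI with DBD::mysql; the database, table, and column names (scrape, links, url) are invented, and mysql_enable_utf8 is the driver flag that passes the decoded string through as UTF-8.

use strict;
use warnings;
use DBI;
use Encode qw(decode);
use URI::Escape;

# hypothetical connection details; mysql_enable_utf8 makes DBD::mysql speak UTF-8
my $dbh = DBI->connect(
    'DBI:mysql:database=scrape;host=localhost',
    'user', 'password',
    { RaiseError => 1, mysql_enable_utf8 => 1 },
);

my $href = '...%E7%B4%99%E8%8E%8E%E8%8D%89';      # a URL tail with hex-encoded Chinese, like the search results
if ($href =~ s/(%.*)//) {
    $href .= decode('UTF-8', uri_unescape($1));   # strict UTF-8, not loose utf8
}

# 'links' and 'url' are made-up names
$dbh->do('INSERT INTO links (url) VALUES (?)', undef, $href);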

Live and learn. sigh.

Posted in Programming | Comments Off on UTF-8 and utf8: Another piece of the puzzle

Han Suyin (1916?-2012)

One notice that I missed earlier this year was the death of Han Suyin (Han Suyin Dies; Wrote Sweeping Fiction. The New York Times 5 Nov. 2012). The Times writer notes some of the criticism directed at Han, but closes by describing her as “an unapologetic patriot.” Perhaps, but before you decide, I suggest reading Simon Leys’s essay on Han, “The Double Vision of Han Suyin.”

Posted in Literature | Comments Off on Han Suyin (1916?-2012)

et al.+++

Aubert, B., M. Bona, Y. Karyotakis, J. P. Lees, V. Poireau, E. Prencipe, X. Prudent, V. Tisserand, J. Garra Tico, E. Grauges, L. Lopez, A. Palano, M. Pappagallo, G. Eigen, B. Stugu, L. Sun, G. S. Abrams, M. Battaglia, D. N. Brown, R. N. Cahn, R. G. Jacobsen, L. T. Kerth, Y. G. Kolomensky, G. Lynch, I. L. Osipenkov, M. T. Ronan, K. Tackmann, T. Tanabe, C. M. Hawkes, N. Soni, A. T. Watson, H. Koch, T. Schroeder, D. Walker, D. J. Asgeirsson, B. G. Fulsom, C. Hearty, T. S. Mattison, J. A. McKenna, M. Barrett, A. Khan, V. E. Blinov, A. D. Bukin, A. R. Buzykaev, V. P. Druzhinin, V. B. Golubev, A. P. Onuchin, S. I. Serednyakov, Y. I. Skovpen, E. P. Solodov, K. Y. Todyshev, M. Bondioli, S. Curry, I. Eschrich, D. Kirkby, A. J. Lankford, P. Lund, M. Mandelkern, E. C. Martin, D. P. Stoker, S. Abachi, C. Buchanan, J. W. Gary, F. Liu, O. Long, B. C. Shen, G. M. Vitug, Z. Yasin, L. Zhang, V. Sharma, C. Campagnari, T. M. Hong, D. Kovalskyi, M. A. Mazur, J. D. Richman, T. W. Beck, A. M. Eisner, C. J. Flacco, C. A. Heusch, J. Kroseberg, W. S. Lockman, A. J. Martinez, T. Schalk, B. A. Schumm, A. Seiden, M. G. Wilson, L. O. Winstrom, C. H. Cheng, D. A. Doll, B. Echenard, F. Fang, D. G. Hitlin, I. Narsky, T. Piatenko, F. C. Porter, R. Andreassen, G. Mancinelli, B. T. Meadows, K. Mishra, M. D. Sokoloff, P. C. Bloom, W. T. Ford, A. Gaz, J. F. Hirschauer, M. Nagel, U. Nauenberg, J. G. Smith, K. A. Ulmer, S. R. Wagner, R. Ayad, A. Soffer, W. H. Toki, R. J. Wilson, D. D. Altenburg, E. Feltresi, A. Hauke, H. Jasper, M. Karbach, J. Merkel, A. Petzold, B. Spaan, K. Wacker, M. J. Kobel, W. F. Mader, R. Nogowski, K. R. Schubert, R. Schwierz, A. Volk, D. Bernard, G. R. Bonneaud, E. Latour, M. Verderi, P. J. Clark, S. Playfer, J. E. Watson, M. Andreotti, D. Bettoni, C. Bozzi, R. Calabrese, A. Cecchi, G. Cibinetto, P. Franchini, E. Luppi, M. Negrini, A. Petrella, L. Piemontese, V. Santoro, R. Baldini-Ferroli, A. Calcaterra, R. de Sangro, G. Finocchiaro, S. Pacetti, P. Patteri, I. M. Peruzzi, M. Piccolo, M. Rama, A. Zallo, A. Buzzo, R. Contri, M. Lo Vetere, M. M. Macri, M. R. Monge, S. Passaggio, C. Patrignani, E. Robutti, A. Santroni, S. Tosi, K. S. Chaisanguanthum, M. Morii, A. Adametz, J. Marks, S. Schenk, U. Uwer, V. Klose, H. M. Lacker, D. J. Bard, P. D. Dauncey, J. A. Nash, M. Tibbetts, P. K. Behera, X. Chai, M. J. Charles, U. Mallik, J. Cochran, H. B. Crawley, L. Dong, W. T. Meyer, S. Prell, E. I. Rosenberg, A. E. Rubin, Y. Y. Gao, A. V. Gritsan, Z. J. Guo, C. K. Lae, N. Arnaud, J. Béquilleux, A. D’Orazio, M. Davier, J. F. da Costa, G. Grosdidier, A. Höcker, V. Lepeltier, F. Le Diberder, A. M. Lutz, S. Pruvot, P. Roudeau, M. H. Schune, J. Serrano, V. Sordini, A. Stocchi, G. Wormser, D. J. Lange, D. M. Wright, I. Bingham, J. P. Burke, C. A. Chavez, J. R. Fry, E. Gabathuler, R. Gamet, D. E. Hutchcroft, D. J. Payne, C. Touramanis, A. J. Bevan, C. K. Clarke, K. A. George, F. Di Lodovico, R. Sacco, M. Sigamani, G. Cowan, H. U. Flaecher, D. A. Hopkins, S. Paramesvaran, F. Salvatore, A. C. Wren, C. L. Davis, A. G. Denig, M. Fritsch, W. Gradl, G. Schott, K. E. Alwyn, D. Bailey, R. J. Barlow, Y. M. Chia, C. L. Edgar, G. Jackson, G. D. Lafferty, T. J. West, J. I. Yi, J. Anderson, C. Chen, A. Jawahery, D. A. Roberts, G. Simi, J. M. Tuggle, C. Dallapiccola, X. Li, E. Salvati, S. Saremi, R. Cowan, D. Dujmic, P. H. Fisher, G. Sciolla, M. Spitznagel, F. Taylor, R. K. Yamamoto, M. Zhao, P. M. Patel, S. H. Robertson, A. Lazzaro, V. Lombardo, F. Palombo, J. M. Bauer, L. Cremaldi, R. Godang, R. Kroeger, D. A. Sanders, D. J. 
Summers, H. W. Zhao, M. Simard, P. Taras, F. B. Viaud, H. Nicholson, G. De Nardo, L. Lista, D. Monorchio, G. Onorato, C. Sciacca, G. Raven, H. L. Snoek, C. P. Jessop, K. J. Knoepfel, J. M. LoSecco, W. F. Wang, G. Benelli, L. A. Corwin, K. Honscheid, H. Kagan, R. Kass, J. P. Morris, A. M. Rahimi, J. J. Regensburger, S. J. Sekula, Q. K. Wong, N. L. Blount, J. Brau, R. Frey, O. Igonkina, J. A. Kolb, M. Lu, R. Rahmat, N. B. Sinev, D. Strom, J. Strube, E. Torrence, G. Castelli, N. Gagliardi, M. Margoni, M. Morandin, M. Posocco, M. Rotondo, F. Simonetto, R. Stroili, C. Voci, P. del Amo Sanchez, E. Ben-Haim, H. Briand, G. Calderini, J. Chauveau, P. David, L. Del Buono, O. Hamon, P. Leruste, J. Ocariz, A. Perez, J. Prendki, S. Sitt, L. Gladney, M. Biasini, R. Covarelli, E. Manoni, C. Angelini, G. Batignani, S. Bettarini, M. Carpinelli, A. Cervelli, F. Forti, M. A. Giorgi, A. Lusiani, G. Marchiori, M. Morganti, N. Neri, E. Paoloni, G. Rizzo, J. J. Walsh, D. Lopes Pegna, C. Lu, J. Olsen, A. J. Smith, A. V. Telnov, F. Anulli, E. Baracchini, G. Cavoto, D. del Re, E. Di Marco, R. Faccini, F. Ferrarotto, F. Ferroni, M. Gaspero, P. D. Jackson, L. L. Gioi, M. A. Mazzoni, S. Morganti, G. Piredda, F. Polci, F. Renga, C. Voena, M. Ebert, T. Hartmann, H. Schröder, R. Waldi, T. Adye, B. Franek, E. O. Olaiya, F. F. Wilson, S. Emery, M. Escalier, L. Esteve, S. F. Ganzhur, G. H. de Monchenault, W. Kozanecki, G. Vasseur, Ch Yèche, M. Zito, X. R. Chen, H. Liu, W. Park, M. V. Purohit, R. M. White, J. R. Wilson, M. T. Allen, D. Aston, R. Bartoldus, P. Bechtle, J. F. Benitez, K. Bertsche, Y. Cai, R. Cenci, J. P. Coleman, M. R. Convery, F. J. Decker, J. C. Dingfelder, J. Dorfan, G. P. Dubois-Felsmann, W. Dunwoodie, S. Ecklund, R. Erickson, R. C. Field, A. Fisher, J. Fox, A. M. Gabareen, S. J. Gowdy, M. T. Graham, P. Grenier, C. Hast, W. R. Innes, R. Iverson, J. Kaminski, M. H. Kelsey, H. Kim, P. Kim, M. L. Kocian, A. Kulikov, D. W. Leith, S. Li, B. Lindquist, S. Luitz, V. Luth, H. L. Lynch, D. B. Macfarlane, H. Marsiske, R. Messner, D. R. Muller, H. Neal, S. Nelson, A. Novokhatski, C. P. O’Grady, I. Ofte, A. Perazzo, M. Perl, B. N. Ratcliff, C. Rivetta, A. Roodman, A. A. Salnikov, R. H. Schindler, J. Schwiening, J. Seeman, A. Snyder, D. Su, M. K. Sullivan, K. Suzuki, S. K. Swain, J. M. Thompson, J. Va’vra, D. Van Winkle, A. P. Wagner, M. Weaver, C. A. West, U. Wienands, W. J. Wisniewski, M. Wittgen, W. Wittmer, D. H. Wright, H. W. Wulsin, Y. Yan, A. K. Yarritu, K. Yi, G. Yocky, C. C. Young, V. Ziegler, P. R. Burchat, A. J. Edwards, S. A. Majewski, T. S. Miyashita, B. A. Petersen, L. Wilden, S. Ahmed, M. S. Alam, J. A. Ernst, B. Pan, M. A. Saeed, S. B. Zain, S. M. Spanier, B. J. Wogsland, R. Eckmann, J. L. Ritchie, A. M. Ruland, C. J. Schilling, R. F. Schwitters, B. W. Drummond, J. M. Izen, X. C. Lou, F. Bianchi, D. Gamba, M. Pelliccioni, M. Bomben, L. Bosisio, C. Cartaro, G. Della Ricca, L. Lanceri, L. Vitale, V. Azzolini, N. Lopez-March, F. Martinez-Vidal, D. A. Milanes, A. Oyanguren, J. Albert, S. Banerjee, B. Bhuyan, H. H. Choi, K. Hamano, R. Kowalewski, M. J. Lewczuk, I. M. Nugent, J. M. Roney, R. J. Sobie, T. J. Gershon, P. F. Harrison, J. Ilic, T. E. Latham, G. B. Mohanty, H. R. Band, X. Chen, S. Dasu, K. T. Flood, Y. Pan, M. Pierini, R. Prepost, C. O. Vuosalo, S. L. Wu, and Babar Collaboration. “Observation of the Bottomonium Ground State in the Decay Upsilon(3s)–>Gammaetab.” Physical review letters 101.7 (2008).

Posted in Research methods | Comments Off on et al.+++

Backing up and recovering Endnote libraries

I try to back up every file that gets changed in my working directory every day or two. This is not a problem except for one program: Endnote, still my main bibliography manager. Unlike 99% of all programs, even if you do nothing but open and close an Endnote library (*.enl) file, Endnote will mark anywhere from 10 to 12 files as changed, some of them buried in sub-sub-directories. This makes backing up a pain, and tempts me to folly, like backing up every month instead of every other day.

Here is my somewhat random method for minimizing Endnote’s backup workload. NOTE: This method is based on MY habits and needs, which are highly individual (weird). YOUR needs are different from mine, so be sure to think about what you need before you do anything.

Anyway, here is my routine. Any time the *.enl file is changed, back it up. This does NOT change every time you open the file, and when it DOES change, it means your data has really changed, so, no problem.

However, some things DO change without there being a difference in your REAL data; this is the stuff in the two (or three) data sub-directories associated with the .enl file, and this is where I get headaches.

Say you have a library file called dummy.enl. All versions of Endnote since 7 or 8 will create an associated directory called dummy.Data. Newer versions (I don’t know since when) can have as many as three sub-directories under dummy.Data. Here is what they are:

dummy.Data\PDF: If you attach any files (pdf, etc.) to your library items using the copy-to-local option, they go in this sub-directory. Obviously, not every .Data directory will have this. I have stern habits about how I use this feature, which I won’t go into here.

dummy.Data\trash (since Endnote X1 at least): This has two things: a file called trash.enl and another sub-directory, dummy.Data\trash\rdb. This will be filled with anywhere from 6 to 12 files for every Endnote library. Yuck.

When backing up, make sure that everything in the “trash” subdirectories is ignored by your backup programs. Endnote saves stuff you delete (“put in the trash”), and then asks you if you want to get rid of it every time you open and close the library. Do not treat this feature as anything other than a safety measure, to use in case you accidentally zap stuff. Either use it on the spot, or delete it all when you close the file. Backing this up is a waste of time and cpu cycles, and will only confuse you when you actually need to restore your backups. Instead, focus on the following…

dummy.Data\rdb: The only important stuff for real backups is what’s in the .Data\rdb sub-directory. This is all in the form of MyISAM tables, which means that for each table there are three files: *.frm, *.myd, *.myi. Yuck again. The number of tables will vary from library to library, depending on what features of Endnote you use, but there is real data (or meta-data) stored here, so you have to be careful with this stuff.

Before you decide what to back up, however, you need to know what Endnote does in case this stuff is lost. If you have dummy.enl and no dummy.Data directory, Endnote will silently create a new dummy.Data directory with all the basic files it uses. This only works if dummy.enl is not corrupted. If there are problems with dummy.enl, and you don’t have the .Data directory for it, you are probably toast. However, since I back up like mad, this is not my situation. What this silent recovery means for me is that some of these basic files don’t need to be backed up. Specifically, all of the .frm files are more or less unchanging. For some reason, one file called csort.frm gets rewritten almost every time you open a library, but this is meaningless; the contents never change, so make sure it goes on the exclude list.

For the individual tables, this means that you need, at most, to back up the .myd and .myi files, representing the data and indexes for the MyISAM tables. For each of my Endnote libraries, I have a maximum of 6 tables under rdb: csort, jterms, misc, refs, refs_ext, and terms. Not ALL of these tables are essential for your library to work. Apparently csort saves information about the sort order you are using. This is pretty unimportant, yet all three MyISAM files (csort.myd, csort.myi, and csort.frm) are rewritten every time you open a library. Put it on your exclude list. jterms and terms are where Endnote keeps the info for your term lists. As long as your library is intact, these can be regenerated without any problem, so I exclude them too. refs holds all the info in your library: every field of every reference. So if they use this, what’s the *.enl file for? Thoroughly redundant, but that’s Endnote’s business. I have no idea what refs_ext does, or how important it is; I’ll have to get back to you on that. The misc table is what it says: it includes petty things like the size and location of windows, and important things that represent a real investment in time, like groups and group sets.

This makes the misc table a real pain to back up. It is rewritten every time you open a library, so the dates change for misc.myd and misc.myi. Even so, the data file misc.myd sometimes doesn’t change at all; but because the misc table holds my group information, I absolutely want it backed up whenever I am working on groups.

I have one library with over 1,400 references in it; the groups are an essential part of my data for these references. Unfortunately, I can’t tell when this important data has changed (must back it up) and when just the windows have changed (who cares, no backup). Bad news for backer-uppers. Although the refs table is redundant and can therefore be skipped, I like redundancy for real data; besides, refs.myd apparently only changes when the .enl file changes, and since I back up .enl anyway, why not back up refs as well?

So the sticking point for the table data files is backing up misc, and doing it very, very often. You can’t rely on date stamps or file size to determine when to back it up.

In addition to the .myd data files, the .myi index files change constantly. It is very tempting to ignore these changes, since most of them are meaningless MyISAM table management, but beware: if you have added rows or changed indexed fields in any of these tables, and you don’t have the index that goes with them, Endnote will refuse to open your library with the warning: “This library appears to be damaged. Please verify that no other user has this library open simultaneously with write access.” It will then demand that you use the “Recover library” function to open the library. Recover library may or may not get back all your groups. I had at least one case where it did not, and this is a bad memory.

So here is my backup regime: exclude everything in the .Data\trash directory. Back up all *.enl files and all rdb\refs.myd files whenever they change. If you are messing with groups, you MUST back up the three misc files under rdb to be sure this data is safe. If I know I have not touched my groups in a blue moon, I happily skip this, but it means I cannot automatically put them on the exclude list. csort is meaningless and is a definite exclude, and terms and jterms are both excludable. If you back up a .myd file, you must back up its companion .myi file, or Endnote will make you “recover” your whole library; this consists of making a copy of the library called XXX-saved in the same directory you are working in, along with an associated .Data directory. I currently back up the refs_ext table on the same schedule as the misc table, because I don’t know what it’s for.

The long and short: exclude the trash directory, and exclude jterms, terms, and csort in the rdb directory, to cut down on your backup space and time. The .enl file plus refs.myd and refs.myi are must-haves. The files misc.myd and misc.myi should also be saved frequently when you are working on groups; otherwise, they are very skippable. All the *.frm files you need can be regenerated by opening the .enl file after renaming its .Data directory.
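Since rules like these are easy to forget, here is a rough sketch of how they could be scripted. It is Perl (what else), using only core modules (File::Find, File::Copy, File::Path); the source and destination paths are made up, and it simply mirrors a library’s .Data directory while skipping the trash directory, the regenerable csort/terms/jterms tables, and the *.frm files, per the summary above.

use strict;
use warnings;
use File::Find;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Basename qw(dirname);
use File::Spec;

my $src = 'C:/EndNote/dummy.Data';     # hypothetical library data directory
my $dst = 'D:/Backups/dummy.Data';     # hypothetical backup destination

find(sub {
    return unless -f $_;
    my $rel = File::Spec->abs2rel($File::Find::name, $src);
    return if $rel =~ m{^trash[/\\]}i;        # skip everything under trash
    return if /^(csort|terms|jterms)\./i;     # skip the regenerable tables
    return if /\.frm$/i;                      # .frm files are recreated by Endnote
    my $target = "$dst/$rel";
    make_path(dirname($target));
    copy($File::Find::name, $target) or warn "copy failed for $rel: $!";
}, $src);

The .enl file itself lives next to the .Data directory rather than inside it, so it gets copied separately.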

So, you have saved a little bit of time on your backups and a little space as well. The real reason Endnote is a pain to back up is the programmers’ failure to factor the table data properly: groups, which are an essential part of a library, should have their own table, and sorting options should not. But as big Tony often said, “Whaddaya gonna do?”

Posted in Research methods, Software | Comments Off on Backing up and recovering Endnote libraries

I take Mag XJ700T for a ride

MAG XJ700T takes a fall

It was the end of the road for Mag XJ700T. I made a deal with the wise guys: NT$300 for Mag and his crew. I could hear Mag whimpering in the trunk and thought about a quick smack in the face with a tire iron, but it would have been messy. And who was going to hear him on that country road by the Nankang River?

Why sell out Mag? I admit he was useful once, but he was long out of it, and brittle to boot. The manufacturer’s label said made in 1999, but bits of him kept flaking off whenever I tried to pick him up. Plastic has a lot less staying power than I thought. Anyway, I knew then that it was time to put him out in the cornfield with Tough Tony and the others.

Posted in Quotidiana | Comments Off on I take Mag XJ700T for a ride

Spam storm

In the last week I’ve gotten over three thousand spam comments, and I’m now averaging several hundred a day. This is way too much for me to cope with, so I’ve turned off comments on all posts. I’ve never had an actual comment anyway, so I guess it’s not that big a loss.

Posted in Site news | Comments Off on Spam storm

Stupid programming tricks: #1

In programming, simple things often turn out to involve obscure problems and techniques. I run into these and spend hours tracking down what went wrong, then console myself with the thought that “Next time I’ll know how to do it.” Unfortunately, by the time next time rolls around, I’ve forgotten all about it. Leaving little text files filled with incomprehensible notations scattered all over my computer doesn’t seem to help. My great idea this time: post all this stuff on my blog. Then I can google it like I do everything else! Brilliant, eh?

Today’s trick combines two obscure problems in one common activity, a stupid twofer, so pay attention, you in the back. It is very common for my Perl scripts to fail to work the way I expect. When this happens, there are usually helpful error messages (always, always use strict and warnings) to tell me how I’ve screwed up. Unfortunately, these are not always as helpful as one might hope.

A typical case is doing something with a large database table, say ~25,000 rows. I want to modify one of the fields based on certain conditions, and it works except for 200 rows. For those, 200 error messages pop up, repeatedly informing me:

Use of uninitialized value $var in concatenation (.) or string at document x line y.

So something is wrong with my condition. But what? Where? I quickly pull a trick out of my stupid programmer’s bag and add a line printing the field “id” from my table to stdout, to find out which records are failing the condition, then run the script again. This is not helpful, because the field “id” prints 25,000 times, scrolling by so rapidly that I can barely see the longer lines that tell me there was an error. Undaunted, I pull another trick out of the bag and redirect output to my favorite, temp.txt, like so:

badscript.pl > temp.txt

This helpfully swallows up all the ids, while my screen still fills up with all the error messages, without any “id” field info to tell me where the mistake was. Ids in temp.txt, error messages on screen. Whoops. This is because Perl prints error messages to stderr, not stdout, and stderr is not redirected by >.

Let’s recap here in case you’re getting as confused as I did: what do I want to do? I want to send both stdout output (ids) and stderr output (error messages) to the same file. According to Microsoft, this can be done as follows:

cmd.exe 1> output.txt 2>&1

Now why didn’t I think of that? Finding this info is the real trick; if you don’t google the right combination of words, it takes a while. (hint: “how to redirect standard error stream in windows”)

Now for the twofer. When I open the file, I find things that I know can’t be right, for example:
….
id = xx
error msg
id = xx
error msg
error msg
error msg
id = xx
….

I am only doing one thing per row, so it is not possible for there to be multiple errors per row. Much pulling of hair and gnashing of teeth until I realize that this is a problem I met a year and a half ago, and it has to do with Perl’s use of buffering. (See here for an amusing account of just a few of the many problems this can cause.) The solution is to turn off buffering in the script before entering the loop that prints to stdout, like so:

$| = 1;

Good luck googling that one! In fact, it is impossible to find any of the special perlvars using Google. The only place you will find them is in perlvar, located on your computer as well as on the internet. But how do you know to look in perlvar? And in case you’re wondering why 1 turns off buffering instead of 0, it turns out we should think of this as turning on autoflush (turning on the pipe?). Anyway, the result of adding this gewgaw and running the script again is that temp.txt finally contains what I want: each id immediately followed by the error messages for whatever really went wrong.

Summary: to get a file with output and error messages in the order they actually appear on your screen, you must 1) turn on autoflush in your script ($| = 1;) and then 2) use the Microsoft way to redirect: script.pl 1> output.txt 2>&1.
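To make the whole thing concrete, here is a tiny self-contained sketch of the pattern. The script name, the rows, and the undefined field are all invented, but the two tricks (autoflush plus the 1>/2>&1 redirection) are exactly the ones above.

# check_rows.pl -- hypothetical stand-in for the real script
use strict;
use warnings;

$| = 1;   # turn on autoflush so the ids and the warnings interleave in order

# pretend these rows came out of the database; row 2 has the undefined field
my @rows = (
    { id => 1, title => 'fine' },
    { id => 2, title => undef  },
    { id => 3, title => 'fine' },
);

for my $row (@rows) {
    print "id = $row->{id}\n";                      # goes to stdout
    my $line = "title: " . $row->{title} . "\n";    # the undef row warns here, to stderr
    print $line;
}

Run it as check_rows.pl 1> temp.txt 2>&1, and temp.txt shows each id followed immediately by any warning it produced, which is exactly what I wanted.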

I tell you, when it comes to stupid programming tricks, they don’t come much stupider than mine~~

Bonus obscurantism! If you really want to dazzle the rubes, don’t do this:
$| = 1;
try this instead:
$|++;
For a nice discussion of this and other gobsmackers, go to www.perlmonks.org and search for “Perl Idioms Explained”. I immediately printed all of them and pasted them on my wall.

Posted in Programming | Comments Off on Stupid programming tricks: #1