Push comes to shove

Have finally thrown up my hands and begun to prepare my home desktop computer for Windows 7, and like all donkeys I’m doing it because of the carrot and the stick. The stick is the fact that our school’s computers, including the ones that drive the projectors in class and the laptops I sometimes have to borrow, are now all running Windows 7, and I regularly embarrass myself by trying to figure out how to do things like eject my USB drive or search the hard disk.

The carrot is that Windows 7 is supposed to provide support for Unicode 3.0, or whatever the latest and greatest is. It’s just too inconvenient without the Unified Canadian Aboriginal Syllabics code block; this is something I really need. Just kidding. What I really want are the CJK Unified Ideographs extensions, which require Unicode 6.0 or better. Got my fingers crossed on this one.

Posted in Software | Comments Off on Push comes to shove

Fun with Google translate

Excerpt from a recent English-to-Chinese translation homework assignment:

The size of a single sheet of papyrus was not constant in ancient times. For most non-literary documents (letters, accounts, receipts, etc.) a single sheet was sufficient; for longer texts, especially literary ones, sheets were stuck together and made into a roll. Rolls have been found measuring as much as 45 yards long. It was usual to write on only one side. Writing on both sides of the sheets of a papyrus roll is quite rare, probably because the delicate material can be torn so easily.

I regularly check assignments against Google Translate, and the result this time was especially entertaining:

紙莎草紙的單頁的大小是不恆定在古代。對於大多數非文學文檔(信件,帳目,收據等)在一張紙上就足夠了;對於較長的文本,尤其是文學的,片材粘貼在一起而製成的輥。羅爾斯已發現測量多達45碼長。這是平常只在一側寫。寫在紙莎草卷的紙張的兩面是相當罕見的,可能是因為精緻的材料可以如此輕易地撕開。

I especially like the 45-yard-long Rolls Royce.

Posted in Translation | Comments Off on Fun with Google translate

Congratulations to Lydia!

My MA student Lydia Lin successfully completed her thesis defense on Friday (4/26); good job! Now to get those revisions in.

Posted in School | Comments Off on Congratulations to Lydia!

UTF-8 and utf8: Another piece of the puzzle

I have been trying to parse some Google search results and then save the results in MySQL. Naturally my script was in Perl, and naturally this was not as straightforward as it looked. The most irritating part of the problem was trying to handle the hex-encoded Unicode in the URLs returned by Google search. Some of it was just normal stuff like ?, &, = and so forth. Some of it, however, was not; it was Chinese Unicode characters (eeeek!). This was mostly from BBS or blog searches, and a nasty long mess it was.

The normal stuff is easily handled with URI::Escape; just uri_unescape($href) and you’re done.

Unicode was a different story. Here is what I finally wound up doing:

use Encode qw(decode encode);
use URI::Escape;

…..

$href = uri_unescape($href);                      # first pass: handles the ordinary escapes (?, &, =, ...)
if ($href =~ s/(%.*)//) {                         # chop off whatever is still hex-encoded
    $href .= decode('UTF-8', uri_unescape($1));   # second pass: unescape it and decode as strict UTF-8
}

I first apply uri_unescape to the whole string, and this catches the ordinary stuff. Then I chop off the bit that didn’t turn into normal characters and do it again, wrapping the result in decode('UTF-8', $dehexedunicode).

Two odd things. First, I cannot just do the decode-unescape thing all at once and get the normal characters and the Unicode characters at the same time. When I try that, it simply returns the original hexed Unicode string. You have to get rid of all the regular unhexed stuff before you can proceed; so far, my regex does this; I guess the weird stuff is always at the end of the string or something.

Second, you must do decode('UTF-8', $dehexedunicode) to get a result which you can insert into MySQL. NOTICE the capitals and the hyphen. As the Encode package pod explains,

utf8 = UTF8
and
utf-8 = utf_8 = UTF-8 = UTF_8

so the difference is between hyphenated UTF-8 (strict) and unhyphenated utf8 (loose).

At first, instead of trying to get the function right AND stuff it into MySQL in one go, I just grabbed the text and wrote it to a file (on a Windows XP machine). For this, I used utf8, and this worked fine; the characters were reconstituted from their freeze-dried hexedness and showed up in the file without problem. But when I applied this proven technique to MySQL (5.5.8), the characters were immediately discombobulated into a primeval morass. Only UTF-8 and its aliases will do for the son of Monty. It’s just a strict sort of program.
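For the record, here is roughly what the MySQL end looks like. This is only a minimal sketch, assuming DBI with DBD::mysql; the database, table, and column names (scrape, links, url) are invented, and mysql_enable_utf8 is the driver flag that passes the decoded string through as UTF-8.

use strict;
use warnings;
use DBI;
use Encode qw(decode);
use URI::Escape;

# hypothetical connection details; mysql_enable_utf8 makes DBD::mysql speak UTF-8
my $dbh = DBI->connect(
    'DBI:mysql:database=scrape;host=localhost',
    'user', 'password',
    { RaiseError => 1, mysql_enable_utf8 => 1 },
);

my $href = '...%E7%B4%99%E8%8E%8E%E8%8D%89';      # a URL tail with hex-encoded Chinese, like the search results
if ($href =~ s/(%.*)//) {
    $href .= decode('UTF-8', uri_unescape($1));   # strict UTF-8, not loose utf8
}

# 'links' and 'url' are made-up names
$dbh->do('INSERT INTO links (url) VALUES (?)', undef, $href);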

Live and learn. sigh.

Posted in Programming | Comments Off on UTF-8 and utf8: Another piece of the puzzle

Han Suyin (1916?-2012)

One notice that I missed earlier this year was the death of Han Suyin (Han Suyin Dies; Wrote Sweeping Fiction. The New York Times 5 Nov. 2012). The Times writer notes some of the criticism directed at Han, but closes by describing her as “an unapologetic patriot.” Perhaps, but before you decide, I suggest reading Simon Leys’s essay on Han, “The Double Vision of Han Suyin.”

Posted in Literature | Comments Off on Han Suyin (1916?-2012)

et al.+++

Aubert, B., M. Bona, Y. Karyotakis, J. P. Lees, V. Poireau, E. Prencipe, X. Prudent, V. Tisserand, J. Garra Tico, E. Grauges, L. Lopez, A. Palano, M. Pappagallo, G. Eigen, B. Stugu, L. Sun, G. S. Abrams, M. Battaglia, D. N. Brown, R. N. Cahn, R. G. Jacobsen, L. T. Kerth, Y. G. Kolomensky, G. Lynch, I. L. Osipenkov, M. T. Ronan, K. Tackmann, T. Tanabe, C. M. Hawkes, N. Soni, A. T. Watson, H. Koch, T. Schroeder, D. Walker, D. J. Asgeirsson, B. G. Fulsom, C. Hearty, T. S. Mattison, J. A. McKenna, M. Barrett, A. Khan, V. E. Blinov, A. D. Bukin, A. R. Buzykaev, V. P. Druzhinin, V. B. Golubev, A. P. Onuchin, S. I. Serednyakov, Y. I. Skovpen, E. P. Solodov, K. Y. Todyshev, M. Bondioli, S. Curry, I. Eschrich, D. Kirkby, A. J. Lankford, P. Lund, M. Mandelkern, E. C. Martin, D. P. Stoker, S. Abachi, C. Buchanan, J. W. Gary, F. Liu, O. Long, B. C. Shen, G. M. Vitug, Z. Yasin, L. Zhang, V. Sharma, C. Campagnari, T. M. Hong, D. Kovalskyi, M. A. Mazur, J. D. Richman, T. W. Beck, A. M. Eisner, C. J. Flacco, C. A. Heusch, J. Kroseberg, W. S. Lockman, A. J. Martinez, T. Schalk, B. A. Schumm, A. Seiden, M. G. Wilson, L. O. Winstrom, C. H. Cheng, D. A. Doll, B. Echenard, F. Fang, D. G. Hitlin, I. Narsky, T. Piatenko, F. C. Porter, R. Andreassen, G. Mancinelli, B. T. Meadows, K. Mishra, M. D. Sokoloff, P. C. Bloom, W. T. Ford, A. Gaz, J. F. Hirschauer, M. Nagel, U. Nauenberg, J. G. Smith, K. A. Ulmer, S. R. Wagner, R. Ayad, A. Soffer, W. H. Toki, R. J. Wilson, D. D. Altenburg, E. Feltresi, A. Hauke, H. Jasper, M. Karbach, J. Merkel, A. Petzold, B. Spaan, K. Wacker, M. J. Kobel, W. F. Mader, R. Nogowski, K. R. Schubert, R. Schwierz, A. Volk, D. Bernard, G. R. Bonneaud, E. Latour, M. Verderi, P. J. Clark, S. Playfer, J. E. Watson, M. Andreotti, D. Bettoni, C. Bozzi, R. Calabrese, A. Cecchi, G. Cibinetto, P. Franchini, E. Luppi, M. Negrini, A. Petrella, L. Piemontese, V. Santoro, R. Baldini-Ferroli, A. Calcaterra, R. de Sangro, G. Finocchiaro, S. Pacetti, P. Patteri, I. M. Peruzzi, M. Piccolo, M. Rama, A. Zallo, A. Buzzo, R. Contri, M. Lo Vetere, M. M. Macri, M. R. Monge, S. Passaggio, C. Patrignani, E. Robutti, A. Santroni, S. Tosi, K. S. Chaisanguanthum, M. Morii, A. Adametz, J. Marks, S. Schenk, U. Uwer, V. Klose, H. M. Lacker, D. J. Bard, P. D. Dauncey, J. A. Nash, M. Tibbetts, P. K. Behera, X. Chai, M. J. Charles, U. Mallik, J. Cochran, H. B. Crawley, L. Dong, W. T. Meyer, S. Prell, E. I. Rosenberg, A. E. Rubin, Y. Y. Gao, A. V. Gritsan, Z. J. Guo, C. K. Lae, N. Arnaud, J. Béquilleux, A. D’Orazio, M. Davier, J. F. da Costa, G. Grosdidier, A. Höcker, V. Lepeltier, F. Le Diberder, A. M. Lutz, S. Pruvot, P. Roudeau, M. H. Schune, J. Serrano, V. Sordini, A. Stocchi, G. Wormser, D. J. Lange, D. M. Wright, I. Bingham, J. P. Burke, C. A. Chavez, J. R. Fry, E. Gabathuler, R. Gamet, D. E. Hutchcroft, D. J. Payne, C. Touramanis, A. J. Bevan, C. K. Clarke, K. A. George, F. Di Lodovico, R. Sacco, M. Sigamani, G. Cowan, H. U. Flaecher, D. A. Hopkins, S. Paramesvaran, F. Salvatore, A. C. Wren, C. L. Davis, A. G. Denig, M. Fritsch, W. Gradl, G. Schott, K. E. Alwyn, D. Bailey, R. J. Barlow, Y. M. Chia, C. L. Edgar, G. Jackson, G. D. Lafferty, T. J. West, J. I. Yi, J. Anderson, C. Chen, A. Jawahery, D. A. Roberts, G. Simi, J. M. Tuggle, C. Dallapiccola, X. Li, E. Salvati, S. Saremi, R. Cowan, D. Dujmic, P. H. Fisher, G. Sciolla, M. Spitznagel, F. Taylor, R. K. Yamamoto, M. Zhao, P. M. Patel, S. H. Robertson, A. Lazzaro, V. Lombardo, F. Palombo, J. M. Bauer, L. Cremaldi, R. Godang, R. Kroeger, D. A. Sanders, D. J. 
Summers, H. W. Zhao, M. Simard, P. Taras, F. B. Viaud, H. Nicholson, G. De Nardo, L. Lista, D. Monorchio, G. Onorato, C. Sciacca, G. Raven, H. L. Snoek, C. P. Jessop, K. J. Knoepfel, J. M. LoSecco, W. F. Wang, G. Benelli, L. A. Corwin, K. Honscheid, H. Kagan, R. Kass, J. P. Morris, A. M. Rahimi, J. J. Regensburger, S. J. Sekula, Q. K. Wong, N. L. Blount, J. Brau, R. Frey, O. Igonkina, J. A. Kolb, M. Lu, R. Rahmat, N. B. Sinev, D. Strom, J. Strube, E. Torrence, G. Castelli, N. Gagliardi, M. Margoni, M. Morandin, M. Posocco, M. Rotondo, F. Simonetto, R. Stroili, C. Voci, P. del Amo Sanchez, E. Ben-Haim, H. Briand, G. Calderini, J. Chauveau, P. David, L. Del Buono, O. Hamon, P. Leruste, J. Ocariz, A. Perez, J. Prendki, S. Sitt, L. Gladney, M. Biasini, R. Covarelli, E. Manoni, C. Angelini, G. Batignani, S. Bettarini, M. Carpinelli, A. Cervelli, F. Forti, M. A. Giorgi, A. Lusiani, G. Marchiori, M. Morganti, N. Neri, E. Paoloni, G. Rizzo, J. J. Walsh, D. Lopes Pegna, C. Lu, J. Olsen, A. J. Smith, A. V. Telnov, F. Anulli, E. Baracchini, G. Cavoto, D. del Re, E. Di Marco, R. Faccini, F. Ferrarotto, F. Ferroni, M. Gaspero, P. D. Jackson, L. L. Gioi, M. A. Mazzoni, S. Morganti, G. Piredda, F. Polci, F. Renga, C. Voena, M. Ebert, T. Hartmann, H. Schröder, R. Waldi, T. Adye, B. Franek, E. O. Olaiya, F. F. Wilson, S. Emery, M. Escalier, L. Esteve, S. F. Ganzhur, G. H. de Monchenault, W. Kozanecki, G. Vasseur, Ch Yèche, M. Zito, X. R. Chen, H. Liu, W. Park, M. V. Purohit, R. M. White, J. R. Wilson, M. T. Allen, D. Aston, R. Bartoldus, P. Bechtle, J. F. Benitez, K. Bertsche, Y. Cai, R. Cenci, J. P. Coleman, M. R. Convery, F. J. Decker, J. C. Dingfelder, J. Dorfan, G. P. Dubois-Felsmann, W. Dunwoodie, S. Ecklund, R. Erickson, R. C. Field, A. Fisher, J. Fox, A. M. Gabareen, S. J. Gowdy, M. T. Graham, P. Grenier, C. Hast, W. R. Innes, R. Iverson, J. Kaminski, M. H. Kelsey, H. Kim, P. Kim, M. L. Kocian, A. Kulikov, D. W. Leith, S. Li, B. Lindquist, S. Luitz, V. Luth, H. L. Lynch, D. B. Macfarlane, H. Marsiske, R. Messner, D. R. Muller, H. Neal, S. Nelson, A. Novokhatski, C. P. O’Grady, I. Ofte, A. Perazzo, M. Perl, B. N. Ratcliff, C. Rivetta, A. Roodman, A. A. Salnikov, R. H. Schindler, J. Schwiening, J. Seeman, A. Snyder, D. Su, M. K. Sullivan, K. Suzuki, S. K. Swain, J. M. Thompson, J. Va’vra, D. Van Winkle, A. P. Wagner, M. Weaver, C. A. West, U. Wienands, W. J. Wisniewski, M. Wittgen, W. Wittmer, D. H. Wright, H. W. Wulsin, Y. Yan, A. K. Yarritu, K. Yi, G. Yocky, C. C. Young, V. Ziegler, P. R. Burchat, A. J. Edwards, S. A. Majewski, T. S. Miyashita, B. A. Petersen, L. Wilden, S. Ahmed, M. S. Alam, J. A. Ernst, B. Pan, M. A. Saeed, S. B. Zain, S. M. Spanier, B. J. Wogsland, R. Eckmann, J. L. Ritchie, A. M. Ruland, C. J. Schilling, R. F. Schwitters, B. W. Drummond, J. M. Izen, X. C. Lou, F. Bianchi, D. Gamba, M. Pelliccioni, M. Bomben, L. Bosisio, C. Cartaro, G. Della Ricca, L. Lanceri, L. Vitale, V. Azzolini, N. Lopez-March, F. Martinez-Vidal, D. A. Milanes, A. Oyanguren, J. Albert, S. Banerjee, B. Bhuyan, H. H. Choi, K. Hamano, R. Kowalewski, M. J. Lewczuk, I. M. Nugent, J. M. Roney, R. J. Sobie, T. J. Gershon, P. F. Harrison, J. Ilic, T. E. Latham, G. B. Mohanty, H. R. Band, X. Chen, S. Dasu, K. T. Flood, Y. Pan, M. Pierini, R. Prepost, C. O. Vuosalo, S. L. Wu, and Babar Collaboration. “Observation of the Bottomonium Ground State in the Decay Upsilon(3s)–>Gammaetab.” Physical review letters 101.7 (2008).

Posted in Research methods | Comments Off on et al.+++

Backing up and recovering Endnote libraries

I try to back up every file that gets changed in my working directory every day or two. This is not a problem except for one program: Endnote, still my main bibliography manager. Unlike 99% of all programs, even if you do nothing but open and close an Endnote library (*.enl) file, Endnote will mark anywhere from 10 to 12 files as changed, some of them buried in sub-sub-directories. This makes backing up a pain, and tempts me to folly, like backing up every month instead of every other day.

Here is my somewhat random method for minimizing Endnote’s backup workload. NOTE: This method is based on MY habits and needs, which are highly individual (weird). YOUR needs are different from mine, so be sure to think about what you need before you do anything.

Anyway, here is my routine. Any time the *.enl file is changed, back it up. This does NOT change every time you open the file, and when it DOES change, it means your data has really changed, so, no problem.

However, some things DO change without there being a difference in your REAL data; this is the stuff in the two (or three) data sub-directories associated with the .enl file, and this is where I get headaches.

Say you have a library file called dummy.enl. All versions of Endnote since 7 or 8 will create an associated directory called dummy.Data. Newer versions (I don’t know since when) can have as many as three sub-directories under dummy.Data. Here is what they are:

dummy.Data\PDF: If you attach any files (pdf, etc.) to your library items using the copy-to-local option, they go in this sub-directory. Obviously, not every .Data directory will have this. I have stern habits about how I use this feature, which I won’t go into here.

dummy.Data\trash (since Endnote X1 at least): This has two things: a file called trash.enl and another sub-directory, dummy.Data\trash\rdb. This will be filled with anywhere from 6 to 12 files for every Endnote library. Yuck.

When backing up, make sure that everything in the “trash” subdirectories is ignored by your backup programs. Endnote saves stuff you delete (“put in the trash”), and then asks you if you want to get rid of it every time you open and close the library. Do not treat this feature as anything other than a safety measure, to use in case you accidentally zap stuff. Either use it on the spot, or delete it all when you close the file. Backing this up is a waste of time and cpu cycles, and will only confuse you when you actually need to restore your backups. Instead, focus on the following…

dummy.Data\rdb: The only important stuff for real backups is what’s in the .Data\rdb sub-directory. This is all in the form of MyISAM tables, which means that for each table there are three files: *.frm, *.myd, *.myi. Yuck again. The number of tables will vary from library to library, depending on what features of Endnote you use, but there is real data (or meta-data) stored here, so you have to be careful with this stuff.

Before you decide what to back up, however, you need to know what Endnote does in case this stuff is lost. If you have dummy.enl and no dummy.Data directory, Endnote will silently create a new dummy.Data directory with all the basic files it uses. This only works if dummy.enl is not corrupted. If there are problems with dummy.enl, and you don’t have the .Data directory for it, you are probably toast. However, since I back up like mad, this is not my situation. What this silent recovery means for me is that some of these basic files don’t need to be backed up. Specifically, all of the .frm files are more or less unchanging. For some reason, one file called csort.frm gets rewritten almost every time you open a library, but this is meaningless; the contents never change, so make sure it goes on the exclude list.

For the individual tables, this means that you need, at most, to back up the .myd and .myi files, representing the data and indexes for the MyISAM tables. For each of my Endnote libraries, I have a maximum of 6 tables under rdb: csort, jterms, misc, refs, refs_ext, and terms. Not ALL of these tables are essential for your library to work. Apparently csort saves information about the sort order you are using. This is pretty unimportant, yet all three MyISAM files (csort.myd, csort.myi, and csort.frm) are rewritten every time you open a library. Put it on your exclude list. jterms and terms are where Endnote keeps the info for your term lists. As long as your library is intact, these can be regenerated without any problem, so I exclude them too. refs holds all the info in your library: every field of every reference. So if they use this, what’s the *.enl file for? Thoroughly redundant, but that’s Endnote’s business. I have no idea what refs_ext does, or how important it is; I’ll have to get back to you on that. The misc table is what it says: it includes petty things like the size and location of windows, and important things that represent a real investment in time, like groups and group sets.

This makes the misc table a real pain to back up. It is rewritten every time you open a library, so the dates change for misc.myd and misc.myi. Even so, the data file misc.myd sometimes doesn’t change at all; but because the misc table holds my group information, I absolutely want it backed up whenever I am working on groups.

I have one library with over 1,400 references in it; the groups are an essential part of my data for these references. Unfortunately, I can’t tell when this important data has changed (must back it up) and when just the windows have changed (who cares, no backup). Bad news for backer-uppers. Although the refs table is redundant and can therefore be skipped, I like redundancy for real data; besides, refs.myd apparently only changes when the .enl file changes, and since I back up .enl anyway, why not back up refs as well?

So the sticking point for the table data files is backing up misc, and doing it very, very often. You can’t rely on date stamps or file size to determine when to back it up.

In addition to the .myd data files, the .myi index files change constantly. It is very tempting to ignore these changes, since most of them are meaningless MyISAM table management, but beware: if you have added rows or changed indexed fields in any of these tables, and you don’t have the index that goes with them, Endnote will refuse to open your library with the warning: “This library appears to be damaged. Please verify that no other user has this library open simultaneously with write access.” It will then demand that you use the “Recover library” function to open the library. Recover library may or may not get back all your groups. I had at least one case where it did not, and this is a bad memory.

So here is my backup regime: exclude everything in the .Data\trash directory. Back up all *.enl files and all rdb\refs.myd files whenever they change. If you are messing with groups, you MUST back up the three misc files under rdb to be sure this data is safe. If I know I have not touched my groups in a blue moon, I happily skip this, but it means I cannot automatically put them on the exclude list. csort is meaningless and is a definite exclude, and terms and jterms are both excludable. If you back up a .myd file, you must back up its companion .myi file, or Endnote will make you “recover” your whole library; this consists of making a copy of the library called XXX-saved in the same directory you are working in, along with an associated .Data directory. I currently back up the refs_ext table on the same schedule as the misc table, because I don’t know what it’s for.

The long and short: exclude the trash directory, and exclude jterms, terms, and csort in the rdb directory, to cut down on your backup space and time. The .enl file plus refs.myd and refs.myi are must-haves. The files misc.myd and misc.myi should also be saved frequently when you are working on groups; otherwise, they are very skippable. All the *.frm files you need can be regenerated by opening the .enl file after renaming its .Data directory.
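Since rules like these are easy to forget, here is a rough sketch of how they could be scripted. It is Perl (what else), using only core modules (File::Find, File::Copy, File::Path); the source and destination paths are made up, and it simply mirrors a library’s .Data directory while skipping the trash directory, the regenerable csort/terms/jterms tables, and the *.frm files, per the summary above.

use strict;
use warnings;
use File::Find;
use File::Copy qw(copy);
use File::Path qw(make_path);
use File::Basename qw(dirname);
use File::Spec;

my $src = 'C:/EndNote/dummy.Data';     # hypothetical library data directory
my $dst = 'D:/Backups/dummy.Data';     # hypothetical backup destination

find(sub {
    return unless -f $_;
    my $rel = File::Spec->abs2rel($File::Find::name, $src);
    return if $rel =~ m{^trash[/\\]}i;        # skip everything under trash
    return if /^(csort|terms|jterms)\./i;     # skip the regenerable tables
    return if /\.frm$/i;                      # .frm files are recreated by Endnote
    my $target = "$dst/$rel";
    make_path(dirname($target));
    copy($File::Find::name, $target) or warn "copy failed for $rel: $!";
}, $src);

The .enl file itself lives next to the .Data directory rather than inside it, so it gets copied separately.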

So, you have saved a little bit of time on your backups and a little space as well. The real reason Endnote is a pain to back up is the programmers’ failure to factor the table data properly: groups, which are an essential part of a library, should have their own table, and sorting options should not. But as big Tony often said, “Whaddaya gonna do?”

Posted in Research methods, Software | Comments Off on Backing up and recovering Endnote libraries

I take Mag XJ700T for a ride

MAG XJ700T takes a fall

It was the end of the road for Mag XJ700T. I made a deal with the wise guys: NT$300 for Mag and his crew. I could hear Mag whimpering in the trunk and thought about a quick smack in the face with a tire iron, but it would have been messy. And who was going to hear him on that country road by the Nankang River?

Why sell out Mag? I admit he was useful once, but he was long out of it, and brittle to boot. The manufacturer’s label said made in 1999, but bits of him kept flaking off whenever I tried to pick him up. Plastic has a lot less staying power than I thought. Anyway, I knew then that it was time to put him out in the cornfield with Tough Tony and the others.

Posted in Quotidiana | Comments Off on I take Mag XJ700T for a ride

Spam storm

In the last week I’ve gotten over three thousand spam comments, and I’m now averaging several hundred a day. This is way too much for me to cope with, so I’ve turned off comments on all posts. I’ve never had an actual comment anyway, so I guess it’s not that big a loss.

Posted in Site news | Comments Off on Spam storm

Stupid programming tricks: #1

In programming, simple things often turn out to involve obscure problems and techniques. I run into these and spend hours tracking down what went wrong, then console myself with the thought that “Next time I’ll know how to do it.” Unfortunately, by the time next time rolls around, I’ve forgotten all about it. Leaving little text files filled with incomprehensible notations scattered all over my computer doesn’t seem to help. My great idea this time: post all this stuff on my blog. Then I can google it like I do everything else! Brilliant, eh?

Today’s trick combines two obscure problems in one common activity, a stupid twofer, so pay attention, you in the back. It is very common for my Perl scripts to fail to work the way I expect. When this happens, there are usually helpful error messages (always, always use strict and warnings) to tell me how I’ve screwed up. Unfortunately, these are not always as helpful as one might hope.

A typical case is doing something with a large database table, say ~25,000 rows. I want to modify one of the fields based on certain conditions, and it works except for 200 rows. For those, 200 error messages pop up, repeatedly informing me:

Use of uninitialized value $var in concatenation (.) or string at document x line y.

So something is wrong with my condition. But what? Where? I quickly pull a trick out of my stupid programmer’s bag and add a line printing the field “id” from my table to stdout, to find out which records are failing the condition, then run the script again. This is not helpful, because the field “id” prints 25,000 times, scrolling by so rapidly that I can barely see the longer lines that tell me there was an error. Undaunted, I pull another trick out of the bag and redirect output to my favorite, temp.txt, like so:

badscript.pl > temp.txt

This helpfully swallows up all the ids, while my screen still fills up with all the error messages, without any “id” field info to tell me where the mistake was. Ids in temp.txt, error messages on screen. Whoops. This is because Perl prints error messages to stderr, not stdout, and stderr is not redirected by >.

Let’s recap here in case you’re getting as confused as I did: what do I want to do? I want to send both stdout output (ids) and stderr output (error messages) to the same file. According to Microsoft, this can be done as follows:

cmd.exe 1> output.txt 2>&1

Now why didn’t I think of that? Finding this info is the real trick; if you don’t google the right combination of words, it takes a while. (hint: “how to redirect standard error stream in windows”)

Now for the twofer. When I open the file, I find things that I know can’t be right, for example:
….
id = xx
error msg
id = xx
error msg
error msg
error msg
id = xx
….

I am only doing one thing per row, so it is not possible for there to be multiple errors per row. Much pulling of hair and gnashing of teeth until I realize that this is a problem I met a year and a half ago, and it has to do with Perl’s use of buffering. (See here for an amusing account of just a few of the many problems this can cause.) The solution is to turn off buffering in the script before entering the loop that prints to stdout, like so:

$| = 1;

Good luck googling that one! In fact, it is impossible to find any of the special perlvars using Google. The only place you will find them is in perlvar, located on your computer as well as on the internet. But how do you know to look in perlvar? And in case you’re wondering why 1 turns off buffering instead of 0, it turns out we should think of this as turning on autoflush (turning on the pipe?). Anyway, the result of adding this gewgaw and running the script again is that temp.txt finally contains what I want: each id immediately followed by the error messages for whatever really went wrong.

Summary: to get a file with output and error messages in the order they actually appear on your screen, you must 1) turn on autoflush in your script ($| = 1;) and then 2) use the Microsoft way to redirect: script.pl 1> output.txt 2>&1.
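To make the whole thing concrete, here is a tiny self-contained sketch of the pattern. The script name, the rows, and the undefined field are all invented, but the two tricks (autoflush plus the 1>/2>&1 redirection) are exactly the ones above.

# check_rows.pl -- hypothetical stand-in for the real script
use strict;
use warnings;

$| = 1;   # turn on autoflush so the ids and the warnings interleave in order

# pretend these rows came out of the database; row 2 has the undefined field
my @rows = (
    { id => 1, title => 'fine' },
    { id => 2, title => undef  },
    { id => 3, title => 'fine' },
);

for my $row (@rows) {
    print "id = $row->{id}\n";                      # goes to stdout
    my $line = "title: " . $row->{title} . "\n";    # the undef row warns here, to stderr
    print $line;
}

Run it as check_rows.pl 1> temp.txt 2>&1, and temp.txt shows each id followed immediately by any warning it produced, which is exactly what I wanted.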

I tell you, when it comes to stupid programming tricks, they don’t come much stupider than mine~~

Bonus obscurantism! If you really want to dazzle the rubes, don’t do this:
$| = 1;
try this instead:
$|++;
For a nice discussion of this and other gobsmackers, go to www.perlmonks.org and search for “Perl Idioms Explained”. I immediately printed all of them and pasted them on my wall.

Posted in Programming | Comments Off on Stupid programming tricks: #1