February 8, 2011

Moving to a new SSD

At work, we recently got new solid-state drives (SSDs). I've been pretty happy with my current Windows 7 installation on my ThinkPad T61 laptop, so I wanted to move to the new drive by cloning the existing one. One challenge was that the old drive had a larger capacity than the new drive (320GB vs. 256GB). This post summarizes how I easily cloned my old drive to the SSD.


  1. Make sure that the total amount of used space on the old drive is less than the size of the new drive. Delete or move any large content necessary to get to this point.
  2. Use the Disk Management tool in the Windows 7 control panel (descriptive text is Create and format hard disk partitions) to shrink the primary partition of the old drive so that it is smaller than the new drive.
    • When you right-click on the partition and choose Shrink Volume, Windows will think for a while and then tell you the maximum amount of space that can be removed. If this is not enough space, you'll need to identify the unmovable files that are preventing further shrinkage and either remove them or make them movable, as described in the next bullets.
    • The Windows Event Viewer records an event with type 259 that references the unmovable file that is preventing the partition from being shrunk further. When I went through this process, I ran into three types of files that blocked my progress:
      • The Windows pagefile.sys file. This is c:\pagefile.sys. You can remove the pagefile by turning off the page file in the Advanced system settings area of the System control panel. Click through to Advanced > Performance > Settings… > Advanced > Change… and choose No paging file. After you restart Windows, you'll be able to remove pagefile.sys.
      • The Windows hiberfil.sys file. This is c:\hiberfil.sys, and is used for system hibernation. Disable system hibernation in the Power Options control panel and then delete the hibernation file.
      • Windows Search index files. These files have a file extension of .wid. I was able to remove them after disabling the Windows Search service.
    • After removing / moving an unmovable file, repeat the attempt to shrink the partition until the partition can be shrunk to a size smaller than the new SSD's capacity.
  3. Connect the new SSD. I did this using a drive bay in place of my DVD drive. You could use an external USB enclosure as well, but the clone will be much slower.
  4. Create a bootable Linux CD or USB stick. I used a colleague's existing USB stick, but something like Knoppix would work just as well.
  5. Restart your machine and boot into Linux.
  6. Run fdisk -l to check which device is which drive. In my case, my old drive was /dev/sda and the SSD was /dev/sdb.
  7. Use dd to clone the old drive to the new drive: dd if=/dev/sda of=/dev/sdb bs=16M
    • Make sure that the if parameter points to the old drive and the of parameter to the new drive.
    • I used a block size of 16M but you could probably even go smaller.
    • Note that dd doesn't give any progress indication, at least not without jumping through additional hoops.
    • At the end of the cloning, dd will give you an error because the new drive is out of space while the old drive is not yet finished. This is expected and not a problem given that the remaining space on the old drive is unallocated.
  8. Swap drives so that the SSD is the main laptop drive and boot the computer. In my case, Windows prompted me to restart once and then everything was all set to go.
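
As a rough check for step 2, the following Ruby sketch converts a drive's advertised capacity into the binary megabytes in which Disk Management reports sizes. It assumes that capacities are advertised in decimal gigabytes and that the drive holds exactly its nominal capacity; a real SSD may differ slightly, so leave some margin. The method names are made up for this illustration.

```ruby
# Assumption: drives are advertised in decimal gigabytes (10**9 bytes),
# while Disk Management shows sizes in binary megabytes (2**20 bytes).
BYTES_PER_GB  = 10**9
BYTES_PER_MIB = 2**20

# Nominal capacity of a drive in binary megabytes, rounded down.
def capacity_in_mib(advertised_gb)
  advertised_gb * BYTES_PER_GB / BYTES_PER_MIB
end

# True when the last partition ends before the target drive does.
# partition_end_mib is the offset of the end of the last partition,
# measured from the start of the old drive.
def fits?(partition_end_mib, target_gb)
  partition_end_mib < capacity_in_mib(target_gb)
end

puts capacity_in_mib(320)   # old drive: prints 305175
puts capacity_in_mib(256)   # new SSD:   prints 244140
puts fits?(200_000, 256)    # prints true
puts fits?(250_000, 256)    # prints false
```

Under these assumptions, everything on the old drive has to end below roughly 244,000 MB as Disk Management counts them, not 256,000.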

Really easy to do. Let me know if you have any questions!

December 27, 2006

Playing Fetch with the DAWG

The summary: I was looking for an easy way to search through minutes of the DAWG, given that some but not all of the minutes are reproduced in plain text within a mailing list message. All minutes are (in one way or another) URL accessible, however, so I set up Apache Nutch to crawl, index, and search the minutes. I learned stuff along the way, and that's what the rest of this post shares.

One of the first things I'm doing as I'm getting up to speed in my new role as DAWG chair is finding the issues the DAWG has not yet resolved and determining whether we're on target to address the issues. One of the issues raised a few months ago was the syntactical order of the LIMIT and OFFSET keywords within queries. I had remembered that the group had reached a decision about this issue, but did not remember the details. I wanted to find the minutes which recorded the decision.

I could have searched the mailing list for limit and offset and probably found what I needed by perusing the search results. But not all minutes make it into mailing list messages as something other than links or attachments, and I didn't want to wade through general discussion. I'd rather be able to search the minutes explicitly. So here's what I did:

(I work in a Windows XP environment with a standard Cygwin installation.)

  1. Updated the DAWG homepage, adding links to minutes of the past few months' teleconferences.
  2. Dug up a script I'd written last year to pull links from a Web page where the text of the link matches a certain pattern. Invoked this script with the pattern '\d+\s+?\w{3}' against the URL to pull out all the links to minutes from the Web page. This heuristic approach works well, but it would feel far more elegant to have the markup authoritatively tell me which links were links to minutes. Via RDFa, perhaps. I redirected the list of links produced by this script to the text file, dawg-minutes/root-urls/minutes.
  3. Downloaded the latest version of Apache Nutch and unzipped it, adding a symlink from nutch-install-dir/bin/nutch such that nutch ended up in my path.
  4. Followed instructions #2 and #3 from the Nutch user manual. This involves supplying a name to the user agent which Nutch crawls the Web with and also specifying a URL filter that decides which pages to crawl (or which pages not to crawl). To be on the safe side, I added these two lines to nutch-install-dir/conf/crawl-urlfilter.txt:
  5. The next step was to crawl the list of links I had already generated. I didn't want to follow any other links from these URLs, so this was a pretty simple invocation of Nutch. I did get trapped for a bit by the fact that earlier versions of Nutch required the command-line argument to be a text file with the list of URLs while the current version requires the argument to be the directory containing lists of links. I ended up invoking nutch as:
      cd dawg-minutes ; nutch crawl root-urls -dir nutch/ -depth 1
    This fetched, crawled, and indexed the set of DAWG minutes (but no other links thanks to the -depth 1) and stored the resulting data structures within the nutch subdirectory.
  6. At this point, I had (still unresolved) trouble getting the command-line search tool to work:
      nutch org.apache.nutch.searcher.NutchBean apache
    Regardless of the working directory from which I executed this, I always received Total hits: 0. This problem led me to discover Luke, the Lucene Index Toolbox, which confirmed for me that my indexes had been properly created and populated.
  7. I pressed ahead with getting Nutch's Web interface set up. I already had an installation of Apache Tomcat 5.5, so no installation was needed there. Instead, I copied the file nutch-install-dir/nutch-version.war to nutch.war at the root of my Tomcat webapps directory.
  8. I started Tomcat from the dawg-minutes/nutch directory (where Nutch had put all of its indexes and other data structures), and launched a Web browser to http://localhost:5000/nutch. (The default Tomcat install runs on port 8080, I believe; I have too many programs clamoring for my port 8080.)
  9. The Nutch search interface appeared, but again any searches that I performed led to no hits being returned!
  10. Some Web searching led me to a mailing-list message which suggested investigating the searcher.dir property in webapps/nutch/WEB-INF/classes/nutch-site.xml. I added this property with a value of c:/documents and settings/.../dawg-minutes/nutch and restarted Tomcat.
  11. All's well that ends well.
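
The link-pulling script from step 2 is not shown in this post, so here is a minimal Ruby sketch of the idea rather than the script itself. It assumes the page is already available as a string, uses a deliberately naive regular expression instead of a real HTML parser, and runs against made-up sample markup with example.org URLs. Only the pattern '\d+\s+?\w{3}' comes from the step above.

```ruby
# Naive anchor matcher: captures the href value and the link text.
# Assumption: hrefs are quoted and anchors are not nested.
LINK = %r{<a\s[^>]*?href\s*=\s*["']([^"']+)["'][^>]*>(.*?)</a>}mi

# Return the hrefs of all links whose text matches the given pattern.
def matching_links(html, pattern)
  html.scan(LINK)
      .select { |_href, text| text =~ pattern }
      .map { |href, _text| href }
end

# Hypothetical sample page; the real input would be the DAWG homepage.
sample = <<~HTML
  <ul>
    <li><a href="http://example.org/minutes/2006-12-19">19 Dec 2006</a></li>
    <li><a href="http://example.org/issues">Issues list</a></li>
    <li><a href="http://example.org/minutes/2006-12-12">12 Dec 2006</a></li>
  </ul>
HTML

# Prints the two minutes URLs, one per line, and skips the issues link.
puts matching_links(sample, /\d+\s+?\w{3}/)
```

Redirecting the output of a script like this into a file under dawg-minutes/root-urls gives Nutch the seed list used in step 5.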

So I ran into a few speed bumps, but in the end I've got a relatively lightweight system for indexing and searching DAWG minutes. Hooray!

Searching the DAWG minutes with Apache Nutch

December 16, 2005

Ruby on Rails, ActiveSupport, and EBADF

While trying to set up Ruby on Rails in my Cygwin environment:

panicale:~/My Documents/ruby/sandbox 28>ruby script/server 
/usr/lib/ruby/gems/1.8/gems/activesupport-1.2.5/lib/active_support/core_ext/kernel.rb:53:in ``': Bad file descriptor - lighttpd -version (Errno::EBADF)
        from /usr/lib/ruby/gems/1.8/gems/activesupport-1.2.5/lib/active_support/core_ext/kernel.rb:53:in ``'
        from /usr/lib/ruby/gems/1.8/gems/rails-1.0.0/lib/commands/server.rb:15
        from /usr/lib/ruby/gems/1.8/gems/rails-1.0.0/lib/commands/server.rb:15:in `silence_stderr'
        from /usr/lib/ruby/gems/1.8/gems/rails-1.0.0/lib/commands/server.rb:15
        from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require__'
        from /usr/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:21:in `require'
        from /usr/lib/ruby/gems/1.8/gems/activesupport-1.2.5/lib/active_support/dependencies.rb:214:in `require'
        from script/server:3

Web searches proved fruitless, and so I dug into the cause of the error by hand. Line 53 in ActiveSupport's kernel extension is an override of the backtick command to make a nonexistent command behave similarly in both Unix and Windows environments. But my problem is not a nonexistent command, so this error is likely not a function of this particular override of backticks. So we add the line STDOUT.puts "** RUNNING COMMAND: #{command}" before the actual execution of the command. Let's see during what command we're faulting. This yielded: ** RUNNING COMMAND: lighttpd -version. To the command line we go!

But wait, what's this?

panicale:~ 34>which lighttpd
lighttpd: Command not found.

Checking into line 15 of server.rb we see that Rails is attempting to start lighttpd, and in the absence thereof (actually, if something is written to standard error while attempting to ascertain the version of lighttpd), to go with the built-in WEBrick web server. I don't have lighttpd, and yet it seems that someone (Windows? Cygwin?) is throwing a bad file descriptor error (EBADF) instead of a nonexistent command error (ENOENT) when Ruby tries to execute lighttpd -version. Why? I have no idea.

The fix is easy; simply extend ActiveSupport's transformation from exception to STDERR output to apply to EBADF in addition to ENOENT:

<   rescue Errno::ENOENT => e
>   rescue Errno::ENOENT, Errno::EBADF => e
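
To show the shape of that change outside of Rails, here is a small standalone Ruby sketch. The helper name capture_or_warn and the command name are hypothetical, and this is not ActiveSupport's actual code; it only illustrates rescuing both error classes around a backtick call and turning the exception into a message on standard error.

```ruby
# Run a command with backticks and return its output.
# If the command cannot be started, write the error to standard error
# and return nil instead of letting the exception propagate.
def capture_or_warn(command)
  `#{command}`
rescue Errno::ENOENT, Errno::EBADF => e
  # ENOENT is the usual "no such command" error; EBADF is what the
  # Cygwin setup described above raised for the same situation.
  STDERR.puts "#{$0}: #{e}"
  nil
end

# Hypothetical command that should not exist on any machine.
result = capture_or_warn("no-such-command-for-this-sketch -version")
puts result.nil? ? "command missing, falling back" : result
```

A caller can then treat a nil result the same way on every platform, which is the behavior Rails relies on when it falls back to WEBrick.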