Thursday, November 20, 2008

Extracting Files From Partial RAR Set

I came across an interesting problem today, and the solution was surprisingly easy. The facts were these: I needed a file, about the size of a CD .iso file. Since I love the newsgroups so much, that's where I started my search. I couldn't find the file by itself, but I did find it inside a pretty large set of .rar files. In fact, the full set was well over 7 GB, and I really didn't want the whole thing.

I thought I might have an advantage, however. You see, .rar files are kind of like .tar files: just a collection of a bunch of files stuck together. Two of the biggest differences are that .rar files have pretty decent compression whereas .tar files have no compression, and .rar files are pretty easy to split up, which comes in handy with files that are large enough to span multiple newsgroup messages.

I had an .nzb file which held the Usenet locations of 160 .rar files. I knew that there were 20 files spread across this set, and I thought that the naming convention might be predictable. If I was right, I would need to find file #12. With 160 parts and 20 files, each file should take up approximately 8 .rar parts, which meant that I would start searching at part096.rar. I downloaded it, hopped into a bash prompt and ran:
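The estimate above can be sketched as a small shell function. This is only a rough illustration: estimate_parts is a made-up name, and it assumes every file is the same size and that the files are stored in name order, neither of which is guaranteed (and the second turned out to be wrong here).

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess which .rar parts hold the Nth file.
# Assumes equal-sized files stored in name order (both are guesses).

# estimate_parts TOTAL_PARTS TOTAL_FILES FILE_NUMBER
# Prints "FIRST LAST": the estimated first and last part numbers.
estimate_parts() {
    local total_parts="$1" total_files="$2" file_number="$3"
    local per_file=$(( total_parts / total_files ))
    local first=$(( (file_number - 1) * per_file + 1 ))
    local last=$(( file_number * per_file ))
    echo "$first $last"
}

estimate_parts 160 20 12   # prints "89 96"
```

Under those assumptions file #12 would sit in parts 89 through 96, so part096.rar is the tail end of the estimated range and a reasonable first probe.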

strings part096.rar | grep Filename

It came up with Filename09.iso. I was a little off, but I knew I was close. I decided to jump ahead a little and try again with part110.rar. With it downloaded, I checked again:

strings part110.rar | grep Filename

It came up with Filename12.iso. Paydirt! I knew that I was in the middle, because it only showed one file. If it showed two, then I would know I was at the beginning or the end. So I grabbed the file before it and the one after it, and checked those with the strings command. I continued until I found the beginning and ending files. Eventually I found that my file started at part100.rar and went to part112.rar. My guess is the files were actually a little out of order (Filename09.iso was apparently immediately before Filename12.iso), and I just got lucky.
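The one-name-versus-two-names check can be scripted so that each downloaded part is labeled automatically. The sketch below is an assumption-laden variation, not the exact commands used above: it swaps strings | grep for grep -a -o so that only the matched name is printed, the function names are invented for illustration, and it relies on the stored filenames appearing as plain text in each part, which was the case here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for probing downloaded parts.

# names_in_part FILE PATTERN
# Prints each distinct filename matching PATTERN found in the raw bytes of FILE.
names_in_part() {
    LC_ALL=C grep -a -o -e "$2" -- "$1" | sort -u
}

# classify_part FILE PATTERN
# Prints "none", "middle" (one name), or "boundary" (two or more names).
classify_part() {
    local count
    count=$(( $(names_in_part "$1" "$2" | wc -l) ))
    if [ "$count" -eq 0 ]; then
        echo "none"
    elif [ "$count" -eq 1 ]; then
        echo "middle"
    else
        echo "boundary"
    fi
}

# Label every part that has been downloaded into the current directory.
for part in part*.rar; do
    [ -e "$part" ] || continue   # skip the literal glob when nothing matches
    printf '%s: %s\n' "$part" "$(classify_part "$part" 'Filename[0-9]*\.iso')"
done
```

A part labeled "boundary" holds the end of one file and the start of the next, so the first and last parts of the wanted file should both show up that way.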

While my files were downloading, I was researching whether I could actually extract files from a partial .rar set. I found a few odd forum posts asking whether it could be done. The first one had a response with a Windows utility that claimed to do just that, but I couldn't get it installed under Wine in Linux. The rest of the messages had several responses saying that it was absolutely impossible, and that the person asking was an idiot for thinking such a thing was possible.

The few responses that offered information as to why it wouldn't be possible were riddled with obviously technically inaccurate information, which is often a sure sign that it is entirely possible, if perhaps somewhat difficult. As it turns out, I had little to worry about.

With my files downloaded, I scoured the man page for rar, looking for options to turn off things like file verification, etc. Each time I specified the exact name of the file that I wanted extracted. Nothing worked. But I couldn't help but notice that the following command did work without any problems:

rar t part100.rar

This command listed the file in question, and said that it was okay. Eventually, I went out on a limb and tried extracting it using the most basic options possible, and forgot about the filename that I needed extracted:

rar e part100.rar

Bingo! I got errors about the partial files before and after the one I wanted, but mine extracted without a problem.
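Since the extraction depends on every part of the wanted file being on disk, it may be worth checking for gaps in the range first. The following is a minimal sketch: missing_parts is a hypothetical helper, and it assumes the parts are named partNNN.rar with three-digit numbers, as they were in this set.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list any parts in a numeric range that are absent.

# missing_parts DIR FIRST LAST
# Prints the name of every partNNN.rar from FIRST to LAST not found in DIR.
missing_parts() {
    local dir="$1" n="$2" last="$3" name
    while [ "$n" -le "$last" ]; do
        name=$(printf 'part%03d.rar' "$n")
        [ -e "$dir/$name" ] || echo "$name"
        n=$(( n + 1 ))
    done
}

# Check the range that held the wanted file in this post.
missing_parts . 100 112
```

When this prints nothing, all thirteen parts are present and rar e part100.rar has everything it needs for the one complete file.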

I saved myself from spending hours and hours downloading a bunch of files I didn't want, just so that I could get a single file that I did want. All of my guesses paid off, as did ignoring the morons that would think me an idiot for trying something that seemed to make perfect sense. Funny how that all works, isn't it?
