Monday, September 29, 2008

Weighted Keywords

I was thinking back today to an old company that I used to work for. As far as I know, the company went under a while ago, and even if they were still around, it was a long time ago and I never signed any non-disclosure or non-compete agreements, so I'm thinking there's no harm in talking about one of the concepts behind the product that they offered. Maybe somebody will have some use for it.

The idea was simple: a family-safe Internet filter. It wasn't just supposed to handle pornography. It had several other categories that it looked at, including gambling, shopping, games, hate and violence, even lingerie and the like. It would filter sites based on a black list (sites that we knew always matched a category), a white list (sites that we knew would never match a category) and sites that scored high enough using a weighted keyword list.

The black and white lists have always been common in blocking software. If a site is on the black list, it's bad, end of story. If it's on the white list, it's safe to look at. The weighted keywords were the really interesting part. A team of people looked at various sites that they knew to be bad (or, in our case, that matched a certain category) and found keywords that were more likely to indicate whether a site matched that category.

Seeing an opportunity to automate the process, I wrote a script that would Google for a specific query (related to a specific category), hit the first 100 pages that were returned, and count how many times each word appeared across all of those pages. It wasn't long before I added a list of "commonly-used words" that were pretty much useless to count ("the", "of", "and", etc.). I saved the results in a series of text files, each containing both the keyword and the count, and sent those files to the team leader. To this day I will never understand why he didn't think the count was important, but he liked having the words. It only took me a few minutes to write the script, but it saved him hours of trouble.

I never found out how they actually weighted the words. I assume they made a judgment call based on how relevant they thought each word was. In other words, the data was completely subjective rather than statistical. This makes sense to a degree. Lots of sites with adult content are likely to contain the word "breast". But there are also sites, CNN included, that publish articles on breast cancer, which a parent might consider perfectly fine for their child to view. The word "breast" might get a score, but it will be a low score. The appearance of "XXX" or some profanity related to adult material is going to receive a much higher score, because "safe" sites like CNN are far less likely to have those words appear on their pages. If a page reached a high enough score for a particular category, it could be blocked.
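
Here's a minimal sketch of what I mean, in Perl since that's what I use below. The weights and the threshold are numbers I made up for illustration, not anything the company actually used:

#!/usr/bin/perl
# Minimal sketch of weighted keyword scoring. The weights and the
# threshold are made-up numbers, purely for illustration.
use strict;
use warnings;

my %weight    = ( breast => 1, casino => 5, xxx => 25 );
my $threshold = 50;

# Score a page given a hash of word => count (like the one produced
# by the wordcount() sub further down in this post).
sub page_score {
    my (%words) = @_;
    my $score = 0;
    $score += ( $weight{$_} || 0 ) * $words{$_} for keys %words;
    return $score;
}

my %news_article = ( breast => 4, cancer => 10 );    # e.g. a CNN health story
print page_score(%news_article) >= $threshold ? "block\n" : "allow\n";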

Subjective data might be relevant, but I think that statistical data is far more important than this team leader gave it credit for. But I wonder how much of that statistical work can be automated. I would still want to throw away certain keywords. Words like "of", "and" and "the" can safely be ignored, because they will likely show up in every category. I'm also not interested in numbers: chunks of text that only contain digits can be tossed. I would probably even add prepositions to the list.

That leaves us with several other very generic words that I'm afraid to throw away. Do I throw away the word "cool"? Maybe not, if the page is talking about climate or weather. Then again, it's such a generic word otherwise ("that casino was cool", "that beach party was cool", "that fight was cool", "that Perl script was cool") that maybe it will just confuse things anyway. I haven't decided yet how to handle those words.

Once we've thrown away the overly-generic keywords, we're left with a bunch of words that may or may not be relevant. Tagging a bunch of pages with the same category might help, which is what I did for that team leader: 100 pages' worth of keywords that were returned when I searched for something related to gambling, or shopping. Rather than seeing a specific word show up 3 times on one page, maybe it showed up 73 times across 100 pages. But it seems to me that maybe we could get the computer to do a little more work for us. It would take longer, but it might produce more accurate results.
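
As a rough sketch of the aggregation, assuming each fetched page produces a word => count hash like the one my script below builds, combining them per category is trivial:

use strict;
use warnings;

# Assume one hashref of word => count per fetched page (e.g. from wordcount()).
my @per_page_counts = (
    { poker => 3, chips => 1 },
    { poker => 7, casino => 2 },
);

# Merge the per-page counts into a single per-category total.
my %category_total;
for my $page (@per_page_counts) {
    $category_total{$_} += $page->{$_} for keys %$page;
}

printf "%-10s %d\n", $_, $category_total{$_} for sort keys %category_total;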

Tagging is a big buzzword right now. Sites like Amazon are allowing users to add their own tags to items to build up relevancy databases. It probably took a few weeks' worth of manpower to write the code, but once it was up and running they had millions of users performing free labor for Amazon. Now when those users search for something, assuming it was properly tagged, the likelihood of something relevant being returned is increased. One way to look at it is as a community effort. In Amazon's case, I would also look at it as free labor.

On a much smaller scale, Firefox 3 now supports tagging bookmarks. Unfortunately, their effort is little more than an afterthought and their implementation has little to no actual usefulness. When you "organize bookmarks", you can sort by tag. That's it. FF3 has no built-in tools to make any more use of tags. It's almost as bad as tagging in Blogger. The effort was poor enough that I'm quite honestly surprised that they bothered in the first place. It would have been far better to adopt GMail's label scheme, but I'm sure there are plenty of reasons why that would not be feasible (starting with the fact that Mozilla really seems to love storing bookmarks in an inherently-limiting HTML file).

Still, the tags are available. And there are plenty of other social bookmarking sites that handle tags somewhat better. If you are diligent in properly tagging your bookmarks, you're off to a good start: you have a set of data from which to work. That means you're already ahead of me, since I haven't bothered much with Firefox's poor bookmark tag support. But that doesn't mean I haven't tossed together some Perl code to start counting words.

This code makes use of the elinks program, which can conveniently strip the HTML out of a web page and render it as plain text, much the way you would see it in a regular browser, minus the images. It uses a file called common-words.txt which contains a series of articles and prepositions, one per line. When it finishes, it dumps the word count to the screen. It does nothing else at the moment, but it might be useful to you.

#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my $url = shift or die "usage: $0 <url>\n";

# Build a lookup hash of words we never want to count.
my %common_words;
$common_words{$_} = 1 for split /\n/, slurp('common-words.txt');

# Let elinks fetch the page and strip the HTML for us.
my $contents = `elinks -dump -no-numbering -no-references $url`;
my %words    = wordcount($contents);
print Dumper \%words;

exit;

# Read an entire file into a single string.
sub slurp {
    my ($file) = @_;
    open my $fh, '<', $file or die "Can't open $file: $!";
    local $/;
    return <$fh>;
}

# Count every word in a chunk of text, skipping the common words,
# anything that is all digits, and anything that strips down to nothing.
sub wordcount {
    my ($text) = @_;
    my %count;
    while ( $text =~ s/(\S+)\s*//s ) {
        my $key = lc $1;
        $key =~ s/\W//g;
        next unless length $key;
        next if $common_words{$key};
        next if $key !~ /\D/;
        $count{$key}++;
    }
    return %count;
}

Here are my thoughts. When a page is bookmarked and tagged, do a word count. Save the word count in a database and associate it with the tag and the page. As you diligently tag and wordcount pages, the database will become more useful. After a certain point in time, when you look at a page, the database should have enough information to suggest what tag or tags might be most appropriate. The more pages are tagged properly, the more accurate the computer's suggestions will become. Start sharing the database with enough users that are also diligently tagging, and the time it takes to produce accurate results will decrease.
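
To make that concrete, here's a rough Perl sketch of the kind of thing I have in mind, using SQLite through DBI. The schema and the scoring are my own assumptions, not a finished design:

#!/usr/bin/perl
# Rough sketch of the tag/word-count database idea, using SQLite via DBI.
use strict;
use warnings;
use DBI;

my $dbh = DBI->connect( 'dbi:SQLite:dbname=tagwords.db', '', '',
    { RaiseError => 1, AutoCommit => 1 } );

$dbh->do('CREATE TABLE IF NOT EXISTS tag_words
          (tag TEXT, word TEXT, count INTEGER, PRIMARY KEY (tag, word))');

# Call this whenever a page is bookmarked and tagged: fold its word
# counts (a word => count hashref) into the running totals for that tag.
sub record_page {
    my ( $tag, $words ) = @_;
    for my $word ( keys %$words ) {
        $dbh->do( 'INSERT OR IGNORE INTO tag_words (tag, word, count) VALUES (?, ?, 0)',
            undef, $tag, $word );
        $dbh->do( 'UPDATE tag_words SET count = count + ? WHERE tag = ? AND word = ?',
            undef, $words->{$word}, $tag, $word );
    }
}

# Given word counts for an untagged page, score each known tag by how much
# its accumulated vocabulary overlaps with the page, and return the top three.
sub suggest_tags {
    my ($words) = @_;
    my %score;
    my $rows = $dbh->selectall_arrayref('SELECT tag, word, count FROM tag_words');
    for my $row (@$rows) {
        my ( $tag, $word, $count ) = @$row;
        $score{$tag} += $count * $words->{$word} if exists $words->{$word};
    }
    my @best = sort { $score{$b} <=> $score{$a} } keys %score;
    return @best > 3 ? @best[ 0 .. 2 ] : @best;
}

The scoring there is deliberately naive (overlap weighted by counts), but even a crude measure like that should be enough to see whether the suggestions improve as more tagged pages are fed in.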

This could certainly be used to construct a family-safe Internet filter, but it would require that the surfer(s) look at a lot of inappropriate material. And my guess is that somebody who spends that much time looking at inappropriate material isn't terribly interested in a "family-safe Internet experience". What I think it is useful for is helping users easily and accurately manage their own bookmarks. I guess that raises the question: is it worth that much effort for one person to handle their own bookmarks that way? I guess it depends on how much you surf.

Let me know if you have any other thoughts on this. I think it could potentially be useful, especially if implemented with a group of people.

Friday, September 26, 2008

Command Line DVD Authoring: Part 4

This is part of a multi-part series on creating DVDs manually from the command line. Regular users aren't expected to do their video editing or DVD authoring from the command line; rather, this guide is intended for programmers who want to build a front-end for DVD authoring and don't want to sift through miles of documentation just to get the basics. This guide makes use of command-line utilities that are already freely available, but it is not meant to be a complete set of documentation for any of them. Instead, consider it a primer. The parts in this series are:

Part 1: Editing a Video File with MPlayer
Part 2: Converting a Video to DVD Format
Part 3: Making a DVD Menu
Part 3.1: Extracting Audio From A Video
Part 4: Building a DVD .iso File

It should be noted that while the programs themselves should remain much the same between Linux distros, the names of the packages are likely to differ. This tutorial was written using Ubuntu 8.04 as the reference OS, so if you use a different distro, your mileage may vary.


Part 4: Building a DVD .iso File

Part 3 dealt with building a menu for our DVD, using components assembled in Part 2 (and possibly in Part 3.1, depending on your needs). This part takes us the rest of the way, putting it all together into a DVD .iso file, suitable for burning.

Once we have our menu set up, we need to create an XML file that describes how it all fits together. We use the makexml command (part of the tovid package from Part 3) to do this. A typical command to go along with the previous makemenu command might look like:

/usr/share/tovid/makexml -dvd \
-menu mydvd.mpg \
shownumber1.mpeg2 \
shownumber2.mpeg2 \
-out mydvd

This is pretty straightforward. The program is designed to work with either DVDs or VCDs, so we tell it which one we want with the '-dvd' option. The '-menu' option tells it which video file to use for the menu (the one that we created in Part 3). We follow it with the actual video files that our DVD features, making sure they appear in the same order as we specified with the makemenu command. Lastly, we use the '-out' option to give it an output filename (it will automatically append .xml to the end).

Once we have our XML file in place, you can tweak it to your heart's content, but at this point we're ready to actually build the DVD file and directory structure. Again, the tovid package provides the perfect tool for this: makedvd. As it turns out, this command is the simplest yet:

/usr/share/tovid/makedvd mydvd.xml

The XML file describes how the menu is laid out, which video files are used and where they appear on the disc: everything that makedvd needs to lay the files out the way they must appear on the DVD. Running it creates a subdirectory with the name of the XML file, minus the .xml extension. Check it out and you'll see an AUDIO_TS and a VIDEO_TS directory, and the VIDEO_TS directory looks exactly the way you would expect. But you can't just burn it off like this and hope that it's going to work. We have one more step.

The genisoimage package provides us with the genisoimage command. The package also sets up a symlink to that command called mkisofs, a name you may already be familiar with, even though that name comes from a different package. Our command is pretty simple:

genisoimage -dvd-video -o mydvd.iso mydvd

The '-dvd-video' option is the important one here: it tells genisoimage to prepare a proper DVD-Video filesystem, with the correct file sorting, padding, and so on. Without this option, your DVD may not work as well as you'd hope. The '-o' option, of course, names the output file, and the last argument is the directory that we created with the makedvd command.
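
Since the point of this series is really to help people build front-ends, here's a rough Perl sketch of this whole part chained together. The filenames are just the examples used above:

#!/usr/bin/perl
# Sketch: wrap Part 4 (makexml -> makedvd -> genisoimage) in Perl.
# Filenames are the examples from this article; adjust to taste.
use strict;
use warnings;

my $menu  = 'mydvd.mpg';
my @shows = ( 'shownumber1.mpeg2', 'shownumber2.mpeg2' );
my $name  = 'mydvd';

# Each step returns non-zero on failure, so bail out early if one breaks.
system( '/usr/share/tovid/makexml', '-dvd', '-menu', $menu, @shows, '-out', $name ) == 0
    or die "makexml failed: $?";
system( '/usr/share/tovid/makedvd', "$name.xml" ) == 0
    or die "makedvd failed: $?";
system( 'genisoimage', '-dvd-video', '-o', "$name.iso", $name ) == 0
    or die "genisoimage failed: $?";

print "Wrote $name.iso\n";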

When you're finished, go ahead and clean up the directory that makedvd created, and you're good to go! You have an .iso file that can be burned onto DVD using any major CD/DVD burning software. If you want to test it before burning it, to make sure you're not wasting blank discs on DVDs that still need tweaking, just mount it temporarily and test it in your favorite DVD viewing program:

mount -t iso9660 -o loop mydvd.iso /mnt

This will mount it in the /mnt directory, which is where you want to point your viewer. When you're finished, just umount it:

umount /mnt

...and if you like what you see, go ahead and burn off a copy.

I hope this series of articles gave you a little more insight into using various Linux utilities to create your own DVDs. Those of you comfortable using Perl, Python, Ruby or whatever to build GUIs may find these instructions invaluable in building your own frontends for whatever purpose you may have. As always, be sure to check the man pages and the project web sites for any additional documentation that I didn't cover (there's a lot). If you end up creating a front-end based on these instructions and wouldn't mind sharing it (and maybe the source code), let me know and I'll post a link here.

Thursday, September 25, 2008

Command Line DVD Authoring: Part 3.1

This is part of a multi-part series on creating DVDs manually from the command line. Regular users aren't expected to do their video editing or DVD authoring from the command line; rather, this guide is intended for programmers who want to build a front-end for DVD authoring and don't want to sift through miles of documentation just to get the basics. This guide makes use of command-line utilities that are already freely available, but it is not meant to be a complete set of documentation for any of them. Instead, consider it a primer. The parts in this series are:

Part 1: Editing a Video File with MPlayer
Part 2: Converting a Video to DVD Format
Part 3: Making a DVD Menu
Part 3.1: Extracting Audio From A Video
Part 4: Building a DVD .iso File

It should be noted that while the programs themselves should remain much the same between Linux distros, the names of the packages are likely to differ. This tutorial was written using Ubuntu 8.04 as the reference OS, so if you use a different distro, your mileage may vary.


Part 3.1: Extracting Audio From A Video

I have been burning off a bunch of TV shows onto DVD, for personal use. These are largely shows that I do not expect to be released commercially on DVD. When a show I do like is released on DVD, I prefer buying a nice, clean, professional copy to just watching my homemade copy with TV logos all over it. But when I do put together my own, sometimes I like to use the theme song for the menu music. I've also seen commercially-produced DVDs that just use audio segments from the shows for the menu audio. It's easy to extract the audio, and in fact, you already know how to do most of it.

First, you need to block out the section of audio that will be used. This is as simple as creating an edl file. Use MPlayer to find the section that you're looking for, press 'i' to mark the beginning, find the end, then press 'i' again to mark the end. Fine-tune it the same way I showed you in Part 1, and when you're ready, use MEncoder to cut out that single piece of video. The command line will look something like this:

mencoder myoriginalvideo.mpg -of mpeg -oac copy -ovc copy \
-o mycutvideo.mpg -edl myedlfile.edl

Part 2 explained what these specific command line options do, so by now you know that you're basically creating a video that only contains the clip that you want to extract the audio from. Since you're just making a frame-by-frame copy of both the audio and the video, no re-encoding needs to happen, and that means no loss of quality.

For the next step, you need to install a lovely little package called transcode, which includes a utility called tcextract. This program gives us the ability to extract either audio or video from a stream, and save it in whatever format you like. The default is to save it in whatever format it detects inside the stream, but if it's having problems detecting it, you may need to specify it. First, let's take a look at a sample command line for extracting audio:

/usr/bin/tcextract -i myshow.mpg -a 0 -x mp3 2> /dev/null > myshowmusic.mp3

The '-i' option specifies the input file. The -a option specifies which audio or video track to rip (there will likely be only one, which would be track 0) and -x specifies the output format (default is whatever the original was encoded in). Because people are used to mp3 files, I just went with that. If you're *nix-savvy, you already know that '2> /dev/null' is throwing away STDERR messages, and '> myshowmusic.mp3' is dumping the STDOUT to a file. The tcextract program doesn't have an output file option, it just sends it through STDOUT, which is useful for all sorts of piping operations.

You might also be interested in the command line to extract just the video. This is because some DVRs (including MythTV) may store video in standard MPEG2 files, but they also add little index markers to help with playback. This results in a non-standard file which isn't going to play well with some players. Splitting the audio and video and then recombining them is an effective way to drop these index markers. The video command might look like this:

/usr/bin/tcextract -i myshow.mpg -x mpeg2 2> /dev/null > myshowvideoonly.mpeg2

The command line is pretty close to the audio one; it just asks for a different stream format. When you're finished extracting both the audio and the video, you can use the mplex utility (provided by the mjpegtools package) to combine them back together. The command line will look something like this:

/usr/bin/mplex -O -200 -f 8 -M -o myshow.mpg myshowvideo.mpeg2 myshowaudio.mp3

The '-O' option is important in our case, because sometimes splitting audio and video causes them to get out of sync with each other. The '-200' is actually an argument to '-O', telling it to start the video negative 200 milliseconds relative to the audio (so basically, 200ms later). The '-f 8' tells mplex to use a very minimal DVD format (check the man page for other formats available). The '-M' switch, yeah, I'm not totally clear on what it does. According to the man page, 'This flag makes mplex ignore sequence end markers embedded in the first video stream instead of switching to a new output file. This is sometimes useful splitting a long stream in files based on a -S limit that doesn't need a run-in/run-out like (S)VCD.' I hope you know what that means, because I got lost halfway in. Last up, -o specifies the output file, and the next two arguments are the input files (video and audio) that will be stitched back together.
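
If you're scripting this, the whole split-and-remux dance might look something like the following rough Perl sketch. The filenames are examples, and the option values are simply the ones discussed above:

#!/usr/bin/perl
# Sketch: split a clip into audio and video with tcextract, then remux with mplex.
use strict;
use warnings;

my $clip = 'mycutvideo.mpg';

# tcextract writes to STDOUT, so let the shell handle the redirection.
system("/usr/bin/tcextract -i $clip -a 0 -x mp3 2> /dev/null > myshowaudio.mp3") == 0
    or die "audio extraction failed: $?";
system("/usr/bin/tcextract -i $clip -x mpeg2 2> /dev/null > myshowvideo.mpeg2") == 0
    or die "video extraction failed: $?";

# Glue the two streams back together into a DVD-friendly MPEG.
system( '/usr/bin/mplex', '-O', '-200', '-f', '8', '-M',
    '-o', 'myshow.mpg', 'myshowvideo.mpeg2', 'myshowaudio.mp3' ) == 0
    or die "mplex failed: $?";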

For those of you that are interested, the command line switches that I used were swiped from the source code of a program called tivo2dvd. If you're up to speed on your Perl, you might want to check out the source. In fact, when it comes down to it, most of this series of articles was derived from reading the source code of this package. It's amazing what you can discover by reading a little source code, isn't it? My articles go into detail that source code doesn't, but let's make sure to give credit where credit is due.

Wednesday, September 24, 2008

Command Line DVD Authoring: Part 3

This is part of a multi-part series on creating DVDs manually from the command line. Regular users aren't expected to do their video editing or DVD authoring from the command line; rather, this guide is intended for programmers who want to build a front-end for DVD authoring and don't want to sift through miles of documentation just to get the basics. This guide makes use of command-line utilities that are already freely available, but it is not meant to be a complete set of documentation for any of them. Instead, consider it a primer. The parts in this series are:

Part 1: Editing a Video File with MPlayer
Part 2: Converting a Video to DVD Format
Part 3: Making a DVD Menu
Part 3.1: Extracting Audio From A Video
Part 4: Building a DVD .iso File

It should be noted that while the programs themselves should remain much the same between Linux distros, the names of the packages are likely to differ. This tutorial was written using Ubuntu 8.04 as the reference OS, so if you use a different distro, your mileage may vary.


Part 3: Making a DVD Menu

Part 2 dealt with converting a file to DVD format, with or without the edl files discussed in Part 1. This part talks about taking your newly-edited (and possibly re-encoded) video files and building a menu for them.

There's an excellent suite of tools available for DVD authoring in Linux, called tovid. This package also includes a command, tovid, to convert from one video format to DVD format. The reason I didn't use this command in Part 2 is that as far as I know, it does not support .edl files, and that is key for linear editing.

But that doesn't mean that you shouldn't install tovid. There are a variety of other tools in that package that can be used for other steps of the DVD authoring process. The first of these that we will discuss, makemenu, will help us get our main menu together for our disc. It is possible to author a DVD with no menu, creating a disc that starts playing a video immediately, and then loops when it finishes. I have a commercially-produced DVD like this at home, and it feels like little more than an amateur attempt. Even if you only have one video to play, it's nice to begin with a menu and allow the viewer to start the video when they're good and ready, and when the video ends, it's nice to have it just end.

A typical command line for makemenu might look something like this:

/usr/share/tovid/makemenu -ntsc -dvd -scale \
-background my-background-image.jpg \
-audio my-menu-music.mp3 \
-textcolor '#000000' \
-font Helvetica \
-fontsize 24 \
-align southwest \
-menu-title 'My DVD' 'Show 1' 'Show 2' \
-out my-dvd

You can probably figure out the -ntsc option (there is a -pal option too). We have specified -dvd because we're not interested in creating a vcd or svcd. Both -ntsc and -dvd are defaults. The -scale option is a safeguard, in case you decide to use an image file that isn't exactly a 4:3 aspect ratio. Be careful here, it may make your background image look funny. There is also an option to crop the image to make it fit, but personally, I'll just use an image editing program to do that part.

Next we specify the background image itself. I'll leave that up to you. If you don't specify one, there is a default. You may want to specify some audio too (the default is 4 seconds of silence). I would leave that up to you too, but if you're interested in using audio ripped from your video, check out Part 3.1: Extracting Audio From A Video.

Most of the rest of the options have to do with the text that will show up in your menu. The color of the text is in #RRGGBB format (the default is white, #FFFFFF), the default font is Helvetica, and the default fontsize is 24. The -align option tells makemenu where to position the text. Since makemenu uses ImageMagick, it will support any 'gravity' option from ImageMagick (including northwest for the top-left, southwest for the bottom-left, and so on).

Next up is the -menu-title option. This accepts a series of arguments, the first of which is the title of the disc itself. If you don't want to specify a title (I find that sometimes the background image says it all), you can leave it as ' '. Every argument after that will be the name of one of the video segments that you provide in the next command line (see Part 4). The last option, -out, specifies the name of the MPEG2 file in VOB format that will be used on the DVD itself.
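
For front-end builders, the list form of Perl's system() makes it easy to assemble this command without worrying about shell quoting. Here's a rough sketch using the same example options as above; the titles and filenames are placeholders:

#!/usr/bin/perl
# Sketch: build the makemenu command from a disc title and a list of show names.
use strict;
use warnings;

my $title = 'My DVD';
my @shows = ( 'Show 1', 'Show 2' );

my @cmd = ( '/usr/share/tovid/makemenu', '-ntsc', '-dvd', '-scale',
    '-background', 'my-background-image.jpg',
    '-audio',      'my-menu-music.mp3',
    '-textcolor',  '#000000',
    '-font',       'Helvetica',
    '-fontsize',   24,
    '-align',      'southwest',
    '-menu-title', $title, @shows,
    '-out',        'my-dvd' );

# The list form of system() passes each argument as-is, so titles with
# spaces don't need any extra quoting.
system(@cmd) == 0 or die "makemenu failed: $?";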

Once you have the menu made up, you're just a couple of steps away from actually creating your DVD image. We'll finish up in Part 4: Building a DVD .iso File.

Tuesday, September 23, 2008

Command Line DVD Authoring: Part 2

This is part of a multi-part series on creating DVDs manually from the command line. Regular users aren't expected to do their video editing or DVD authoring from the command line; rather, this guide is intended for programmers who want to build a front-end for DVD authoring and don't want to sift through miles of documentation just to get the basics. This guide makes use of command-line utilities that are already freely available, but it is not meant to be a complete set of documentation for any of them. Instead, consider it a primer. The parts in this series are:

Part 1: Editing a Video File with MPlayer
Part 2: Converting a Video to DVD Format
Part 3: Making a DVD Menu
Part 3.1: Extracting Audio From A Video
Part 4: Building a DVD .iso File

It should be noted that while the programs themselves should remain much the same between Linux distros, the names of the packages are likely to differ. This tutorial was written using Ubuntu 8.04 as the reference OS, so if you use a different distro, your mileage may vary.

Part 2: Converting a Video to DVD Format

Part 1 dealt with using mplayer to create an edl (edit decision list) file. This part deals with converting a file to DVD format, with or without an edl file. In many cases, the video you want to convert does not need any editing. In these cases, you can skip the instructions specific to the -edl option.

There is a utility for MPlayer called MEncoder, which can generally convert between any two formats that MPlayer is able to play. Technically, MPlayer and MEncoder are part of the same project, but depending on your Linux distro, they may need to be installed as separate packages. Many of the command-line options for the mplayer command are also available for the mencoder command, including the -edl option that we used to play back a video with an edl file. When using this option, you will also want to know about the -hr-edl-seek option, because of a common feature of video compression sometimes referred to as keyframe boundaries.

As you might expect, it is possible to store each separate frame of a video inside a file. This is basically the same thing that reel-to-reel projectors do, playing back each frame of a show one-by-one. This is a lossless method, meaning that since each frame is stored exactly as is, there will be no loss in quality. Unfortunately, it also takes up considerably more resources. The MPEG standard takes a different approach. A single frame of video will be stored, but then rather than storing the entire following frame, only the changes between the frames are stored. This is often performed using a lossy method, meaning that there might be some slight loss of quality in the compressed video. Every few seconds, this progression of storing only changes to a frame is thrown away, and the video starts again with a new frame, and then stores only the subsequent changes. These are known as keyframe boundaries, and they should be kept in mind when editing video.

Because of these boundaries, most video editing software will re-encode video each time an edit is made to the original. Even if the same compression method is used as before, it will be applied a second time, which is likely to result in a further loss of quality. Some editing software provides the ability to cut at keyframe boundaries, resulting in a file that is bit-for-bit identical to the original, except where content was removed. Since no re-encoding is involved, there is no additional loss of video quality.

The cool thing about the -hr-edl-seek option is that rather than simply skipping over the parts of the video listed in the edl file, it tells mencoder to decode each individual frame, and then re-encode only the ones that are actually needed. This allows mencoder to create an edited version of the video more precisely than usual. It does have a couple of caveats, however. First of all, it is slower. But personally, I don't mind the performance trade-off; I'm probably going to queue up my shows to re-encode overnight anyway. Second, if you decide to try to use it with '-ovc copy' (a method that only copies the video frame by frame, rather than decoding and re-encoding), it may not work as well, if at all. Your mileage may vary.

Once your edl file is set up, it's time to convert your video into a type of MPEG2 file called a VOB, which is the format that DVD players use. If your video came from a TiVo box, there's a good chance it's already technically in MPEG2 format. If you recorded your show at "High Quality", it's probably at exactly the same quality as most DVDs anyway. (Disclaimer: If you use the TiVo service, it is probably a violation of your service agreement to strip the TiVo copy protection in order for the video to be used in this manner. If you decide to do that, you're on your own.) If your video came from another source, it will probably need to undergo some re-encoding.

What part of the world you live in will also come into play at this point. This has nothing to do with DVD regions, but in fact has to do with analog encoding methods. DVDs are digital, and should not technically be subject to analog restrictions. But American televisions are still designed to handle video differently than, say, British televisions, because of how video has been transmitted in those countries for decades. If you are in America (and some other parts of the world), you will probably want to optimize your shows for NTSC output. If you are in Britain (or some other parts of the world), you will probably want to go with PAL. It is also possible to set up your video for full-screen or wide-screen.

The following command line is my own personal command for making VOB files. It is shown here on multiple lines with backslashes, but I have it as a single line in a script, minus the backslashes:

mencoder -of mpeg -mpegopts format=dvd -srate 48000 -ofps 30000/1001 \
-ovc copy -oac copy -lavcopts vcodec=mpeg2video:\
vrc_buf_size=1835:keyint=18:vrc_maxrate=9800:\
vbitrate=4900:aspect=4/3:acodec=ac3:abitrate=192 \
"$INPUTFILE" -o "$OUTPUTFILE" -edl "$EDLFILE" \
-hr-edl-seek

Note: I use .mpeg2 for my output file extension, but what you choose for yours really doesn't matter. It's going to get renamed anyway.

My command was adapted from a series of commands available on the Gentoo wiki. I don't know what it is about Gentoo users, but they have managed to put together some excellent documentation on MPlayer and MEncoder, in many cases far superior to MPlayer and MEncoder's own documentation. Their original versions are as follows:

NTSC Widescreen:

mencoder -of mpeg -mpegopts format=dvd -srate 48000 -ofps 30000/1001 \
-ovc lavc -oac lavc -lavcopts \
vcodec=mpeg2video:\
vrc_buf_size=1835:\
keyint=18:\
vrc_maxrate=9800:\
vbitrate=4900:\
aspect=16/9:\
acodec=ac3:abitrate=192 \
~/Videos/path/to/file-divx.avi -o ~/Videos/path/to/file-divx.mpeg2

NTSC Fullscreen:

mencoder -of mpeg -mpegopts format=dvd -srate 48000 -ofps 30000/1001 \
-ovc lavc -oac lavc -lavcopts \
vcodec=mpeg2video:\
vrc_buf_size=1835:\
keyint=18:\
vrc_maxrate=9800:\
vbitrate=4900:\
aspect=4/3:\
acodec=ac3:abitrate=192 \
~/Videos/path/to/file-divx.avi -o ~/Videos/path/to/file-divx.mpeg2

PAL Widescreen

mencoder -of mpeg -mpegopts format=dvd -srate 48000 -ofps 25 \
-ovc lavc -oac lavc -lavcopts vcodec=mpeg2video:\
vrc_buf_size=1835:\
keyint=15:\
vrc_maxrate=9800:\
vbitrate=4900:\
aspect=16/9:\
acodec=ac3:abitrate=192 \
~/Videos/path/to/file-divx.avi -o ~/Videos/path/to/file-divx.mpeg2

PAL Fullscreen

mencoder -of mpeg -mpegopts format=dvd -srate 48000 -ofps 25 \
-ovc lavc -oac lavc -lavcopts vcodec=mpeg2video:\
vrc_buf_size=1835:\
keyint=15:\
vrc_maxrate=9800:\
vbitrate=4900:\
aspect=4/3:\
acodec=ac3:abitrate=192 \
~/Videos/path/to/file-divx.avi -o ~/Videos/path/to/file-divx.mpeg2

You'll notice that my version differs from the Gentoo version in a couple of ways. First of all, I make use of the aforementioned -edl and -hr-edl-seek options. Secondly, I switched my encoding method from the 'lavc' option to 'copy'. Since my files are typically encoded in a quality similar to the TiVo files that I mentioned, there is no need to re-encode the video; it's perfect as-is. I noticed that when I used the 'lavc' option, my video files ended up somewhere in the neighborhood of a third of the size of the originals (a quarter, if you count the commercials that were cut), and they suffered from a quality loss. Granted, I would have been able to fit far more video on a single disc like this, but I like my videos to be high-quality.

As far as the options go, well, I don't actually understand all of them. I just copied them from the Gentoo wiki. I do recognize a couple of them. The -ofps option specifies the frames per second. PAL uses 25, while NTSC uses 29.97. Yes, you read that right, NTSC has always used a decimal number of frames per second. As I understand it, it was "a brilliant technical hack that allows the color video signal to be shoehorned into a black and white signal without screwing up B&W TVs".

The -of option specifies the output format (use 'mencoder -of help' to see what's available to you), while -o specifies the output file. There isn't an actual flag for the input file; it just needs to appear somewhere in the command line. The -ovc and -oac options specify the output video and audio codecs, respectively. The rest of the options that you see are specific to DVDs and aspect ratios. I'm going to let you look them up. Personally, right now I'm happy that somebody did that part for me.
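
If you're wrapping this step in a front-end, a rough Perl sketch of the conversion might look like this. It simply mirrors my command above, with the NTSC/PAL and aspect values swapped in from the Gentoo versions; the filenames are placeholders:

#!/usr/bin/perl
# Sketch: choose NTSC or PAL and an aspect ratio, then run the mencoder
# command from this article. Filenames are placeholders.
use strict;
use warnings;

my ( $input, $output, $edl ) = ( 'myshow.mpg', 'myshow.mpeg2', 'myshow.edl' );
my $standard = 'ntsc';     # or 'pal'
my $aspect   = '4/3';      # or '16/9' for widescreen

my $fps    = $standard eq 'pal' ? '25' : '30000/1001';
my $keyint = $standard eq 'pal' ? '15' : '18';

my @cmd = ( 'mencoder', '-of', 'mpeg', '-mpegopts', 'format=dvd',
    '-srate', '48000', '-ofps', $fps,
    '-ovc', 'copy', '-oac', 'copy',
    # The -lavcopts string is carried over verbatim from the command above.
    '-lavcopts', "vcodec=mpeg2video:vrc_buf_size=1835:keyint=$keyint:"
        . "vrc_maxrate=9800:vbitrate=4900:aspect=$aspect:acodec=ac3:abitrate=192",
    $input, '-o', $output );

# Only pass the edl options when there is actually an edit list to apply.
push @cmd, '-edl', $edl, '-hr-edl-seek' if $edl && -e $edl;

system(@cmd) == 0 or die "mencoder failed: $?";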

Now that you have your videos in the format that the DVD player is expecting, it's time to start authoring the disc itself. We'll get started with this in Part 3: Making a DVD Menu.

Monday, September 22, 2008

Command Line DVD Authoring: Part 1

This is part of a multi-part series on creating DVDs manually from the command line. Regular users aren't expected to do their video editing or DVD authoring from the command line; rather, this guide is intended for programmers who want to build a front-end for DVD authoring and don't want to sift through miles of documentation just to get the basics. This guide makes use of command-line utilities that are already freely available, but it is not meant to be a complete set of documentation for any of them. Instead, consider it a primer. The parts in this series are:

Part 1: Editing a Video File with MPlayer
Part 2: Converting a Video to DVD Format
Part 3: Making a DVD Menu
Part 3.1: Extracting Audio From A Video
Part 4: Building a DVD .iso File

It should be noted that while the programs themselves should remain much the same between Linux distros, the names of the packages are likely to differ. This tutorial was written using Ubuntu 8.04 as the reference OS, so if you use a different distro, your mileage may vary.


Part 1: Editing a Video File with MPlayer

This part of the series assumes that you have a raw video file that needs editing. The only requirement on the format of the file is that it be something MPlayer can read. There are two basic types of edits that can be made: skipping a section of video, or muting its audio.

MPlayer uses a type of file called an "edit decision list" (edl). The concept of an edit decision list is not new. It has existed in several other pieces of software for years, and in fact began as entries on a piece of paper before video editing software even existed. In MPlayer, this file specifies, in seconds and fractions of a second, where each edit is to be made. The basic format of each line in an MPlayer edl file is:

<start point> <end point> <edit type>

The start and end points are specified in seconds, to a few decimal places. The edit type can be 0 to skip the section of video entirely, or 1 to just mute its audio. A typical edl file may look like this:

0 68.201469 0
653.652954 833.749575 0
1374.589844 1619.768193 0
2079.794434 2350.648281 0
3169.433105 3466.679939 0
3669.048584 3815.461840 0

This file was generated from an hour-long television show that was set to start recording one minute early, and finish one minute after the show was expected to finish. I like to record my shows like this, since not all networks start and finish their shows the same way, or even at the same time. This edl file cut out the first minute and then some, took out a few commercial breaks, and then at the end of the show, cut out everything following the closing credits. Since I wasn't sure exactly what second the show ended at, I took out some insurance and added over a minute to the end time.

Originally, edl files were designed for editing language and content out of shows, not for cutting commercial breaks. This explains why they allow such high resolution in the start and end times. It may be that you only wish to mute a single word or phrase in a show. In this case, the amount of time between the start and end points will likely be less than a second, and the last value of the line will be 1, not 0.
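
For example (the times here are made up), a line that mutes about a second and a half of audio some twenty minutes into a show would look like this:

1204.500000 1206.000000 1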

There are two steps to dealing with edl files: you need to create the file, and then play back the video using the file. The mplayer command will accept the -edlout option to create a file while you watch the video. The -edl option will then play back a video using the file that you created. To start creating an edl file, the command will look something like this:

mplayer myvideo.mpg -edlout myvideo.edl

When using -edlout, there are some keystrokes that you can use to navigate the video and then save start and end points to the edl file. Pressing 'i' once will tell mplayer to start an edit, and pressing it a second time will tell mplayer to end that edit and save it out to the file. By default, these edits will be saved as a skip (0), rather than a mute (1). If you have started an edit and mplayer finishes the video and exits before you end the edit, it will not be saved to the file.

Of course, it's difficult to watch a show and be able to time pressing the 'i' key perfectly every time. Pressing 'o' will toggle the on-screen display (OSD) modes. The first shows the time elapsed. The second also shows the total time of the video. The third will disable OSD altogether, and the fourth will re-enable it. By default, the mplayer command will start with OSD on, but nothing displayed. The gmplayer command will, by default, show the time elapsed and the total time.

If you miss a critical point where you would have pressed 'i' to mark a start or end point, you can just press 'i' anyway and go back into the file to fine-tune it later. There are also a series of keys that you can use to navigate more quickly through a file. The left and right arrows will skip 10 seconds back or forward, respectively. The up and down arrows will skip 1 minute back or forward, respectively. And the page-up and page-down keys will skip 10 minutes (or 15, depending on your system, I've seen both) back or forward, respectively.

Pressing the spacebar or 'p' will pause or unpause the video. Pressing the period (.) will move one frame forward and then pause. Pressing the period repeatedly will continue to step forward one frame at a time. The nice thing is, when you step forward a frame, you can actually hear the snippet of audio for that frame. Then again, since a frame is so short in terms of time displayed, that may not be so helpful. If you press 'i' while the video is paused, it will mark the start (or end) point and then continue playing.

When the video finishes playing, or you exit mplayer by pressing 'q', you will have an edl file ready for use, or for fine tuning. Chances are, even when you get good at pressing 'i' at just the right moment, you're still going to need to do some fine tuning. Play back the video using a command that looks like this:

mplayer myvideo.mpg -edl myvideo.edl

Be careful that you don't accidentally type -edlout instead of -edl. The -edlout option will start a new, blank file, overwriting any that were there before. As you play back your video, it may be helpful to turn on the onscreen display. All of the same navigational keys work in this mode as well, and you will probably want to skip to each point to watch and verify your edit. If you skip past the start point, mplayer will not perform the edit. That means that if your edl file skipped three minutes of commercials, and you skipped over where the commercials started, mplayer will not know to just skip to the end of the commercials. But you can just skip back before the edit point, and it will still work as expected.

When I edit videos like this, I usually keep two terminal windows open: one with the edl file open in vim, and one in which to run the mplayer command over and over again. This is because once you have saved a change to the edl file, mplayer must be reloaded before it will pick up the new changes.
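
Since hand-editing edl files invites typos, a little sanity-checking script can help. Here's a rough Perl sketch that only knows about the format described above:

#!/usr/bin/perl
# Sketch: sanity-check an edl file before handing it to mplayer or mencoder.
# Warns about malformed lines, backwards ranges and overlapping edits.
use strict;
use warnings;

my $file = shift or die "usage: $0 file.edl\n";
open my $fh, '<', $file or die "can't open $file: $!";

my $prev_end = 0;
while ( my $line = <$fh> ) {
    next if $line =~ /^\s*$/;
    my ( $start, $end, $type ) = split ' ', $line;
    unless ( defined $type && $type =~ /^[01]$/ ) {
        warn "line $.: expected '<start> <end> <0|1>'\n";
        next;
    }
    warn "line $.: end ($end) is not after start ($start)\n" if $end <= $start;
    warn "line $.: overlaps the previous edit\n"             if $start < $prev_end;
    $prev_end = $end;
}
close $fh;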

When you are finished creating your edl file, you can just use it for watching videos with mplayer (using the -edl option each time), or you can use it to make a second copy of the video, with all of the edits already in place. This is covered in Part 2: Converting a Video to DVD Format.

Wednesday, September 17, 2008

Tidbits

I don't have much to say about any one thing today, but I do have little things to say about a few things.

First of all, my buddy Paul took off to Seattle for a new job, and his old house in Sandy, UT is for sale. I've been to said house before, and it is nice. I kind of wish I could afford it myself. If you're interested, check out the website that he's set up for it.

Do you remember a couple of years ago when I first posted about absorption pasta? I've been playing with it off and on lately. Last night I made some with spaghetti (in a very wide pan, obviously) and chicken stock spiked with a little Worcestershire sauce, black pepper and Italian seasonings. Towards the end I added a handful of mirepoix (they sell it in the frozen veggie section now, at my local grocery store). When the pasta was pretty much al dente but there was still a little hot liquid, I added some marinara sauce. I served it with shredded cheddar on top, and it was the best thing I'd eaten all week. My wife, who normally hates red sauce, even liked it.

Lately I've been working on clearing shows off of the DVR to make room for more shows. This has involved converting my wife's hairdressing shows to DVD format. In fact, I have not been allowed to remove Shear Genius without first making sure she has a copy on a DVD, whether Bravo ever decides to release it (preferred) or I have to do it myself. Last week I failed to bring any suitable reading material for the flights to and from Maryland, so I pulled out my laptop and slammed out a tutorial for command line DVD authoring in Linux, based on my recent experience. I now have a five-part series almost ready for public consumption. I even managed to get Linux video guru Steve Dibb to take a look at it and give me some thoughts. I plan to post it next week, one part per day.

Last but not least, I've been playing with what is basically a copycat recipe for Valdosta pecans. It's not done yet, but I can give you the ingredient list for my first batch. I might have tweaked it a little bit, in an attempt to add depth to the flavors.

1 cup pecan pieces and halves, toasted
1/2 cup water
1/4 cup Jack Daniels
2 Tbsp sugar
1/4 cup dried cranberries
1 tsp dried orange zest
1/2 tsp freshly ground black pepper
pinch Kosher salt

Initial observations: sweetness is decent, but I wish I'd had brown sugar on hand. I didn't taste the JD at all. The orange flavor was a little strong, but that may have been because I only had orange-flavored Craisins on hand. I think the black pepper was just about dead on. When I get a chance to pick up more dried cranberries, I'll try it again and post instructions.

Thursday, September 11, 2008

Food Allergies and Crap Foods

I saw an interesting commercial last night. A boy and a girl (maybe early 20s) were picnicking in the park, and the girl offered the boy a popsicle. He told her he couldn't eat that, it had corn syrup in it! "You know what they say about corn syrup!" She asked him what they did say. He thought about it for what seemed like an eternity, and then apparently decided that he had no idea what they say about it, and that corn syrup must be okay. He then took the girl's popsicle from her and asked why she didn't bring two, or something like that. I can't say I was surprised when I saw that the commercial had been produced by some sort of corn farmers' group.

I think you know I'll be the first to tell you what's wrong with corn syrup: it tastes like crap. My dear cousin Ali would also tell you that she's allergic to it, and that if she ate enough of it, it would likely make her deathly ill. I'll come back to Ali in a moment.

Some of you may be thinking back to recipes that I've posted here in the past. Both the Tux Cake and the Beastie Cake had corn syrup in them, and in both cases it was corn syrup that I intentionally added. Am I being hypocritical? Did I suddenly have an epiphany when I saw the popsicle commercial? It turns out corn syrup does have its uses. I keep a bottle of it in my pantry, and when I'm doing sugar work (and sometimes chocolate work), I might pull it out. It has very specific properties that I look forward to employing when I'm doing candy work. Because it's an invert sugar, it can do certain things that regular sugar just can't. In fact, one of those things is to help keep regular sugar from misbehaving.

That said, I've thought and thought and I can't think of anything besides candy that I would ever use corn syrup for. And in fact, there have been times when I've used honey, or real maple syrup in candymaking instead of corn syrup, because they're both invert sugars, and they taste several times better.

Look at a few other products in the store. How about barbecue sauce? Can you believe how much corn syrup they add to that? Dark brown sugar would taste much better, but it's not cheap like corn syrup is. How about bread? Have you ever wondered why so many breads list "high fructose corn syrup" as an ingredient? Sugars do play an important role in breadmaking, but cane sugar works just as well as corn sugar. Back before I even met my wife, I decided not to buy bread with corn syrup, and in the three years that we've been married, that's never been a problem. It turns out the stuff with sugar or brown sugar tastes better anyway.

Some people might point out the preservative properties of corn syrup. I would point out that cane sugar is a preservative too. So are honey and real maple syrup. Sugar is a preservative, no matter what form it comes in. So why use corn syrup instead of cane sugar? Because it's cheap. Manufacturers can charge you exactly the same for a product whether it uses cane sugar or corn syrup, but they stand to make a lot more money when they use corn syrup. In this day and age, many companies will use it as an excuse to lower their price below that of a competitor's superior product. They don't care that the cane sugar version of their product tasted better. It tastes "close enough", and some people can't even tell the difference. In fact, they've been doing it for so long that most people don't even know what the real stuff tastes like.

And of course, there are allergens. Even with all of her allergies, I'm sure there are plenty of soft drinks that my cousin Ali would be able to drink if they weren't crapped up with corn syrup and worse. It's been theorized that the recent epidemic of food allergies may be caused by overexposure to certain foods, particularly preservatives. I can't tell you how many times I've heard that exposure to latex can cause a person who previously had no problems with it to become increasingly allergic to it. It wouldn't surprise me to hear that there are other things out there that do the same thing.

If this were true with foods, it would certainly explain why a lot of people who led perfectly normal lives previously were suddenly and tragically diagnosed with a variety of illnesses and allergies that they literally have to change their entire lifestyle to deal with. Ali certainly wasn't born being allergic to wheat. She ate it her entire life with no ill effects, until at some point six years ago, it started making her sick. She eventually found out that a long list of foods were making her sick, and she had to re-learn how to cook, and ultimately how to eat.

Ali posted an article about Food Allergy Basics that was extremely informative. She's not the first person I've met with her condition, and she's far from the last. Every few weeks I'll have a student in my class who can't partake of the catered lunch with everyone else because of some allergy or intolerance. When I find out, I always make sure they get a copy of my Gluten-Free Focaccia recipe, so that they can have something to eat that doesn't taste like crap. Now Ali's given me another resource to give them as well.

You don't need to give in to the major manufacturers just because they shove their products in your face. Maybe now would be a great time to re-learn how to cook and eat, before your body forces you to do so anyway. It might also be a good time to start wondering why corn farmers are suddenly putting out propaganda about how safe and good their product is. When somebody suddenly makes a point of trying to convince you that there's nothing wrong with their product, maybe there's more to the story than they're telling you. Find out for yourself.

Monday, September 1, 2008

airplanemode.sh

I've mentioned this script to a couple of people, and I thought I might share where it's at right now, in case anybody's interested. A lot of the commands in here are things that I got from a program called PowerTOP, which I think is indispensable for anybody planning on using their laptop on battery. This script runs on Ubuntu 8.04 (Hardy Heron) on a Thinkpad R61i, but I think it could be easily adapted to your own environment. In my case, it extended the life of my 4-cell battery from about 1 1/2 hours to almost 3 1/2 hours (even watching movies from the hard drive).

Most of these options are pretty self-explanatory to the professional Linux user, who should adapt them for their own needs. For instance, I definitely would take out the "hal-disable-polling" option if I planned on using the optical drive. And yes, I do run both MySQL and PostgreSQL at the moment (long story, but it's only temporary), I expect you'll only be running one of them, assuming you run either at all. But I found disabling them to add an extra 15 minutes or so to my normal 1 1/2 hours, and that's a lot.

A note about brightness. I don't mind turning the brightness down all the way anyway, at least on an airplane. I can still see it just fine. I did discover something interesting about the brightness though. If you go to System >> Preferences >> Power Management and click the "On Battery Power" tab, you will find an option for "Dim display when idle" and another for "Reduce backlight brightness". This seems like a good idea at first. If you don't touch your notebook for a couple of minutes (on full brightness), it will turn the brightness down for you. When you press a key in this state, it turns it back up to 50% brightness. The problem is, if you were already at 20% (the lowest value that seems to work, which is what my script sets it to) before the system went idle, it still turns it up to 50%.

Imagine that, a power saving option that, in the wrong hands, actually wastes power. I'm sure this is a kernel option in /proc/ or /sys/, but I haven't found it yet to add to my script. In the meantime, I just took the checkmarks out of those boxes. If anybody knows where those settings are, I would love to add them to my script.

I don't have a more elegant way at the moment to kill NetworkManager. Then again, I haven't really looked. I do know that Red Hat likes to use an /etc/init.d/NetworkManager script, but Ubuntu seems to be without one. If you know of a more graceful way to kill it, let me know.

It should also be noted that this script is for somebody that turns their notebook on while on battery power, and then turns it off when finished. If you plan on using suspend or sleep modes (neither of which I really care for), you may want to rethink some of the other lines in here. I'm thinking about setting up a Grub option for this script, and maybe setting up a different runlevel (maybe the infamous runlevel 4?) to handle the /etc/init.d/ services for me. Add that to the "when I get around to it" list. On that note, it should be obvious that this script needs to be run as root (or at least with sudo), but I thought I'd toss in a reminder anyway.

Without any further ado, the script:

#!/bin/sh
# airplanemode.sh - cut power usage on battery; run as root (or with sudo).

# Turn the backlight down and relax disk/SATA power management.
echo 20 > /proc/acpi/video/VID0/LCD0/brightness
echo min_power > /sys/class/scsi_host/host0/link_power_management_policy
echo 1500 > /proc/sys/vm/dirty_writeback_centisecs

# Stop polling the optical drive, disable wake-on-LAN, unload Bluetooth USB.
hal-disable-polling --device /dev/cdrom 'hal'
ethtool -s eth0 wol d
modprobe -r hci-usb

# Shut down services I don't need in the air.
/etc/init.d/apache2 stop
/etc/init.d/mysql stop
/etc/init.d/postgresql-8.3 stop
/etc/init.d/avahi-daemon stop
/etc/init.d/cupsys stop
/etc/init.d/bluetooth stop
/etc/init.d/dhcdbd stop
/etc/init.d/winbind stop
/etc/init.d/stunnel4 stop

# Take the network down last.
/etc/init.d/networking stop
pkill NetworkManager