Revision History

Version 0.10 The beginning
Version 0.11 Ether-wake update.
Version 0.12 Slackware updates.
Version 0.14 Small updates.
Version 0.20 Updates and formatting.
Version 0.27 Filling in gaps and organizing.
Version 0.35 More filling in.
Version 0.36 Formatting.
Version 0.37 OpenMosix testing
Version 0.38 Dual use workstation/node reboot script
Version 0.40 Improved mp3 ripping script
Version 0.41 Updated software versions
Version 0.50 Improved mp3 ripping script
Version 0.51 Add link for id3ed
Version 0.52 Updated mp3 ripping script

Contact

If you have any additions, corrections or suggestions please email me at hackjoe@yahoo.com.

Beginning

In the beginning I wanted to create a quick and easy cluster with the nodes booting from a CD. Things quickly spiraled into a much larger project, including PXE network booting, RAM disks, NFS root, DHCP/BOOTP, and Wake on LAN. I originally started with Red Hat 7.2 since I had the CDs handy and I could use the rpm files. I have since tested it with Slackware 8.1 and 9.0beta to fill in any gaps in this HOWTO. Hopefully everything is generic enough that any Linux distribution should work fine.

I have included a list of files that are required to get things up and running. You may also need to upgrade the TFTP server, as there are some reports of the TFTP servers packaged in distributions not playing nicely with PXE booting.

Also included is a list of reference material, mostly Internet links that I have used. Some of the links I used often enough that I printed the document. Other links I only used to fix specific problems that I ran into; I have included them in case they clear up any issues you may encounter. I found most of them via cross-links and wading through Google searches.

Requirements and Assumptions

I am assuming you know how to install a Linux distribution, patch and compile a kernel, compile userland software, use basic file commands, change settings in the BIOS, and have a reasonable grasp of networking.
All of the userland software is relatively stable and should compile without any problems using ./configure && make && make install. Any exceptions are noted in the detailed instructions. The master server will be running TFTP, DHCP, and NFS daemons, which could cause problems if it were connected to a production network. I have not put much thought into the general security of the cluster; there could be some large security holes. All the files are set up for the 192.168.0.0/24 address space. Dell Precision 410, GX110 and GX115 hardware was used for testing, so some settings may not be available on other systems. The terms server and master are interchangeable. The terms node and slave are interchangeable.

Checklist/Quick start

1. Download the latest versions of the required files.
2. Extract the kernel sources.
3. Extract the openMosix kernel patch.
4. Apply the openMosix kernel patch.
5. Compile the master kernel.
6. Install the master kernel.
7. Compile the client kernel.
8. Reboot the master computer.
9. Set up the server daemons.
10. Set up the server configuration files.
11. Set up the server userland software.
12. Set up the client file system.
13. Ether-wake.
14. Doing something useful.
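
Step 4 of the checklist is easy to get wrong without noticing, because patch carries on after a rejected hunk. The following is a minimal sketch, not part of the original procedure, of a helper that scans a saved patch log for the "FAILED" and "saving rejects" messages that GNU patch prints when hunks do not apply. The function name patch_log_ok is hypothetical, and the sketch assumes the log file captured the output of the patch command.

```shell
#!/bin/sh
# patch_log_ok LOGFILE
# Returns 0 if the log shows no failed hunks, 1 if it does,
# and 2 if the log file cannot be read.
# (Hypothetical helper; only the messages grepped for come from GNU patch.)
patch_log_ok() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read patch log: $log" >&2
        return 2
    fi
    # GNU patch reports problems with lines such as
    # "Hunk #1 FAILED at 10." and "... saving rejects to file foo.rej".
    if grep -q -e 'FAILED' -e 'saving rejects' "$log"; then
        return 1
    fi
    return 0
}

# Example use after patching (the log name is whatever you redirected to):
#   patch_log_ok err.om && echo "patch applied cleanly"
```

A clean result only means that no hunk was rejected; it is still worth reading the log once by eye.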

Required files:

http://openmosix.snarc.org/files/stable/patch-2.4.26-om-20040706.bz2
http://prdownloads.sourceforge.net/openmosix/openMosixUserland-0.3.6-2.tgz?download
http://www.kernel.org/pub/linux/kernel/v2.4/linux-2.4.26.tar.bz2
http://www.busybox.net/downloads/busybox-0.60.5.tar.bz2
http://www.openmosixview.com/download/openmosixview-1.5.tar.gz
http://www.kernel.org/pub/linux/utils/boot/syslinux/syslinux-2.10.tar.bz2
ftp://ftp.scyld.com/pub/diag/ether-wake.c
http://psoftware.org/clumpos/contrib/plumpos-6.2.bin.iso.gz
http://prdownloads.sourceforge.net/lame/lame-3.96.1.tar.gz?download
http://www.openmosixview.com/omtest/omtest-0.1-4.tar.gz
http://www.dakotacom.net/~donut/programs/id3ed/id3ed-1.10.4.tar.gz

Potentially required files

http://www.kernel.org/pub/software/network/tftp/tftp-hpa-0.31.tar.bz2

Compiling the kernels

Back up your existing kernel sources.
Extract kernel 2.4.26.
Extract the openMosix patch into the /usr/src/linux directory.
Apply the openMosix patch from within /usr/src/linux with 'patch -Np1 < openMosix-2.4.26 > err.om', substituting the name of the patch file you extracted.
Check the patch error file, err.om, to make sure everything worked OK.

Both kernels

In the openMosix menu select the following options.

    openMosix process migration support
    Stricter security on openMosix ports
    1 for Level of process-identity disclosure (0-3)
    openMosix File system
    Disable OOM Killer

Server kernel

1. Configure a standard kernel for the master server.
2. Select the NFS server option in File Systems-Network File Systems.
3. Run make dep, make clean, make bzImage, make modules and make modules_install.
4. Copy /usr/src/linux/arch/i386/boot/bzImage to /boot/vmlinuz-openmosix-server
5. Copy /usr/src/linux/System.map to /boot/System.Map-openmosix-server
6. Edit /etc/lilo.conf, or the configuration of the boot manager you are using, to boot the new kernel.

Node kernel

1. Configure a kernel for the node, selecting all required drivers. A monolithic kernel is recommended.
2. Select the RAM disk support in Block Devices with a size of 16384.
3. Select the Initial RAM disk support in Block Devices.
4. Select the IP kernel level auto configuration in the Networking Options.
5. Select the IP: BOOTP support in the Networking Options.
6. Select the NFS file system support option in File Systems-Network File Systems.
7. Select the Root file system on NFS option in File Systems-Network File Systems.
8. Run make dep, make clean, and make bzImage.
9. Leave the compiled kernel in arch/i386/boot.

Server daemons

Dhcpd

1. Edit /etc/dhcpd.conf.
2. Edit one of the rc files to start the DHCPD daemon.
3. Most of the HOWTOs set up each node with a specific IP via MAC addresses. I am not terribly concerned with knowing which node is doing the work so I have skipped this step.

Server /etc/dhcpd.conf

    ddns-update-style ad-hoc; #slackware specific
    subnet 192.168.0.0 netmask 255.255.255.0 {
        option subnet-mask 255.255.255.0;
        range dynamic-bootp 192.168.0.2 192.168.0.25;
        default-lease-time 21600;
        max-lease-time 43200;
        next-server 192.168.0.1;
        filename "pxelinux.0";
        option root-path "192.168.0.1:/home/client/";
    }

Tftp

1. Extract the tftp software if required.
2. Edit the xinetd.conf or inetd.conf to start the TFTP daemon.
3. The following line in rc.local will work as well.

    /usr/sbin/in.tftpd -l -s /home/tftpboot -u nobody &

Red Hat server /etc/xinetd.conf

    #
    # Simple configuration file for xinetd
    #
    # Some defaults, and include /etc/xinetd.d/
    defaults
    {
        instances = 60
        log_type = SYSLOG authpriv
        log_on_success = HOST PID
        log_on_failure = HOST
        cps = 25 30
    }
    service tftp
    {
        socket_type = dgram
        protocol = udp
        wait = yes
        user = root
        server = /usr/sbin/in.tftpd
        server_args = -s /home/tftpboot -u nobody
        disable = no
    }

NFS

1. Edit the /etc/exports file.
2. Edit the appropriate rc files to start the NFS daemons.
3. Test with 'mount 192.168.0.1:/home/client /mnt/'

Server /etc/exports

    /home/client/ 192.168.0.0/24(rw,no_root_squash)
    /home/tftpboot/ 192.168.0.0/24(rw,no_root_squash)

Server files

Configuration files.

Create and edit /etc/hpc.map.
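
Each entry in /etc/hpc.map carries three fields: the first node number in a range, the IP address of that node, and the number of nodes in the range. As a rough illustration of how one entry covers a block of consecutive addresses, the sketch below expands map lines into one "node-number IP" pair per node. The function name expand_hpc_map is hypothetical, and the sketch assumes that the second field is a dotted IPv4 address (not a host name) and that the range does not run past the last octet.

```shell
#!/bin/sh
# expand_hpc_map: read hpc.map style lines on stdin and print
# "node-number ip" for every node covered by each entry.
# Assumptions (not from the openMosix tools): field 2 is a dotted
# IPv4 address and the range stays within the last octet.
expand_hpc_map() {
    while read -r first ip count _; do
        # Skip blank lines and comment lines.
        case $first in
            ''|\#*) continue ;;
        esac
        prefix=${ip%.*}     # e.g. 192.168.0
        last=${ip##*.}      # e.g. 1
        i=0
        while [ "$i" -lt "$count" ]; do
            echo "$((first + i)) $prefix.$((last + i))"
            i=$((i + 1))
        done
    done
}

# Example use:
#   expand_hpc_map < /etc/hpc.map
```

With the single entry used in this HOWTO, 1 192.168.0.1 6, this lists node numbers 1 to 6 against 192.168.0.1 to 192.168.0.6, which is a quick way to confirm that the map lines up with the addresses the DHCP server hands out.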

Server and node /etc/hpc.map

    # Each line should contain 3 fields, mapping IP addresses to MOSIX node-numbers:
    # 1) first MOSIX node-number in range.
    # 2) IP address of the above node (or node-name from /etc/hosts).
    # 3) number of nodes in this range.
    #
    # Example: 10 machines with IP 192.168.1.50 - 192.168.1.59
    # 1 192.168.1.50 10
    #
    # MOSIX-# IP number-of-nodes
    # ============================
    1 192.168.0.1 6

Setting up the mosix file system.

Create a directory /mfs
Add the following line to /etc/fstab.

    mymfs /mfs mfs defaults 0 0

Userland files.

OpenMosix Userland

1. There must be an openMosix kernel compiled for this software to compile properly.
2. Extract the openMosix userland sources and cd into the created directory.
3. Edit the 'configuration' file and set the OPENMOSIX line to the location of the patched kernel sources.
4. make all
5. The hard way to make openMosix start on boot: copy the openMosix file from the scripts directory to the appropriate startup directory.
6. The easy way to make openMosix start on boot: add the following line to rc.local.

    /sbin/setpe -W -f /etc/hpc.map

7. Test with '/bin/mosctl status 1'

OpenMosixview

1. Extract the openmosixview sources and cd into the created directory.
2. Configure, compile and install.
3. Start X and test.

Setup client file system

1. Create a directory /home/tftpboot and change the permissions to 755.
2. Create a directory /home/client and change the permissions to 755.
3. Download the plumpos.iso and burn it onto a CD. You do not have to burn the CD, but if one of the node workstations has a bootable CD drive it may help in troubleshooting the cluster later on. Extract the plumpos-aux.1.tgz to /home/client.
4. Extract the busybox tarball. Change into the busybox directory and run make and sh ./install.sh /home/client/
5. Copy /etc/hpc.map to /home/client/etc/
6. Create the init.d directory in /home/client/etc/
7. Create and edit the rcS file in /home/client/etc/init.d/
8. Create directory /home/client/dev
9. Create directory /home/client/mfs
10. Create directory /home/client/var
11. Create directory /home/client/proc
12. Create and edit /home/client/etc/fstab
13. Copy the compiled node kernel (arch/i386/boot/bzImage) to /home/tftpboot under the name used in the pxelinux configuration, here vmlinuz-openmosix-node.
14. Extract the syslinux tarball. Change into the syslinux directory and copy the file pxelinux.0 to /home/tftpboot.
15. Create the /home/tftpboot/pxelinux.cfg directory.
16. Create and edit the /home/tftpboot/pxelinux.cfg/default file to reflect the name of the new node kernel.

Node /etc/init.d/rcS

    #!/bin/ash
    mount /proc -t proc
    # RAM disk devices use block major 1
    mknod /dev/ram0 b 1 0
    mount /var
    mkdir -p /var/lock/subsys
    /sbin/setpe -W -f /etc/hpc.map
    # /bin/rbt # add if using dual use workstations

Node /etc/fstab

    192.168.0.1:/home/client/    /     nfs   rw,hard,intr,nolock 0 0
    192.168.0.1:/home/client/usr /usr  nfs   rw,hard,intr,nolock 0 0
    192.168.0.1:/home/client/tmp /tmp  nfs   rw,hard,intr,nolock 0 0
    none                         /proc proc  defaults            0 0
    /dev/ram0                    /var  ramfs defaults            0 0
    mymfs                        /mfs  mfs   defaults            0 0

/home/tftpboot/pxelinux.cfg/default

    LABEL linux
    KERNEL vmlinuz-openmosix-node
    APPEND nfsaddrs=192.168.0.1 root=/home/client

Ether-wake

Enable WOL and collect the MAC addresses on the node stations.
Extract the ether-wake file.
Compile with this command: 'gcc -O -Wall -o ether-wake ether-wake.c'
Copy the ether-wake binary to /usr/local/sbin/
Test with this command.

    /usr/local/sbin/ether-wake 00:11:22:33:44:55

Where 00:11:22:33:44:55 is the MAC address of the node's NIC. The node should wake up and proceed with a network boot.

Add the ether-wake commands that wake the nodes to the rc.local file, or to a crontab script that runs at night on the master server.

If the BIOS has an option to network boot first when a Magic Packet is received, enable it. This allows the workstation to be used as usual during the day, booting from the hard drive, and to become a node at night, booting from the network when it receives a Magic Packet from the master server.
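
Once the MAC addresses have been collected, the wake-up step can be scripted. The sketch below is one possible approach, assuming the addresses are kept one per line in a plain text file and that ether-wake was installed in /usr/local/sbin/ as described above. The function names, the ETHER_WAKE variable and the file path in the usage comment are all hypothetical.

```shell
#!/bin/sh
# Path to the ether-wake binary; can be overridden from the environment.
ETHER_WAKE=${ETHER_WAKE:-/usr/local/sbin/ether-wake}

# is_mac ADDRESS: succeed if ADDRESS looks like aa:bb:cc:dd:ee:ff.
is_mac() {
    echo "$1" | grep -Eq '^([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}$'
}

# wake_nodes FILE: send a Magic Packet to every MAC address listed in FILE.
# Blank lines and lines starting with '#' are ignored; anything that does
# not look like a MAC address is reported on stderr and skipped.
wake_nodes() {
    while read -r mac _; do
        case $mac in
            ''|\#*) continue ;;
        esac
        if is_mac "$mac"; then
            # Keep ether-wake from reading the list on stdin.
            "$ETHER_WAKE" "$mac" </dev/null
        else
            echo "skipping invalid MAC address: $mac" >&2
        fi
    done < "$1"
}

# Example use (the list file name is an assumption):
#   wake_nodes /etc/cluster-macs
```

Calling wake_nodes from rc.local or from a nightly cron job corresponds to the two options mentioned above.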

Use the rbt script to reboot a dual use workstation so that it returns to its normal hard drive boot. Remove the # from the last line in the rcS file as well.

/home/client/bin/rbt

    #!/bin/ash
    clear
    echo -n "Press 'Enter' to reboot"
    read x
    /sbin/reboot

Doing something useful/testing your new cluster.

OpenMosix Testing

Extract the omtest tarball to /usr/local.
Change to the directory /usr/local/omtest
Run the command './compile_tests.sh'. This will install and compile the required scripts.
Run the command './start_openMosix_test.sh'. This will take a while on slower machines or a small cluster. The test will create a log file in /tmp.
Run the command './run_lmbench.sh'.

MP3 encoding test script

The following script counts the number of nodes currently up and the total number of processors running on those nodes. It then starts as many copies of lame as there are processors. You will need to have more wav files than processors, of course. This can be used to test the cluster and to start using the process balancing/migration features.

    #!/bin/bash
    # Rip a CD to wav, encode the tracks across the cluster, then tag and file the mp3s.

    # Rip the CD
    cd=/dev/sr0
    dir="/stripe/rip"
    cd "$dir" || exit 1
    echo Ripping CD to wav
    /usr/bin/cdda2wav -cddb=1 -s -D $cd -alltracks
    eject $cd
    echo Ripping CD to wav complete

    # Count the nodes that are up and the cpus on them
    echo Counting nodes
    cn=0
    cpu=0
    tn=$(ls /proc/hpc/nodes | wc -l)
    for x in $(ls /proc/hpc/nodes)
    do
        nocpu=$(cat /proc/hpc/nodes/$x/cpus)
        # nodes reporting -101 are skipped
        if [ "$nocpu" != "-101" ]; then
            cn=$((cn + 1))
            cpu=$((cpu + nocpu))
        fi
    done
    #echo The cluster has $tn nodes with $cpu cpus.
    #echo Ripping wav to mp3

    # Encoders
    # lame
    encoder="/usr/local/bin/lame"
    options="-S -m s -h -b 192"
    # BladeEnc
    #encoder="/usr/local/bin/BladeEnc"
    #options="-br 320 -rawstereo -nocfg"

    # Start the mp3 encoding, keeping at most one encoder running per cpu
    c=1
    n=$(ls $dir/*wav | wc -l)
    until [ $c -gt $n ]
    do
        g=$(ps -ef | grep -v grep | grep lame | wc -l)
        while [ $g -ge $cpu ]
        do
            echo Running on all cpus.
            sleep 15
            g=$(ps -ef | grep -v grep | grep lame | wc -l)
        done
        # track files are numbered audio_01.wav, audio_02.wav, ...
        track=$(printf '%02d' $c)
        $encoder $options audio_$track.wav &
        c=$((c + 1))
    done
    # Let the last encoders finish before tagging their output
    wait
    echo Wav to mp3 done

    #variables
    #a=albumtitle
    #b=band
    #f=filename
    #g=genre
    #s=song
    ##sc=song count
    ##st=song total
    #t=track
    #y=year
    echo Parsing CDDB info
    a=$(grep -i Albumtitle $dir/audio_01.inf | sed -e 's/Albumtitle=//' -e "s/'//g" | tr -d '\t')
    b=$(grep -i AlbumPerformer $dir/audio_01.inf | sed -e 's/Albumperformer=//' -e "s/'//g" | tr -d '\t')
    g=$(grep -i DGENRE $dir/audio.cddb | sed -e 's/DGENRE=//')
    y=$(grep -i DYEAR $dir/audio.cddb | sed -e 's/DYEAR=//')
    mkdir -p /stripe/music/"${g}"

    echo Adding ID3 tags, renaming, and moving mp3s
    st=$(ls $dir/*wav | wc -l)
    sc=0
    t=1
    until [ $t -gt $st ]
    do
        s=$(grep -i "TTITLE$sc=" $dir/audio.cddb | sed -e "s/TTITLE$sc=//" -e 's#/#-#g')
        track=$(printf '%02d' $t)
        /stripe/bin/id3ed -q -n "${b}" -a "${a}" -y $y -g "${g}" -k $t -s "${s}" $dir/audio_$track.wav.mp3
        f=$b-$s
        mv $dir/audio_$track.wav.mp3 /stripe/music/"${g}"/"${f}.mp3"
        sc=$((sc + 1))
        t=$((t + 1))
    done
    echo Copying files to /stripe/music/$g

    echo Cleaning up
    rm $dir/*inf
    rm $dir/*wav
    rm $dir/audio.cd*
    echo Complete

References:

Each link is to documentation that I either consulted often or used only once to solve a specific problem.

http://www.ovro.caltech.edu/ovrodocs/misc/PXE-Network-Bootloading.pdf
http://www.tldp.org/HOWTO/JavaStation-HOWTO/
http://www.math.byu.edu/CompResources/cheops/files.html
http://www.math.byu.edu/CompResources/cheops/server_tweaks.html
http://www.netwinder.org/~ralphs/howto/Disk-Update-HOWTO-9.html
http://webpal.bigbrd.com/
http://www.busybox.net/lists/busybox/2002-February/011008.html
http://www.ibiblio.org/pub/Linux/docs/HOWTO/mini/NFS-Root
http://www.ibiblio.org/pub/Linux/docs/HOWTO/Mosix-HOWTO
http://openmosix.sourceforge.net/community.html
http://openmosix.sourceforge.net/
http://www-105.ibm.com/developerworks/education.nsf/linux-onlinecourse-bytitle/F86D74C7B3B4E65486256B2900073A2E?OpenDocument
http://clumpos.psoftware.org/
http://www.vlug.org/vlug/meetings/X-terminal_presentation/details.html
Linux Magazine October 2002 Network Booting article