Friday, March 5, 2010

Aggregated status file and selective reservations

There have been a couple of important improvements in the working of taskfs. Firstly, there is now a working status file in the top directory of taskfs. Do not confuse this with the session-level status file: the session-level status file reports the status of only that session, whereas this file reports the status of all available resources. The file is readable and reads are non-destructive, which means it can be read multiple times. What it returns is the number of compute nodes available for each OS and architecture. Its content is a list of tuples, each containing three values: OS, architecture, and node count. The file is aggregated, meaning the node counts include all descendants. It gives an idea of the amount and type of resources available to you.
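For example, a small Python snippet (in the spirit of the Python demo later in this post; the ./mpoint mount point is an assumption) can turn these tuples into a dictionary keyed by OS and architecture:

statusFile = open("./mpoint/remote/status", "r", 0)
resources = {}
for line in statusFile.read().splitlines():
    fields = line.split()
    if len(fields) == 3:                      # each line is: OS arch nodecount
        osname, arch, count = fields
        resources[(osname, arch)] = int(count)
statusFile.close()
print resources                               # e.g. {('Linux', '386'): 4}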

The next enhancement implements selective reservation using the information available from the status file. Resource reservations can now also specify what type of resources are needed for their jobs. Reservations will be made only on resources that have the requested OS and architecture. If no OS or architecture is given, or if a wildcard is given instead, then any available resource will be used. The new format of the res command is res n OS arch, where the OS and architecture may be omitted or replaced with a wildcard (the demo below uses res 5 Linux 386).

Now, let's try out a demo. Python is used below for interactive job management with taskfs.

$ python
Python 2.6.4 (r264:75706, Dec 7 2009, 18:45:15)
[GCC 4.4.1] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys, os
>>> os.system ("9pfuse localhost:5555 ./mpoint")
0
>>> os.system ("tree ./mpoint")
./mpoint
`-- remote
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status

5 directories, 2 files
0
>>> topStatusFile = open ( "./mpoint/remote/status", "r", 0)
>>> print topStatusFile.read()
Linux 386 4

>>> topStatusFile.close()

Up to this point, we have mounted taskfs and read the top-level status file. It shows that there are 4 nodes, each running Linux on the 386 architecture. Now, let's do a reservation and a demo job execution.

>>> ctlFile = open ( "./mpoint/remote/clone", "r+", 0)
>>> print ctlFile.read()
0
>>> os.system ("tree ./mpoint")
./mpoint
`-- remote
|-- 0
| |-- args
| |-- ctl
| |-- env
| |-- ns
| |-- status
| |-- stderr
| |-- stdin
| |-- stdio
| |-- stdout
| `-- wait
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status

6 directories, 12 files
0
>>> sFile = open ("./mpoint/remote/0/status", "r", 0)
>>> print sFile.read()
cmd/0 2 Unreserved /home/pravin/projects/inferno/pravin//inferno-rtasks/ ''

>>> sFile.close()
>>> ctlFile.write ("res 5 Linux 386")
>>> sFile = open ("./mpoint/remote/0/status", "r", 0)
>>> print sFile.read()
cmd/1 2 Closed /home/pravin/inferno/layer3//hg/ ''
cmd/2 2 Closed /home/pravin/inferno/layer3//hg/ ''
cmd/3 2 Closed /home/pravin/inferno/layer3//hg/ ''
cmd/1 2 Closed /home/pravin/inferno/pravin2//hg/ ''
cmd/2 2 Closed /home/pravin/inferno/pravin2//hg/ ''

>>> sFile.close()
>>> ctlFile.write ("exec hostname")
>>> ioFile = open ("./mpoint/remote/0/stdio", "r", 0)
>>> print ioFile.read()
inferno-test
inferno-test
inferno-test
inferno-test
inferno-test

>>> ioFile.close()
>>> sFile = open ("./mpoint/remote/0/status", "r", 0)
>>> print sFile.read()
cmd/1 2 Done /home/pravin/inferno/layer3//hg/ hostname
cmd/2 2 Done /home/pravin/inferno/layer3//hg/ hostname
cmd/3 2 Done /home/pravin/inferno/layer3//hg/ hostname
cmd/1 2 Done /home/pravin/inferno/pravin2//hg/ hostname
cmd/2 2 Done /home/pravin/inferno/pravin2//hg/ hostname

>>> sFile.close()
>>> topStatusFile = open ("./mpoint/remote/status", "r", 0)
>>> print topStatusFile.read()
Linux 386 4


Following is a demo of how resource reclamation works once the clone file is closed.

>>> os.system ("tree -d ./mpoint")
./mpoint
`-- remote
|-- 0
| |-- 0
| | `-- 0
| | |-- 0
| | |-- 1
| | `-- 2
| `-- 1
| |-- 0
| `-- 1
|-- arch [error opening dir]
|-- env [error opening dir]
|-- fs [error opening dir]
`-- ns [error opening dir]

14 directories
0
>>> ctlFile.close ()
>>> os.system ("tree -d ./mpoint")
./mpoint
`-- remote
|-- 0
|-- arch [error opening dir]
|-- env [error opening dir]
|-- fs [error opening dir]
`-- ns [error opening dir]

6 directories
0
>>>


Unfortunately, I do not have a testbed with different OSes and architectures running, so selective reservation is practically useless in this setup. But it is quite handy when one is dealing with a diverse mix of compute nodes.

From code revision ae33f01fea onwards, selective reservation works.

Friday, February 19, 2010

Hierarchical job distribution support in Taskfs

This is a quick post to report the completion of hierarchical task distribution in taskfs. Until now, taskfs worked by distributing tasks to the devcmd2 of remote nodes, and those remote nodes executed the jobs; this is a flat, one-level hierarchy. In the hierarchical system, the client taskfs mounts remote taskfs instances, which in turn mount the next level of taskfs, and so on...

To implement this, local task execution support was added to taskfs by merging it with devcmd2. Now every taskfs is capable of behaving as a leaf node, an internal node, or the root node. By using these taskfs instances as nodes and mount links as edges, one can create a directed graph.

The task given to the root taskfs is distributed equally among its immediate children. Each child in turn distributes its share of the load among its own immediate children. Input is propagated the same way from the root to the leaf nodes through the tree, and output is aggregated at each internal node before being passed up to its parent.
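The division itself can be pictured with a small sketch (illustrative Python only, not the actual taskfs code; the function name and the leaf-node assumption are mine): given n tasks and the list of immediate children, each child receives an almost equal share.

def split_reservation(n, children):
    # A node with no mounted children behaves as a leaf and keeps the
    # whole share for itself (local execution).
    if not children:
        return {"local": n}
    shares = {}
    base, extra = divmod(n, len(children))
    for i, child in enumerate(children):
        # Spread the remainder over the first few children so the
        # shares differ by at most one task.
        shares[child] = base + (1 if i < extra else 0)
    return shares

# e.g. split_reservation(4, ["host1", "host2"]) -> {'host1': 2, 'host2': 2}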

This directed graph is created automatically based on which remote taskfs filesystems are mounted. A taskfs that has no other taskfs mounted assumes it is a leaf node. One can also force local execution of a task by sending the res 0 command, or by sending exec cmd directly without making a reservation: taskfs executes a job locally if there is no prior reservation command or if the prior reservation requested zero remote resources, as sketched below.
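As a concrete illustration of forcing local execution (a sketch only, driven from the Linux side via 9pfuse; the mount point is an assumption):

ctl = open("./mpoint/remote/clone", "r+", 0)
session = ctl.read().strip()   # the descriptor now acts as this session's ctl file
ctl.write("res 0")             # zero remote resources requested...
ctl.write("exec hostname")     # ...so the command executes locally on this node
print open("./mpoint/remote/" + session + "/stdio", "r", 0).read()
ctl.close()                    # closing the descriptor releases the session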

Revision 103b7aa87c is the first revision providing this support. One can use the same methodology described in the previous blog post to execute jobs. The only visible difference is that when resources are reserved, not all of them may appear in the session directory; only the immediate children of this taskfs appear there.


Now, let's see one live example of this new taskfs and what the tree looks like. The following is a similar execution to the one described in the previous post, so I am not repeating the steps here. This tree was created for a request of 4 tasks.

$ 9pfuse localhost:5555 mpoint
$ tree mpoint
mpoint
`-- remote
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

6 directories, 1 file
$ ./cloneUserPause mpoint/remote/clone 4 "pwd"
[[DEBUG]] command is [pwd] and input file [(null)]
[[DEBUG]] clone returned [0]
[[DEBUG]] path to ctl file is [mpoint/remote//0/ctl]
[[DEBUG]] Writing command [res 4]
[[DEBUG]] Writing command [exec pwd]
[[DEBUG]] input file path [mpoint/remote//0/stdio]
[[DEBUG]] opening [mpoint/remote//0/stdio] for reading output
/home/pravin/inferno/layer3/hg
/home/pravin/inferno/layer3/hg
/home/pravin/inferno/pravin2/hg
/home/pravin/inferno/pravin2/hg
$ tree mpoint
mpoint
`-- remote
|-- 0
| |-- 0
| | |-- 0
| | | |-- 0
| | | | |-- args
| | | | |-- ctl
| | | | |-- env
| | | | |-- ns
| | | | |-- status
| | | | |-- stderr
| | | | |-- stdin
| | | | |-- stdio
| | | | |-- stdout
| | | | `-- wait
| | | |-- 1
| | | | |-- args
| | | | |-- ctl
| | | | |-- env
| | | | |-- ns
| | | | |-- status
| | | | |-- stderr
| | | | |-- stdin
| | | | |-- stdio
| | | | |-- stdout
| | | | `-- wait
| | | |-- args
| | | |-- ctl
| | | |-- env
| | | |-- ns
| | | |-- status
| | | |-- stderr
| | | |-- stdin
| | | |-- stdio
| | | |-- stdout
| | | `-- wait
| | |-- args
| | |-- ctl
| | |-- env
| | |-- ns
| | |-- status
| | |-- stderr
| | |-- stdin
| | |-- stdio
| | |-- stdout
| | `-- wait
| |-- 1
| | |-- 0
| | | |-- args
| | | |-- ctl
| | | |-- env
| | | |-- ns
| | | |-- status
| | | |-- stderr
| | | |-- stdin
| | | |-- stdio
| | | |-- stdout
| | | `-- wait
| | |-- 1
| | | |-- args
| | | |-- ctl
| | | |-- env
| | | |-- ns
| | | |-- status
| | | |-- stderr
| | | |-- stdin
| | | |-- stdio
| | | |-- stdout
| | | `-- wait
| | |-- args
| | |-- ctl
| | |-- env
| | |-- ns
| | |-- status
| | |-- stderr
| | |-- stdin
| | |-- stdio
| | |-- stdout
| | `-- wait
| |-- args
| |-- ctl
| |-- env
| |-- ns
| |-- status
| |-- stderr
| |-- stdin
| |-- stdio
| |-- stdout
| `-- wait
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

14 directories, 81 files


The root taskfs had two children, one at /home/pravin/inferno/pravin/inferno-rtasks and the other at /home/pravin/inferno/pravin2/hg. The first child had its own child running at /home/pravin/inferno/layer3/hg. As you can see, the actual execution was done only by the leaf taskfs instances; the internal taskfs instances only delegated the work and aggregated the output. The tree above helps visualize this delegation of work.


Note: the tree shown above is available only while the user is still holding the root clone file open. As soon as this file is closed, the entire tree representing the remote resources is released. Following is the tree after the ./cloneUserPause program terminates.

$ tree mpoint
mpoint
`-- remote
|-- 0
| |-- args
| |-- ctl
| |-- env
| |-- ns
| |-- status
| |-- stderr
| |-- stdin
| |-- stdio
| |-- stdout
| `-- wait
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

7 directories, 11 files

Tuesday, February 16, 2010

Supporting clone file semantics from Taskfs

This is another blog post describing something that is already implemented but not documented properly, so it is dedicated to explaining the taskfs support for clone file semantics. There are many bugfixes and optimizations in revision 6feb9d0431, which this post refers to, but those are not the main concern here. This revision is also tagged as v0.0.1 and is hopefully stable.

Let's start with what clone file semantics means. When you read from the clone file, it returns the name of the resource allocated for the current session, and it converts the clone file into the ctl file of that session resource. After this read, you are supposed to treat the clone file channel/descriptor as the ctl file channel/descriptor. Whenever this descriptor is closed, taskfs assumes the session is no longer needed by the user: it frees the remote resources and also releases the taskfs session for future session requests. The main reason for this behavior is that taskfs can easily reclaim resources and do garbage collection once the clone file descriptor is closed.

This semantic of releasing the resources when the clone and ctl files are closed is a little unnatural for POSIX and command-prompt users. It also means that you can't use input/output redirection or cat on these files from the shell, as the shell automatically closes them once the command completes. For example, let's look at the following commands.

$ cd mpoint/remote
$ cat clone
0
$ cat clone
0
$

Here, we requested a new taskfs session two times, and it returned the same session both times. This may look erroneous to POSIX users, but realize that cat clone reads and shows the resource reserved by taskfs and then closes the clone file. This close triggers the release of the taskfs session resource (in this case resource 0). The released resource is then allocated again to the next user who requests it by opening and reading clone.

Now let's see one more example, this time writing to the ctl file of a taskfs session resource.

$ cat clone
0
$ cd 0
$ echo res 4 > ctl

This command writes res 4 to the ctl file and then closes it. As a result, taskfs allocates 4 remote resources and releases them as soon as it receives the close call. This is not the behavior command-line users expect. The take-home message here is: don't use input redirection or cat on the clone and ctl files.


The proper way (i.e. the clone-filesystem way) to use this filesystem is:

  1. Open the clone file.

  2. Read the resource name from the clone file.

  3. Write res n into the same clone file descriptor, where n is the number of resources needed.

  4. Write exec cmd into the same clone file descriptor, where cmd is the command to be executed.

  5. Open the stdio file in write mode, write the input data into this file descriptor, and then close the descriptor once all the input is written.

  6. Open the stdio file in read mode, read the data from this file descriptor, and then close the descriptor once all the data is read. This is the standard output of your command.

  7. Open the stderr file in read mode, read the data from this file descriptor, and then close the descriptor once all the data is read. This is the standard error of your command.

  8. Close the clone file descriptor. This automatically releases all the resources allocated for this taskfs session, including the remote resources reserved by res n.


Here is the small C program cloneUser.c that I wrote to do all of the above steps.
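For readers following along from Linux, a rough Python equivalent of the same eight steps (a sketch only, not the cloneUser.c program itself; the path layout follows the demos in these posts) might look like this:

import os

def run_on_taskfs(mount, n, cmd, input_data=None):
    # Steps 1-2: open the clone file and read the session resource name.
    clone = open(os.path.join(mount, "remote", "clone"), "r+", 0)
    session = clone.read().strip()
    sessdir = os.path.join(mount, "remote", session)
    # Steps 3-4: the same descriptor now acts as the session's ctl file.
    clone.write("res %d" % n)
    clone.write("exec %s" % cmd)
    # Step 5: feed standard input, if any, then close that descriptor.
    if input_data is not None:
        stdin = open(os.path.join(sessdir, "stdio"), "w", 0)
        stdin.write(input_data)
        stdin.close()
    # Steps 6-7: read the merged standard output and standard error.
    out = open(os.path.join(sessdir, "stdio"), "r", 0).read()
    err = open(os.path.join(sessdir, "stderr"), "r", 0).read()
    # Step 8: closing the clone/ctl descriptor releases everything.
    clone.close()
    return out, err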

Now, let's see an example run showing how it can be used.

$ 9pfuse localhost:5555 mpoint
$
$ tree mpoint
mpoint
`-- remote
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

6 directories, 1 file
$
$ ./cloneUser mpoint/remote/clone 4 "hostname"
[[DEBUG]] command is [hostname] and input file [(null)]
[[DEBUG]] clone returned [0]
[[DEBUG]] path to ctl file is [mpoint/remote//0/ctl]
[[DEBUG]] Writing command [res 4]
[[DEBUG]] Writing command [exec hostname]
[[DEBUG]] input file path [mpoint/remote//0/stdio]
[[DEBUG]] opening [mpoint/remote//0/stdio] for reading output
BlackPearl
inferno-test
inferno-test
BlackPearl
$
$ tree mpoint
mpoint
`-- remote
|-- 0
| |-- args
| |-- ctl
| |-- env
| |-- ns
| |-- status
| |-- stderr
| |-- stdin
| |-- stdio
| |-- stdout
| `-- wait
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

7 directories, 11 files
$
$ ./cloneUser mpoint/remote/clone 4 "wc -l" cloneUser.c
[[DEBUG]] command is [wc -l] and input file [cloneUser.c]
[[DEBUG]] clone returned [0]
[[DEBUG]] path to ctl file is [mpoint/remote//0/ctl]
[[DEBUG]] Writing command [res 4]
[[DEBUG]] Writing command [exec wc -l]
[[DEBUG]] input file path [mpoint/remote//0/stdio]
[[DEBUG]] opening [mpoint/remote//0/stdio] for reading output
192
192
192
192
$
$ tree mpoint
mpoint
`-- remote
|-- 0
| |-- args
| |-- ctl
| |-- env
| |-- ns
| |-- status
| |-- stderr
| |-- stdin
| |-- stdio
| |-- stdout
| `-- wait
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

7 directories, 11 files
$


In the above test run, cloneUser.c is the program mentioned above. In the first execution of the cloneUser program, 4 remote nodes are used to execute the hostname command. You can observe that after cloneUser terminates, all remote resources are released, which is why they are no longer visible in the directory structure. The second invocation demonstrates the use of an input file with cloneUser: it sends the cloneUser.c file itself as input to the wc -l program, which returns the number of lines in the code. All four remote resources report the correct line count, showing the expected behavior. It can also be observed that the second execution of cloneUser reuses the same taskfs session 0 that was released after the first execution completed. So this test run is a good example of how resource reclamation works in taskfs.

Many-to-many support for Taskfs

This post on this functionality has been pending for some time; I am finally posting it today. Hopefully it will create the base for understanding the next functionality, on which work has already started.

Let's start with what many-to-many means in this context. In one-to-many taskfs, once a task is divided, the sub-tasks are not aware of each other's presence, so they can't communicate with each other. Many-to-many taskfs provides a way for subtasks to communicate with each other. This is achieved by binding all remote resources into the directory of the current taskfs session. Each subtask runs in one of these pseudo subdirectories (which are actually compute resources on remote nodes). In this setup, every subtask can assume that the other subdirectories are its sibling subtasks and can communicate with them directly, without asking the parent for the actual locations of these resources.

Now, let's see how exactly it is implemented. The trick is that whenever a taskfs session allocates a new remote resource, it binds that resource's directory as a subdirectory of the current session directory. This also simplifies read and write aggregation: reads and writes on files in the session directory are simply passed through to the corresponding file in every subdirectory. Taskfs does not need to remember where exactly the remote resources are, as this information is automatically encoded by the bindings between the subdirectories and the remote resource directories.
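From the Linux side this structure is easy to walk. The sketch below (mount point and session number are assumptions matching the demo that follows) lists the numbered subdirectories of a session, i.e. the peer subtasks, and reads each one's status file directly:

import os

session = "./mpoint/remote/0"                  # assumed session directory
# Every numeric entry is a bound remote resource, i.e. one subtask.
peers = [d for d in os.listdir(session)
         if d.isdigit() and os.path.isdir(os.path.join(session, d))]
for p in sorted(peers, key=int):
    # Each peer directory carries its own ctl/status/stdio files, so it can
    # be inspected directly, without asking the parent where the underlying
    # remote resource really lives.
    print p, open(os.path.join(session, p, "status"), "r", 0).read().strip()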

Now, let's see an example. The following test run assumes that there are remote deployments of Inferno exporting devcmd2, and that the client Inferno instance has mounted those exported remote taskfs under /remote/. This client Inferno is also assumed to export its taskfs for others to use. The following commands were executed on the client Inferno host.

styxlisten -A tcp!*!5555 export /task
mount -A tcp!9.3.61.180!6666 /remote/host1
mount -A tcp!9.3.61.180!6667 /remote/host2
mount -A tcp!127.0.0.1!6666 /remote/host3


Next, the following was executed from a Linux shell after mounting the taskfs exported by the Inferno client above. $ represents the Linux shell prompt.

$ 9pfuse localhost:5555 mpoint
$ tree mpoint
mpoint
`-- remote
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]
$ cd mpoint/remote
$ cat clone
0
$ cd 0
$ tree ../
../
|-- 0
| |-- args
| |-- ctl
| |-- env
| |-- ns
| |-- status
| |-- stderr
| |-- stdin
| |-- stdio
| |-- stdout
| `-- wait
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

6 directories, 11 files

$ echo res 4 > ctl
$ tree ../
../
|-- 0
| |-- 0
| | |-- args
| | |-- ctl
| | |-- env
| | |-- ns
| | |-- status
| | |-- stderr
| | |-- stdin
| | |-- stdio
| | |-- stdout
| | `-- wait
| |-- 1
| | |-- args
| | |-- ctl
| | |-- env
| | |-- ns
| | |-- status
| | |-- stderr
| | |-- stdin
| | |-- stdio
| | |-- stdout
| | `-- wait
| |-- 2
| | |-- args
| | |-- ctl
| | |-- env
| | |-- ns
| | |-- status
| | |-- stderr
| | |-- stdin
| | |-- stdio
| | |-- stdout
| | `-- wait
| |-- 3
| | |-- args
| | |-- ctl
| | |-- env
| | |-- ns
| | |-- status
| | |-- stderr
| | |-- stdin
| | |-- stdio
| | |-- stdout
| | `-- wait
| |-- args
| |-- ctl
| |-- env
| |-- ns
| |-- status
| |-- stderr
| |-- stdin
| |-- stdio
| |-- stdout
| `-- wait
|-- arch [error opening dir]
|-- clone
|-- env [error opening dir]
|-- fs [error opening dir]
|-- ns [error opening dir]
`-- status [error opening dir]

10 directories, 51 files
$ cd 3
$ pwd
/home/pravin/projects/inferno/mpoint/remote/0/3
$ ls -l
total 0
--w--w--w- 1 pravin pravin 0 2010-01-14 17:12 args
-rw-rw---- 1 pravin pravin 0 2010-01-14 17:12 ctl
--w--w--w- 1 pravin pravin 0 2010-01-14 17:12 env
-rw-rw---- 1 pravin pravin 0 2010-01-14 17:12 ns
-r--r--r-- 1 pravin pravin 0 2010-01-14 17:12 status
-r--r--r-- 1 pravin pravin 0 2010-01-14 17:12 stderr
--w--w--w- 1 pravin pravin 0 2010-01-14 17:12 stdin
-rw-rw---- 1 pravin pravin 0 2010-01-14 17:12 stdio
-r--r--r-- 1 pravin pravin 0 2010-01-14 17:12 stdout
-r--r--r-- 1 pravin pravin 0 2010-01-14 17:12 wait
$ cat status
cmd/1 1 Closed /home/pravin/projects/inferno/ericvh//ericvh-brasil/ ''
$ cd ..
$ pwd
/home/pravin/projects/inferno/mpoint/remote/0
$ cat status
cmd/0 1 Closed /home/pravin/projects/inferno/ericvh//ericvh-brasil/ ''
cmd/0 1 Closed /home/pravin/inferno/host2//ericvh-brasil/ ''
cmd/0 1 Closed /home/pravin/inferno/ericvh//ericvh-brasil/ ''
cmd/1 1 Closed /home/pravin/projects/inferno/ericvh//ericvh-brasil/ ''
$ echo exec hostname > ctl
$ cat stdio
BlackPearl
inferno-test
inferno-test
BlackPearl
$ cat stderr
$ cat status
cmd/0 5 Done /home/pravin/projects/inferno/ericvh//ericvh-brasil/ hostname
cmd/0 5 Done /home/pravin/inferno/host2//ericvh-brasil/ hostname
cmd/0 5 Done /home/pravin/inferno/ericvh//ericvh-brasil/ hostname
cmd/1 5 Done /home/pravin/projects/inferno/ericvh//ericvh-brasil/ hostname
$ cd ../../..
$ pwd
/home/pravin/projects/inferno
$ fusermount -u mpoint
$ ls mpoint
$


Now, let's see what exactly happened above.

  1. The taskfs was mounted with 9pfuse.

  2. The directory structure of the initial taskfs was shown with the tree command.

  3. Changed directory inside taskfs with cd mpoint/remote.

  4. A taskfs session was created with the cat clone command.

  5. Changed directory into the taskfs session, in this case with cd 0.

  6. Examined the directory structure of taskfs after session creation. You can see that the session directory has no subdirectories, which indicates that no remote resources are allocated.

  7. Remote resources were allocated with the echo res 4 > ctl command.

  8. Examined the directory structure of taskfs after resource allocation. You can see that there are four subdirectories, one for each remote resource.

  9. Changed directory into one of the subdirectories representing a remote resource and operated on it independently, checking the status of that resource with cat status.

  10. Checked the status of the taskfs session by doing cat status in the session directory. You can see that it shows the status of all remote resources.

  11. Executed the command on the taskfs session by doing echo exec hostname > ctl.

  12. Checked the output, stderr and status after execution. You can see that there are two different hostnames in the 4 outputs.

  13. Moved out of the mounted directory and unmounted it.



This functionality of taskfs is available in revision d125d391c1.

Friday, January 29, 2010

Using taskfs from Linux.

plan9port makes it possible to use synthetic Inferno filesystems from Linux systems. I have used this to simplify testing of the Inferno setup. This post explains how the test setup is created.

Firstly, you will need to install plan9port by downloading plan9port.tgz. Detailed installation instructions are available here, but I will repeat what I did for it.

mkdir for_plan9
cd for_plan9
wget http://swtch.com/plan9port/plan9port.tgz
tar -xvf plan9port.tgz
cd plan9
./INSTALL

You will need to add a few plan9port-related paths to your PATH. Once that is done, you can mount an exported Inferno filesystem using 9pfuse. Once mounted, these files can be manipulated using simple Linux tools. To unmount the filesystem, you can use fusermount with the -u option.

Now, for test purposes, I have two Inferno instances running on the inferno-test host using separate port numbers, and one running locally. To automatically mount these remote ends and export the local taskfs, an Inferno startup script is used. This startup-script option works well with the Linux emu, but may not work on other platforms, so check before using it. The startup command is passed on the emu command line, as shown below. Following is the script which starts my Inferno with taskfs.

$ cat runInfernoScript.sh
#!/bin/bash
BASEDIR=/home/pravin/projects/inferno/pravin/
INSTALL_DIR=inferno-rtasks
$BASEDIR/$INSTALL_DIR/Linux/386/bin/emu -r $BASEDIR/$INSTALL_DIR/ -v /dis/sh.dis /dis/rmount.sh

Here, /dis/sh.dis is the Inferno shell and /dis/rmount.sh is my script which does all the initialization I need. Here is its content.

echo exporting /task
styxlisten -A tcp!*!5555 export /task
echo Mounting 6666 on host1
mount -A tcp!9.3.61.180!6666 /remote/host1
echo Mounting 6667 on host2
mount -A tcp!9.3.61.180!6667 /remote/host2
echo Mounting from localhost:6669 on host3
mount -A tcp!127.0.0.1!6669 /remote/host3
echo Mounting from localhost:6670 on host4
mount -A tcp!127.0.0.1!6670 /remote/host4
pause # this will stop emu from quiting immediately.
echo You should never see this line in output.

The pause command on the second-to-last line makes sure that emu does not terminate after executing the startup script.

Now, let's go to the Linux shell and execute the following script.

#!/bin/bash
set -e
cmd='hostname'
9pfuse localhost:5555 mpoint
cd mpoint/remote
res=`cat clone`
echo "Taskfs resource allocated is $res"
cd $res
echo res 4 > ctl
echo exec $cmd > ctl
echo "Showing output bellow --------------------"
cat stdio

echo "Showing stderr bellow --------------------"
cat stderr


echo "Showing status bellow --------------------"
cat status

cd ..
cd ..
cd ..
fusermount -u mpoint
echo "done -------------------------------------"

The above code does the following.

    Mount taskfs using 9pfuse.
    Reserve 4 remote resources.
    Execute hostname command on them.
    Get the output from stdio.
    Show any errors from stderr.
    Show the information related to the status of all remote nodes.
    Release taskfs by unmounting it using fusermount.


And here is the output of running this script.

$ ./testRun.sh
Taskfs resource allocated is 0
Showing output bellow --------------------
inferno-test
BlackPearl
inferno-test
inferno-test
Showing stderr bellow --------------------
Showing status bellow --------------------
cmd/0 5 Done /home/pravin/inferno/host2//ericvh-brasil/ hostname
cmd/1 5 Done /home/pravin/projects/inferno/ericvh//ericvh-brasil/ hostname
cmd/1 5 Done /home/pravin/inferno/ericvh//ericvh-brasil/ hostname
cmd/1 5 Done /home/pravin/inferno/host2//ericvh-brasil/ hostname
done -------------------------------------

You can see from the output that, out of the 4 remote resources requested, 3 went to the two Inferno instances running on the same host and one was allocated to the single Inferno instance running on the other host. Only one instance was used twice; all other Inferno instances were used only once.

So, this is the first working proof that taskfs can be used to distribute jobs to multiple nodes, and that this can be done simply by mounting taskfs and using it directly.

Thursday, January 28, 2010

Taskfs with one-to-many support

As promised, revision 123f4c23dd has support for one-to-many task execution. In simple words, it means that if you mount multiple remote resources, any command issued will be executed on all of them, with the input provided to all of those nodes. The stdio, stderr and status files will provide the contents from all remote locations in merged form.

At the implementation level, once the resources are reserved, any write on a taskfs session file is repeated on the corresponding files of all remote nodes. Similarly, reads on taskfs files return the data gathered from all the remote node files.
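Conceptually, the behaviour is like the following sketch (a Python model of the semantics only, not the actual taskfs implementation): a write to a session file is repeated on the corresponding file of every reserved remote resource, and a read returns the contents of all of them, merged.

def fanout_write(remote_files, data):
    # A write on a session file is repeated on every remote counterpart.
    for f in remote_files:
        f.write(data)

def merged_read(remote_files):
    # A read on a session file returns the remote contents, concatenated.
    return "".join(f.read() for f in remote_files)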

Following are two scenarios of one-to-many taskfs usage.

; cd /task/remote
; cat clone
0;
; cd 0
; cat status
cat: read error: No remote reservation done
; mount -A tcp!127.0.0.1!6666 /remote/host1
; echo res 2 > ctl
; cat status
cmd/0 1 Closed /home/pravin/projects/inferno/ericvh//ericvh-brasil/ ''
cmd/1 1 Closed /home/pravin/projects/inferno/ericvh//ericvh-brasil/ ''
; echo exec pwd > ctl
; cat stdio
/home/pravin/projects/inferno/ericvh/ericvh-brasil
/home/pravin/projects/inferno/ericvh/ericvh-brasil

Above is the first scenario: reserving two remote resources when only one remote host is available, namely host1. The request for two resources is satisfied by creating two separate sessions on the same remote host. As a result, the output of the pwd command is the same for both sessions, which can be seen from the contents of stdio.


; cd /task/remote
; cat clone
0;
; cd 0
; mount -A tcp!127.0.0.1!6666 /remote/host1
; mount -A tcp!127.0.0.1!6667 /remote/host2
; echo res 2 > ctl
; cat status
cmd/0 1 Closed /usr/share/lxr/source/inferno//ericvh-brasil/ ''
cmd/2 1 Closed /home/pravin/projects/inferno/ericvh//ericvh-brasil/ ''
; echo exec pwd > ctl
; cat stdio
/usr/share/lxr/source/inferno/ericvh-brasil
/home/pravin/projects/inferno/ericvh/ericvh-brasil
;

In this second scenario, there are two remote nodes, namely host1 and host2. Reserving two remote resources ends up reserving one session on each of these remote nodes. As a result, the output in stdio shows a different result for each node.

Wednesday, January 27, 2010

Support for explicit reservation in taskfs

Here is revision 4cff22b0ac, which supports the res command on the ctl file. This involves a few fundamental changes in how reservation is done: remote reservation is moved from the clone file read to the ctl res command. The ctl exec command and the stdio, status and stderr files are modified to verify whether remote resources are allocated before doing anything; if no remote resources are reserved, a No remote reservation done error is returned.

Currently, remote reservation of only one node is supported. The next revision of the code will hopefully support multiple remote reservations and executions.

Following is a demo execution showing how one can use this.

; cd /task/remote
; cat clone
0; cd 0
; echo exec pwd > ctl
echo: write error: No remote reservation done
; cat status
cat: read error: No remote reservation done
; echo res 1 > ctl
echo: write error: No remote resources available
; mount -A tcp!127.0.0.1!6666 /remote/host2
; echo res 1 > ctl
; echo exec pwd > ctl
; cat stdio
/home/pravin/projects/inferno/ericvh/ericvh-brasil
; cat status
cmd/0 3 Done /home/pravin/projects/inferno/ericvh//ericvh-brasil/ pwd
; cat stderr
;


The above output shows how the ctl exec command and the status file behave before and after the ctl res command.