Thursday, November 15, 2012
KVM with LVM snapshot
Generally, when managing VMs with KVM, snapshots are handled with
virsh snapshot-create (VM name)
virsh snapshot-create-as (VM name) (snapshot name)
virsh snapshot-list (VM name) # look up the snapshot name
virsh snapshot-revert (VM name) (snapshot name)
However, this approach only works with qcow2 VM images.
If the VM's disk uses the raw type, it is not an option.
The idea most people then come up with is LVM: use an LVM logical volume as the VM's raw disk,
and combine LVM snapshots with rollback (lvconvert --merge) to get something similar to KVM's snapshot feature.
step 1) Create an LV in the host's VG
lvcreate -L 100G -n lv_test /dev/VolGroup
step 2) Create a VM
virt-install -n vm_test -r 1024 --vcpus=1 --disk path=/dev/VolGroup/lv_test .....(略)
After the installation completes, log in as root, then shut the VM down.
step 3) Test snapshots of the VM raw disk
take snapshots:
time line        t0 ------> t1 ----------> t2 -----------> t3 -----------> t4
take snapshot               lv_test_ss1                    lv_test_ss2
[action in VM]   init 0     boot           touch 1.txt     boot            rm *
                            rm A.txt                       touch 2.txt
                            init 0                         init 0
[files at ~/.]   A.txt                     B.txt                           2.txt
                 B.txt                     1.txt

rollback:
time line        t5 ------> t6 ----------> t7 -----------> t8
rollback                    lv_test_ss1                    lv_test_ss2
[action in VM]   boot                      init 0          boot
[files at ~/.]   2.txt      A.txt          A.txt           B.txt
                 B.txt      B.txt                          1.txt
                            (back to t0)                   (back to t2)
take snapshot lv_test_ss1)
lvcreate -L 50G -s -n lv_test_ss1 /dev/VolGroup/lv_test
take snapshot lv_test_ss2)
lvcreate -L 50G -s -n lv_test_ss2 /dev/VolGroup/lv_test
rollback to snapshot lv_test_ss1)
lvconvert --merge /dev/VolGroup/lv_test_ss1
rollback to snapshot lv_test_ss2)
lvconvert --merge /dev/VolGroup/lv_test_ss2
Notes:
1) You cannot take a snapshot of a snapshot; in other words, the following command is invalid:
lvcreate -L 50G -s -n lv_test_xx /dev/VolGroup/lv_test_ss1
2) In the example above, rolling back to lv_test_ss1 restores the files to the original A.txt and B.txt,
while rolling back to lv_test_ss2 restores B.txt and 1.txt, the state right before lv_test_ss2 was taken.
3) It appears that a snapshot rollback also tracks what the intermediate snapshots cover, so no files are lost.
In the example, by the time of the rollback to lv_test_ss1 the file 1.txt was already gone,
yet the rollback of lv_test_ss2 still produced the correct set of files (unaffected by the rm * at t4).
4) lvconvert --merge seems to be available only since RHEL/CentOS 6.
5) If lvconvert --merge fails with the warning "Can't merge over open origin volume", you may need to deactivate and then reactivate the LV:
lvchange -an /dev/VolGroup/lv_test ##deactivate
lvchange -ay /dev/VolGroup/lv_test ##activate
6) If you simply run lvremove on a snapshot, all changes made since that snapshot stay in the original LV (the snapshot is dropped without merging).
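Putting the notes together, a minimal sketch of one full snapshot/rollback cycle (the VG, LV and VM names follow the example above; adjust them to your own environment):
lvcreate -L 50G -s -n lv_test_ss1 /dev/VolGroup/lv_test   ## take the snapshot while the VM is shut down
## ...boot the VM, make changes, shut it down again (init 0)...
virsh destroy vm_test 2>/dev/null                          ## make sure the VM is really off
lvchange -an /dev/VolGroup/lv_test                         ## deactivate the origin if you hit "open origin volume"
lvconvert --merge /dev/VolGroup/lv_test_ss1                ## roll back
lvchange -ay /dev/VolGroup/lv_test                         ## reactivate; the merge completes on activation
virsh start vm_test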
Friday, November 9, 2012
KVM -- create a virtual machine with its console displayed over VNC
Prepare a yum server at ftp://172.16.43.248/pub62/
and a kickstart file at 172.16.43.248:/var/www/html/pub62/ks.cfg
172.16.43.248 172.16.43.145 (qemu:///system)
+------------------+ +-----------------------+
|Yum Server | | Virtualization |
|ftp /pub62 | | Virtualization Client |
|http /pub62/ks.cfg| | |
| | | VNCSERVERS "50:root" |
| | | |
|------------------| | 172.16.43.149 |
| | | +------------+ | vnc: 172.16.43.145:0
| | | | cent6 +--= br0 (BRIDGE eth0)
| | | +------------+ |
| | | |
| | | dhcp |
| | | +------------+ | vnc: 172.16.43.145:1
| | | |centNAT +--= virbr0 (NAT)
| | | +------------+ |
+-------+----------+ +------+----------------+
| | eth0 vnc: 172.16.43.145:50
--------+-----------------------+--------------------------+ vnc client accessible
[root@localhost ~]# yum install tigervnc-server
[root@localhost ~]# vi /etc/sysconfig/vncservers
:
:
# VNCSERVERS="2:myusername"
# VNCSERVERARGS[2]="-geometry 800x600 -nolisten tcp -localhost"
VNCSERVERS="50:root"
[root@localhost ~]# vncpasswd
[root@localhost ~]# vncserver
New 'localhost.localdomain:50 (root)' desktop is localhost.localdomain:50
Starting applications specified in /root/.vnc/xstartup
Log file is /root/.vnc/localhost.localdomain:50.log
[root@localhost ~]# vncserver -kill :50
[root@localhost ~]# service vncserver start
[root@localhost ~]# chkconfig vncserver on
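From another machine, a quick check that the desktop is reachable (assuming a VNC client such as the tigervnc package is installed there; display :50 maps to TCP port 5950):
vncviewer 172.16.43.145:50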
[root@localhost ~]# yum groupinstall 'Virtualization' 'Virtualization Client'
[root@localhost ~]# lsmod | grep kvm
kvm_intel 50380 6
kvm 305081 1 kvm_intel
[root@localhost ~]# service NetworkManager stop
[root@localhost ~]# chkconfig NetworkManager off
<<<<<<<<<< BRIDGE >>>>>>>>>>
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE="eth0"
HWADDR="68:05:CA:08:46:F0"
#NM_CONTROLLED="yes"
ONBOOT="yes"
BOOTPROTO=static
BRIDGE=br0
#IPADDR=172.16.43.145
#NETMASK=255.255.255.0
#GATEWAY=172.16.43.1
[root@localhost ~]# cat /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
ONBOOT=yes
BOOTPROTO=static
IPADDR=172.16.43.145
NETMASK=255.255.255.0
GATEWAY=172.16.43.1
TYPE=Bridge
USERCTL=yes
NAME="Bridge eth0"
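After editing both files, restarting the network and taking a quick look at the bridge is a reasonable sanity check (bridge-utils assumed to be installed):
[root@localhost ~]# service network restart
[root@localhost ~]# brctl show        ## br0 should list eth0 as an attached interface
[root@localhost ~]# ip addr show br0  ## br0 should now carry 172.16.43.145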
[root@localhost ~]# lvcreate -L 100G -n lv_test VolGroup
[root@localhost ~]# cat rhel-install4.sh
#!/bin/bash
virsh destroy cent6 &> /dev/null
##If the VNC port is left unspecified, the system assigns one automatically, which avoids listen-port conflicts when installing several VMs at the same time
##The VNC setup can also be done with --vnc (which listens on 127.0.0.1 by default) instead of --graphics vnc,...
/usr/sbin/virt-install -n cent6 -r 1024 --vcpus=1 --accelerate --disk path=/dev/VolGroup/lv_test --network bridge:br0 -x "ks=http://172.16.43.248/pub62/ks.cfg ksdevice=eth0 ip=172.16.43.149 netmask=255.255.255.0 dns=168.95.1.1 gateway=172.16.43.1" -l ftp://172.16.43.248/pub62/ --graphics vnc[,port=5900],listen=0.0.0.0 &
sleep 5
virsh autostart cent6
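Once the install has started, the VNC display actually assigned to the guest can be confirmed with virsh (a quick check):
[root@localhost ~]# virsh vncdisplay cent6   ## e.g. ':0' -> connect with vncviewer 172.16.43.145:0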
<<<<<<<<<< NAT >>>>>>>>>>
[root@localhost ~]# cat note-NAT
[root@localhost ~]# lvcreate -L 100G -n lv_test2 VolGroup
[root@localhost ~]# vi /etc/sysctl.conf
# Kernel sysctl configuration file for Red Hat Linux
#
# For binary values, 0 is disabled, 1 is enabled. See sysctl(8) and
# sysctl.conf(5) for more details.
# Controls IP packet forwarding
net.ipv4.ip_forward = 1
[root@localhost ~]# sysctl -p
net.ipv4.ip_forward = 1
:
[root@localhost ~]# service iptables status
iptables: Firewall is not running.
[root@localhost ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
Chain FORWARD (policy ACCEPT)
target prot opt source destination
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
[root@localhost ~]# service libvirtd restart
Stopping libvirtd daemon: [ OK ]
Starting libvirtd daemon: [ OK ]
[root@localhost ~]# iptables -L
Chain INPUT (policy ACCEPT)
target prot opt source destination
ACCEPT udp -- anywhere anywhere udp dpt:domain
ACCEPT tcp -- anywhere anywhere tcp dpt:domain
ACCEPT udp -- anywhere anywhere udp dpt:bootps
ACCEPT tcp -- anywhere anywhere tcp dpt:bootps
Chain FORWARD (policy ACCEPT)
target prot opt source destination
ACCEPT all -- anywhere 192.168.122.0/24 state RELATED,ESTABLISHED
ACCEPT all -- 192.168.122.0/24 anywhere
ACCEPT all -- anywhere anywhere
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
REJECT all -- anywhere anywhere reject-with icmp-port-unreachable
Chain OUTPUT (policy ACCEPT)
target prot opt source destination
[root@localhost ~]# cat ./rhel-install5.sh
#!/bin/bash
virsh destroy centNAT &> /dev/null
/usr/sbin/virt-install -n centNAT -r 1024 --vcpus=1 --accelerate --disk path=/dev/VolGroup/lv_test2 --network network:default -x "ks=http://172.16.43.248/pub62/ks.cfg ip=dhcp dns=168.95.1.1 " -l ftp://172.16.43.248/pub62/ --graphics vnc,port=5901,listen=0.0.0.0 &
sleep 5
virsh autostart centNAT
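If the guest cannot find virbr0, the default NAT network may not be active; a quick check/fix with the standard libvirt commands:
[root@localhost ~]# virsh net-list --all        ## 'default' should be active
[root@localhost ~]# virsh net-start default     ## start it if it is not
[root@localhost ~]# virsh net-autostart default ## have it come up with libvirtd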
If you find that ksmd is running with a high CPU%, do this:
service ksm stop
service ksmtuned stop
refer to : https://bugzilla.redhat.com/show_bug.cgi?id=541230
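To keep them from coming back after a reboot (assuming the stock ksm/ksmtuned init scripts):
chkconfig ksm off
chkconfig ksmtuned off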
Tuesday, November 6, 2012
A cloud-computing example based on MPICH2
Recently, as preparation for an experimental architecture meant to support next year's service expansion, I settled on Hadoop + Hypertable as the data-service foundation and MPICH2 + Torque for distributed computation. I will not argue their merits here (that will come later); this post looks at the implementation.
Host architecture
The installation and setup of Hypertable + Hadoop was described in earlier posts, and the MPICH2 + Torque setup can also be found here. For cost reasons, every host is deployed as a virtual machine. In the MPICH2 + Torque layout, two machines serve as compute nodes, one of which also acts as the head node that dispatches the distributed-computation jobs.
In this case, a Client Agent (cagent) runs on a compute node as the process that serves client requests, in a multi-process fashion: whenever a client connects, a process is forked on the compute node to serve it. Based on the client's request, the agent then obtains the corresponding service from the two compute nodes (distributed or cloud computation).
The figure above shows a prototype architecture for this idea. The program serving clients is cagent (Client Agent); after connecting, a client asks for the moving-average service mv_avg. The flow is roughly:
(1) The client connects to the Client Agent over TCP/IP; once the connection is accepted, the Client Agent forks a separate process to serve it.
(2) Based on the client's request, the Client Agent must obtain the moving-average results (mv_avg), acquiring the needed resources from the internal compute nodes (in this example, mpiexec splits the mv_avg program across multiple nodes).
(3) mv_avg is built on MPICH2 and supports three computation modes: (A) single node, (B) two nodes (the work is split evenly between two nodes), (C) multiple nodes (node 0 acts as the dispatcher and the remaining nodes compute the work assigned to them).
(4) Each node is assigned moving-average computations (periods 5, 10, 15, 20, 30, 40, 60, 80, 120, 180, 240). The data each compute node needs is fetched from Hypertable through the Hypertable Thrift API, and each node hands its results back to node 0.
(5) Node 0 returns the results, one by one, to the client through the Client Agent.
Below is the main source code for the MPI work dispatch:
/*
mode-1
[single node] --+
^ | task id == 0 , do it all
| |
+----+
mode-2
do #0,2,4...tasks
[taskid == 0] ---+ +---[taskid == 1]
^ ^ | | do #1,3,5.... tasks
| | | |
| +----+ |
+-----------+
mode-3
[taskid == 0] +--->[taskid == 1]
^ ^ ^ |
| | +-----------+ +->[taskid == 2]
dispatch| +---------------+
task # |
+------------------->[taskid == n]
finish task
and
reply data [seq][sendbuff=data1,data2,data3]...
*/
if(ntasks == 1) /*mode-1, single node run all tasks*/
{
int i;
for (i=taskid; i < sizeofar(mv_avg_para);i+=ntasks ) {
if(argc == 2)
GetCandleDataDefault(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[i] ,symb); /*from Hypertable*/
else
GetCandleDataPeriod(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[i] ,symb, start_date,end_date); /*from Hypertable*/
send_channel( channel, sendbuff , realsize,i);
}
}
else if(ntasks == 2) /*mode-2 , 2 nodes share all tasks*/
{
int wait=1;
if( taskid == 0)
{
int i,j,count, source;
for(i=0 ; i <= sizeofar(mv_avg_para) ;i++){
task2do[i]= TASK_INPIT;
}
i=taskid;
while(wait)
{
int flag;
MPI_Iprobe(MPI_ANY_SOURCE, REPLY, MPI_COMM_WORLD,&flag, &status);
if(flag) /*data received*/
{
source = status.MPI_SOURCE;
MPI_Get_count(&status, MPI_LONG, &count);
count = count > baserec ? baserec : count;
MPI_Recv(inptbuff, count, MPI_LONG, source, REPLY, MPI_COMM_WORLD, &status);
task2do[inptbuff[0] ]=TASK_DONE;
send_channel( channel, &inptbuff[1] , count-1,inptbuff[0]);
}
if ( i < sizeofar(mv_avg_para) ) {
if(task2do[i ]==TASK_INPIT)
{
task2do[i ]=TASK_RUN;
if(argc == 2)
GetCandleDataDefault(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[i] ,symb); /*from Hypertable*/
else
GetCandleDataPeriod(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[i] ,symb, start_date,end_date); /*from Hypertable*/
send_channel( channel, sendbuff , realsize,i);
task2do[i ]=TASK_DONE;
i+=ntasks;
}
}
wait=0;
for (j=0; j < sizeofar(mv_avg_para);j++ ) {
wait+=task2do[j];
}
}
}
else /*taskid == 1*/
{
int i;
for (i=taskid; i < sizeofar(mv_avg_para);i+=ntasks ) {
if(argc == 2)
GetCandleDataDefault(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[i] ,symb); /*from Hypertable*/
else
GetCandleDataPeriod(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[i] ,symb, start_date,end_date); /*from Hypertable*/
outpbuff[0] = i;
if(realsize > 0)
{
MPI_Send( outpbuff, realsize+1, MPI_LONG, 0, REPLY, MPI_COMM_WORLD );
}
}
}
}
else /*mode-3, multiple nodes tasks ,ntasks > 2*/
{
int i,ierr,j,count;
int wait=1;
if(taskid==0)
{
for(i=0 ; i <= sizeofar(mv_avg_para) ;i++){
task2do[i]= TASK_INPIT;
}
/*first dispatch task to nodes*/
for(i=1,j=0 ; i < ntasks && j < sizeofar(mv_avg_para) ;i++,j++){
ierr=MPI_Send(&j,1,MPI_INT,
i,REQUEST,MPI_COMM_WORLD);
task2do[j]= TASK_RUN;
}
/*receiving result , and assign next task*/
while (wait) {
int pos;
MPI_Recv( inptbuff, baserec+1, MPI_LONG, MPI_ANY_SOURCE, REPLY, MPI_COMM_WORLD, &status );
MPI_Get_count(&status, MPI_LONG, &count);
pos = inptbuff[0];
send_channel( channel, &inptbuff[1] , count-1,inptbuff[0]);
task2do[ pos ]=TASK_DONE;
wait=0;
for (i=0; i < sizeofar(mv_avg_para);i++ ) {
if(task2do[i] == TASK_INPIT)
{
/*dispatch next task*/
ierr=MPI_Send(&i,1,MPI_INT,
status.MPI_SOURCE ,REQUEST,MPI_COMM_WORLD);
task2do[i]= TASK_RUN;
}
wait+=task2do[i];
}
}
/*all task was done , send 'finish(==0)' to all nodes*/
for(i=1,j=0 ; i < ntasks && j < sizeofar(mv_avg_para) ;i++,j++){
int noCalc=-1;
ierr=MPI_Send(&noCalc,1,MPI_INT,
i,REQUEST,MPI_COMM_WORLD); /*all works done!!*/
}
}
else
{
int mv_avg_pos=0;
MPI_Recv( &mv_avg_pos, 1, MPI_INT, 0, REQUEST,
MPI_COMM_WORLD, &status );
while(mv_avg_pos >= 0)
{
if(argc == 2)
GetCandleDataDefault(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[mv_avg_pos] ,symb); /*from Hypertable*/
else
GetCandleDataPeriod(sendbuff,baserec,basebuff, &realsize ,baserec,mv_avg_para[mv_avg_pos] ,symb, start_date,end_date); /*from Hypertable*/
outpbuff[0] = mv_avg_pos;
if(realsize > 0)
{
MPI_Send( outpbuff, realsize+1, MPI_LONG, 0, REPLY, MPI_COMM_WORLD );
}
MPI_Recv( &mv_avg_pos, 1, MPI_INT, 0, REQUEST,
MPI_COMM_WORLD, &status );
}
}
}
MPI_Barrier(MPI_COMM_WORLD);
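For reference, a job like the above would be launched from the head node roughly as follows (a sketch only; the host file and the ./mv_avg binary path are assumptions, following the PBS examples later in this blog, and SYMBOL is a placeholder for the instrument symbol):
## mode-2: one rank per node listed in host.hydra
/usr/lib64/mpich2/bin/mpiexec -f host.hydra -n 2 ./mv_avg SYMBOL
## mode-3: more ranks than nodes; rank 0 only dispatches work to the others
/usr/lib64/mpich2/bin/mpiexec -f host.hydra -n 11 ./mv_avg SYMBOL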
Results of the runs above:
(1) Simple computation, with only 300 records in Hypertable:
mode-1: 1.397 s (1 process, 1 node)
mode-2: 1.296 s (2 processes, 2 nodes)
mode-3: 1.298 s (3 processes on 2 nodes)
mode-3: 1.216 s (11 processes on 2 nodes)
(2) Simulating the case where each round (each moving-average computation) costs each node more than 1 second of heavy computation:
mode-1: 11.015 s (1 process, 1 node)
mode-2: 8.015 s (2 processes, 2 nodes)
mode-3: 10.018 s (3 processes on 2 nodes)
mode-3: 2.031 s (11 processes on 2 nodes)
PS. The heavy work is simulated by sleeping for 1 second after each moving-average computation; since the point is simply to illustrate work dispatching, no other resource consumption is assumed.
These results only give a rough sense of the parallelism and broadly illustrate the idea of distributed computation; many real-world factors still have to be considered, and practical experience is needed before this can be applied effectively to real workloads.
In practice, if each compute node has one CPU or core, the simulation above roughly conveys the point of distributing the computation. When distributing work, you still need to judge whether a job is better done distributed or kept in a single process:
(1) the CPU and RAM available across the compute nodes (are they sufficient for distributed computation?)
(2) whether the service process's work can actually be decomposed
(3) whether, once the work is split across nodes, network transfer time ends up negating the benefit
(4) distributed computation has a sweet spot between resource usage and shortest run time; a balance should be struck between computation cost and the acceptable completion time
Possible platform extensions
(1) use PBS for batch job dispatching and job monitoring
(2) use PBS to keep frequently used service processes resident
(3) support multiple interfaces (XML, JSON) between the client and the Client Agent, with data encryption and compression
Torque with MPICH2 , Hadoop and Hypertable
Torque(OpenPBS) Resource Manager
base on: OpenMP and MPICH2 hybrid example
[root@cent146 ~]# cat /etc/hosts
172.16.43.245 DFS3 HT245 # Added by NetworkManager
127.0.0.1 localhost.localdomain localhost
::1 DFS3 localhost6.localdomain6 localhost6
172.16.173.145 cent145
172.16.173.146 cent146
172.16.173.146 172.16.173.145 172.16.43.141
+---------------+ +-------------+ +------------+
|Head node | |Compute node | |Hypertable |
|Compute node | | | | |
|---------------| |-------------| |------------|
|HT thrift api | |HT thrift api| | |
|---------------| |-------------| |------------|
|pbs_server | |pbs_mom | | |
|pbs_sched(maui)| | | |------------|
|pbs_mom | | | |nn01 |
|---------------| |-------------| +------+-----+
|cent146 | |cent145 | |
+-------+-------+ +------+------+ +----NAT(@VM)--+-
| | |
--------+--------------------+------------+
##qsub: submit job to pbs_server
## pbs_server informs pbs_sched (the default FIFO scheduler pbs_sched, or an advanced scheduler such as MAUI/MOAB)
## pbs_sched make local policy decisions for resource usage and allocate nodes to jobs
## pbs_sched sends instructions to run the job with the node list to pbs_server
## pbs_server must recognize which systems on the network are its compute nodes($TORQUE_HOME/server_priv/nodes)
## pbs_server sends the new job to the 1st node(Mother Superior) in the node list to run it
## Other nodes in a job are called sister moms.
## in self-built rpm , $TORQUE_HOME is equal to /var/spool/torque
## in epel rpm , $TORQUE_HOME is equal to /etc/torque
##qmgr: the server configuration can be modified to set up TORQUE to do useful work,
## such as enabling the scheduler
[self-built rpm(see Appendix-A) without munge and with ssh/scp, torque-4.1.2 ]
# @cent145 & cent146, install it by root user
rpm -Uvh http://ftp.sjtu.edu.cn/sites/download.fedora.redhat.com/pub/epel/6/x86_64/epel-release-6-7.noarch.rpm
[root@cent146 ~]# yum install torque torque-devel torque-scheduler torque-server torque-mom
[root@cent145 ~]# yum install torque torque-devel torque-scheduler torque-server torque-mom
## setup ssh passwordless (root and hypertable users)
## setup information below
cat /usr/share/doc/torque-4.1.2/README.torque
FireFox to http://adaptivecomputing.com/documentation/
or
FireFox to http://mauichina.blogspot.tw/2009/06/torque_19.html
[root@cent146 server_priv]# rpm -ql torque-server
:
:
/var/spool/torque/server_priv/nodes
/var/spool/torque/server_priv/queues
##Setting up
STEP-1)create serverdb file(in $TORQUE_HOME/server_priv )
## http://www.adaptivecomputing.com/resources/docs/torque/4-1-2/Content/topics/1-installConfig/initializeConfigOnServer.htm#pbs_server%20-t%20create
[root@cent146 ~]# pbs_server -t create ## These commands must be executed by root.
##or
##[root@cent146 ~]# cp /usr/share/doc/torque-2.5.7/torque.setup .
##[root@cent146 ~]# ./torque.setup 'username'
[root@cent146 ~]# qmgr -c 'p s'
#
# Set server attributes.
#
set server acl_hosts = cent146
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server moab_array_compatible = True
[root@cent146 ~]# ps -ef | grep pbs_server
root 2426 1 0 19:22 ? 00:00:00 pbs_server -t create
root 2435 1361 0 19:24 pts/0 00:00:00 grep pbs_server
[root@cent146 ~]# qterm
STEP-2)specify all nodes @cent146
## http://www.adaptivecomputing.com/resources/docs/torque/4-1-2/Content/topics/1-installConfig/specifyComputeNodes.htm
[root@cent146 server_priv]# vi /var/spool/torque/server_priv/nodes
cent146 master comnode
cent145 slave comnode
[root@cent146 ~]# ps -ef | grep pbs_server
root 2439 1361 0 19:24 pts/0 00:00:00 grep pbs_server
[root@cent146 ~]# pbs_server ## restart up pbs_server for following setup-procedure
STEP-3)specify compute node's config(for cent145 & cent146, root user only)
## allow pbs_server (hostname in /etc/hosts) to launch jobs on the pbs_mom
[root@cent146 ~]# vi /var/spool/torque/mom_priv/config
$pbsserver cent146
[root@cent145 ~]# vi /var/spool/torque/mom_priv/config
$pbsserver cent146
STEP-4)create a queue and enable the server to accept and run jobs
[root@cent146 ~]# find / -name torque.setup
/usr/share/doc/torque-server-4.1.2/torque.setup
[root@cent146 ~]# cp /usr/share/doc/torque-server-4.1.2/torque.setup ~/.
[root@cent146 ~]# chmod 755 torque.setup
[root@cent146 ~]# torque.setup root
[root@cent146 ~]# qmgr -c "set server scheduling=true"
[root@cent146 ~]# qmgr -c "create queue batch queue_type=execution"
[root@cent146 ~]# qmgr -c "set queue batch started=true"
[root@cent146 ~]# qmgr -c "set queue batch enabled=true"
[root@cent146 ~]# qmgr -c "set queue batch resources_default.nodes=1"
[root@cent146 ~]# qmgr -c "set queue batch resources_default.walltime=3600"
[root@cent146 ~]# qmgr -c "set server default_queue=batch"
[root@cent146 ~]# qmgr -c 'p s'
#
# Create queues and set their attributes.
#
#
# Create and define queue batch
#
create queue batch
set queue batch queue_type = Execution
set queue batch resources_default.nodes = 1
set queue batch resources_default.walltime = 01:00:00
set queue batch enabled = True
set queue batch started = True
#
# Set server attributes.
#
set server scheduling = True
set server acl_hosts = cent146
set server default_queue = batch
set server log_events = 511
set server mail_from = adm
set server scheduler_iteration = 600
set server node_check_rate = 150
set server tcp_timeout = 300
set server job_stat_rate = 45
set server poll_jobs = True
set server mom_job_sync = True
set server next_job_number = 0
set server moab_array_compatible = True
##verify all queues are properly configured
[root@cent146 server_priv]# qstat -q
server: cent146
Queue Memory CPU Time Walltime Node Run Que Lm State
---------------- ------ -------- -------- ---- --- --- -- -----
batch -- -- -- -- 0 0 -- E R
----- -----
0 0
## verify all nodes are correctly reporting
[root@cent146 server_priv]# pbsnodes -a
cent146
state = free
np = 1
properties = master,comnode
ntype = cluster
status = rectime=1350043175,varattr=,jobs=,state=free,netload=14748874,gres=,loadave=0.00,ncpus=1,physmem=1017464kb,availmem=2876912kb,totmem=3081840kb,idletime=63603,nusers=0,nsessions=0,uname=Linux cent146 2.6.32-71.el6.x86_64 #1 SMP Fri May 20 03:51:51 BST 2011 x86_64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
gpus = 0
cent145
state = free
np = 1
properties = slave,comnode
ntype = cluster
mom_service_port = 15002
mom_manager_port = 15003
gpus = 0
## submit a basic job - DO NOT RUN AS ROOT
[root@cent146 server_priv]# echo "sleep 30" | qsub
qsub can not be run as root
[root@cent146 server_priv]# useradd -u 500 hypertable
[root@cent146 server_priv]# su - hypertable
[root@cent146 server_priv]# torque.setup hypertable ##notice: ssh passwordless should be completed
[root@cent146 server_priv]# echo set server operators += hypertable@cent146 | qmgr
Max open servers: 9
[root@cent146 server_priv]# echo set server managers += hypertable@cent146 | qmgr
Max open servers: 9
[hypertable@cent146 ~]$ cat /tmp/dolog.sh
#!/bin/bash
echo "$(date +%Y%m%d-%H%M%S)" >> /tmp/tmp.log
[hypertable@cent146 ~]$ chmod 755 /tmp/dolog.sh
[hypertable@cent146 ~]$ scp /tmp/dolog.sh cent146:/tmp/.
[hypertable@cent146 ~]$ qsub -l nodes=1:slave+1:comnode /tmp/dolog.sh
#[hypertable@cent146 ~]$ echo "sleep 30" | qsub
0.cent146
[hypertable@cent146 ~]$ qstat -a
cent146:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
0.cent146 hypertable batch dolog.sh 1479 2 1:slav -- 01:00 Q 00:00
##At this point, the job should be in the Q state and will not run because a scheduler is not running yet.
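To let the queued job actually run, start a scheduler (the bundled pbs_sched here; MAUI is set up in STEP-6) and watch the job state again:
[root@cent146 ~]# service pbs_sched start
[root@cent146 ~]# qstat -a     ## the job should move from Q to R and finally C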
STEP-5) service auto-launch in boot time
[root@cent146 mom_priv]# chkconfig --list | grep pbs
pbs_mom 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
pbs_sched 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
pbs_server 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
[root@cent146 mom_priv]# chkconfig --list | grep trqauthd
trqauthd 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
[root@cent145 mom_logs]# chkconfig --list | grep pbs
pbs_mom 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
pbs_sched 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
pbs_server 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
[root@cent145 mom_logs]# chkconfig --list | grep trqauthd
trqauthd 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
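A minimal sketch of enabling the needed daemons at boot (the head node runs trqauthd/pbs_server/scheduler/pbs_mom; the compute node only needs trqauthd and pbs_mom):
[root@cent146 ~]# chkconfig trqauthd on; chkconfig pbs_server on; chkconfig pbs_sched on; chkconfig pbs_mom on
[root@cent145 ~]# chkconfig trqauthd on; chkconfig pbs_mom on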
[openpbs with mpiexec]
[hypertable@cent146 ~]$ cat /tmp/dolog.sh
#!/bin/bash
echo "$(date +%Y%m%d-%H%M%S)" >> /tmp/tmp.log
[hypertable@cent146 ~]$ scp /tmp/dolog.sh cent145:/tmp/.
[hypertable@cent146 ~]$ cat host.hydra
cent145
cent146
[hypertable@cent146 ~]$ vi myprog
###PBS -l nodes=1:master+1:slave
#PBS -l nodes=1:master
#PBS -N sean_job
#PBS -j oe
##/tmp/dolog.sh
/usr/lib64/mpich2/bin/mpiexec -f host.hydra -n 4 /tmp/dolog.sh
[hypertable@cent146 ~]$ qsub -V myprog
33.cent146
[hypertable@cent146 ~]$ qstat
Job id Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
33.cent146 sean_job hypertable 00:00:00 C batch
STEP-6)Optional: install maui instead of pbs_sched
## rpm built by fpm, refer to APPENDIX-B
[hypertable@cent146 ~]$ su -
[root@cent146 ~]# yum install maui
[root@cent146 ~]# source /etc/profile.d/maui.sh
##environment $PATH already config in /etc/profile.d/maui.sh(bash)
[root@cent146 ~]# vi /usr/local/maui/maui.cfg
# maui.cfg 3.3.1
SERVERHOST cent146
# primary admin must be first in list
ADMIN1 root hypertable #add hypertable, otherwise there will be authorization errors
# Resource Manager Definition
#RMCFG[CENT145] TYPE=PBS
#RMCFG[CENT146] TYPE=PBS@RMNMHOST@fRMTYPE[0] PBS
RMCFG[CENT146] TYPE=PBS
#before starting maui, pbs_sched should be off
[root@cent146 server_priv]# service pbs_sched stop
Shutting down TORQUE Scheduler: [ OK ]
[root@cent146 server_priv]# chkconfig pbs_sched off
[root@cent146 server_priv]# chkconfig --list | grep maui
maui.d 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
[root@cent146 server_priv]# service maui.d start
Starting MAUI Scheduler: [ OK ]
[root@cent146 server_priv]# showq
ACTIVE JOBS--------------------
JOBNAME USERNAME STATE PROC REMAINING STARTTIME
0 Active Jobs 0 of 0 Processors Active (0.00%)
IDLE JOBS----------------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
0 Idle Jobs
BLOCKED JOBS----------------
JOBNAME USERNAME STATE PROC WCLIMIT QUEUETIME
Total Jobs: 0 Active Jobs: 0 Idle Jobs: 0 Blocked Jobs: 0
[root@cent146 server_priv]# su - hypertable
[root@cent146 server_priv]# qsub myprog
[hypertable@cent146 ~]$ qstat -a
cent146:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
81.cent146 hypertable batch sean_job 30286 1 master -- 01:00 C 00:00
STEP-7)Optional: install hypertable thrift interface
##download rpm
[root@cent146 ~]# scp 172.16.43.248:/var/ftp/extras/hypertable-th* .
[root@cent146 ~]# scp hypertable-thriftbroker-only-0.9.6.1-linux-x86_64.rpm cent145:~/.
[root@cent146 ~]# yum localinstall hypertable-thriftbroker-only-0.9.6.1-linux-x86_64.rpm --nogpgcheck
[root@cent145 ~]# yum localinstall hypertable-thriftbroker-only-0.9.6.1-linux-x86_64.rpm --nogpgcheck
[root@cent146 ~]# ln -s /opt/hypertable/0.9.6.1/ /opt/hypertable/current
[root@cent145 ~]# ln -s /opt/hypertable/0.9.6.1/ /opt/hypertable/current
## Make candle.cc at Hypertable Server
###prepare a Hypertable development environment on the MPICH2 server (cent146)
#[root@cent146 include]# yum install gcc g++ make boost-devel
#[root@cent146 include]# scp -r 172.16.43.141:/opt/hypertable/current/include/* .
#[root@cent146 src]# cd /opt/hypertable/current/lib
#[root@cent146 lib]# scp 172.16.43.141:/opt/hypertable/current/lib/libHyperCommon.a .
#[root@cent146 lib]# scp 172.16.43.141:/opt/hypertable/current/lib/libHypertable.a .
#[root@cent146 lib]# ln -s /opt/hypertable/current/lib/libthrift-0.8.0.so /opt/hypertable/current/lib/libthrift.so
#[root@cent146 lib]# ln -s /opt/hypertable/current/lib/libevent-1.4.so.2 /opt/hypertable/current/lib/libevent.so
#[root@cent146 lib]# ln -s /opt/hypertable/current/lib/liblog4cpp.so.4 /opt/hypertable/current/lib/liblog4cpp.so
[hypertable@nn01 src]$ cat candle.cc
#include <iostream>
#include <fstream>
#include <netinet/in.h>
#include "ThriftBroker/Client.h"
#include "ThriftBroker/gen-cpp/HqlService.h"
#include "ThriftBroker/ThriftHelper.h"
#include "ThriftBroker/SerializedCellsReader.h"
using namespace Hypertable;
using namespace Hypertable::ThriftGen;
void run(Thrift::Client *client);
void test_hql(Thrift::Client *client, std::ostream &out);
int main(int argc , char * argv[]) {
//Thrift::Client *client = new Thrift::Client("localhost", 38080);
Thrift::Client *client = NULL;
if(argc > 1)
{
client = new Thrift::Client(argv[1], 38080);
std::cout << argv[1] << std::endl;
}
else
{
client = new Thrift::Client("localhost", 38080);
std::cout << "localhost" << std::endl;
}
if(client)
run(client);
}
void run(Thrift::Client *client) {
try {
std::ostream &out = std::cout;
out << "running test_hql" << std::endl;
test_hql(client, out);
}
catch (ClientException &e) {
std::cout << e << std::endl;
exit(1);
}
}
void test_hql(Thrift::Client *client, std::ostream &out) {
HqlResult result;
if (!client->namespace_exists("quote"))
{
out << "no quote namespace exist" << std::endl;
return;
}
Namespace ns = client->namespace_open("quote");
if(client->table_exists(ns,"candle_daily"))
{
HqlResultAsArrays result_as_arrays;
client->hql_query_as_arrays(result_as_arrays, ns, "select * from candle_daily");
out << result_as_arrays.cells[0] << std::endl << "****" << std::endl;
for(unsigned int i = 0 ; i < 2 ; i++)
for(unsigned int j = 0 ; j < result_as_arrays.cells[i].size() ; j++)
out << result_as_arrays.cells[i][j] << std::endl << "****" << std::endl;
}
client->namespace_close(ns);
}
[hypertable@nn01 src]$ export LD_LIBRARY_PATH=/opt/hypertable/current/lib/; make candle
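The Makefile behind 'make candle' is not shown; an equivalent compile line would look roughly like this (a sketch only, using just the headers and libraries copied/symlinked above; the exact flags may need adjusting for your Boost/Hypertable build):
[hypertable@nn01 src]$ g++ -o candle candle.cc \
    -I/opt/hypertable/current/include \
    -L/opt/hypertable/current/lib \
    -lHypertable -lHyperCommon -lthrift -levent -llog4cpp -lpthread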
[hypertable@nn01 src]$ scp candle cent145:~/src/ht/.
[hypertable@nn01 src]$ scp candle cent146:~/src/ht/.
##Prepare PBS and launch the job(hypertable thrift broker's ip is 172.16.43.141)
[hypertable@cent146 ~]$ cat myht
#PBS -l nodes=master
#PBS -N ht_job
#PBS -j oe
export LD_LIBRARY_PATH=/opt/hypertable/current/lib:$LD_LIBRARY_PATH
time /usr/lib64/mpich2/bin/mpiexec -f host.hydra -n 2 ~/src/ht/candle 172.16.43.141
[hypertable@cent146 ~]$ scp myht cent145:~/.
[hypertable@cent146 ~]$ qsub myht
90.cent146
[hypertable@cent146 ~]$ qstat -a
cent146:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
-------------------- ----------- -------- ---------------- ------ ----- ------ ------ ----- - -----
90.cent146 hypertable batch ht_job 1336 1 master -- 01:00 C 00:00
APPENDIX-A)build a torque rpmfile
##the default torque-*.rpm seems to be built with r* (rcp, rsh, etc.); I need to rebuild an rpm that uses ssh/scp...
[root@cent145 ~]# cd /root/rpmbuild/SOURCE
[root@cent145 SOURCE]# wget http://www.adaptivecomputing.com/resources/downloads/torque/torque-4.1.2.tar.gz
##wget http://www.adaptivecomputing.com/resources/downloads/torque/torque-2.5.7.tar.gz
[root@cent145 SOURCE]# tar -zxvf torque-4.1.2.tar.gz
[root@cent145 SOURCE]# cd torque-4.1.2
[root@cent145 torque-4.1.2]# cp torque.spec ../../SPECS
[root@cent145 torque-4.1.2]# yum install openssl-devel
[root@cent145 SPECS]# cd ../../SPECS
[root@cent145 SPECS]# rpmbuild -bb torque.spec
[root@cent145 SPECS]# cd ../BUILD/torque-4.1.2
[root@cent145 torque-4.1.2]# cat config.status| grep "RSH_PATH"
S["RSH_PATH"]="ssh"
[root@cent145 torque-4.1.2]# cat config.status| grep "RCP"
S["INCLUDE_MOM_RCP_FALSE"]=""
S["INCLUDE_MOM_RCP_TRUE"]="#"
S["RCP_ARGS"]="-rpB"
S["RCP_PATH"]="/usr/bin/scp"
[root@cent145 torque-4.1.2]# cd ../../SPECS
[root@cent145 SPECS]# mkdir -p /var/ftp/torque
[root@cent145 SPECS]# cp ../RPMS/x86_64/torque* /var/ftp/torque
[root@cent145 SPECS]# cp ../RPMS/noarch/torque* /var/ftp/torque
#Add a new repository(createrepo) for mpich2,
#and then install mpich2 and all dependencies with yum for all servers.
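Creating that repository might look like this (a sketch; /var/ftp/torque matches the copy commands above, while /var/ftp/mpich2 is an assumed location for the mpich2 rpms):
[root@cent145 SPECS]# yum install createrepo
[root@cent145 SPECS]# createrepo /var/ftp/torque
[root@cent145 SPECS]# createrepo /var/ftp/mpich2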
APPENDIX-B)build a maui rpmfile
[root@cent145 ~]# yum install git-core ruby ruby-devel ruby-ext ruby-rdoc
[root@cent145 ~]# wget http://rubyforge.org/frs/download.php/75475/rubygems-1.8.11.tgz
[root@cent145 ~]# tar -zxvf rubygems-1.8.11.tgz
[root@cent145 ~]# cd rubygems-1.8.11;ruby setup.rb
[root@cent145 rubygems-1.8.11]# gem install fpm
##download maui-3.3.1.tar.gz from www.adaptivecomputing.com
[root@cent145 ~]# tar -zxvf maui-3.3.1.tar.gz
[root@cent145 ~]# cd maui-3.3.1
[root@cent145 maui-3.3.1]# ./configure --prefix=/usr/local/maui
[root@cent145 maui-3.3.1]# make
make -C src/moab all
make[1]: Entering directory `/root/maui-3.3.1/src/moab'
gcc -I../../include/ -I/usr/local/maui/include -I/usr/include/torque -D__LINUX -D__MPBS -g -O2 -D__M64 -c MPBSI.c
MPBSI.c:177: error: conflicting types for 'get_svrport'
/usr/include/torque/pbs_ifl.h:686: note: previous declaration of 'get_svrport' was here
MPBSI.c:178: error: conflicting types for 'openrm'
/usr/include/torque/pbs_ifl.h:687: note: previous declaration of 'openrm' was here
make[1]: *** [MPBSI.o] Error 1
make[1]: Leaving directory `/root/maui-3.3.1/src/moab'
make: *** [all] Error 2
##Fix the build error when making the latest maui against this torque
[root@cent145 maui-3.3.1]# find . -name MPBSI.c
./src/moab/MPBSI.c
[root@cent145 maui-3.3.1]# cp ./src/moab/MPBSI.c ./src/moab/MPBSI.c.orig
[root@cent145 maui-3.3.1]# vi ./src/moab/MPBSI.c
[root@cent145 maui-3.3.1]# diff ./src/moab/MPBSI.c.orig ./src/moab/MPBSI.c
177,178c177,178
< extern int get_svrport(const char *,char *,int);
< extern int openrm(char *,int);
---
> extern unsigned int get_svrport(char *,char *,unsigned int);
> extern int openrm(char *,unsigned int);
[root@cent145 maui-3.3.1]# make clean;make
##place into tmp dir, and prepare for rpm
[root@cent145 maui-3.3.1]# sed -i'.bkp' 's/\$(INST_DIR)/\$(DESTDIR)\/\$(INST_DIR)/g' src/*/Makefile
[root@cent145 maui-3.3.1]# sed -i'' 's/\$(MSCHED_HOME)/\$(DESTDIR)\/\$(MSCHED_HOME)/g' src/*/Makefile
[root@cent145 maui-3.3.1]# DESTDIR=/tmp/maui make install
[root@cent145 maui-3.3.1]# ls -l /tmp/maui/
total 4
drwxr-xr-x 3 root root 4096 2012-10-23 22:45 usr
[root@cent145 maui-3.3.1]# mkdir /tmp/maui/etc
[root@cent145 maui-3.3.1]# mkdir /tmp/maui/etc/profile.d
[root@cent145 maui-3.3.1]# mkdir /tmp/maui/etc/init.d
[root@cent145 maui-3.3.1]# cp etc/maui.d /tmp/maui/etc/init.d/
[root@cent145 maui-3.3.1]# cp etc/maui.{csh,sh} /tmp/maui/etc/profile.d/
##edit /tmp/maui/etc/init.d/maui.d at line 12, change MAUI_PREFIX setting as
[root@cent145 maui-3.3.1]# vi /tmp/maui/etc/init.d/maui.d
:
#MAUI_PREFIX=/opt/maui
MAUI_PREFIX=/usr/local/maui
:
## add 2 shell scripts
[root@cent145 maui-3.3.1]# vi /tmp/maui/post-install.sh
#!/bin/bash
chkconfig --add maui.d
chkconfig --level 3456 maui.d on
[root@cent145 maui-3.3.1]# vi /tmp/maui/pre-uninstall.sh
#!/bin/bash
chkconfig --del maui.d
[root@cent145 maui-3.3.1]# chmod 755 /tmp/maui/post-install.sh
[root@cent145 maui-3.3.1]# chmod 755 /tmp/maui/pre-uninstall.sh
## rpm build by fpm
[root@cent145 maui-3.3.1]# fpm -s dir -t rpm -n maui -v 3.3.1 -C /tmp/maui \
> -p /tmp/maui-3.3.1-x86_64-fpmbuild.rpm --post-install /tmp/maui/post-install.sh \
> --pre-uninstall /tmp/maui/pre-uninstall.sh etc usr
[root@cent145 maui-3.3.1]# ls -l /tmp/*.rpm
-rw-r--r-- 1 root root 42178761 2012-10-23 22:55 /tmp/maui-3.3.1-x86_64-fpmbuild.rpm
[root@cent145 maui-3.3.1]# rpm -qpl /tmp/maui-3.3.1-x86_64-fpmbuild.rpm
/etc/init.d/maui.d
/etc/profile.d/maui.csh
/etc/profile.d/maui.sh
/usr/local/maui/bin/canceljob
/usr/local/maui/bin/changeparam
/usr/local/maui/bin/checkjob
/usr/local/maui/bin/checknode
/usr/local/maui/bin/diagnose
/usr/local/maui/bin/mbal
/usr/local/maui/bin/mclient
/usr/local/maui/bin/mdiag
/usr/local/maui/bin/mjobctl
/usr/local/maui/bin/mnodectl
/usr/local/maui/bin/mprof
/usr/local/maui/bin/mschedctl
/usr/local/maui/bin/mstat
/usr/local/maui/bin/releasehold
/usr/local/maui/bin/releaseres
/usr/local/maui/bin/resetstats
/usr/local/maui/bin/runjob
/usr/local/maui/bin/schedctl
/usr/local/maui/bin/sethold
/usr/local/maui/bin/setqos
/usr/local/maui/bin/setres
/usr/local/maui/bin/setspri
/usr/local/maui/bin/showbf
/usr/local/maui/bin/showconfig
/usr/local/maui/bin/showgrid
/usr/local/maui/bin/showhold
/usr/local/maui/bin/showq
/usr/local/maui/bin/showres
/usr/local/maui/bin/showstart
/usr/local/maui/bin/showstate
/usr/local/maui/bin/showstats
/usr/local/maui/include/moab.h
/usr/local/maui/lib/libmcom.a
/usr/local/maui/lib/libmoab.a
/usr/local/maui/log
/usr/local/maui/maui-private.cfg
/usr/local/maui/maui.cfg
/usr/local/maui/sbin/maui
/usr/local/maui/spool
/usr/local/maui/stats
/usr/local/maui/tools
/usr/local/maui/traces
[root@cent145 maui-3.3.1]# rpm -qpi /tmp/maui-3.3.1-x86_64-fpmbuild.rpm
Name : maui Relocations: /
Version : 3.3.1 Vendor: root@cent145
Release : 1 Build Date: Tue 23 Oct 2012 22:55
Install Date: (not installed) Build Host: cent145
Group : default Source RPM: maui-3.3.1-1.src.rpm
Size : 102654035 License: unknown
Signature : (none)
Packager : <root@cent145>
URL : http://example.com/no-uri-given
Summary : no description given
Description :
no description given
## rpm file could be local install by
[root@cent145 maui-3.3.1]# yum localinstall /tmp/maui-3.3.1-x86_64-fpmbuild.rpm
## or update the yum client's repo setting /etc/yum.repos.d/extension.repo
[mpich2]
name=CentOS-$releasever - mpich2
baseurl=ftp://172.16.43.248/mpich2
gpgcheck=0
[torque]
name=CentOS-$releasever - torque
baseurl=ftp://172.16.43.248/torque
gpgcheck=0
[maui]
name=CentOS-$releasever - maui
baseurl=ftp://172.16.43.248/maui
gpgcheck=0
[root@cent145 maui-3.3.1]# yum install maui
[root@cent145 tmp]# chkconfig --list | grep maui
maui.d 0:?? 1:?? 2:?? 3:?? 4:?? 5:?? 6:??
reference: http://blog.ajdecon.org/installing-the-maui-scheduler-with-torque-410/