Tickets/dm 17656 #508

Status: Open. Wants to merge 104 commits into base branch u/jgates/loader.

Commits (104):
24fdf1c
Modified FileServer and added lader to build.
jgates108 Feb 2, 2018
8657cc9
Modified FileServerConnection.
jgates108 Feb 7, 2018
e3723a2
Modified FileServer and added lader to build.
jgates108 Feb 2, 2018
757bdf6
Modified FileServerConnection.
jgates108 Feb 7, 2018
64d7678
Added udp test code.
jgates108 Mar 7, 2018
1747b79
Test binaries.
jgates108 Mar 7, 2018
c650198
Changed to echo udp server and client.
jgates108 Mar 8, 2018
17875a5
Some progress.
jgates108 Mar 8, 2018
b5c56c2
Added SConscript file.
jgates108 Mar 9, 2018
f13a1c6
Working UDP async echo server and sync client.
jgates108 Mar 9, 2018
6fe9cf8
Implemented basic master and worker.
jgates108 Mar 16, 2018
263e92f
Fixed errors from rebase.
jgates108 Aug 27, 2018
6d5e162
Changes to handleTest2
jgates108 Sep 7, 2018
9b170d2
TCP server working test.
jgates108 Sep 13, 2018
7d8db62
Passing basic tests, again.
jgates108 Sep 28, 2018
7d012c9
Added code so worker recognizes new neighbor.
jgates108 Oct 3, 2018
9307c28
Added files.
jgates108 Oct 3, 2018
bf2bce5
Added range setting logic.
jgates108 Oct 11, 2018
e147bcd
Range setting changes.
jgates108 Oct 15, 2018
7cec788
Shift to right almost working.
jgates108 Oct 18, 2018
ab197f9
Shift to right appears to work.
jgates108 Oct 19, 2018
2613224
Fixed some bugs. Added ability to send a key insert to a neighbor.
jgates108 Oct 24, 2018
e22c5d0
Shifting appears to work.
jgates108 Oct 25, 2018
b000de0
Fixed communication problem.
jgates108 Oct 26, 2018
fc9d1c7
Added code so the master will send ranges to all its clients.
jgates108 Oct 26, 2018
02fba9e
Cleaned up code.
jgates108 Oct 29, 2018
51739ae
More code cleanup.
jgates108 Oct 30, 2018
b0be4f6
More code cleanup.
jgates108 Oct 30, 2018
e2891ea
Removed unused files.
jgates108 Nov 1, 2018
868aace
Made some review changes.
jgates108 Nov 9, 2018
25acfd4
Made review changes.
jgates108 Nov 14, 2018
cba467b
Made review changes and cleaned up some todo items.
jgates108 Nov 16, 2018
08c935c
Moved MsgElement classes into their own header file.
jgates108 Nov 20, 2018
8967fbe
Put DoListItem in its own header file.
jgates108 Nov 21, 2018
da3cf79
Fragile initialization order of Central moved to initialize function.
jgates108 Nov 29, 2018
840b269
Created WorkerListItemBase class.
jgates108 Dec 3, 2018
6c5f8cb
Implemented max number of concurrent client insert and lookup requests.
jgates108 Dec 20, 2018
4ab5a48
Changed to use CompositeKey instead of string for keys.
jgates108 Jan 10, 2019
e8aacb9
Created separate programs for master, client, and server.
jgates108 Jan 18, 2019
f7d221e
Modified FileServer and added lader to build.
jgates108 Feb 2, 2018
78732ce
Modified FileServerConnection.
jgates108 Feb 7, 2018
62ea94c
Modified FileServer and added lader to build.
jgates108 Feb 2, 2018
77de954
Modified FileServerConnection.
jgates108 Feb 7, 2018
9425a12
Added udp test code.
jgates108 Mar 7, 2018
b40ec85
Test binaries.
jgates108 Mar 7, 2018
8401c78
Changed to echo udp server and client.
jgates108 Mar 8, 2018
1dcfe00
Some progress.
jgates108 Mar 8, 2018
af3d654
Added SConscript file.
jgates108 Mar 9, 2018
fd8af34
Working UDP async echo server and sync client.
jgates108 Mar 9, 2018
379d435
Implemented basic master and worker.
jgates108 Mar 16, 2018
6b5fb97
Fixed errors from rebase.
jgates108 Aug 27, 2018
020a427
Changes to handleTest2
jgates108 Sep 7, 2018
2810e62
TCP server working test.
jgates108 Sep 13, 2018
2473845
Passing basic tests, again.
jgates108 Sep 28, 2018
378ec47
Added code so worker recognizes new neighbor.
jgates108 Oct 3, 2018
f2db0e3
Added files.
jgates108 Oct 3, 2018
4bc3338
Added range setting logic.
jgates108 Oct 11, 2018
479f6fb
Range setting changes.
jgates108 Oct 15, 2018
8526688
Shift to right almost working.
jgates108 Oct 18, 2018
5bd2f76
Shift to right appears to work.
jgates108 Oct 19, 2018
c01cf16
Fixed some bugs. Added ability to send a key insert to a neighbor.
jgates108 Oct 24, 2018
7f938ff
Shifting appears to work.
jgates108 Oct 25, 2018
1ebebe5
Fixed communication problem.
jgates108 Oct 26, 2018
ff13cbd
Added code so the master will send ranges to all its clients.
jgates108 Oct 26, 2018
d00a31e
Cleaned up code.
jgates108 Oct 29, 2018
b8348f0
More code cleanup.
jgates108 Oct 30, 2018
126835d
More code cleanup.
jgates108 Oct 30, 2018
131a5e4
Removed unused files.
jgates108 Nov 1, 2018
844c391
Made some review changes.
jgates108 Nov 9, 2018
38c07c9
Made review changes.
jgates108 Nov 14, 2018
04e2a09
Made review changes and cleaned up some todo items.
jgates108 Nov 16, 2018
70730c3
Moved MsgElement classes into their own header file.
jgates108 Nov 20, 2018
d839ad4
Put DoListItem in its own header file.
jgates108 Nov 21, 2018
ab69c00
Fragile initialization order of Central moved to initialize function.
jgates108 Nov 29, 2018
a0c3a70
Fixed a race condition and some minor changes.
jgates108 Dec 3, 2018
49d1c0e
Fixed a race condition and some minor changes.
jgates108 Dec 3, 2018
aefa5a6
Created WorkerListItemBase class.
jgates108 Dec 4, 2018
b062471
Replace boost:bind calls with lambdas.
jgates108 Dec 6, 2018
c95a543
Removed a comment.
jgates108 Dec 7, 2018
0a40045
Add configuration file classes.
jgates108 Dec 13, 2018
e7c2513
Fixed a bug with TCP communications and added configuration file elem…
jgates108 Dec 18, 2018
9a95c3e
Cleaned up code.
jgates108 Dec 19, 2018
15c7689
Added reasonable type checking to configuration file values.
jgates108 Jan 9, 2019
2358c35
Implemented max number of concurrent client insert and lookup requests.
jgates108 Dec 20, 2018
364d629
Started converting to using CompositeKey instead of string keys.
jgates108 Dec 21, 2018
ac60356
Changed to use CompositeKey instead of string for keys.
jgates108 Jan 10, 2019
36da2ce
Created separate programs for master, client, and server.
jgates108 Jan 18, 2019
4752dc7
Added timers to client program.
jgates108 Jan 25, 2019
865b44c
Added files for configuring kubernetes.
jgates108 Mar 13, 2019
2ed287e
Kubernetes configuration added.
jgates108 Mar 18, 2019
0b100c5
Added configuration for appClientNum.
jgates108 Mar 19, 2019
a35dda8
Code now inserts keys into the system.
jgates108 Mar 21, 2019
16a53d2
Added multiple clients to kuebernetes configuration.
jgates108 Mar 22, 2019
dd92d7f
Significantly reduced the number of concurrent insert attempts in k8s…
jgates108 Mar 29, 2019
8e1e1de
Increased the size of the buffer used for shifts.
jgates108 Apr 11, 2019
2138a67
Added comments.
jgates108 Apr 23, 2019
d6f9a49
Made changes to indexer Dockerfile.
jgates108 Oct 29, 2019
4defa29
Fixed Dockerfile.
jgates108 Oct 30, 2019
b1b0d2d
Merge branch 'tickets/DM-17656' of github.com:lsst/qserv into tickets…
jgates108 Oct 30, 2019
ccd0bcc
MAde changes to container build files.
jgates108 Nov 1, 2019
a01379b
Made changes so kubernetes can terminate containers.
jgates108 Nov 6, 2019
914341a
Fixed race condition in MsgElement::retrieve.
jgates108 Nov 8, 2019
58f783e
Changed to keep map of DNS addresses due to 5 second DNS delays.
jgates108 Dec 12, 2019
ee9ebde
Changed to lazy initialization for CentralFollower::_wWorkerList.
jgates108 Jan 8, 2020
18 changes: 18 additions & 0 deletions admin/templates/configuration/etc/log4cxx.index.properties
@@ -0,0 +1,18 @@
#
# Configuration file for log4cxx
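# used by the index worker and client programs; select it by exporting
# LSST_LOG_CONFIG before launching a program, as appWorker.bash does:
# export LSST_LOG_CONFIG=/home/qserv/dev/qserv/admin/templates/configuration/etc/log4cxx.index.properties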
#

log4j.rootLogger=INFO, CONSOLE
#log4j.rootLogger=DEBUG, CONSOLE
#log4j.rootLogger=WARN, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
#log4j.appender.CONSOLE.layout.ConversionPattern=[%d{yyyy-MM-ddTHH:mm:ss.SSSZ}] [%t] %-5p %c{2} (%F:%L) - %m%n
log4j.appender.CONSOLE.layout.ConversionPattern=[%d{ddTHH:mm:ss.SSSZ}] [%t] %-5p %c{2} (%F:%L) - %m%n

# Tune log at the module level
#log4j.logger.lsst.qserv.util=DEBUG
18 changes: 18 additions & 0 deletions admin/templates/configuration/etc/log4cxx.index_master.properties
@@ -0,0 +1,18 @@
#
# Configuration file for log4cxx
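# used by the index master program; select it by exporting
# LSST_LOG_CONFIG before launching the program, as appMaster.bash does:
# export LSST_LOG_CONFIG=/home/qserv/dev/qserv/admin/templates/configuration/etc/log4cxx.index_master.properties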
#

#log4j.rootLogger=INFO, CONSOLE
log4j.rootLogger=DEBUG, CONSOLE
#log4j.rootLogger=WARN, CONSOLE

log4j.appender.CONSOLE=org.apache.log4j.ConsoleAppender
log4j.appender.CONSOLE.layout=org.apache.log4j.PatternLayout
#log4j.appender.CONSOLE.layout.ConversionPattern=[%d{yyyy-MM-ddTHH:mm:ss.SSSZ}] [%t] %-5p %c{2} (%F:%L) - %m%n
log4j.appender.CONSOLE.layout.ConversionPattern=[%d{ddTHH:mm:ss.SSSZ}] [%t] %-5p %c{2} (%F:%L) - %m%n

# Tune log at the module level
#log4j.logger.lsst.qserv.util=DEBUG
22 changes: 22 additions & 0 deletions admin/tools/docker/index/container/buildContainers.README
@@ -0,0 +1,22 @@
Running the following commands from the qserv directory should build and push the containers. The chain is also
useful for breaking the build into smaller commands when there are problems.

It helps to run "rm -rf bin share build" first, as docker copies everything in
the qserv directory into the base container. This saves a couple of GB
in both the initial copy and the push of the containers.


docker build -f admin/tools/docker/index/container/dev/Dockerfile -t qserv/indexbase:dev . && \
cd admin/tools/docker/index/container/dev/worker/ && docker build -t qserv/indexworker:dev . && \
cd ../master/ && docker build -t qserv/indexmaster:dev . && \
cd ../clientNum/ && docker build -t qserv/indexclientnum:dev . && \
cd ../../../../../../../../qserv
docker push qserv/indexmaster:dev && docker push qserv/indexworker:dev && docker push qserv/indexclientnum:dev


Useful Kubernetes commands:
kubectl apply -f admin/tools/docker/index/index-k8-m.yaml
kubectl delete -f admin/tools/docker/index/index-k8-m.yaml
kubectl get pods
kubectl logs -f imaster-sts-0 | grep -i keycount
kubectl logs -f iclientnum2-sts-0 | grep -E "DONE|INSERT|LOOK"
24 changes: 24 additions & 0 deletions admin/tools/docker/index/container/buildContainers.bash
@@ -0,0 +1,24 @@
#! /bin/bash
Review comment (Contributor):

Based on my understanding of what the script does, it's supposed to be launched from a specific directory (cd <path>), namely the directory where the script itself resides. This opens numerous possibilities for making mistakes. I would recommend passing the target folder as a parameter to the script, so that it could be run like:

% qserv/admin/tools/docker/loader/container/buildContainers.bash <qserv-base-dir>

You can read and check the value of the parameter with:

QSERV_BASE_DIR="$1"
if [ -z "$QSERV_BASE_DIR" ]; then
    echo "usage: $0 <qserv-base-dir>" >&2
    exit 1
fi
cd "$QSERV_BASE_DIR"

Review comment (Contributor):

I would also recommend adding:

set -e

to this and all other shell scripts. This will terminate the script on errors and prevent unreliable results. See more on this at:
https://stackoverflow.com/questions/6930295/set-e-and-short-tests


set -e

# qserv/admin/tools/docker/index/container/buildContainers.bash
# cd back to base qserv directory as the Dockerfile COPY needs the entire project
# in the docker context.
cd ../../../../../../qserv
docker build -f admin/tools/docker/index/container/dev/Dockerfile -t qserv/indexbase:dev .

# go to individual directories to minimize the size of docker's context copy
# worker
cd admin/tools/docker/index/container/dev/worker/ && docker build -t qserv/indexworker:dev .
#docker build -f admin/tools/docker/index/container/dev/worker/Dockerfile -t qserv/indexworker:dev .

# master
cd ../master/ && docker build -t qserv/indexmaster:dev .
#docker build -f admin/tools/docker/index/container/dev/master/Dockerfile -t qserv/indexmaster:dev .

# clientNum
cd ../clientNum/ && docker build -t qserv/indexclientnum:dev .
#docker build -f admin/tools/docker/index/container/dev/clientNum/Dockerfile -t qserv/indexclientnum:dev .
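A minimal sketch of buildContainers.bash reworked along the two review suggestions above: take the qserv base directory as a parameter and stop at the first error. The function name `build_index_containers` is hypothetical; the image tags and directories are the ones used in this PR. Pushing is left to the separate `docker push` commands in buildContainers.README.

```shell
#!/bin/bash
# Sketch only: build_index_containers is a hypothetical name.
# The function body is a subshell "( ... )", so "set -e" and "cd"
# affect only this function and not the calling shell.
build_index_containers() (
    set -e                                   # stop at the first failing command
    base_dir="$1"
    if [ -z "$base_dir" ]; then
        echo "usage: build_index_containers <qserv-base-dir>" >&2
        exit 1
    fi
    dev_dir="admin/tools/docker/index/container/dev"
    cd "$base_dir"
    # The base image needs the whole qserv tree as its docker build context.
    docker build -f "$dev_dir/Dockerfile" -t qserv/indexbase:dev .
    # The app images only need their own directory as build context.
    for app in worker master clientNum; do
        # Docker repository names must be lower case, e.g. qserv/indexclientnum:dev.
        tag="qserv/index$(printf '%s' "$app" | tr '[:upper:]' '[:lower:]'):dev"
        docker build -t "$tag" "$dev_dir/$app"
    done
)
```

Ending the script with `build_index_containers "$@"` would let it be run as `buildContainers.bash ~/work/qserv` from any working directory, issuing the same four `docker build` commands as the current script.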


26 changes: 26 additions & 0 deletions admin/tools/docker/index/container/dev/Dockerfile
@@ -0,0 +1,26 @@
# docker build -f admin/tools/docker/index/container/dev/Dockerfile -t qserv/indexbase:dev .
#
# Using the development toolchain

FROM qserv/qserv:dev

USER 0

#RUN mv /usr/bin/sh /usr/bin/sh.old && ln -s /usr/bin/bash /usr/bin/sh
RUN yum update --assumeyes && yum install --assumeyes bind-utils gdb screen

USER 1000

RUN mkdir /home/qserv/dev/ && \
chown -R qserv:qserv /home/qserv

COPY --chown=qserv:qserv . /home/qserv/dev/qserv

RUN bash -lc "rm -rf /home/qserv/dev/qserv/build /home/qserv/dev/qserv/share /home/qserv/dev/qserv/bin && \
cd /qserv/stack/ && source ./loadLSST.bash && \
cd /home/qserv/dev/qserv && setup -r . -t qserv-dev && \
printenv && \
scons -j10 install && \
mkdir -p /home/qserv/run && \
qserv-configure.py --all -R /home/qserv/run"

13 changes: 13 additions & 0 deletions admin/tools/docker/index/container/dev/clientNum/Dockerfile
@@ -0,0 +1,13 @@
#
Review comment (Contributor):

What's the point of building a separate binary container for each tool (clientNum, appClientNum, appClientNumScreen, etc.)? You could just as well package all these binaries into a single container, and then launch the container by specifying the application you want to run:

% docker run -it --rm qserv/index:dev clientNum.bash <parameters>

I think you could just add the name of a script/binary as an extra argument in the Kubernetes YAML file:

args: ["clientNum.bash", "10000000", "1", "client-k8s-a1.cnf"]

Reply (Contributor Author):

It made the YAML scripts easier to write and get working (which was not easy), and an actual instance of the system would not include the client apps, as they are only for testing. It would make sense to eventually trim the containers down so they contain only what they really need.
Currently, the containers all share the same base image, qserv/indexbase:dev. The differences between the individual containers really come down to the entrypoint designations and bash scripts, which are small. There could be significant differences between the master and worker containers in the future.

#
# cd ~/work/qserv/admin/tools/docker/index/container/dev/clientNum
# docker build -t qserv/indexclientnum:dev .
FROM qserv/indexbase:dev

USER 0

RUN yum update --assumeyes && yum install --assumeyes bind-utils

USER 1000

ENTRYPOINT ["/home/qserv/dev/qserv/admin/tools/docker/index/container/dev/clientNum/appClientNum.bash"]
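A minimal sketch of the single-image approach proposed in the review above, assuming one shared image whose entrypoint selects the app from its first argument. The function `dispatch_app`, the `APP_DIR` override, and the short app names are hypothetical; the script paths are the ones the per-app Dockerfiles in this PR point their ENTRYPOINT at.

```shell
#!/bin/bash
# Sketch only: a single entrypoint for one shared image, as proposed in review.
# dispatch_app and APP_DIR are hypothetical names; the script paths are the
# ones used by the Dockerfiles in this PR.
dispatch_app() {
    local app_dir="${APP_DIR:-/home/qserv/dev/qserv/admin/tools/docker/index/container/dev}"
    local script
    case "$1" in
        master)    script="$app_dir/master/appMaster.bash" ;;
        worker)    script="$app_dir/worker/appWorker.bash" ;;
        clientNum) script="$app_dir/clientNum/appClientNum.bash" ;;
        *)
            echo "usage: dispatch_app master|worker|clientNum [args...]" >&2
            return 2
            ;;
    esac
    shift                  # drop the app name; pass the rest through unchanged
    exec "$script" "$@"    # replace this shell with the selected app script
}
```

The image's entrypoint script would end with `dispatch_app "$@"`, and a Kubernetes container spec would then pick the program through its arguments, e.g. `args: ["clientNum", "10000000", "1", "client-k8s-a1.cnf"]`. The trade-off the author describes still applies: one image is simpler to maintain, while per-app images can later be trimmed to what each app needs.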
23 changes: 23 additions & 0 deletions admin/tools/docker/index/container/dev/clientNum/appClientNum.bash
@@ -0,0 +1,23 @@
#! /bin/bash -l
# admin/tools/docker/index/container/dev/clientNum/appClientNum.bash

_term() {
echo "Caught SIGTERM signal!"
kill -TERM "$child" 2>/dev/null
}

trap _term SIGTERM
trap _term SIGKILL

source /qserv/stack/loadLSST.bash
cd /home/qserv/dev/qserv
setup -r . -t qserv-dev

export LSST_LOG_CONFIG=/home/qserv/dev/qserv/admin/templates/configuration/etc/log4cxx.index.properties

echo appClientNum $1 $2 $3

/home/qserv/dev/qserv/build/loader/appClientNum $1 $2 /home/qserv/dev/qserv/core/modules/loader/config/$3

child=$!
wait "$child"
Review comment (Contributor):

Why is this needed here? I don't see anything above which would be launching a detached process.

Reply (Contributor Author):

I believe Kubernetes or microk8s wasn't able to kill these containers until all this was added. They'd get stuck terminating the pods.
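As far as the scripts show, the `child=$!` and `wait` lines do not currently do anything: the program is started in the foreground, so `$!` is empty, and bash postpones a trap until the foreground command finishes. The entrypoint script normally runs as PID 1 in the container, and the Linux kernel only delivers a signal to a PID namespace's init process if that process has installed a handler for it (SIGKILL and SIGSTOP from an ancestor namespace excepted), so a SIGTERM trap is what lets the shell react at all. `trap _term SIGKILL` has no effect because SIGKILL cannot be caught. A minimal sketch of the pattern these lines appear to aim for, assuming the intent is to forward SIGTERM to the program, starts the program in the background so that `wait` can be interrupted by the trap. The helper name `run_forwarding_term` is hypothetical.

```shell
#!/bin/bash
# Sketch only: run_forwarding_term is a hypothetical helper name.
# Starts a program in the background and forwards SIGTERM to it, so the
# entrypoint shell (PID 1 in the container) can react to "kubectl delete".
run_forwarding_term() {
    "$@" &                       # background start, so $! holds the child's PID
    local child=$!
    local got_term=0
    # Trap TERM only; SIGKILL cannot be caught, so a trap for it has no effect.
    trap 'echo "Caught SIGTERM signal!"; got_term=1; kill -TERM "$child" 2>/dev/null' TERM
    wait "$child"                # returns early (status > 128) if TERM arrives
    local status=$?
    if [ "$got_term" -eq 1 ]; then
        wait "$child"            # reap the child and collect its real exit status
        status=$?
    fi
    trap - TERM                  # restore the default TERM disposition
    return "$status"
}
```

In appClientNum.bash the last three lines would then become something like `run_forwarding_term /home/qserv/dev/qserv/build/loader/appClientNum "$1" "$2" /home/qserv/dev/qserv/core/modules/loader/config/"$3"`; appMaster.bash and appWorker.bash would change the same way.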

9 changes: 9 additions & 0 deletions admin/tools/docker/index/container/dev/clientNum/appClientNumScreen.bash
@@ -0,0 +1,9 @@
#! /bin/bash -l
# admin/tools/docker/index/container/dev/clientNum/appClientNumScreen.bash

echo appClientNumScreen $1 $2 $3

screen -dm /home/qserv/dev/qserv/admin/tools/docker/index/container/dev/clientNum/appClientNum.bash "$1" "$2" "$3"


tail -f /dev/null
13 changes: 13 additions & 0 deletions admin/tools/docker/index/container/dev/master/Dockerfile
@@ -0,0 +1,13 @@
# Build this image from its own directory, after the base image qserv/indexbase:dev
# has been built from the qserv base directory (see ../Dockerfile).
# cd ~/work/qserv/admin/tools/docker/index/container/dev/master
# docker build -t qserv/indexmaster:dev .
FROM qserv/indexbase:dev

USER 0

RUN yum update --assumeyes && yum install --assumeyes bind-utils

USER 1000

ENTRYPOINT ["/home/qserv/dev/qserv/admin/tools/docker/index/container/dev/master/appMaster.bash"]
22 changes: 22 additions & 0 deletions admin/tools/docker/index/container/dev/master/appMaster.bash
@@ -0,0 +1,22 @@
#! /bin/bash -l
# admin/tools/docker/index/container/dev/master/appMaster.bash

_term() {
echo "Caught SIGTERM signal!"
kill -TERM "$child" 2>/dev/null
}

trap _term SIGTERM
trap _term SIGKILL

source /qserv/stack/loadLSST.bash
cd /home/qserv/dev/qserv
setup -r . -t qserv-dev

export LSST_LOG_CONFIG=/home/qserv/dev/qserv/admin/templates/configuration/etc/log4cxx.index_master.properties

/home/qserv/dev/qserv/build/loader/appMaster /home/qserv/dev/qserv/core/modules/loader/config/master.cnf

child=$!
echo "child ${child}"
wait "$child"
Review comment (Contributor):

Why is this needed here? I don't see anything above which would be launching a detached process.

13 changes: 13 additions & 0 deletions admin/tools/docker/index/container/dev/worker/Dockerfile
@@ -0,0 +1,13 @@
#
#
# cd ~/work/qserv/admin/tools/docker/index/container/dev/worker
# docker build -t qserv/indexworker:dev .
FROM qserv/indexbase:dev

USER 0

RUN yum update --assumeyes && yum install --assumeyes bind-utils

USER 1000

ENTRYPOINT ["/home/qserv/dev/qserv/admin/tools/docker/index/container/dev/worker/appWorker.bash"]
22 changes: 22 additions & 0 deletions admin/tools/docker/index/container/dev/worker/appWorker.bash
@@ -0,0 +1,22 @@
#! /bin/bash
# admin/tools/docker/index/container/dev/worker/appWorker.bash

_term() {
echo "Caught SIGTERM signal!"
kill -TERM "$child" 2>/dev/null
}

trap _term SIGTERM
trap _term SIGKILL

source /qserv/stack/loadLSST.bash
cd /home/qserv/dev/qserv
setup -r . -t qserv-dev

export LSST_LOG_CONFIG=/home/qserv/dev/qserv/admin/templates/configuration/etc/log4cxx.index.properties

/home/qserv/dev/qserv/build/loader/appWorker /home/qserv/dev/qserv/core/modules/loader/config/worker-k8s-a.cnf

child=$!
echo "child ${child}"
wait "$child"
7 changes: 7 additions & 0 deletions admin/tools/docker/index/container/dev/worker/appWorkerScreen.bash
@@ -0,0 +1,7 @@
#! /bin/bash
# admin/tools/docker/index/container/dev/worker/appWorkerScreen.bash


screen -dm /home/qserv/dev/qserv/admin/tools/docker/index/container/dev/worker/appWorker.bash

tail -f /dev/null