[feature] New Spark Load #214

Open · wants to merge 45 commits into base: master

Commits (45)
42d965b
init
gnehil Apr 19, 2024
abe3f2d
add v2 type cast
gnehil May 20, 2024
7952387
check hll type and bitmap type mapping
gnehil May 21, 2024
b1c983d
remove duplicate code
gnehil May 21, 2024
689cefe
package refactor
gnehil May 21, 2024
79d3dac
optimize spark app state check
gnehil May 21, 2024
3747765
enhance spark master check
gnehil May 21, 2024
59fc6d2
add junit dependency
gnehil May 21, 2024
7108c90
add const
gnehil May 21, 2024
812c784
add git ignored item and remove files which should not be uploaded
gnehil May 21, 2024
4935a56
change dpp module
gnehil May 23, 2024
8731d50
rename fs util
gnehil May 23, 2024
e1eedc0
rename default app jar name
gnehil May 23, 2024
262a3b9
add hadoop aws dependency
gnehil May 23, 2024
d46f4ef
add hadoop aws dependency
gnehil May 23, 2024
24f4be8
change exception message
gnehil May 23, 2024
2b78bad
add load cancel method
gnehil May 23, 2024
cdf9318
add shutdown hook
gnehil May 23, 2024
85aa248
complete cancel job
gnehil Jun 12, 2024
8a5c285
add default config value
gnehil Jun 12, 2024
3bfb068
rename and remove useless getInstance method
gnehil Jun 12, 2024
38edddf
fill columns and column from path as empty list
gnehil Jun 12, 2024
78257af
loader factory
gnehil Jun 12, 2024
17b4773
package
gnehil Jun 12, 2024
dcd7d8e
add license header
gnehil Jun 18, 2024
90dc710
build script
gnehil Jun 18, 2024
b1f5607
add kerberos login
gnehil Jul 4, 2024
3c36eb8
add schema version to index meta
gnehil Jul 4, 2024
2f161b3
add shutdown hook for cancel load
gnehil Jul 4, 2024
901d974
refactor recovery schema change check
gnehil Jul 4, 2024
c5ef91a
rename load name
gnehil Jul 4, 2024
336dbf7
copy reference classes from doris fe common and remove fe common depe…
gnehil Jul 4, 2024
8c5a790
add license header
gnehil Jul 5, 2024
15bdf4d
add gson dependency for EtlJobConfig
gnehil Jul 18, 2024
12b7b20
fix start script
gnehil Jul 23, 2024
fe964d2
serialize by jackson
gnehil Jul 29, 2024
19eec53
change dep version var
gnehil Jul 29, 2024
912dfd0
add license header
gnehil Jul 29, 2024
b3d7023
move working dir option to root
gnehil Aug 2, 2024
50f1d1a
add ut
gnehil Aug 2, 2024
c34286b
add ut
gnehil Aug 7, 2024
28a3946
api path change
gnehil Aug 9, 2024
41865dc
change http client dep
gnehil Aug 14, 2024
9cbe4d3
add fe client http res content empty check
gnehil Aug 14, 2024
e8f0500
add mow check
gnehil Aug 14, 2024
10 changes: 10 additions & 0 deletions .gitignore
@@ -8,6 +8,16 @@ spark-doris-connector/output/
spark-doris-connector/target/
spark-doris-connector/.idea/

spark-load/target
spark-load/spark-load-core/dependency-reduced-pom.xml
spark-load/spark-load-core/output/
spark-load/spark-load-core/target/
spark-load/spark-load-core/.idea/
spark-load/spark-load-dist/dependency-reduced-pom.xml
spark-load/spark-load-dist/target/
spark-load/spark-load-dpp/dependency-reduced-pom.xml
spark-load/spark-load-dpp/target/


### Java template
# Compiled class file
175 changes: 175 additions & 0 deletions spark-load/build.sh
@@ -0,0 +1,175 @@
#!/usr/bin/env bash
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

##############################################################
# This script is used to compile Spark-Load
# Usage:
# sh build.sh
#
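#
# Example (a sketch; assumes the script is run from the spark-load
# directory and that extra arguments are forwarded to Maven via "$@"):
#   sh build.sh -DskipTests
# The script then prompts for the target Spark version (2.x or 3.x).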
##############################################################

# Bugzilla 37848: When no TTY is available, don't output to console
have_tty=0
# shellcheck disable=SC2006
if [[ "`tty`" != "not a tty" ]]; then
have_tty=1
fi

# Only use colors if connected to a terminal
if [[ ${have_tty} -eq 1 ]]; then
PRIMARY=$(printf '\033[38;5;082m')
RED=$(printf '\033[31m')
GREEN=$(printf '\033[32m')
YELLOW=$(printf '\033[33m')
BLUE=$(printf '\033[34m')
BOLD=$(printf '\033[1m')
RESET=$(printf '\033[0m')
else
PRIMARY=""
RED=""
GREEN=""
YELLOW=""
BLUE=""
BOLD=""
RESET=""
fi

echo_r () {
# Color red: Error, Failed
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[%sDoris%s] %s$1%s\n" $BLUE $RESET $RED $RESET
}

echo_g () {
# Color green: Success
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[%sDoris%s] %s$1%s\n" $BLUE $RESET $GREEN $RESET
}

echo_y () {
# Color yellow: Warning
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[%sDoris%s] %s$1%s\n" $BLUE $RESET $YELLOW $RESET
}

echo_w () {
# Default terminal color: normal message
[[ $# -ne 1 ]] && return 1
# shellcheck disable=SC2059
printf "[%sDoris%s] %s$1%s\n" $BLUE $RESET $RESET $RESET
}

# OS specific support. $var _must_ be set to either true or false.
cygwin=false
os400=false
# shellcheck disable=SC2006
case "`uname`" in
CYGWIN*) cygwin=true;;
OS400*) os400=true;;
esac

# resolve links - $0 may be a softlink
PRG="$0"

while [[ -h "$PRG" ]]; do
# shellcheck disable=SC2006
ls=`ls -ld "$PRG"`
# shellcheck disable=SC2006
link=`expr "$ls" : '.*-> \(.*\)$'`
if expr "$link" : '/.*' > /dev/null; then
PRG="$link"
else
# shellcheck disable=SC2006
PRG=`dirname "$PRG"`/"$link"
fi
done

# Get standard environment variables
# shellcheck disable=SC2006
ROOT=$(cd "$(dirname "$PRG")" &>/dev/null && pwd)
export DORIS_HOME=$(cd "$ROOT/../" &>/dev/null && pwd)

. "${DORIS_HOME}"/env.sh

# include custom environment variables
if [[ -f ${DORIS_HOME}/custom_env.sh ]]; then
. "${DORIS_HOME}"/custom_env.sh
fi

selectSpark() {
echo 'Spark-Load supports multiple versions of Spark. Which version do you need?'
select spark in "2.x" "3.x" "other"
do
case $spark in
"2.x")
return 1
;;
"3.x")
return 2
;;
*)
echo "invalid selected, exit.."
exit 1
;;
esac
done
}

SPARK_VERSION=0
selectSpark
SparkVer=$?
if [ ${SparkVer} -eq 1 ]; then
SPARK_VERSION="spark2"
SCALA_VERSION="scala_2.11"
elif [ ${SparkVer} -eq 2 ]; then
SPARK_VERSION="spark3"
SCALA_VERSION="scala_2.12"
fi
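# For reference (illustrative): with the "3.x" selection above, the Maven
# invocation below expands to roughly
#   ${MVN_BIN} clean package -Pspark3,scala_2.12 <any extra arguments passed to build.sh>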

echo_g " spark load run based on : ${SPARK_VERSION} and ${SCALA_VERSION}"
echo_g " build starting..."

${MVN_BIN} clean package -P${SPARK_VERSION},${SCALA_VERSION} "$@"

EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
DIST_DIR=${DORIS_HOME}/dist
[ ! -d "$DIST_DIR" ] && mkdir "$DIST_DIR"
dist_jar=$(ls "${ROOT}"/target | grep "spark-load-")
rm -rf "${DIST_DIR}"/"${dist_jar}"
cp "${ROOT}"/target/"${dist_jar}" "$DIST_DIR"

echo_g "*****************************************************************"
echo_g "Successfully build Spark-Load"
echo_g "dist: $DIST_DIR/$dist_jar "
echo_g "*****************************************************************"
exit 0;
else
echo_r "Failed build Spark-Load"
exit $EXIT_CODE;
fi