• Spark and Qlik Integration

    Steps:
    1. Start the Spark Thrift Server on the DataStax cluster:
       $ dse -u cassandra -p <password> spark-sql-thriftserver start \
           --conf spark.cores.max=4 \
           --conf spark.executor.memory=2G \
           --conf spark.driver.maxResultSize=1G \
           --conf spark.kryoserializer.buffer.max=512M \
           --conf spark.sql.thriftServer.incrementalCollect=true
    2. Open port 10000 in the Qlik server's AWS security group (Qlik needs to connect to the Thrift Server on port 10000).
    3. Install the Simba ODBC Driver for Spark on the Qlik Windows EC2 instance.
    4. Create a System DSN as follows:
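    A quick way to sanity-check steps 1 and 2 from the Qlik side is to confirm that port 10000 is reachable before touching the ODBC layer. The sketch below is an assumption-laden illustration, not part of the original setup: the host name, DSN name "SparkThrift", and credentials are placeholders.

    ```python
    # Minimal connectivity check for the Spark Thrift Server (port 10000).
    # Host/DSN/credentials below are hypothetical placeholders.
    import socket

    def thrift_port_open(host, port=10000, timeout=5):
        """Return True if a TCP connection to the Thrift Server port succeeds."""
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    # Once the port is reachable and the Simba ODBC driver plus System DSN
    # are in place, a query can be issued through the DSN, e.g. with pyodbc
    # (assumed installed; DSN name is a placeholder):
    #
    #   import pyodbc
    #   conn = pyodbc.connect("DSN=SparkThrift;UID=cassandra;PWD=<password>")
    #   rows = conn.cursor().execute("SHOW TABLES").fetchall()
    ```

    If thrift_port_open() returns False from the Qlik instance, revisit the security-group rule before debugging the driver or DSN.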
  • Datastax Spark on AWS

    Configuration: DSE 5.0.6 (see Datastax Cassandra on AWS for installation details)

    /etc/dse/spark/spark-env.sh:
        export SPARK_PUBLIC_DNS=<node1_public_ip>
        export SPARK_DRIVER_MEMORY="2048M"
        export SPARK_WORKER_CORES=2
        export SPARK_WORKER_MEMORY="4G"

    /etc/dse/spark/spark-defaults.conf:
        spark.scheduler.mode             FAIR
        spark.cores.max                  2
        spark.executor.memory            1g
        spark.cassandra.auth.username    analytics
        spark.cassandra.auth.password    *****
        spark.scheduler.allocation.file  /etc/dse/spark/fairscheduler.xml
        spark.eventLog.enabled           true
        # spark.default.parallelism: 3 * 4 cores = 12
        spark.default.parallelism        12

    /etc/dse/spark/fairscheduler.xml:
        <allocations>
          <pool name="default">
            <schedulingMode>FAIR</schedulingMode>
            <weight>1</weight>
            <minShare>4</minShare>
          </pool>
          <pool name="admin">
            <schedulingMode>FAIR</schedulingMode>
            <weight>1</weight>
            <minShare>4</minShare>
          </pool>
        </allocations>

    $ grep initial_spark_worker_resources /etc/dse/dse.yaml
    initial_spark_worker_resources: 0.7

    With initial_spark_worker_resources set to 0.7, starting dse spark or dse spark-sql shows 3 of the 4 cores allocated in the Spark UI.
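    The "3 out of 4 cores" observation is consistent with simple arithmetic on the dse.yaml setting: 0.7 × 4 cores = 2.8, which appears to round up to 3. A minimal sketch of that calculation follows; the ceiling rounding is an assumption inferred from the observed Spark UI behavior, not confirmed against DSE internals.

    ```python
    # Sketch: how initial_spark_worker_resources could map to Spark worker cores.
    # The ceil() rounding is an assumption based on the "3 of 4 cores" observation.
    import math

    def worker_cores(total_cores, fraction):
        """Cores a DSE Spark worker claims, given the dse.yaml fraction."""
        return math.ceil(total_cores * fraction)

    # 0.7 * 4 = 2.8 -> 3 cores, matching what the Spark UI shows.
    ```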