Steps:

  • Start the Spark Thrift Server on the DataStax cluster
$ dse -u cassandra -p <password> spark-sql-thriftserver start --conf spark.cores.max=4 --conf spark.executor.memory=2G --conf spark.driver.maxResultSize=1G --conf spark.kryoserializer.buffer.max=512M --conf spark.sql.thriftServer.incrementalCollect=true
  • Allow the Qlik server’s security group on AWS to reach port 10000 (Qlik connects to the Spark Thrift Server on this port)
  • Install the Simba ODBC Driver for Spark on the Qlik Windows EC2 instance
    Create a System DSN as follows:

Spark Server Type: SparkThriftServer
Host: internal-spark-thriftserver-prod-lb-861234576.ap-southeast-1.elb.amazonaws.com (DNS name of spark thrift server ELB)
Port: 10000
Database: avm_analytics
Authentication Mechanism: Username
Thrift Transport: SASL

  • In the Qlik Admin UI, go to Data Connections and select the DSN created above; the connection should be established
  • In the Data Editor, enter the following script to run a query
LIB CONNECT TO 'Simba Spark(Qlik-sense-administration)';
select txn_id, txn_date from transactions where txn_date >= '2017-06-05' and txn_date < '2017-06-06';
  • Observe the execution of the Spark job in the Spark Web UI
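
If the connection from Qlik fails, it can help to confirm from another host that the Thrift Server is reachable and answers the same query. The following is a minimal sketch, not part of the original steps. It assumes bash, the coreutils `timeout` command, and a `beeline` client on the PATH (on a DSE node, `dse beeline` may be the equivalent). The function names and the THRIFT_* environment variables are hypothetical; THRIFT_HOST should be set to the ELB DNS name used in the DSN.

```shell
#!/usr/bin/env bash
# Sketch: check that the Spark Thrift Server port is reachable, then run the
# sample query through Beeline. All THRIFT_* variables are hypothetical names.

THRIFT_HOST="${THRIFT_HOST:-localhost}"    # placeholder; use the ELB DNS name
THRIFT_PORT="${THRIFT_PORT:-10000}"
THRIFT_DB="${THRIFT_DB:-avm_analytics}"
THRIFT_USER="${THRIFT_USER:-cassandra}"
THRIFT_PASSWORD="${THRIFT_PASSWORD:-}"     # only needed if authentication requires it

# Build the HiveServer2-style JDBC URL used to reach the Thrift Server.
jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s' "$1" "$2" "$3"
}

# Return 0 if a TCP connection to host:port succeeds within 5 seconds.
port_open() {
  timeout 5 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Run the same query as the Qlik script; -u, -n, -p and -e are Beeline options.
run_sample_query() {
  beeline -u "$(jdbc_url "$THRIFT_HOST" "$THRIFT_PORT" "$THRIFT_DB")" \
    -n "$THRIFT_USER" -p "$THRIFT_PASSWORD" \
    -e "select txn_id, txn_date from transactions where txn_date >= '2017-06-05' and txn_date < '2017-06-06'"
}

# Check the port first so a network problem is not mistaken for a query problem.
main() {
  if port_open "$THRIFT_HOST" "$THRIFT_PORT"; then
    run_sample_query
  else
    echo "Cannot reach $THRIFT_HOST:$THRIFT_PORT" >&2
    return 1
  fi
}

# Nothing runs unless the script is invoked with --run.
if [[ "${1:-}" == "--run" ]]; then
  main
fi
```

Saved under a name such as check_thrift.sh (hypothetical), it would be run as `THRIFT_HOST=<elb-dns-name> bash check_thrift.sh --run`; without `--run` it only defines the functions. A failure at the port check points to the security group or load balancer, while a failure in Beeline points to the Thrift Server itself.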