Data Analysis and Mining
Spark SQL Analysis
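# Register the existing DataFrame `df` as a temporary view so SQL can query it
df.createOrReplaceTempView('users')
# Count the rows in the view and print the result
spark.sql('SELECT COUNT(*) FROM users').show()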
Machine Learning and Mining
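from pyspark.ml.classification import LogisticRegression
# By default, fit() expects a vector column named 'features' and a numeric column named 'label' in df
lr = LogisticRegression()
model = lr.fit(df)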
Streaming Analysis
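from pyspark.sql import SparkSession
spark = SparkSession.builder.getOrCreate()
# The Kafka source requires kafka.bootstrap.servers; 'localhost:9092' is a placeholder assumption, replace it with your brokers
ds = spark.readStream.format('kafka').option('kafka.bootstrap.servers', 'localhost:9092').option('subscribe', 'topic').load()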