Databricks Certified Associate Developer for Apache Spark 3.0 Free Questions
https://www.passquestion.com/Databricks-Certified-Associate-Developer-for-Apache-Spark-3.0.html

Which of the following code blocks silently writes DataFrame itemsDf in avro format to location fileLocation if a file does not yet exist at that location?
A. itemsDf.write.avro(fileLocation)
B. itemsDf.write.format("avro").mode("ignore").save(fileLocation)
C. itemsDf.write.format("avro").mode("errorifexists").save(fileLocation)
D. itemsDf.save.format("avro").mode("ignore").write(fileLocation)
E. spark.DataFrameWriter(itemsDf).format("avro").write(fileLocation)
Answer: B

Question 1
Which of the elements that are labeled with a circle and a number contain an error or are misrepresented?
A. 1, 10
B. 1, 8
C. 10
D. 7, 9, 10
E. 1, 4, 6, 9
Answer: B

Question 2
Which of the following code blocks displays the 10 rows with the smallest values of column value in DataFrame transactionsDf in a nicely formatted way?
A. transactionsDf.sort(asc(value)).show(10)
B. transactionsDf.sort(col("value")).show(10)
C. transactionsDf.sort(col("value").desc()).head()
D. transactionsDf.sort(col("value").asc()).print(10)
E. transactionsDf.orderBy("value").asc().show(10)
Answer: B

Question 3
Which of the following code blocks can be used to save DataFrame transactionsDf to memory only, recalculating partitions that do not fit in memory when they are needed?
A. from pyspark import StorageLevel; transactionsDf.cache(StorageLevel.MEMORY_ONLY)
B. transactionsDf.cache()
C. transactionsDf.storage_level('MEMORY_ONLY')
D. transactionsDf.persist()
E. transactionsDf.clear_persist()
F. from pyspark import StorageLevel; transactionsDf.persist(StorageLevel.MEMORY_ONLY)
Answer: F

Question 4
Which of the following is a viable way to improve Spark's performance when dealing with large amounts of data, given that there is only a single application running on the cluster?
A. Increase values for the properties spark.default.parallelism and spark.sql.shuffle.partitions
B. Decrease values for the properties spark.default.parallelism and spark.sql.partitions
C. Increase values for the properties spark.sql.parallelism and spark.sql.partitions
D. Increase values for the properties spark.sql.parallelism and spark.sql.shuffle.partitions
E. Increase values for the properties spark.dynamicAllocation.maxExecutors, spark.default.parallelism, and spark.sql.shuffle.partitions
Answer: A

Question 5
Which of the following is the deepest level in Spark's execution hierarchy?
A. Job
B. Task
C. Executor
D. Slot
E. Stage
Answer: B

Question 6
Which of the following code blocks returns all unique values across all values in columns value and productId in DataFrame transactionsDf in a one-column DataFrame?
A. transactionsDf.select('value').join(transactionsDf.select('productId'), col('value')==col('productId'), 'outer')
B. transactionsDf.select(col('value'), col('productId')).agg({'*': 'count'})
C. transactionsDf.select('value', 'productId').distinct()
D. transactionsDf.select('value').union(transactionsDf.select('productId')).distinct()
E. transactionsDf.agg({'value': 'collect_set', 'productId': 'collect_set'})
Answer: D
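For reference on the answer to Question 6, here is a minimal runnable sketch of the union-plus-distinct pattern. The SparkSession setup and the contents of transactionsDf are hypothetical stand-ins created only for this illustration:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("union-distinct-sketch").getOrCreate()

# Hypothetical stand-in for transactionsDf with only the two relevant columns.
transactionsDf = spark.createDataFrame(
    [(1, 3), (2, 3), (2, 6), (4, 2)],
    ["value", "productId"],
)

# union() stacks the two single-column DataFrames by position; distinct()
# then drops duplicates, leaving one column with every unique value.
uniqueValues = (
    transactionsDf.select("value")
    .union(transactionsDf.select("productId"))
    .distinct()
)
uniqueValues.show()
# Expected: a single column named "value" containing 1, 2, 3, 4, 6 (in any order).

Note that union() resolves columns by position rather than by name, which is why each select() call must produce exactly one column for this pattern to work.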
Question 7
Which of the following code blocks returns a DataFrame showing the mean value of column "value" of DataFrame transactionsDf, grouped by its column storeId?
A. transactionsDf.groupBy(col(storeId).avg())
B. transactionsDf.groupBy("storeId").avg(col("value"))
C. transactionsDf.groupBy("storeId").agg(avg("value"))
D. transactionsDf.groupBy("storeId").agg(average("value"))
E. transactionsDf.groupBy("value").average()
Answer: C

Question 8
The code block displayed below contains an error. The code block should produce a DataFrame with color as the only column and three rows with color values of red, blue, and green, respectively. Find the error.
Code block:
spark.createDataFrame([("red",), ("blue",), ("green",)], "color")
A. Instead of calling spark.createDataFrame, just DataFrame should be called.
B. The commas in the tuples with the colors should be eliminated.
C. The colors red, blue, and green should be expressed as a simple Python list, and not a list of tuples.
D. Instead of color, a data type should be specified.
E. The "color" expression needs to be wrapped in brackets, so it reads ["color"].
Answer: E

Question 9
Which of the following code blocks stores DataFrame itemsDf in executor memory and, if insufficient memory is available, serializes it and saves it to disk?
A. itemsDf.persist(StorageLevel.MEMORY_ONLY)
B. itemsDf.cache(StorageLevel.MEMORY_AND_DISK)
C. itemsDf.store()
D. itemsDf.cache()
E. itemsDf.write.option('destination', 'memory').save()
Answer: D
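As a quick check on the answers to Questions 8 and 9, a minimal sketch, assuming only a running SparkSession named spark; the DataFrame names are illustrative stand-ins, not part of the exam material:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("cache-and-schema-sketch").getOrCreate()

# Question 8: passing the single column name as a list, ["color"], lets
# createDataFrame build a one-column DataFrame from the one-element tuples.
colorsDf = spark.createDataFrame([("red",), ("blue",), ("green",)], ["color"])
colorsDf.show()

# Question 9: DataFrame.cache() takes no storage-level argument and uses the
# default level MEMORY_AND_DISK, so partitions that do not fit in executor
# memory are written to disk rather than recomputed.
itemsDf = colorsDf  # illustrative stand-in for itemsDf
itemsDf.cache()
print(itemsDf.storageLevel)  # both useMemory and useDisk are True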