Snowflake GES-C01 Exam
SnowPro® Specialty: Gen AI Certification Exam
https://www.passquestion.com/ges-c01.html

1. A data application developer is tasked with building a multi-turn conversational AI application using Streamlit in Snowflake (SiS) that leverages the COMPLETE (SNOWFLAKE.CORTEX) LLM function. To ensure the conversation flows naturally and the LLM maintains context from previous interactions, which of the following is the most appropriate method for handling and passing the conversation history?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C

2. A Streamlit application developer wants to use AI_COMPLETE (the latest version of COMPLETE (SNOWFLAKE.CORTEX)) to process customer feedback. The goal is to extract structured information, such as the customer's sentiment, product mentioned, and any specific issues, into a predictable JSON format for immediate database ingestion. Which configuration of the AI_COMPLETE function call is essential for achieving this structured output requirement?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
AI_COMPLETE Structured Outputs (and its predecessor, COMPLETE Structured Outputs) specifically allows supplying a JSON schema as the 'response_format' argument to ensure completion responses follow a predefined structure. This significantly reduces the need for post-processing in AI data pipelines and enables seamless integration with systems requiring deterministic responses. The JSON schema object defines the structure, data types, and constraints, including required fields. While prompting the model to "respond in JSON" can improve accuracy for complex tasks, the 'response_format' argument is the direct mechanism for enforcing the schema. Setting 'temperature' to 0 provides more consistent results for structured output tasks. Option A is a form of prompt engineering, which can help but does not guarantee strict adherence the way 'response_format' does. Option B controls randomness and length, not output structure. Option D is less efficient for extracting multiple related fields compared to a single structured output call. Option E's 'guardrails' are for filtering unsafe or harmful content, not for enforcing output format.
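For illustration, a minimal SQL sketch of the structured-output pattern described above. The table and column names (customer_feedback, feedback_text) and the JSON schema are assumptions, and the exact argument spelling can vary by model and release; the essential pieces are the 'response_format' JSON schema and 'temperature' set to 0.

    -- Sketch only: enforce a JSON structure via response_format (names and schema are assumptions).
    SELECT AI_COMPLETE(
        model => 'mistral-large2',
        prompt => 'Extract the sentiment, product mentioned, and issues from this feedback: ' || feedback_text,
        model_parameters => {'temperature': 0},
        response_format => {
            'type': 'json',
            'schema': {
                'type': 'object',
                'properties': {
                    'sentiment': {'type': 'string'},
                    'product':   {'type': 'string'},
                    'issues':    {'type': 'array', 'items': {'type': 'string'}}
                },
                'required': ['sentiment', 'product']
            }
        }
    ) AS structured_feedback
    FROM customer_feedback;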
3. A Snowflake developer, AI_ENGINEER, is creating a Streamlit in Snowflake (SiS) application that will utilize a range of Snowflake Cortex LLM functions, including SNOWFLAKE.CORTEX.COMPLETE, SNOWFLAKE.CORTEX.CLASSIFY_TEXT, and SNOWFLAKE.CORTEX.EMBED_TEXT_768. The application also needs to access data from tables within a specific database and schema. AI_ENGINEER has created a custom role, app_dev_role, for the application to operate under. Which of the following privileges or roles are absolutely necessary to grant to app_dev_role for the successful execution of these Cortex LLM functions and interaction with the specified database objects? (Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,C
Explanation:
To execute Snowflake Cortex AI functions such as SNOWFLAKE.CORTEX.COMPLETE, SNOWFLAKE.CORTEX.CLASSIFY_TEXT, and SNOWFLAKE.CORTEX.EMBED_TEXT_768 (or their AI-prefixed counterparts), the role used by the application (app_dev_role in this case) must be granted the SNOWFLAKE.CORTEX_USER database role. Additionally, for the Streamlit application to access any database or schema objects (such as tables for data input/output, or the Streamlit app itself if it is stored as a database object), the USAGE privilege must be granted on those specific database and schema objects. Option B, CREATE SNOWFLAKE.ML.DOCUMENT_INTELLIGENCE, is a privilege specific to creating Document AI model builds and is not required for general Cortex LLM functions. Option D, ACCOUNTADMIN, grants excessive privileges and is not a best practice for application roles. Option E, CREATE COMPUTE POOL, is a privilege related to Snowpark Container Services for creating compute pools, which is not directly required for running a Streamlit in Snowflake application that consumes Cortex LLM functions.
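A minimal sketch of the grants described above; the database and schema names (app_db, app_schema) are placeholders for wherever the application and its tables actually live:

    -- Allow app_dev_role to call Cortex LLM functions.
    GRANT DATABASE ROLE SNOWFLAKE.CORTEX_USER TO ROLE app_dev_role;

    -- Allow access to the (hypothetical) database and schema holding the app and its tables.
    GRANT USAGE ON DATABASE app_db TO ROLE app_dev_role;
    GRANT USAGE ON SCHEMA app_db.app_schema TO ROLE app_dev_role;
    GRANT SELECT ON ALL TABLES IN SCHEMA app_db.app_schema TO ROLE app_dev_role;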
4. A data application developer is tasked with building a multi-turn conversational AI application using Streamlit in Snowflake (SiS) that leverages the COMPLETE (SNOWFLAKE.CORTEX) LLM function. To ensure the conversation flows naturally and the LLM maintains context from previous interactions, which of the following is the most appropriate method for handling and passing the conversation history?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To provide a stateful, conversational experience with the COMPLETE (SNOWFLAKE.CORTEX) function (or its latest version, AI_COMPLETE), all previous user prompts and model responses must be explicitly passed in the function's prompt (history) argument. This argument expects an array of objects, where each object represents a turn and contains a 'role' ('system', 'user', or 'assistant') and a 'content' key, presented in chronological order. In Streamlit, st.session_state is the standard and recommended mechanism for storing and managing data across reruns of the application, making it ideal for maintaining chat history: initialize st.session_state.messages = [] and append each message to it. Option A is incorrect because COMPLETE does not inherently manage history from external tables. Option B is incorrect, as COMPLETE does not retain state between calls; history must be explicitly managed. Option D is a less effective form of prompt engineering compared to passing structured history, as it loses the semantic role distinction and can be less accurate for LLMs. Option E describes a non-existent parameter for the COMPLETE function.
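To make the history format concrete, a minimal SQL sketch of the array-of-turns argument described above. The conversation content is invented; in the Streamlit app, this array would be assembled from st.session_state.messages on each rerun.

    -- Sketch: prior turns are passed explicitly so the model keeps context.
    -- When an options object is supplied, the reply comes back inside a JSON envelope
    -- (message choices plus usage metadata) that can be parsed downstream.
    SELECT SNOWFLAKE.CORTEX.COMPLETE(
        'mistral-large2',
        [
            {'role': 'system',    'content': 'You are a helpful support assistant.'},
            {'role': 'user',      'content': 'My dashboard is not loading.'},
            {'role': 'assistant', 'content': 'Which browser are you using?'},
            {'role': 'user',      'content': 'Chrome, latest version.'}
        ],
        {'temperature': 0}
    ) AS reply;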
5. A Streamlit application developer wants to use AI_COMPLETE (the latest version of COMPLETE (SNOWFLAKE.CORTEX)) to process customer feedback. The goal is to extract structured information, such as the customer's sentiment, product mentioned, and any specific issues, into a predictable JSON format for immediate database ingestion. Which configuration of the AI_COMPLETE function call is essential for achieving this structured output requirement?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
AI_COMPLETE Structured Outputs (and its predecessor, COMPLETE Structured Outputs) specifically allows supplying a JSON schema as the 'response_format' argument to ensure completion responses follow a predefined structure. This significantly reduces the need for post-processing in AI data pipelines and enables seamless integration with systems requiring deterministic responses. The JSON schema object defines the structure, data types, and constraints, including required fields. For complex tasks, prompting the model to respond in JSON can improve accuracy, but the 'response_format' argument is the direct mechanism for enforcing the schema. Setting 'temperature' to 0 provides more consistent results for structured output tasks. Option A is a form of prompt engineering, which can help but does not guarantee strict adherence the way 'response_format' does. Option B controls randomness and length, not output structure. Option D: while AI_EXTRACT (or EXTRACT_ANSWER) can extract information, using it multiple times and then manually combining results is less efficient and less robust than a single AI_COMPLETE call with a structured output schema for multiple related fields. Option E's 'guardrails' are for filtering unsafe or harmful content, not for enforcing output format.

6.
A.
B.
C. The USAGE privilege on the specific database and schema where the Streamlit application and its underlying data tables are located.
D. The ACCOUNTADMIN role to ensure unrestricted access to all Snowflake Cortex features.
E. The CREATE COMPUTE POOL privilege to provision resources for the Streamlit application.
Answer: A,C
Explanation:
To execute Snowflake Cortex AI functions such as COMPLETE (SNOWFLAKE.CORTEX), CLASSIFY_TEXT (SNOWFLAKE.CORTEX), and EMBED_TEXT_768 (SNOWFLAKE.CORTEX) (or their prefixed counterparts AI_COMPLETE, AI_CLASSIFY, and AI_EMBED), the role used by the application must be granted the SNOWFLAKE.CORTEX_USER database role. This role includes the privileges to call these functions. Additionally, for the Streamlit application to access any database or schema objects (such as tables for data input/output, or the Streamlit app itself if it is stored as a database object), the USAGE privilege must be granted on those specific database and schema objects. Option B, CREATE SNOWFLAKE.ML.DOCUMENT_INTELLIGENCE, is a privilege specific to creating Document AI model builds and is not required for general Cortex LLM functions. Option D, ACCOUNTADMIN, grants excessive privileges and is not a best practice for application roles. Option E, CREATE COMPUTE POOL, is a privilege related to Snowpark Container Services for creating compute pools, which is generally not required for running a Streamlit in Snowflake application that consumes Cortex LLM functions via SQL, unless the LLMs themselves were deployed as services on compute pools using Model Serving in Snowpark Container Services, which is not stated as the method of LLM usage here.

7. A data engineer is building a Snowflake data pipeline to ingest customer reviews from a raw staging table into a processed table. For each review, they need to determine the overall sentiment (positive, neutral, negative) and store this as a distinct column. The pipeline is implemented using SQL with streams and tasks to process new data. Which Snowflake Cortex LLM function, when integrated into the SQL task, is best suited for this sentiment classification and ensures a structured, single-label output for each review?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
To classify text into predefined categories, CLASSIFY_TEXT (SNOWFLAKE.CORTEX) (or its updated version, AI_CLASSIFY) is purpose-built and directly returns the classification label. This approach is more direct and efficient than using SENTIMENT(), which returns a score, EXTRACT_ANSWER(), which extracts an answer to a question, or multiple calls to functions that return Boolean values. While COMPLETE() could be prompted for classification, CLASSIFY_TEXT is a more specific, task-specific function designed for this exact use case within the Cortex LLM functions.
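A minimal sketch of the single-label classification described above inside a stream-driven task body; the table and stream names (processed_reviews, raw_reviews_stream) are assumptions:

    -- Sketch: classify each new review into one label and persist it.
    INSERT INTO processed_reviews (review_id, review_text, sentiment_label)
    SELECT
        review_id,
        review_text,
        SNOWFLAKE.CORTEX.CLASSIFY_TEXT(
            review_text,
            ['positive', 'neutral', 'negative']
        ):label::STRING AS sentiment_label
    FROM raw_reviews_stream
    WHERE METADATA$ACTION = 'INSERT';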
8. A financial services company is developing an automated data pipeline in Snowflake to process Federal Reserve Meeting Minutes, which are initially loaded as PDF documents. The pipeline needs to extract specific entities, such as the Fed's stance on interest rates ('hawkish', 'dovish', or 'neutral') and the reasoning behind it, storing these as structured JSON objects within a Snowflake table. The goal is to ensure the output is always a valid JSON object with predefined keys. Which AI_COMPLETE configuration, used within an in-line SQL statement in a task, is most effective for achieving this structured extraction directly in the pipeline?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To ensure that LLM responses adhere to a predefined JSON structure, the AI_COMPLETE function's 'response_format' argument, which accepts a JSON schema, is the most effective and direct method. This mechanism enforces the structure, data types, and required fields, significantly reducing the need for post-processing and ensuring deterministic, high-quality output. The "AI-Infused Data Pipelines with Snowflake Cortex" blog highlights asking the LLM to create a JSON object to maximize utility. While setting 'temperature' to 0 can improve consistency, it does not enforce a specific schema. Prompt engineering (Option A) can help but does not guarantee strict adherence. Using multiple extraction calls (Option D) is less efficient and robust for extracting multiple related fields than a single AI_COMPLETE call with a structured output schema. Snowflake Cortex does not automatically infer and enforce a JSON schema without explicit configuration (Option E).

9. A data engineering team is building a pipeline in Snowflake that uses a SQL task to call various Snowflake Cortex LLM functions (e.g., AI_COMPLETE, AI_EMBED) on large datasets of customer interaction logs. The team observes fluctuating costs and occasional query failures, which sometimes halt the pipeline. To address these issues and ensure an efficient, robust, and monitorable pipeline, which of the following actions or considerations are essential? (Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,B,E
Explanation:
A. Correct. The TRY_COMPLETE function is designed to perform the same operation as COMPLETE but returns NULL instead of raising an error when the LLM operation cannot be performed. This is critical for building robust data pipelines, as it prevents pipeline halts due to transient or specific LLM failures, allowing for more resilient data processing.
B. Correct. The SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY view provides detailed information on token consumption and credit usage for Snowflake Cortex LLM functions. Monitoring this view is essential for understanding cost drivers and optimizing expenditure within AI pipelines.
C. Incorrect. Snowflake recommends executing queries that call Cortex AISQL functions with a smaller warehouse (no larger than MEDIUM), as larger warehouses do not necessarily increase performance but can lead to unnecessary costs. The LLM inference itself runs on Snowflake-managed compute, not solely on the user's virtual warehouse.
D. Incorrect. Setting the 'temperature' parameter to 0 makes the LLM's output more deterministic and focused. While this can be beneficial for consistency in certain tasks, it does not directly minimize token usage. Token usage is primarily determined by the length of the input prompt and the length of the generated output, which can vary regardless of 'temperature'.
E. Correct. Encapsulating complex and potentially lengthy prompt logic within a UDF (user-defined function) makes the prompts more manageable, reusable, and easier to integrate programmatically into SQL statements within a data pipeline. This improves code organization and maintainability.
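Two short sketches of the practices marked correct above: a failure-tolerant call with TRY_COMPLETE, and a cost query against the usage view. The table name interaction_logs is an assumption, and the aggregation columns follow the view as documented (treat the query as a sketch, not a definitive report).

    -- Sketch: TRY_COMPLETE returns NULL instead of failing the whole statement.
    SELECT
        log_id,
        SNOWFLAKE.CORTEX.TRY_COMPLETE(
            'mistral-large2',
            'Summarize this customer interaction in one sentence: ' || log_text
        ) AS summary
    FROM interaction_logs;

    -- Sketch: monitor token and credit consumption of Cortex LLM functions.
    SELECT function_name,
           model_name,
           SUM(tokens)        AS total_tokens,
           SUM(token_credits) AS total_credits
    FROM SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY
    GROUP BY function_name, model_name
    ORDER BY total_credits DESC;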
10. A data engineering team is setting up an automated pipeline in Snowflake to process call center transcripts. These transcripts, once loaded into a raw table, need to be enriched by extracting specific entities such as the customer's name, the primary issue reported, and the proposed resolution. The extracted data must be stored in a structured JSON format in a processed table. The pipeline leverages a SQL task that processes new records from a stream. Which of the following SQL snippets and approaches, utilizing Snowflake Cortex LLM functions, would most effectively extract this information and guarantee a structured JSON output for each transcript?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
To guarantee a structured JSON output for entity extraction, AI_COMPLETE() (the updated version of COMPLETE()) with the 'response_format' argument and a specified JSON schema is the most effective approach. This mechanism enforces that the LLM's output strictly conforms to the predefined structure, including data types and required fields, significantly reducing the need for post-processing and improving data quality within the pipeline. Option A requires multiple calls and manual JSON assembly, which is less efficient. Option B relies on the LLM's "natural ability" to generate JSON, which might not be consistently structured without an explicit 'response_format'. Option D uses SUMMARIZE(), which is for generating summaries, not structured entity extraction. Option E involves external LLM API calls and Python UDFs, which, while possible, is less direct than using native AI_COMPLETE structured outputs within a SQL pipeline in Snowflake Cortex for this specific goal.

11. A data team has implemented a Snowflake data pipeline using SQL tasks that process customer call transcripts daily. This pipeline relies heavily on SNOWFLAKE.CORTEX.COMPLETE() (or its updated alias, AI_COMPLETE) for various text analysis tasks, such as sentiment analysis and summary generation. Over time, they observe that the pipeline occasionally fails due to LLM-related errors, and the compute costs are higher than anticipated. What actions should the team take to improve the robustness and cost-efficiency of this data pipeline? (Select all that apply.)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A,C,D
Explanation:
A. Correct. TRY_COMPLETE() performs the same operation as COMPLETE() but returns NULL on failure instead of raising an error when the LLM operation cannot be performed. This is critical for building robust data pipelines, as it prevents pipeline halts due to transient or specific LLM failures.
B. Incorrect. Snowflake recommends executing queries that call Cortex AISQL functions with a smaller warehouse (no larger than MEDIUM). Larger warehouses do not necessarily increase performance but can lead to unnecessary costs, as the LLM inference itself runs on Snowflake-managed compute, not solely on the user's virtual warehouse.
C. Correct. The SNOWFLAKE.ACCOUNT_USAGE.CORTEX_FUNCTIONS_USAGE_HISTORY view provides detailed information on token consumption and credit usage for Snowflake Cortex LLM functions. Monitoring this view is essential for understanding cost drivers and optimizing expenditure within AI pipelines.
D. Correct. Snowflake Cortex AI functions incur compute costs based on the number of tokens processed (both input and output). Optimizing prompt engineering to be concise and effective directly contributes to reducing the number of tokens consumed and, therefore, the associated costs.
E. Incorrect. Setting the 'temperature' parameter to 1.0 makes the LLM's output more diverse and random. While useful for creativity, it does not guarantee a reduction in token usage or a lower error rate. For the most consistent results, setting 'temperature' to 0 is generally recommended.
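Pulling the points from questions 10 and 11 together, a minimal sketch of a stream-driven MERGE that uses TRY_COMPLETE with a 'response_format' schema. All object names and the schema itself are assumptions, and the full response (including metadata) is stored as-is for downstream parsing.

    -- Sketch: incremental, failure-tolerant enrichment from a stream into a processed table.
    MERGE INTO processed_transcripts AS tgt
    USING (
        SELECT
            transcript_id,
            -- With an options object, the function returns a JSON envelope (message plus usage);
            -- the structured message can be parsed out downstream.
            SNOWFLAKE.CORTEX.TRY_COMPLETE(
                'mistral-large2',
                [{'role': 'user',
                  'content': 'Extract customer_name, primary_issue, and resolution from: ' || transcript_text}],
                {'temperature': 0,
                 'response_format': {
                     'type': 'json',
                     'schema': {
                         'type': 'object',
                         'properties': {
                             'customer_name': {'type': 'string'},
                             'primary_issue': {'type': 'string'},
                             'resolution':    {'type': 'string'}
                         },
                         'required': ['customer_name', 'primary_issue', 'resolution']
                     }
                 }}
            ) AS llm_response
        FROM raw_transcripts_stream
        WHERE METADATA$ACTION = 'INSERT'
    ) AS src
    ON tgt.transcript_id = src.transcript_id
    WHEN MATCHED THEN UPDATE SET tgt.llm_response = src.llm_response
    WHEN NOT MATCHED THEN INSERT (transcript_id, llm_response) VALUES (src.transcript_id, src.llm_response);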
12. A financial institution wants to develop a Snowflake-based pipeline to process call transcripts from their customer support. The pipeline needs to perform two main tasks: first, summarize very lengthy technical support calls (up to 20,000 tokens per transcript) into concise, actionable insights, and second, classify the sentiment of these calls as 'positive', 'neutral', or 'negative'. Given these requirements for integration into SQL data pipelines, which combination of Snowflake Cortex functions and prompt engineering considerations would be most appropriate?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
For summarizing very lengthy technical support calls (up to 20,000 tokens), a model with a sufficiently large context window is essential. AI_COMPLETE (the updated version of COMPLETE) offers flexibility for detailed summarization with prompt engineering, and a model like mistral-large2 has a context window of 128,000 tokens, making it suitable for such long inputs. Encapsulating complex prompt logic within a SQL user-defined function (UDF) is a recommended practice for better management and reusability in data pipelines. For classifying sentiment into predefined categories ('positive', 'neutral', 'negative'), AI_CLASSIFY (the updated version of CLASSIFY_TEXT) is purpose-built and directly returns the classification label.
A. SUMMARIZE() is a generic summarization function, but AI_COMPLETE with a large model provides more control for "actionable insights"; SENTIMENT() returns a numerical score, requiring additional logic for categorical output.
C. SNOWFLAKE.CORTEX.EXTRACT_ANSWER() is designed to extract specific answers to questions, not to summarize text; using it multiple times for summarization would be inefficient and less effective. While COMPLETE can perform classification, AI_CLASSIFY is the specialized function for this task.
D. gemma-7b has a context window of 8,000 tokens, which is insufficient for processing calls of up to 20,000 tokens, potentially leading to truncation or incomplete results.
E. AI_AGG() and SUMMARIZE_AGG() are designed to aggregate insights or summaries across multiple rows or groups of text, not to summarize a single, lengthy document, and the remaining function proposed returns a Boolean result, making it less suitable for multi-category classification.

13. A data engineering team is designing a Snowflake data pipeline to automatically enrich a 'customer_issues' table with product names extracted from the raw text-based 'issue_description' column. They want to use a Snowflake Cortex function for this extraction and integrate it into a stream- and task-based pipeline. Given the 'customer_issues' table with an 'issue_id' and an 'issue_description' (VARCHAR), which of the following SQL snippets correctly demonstrates the use of a Snowflake Cortex function for this data enrichment within a task, assuming 'issue_stream' is a stream on the 'customer_issues' table?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
Option B correctly uses SNOWFLAKE.CORTEX.EXTRACT_ANSWER to pull specific information (the product name) from unstructured text, which is a common data enrichment task. It also integrates with a stream ('issue_stream') by filtering for METADATA$ACTION = 'INSERT' and uses a MERGE statement, which is suitable for incremental updates in a data pipeline because it inserts newly extracted data based on new records in the stream. Option A uses COMPLETE to generate a response, not for specific entity extraction, and its prompt is less precise for this task than EXTRACT_ANSWER. Option C uses SNOWFLAKE.CORTEX.CLASSIFY_TEXT for classification, not direct entity extraction of a product name, and attempts to update the source table directly, which is not ideal for adding new columns based on stream data. Option D proposes a stored procedure and task, which is a valid pipeline structure; however, the EXTRACT_ANSWER call within the procedure only returns a result set and does not demonstrate the final insertion or merging step required to persist the extracted data into an 'enriched_issues' table. Option E uses EMBED_TEXT_768 to generate vector embeddings, which is a form of data enrichment, but the scenario specifically asks for product names (a string value), not embeddings for similarity search.
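The option snippets themselves are not reproduced in the source; as an illustration of the pattern the explanation describes (EXTRACT_ANSWER combined with a MERGE over the stream), a hedged sketch with an assumed enriched_issues target table:

    -- Sketch: enrich new rows from the stream with a product name extracted from free text.
    MERGE INTO enriched_issues AS tgt
    USING (
        SELECT
            issue_id,
            -- EXTRACT_ANSWER returns an array of {answer, score} objects; take the top answer.
            SNOWFLAKE.CORTEX.EXTRACT_ANSWER(
                issue_description,
                'What product is mentioned in this issue?'
            )[0]:answer::STRING AS product_name
        FROM issue_stream
        WHERE METADATA$ACTION = 'INSERT'
    ) AS src
    ON tgt.issue_id = src.issue_id
    WHEN MATCHED THEN UPDATE SET tgt.product_name = src.product_name
    WHEN NOT MATCHED THEN INSERT (issue_id, product_name) VALUES (src.issue_id, src.product_name);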
14. A retail company wants to implement an automated data pipeline in Snowflake to analyze daily customer reviews. The goal is to enrich a 'product_reviews_sentiment' table with sentiment categories (e.g., 'positive', 'neutral', 'negative') for each new review. They require the sentiment to be returned as a JSON object for downstream processing and need the pipeline to handle potential LLM errors gracefully without stopping. Assuming a stream 'new_reviews_stream' monitors a 'customer_reviews' table, which approach effectively uses a Snowflake Cortex function for this scenario?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
Option C is the most effective approach for this scenario. It correctly uses SNOWFLAKE.CORTEX.TRY_COMPLETE, which performs the same operation as COMPLETE but returns NULL instead of raising an error when the operation cannot be performed, making the pipeline more robust to LLM issues. The 'response_format' option ensures the output adheres to a specified JSON schema for structured sentiment categories, meeting the requirement for structured output. This is integrated within a MERGE statement in a task for incremental processing of new data from the stream. Option A suggests a Python UDF with COMPLETE; while feasible, TRY_COMPLETE is explicitly designed for graceful error handling in pipelines, which COMPLETE lacks by default. Option B uses SNOWFLAKE.CORTEX.SENTIMENT, which returns a numeric score (e.g., 0.5424458), not a categorical JSON object, requiring additional post-processing logic for categorization. Option D uses AI_AGG for summarization and AI_CLASSIFY for classification; while AI_CLASSIFY can categorize, the request is for the sentiment of each review, and AI_AGG would aggregate before classifying, not fulfilling the individual-review sentiment requirement. Option E suggests a dynamic table, but dynamic tables currently do not support incremental refresh with COMPLETE (or AI_COMPLETE) functions, making them unsuitable for continuous LLM-based processing in this manner. Furthermore, COMPLETE does not offer the graceful error handling of TRY_COMPLETE.

15. A data architect is integrating Snowflake Cortex LLM functions into various data enrichment pipelines. To ensure optimal performance, cost-efficiency, and accuracy, which of the following are valid best practices or considerations for these pipelines?
A. When extracting specific entities from documents using AI_EXTRACT or !PREDICT, it is often more effective to fine-tune a Document AI model for complex or varied document layouts rather than relying solely on extensive prompt engineering for zero-shot extraction.
B. For tasks requiring deterministic JSON outputs, explicitly specifying a JSON schema using the 'response_format' argument with AI_COMPLETE is crucial, and for OpenAI (GPT) models, including the 'required' field and setting 'additionalProperties' to 'false' in every node of the schema is a mandatory requirement.
C. To manage costs effectively for LLM functions like AI_COMPLETE in a pipeline, always use the largest available warehouse size (e.g., 6XL Snowpark-optimized) to maximize throughput, as this directly reduces the overall token processing time and cost.
D. When performing sentiment analysis on customer feedback using AI_SENTIMENT, it is best practice to pass detailed, multi-turn conversation history to the function to enhance accuracy, similar to how AI_COMPLETE handles conversational context.
E. For data enrichment involving classification with AI_CLASSIFY, using descriptive and mutually exclusive categories in plain English, along with an optional clear task description, can significantly improve classification accuracy.
Answer: A,B,E
Explanation:
Option A is correct. For extracting information from documents with complex or varied layouts, fine-tuning a Document AI model can significantly improve results compared to relying solely on zero-shot extraction and extensive prompt engineering. Document AI provides both zero-shot extraction and fine-tuning capabilities, with fine-tuning recommended to improve results on specific document types.
Option B is correct. To ensure AI_COMPLETE (or COMPLETE) returns responses in a structured JSON format, it is essential to specify a JSON schema using the 'response_format' argument. For OpenAI (GPT) models, specific requirements include setting 'additionalProperties' to 'false' in every node and ensuring the 'required' field lists all property names.
Option C is incorrect. Snowflake explicitly recommends executing queries that call Cortex AISQL functions (such as AI_COMPLETE) using a smaller warehouse, no larger than MEDIUM. Using larger warehouses does not increase performance for these functions but will incur unnecessary compute costs. The LLM inference itself is managed by Snowflake, and its performance is not directly scaled by warehouse size in the same way as traditional SQL queries.
Option D is incorrect. AI_SENTIMENT (and SENTIMENT) is a task-specific function designed to return a sentiment score for a given English-language text. Unlike AI_COMPLETE (or COMPLETE), which supports multi-turn conversations by passing conversation history for a stateful experience, AI_SENTIMENT processes individual text inputs and is not designed to leverage multi-turn context in the same way for sentiment analysis.
Option E is correct. For classification tasks using AI_CLASSIFY (or CLASSIFY_TEXT), best practices include using plain English for the input text and categories, ensuring categories are descriptive and mutually exclusive, and adding a clear 'task_description' when the relationship between input and categories is ambiguous. These guidelines significantly improve classification accuracy.

16. A Gen AI Specialist is tasked with implementing a data pipeline to automatically enrich new customer feedback entries with sentiment scores using Snowflake Cortex functions. The new feedback arrives in a staging table, and the enrichment process must be automated and cost-effective. Given the following pipeline components, which combination of steps is most appropriate for setting up this continuous data augmentation process?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: C
Explanation:
Option C is the most direct and efficient approach for continuously augmenting data with sentiment scores in a Snowflake pipeline. SNOWFLAKE.CORTEX.SENTIMENT is a task-specific AI function designed for this purpose, returning an overall sentiment score for English-language text. Integrating it directly into a task that monitors a stream allows for automated, incremental processing of new data as it arrives in the staging table. The source explicitly mentions using Cortex functions in data pipelines via the SQL interface. Option A is plausible, but calling SENTIMENT directly in SQL within a task (Option C) is simpler and avoids the overhead of a Python UDF when the function is directly available in SQL, which it is. Option B, using a dynamic table, is not supported for Snowflake Cortex functions. Option D, while powerful for custom LLMs, is an over-engineered solution and introduces more complexity (SPCS setup, a custom service) than necessary for a direct sentiment function. Option E describes a manual, non-continuous process, which contradicts the requirement for an automated pipeline.

17. A financial institution wants to automate the extraction of key entities (e.g., invoice number, total amount, list of invoice items) from incoming PDF financial statements into a structured JSON format within their Snowflake data pipeline. The extracted data must conform to a specified JSON schema for seamless downstream integration. Which Snowflake Cortex capabilities, when combined, can best achieve this data augmentation and ensure schema adherence in a continuous processing pipeline?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B,D

18. A data engineering team aims to automatically classify incoming customer support requests into predefined categories ('Technical Issue', 'Billing Inquiry', 'General Question') as part of their Snowflake data ingestion pipeline. The goal is to achieve high classification accuracy while managing LLM inference costs efficiently. Which of the following strategies, when applied within a Snowflake data pipeline using streams and tasks, would best contribute to meeting these objectives?
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: A
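Relating to the approach endorsed in question 16, a minimal sketch of a stream-driven task that calls SENTIMENT directly in SQL. Every object name here is hypothetical, and the task still needs to be resumed (ALTER TASK ... RESUME) before it runs.

    -- Sketch: automated, incremental sentiment enrichment.
    CREATE OR REPLACE TASK enrich_feedback_task
      WAREHOUSE = my_wh
      SCHEDULE = '5 MINUTE'
      WHEN SYSTEM$STREAM_HAS_DATA('feedback_stream')
    AS
      INSERT INTO feedback_enriched (feedback_id, feedback_text, sentiment_score)
      SELECT
          feedback_id,
          feedback_text,
          SNOWFLAKE.CORTEX.SENTIMENT(feedback_text) AS sentiment_score
      FROM feedback_stream
      WHERE METADATA$ACTION = 'INSERT';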
19. Which of the following SQL snippets, when executed against a single invoice file such as "invoice001.pdf", correctly extracts and transforms the desired data, assuming 'json_content' holds the raw Document AI output?
A)
B)
C)
D)
E)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: B
Explanation:
Option B correctly uses a Common Table Expression (CTE) to retrieve the raw JSON output from the Document AI model's !PREDICT method (which extracts information from documents in a stage), leveraging GET_PRESIGNED_URL to access the document. It then accesses 'invoice_number' and 'vendor_name' using '.value' syntax, appropriate for values returned as an array containing a single object with a 'value' field, as shown in Document AI output examples. The LATERAL FLATTEN clause is correctly applied to expand the array of line items, and ARRAY_AGG combined with ARRAY_TO_STRING converts these items into a comma-separated string. Finally, it groups by the single-value extracted fields. Option A attempts to flatten the result multiple times, or in an incorrect way, within the SELECT statement without a proper FROM clause for the flattened data, leading to inefficient or incorrect aggregation. Option C directly references a staged file path (@invoice_docs_stage/invoice001.pdf) without the necessary GET_PRESIGNED_URL function, which is required when calling !PREDICT with a file from a stage. It also incorrectly assumes direct '.value' access for array-wrapped single values and does not correctly transform the 'invoice_items' array into a string. Option D's subquery for ARRAY_AGG is syntactically problematic for direct column access from the outer query without an explicit LATERAL FLATTEN at the top level. Option E only extracts the 'ocrScore' from the document metadata and does not perform the requested data transformations.
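Because the option snippets are not included in the source, the following is only an illustrative sketch of the shape the correct option is said to have: a CTE around a Document AI !PREDICT call via GET_PRESIGNED_URL, then LATERAL FLATTEN plus ARRAY_AGG/ARRAY_TO_STRING. The model name, model version, stage, and output field layout are all assumptions and vary by model build.

    -- Illustrative sketch only (model, stage, version, and field names are assumed).
    WITH raw AS (
        SELECT invoice_model!PREDICT(
                   GET_PRESIGNED_URL(@invoice_docs_stage, 'invoice001.pdf'), 1
               ) AS json_content
    )
    SELECT
        json_content:invoice_number[0].value::STRING               AS invoice_number,
        json_content:vendor_name[0].value::STRING                  AS vendor_name,
        ARRAY_TO_STRING(ARRAY_AGG(item.value:value::STRING), ', ') AS invoice_items
    FROM raw,
         LATERAL FLATTEN(input => json_content:invoice_items) AS item
    GROUP BY invoice_number, vendor_name;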
20. A data engineer is designing an automated pipeline to process customer feedback comments from a 'new_customer_reviews' table, which includes a 'review_text' column. The pipeline needs to classify each comment into one of three predefined categories: 'positive', 'negative', or 'neutral', and store the classification label in a new 'sentiment_label' column. Which of the following statements correctly describe aspects of implementing this data transformation using SNOWFLAKE.CORTEX.CLASSIFY_TEXT in a Snowflake pipeline?
A. The classification can be achieved by integrating a SELECT statement with CLASSIFY_TEXT into an INSERT or UPDATE task.
B. Including an optional 'task_description' can improve the accuracy of classification, especially if the relationship between text and categories is ambiguous.
C. The cost for CLASSIFY_TEXT is incurred based on the number of pages processed in the input document.
D. The categories argument must contain exactly three unique categories for sentiment classification.
E. Both the input string to classify and the categories are case-sensitive, potentially yielding different results for variations in capitalization.
Answer: A,B,E
Explanation:
Option A is correct. SNOWFLAKE.CORTEX.CLASSIFY_TEXT classifies free-form text into categories and returns an OBJECT value (VARIANT) in which the 'label' field specifies the category. This can be extracted using ['label'] and seamlessly integrated into INSERT or UPDATE statements within a pipeline task for data transformation.
Option B is correct. Adding a clear 'task_description' to the 'options' argument for CLASSIFY_TEXT can significantly improve classification accuracy. This is particularly useful when the relationship between the input text and the provided categories is ambiguous or nuanced.
Option C is incorrect. CLASSIFY_TEXT incurs compute cost based on the number of tokens processed (both input and output tokens), not on the number of pages in a document. Functions like AI_PARSE_DOCUMENT bill based on pages.
Option D is incorrect. The categories argument for CLASSIFY_TEXT must contain at least two and at most 100 unique categories. It is not strictly limited to three for any classification task, including sentiment.
Option E is correct. Both the input string to classify and the categories are case-sensitive, meaning that differences in capitalization for either the input text or the category labels can lead to different classification results.

21. A Snowflake developer is tasked with enhancing a daily data pipeline. The pipeline processes raw text descriptions of system events and needs to extract structured information, specifically the event_name (string) and its severity_level (string, restricted to 'low', 'medium', 'high', 'critical'). The output must be a strictly formatted JSON object, ensuring data quality for downstream analytics. Consider the following SQL snippet intended for this transformation:
Which of the following statements are correct regarding this implementation and best practices for using AI_COMPLETE with structured outputs in a data pipeline?
A. The 'response_format' correctly defines the expected JSON structure, using 'enum' for 'severity_level' and 'required' to ensure 'event_name' and 'severity_level' are always present if extracted.
B. Setting 'temperature' to 0.7 is optimal for ensuring the most consistent and deterministic JSON outputs, especially for complex extraction tasks.
C. Using TRY_COMPLETE instead of AI_COMPLETE would allow the pipeline to gracefully handle cases where the model fails to generate a valid JSON response by returning NULL instead of an error.
D. The complexity of the JSON schema, particularly deep nesting, does not impact the number of tokens processed and billed for AI_COMPLETE Structured Outputs.
E. For all models supported by AI_COMPLETE Structured Outputs, the 'additionalProperties' field must be set to 'false' in every node of the schema, and the 'required' field must contain all property names.
Answer: A,C
Explanation:
Option A is correct. The 'response_format' argument with its JSON schema accurately specifies the desired structured output for AI_COMPLETE. It correctly uses the 'enum' keyword to restrict the possible values for 'severity_level' and the 'required' field to mandate the presence of the 'event_name' and 'severity_level' fields in the output if the model can extract them. This reduces post-processing needs and enables seamless integration.
Option B is incorrect. For the most consistent and deterministic results, especially when aiming for strict JSON adherence in data pipelines, it is recommended to set the 'temperature' option to 0. A temperature of 0.7 would lead to more diverse and random output, which is generally undesirable for structured data extraction where consistency is key.
Option C is correct. In a production data pipeline, TRY_COMPLETE is preferred over AI_COMPLETE for robustness. If the model fails to generate a valid response (e.g., it cannot adhere to the schema or encounters an internal error), TRY_COMPLETE returns NULL instead of raising an error, allowing the pipeline to continue processing other records without interruption.
Option D is incorrect. The number of tokens processed (and thus billed) for AI_COMPLETE Structured Outputs does increase with schema complexity. Highly structured responses, especially those with deep nesting, consume a larger number of input and output tokens.
Option E is incorrect. The specific requirements for 'additionalProperties' being 'false' and the 'required' field containing all property names apply only to OpenAI (GPT) models when using Structured Outputs. Other models do not strictly enforce these requirements, although including them may simplify schema management across different model types.
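The SQL snippet referenced in question 21 is not reproduced in the source. As an illustration only, a call of the shape the explanation describes (TRY_COMPLETE, temperature 0, an enum-constrained field, and a required list) might look like the following; the source table, prompt, and exact option layout are assumptions.

    -- Illustrative sketch only: structured extraction with an enum-constrained field.
    SELECT SNOWFLAKE.CORTEX.TRY_COMPLETE(
        'mistral-large2',
        [{'role': 'user',
          'content': 'Extract the event name and severity level from: ' || event_description}],
        {'temperature': 0,
         'response_format': {
             'type': 'json',
             'schema': {
                 'type': 'object',
                 'properties': {
                     'event_name':     {'type': 'string'},
                     'severity_level': {'type': 'string',
                                        'enum': ['low', 'medium', 'high', 'critical']}
                 },
                 'required': ['event_name', 'severity_level']
             }
         }}
    ) AS extracted_event
    FROM system_events;  -- hypothetical source table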
22. A Gen AI Specialist is setting up their Snowflake environment to deploy a high-performance open-source LLM for real-time inference using Snowpark Container Services (SPCS). They need to create a compute pool that can leverage NVIDIA A10G GPUs to optimize model performance. Which of the following SQL statements correctly creates a compute pool capable of supporting an intensive GPU usage scenario, such as serving LLMs, while adhering to common configuration best practices for a new, small-scale deployment in Snowpark Container Services?
A)
B)
C)
D)
E)
A. Option A
B. Option B
C. Option C
D. Option D
E. Option E
Answer: D
Explanation:
Option D is correct. The GPU_NV_M instance family is explicitly described as optimized for intensive GPU usage scenarios such as computer vision or LLMs/VLMs, providing 4 NVIDIA A10G GPUs. Setting MIN_NODES = 1 and MAX_NODES = 1 is appropriate for a small-scale deployment, and AUTO_SUSPEND_SECS = 1800 (30 minutes) is a sound practice for cost management during inactivity. Option A is incorrect because it specifies a generic CPU instance family, not a GPU instance suitable for LLMs. Option B uses GPU_NV_S, which is a GPU instance and the smallest NVIDIA GPU size available for Snowpark Containers to get started; while functional, GPU_NV_M is more directly aligned with intensive GPU usage scenarios like LLMs, as stated in the question. AUTO_RESUME = TRUE is the default.
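A sketch of the configuration the correct option is described as using; the pool name is hypothetical and the parameter values mirror the explanation above.

    -- Sketch: small-scale GPU compute pool for LLM serving.
    CREATE COMPUTE POOL llm_inference_pool
      MIN_NODES = 1
      MAX_NODES = 1
      INSTANCE_FAMILY = GPU_NV_M     -- 4 NVIDIA A10G GPUs, suited to intensive GPU workloads such as LLMs
      AUTO_SUSPEND_SECS = 1800       -- suspend after 30 minutes of inactivity to control cost
      AUTO_RESUME = TRUE;            -- the default, shown explicitly for clarity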