Leaf #0 – Top 10 Techniques for Google BigQuery Optimization
Dear friends, Happy International Women’s Day. The other day I unexpectedly met a friend called Lakshmi at the meetup where we get together to share what we are learning about big data on the cloud. Lakshmi said that the dataottam team is doing awesome work through blogs, meetups, and much more, and we were energized by the appreciation. As a token of thanks, we at dataottam have started an initiative called Leaf (Learning from this week), where we will bring you insights and value from our deep-dive learning of that week.
In this Leaf #0 episode let’s converse about Google BigQuery optimization. BigQuery is Google’s serverless, highly scalable, low-cost enterprise data warehouse designed to make all our data analysts productive. We can use SQL to analyze the data and find meaningful insights from large data sets. BigQuery enables us to analyze all our data by creating a logical data warehouse over managed, columnar storage; apart from that, it allows us to capture and analyze data in real time using its powerful streaming ingestion capability. And there is cheering news: BigQuery is free for up to 1 TB of data analyzed each month and 10 GB of data stored. All my free GCP trainings and projects run well without spending a penny.
When we use Google BigQuery, our data is stored on the Colossus file system, the same technology that underpins other major Google services, and all data is encrypted at rest. We can interact with BigQuery in many ways: the web console, client-side APIs (Python, Go, C#, Java, Node.js, PHP, and Ruby), a command-line interface, standard interfaces like JDBC and ODBC, and third-party tools (e.g. Matillion). Whenever we execute a SQL statement, it is parsed and optimized internally by an execution engine called Dremel. To achieve its vital aim of scalability, Google internally splits every SQL task across many worker compute nodes called slots, an approach known as MPP (Massively Parallel Processing).
- Avoid SELECT *; it reads every column of the table. Select only the specific columns you need to reduce the amount of data scanned (sketched below).
- LIMIT and WHERE clauses don’t reduce the query cost. LIMIT is a final step that restricts the amount of data shown after the full query has executed, and a WHERE clause is applied only after the data has been scanned. LIMIT is still useful with ORDER BY, since BigQuery then only has to keep the top N rows instead of sorting the full result. If the same table has to be filtered in many different ways, query it just once using a superset of all the necessary WHERE conditions, materialize an intermediate table, and then repeatedly query that intermediate table with the individual WHERE clauses (sketched below).
- A WITH clause (CTE, Common Table Expression) is not materialized at runtime; it is re-evaluated every time it is referenced, so for reused results prefer an intermediate table over a CTE (the same intermediate-table sketch below applies).
- Leverage approximate aggregation functions, for example APPROX_COUNT_DISTINCT instead of an exact COUNT(DISTINCT …), when a small estimation error is acceptable (sketched below).
- Don’t use SELECT * … LIMIT just to inspect sample rows; instead use the table preview in the web console, the `bq head` command, or the API.
- Avoid HAVING clauses where possible; filter with WHERE before aggregation instead, to reduce the amount of data being shuffled (sketched below).
- Optimize our JOINs by placing the largest table first, the smallest table next, and then the remaining tables in decreasing order of size (sketched below).
- Date partitioning a table is a great advantage: we can simply use the _PARTITIONTIME pseudo column so that a date filter fetches data only from the matching partitions, which reduces the amount of data scanned (sketched below).
- Use SELECT with a table wildcard in the FROM clause when querying manually sharded (date-suffixed) tables, so only the required shards are read (sketched below).
- Leverage analytic (window) functions such as SUM, CORR, STDDEV, and many more to compute aggregates alongside row-level data in a single pass, reducing duplicated scanning effort (sketched below).
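Here are small sketches of the tips above in BigQuery Standard SQL. All table and column names (such as `myproject.mydataset.orders`) are hypothetical, illustrative assumptions, not a real schema.

First, the SELECT * tip: only the referenced columns are read from columnar storage, so the second query scans and bills far less data.

```sql
-- Reads every column of the table (expensive on wide tables).
SELECT *
FROM `myproject.mydataset.orders`;

-- Reads only the three columns actually needed, so far less data is scanned.
SELECT order_id, customer_id, order_total
FROM `myproject.mydataset.orders`;
```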
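For the LIMIT/WHERE and WITH tips, a sketch of the intermediate-table pattern under the same hypothetical schema: scan the raw table once with a superset of the filters, materialize the result, and run the repeated queries against the smaller table instead of re-evaluating a CTE.

```sql
-- One scan of the large raw table, with a superset of all the filters we will need.
CREATE OR REPLACE TABLE `myproject.mydataset.orders_2019` AS
SELECT order_id, customer_id, country, order_total
FROM `myproject.mydataset.orders`
WHERE order_date BETWEEN DATE '2019-01-01' AND DATE '2019-12-31';

-- The repeated, narrower queries now hit the much smaller intermediate table.
SELECT COUNT(*) FROM `myproject.mydataset.orders_2019` WHERE country = 'IN';
SELECT SUM(order_total) FROM `myproject.mydataset.orders_2019` WHERE country = 'US';
```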
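A sketch of the approximate-aggregation tip on a hypothetical events table; APPROX_COUNT_DISTINCT is based on HyperLogLog++ and trades a small estimation error for much lower memory and shuffle cost.

```sql
-- Exact distinct count: accurate, but memory- and shuffle-intensive on huge tables.
SELECT COUNT(DISTINCT user_id) AS exact_users
FROM `myproject.mydataset.clickstream_events`;

-- Approximate distinct count: much cheaper, with a small statistical error.
SELECT APPROX_COUNT_DISTINCT(user_id) AS approx_users
FROM `myproject.mydataset.clickstream_events`;
```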
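A sketch contrasting HAVING and WHERE for the same filter; pushing the condition into WHERE removes the unwanted rows before the GROUP BY, so less data is shuffled between slots.

```sql
-- HAVING filters after aggregation: every country is grouped first, then most are discarded.
SELECT country, SUM(order_total) AS revenue
FROM `myproject.mydataset.orders`
GROUP BY country
HAVING country IN ('IN', 'US');

-- WHERE filters before aggregation: only the two countries' rows are shuffled and grouped.
SELECT country, SUM(order_total) AS revenue
FROM `myproject.mydataset.orders`
WHERE country IN ('IN', 'US')
GROUP BY country;
```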
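A sketch of the join-ordering tip with hypothetical tables of very different sizes: the largest table comes first, the smallest second, and the rest follow in decreasing size, which lets BigQuery broadcast the small table to the slots.

```sql
SELECT e.event_id, c.country_name, u.user_segment
FROM `myproject.mydataset.clickstream_events` AS e   -- largest table first (billions of rows)
JOIN `myproject.mydataset.countries` AS c            -- smallest table second (hundreds of rows)
  ON e.country_code = c.country_code
JOIN `myproject.mydataset.users` AS u                -- remaining tables in decreasing size (millions of rows)
  ON e.user_id = u.user_id;
```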
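A sketch of the date-partitioning tip, assuming a hypothetical ingestion-time partitioned table; the _PARTITIONTIME filter prunes partitions so only one week of data is scanned.

```sql
-- Only the partitions for the requested days are scanned.
SELECT order_id, order_total
FROM `myproject.mydataset.orders_partitioned`
WHERE _PARTITIONTIME BETWEEN TIMESTAMP('2019-03-01')
                         AND TIMESTAMP('2019-03-07');
```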
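A sketch of the wildcard-table tip for manually sharded, date-suffixed tables (hypothetical `events_YYYYMMDD` shards); the _TABLE_SUFFIX filter limits the query to the shards of interest.

```sql
-- Reads only the daily shards for the first week of March 2019.
SELECT event_id, user_id
FROM `myproject.mydataset.events_*`
WHERE _TABLE_SUFFIX BETWEEN '20190301' AND '20190307';
```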
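Finally, a sketch of the analytic-function tip: the window SUM attaches each country's total revenue to every row in a single pass, instead of scanning the table again for a self-join against a pre-aggregated result.

```sql
SELECT
  order_id,
  country,
  order_total,
  -- Window aggregate computed alongside the row-level columns in one scan.
  SUM(order_total) OVER (PARTITION BY country) AS country_revenue
FROM `myproject.mydataset.orders`;
```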
Click here for more details… happy reading and happy sharing.
See you in the next Leaf (Learning from this week).