Friday, November 25, 2016

[BigQuery][Tutorial] Upload gz file as table data via command line in windows platform

If you already have a dataset and table, you can skip step 1 to step 3 .

Step 1 - download source file

Download example file from this link, that's raw logs shared by Wikipedia:
https://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-06/pagecounts-20150616-160000.gz

Step 2 - Create dataset 

Go to the BigQuery web UI and then click the down arrow icon  down arrow icon image  next to your project name in the navigation and select "Create new dataset".


For this case i used "wikipedia" as dataset name, and then click "OK" button:

Step 3 - Create table

Hover on the "wikipedia" dataset ID in the navigation which you have just created. Click the down arrow icon down arrow icon image  image next to the ID and click "Create new table".


Select "Create empty table " option in Source data column, and then enter the following value for the destination table name under Destination Table.
pagecounts_20150616_16
And then click the Edit as text link in the Schema section :


Insert the content to schema input area where text in green are column name and text in blue are related data type.
language:string,title:string,requests:integer,content_size:integer

Leave others fields as default and click "Create Table".


So we would have a table named "pagecounts_20150616_16" with 4 columns, where columns are "language" in string type, "title" in string type, "requests" in integer type and "content_size" in integer type.

Step 4 - Login google cloud sdk shell and select project

Open your "google cloud sdk Shell" application:

And it would ask you login or not, Enter "y" after "would you like to login ?"

After you press y, a browser with google account login page would pop up and you have to login a google account, after finished the process your console would update automatically, press your project item number from shown list and click "Enter":


Step 5 - Upload by command

Before upload the gz file, please make sure you got the dataset and the table, use this comment to upload file where text in orange is your table ID, text in red is your location of the file you want to be uploaded and text in green is your table schema in text form.
bq load -F" " --quote "" YourProjectName:DatasetName.pagecounts_20150616_16 YourFilePath\pagecounts-20150616-160000.gz language,title,requests:integer,content_size:integer
Example:
bq load -F" " --quote "" xxx-100004:wikipedia.pagecounts_20150616_16 C:\Users\username\Desktop\BigQuery\pagecounts-20150616-160000.gz  language,title,requests:integer,content_size:integer


 

Reference

https://cloud.google.com/bigquery/quickstart-web-ui
https://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_50_billion_wikipedia_pageviews_in_5/

No comments :

Post a Comment