Step 1 - download source file
Download example file from this link, that's raw logs shared by Wikipedia:https://dumps.wikimedia.org/other/pagecounts-raw/2015/2015-06/pagecounts-20150616-160000.gz
Step 2 - Create dataset
Go to the BigQuery web UI and then click the down arrow icon next to your project name in the navigation and select "Create new dataset".
For this case i used "wikipedia" as dataset name, and then click "OK" button:
Step 3 - Create table
Hover on the "wikipedia" dataset ID in the navigation which you have just created. Click the down arrow icon image next to the ID and click "Create new table".
Select "Create empty table " option in Source data column, and then enter the following value for the destination table name under Destination Table.
pagecounts_20150616_16
And then click the Edit as text link in the Schema section :
language:string,title:string,requests:integer,content_size:integer
So we would have a table named "pagecounts_20150616_16" with 4 columns, where columns are "language" in string type, "title" in string type, "requests" in integer type and "content_size" in integer type.
Step 4 - Login google cloud sdk shell and select project
Open your "google cloud sdk Shell" application:And it would ask you login or not, Enter "y" after "would you like to login ?"
After you press y, a browser with google account login page would pop up and you have to login a google account, after finished the process your console would update automatically, press your project item number from shown list and click "Enter":
Step 5 - Upload by command
Before upload the gz file, please make sure you got the dataset and the table, use this comment to upload file where text in orange is your table ID, text in red is your location of the file you want to be uploaded and text in green is your table schema in text form.bq load -F" " --quote "" YourProjectName:DatasetName.pagecounts_20150616_16 YourFilePath\pagecounts-20150616-160000.gz language,title,requests:integer,content_size:integerExample:
bq load -F" " --quote "" xxx-100004:wikipedia.pagecounts_20150616_16 C:\Users\username\Desktop\BigQuery\pagecounts-20150616-160000.gz language,title,requests:integer,content_size:integer
Reference
https://cloud.google.com/bigquery/quickstart-web-uihttps://www.reddit.com/r/bigquery/comments/3dg9le/analyzing_50_billion_wikipedia_pageviews_in_5/
No comments :
Post a Comment