diff --git a/samples/Using IBM Cloud SQL Query to analyze LogDNA data.ipynb b/samples/Using IBM Cloud SQL Query to analyze LogDNA data.ipynb new file mode 100644 index 0000000..7998eb5 --- /dev/null +++ b/samples/Using IBM Cloud SQL Query to analyze LogDNA data.ipynb @@ -0,0 +1,1843 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Using IBM Cloud SQL Query to analyze LogDNA data" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "\n", + "**IBM Cloud SQL Query** is IBM's serverless SQL service on data in **Cloud Object Storage**. It allows you to run ANSI SQL on Parquet, CSV and JSON data sets. Under the covers it uses Apache Spark SQL as its query engine. It can also be used to pre-process and analyze the log archives that LogDNA writes. This notebook uses the Python SDK for IBM Cloud SQL Query. Further details on the Python SDK can be found on GitHub.

\n", + "\n", + "**LogDNA** is a cloud-based log management software that aggregates all system and application logs in one centralized logging system. LogDNA keeps your logs only for a limited period of time.\n", + "However, LogDNA can be configured to **export logs from LogDNA to IBM Cloud Object Storage**. Archived logs are in JSON format and preserve metadata associated with each line. Logs will be exported daily in a compressed format (.json.gz).\n", + "

\n", + "\n", + "This notebook gives an overview on the SQL Query features that help with preparing the log archives for further analysis. Furthermore it shows how to query the logs with SQL to filter out the \"noise\" in your data.\n", + "The start of the notebook shows how to work with your own LogDNA data and the chapter 5 focus on existing ingress sample log data." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Table of contents\n", + "1. [Setup libraries](#setup)
\n", + "2. [Configure SQL Query](#configure)
\n", + "3. [Prepare LogDNA dumps for analysis](#prepare)
\n", + "4. [How to work with SQL Query vs. LogDNA UI](#comparison)
\n", + " 4.1 [Filtering by sources, apps, levels or tags](#filter)
\n", + " 4.2 [Field search on parsed fields](#fieldsearch)
\n", + " 4.3 [Jump to timeframe](#jumptime)
\n", + " 4.4 [Retrieving schema information](#schema)
\n", + "5. [Using SQL Query for log analysis](#loganalysis)
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 1. Setup libraries" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Run the following cell at least once in your notebook environment in order to install required packages, such as the SQL Query client library:" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [], + "source": [ + "!pip -q install ibmcloudsql\n", + "!pip -q install sqlparse" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Pixiedust database opened successfully\n", + "Table VERSION_TRACKER created successfully\n", + "Table METRICS_TRACKER created successfully\n", + "\n", + "Share anonymous install statistics? (opt-out instructions)\n", + "\n", + "PixieDust will record metadata on its environment the next time the package is installed or updated. The data is anonymized and aggregated to help plan for future releases, and records only the following values:\n", + "\n", + "{\n", + " \"data_sent\": currentDate,\n", + " \"runtime\": \"python\",\n", + " \"application_version\": currentPixiedustVersion,\n", + " \"space_id\": nonIdentifyingUniqueId,\n", + " \"config\": {\n", + " \"repository_id\": \"https://github.com/ibm-watson-data-lab/pixiedust\",\n", + " \"target_runtimes\": [\"Data Science Experience\"],\n", + " \"event_id\": \"web\",\n", + " \"event_organizer\": \"dev-journeys\"\n", + " }\n", + "}\n", + "You can opt out by calling pixiedust.optOut() in a new cell.\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + "
\n", + " \n", + " \n", + " \n", + " Pixiedust version 1.1.17\n", + "
\n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[31mPixiedust runtime updated. Please restart kernel\u001b[0m\n", + "Table USER_PREFERENCES created successfully\n", + "Table service_connections created successfully\n" + ] + } + ], + "source": [ + "import ibmcloudsql\n", + "from pixiedust.display import *\n", + "import pandas as pd\n", + "logDNADump=''\n", + "targetUrl=''\n", + "logData=''" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2. Configure SQL Query\n", + "1. You need an **API key** for an IBM cloud identity that has access to your Cloud Object Storage bucket for writing SQL results and to your SQL Query instance. To create API keys log on to the IBM Cloud console and go to Manage->Security->Platform API Keys, click the `Create` button, give the key a custom name and click `Create`. In the next dialog click `Show` and copy the key to your clipboard and paste it below in this notebook.\n", + "2. You need the **instance CRN** for the SQL Query instance. You can find it in the IBM Cloud console dashboard. Make sure you have `All Resources` selected as resource group. In the section `Services` you can see your instances of SQL Query and Cloud Object Storage. Select the instance of SQL Query that you want to use. In the SQL Query dashboard page that opens up you find a section titled **REST API** with a button labelled **Instance CRN**. Click the button to copy the CRN into your clipboard and paste it here into the notebook. If you don't have an SQL Query instance created yet, create one first.\n", + "3. You need to specify the location on Cloud Object Storage where your **query results** should be written. This comprises three parts of information that you can find in the Cloud Object Storage UI for your instance in the IBM Cloud console. You need to provide it as a **URL** using the format `cos:////[]`. You have the option to use the cloud object storage **bucket that is associated with your project**. In this case, execute the following section before you proceed. \n", + "4. You need to specify the location on Cloud Object Storage where your **logDNA objects are stored**. \n", + "5. You need to specify the location on Cloud Object Storage where your pre-processed log data is stored. See below for an example pre-processing query.\n", + "
\n", + "\n", + "For more background information, check out the SQL Query documentation." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import getpass\n", + "import sqlparse\n", + "from pygments import highlight\n", + "from pygments.lexers import get_lexer_by_name\n", + "from pygments.formatters import HtmlFormatter, Terminal256Formatter\n", + "from IPython.core.display import display, HTML\n", + "\n", + "apikey=getpass.getpass('Enter IBM Cloud API Key (leave empty to use previous one): ') or apikey\n", + "instancecrn=input('Enter SQL Query Instance CRN (leave empty to use previous one): ') or instancecrn\n", + "\n", + "sqlClient = ibmcloudsql.SQLQuery(apikey, instancecrn, client_info='SQL Query Starter Notebook')\n", + "sqlClient.logon()\n", + "print('\\nYour SQL Query web console link:')\n", + "sqlClient.sql_ui_link()\n", + "print('\\n')\n", + "\n", + "# Specify where to write write query results:\n", + "if targetUrl == '':\n", + " targetUrl=input('Enter target URL for SQL results (like cos://us-south/resultBucket/notebookResults): ')\n", + "else:\n", + " targetUrl=input('Enter target URL for SQL results (leave empty to use ' + targetUrl + '): ') or targetUrl\n", + "\n", + "# Specify where the location of logdna dumps:\n", + "if logDNADump == '':\n", + " logDNADump=input('Enter URL for LogDNA archive data (like cos://us-south/archiveBucket): ')\n", + "else:\n", + " logDNADump=input('Enter URL for LogDNA archive data (leave empty to use ' + logDNADump + '): ') or logDNADump\n", + "\n", + "# Specify where to find the preprocessed log data: \n", + "if logData == '':\n", + " logData=input('Enter URL where to store preprocessed log data (like cos://us-south/preprocessedBucket): ')\n", + "else:\n", + " logData = input('\\nEnter URL where to store preprocessed log data (leave empty to use ' + logData + '): ') or logData\n", + "\n", + "# For pretty-printing SQL statements\n", + "def format_sql(sql):\n", + " formatted_sql = sqlparse.format(sql, reindent=True, indent_tabs=True, keyword_case='upper')\n", + " lexer = get_lexer_by_name(\"sql\", stripall=True)\n", + " formatter = Terminal256Formatter(style='tango')\n", + " return (highlight(formatted_sql, lexer, formatter))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 3. Prepare LogDNA dumps for analysis\n", + "You first have to configure LogDNA to archive the log files to IBM Cloud Object Store. Once archiving is configured for your account, your logs will be exported on a daily or hourly basis in a compressed format (.json.gz). \n", + "\n", + "**Note: The following just demonstrates how the preparation of log files would look like sql-wise. You probably want to run this regularly as an automated task on your LogDNA dumps to have the data prepared for log analysis.**\n", + "\n", + "It is possible to query json.gz with SQL Query directly. However, we would not recommend this format as it limits Spark's possibility to process the file in parallel. See How to Layout Big Data in IBM Cloud Object Storage for Spark SQL for details. \n", + "\n", + "To improve query performance the data can be partitioned in more manageable junks, for example by hour or application. This allows Spark to skip whole partitions if the respective data is not queried." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sqlClient.logon()\n", + "\n", + "sql = \"SELECT *, \" + \\\n", + " \"date_format(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss'), 'yyyy') AS _year, \" + \\\n", + " \"date_format(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss'), 'D') AS _dayofyear, \" + \\\n", + " \"date_format(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss'), 'HH') AS _hour \" + \\\n", + " \"FROM {} STORED AS JSON \" + \\\n", + " \"INTO {} STORED AS JSON PARTITIONED BY (_year, _dayofyear, _hour)\"\n", + "sql = sql.format(logDNADump, logData)\n", + "print(format_sql(sql))\n", + "\n", + "jobId = sqlClient.submit_sql(sql)\n", + "print(\"SQL query submitted and running in the background. Could take some time depending on the size of your archive data. jobId = \" + jobId)\n", + "job_status = sqlClient.wait_for_job(jobId)\n", + "\n", + "print(\"\\nJob \" + jobId + \" finished with status: \" + job_status)\n", + "if job_status == 'failed':\n", + " details = sqlClient.get_job(jobId)\n", + " print(\"\\nError: {}\\nError Message: {}\".format(details['error'], details['error_message']))\n", + "print(\"\\nResult stored in: \" + sqlClient.get_job(jobId)['resultset_location'])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 4. How to work with SQL query compared to LogDNA UI\n", + "Now your preprocessed log data is ready for analysis. But before we get started, we give a quick overview on how to work with SQL Query compared to the LogDNA UI to perform tasks that you need for log analysis like filtering or searching log records. \n", + "\n", + "#### 4.1 Filtering by sources, apps, levels or tags:\n", + " - **LogDNA**: \n", + " Use the respective drop-down to select one or more sources, apps, log levels or tags.\n", + "

\n", + " - **SQL Query**: \n", + " Filter on the respective column. The log record's source, app, log level or tags (if defined) are stored along with the log record also for archived logs.\n", + " - for source:\n", + "```sql \n", + "WHERE _source._host IN(\"source1\", \"source2\")\n", + "```\n", + " - for app:\n", + "```sql \n", + "WHERE _source._app IN(\"app1\", \"app2\")\n", + "```\n", + " - for level: \n", + "```sql \n", + "WHERE _source.level IN(\"level1\", \"level2\")\n", + "```\n", + " - for tags: You have to explode the tags array before you can query it.\n", + "```sql \n", + "WHERE array_contains(_source._tag, \"someTag\")\n", + "```\n", + "\n", + "#### 4.2 Field search on parsed fields:\n", + " - **LogDNA**:\n", + " In the search input field you enter your search term: \n", + " - for prefix match on the parsed field (case insensitive): `:` \n", + " - for term match(case insensitive): `:==`\n", + " - for prefix match on the parsed field (case sensitive): `:=`\n", + " - for term match(case sensitive): `:===`\n", + " - for list of prefix matches (case insensitive): `:[value1, value2]`\n", + " - find log records that contain the parsed field, the value doesn't matter: `:*`\n", + "

\n", + " - **SQL Query**:\n", + " When archiving the log records all parsed fields are preserved in the log records, so it is possible to query them with SQL Query. You don't have the full-text search capability of LogDNA, however you can work with the LIKE operator and wildcards on specific fields.\n", + " - for prefix match on the parsed field (case sensitive):\n", + "```sql \n", + "WHERE _source.nameOfParsedField LIKE \"SomeValue%\"\n", + "```\n", + " - for term match on the parsed field\n", + "```sql \n", + "WHERE _source.nameOfParsedField LIKE \"%SomeValue%\"\n", + "```\n", + "If in addition case insensitive search is required, the field can be converted to lower case with LOWER(). Make sure to use a lower case search pattern in this case, for example:\n", + "```sql \n", + "WHERE LOWER(_source.nameOfParsedField) LIKE \"%somevalue%\"\n", + "```\n", + " - for list of prefix matches:\n", + "```sql \n", + "WHERE _source.nameOfParsedField LIKE \"value1%\" OR _source.nameOfParsedField LIKE \"value2%\"\n", + "```\n", + " - find log records that contain the parsed field, the value doesn't matter:\n", + "```sql \n", + "WHERE _source.nameOfParsedField IS NOT NULL\n", + "``` \n", + "\n", + "#### 4.3 Jump to timeframe: \n", + " - **LogDNA**: In the Jump to timeframe box you can enter the desired day and time, e.g. today at 11am or a timeframe, e.g. last fri 4:30p to 11/12 1 AM. \n", + " - **SQL Query**: The LogDNA archived logs contain the field _source._ts which contains a timestamp in milli-second granularity in UTC. Find an example below on how to convert a local datetime string into timestamp that you can use in your query.\n", + " \n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "WHERE _source._ts = 1566071695000.0 AND _year = '2019' AND _dayofyear = '229' AND _hour = 19\n" + ] + } + ], + "source": [ + "from datetime import datetime, timezone, timedelta\n", + "from dateutil import tz\n", + "\n", + "# For example, we want to convert this date and time string\n", + "jumpToTime = \"2019-8-17 21:54:55\"\n", + "\n", + "# Adjust for your timezone\n", + "#input_timezone = tz.gettz('America/New_York')\n", + "input_timezone = tz.gettz('Europe/Berlin')\n", + "\n", + "\n", + "jumpToUTC = datetime.strptime(jumpToTime, '%Y-%m-%d %H:%M:%S')\\\n", + " .replace(tzinfo=input_timezone) \\\n", + " .astimezone(tz=timezone.utc) \n", + "year = datetime.strftime(jumpToUTC, '%Y')\n", + "dayofyear = datetime.strftime(jumpToUTC, '%j')\n", + "hour = datetime.strftime(jumpToUTC, '%H')\n", + "\n", + "jumpToTimestamp = jumpToUTC.timestamp() * 1000\n", + "\n", + "print(\"WHERE _source._ts = {} AND _year = '{}' AND _dayofyear = '{}' AND _hour = {}\".format(jumpToTimestamp, year, dayofyear, hour))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 4.4 Retrieving schema information: \n", + "\n", + "##### 1. Get a list of sources and application names:\n", + "LogDNA UI lists the application and sources, that you log data for, in the respective drop-down lists. 
\n", + "With SQL Query this information can be retrieved as follows:" + ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;24;01mSELECT\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_app\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mapplication\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_host\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;24;01mSOURCE\u001b[39;00m\n", + "\u001b[38;5;24;01mFROM\u001b[39;00m \u001b[38;5;0mcos\u001b[39m\u001b[38;5;0;01m:\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mus\u001b[39m\u001b[38;5;166;01m-\u001b[39;00m\u001b[38;5;0mgeo\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;24;01msql\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mLogDNA\u001b[39m \u001b[38;5;0mSTORED\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mJSON\u001b[39m \u001b[38;5;24;01mGROUP\u001b[39;00m \u001b[38;5;24;01mBY\u001b[39;00m \u001b[38;5;0mapplication\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;24;01msource\u001b[39;00m \u001b[38;5;24;01mORDER\u001b[39;00m \u001b[38;5;24;01mBY\u001b[39;00m \u001b[38;5;0mapplication\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;24;01msource\u001b[39;00m \u001b[38;5;24;01mINTO\u001b[39;00m \u001b[38;5;0mcos\u001b[39m\u001b[38;5;0;01m:\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mus\u001b[39m\u001b[38;5;166;01m-\u001b[39;00m\u001b[38;5;0msouth\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mexpire\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;24;01mresult\u001b[39;00m \u001b[38;5;0mSTORED\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mCSV\u001b[39m\n", + "\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
applicationsource
0nginx-ingresspublic-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1...
1nginx-ingresspublic-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1...
2nginx-ingresspublic-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1...
3nginx-ingresspublic-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1...
\n", + "
" + ], + "text/plain": [ + " application source\n", + "0 nginx-ingress public-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1...\n", + "1 nginx-ingress public-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1...\n", + "2 nginx-ingress public-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1...\n", + "3 nginx-ingress public-cr5ff87ab416044e60b8f8b19c1da44ccc-alb1..." + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "sqlClient.logon()\n", + "\n", + "sql = \"SELECT _source._app AS application , _source._host AS source \" + \\\n", + " \"FROM {} STORED AS JSON \" + \\\n", + " \"GROUP BY application, source \" + \\\n", + " \"ORDER BY application, source \" + \\\n", + " \"INTO {} STORED AS CSV\"\n", + "\n", + "sql = sql.format(logData, targetUrl)\n", + "print(format_sql(sql))\n", + "\n", + "result_df = sqlClient.run_sql(sql)\n", + "\n", + "if isinstance(result_df, str):\n", + " print(result_df)\n", + " \n", + "result_df.head(4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "##### 2. Retrieve schema information for a specific application:\n", + "Schema information is needed to know which fields are available for querying. You can look at the LogDNA UI and expand the respective log line, there you find the field names.
\n", + "As a general rule:\n", + "* fields generated by LogDNA start with a underscore in the archived log records, e.g. \\_source, \\_app\n", + "* fields parsed from log records are passed along as is to the archived logs, e.g. level \n", + "
\n", + "\n", + "With SQL Query schema information can be retrieved by using DESCRIBE. Running DESCRIBE on the whole log file results in a schema that gives you the fields for all applications. To get a more manageable schema file, first retrieve the logs for the application you are interested in and then run DESCRIBE on it. This returns a JSON object. Then you can switch to the SQL Query UI to view its content. Alternatively you can run FLATTEN before DESCRIBE to flatten the nested JSON and then retrieve the result to view in the notebook. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "sqlClient.logon()\n", + "\n", + "sql = \"SELECT * \" + \\\n", + " \"FROM {} STORED AS JSON \" + \\\n", + " \"WHERE _source._app = 'nginx-ingress' AND _source._status IS NOT NULL \" + \\\n", + " \"INTO {} STORED AS JSON\"\n", + "\n", + "sql = sql.format(logData, targetUrl)\n", + "print(format_sql(sql))\n", + "\n", + "jobid_ingress = sqlClient.submit_sql(sql)\n", + "job_status = sqlClient.wait_for_job(jobid_ingress)\n", + "if job_status == 'failed':\n", + " details = sqlClient.get_job(jobid_ingress)\n", + " print(\"Error: {}\\nError Message: {}\".format(details['error'], details['error_message']))\n", + "print(\"jobId: \" + jobid_ingress + \"\\n\")\n", + "\n", + "details = sqlClient.get_job(jobid_ingress)\n", + "ingress_records = details['resultset_location']\n", + "sql = \"SELECT * FROM DESCRIBE(FLATTEN({} STORED AS JSON)) \" + \\\n", + " \"INTO {} STORED AS JSON\"\n", + "\n", + "sql = sql.format(ingress_records, targetUrl)\n", + "print(format_sql(sql))\n", + "jobid_schema = sqlClient.submit_sql(sql)\n", + "job_status = sqlClient.wait_for_job(jobid_schema)\n", + "if job_status == 'failed':\n", + " details = sqlClient.get_job(jobid_schema)\n", + " print(\"Error: {}\\nError Message: {}\".format(details['error'], details['error_message']))\n", + "\n", + "print(\"jobId: \" + jobid_schema)\n", + "\n", + "print('\\nUse SQL Query UI to view the results')\n", + "sqlClient.sql_ui_link()" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\n", + "Schema information:\n" + ] + }, + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
namenullabletype
0_dayofyearTruelong
1_hourTruelong
2_yearTruelong
3jobidTruestring
4_source__appTruestring
5_source__fileTruestring
6_source__hostTruestring
7_source__lidTruestring
8_source__logtypeTruestring
9_source__statusTruestring
10_source__tsTruestring
11_source_containerTruestring
12_source_containeridTruestring
13_source_hostTruestring
14_source_namespaceTruestring
15_source_nodeTruestring
16_source_request_idTruestring
17_source_request_methodTruestring
18_source_request_timeTruestring
19_source_request_uriTruestring
20_source_time_dateTruestring
21_source_upstream_connect_timeTruestring
22_source_upstream_header_timeTruestring
23_source_upstream_response_timeTruestring
24_source_upstream_statusTruestring
\n", + "
" + ], + "text/plain": [ + " name nullable type\n", + "0 _dayofyear True long\n", + "1 _hour True long\n", + "2 _year True long\n", + "3 jobid True string\n", + "4 _source__app True string\n", + "5 _source__file True string\n", + "6 _source__host True string\n", + "7 _source__lid True string\n", + "8 _source__logtype True string\n", + "9 _source__status True string\n", + "10 _source__ts True string\n", + "11 _source_container True string\n", + "12 _source_containerid True string\n", + "13 _source_host True string\n", + "14 _source_namespace True string\n", + "15 _source_node True string\n", + "16 _source_request_id True string\n", + "17 _source_request_method True string\n", + "18 _source_request_time True string\n", + "19 _source_request_uri True string\n", + "20 _source_time_date True string\n", + "21 _source_upstream_connect_time True string\n", + "22 _source_upstream_header_time True string\n", + "23 _source_upstream_response_time True string\n", + "24 _source_upstream_status True string" + ] + }, + "execution_count": 6, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "print('\\nSchema information:')\n", + "result_schema = sqlClient.get_result(jobid_schema)\n", + "sqlClient.get_result(jobid_schema)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 5 Use SQL query for log analysis\n", + "In the following we will use SQL Query to analyze the ingress log records of a test system's kubernetes. First let's retrieve the request counts for each worker node along with the average request time and http status codes.
\n", + "\n", + "hint: \n", + "- In the SQL Query UI you get a SQL Editor with syntax highlighting and validation, so it is probably most effective to use the UI to develop a SQL and then copy it over to the notebook.\n", + "- You may submit SQL Query jobs to run in the background and then retrieve the results when the job is done. This allows you to continue the work on your notebook while the query is running.\n", + "- Storing the job ID in your notebook allows you to retrieve the persisted results without having to rerun the query. " + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "230\n" + ] + } + ], + "source": [ + "# Get day of year from date string\n", + "from datetime import datetime\n", + "dayofyear = datetime.strptime(\"2019-08-18\", '%Y-%m-%d').timetuple().tm_yday\n", + "print(dayofyear)" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;24;01mSELECT\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mstatus\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;24;01mCOUNT\u001b[39;00m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mstatus_count\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mnode\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mnode\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;24;01mAVG\u001b[39;00m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mrequest_time\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mrequest_time\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;0mconcat\u001b[39m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0mdate_format\u001b[39m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0mfrom_unixtime\u001b[39m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_ts\u001b[39m \u001b[38;5;166;01m/\u001b[39;00m \u001b[38;5;20;01m1000\u001b[39;00m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m'yyyy-MM-dd HH:mm:ss'\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m'yyyy-MM-dd'\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m' '\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_hour\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m':00:00'\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0m_datetime\u001b[39m\n", + "\u001b[38;5;24;01mFROM\u001b[39;00m \u001b[38;5;0mcos\u001b[39m\u001b[38;5;0;01m:\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mus\u001b[39m\u001b[38;5;166;01m-\u001b[39;00m\u001b[38;5;0mgeo\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;24;01msql\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mLogDNA\u001b[39m \u001b[38;5;0mSTORED\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mJSON\u001b[39m \u001b[38;5;24;01mWHERE\u001b[39;00m 
\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_app\u001b[39m \u001b[38;5;166;01m=\u001b[39;00m \u001b[38;5;64m'nginx-ingress'\u001b[39m \u001b[38;5;24;01mAND\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m \u001b[38;5;24;01mIS\u001b[39;00m \u001b[38;5;24;01mNOT\u001b[39;00m \u001b[38;5;24;01mNULL\u001b[39;00m \u001b[38;5;24;01mAND\u001b[39;00m \u001b[38;5;0m_year\u001b[39m \u001b[38;5;166;01m=\u001b[39;00m \u001b[38;5;20;01m2019\u001b[39;00m \u001b[38;5;24;01mAND\u001b[39;00m \u001b[38;5;0m_dayofyear\u001b[39m \u001b[38;5;24;01mBETWEEN\u001b[39;00m \u001b[38;5;20;01m223\u001b[39;00m \u001b[38;5;24;01mand\u001b[39;00m \u001b[38;5;20;01m230\u001b[39;00m \u001b[38;5;24;01mGROUP\u001b[39;00m \u001b[38;5;24;01mBY\u001b[39;00m \u001b[38;5;0m_year\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_dayofyear\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_hour\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_datetime\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mnode\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m \u001b[38;5;24;01mINTO\u001b[39;00m \u001b[38;5;0mcos\u001b[39m\u001b[38;5;0;01m:\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mus\u001b[39m\u001b[38;5;166;01m-\u001b[39;00m\u001b[38;5;0msouth\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mexpire\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;24;01mresult\u001b[39;00m \u001b[38;5;0mSTORED\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mCSV\u001b[39m\n", + "\n", + "jobId: c4a03654-e46e-4de1-814c-dfb411155063\n" + ] + } + ], + "source": [ + "sqlClient.logon()\n", + "# we switch now to sql query sample preprocessed data for the following queries\n", + "logData = 'cos://us-geo/sql/LogDNA'\n", + "\n", + "# targetUrl = \n", + "sql = \"SELECT _source._status AS status, COUNT(_source._status) AS status_count, _source.node AS node, \" + \\\n", + " \"AVG(_source.request_time) AS request_time, \" + \\\n", + " \"concat(date_format(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' ', _hour, ':00:00') AS _datetime \" + \\\n", + " \"FROM {} STORED AS JSON \" + \\\n", + " \"WHERE _source._app = 'nginx-ingress' AND _source._status IS NOT NULL AND _year = 2019 AND _dayofyear BETWEEN 223 and 230 \" + \\\n", + " \"GROUP BY _year, _dayofyear, _hour, _datetime, _source.node, _source._status \" + \\\n", + " \"INTO {} STORED AS CSV \"\n", + "\n", + "sql = sql.format(logData, targetUrl)\n", + "print(format_sql(sql))\n", + "\n", + "jobid = sqlClient.submit_sql(sql)\n", + "print(\"jobId: \" + jobid)" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [], + "source": [ + "sqlClient.logon()\n", + "job_status = sqlClient.wait_for_job(jobid)\n", + "if job_status == 'failed':\n", + " details = sqlClient.get_job(jobid)\n", + " print(\"Error: {}\\nError Message: {}\".format(details['error'], details['error_message']))\n", + "\n", + "result_df = sqlClient.get_result(jobid)\n", + "\n", + "# Adjust datatypes for Pixiedust\n", + "result_df['datetime'] = pd.to_datetime(result_df['_datetime'])\n", + "result_df['status'] = result_df['status'].apply(str)\n", + "#print(result_df.head())\n", + "#print(result_df.info())" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + 
"source": [ + "Now let's plot the data ..." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": { + "chartsize": "88", + "clusterby": "node", + "dynamicfilter": "datetime", + "handlerId": "lineChart", + "keyFields": "datetime", + "legend": "true", + "lineChartType": "grouped", + "logx": "false", + "logy": "false", + "mpld3": "false", + "rendererId": "matplotlib", + "rowCount": "240", + "title": "Hourly traffic by worker node", + "valueFields": "status_count" + } + }, + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Hourly traffic by worker node' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "from pixiedust.display import *\n", + "display(result_df)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Ok, that's too many data points. Let's resample the data. I chose 2 data points a day, but you can easily adjust the freq parameter." + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": { + "pixiedust": { + "displayParams": { + "chartsize": "72", + "charttype": "stacked", + "clusterby": "node", + "filter": "{}", + "handlerId": "lineChart", + "keyFields": "datetime", + "lineChartType": "grouped", + "logy": "false", + "no_margin": "true", + "rendererId": "matplotlib", + "title": "Daily traffic by worker node", + "valueFields": "status_count" + } + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Daily traffic by worker node' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "summary = result_df.groupby([pd.Grouper(key='datetime', freq='12H'), 'node', 'status']) \\\n", + " .agg({'status_count':'sum', 'request_time': 'mean'}) \n", + "summary.reset_index(inplace=True)\n", + "display(summary)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now, let's look at the HTTP status codes. Here we are probably most interested in failed requests." + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": { + "pixiedust": { + "displayParams": { + "charttype": "stacked", + "clusterby": "status", + "handlerId": "barChart", + "keyFields": "datetime", + "rendererId": "bokeh", + "valueFields": "status_count" + } + }, + "scrolled": true + }, + "outputs": [], + "source": [ + "daily_summary = result_df.groupby([pd.Grouper(key='datetime', freq='D'), 'node', 'status']) \\\n", + " .agg({'status_count':'sum', 'request_time': 'mean'}) \n", + "daily_summary.reset_index(inplace=True)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": { + "chartsize": "73", + "charttype": "stacked", + "clusterby": "status", + "filter": "{\"field\": \"status\", \"constraint\": \"None\", \"value\": \"^4|^5\", \"case_matter\": \"false\", \"regex\": \"false\"}", + "handlerId": "barChart", + "keyFields": "datetime", + "no_margin": "true", + "rendererId": "bokeh", + "title": "Number of Client or Server Errors per day", + "valueFields": "status_count" + } + }, + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Number of Client or Server Errors per day' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "display(daily_summary)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "On 2019-08-17 we see a lot of server errors (500). To find out more about the failing requests, we retrieve the request URI info. For that we need to make some adjustments to the request URIs to not get flooded with groups, e.g. not found requests can have all sorts of request URIs, so map all of them to INVALID. In our data we use unique jobIds, CRNs or instance IDs as part of our request URIs to request information on these specific artifacts. However, for now we are only interested in the general request type, so we map these as well to a common URI. See the query below." + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[38;5;24;01mSELECT\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mstatus\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;24;01mCOUNT\u001b[39;00m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mstatus_count\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mnode\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mnode\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;24;01mCASE\u001b[39;00m\n", + "\t\t\t\t\t\u001b[38;5;24;01mWHEN\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mrequest_uri\u001b[39m \u001b[38;5;0mRLIKE\u001b[39m \u001b[38;5;64m'/v2/sql_jobs/[^\\S]+'\u001b[39m \u001b[38;5;24;01mTHEN\u001b[39;00m \u001b[38;5;64m'/v2/sql_jobs/jobid'\u001b[39m\n", + "\t\t\t\t\t\u001b[38;5;24;01mWHEN\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mrequest_uri\u001b[39m \u001b[38;5;0mRLIKE\u001b[39m \u001b[38;5;64m'/active_instance/[^\\S]+'\u001b[39m \u001b[38;5;24;01mTHEN\u001b[39;00m \u001b[38;5;64m'/active_instance/instanceid'\u001b[39m\n", + "\t\t\t\t\t\u001b[38;5;24;01mWHEN\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mrequest_uri\u001b[39m \u001b[38;5;0mRLIKE\u001b[39m \u001b[38;5;64m'/v2/service_instances/[^\\S]+'\u001b[39m \u001b[38;5;24;01mTHEN\u001b[39;00m \u001b[38;5;64m'/v2/service_instances/crn'\u001b[39m\n", + "\t\t\t\t\t\u001b[38;5;24;01mWHEN\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mrequest_uri\u001b[39m \u001b[38;5;0mRLIKE\u001b[39m \u001b[38;5;64m'/dashboard/[^\\S]+'\u001b[39m \u001b[38;5;24;01mTHEN\u001b[39;00m \u001b[38;5;64m'/dashboard/id'\u001b[39m\n", + "\t\t\t\t\t\u001b[38;5;24;01mWHEN\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;24;01mhost\u001b[39;00m \u001b[38;5;166;01m=\u001b[39;00m \u001b[38;5;64m'pact.ys1-dev-sql-query.us-south.containers.appdomain.cloud'\u001b[39m \u001b[38;5;24;01mTHEN\u001b[39;00m \u001b[38;5;64m'PACT'\u001b[39m\n", + "\t\t\t\t\t\u001b[38;5;24;01mWHEN\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m \u001b[38;5;166;01m=\u001b[39;00m 
\u001b[38;5;20;01m404\u001b[39;00m \u001b[38;5;24;01mTHEN\u001b[39;00m \u001b[38;5;64m'INVALID'\u001b[39m\n", + "\t\t\t\t\t\u001b[38;5;24;01mELSE\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mrequest_uri\u001b[39m\n", + "\t\u001b[38;5;24;01mEND\u001b[39;00m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mrequest_uri\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;24;01mAVG\u001b[39;00m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mrequest_time\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mrequest_time\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m\n", + "\t\u001b[38;5;0mconcat\u001b[39m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0mdate_format\u001b[39m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0mfrom_unixtime\u001b[39m\u001b[38;5;0;01m(\u001b[39;00m\u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_ts\u001b[39m \u001b[38;5;166;01m/\u001b[39;00m \u001b[38;5;20;01m1000\u001b[39;00m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m'yyyy-MM-dd HH:mm:ss'\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m'yyyy-MM-dd'\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m' '\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_hour\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;64m':00:00'\u001b[39m\u001b[38;5;0;01m)\u001b[39;00m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0m_datetime\u001b[39m\n", + "\u001b[38;5;24;01mFROM\u001b[39;00m \u001b[38;5;0mcos\u001b[39m\u001b[38;5;0;01m:\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mus\u001b[39m\u001b[38;5;166;01m-\u001b[39;00m\u001b[38;5;0mgeo\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;24;01msql\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mLogDNA\u001b[39m \u001b[38;5;0mSTORED\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mJSON\u001b[39m \u001b[38;5;24;01mWHERE\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_app\u001b[39m \u001b[38;5;166;01m=\u001b[39;00m \u001b[38;5;64m'nginx-ingress'\u001b[39m \u001b[38;5;24;01mAND\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m \u001b[38;5;24;01mIS\u001b[39;00m \u001b[38;5;24;01mNOT\u001b[39;00m \u001b[38;5;24;01mNULL\u001b[39;00m \u001b[38;5;24;01mAND\u001b[39;00m \u001b[38;5;0m_dayofyear\u001b[39m \u001b[38;5;166;01m=\u001b[39;00m \u001b[38;5;20;01m229\u001b[39;00m \u001b[38;5;24;01mGROUP\u001b[39;00m \u001b[38;5;24;01mBY\u001b[39;00m \u001b[38;5;0m_year\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_dayofyear\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_hour\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_datetime\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0mnode\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0m_source\u001b[39m\u001b[38;5;0;01m.\u001b[39;00m\u001b[38;5;0m_status\u001b[39m\u001b[38;5;0;01m,\u001b[39;00m \u001b[38;5;0mrequest_uri\u001b[39m \u001b[38;5;24;01mINTO\u001b[39;00m 
\u001b[38;5;0mcos\u001b[39m\u001b[38;5;0;01m:\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mus\u001b[39m\u001b[38;5;166;01m-\u001b[39;00m\u001b[38;5;0msouth\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;0mexpire\u001b[39m\u001b[38;5;166;01m/\u001b[39;00m\u001b[38;5;24;01mresult\u001b[39;00m \u001b[38;5;0mSTORED\u001b[39m \u001b[38;5;24;01mAS\u001b[39;00m \u001b[38;5;0mCSV\u001b[39m\n", + "\n", + "jobId: 02436aa3-0342-485b-9c33-30bba1ebfa91\n" + ] + } + ], + "source": [ + "sqlClient.logon()\n", + "# logData =\n", + "# targetUrl = \n", + "sql = \"SELECT _source._status AS status, \" + \\\n", + " \"COUNT(_source._status) AS status_count, \" + \\\n", + " \"_source.node AS node, \" + \\\n", + " \"CASE \" + \\\n", + " \"WHEN _source.request_uri RLIKE '/v2/sql_jobs/[^\\S]+' THEN '/v2/sql_jobs/jobid' \" + \\\n", + " \"WHEN _source.request_uri RLIKE '/active_instance/[^\\S]+' THEN '/active_instance/instanceid' \" + \\\n", + " \"WHEN _source.request_uri RLIKE '/v2/service_instances/[^\\S]+' THEN '/v2/service_instances/crn' \" + \\\n", + " \"WHEN _source.request_uri RLIKE '/dashboard/[^\\S]+' THEN '/dashboard/id' \" + \\\n", + " \"WHEN _source.host = 'pact.ys1-dev-sql-query.us-south.containers.appdomain.cloud' THEN 'PACT' \" + \\\n", + " \"WHEN _source._status = 404 THEN 'INVALID' \" + \\\n", + " \"ELSE _source.request_uri \" + \\\n", + " \"END AS request_uri, \" + \\\n", + " \"AVG(_source.request_time) AS request_time, \" + \\\n", + " \"concat(date_format(from_unixtime(_source._ts / 1000, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' ', _hour, ':00:00') AS _datetime \" + \\\n", + " \"FROM {} STORED AS JSON \" + \\\n", + " \"WHERE _source._app = 'nginx-ingress' AND _source._status IS NOT NULL AND _dayofyear = 229 \" + \\\n", + " \"GROUP BY _year, _dayofyear, _hour, _datetime, _source.node, _source._status, request_uri \" + \\\n", + " \"INTO {} STORED AS CSV\"\n", + "\n", + "sql = sql.format(logData, targetUrl)\n", + "print(format_sql(sql))\n", + "jobid = sqlClient.submit_sql(sql)\n", + "print(\"jobId: \" + jobid)" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [], + "source": [ + "sqlClient.logon()\n", + "job_status = sqlClient.wait_for_job(jobid)\n", + "if job_status == 'failed':\n", + " details = sqlClient.get_job(jobid)\n", + " print(\"Error: {}\\nError Message: {}\".format(details['error'], details['error_message']))\n", + "resultRange = sqlClient.get_result(jobid)\n", + "\n", + "# Adjust datatypes for Pixiedust\n", + "resultRange['datetime'] = pd.to_datetime(resultRange['_datetime'])\n", + "resultRange['status'] = resultRange['status'].apply(str)\n", + "#print(resultRange.head())\n", + "#print(resultRange.info())" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": { + "aggregation": "SUM", + "charttype": "stacked", + "clusterby": "status", + "dynamicfilter": "status", + "handlerId": "barChart", + "keyFields": "datetime", + "legend": "true", + "rendererId": "bokeh", + "timeseries": "false", + "title": "Number of HTTP status codes per hour that we see on 2019-08-17 ", + "valueFields": "status_count" + } + }, + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Number of HTTP status codes per hour that we see on 2019-08-17' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(resultRange)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So we started to see many 500s at 10am and kept getting them until 9pm. Now we also want to know which request types encountered these 500 errors. " + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": { + "aggregation": "SUM", + "chartsize": "68", + "charttype": "grouped", + "clusterby": "request_uri", + "filter": "{\"field\": \"status\", \"constraint\": \"None\", \"value\": \"500\", \"case_matter\": \"false\", \"regex\": \"false\"}", + "handlerId": "barChart", + "keyFields": "status", + "legend": "true", + "mpld3": "false", + "no_margin": "true", + "rendererId": "bokeh", + "title": "Number of server errors by request URI", + "valueFields": "status_count" + } + }, + "scrolled": true + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Number of server errors by request URI' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "display(resultRange)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + " So much for that. Now we have a look at the request times." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": { + "aggregation": "AVG", + "chartsize": "61", + "charttype": "grouped", + "clusterby": "node", + "handlerId": "barChart", + "keyFields": "datetime", + "title": "Average request times by worker node", + "valueFields": "request_time" + } + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Average request times by worker node' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "display(daily_summary)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "These are pretty high response times. However, we retrieved average request times and averages are affected by outliers, especially when probably most response times are sub-second and then you have some that take minutes. So let's just look at the long-running ones." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": { + "aggregation": "SUM", + "chartsize": "74", + "charttype": "subplots", + "clusterby": "status", + "filter": "{\"field\": \"request_time\", \"constraint\": \"greater_than\", \"value\": \"2.5\", \"case_matter\": \"False\", \"regex\": \"False\"}", + "handlerId": "barChart", + "keyFields": "datetime", + "no_margin": "true", + "title": "Number of requests per day and worker node (only long-running requests > 2.5s)", + "valueFields": "status_count" + } + }, + "scrolled": false + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Number of requests per day and worker node (only long-running requests > 2.5s)' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "display(daily_summary)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "So we have about 20 to 120 long-running requests every day returning 101. So for now we take out these requests, to see the average request times for the majority of the requests on the system. And that looks much better, now we see the expected sub-second response times:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "pixiedust": { + "displayParams": { + "aggregation": "AVG", + "clusterby": "node", + "filter": "{\"field\": \"request_time\", \"constraint\": \"less_than\", \"value\": \"2.5\", \"case_matter\": \"False\", \"regex\": \"False\"}", + "handlerId": "barChart", + "keyFields": "datetime", + "no_margin": "true", + "title": "Average request times by day and worker node (without the long-running requests)", + "valueFields": "request_time" + } + } + }, + "outputs": [ + { + "data": { + "text/html": [ + "
PixieDust chart: 'Average request times by day and worker node (without the long-running requests)' (interactive chart output not rendered in this static view)
" + ], + "text/plain": [ + "" + ] + }, + "metadata": { + "pixieapp_metadata": null + }, + "output_type": "display_data" + } + ], + "source": [ + "display(daily_summary)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "Copyright © IBM Corp. 2019. This notebook and its source code are released under the terms of the MIT License." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.6", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.8" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +}