Install Apache Spark on Ubuntu 22.04

Install Java

update packages

$ sudo apt update

Install the Java JDK (OpenJDK)

$ sudo apt install default-jdk

Verify the installation on Ubuntu 22.04.2

$ java --version
openjdk 11.0.19 2023-04-18
OpenJDK Runtime Environment (build 11.0.19+7-post-Ubuntu-0ubuntu122.04.1)
OpenJDK 64-Bit Server VM (build 11.0.19+7-post-Ubuntu-0ubuntu122.04.1, mixed mode, sharing)

Install Apache Spark

Install the curl, mlocate, git, and scala packages

$ sudo apt install curl mlocate git scala 

Download Apache Spark from the Download Apache Spark™ page

$ wget https://dlcdn.apache.org/spark/spark-3.3.2/spark-3.3.2-bin-hadoop3.tgz

Extract the archive

$ tar xvf spark-3.3.2-bin-hadoop3.tgz

Move the extracted directory to /opt/spark

$ sudo mv spark-3.3.2-bin-hadoop3/ /opt/spark 

Set Spark environment

$ nano ~/.bashrc

Append the following lines at the end of .bashrc

export SPARK_HOME=/opt/spark

export PATH=$PATH:$SPARK_HOME/bin:$SPARK_HOME/sbin

export SPARK_LOCAL_IP=localhost

export PYSPARK_PYTHON=/usr/bin/python3

export PYTHONPATH=$(ZIPS=("$SPARK_HOME"/python/lib/*.zip); IFS=:; echo "${ZIPS[*]}"):$PYTHONPATH

$ source ~/.bashrc

Run the Spark shell

$ spark-shell

Run PySpark

$ pyspark
Python 3.10.6 (main, Nov 14 2022, 16:10:14) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/07/28 23:41:19 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 3.3.2
      /_/

Using Python version 3.10.6 (main, Nov 14 2022 16:10:14)
Spark context Web UI available at http://localhost:4040
Spark context available as 'sc' (master = local[*], app id = local-1690562479751).
SparkSession available as 'spark'.
>>>
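
A quick sanity check inside the PySpark shell, using the spark session the shell provides (a minimal sketch; the sample data is illustrative):

>>> df = spark.createDataFrame([("foo", 1), ("bar", 2)], ["k", "v"])
>>> df.show()
+---+---+
|  k|  v|
+---+---+
|foo|  1|
|bar|  2|
+---+---+
>>> df.count()
2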

Start a standalone master server

$ start-master.sh
starting org.apache.spark.deploy.master.Master, logging to /opt/spark/logs/spark-jack-org.apache.spark.deploy.master.Master-1-jack22042.out

The process will be listening on TCP port 8080.

$ sudo ss -tunelp | grep 8080
tcp   LISTEN 0      1      [::ffff:127.0.0.1]:8080             *:*    users:(("java",pid=6346,fd=283)) uid:1000 ino:63398 sk:e cgroup:/user.slice/user-1000.slice/session-4.scope v6only:0 <->

The master's Web UI is available at http://localhost:8080

Starting Spark Worker Process

$ start-worker.sh spark://jack22042:7077
starting org.apache.spark.deploy.worker.Worker, logging to /opt/spark/logs/spark-jack-org.apache.spark.deploy.worker.Worker-1-jack22042.out
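
To try the standalone cluster, point a small PySpark script at the master started above and submit it with spark-submit. A minimal sketch; the file name test_standalone.py and the master URL spark://jack22042:7077 are assumptions based on the hostname shown in the logs:

# test_standalone.py
from pyspark.sql import SparkSession

# The master URL below is an assumption; replace it with your own hostname.
spark = (SparkSession.builder
         .master("spark://jack22042:7077")
         .appName("standalone-test")
         .getOrCreate())

spark.range(5).show()   # prints a small DataFrame with ids 0-4

spark.stop()

Submit it to the cluster:

$ spark-submit test_standalone.py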

Shut down the Spark master and worker processes

$ stop-worker.sh
$ stop-master.sh

Install Flask

Flask uses Jinja as its HTML template engine (a small template example appears after the escaping example below).

Install Flask

Create a folder for the project

> mkdir myproject ; cd myproject

Create a requirements.txt file (the --app and --debug options used below require Flask 2.2 or later)

Flask>=2.2

Run the following commands

python -m venv venv
.\venv\scripts\activate
pip install -r requirements.txt

Create a file named app.py (any other name works too, e.g. hello.py, but do not name it flask.py)

from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

If the file is named app.py, run Flask like this

flask run

If the file has a different name, e.g. hello.py, run it like this

flask --app hello run

Run in debug mode, which restarts the server whenever you make changes to the code.

flask run --debug

Open a browser and go to http://127.0.0.1:5000/

HTML Escaping

from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route("/")
def hello_world():
    return "<p>Hello, World!</p>"

@app.route("/<name>")
def hello(name):
    return f"Hello, {escape(name)}!"

Go to http://127.0.0.1:5000/jack

You should see Hello, jack!
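
Since Flask uses Jinja as its HTML template engine, here is a minimal sketch of rendering a variable through a template; render_template_string and the /greet route are illustrative and not part of the original example:

from flask import Flask, render_template_string

app = Flask(__name__)

# render_template_string keeps the sketch self-contained; in a real project the
# template would live in templates/ and be rendered with render_template().
@app.route("/greet/<name>")
def greet(name):
    # Jinja auto-escapes {{ name }}, so no manual escape() call is needed here.
    return render_template_string("<p>Hello, {{ name }}!</p>", name=name)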

Routing

from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route('/')
def index():
    return 'Index Page'

@app.route('/hello')
def hello_world():
    return 'Hello, World'

@app.route("/<name>")
def hello(name):
    return f"Hello, {escape(name)}!"

Variable Rules

from flask import Flask
from markupsafe import escape

app = Flask(__name__)

@app.route('/')
def index():
    return 'Index Page'

@app.route('/hello')
def hello_world():
    return 'Hello, World'

@app.route("/<name>")
def hello(name):
    return f"Hello, {escape(name)}!"

@app.route('/user/<username>')
def show_user_profile(username):
    # show the user profile for that user
    return f'User {escape(username)}'

@app.route('/post/<int:post_id>')
def show_post(post_id):
    # show the post with the given id, the id is an integer
    return f'Post {post_id}'

@app.route('/path/<path:subpath>')
def show_subpath(subpath):
    # show the subpath after /path/
    return f'Subpath {escape(subpath)}'

Converter types:

string  (default) accepts any text without a slash
int     accepts positive integers
float   accepts positive floating point values
path    like string but also accepts slashes
uuid    accepts UUID strings

Apache log4net™ with .NET 6 WinForms

Create a Windows Forms App (WinForms) project named WinFormsApp1

Look at Program.cs

namespace WinFormsApp1
{
    internal static class Program
    {
        /// <summary>
        ///  The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            // To customize application configuration such as set high DPI settings or default font,
            // see https://aka.ms/applicationconfiguration.
            ApplicationConfiguration.Initialize();
            Application.Run(new Form1());
        }
    }
}

Look at Form1.cs

namespace WinFormsApp1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
    }
}

Install the log4net, Microsoft.Extensions.Hosting, and Microsoft.Extensions.Logging.Log4Net.AspNetCore packages

Look at WinFormsApp1.csproj

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>WinExe</OutputType>
    <TargetFramework>net6.0-windows</TargetFramework>
    <Nullable>enable</Nullable>
    <UseWindowsForms>true</UseWindowsForms>
    <ImplicitUsings>enable</ImplicitUsings>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="log4net" Version="2.0.15" />
    <PackageReference Include="Microsoft.Extensions.Hosting" Version="6.0.1" />
    <PackageReference Include="Microsoft.Extensions.Logging.Log4Net.AspNetCore" Version="6.1.0" />
  </ItemGroup>

</Project>

Create a log4net.config file and set Copy to Output Directory to Copy always

<?xml version="1.0" encoding="utf-8" ?>
<log4net>
  <appender name="RollingLogFileAppender" type="log4net.Appender.RollingFileAppender">
    <lockingModel type="log4net.Appender.FileAppender+MinimalLock"/>
    <file value="log/" />
    <datePattern value="yyMMdd'Jack.log'" />
    <staticLogFileName value="false"/>
    <appendToFile value="true"/>
    <rollingStyle value="Date"/>
    <maxSizeRollBackups value="100"/>
    <maximumFileSize value="15MB"/>
    <encoding value="UTF-8"/>
    <layout type="log4net.Layout.PatternLayout">
      <param name="ConversionPattern" value="%-5p%d{ HH:mm:ss} li:%line - [%method] %m  %n" />
    </layout>
  </appender>
  <root>
    <level value="ALL"/>
    <appender-ref ref="RollingLogFileAppender"/>
  </root>
</log4net>

Edit Program.cs

using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Hosting;
using Microsoft.Extensions.Logging;

namespace WinFormsApp1
{
    internal static class Program
    {
        /// <summary>
        ///  The main entry point for the application.
        /// </summary>
        [STAThread]
        static void Main()
        {
            // To customize application configuration such as set high DPI settings or default font,
            // see https://aka.ms/applicationconfiguration.
            ApplicationConfiguration.Initialize();

            // Generate the Host Builder and register the services for DI
            var builder = new HostBuilder()
               .ConfigureServices((hostContext, services) =>
               {
                   //Register all your services here
                   services.AddLogging(configure => configure.AddConsole())
                           .AddScoped<Form1>();
               }).ConfigureLogging(logBuilder =>
               {
                   logBuilder.SetMinimumLevel(LogLevel.Trace);
                   logBuilder.AddLog4Net("log4net.config");
               });

            var host = builder.Build();

            using (var serviceScope = host.Services.CreateScope())
            {
                var services = serviceScope.ServiceProvider;
                try
                {
                    var form1 = services.GetRequiredService<Form1>();
                    Application.Run(form1);

                    Console.WriteLine("Success");
                }
                catch (Exception ex)
                {
                    Console.WriteLine("Error");
                    Console.WriteLine(ex.ToString());
                }
            }
        }
    }
}

Edit Form1.cs

using Microsoft.Extensions.Logging;

namespace WinFormsApp1
{
    public partial class Form1 : Form
    {
        private readonly ILogger _logger;
        public Form1(ILogger<Form1> logger)
        {
            InitializeComponent();
            _logger = logger;

            try
            {
                _logger.LogInformation("Form1 Started");
                MessageBox.Show("Hello World!");
            }
            catch (Exception ex)
            {
                _logger.LogError(ex.Message);
            }
        }
    }
}

Delta Lake Features

1. Create the students table

%sql
CREATE TABLE students (
  id INT, name STRING, value DOUBLE);

View this table's CREATE statement with SHOW CREATE TABLE

%sql
SHOW CREATE TABLE students

-- output:
CREATE TABLE spark_catalog.default.students (
  id INT,
  name STRING,
  value DOUBLE)
USING delta
TBLPROPERTIES (
  'delta.minReaderVersion' = '1',
  'delta.minWriterVersion' = '2')

Using DESCRIBE EXTENDED allows us to see important metadata about our table.

%sql
DESCRIBE EXTENDED students
%python
df1 = spark.sql('DESCRIBE EXTENDED students')
df1.show()
+--------------------+--------------------+-------+
|            col_name|           data_type|comment|
+--------------------+--------------------+-------+
|                  id|                 int|   null|
|                name|              string|   null|
|               value|              double|   null|
|                    |                    |       |
|# Detailed Table ...|                    |       |
|             Catalog|       spark_catalog|       |
|            Database|             default|       |
|               Table|            students|       |
|                Type|             MANAGED|       |
|            Location|dbfs:/user/hive/w...|       |
|            Provider|               delta|       |
|               Owner|                root|       |
| Is_managed_location|                true|       |
|    Table Properties|[delta.minReaderV...|       |
+--------------------+--------------------+-------+

DESCRIBE DETAIL is another command that allows us to explore table metadata.

%sql
DESCRIBE DETAIL students
%python
df2 = spark.sql('DESCRIBE DETAIL students')
df2.show(vertical=True)
-RECORD 0--------------------------------
 format           | delta                
 id               | 46681f33-7201-4c6... 
 name             | spark_catalog.def... 
 description      | null                 
 location         | dbfs:/user/hive/w... 
 createdAt        | 2023-04-17 16:24:... 
 lastModified     | 2023-04-17 16:24:32  
 partitionColumns | []                   
 numFiles         | 0                    
 sizeInBytes      | 0                    
 properties       | {}                   
 minReaderVersion | 1                    
 minWriterVersion | 2        

Look at the Delta Lake files

%python
display(dbutils.fs.ls('dbfs:/user/hive/warehouse/students'))
%python
li_file = dbutils.fs.ls('dbfs:/user/hive/warehouse/students')
df3 = sqlContext.createDataFrame(li_file)
df3.show()
+--------------------+--------------------+----+----------------+
|                path|                name|size|modificationTime|
+--------------------+--------------------+----+----------------+
|dbfs:/user/hive/w...|         _delta_log/|   0|   1681750558289|
+--------------------+--------------------+----+----------------+

Reviewing Delta Lake Transactions

%sql
DESCRIBE HISTORY students
%python
df4 = spark.sql('DESCRIBE HISTORY students')
df4.show(vertical=True)
-RECORD 0-----------------------------------
 version             | 0                    
 timestamp           | 2023-04-17 16:24:32  
 userId              | 8501686721698164     
 userName            | odl_user_915759@d... 
 operation           | CREATE TABLE         
 operationParameters | {isManaged -> tru... 
 job                 | null                 
 notebook            | {1477724271071511}   
 clusterId           | 0415-162149-6ai590aw 
 readVersion         | null                 
 isolationLevel      | WriteSerializable    
 isBlindAppend       | true                 
 operationMetrics    | {}                   
 userMetadata        | null                 
 engineInfo          | Databricks-Runtim... 

2. Insert data three times

%sql
INSERT INTO students VALUES (1, "Yve", 1.0);
INSERT INTO students VALUES (2, "Omar", 2.5);
INSERT INTO students VALUES (3, "Elia", 3.3);
%python
df2 = spark.sql('DESCRIBE DETAIL students')
df2.show(vertical=True)
-RECORD 0--------------------------------
 format           | delta                
 id               | 46681f33-7201-4c6... 
 name             | spark_catalog.def... 
 description      | null                 
 location         | dbfs:/user/hive/w... 
 createdAt        | 2023-04-17 16:24:... 
 lastModified     | 2023-04-17 16:51:01  
 partitionColumns | []                   
 numFiles         | 3                    
 sizeInBytes      | 2613                 
 properties       | {}                   
 minReaderVersion | 1                    
 minWriterVersion | 2  
%python
li_file = dbutils.fs.ls('dbfs:/user/hive/warehouse/students')
df3 = sqlContext.createDataFrame(li_file)
df3.show()
+--------------------+--------------------+----+----------------+
|                path|                name|size|modificationTime|
+--------------------+--------------------+----+----------------+
|dbfs:/user/hive/w...|         _delta_log/|   0|   1681750558289|
|dbfs:/user/hive/w...|part-00000-1d6df3...| 868|   1681750256000|
|dbfs:/user/hive/w...|part-00000-57bc12...| 872|   1681750261000|
|dbfs:/user/hive/w...|part-00000-ec8db3...| 873|   1681750259000|
+--------------------+--------------------+----+----------------+
%python
display(spark.sql(f"SELECT * FROM json.`dbfs:/user/hive/warehouse/students/_delta_log/00000000000000000001.json`"))

Running this command will produce an error

%sql
SELECT * FROM parquet.`dbfs:/user/hive/warehouse/students/part-00000-1d6df344-3187-42fd-8591-df4ed47b403f.c000.snappy.parquet`
AnalysisException: Incompatible format detected.

A transaction log for Delta was found at `dbfs:/user/hive/warehouse/students/_delta_log`,
but you are trying to read from `dbfs:/user/hive/warehouse/students/part-00000-57bc1277-813b-41c0-990d-4ba8fd05c9b1.c000.snappy.parquet` using format("parquet"). You must use
'format("delta")' when reading and writing to a delta table.

To disable this check, SET spark.databricks.delta.formatCheck.enabled=false
To learn more about Delta, see https://docs.databricks.com/delta/index.html; line 1 pos 14

SET this value first

%sql
SET spark.databricks.delta.formatCheck.enabled=false

Run it again and it will work

%sql
SELECT * FROM parquet.`dbfs:/user/hive/warehouse/students/part-00000-1d6df344-3187-42fd-8591-df4ed47b403f.c000.snappy.parquet`
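
Because every commit listed by DESCRIBE HISTORY is kept in the transaction log, you can also query an earlier snapshot of the table. A minimal sketch using standard Delta Lake time-travel syntax; the version number is illustrative, so check DESCRIBE HISTORY for the versions that actually exist:

%python
# Read the table as it was at an earlier commit.
df_v1 = spark.sql("SELECT * FROM students VERSION AS OF 1")
df_v1.show()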

The CREATE VIEW command

Constructs a virtual table, based on the result set of a SQL query, that has no physical data. ALTER VIEW and DROP VIEW only change metadata.

Syntax

CREATE [ OR REPLACE ] [ TEMPORARY ] VIEW [ IF NOT EXISTS ] view_name
    [ column_list ]
    [ COMMENT view_comment ]
    [ TBLPROPERTIES clause ]
    AS query

column_list
   ( { column_alias [ COMMENT column_comment ] } [, ...] )

Example

%sql
CREATE OR REPLACE TEMPORARY VIEW demo_tmp2(name, value) AS
VALUES
  ("Yi", 1),
  ("Ali", 2),
  ("Selina", 3)

You can also write TEMP instead of TEMPORARY

%sql
CREATE OR REPLACE TEMP VIEW demo_tmp1(name, value) AS
VALUES
  ("Yi", 1),
  ("Ali", 2),
  ("Selina", 3)

PySpark: display a Spark DataFrame in a table format

Create a PySpark DataFrame named df

%python
df = sqlContext.createDataFrame([("foo", 1), ("bar", 2), ("baz", 3)], ("k", "v"))
%python
print(type(df))

# <class 'pyspark.sql.dataframe.DataFrame'>

Display the table

%python
df.show()

# +---+---+
# |  k|  v|
# +---+---+
# |foo|  1|
# |bar|  2|
# |baz|  3|
# +---+---+

Display the table with n = 2

%python
df.show(n=2)

# +---+---+
# |  k|  v|
# +---+---+
# |foo|  1|
# |bar|  2|
# +---+---+
# only showing top 2 rows
%python
df.show(2, True)

# +---+---+
# |  k|  v|
# +---+---+
# |foo|  1|
# |bar|  2|
# +---+---+
# only showing top 2 rows

Documentation of df.show

%python
df.show?
Signature:
df.show(
    n: int = 20,
    truncate: Union[bool, int] = True,
    vertical: bool = False,
) -> None
Docstring:
Prints the first ``n`` rows to the console.

.. versionadded:: 1.3.0

Parameters
----------
n : int, optional
    Number of rows to show.
truncate : bool or int, optional
    If set to ``True``, truncate strings longer than 20 chars by default.
    If set to a number greater than one, truncates long strings to length ``truncate``
    and align cells right.
vertical : bool, optional
    If set to ``True``, print output rows vertically (one line
    per column value).

Examples
--------
>>> df
DataFrame[age: int, name: string]
>>> df.show()
+---+-----+
|age| name|
+---+-----+
|  2|Alice|
|  5|  Bob|
+---+-----+
>>> df.show(truncate=3)
+---+----+
|age|name|
+---+----+
|  2| Ali|
|  5| Bob|
+---+----+
>>> df.show(vertical=True)
-RECORD 0-----
 age  | 2
 name | Alice
-RECORD 1-----
 age  | 5
 name | Bob
File:      /databricks/spark/python/pyspark/sql/dataframe.py
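
For example, the k/v DataFrame created at the start of this section printed vertically (output shown approximately):

%python
df.show(2, vertical=True)

# -RECORD 0---
#  k  | foo
#  v  | 1
# -RECORD 1---
#  k  | bar
#  v  | 2
# only showing top 2 rows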

Install Java JDK on Ubuntu 20.04

Installing Java

Update packages first

$ sudo apt update

Install Java JDK 11 (OpenJDK)

$ sudo apt install default-jdk

To install Java 8 instead, use

sudo apt install openjdk-8-jdk

The JDK binaries are installed at /usr/lib/jvm/java-11-openjdk-amd64/bin/

    $ ls -l /usr/lib/jvm/java-11-openjdk-amd64/bin/java*
    -rwxr-xr-x 1 root root 14560 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/java
    -rwxr-xr-x 1 root root 14608 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/javac
    -rwxr-xr-x 1 root root 14608 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/javadoc
    -rwxr-xr-x 1 root root 14576 ม.ค.  20 16:07 /usr/lib/jvm/java-11-openjdk-amd64/bin/javap

    Verify the installation on Ubuntu 20.04.6

    $ java --version
    openjdk 11.0.18 2023-01-17
    OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu120.04.1)
    OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu120.04.1, mixed mode, sharing)
    $ javac --version
    javac 11.0.18

    Verify the installation on Ubuntu 22.04.2

    $ java --version
    openjdk 11.0.18 2023-01-17
    OpenJDK Runtime Environment (build 11.0.18+10-post-Ubuntu-0ubuntu122.04)
    OpenJDK 64-Bit Server VM (build 11.0.18+10-post-Ubuntu-0ubuntu122.04, mixed mode, sharing)
    $ javac --version
    javac 11.0.18

    Managing Java

    Use the update-alternatives command

    $ update-alternatives --help
    Usage: update-alternatives [<option> ...] <command>
    
    Commands:
      --install <link> <name> <path> <priority>
        [--slave <link> <name> <path>] ...
                               add a group of alternatives to the system.
      --remove <name> <path>   remove <path> from the <name> group alternative.
      --remove-all <name>      remove <name> group from the alternatives system.
      --auto <name>            switch the master link <name> to automatic mode.
      --display <name>         display information about the <name> group.
      --query <name>           machine parseable version of --display <name>.
      --list <name>            display all targets of the <name> group.
      --get-selections         list master alternative names and their status.
      --set-selections         read alternative status from standard input.
      --config <name>          show alternatives for the <name> group and ask the
                               user to select which one to use.
      --set <name> <path>      set <path> as alternative for <name>.
      --all                    call --config on all alternatives.
    
    <link> is the symlink pointing to /etc/alternatives/<name>.
      (e.g. /usr/bin/pager)
    <name> is the master name for this link group.
      (e.g. pager)
    <path> is the location of one of the alternative target files.
      (e.g. /usr/bin/less)
    <priority> is an integer; options with higher numbers have higher priority in
      automatic mode.
    
    Options:
      --altdir <directory>     change the alternatives directory.
      --admindir <directory>   change the administrative directory.
      --log <file>             change the log file.
      --force                  allow replacing files with alternative links.
      --skip-auto              skip prompt for alternatives correctly configured
                               in automatic mode (relevant for --config only)
      --quiet                  quiet operation, minimal output.
      --verbose                verbose operation, more output.
      --debug                  debug output, way more output.
      --help                   show this help message.
      --version                show the version.

    You can have multiple Java installations on one server. You can configure which version is the default for use on the command line by using the update-alternatives command.

    $ sudo update-alternatives --config java

    If only one java is installed, the output looks like this

    $ sudo update-alternatives --config java
    There is only one alternative in link group java (providing /usr/bin/java): /usr/lib/jvm/java-11-openjdk-amd64/bin/java
    Nothing to configure.

    If several java versions are installed, it will prompt you to choose one.

    The same applies to javac; use

    $ sudo update-alternatives --config javac

    Setting the JAVA_HOME

    $ sudo nano /etc/environment

    At the end of this file, add one of the following lines, depending on the Java version you installed. Do not include the bin/ portion of the path (you can find the path with the update-alternatives command):

    JAVA_HOME="/usr/lib/jvm/java-11-openjdk-amd64"
    JAVA_HOME="/usr/lib/jvm/java-8-openjdk-amd64"

    Modifying this file will set the JAVA_HOME path for all users on your system.

    Save the file and exit the editor.

    Now reload this file to apply the changes to your current session:

    $ source /etc/environment

    Verify that the environment variable is set:

    $ echo $JAVA_HOME

    The apt command

    Show the help for the apt command

    $ apt --help
    apt 2.4.8 (amd64)
    Usage: apt [options] command
    
    apt is a commandline package manager and provides commands for
    searching and managing as well as querying information about packages.
    It provides the same functionality as the specialized APT tools,
    like apt-get and apt-cache, but enables options more suitable for
    interactive use by default.
    
    Most used commands:
      list - list packages based on package names
      search - search in package descriptions
      show - show package details
      install - install packages
      reinstall - reinstall packages
      remove - remove packages
      autoremove - Remove automatically all unused packages
      update - update list of available packages
      upgrade - upgrade the system by installing/upgrading packages
      full-upgrade - upgrade the system by removing/installing/upgrading packages
      edit-sources - edit the source information file
      satisfy - satisfy dependency strings
    
    See apt(8) for more information about the available commands.
    Configuration options and syntax is detailed in apt.conf(5).
    Information about how to configure sources can be found in sources.list(5).
    Package and version choices can be expressed via apt_preferences(5).
    Security details are available in apt-secure(8).
                                            This APT has Super Cow Powers.

    Update and Upgrade packages

    update list of available packages

    $ sudo apt update

    List packages that can be upgraded

    $ apt list --upgradable

    upgrade the system by installing/upgrading packages

    $ sudo apt upgrade

    Run apt update followed by apt upgrade in one line using &&

    With &&, the second command is executed only when the first one has completed successfully.

    $ sudo apt update && sudo apt upgrade

    Install or Remove package

    install packages

    $ sudo apt install <package_name>

    remove packages

    $ sudo apt remove <package_name>

    Remove automatically all unused packages

    $ sudo apt autoremove

    reinstall packages

    sudo apt reinstall <package_name>
    sudo apt reinstall lighttpd

    Hold a package with apt-mark

    sudo apt-mark hold <package_name>
    sudo apt-mark hold sudo

    Unhold a package with apt-mark

    sudo apt-mark unhold <package_name>
    sudo apt-mark unhold sudo

    Other

    show package details

    To show information about the given package(s), including dependencies, installation and download size, the sources the package is available from, a description of the package contents, and more:

    $ apt show <package_name>
    $ apt show sudo

    List package dependencies

    apt depends <package_name>
    apt depends sudo

    search in package descriptions

    apt search php
    apt search mysql-5.?
    apt search mysql-server-5.?
    apt search httpd*
    apt search ^apache
    apt search ^nginx
    apt search ^nginx$

    apt search looks through package descriptions, which often returns too many results and makes the package you want hard to find, so try apt list instead

    apt list
    apt list | more
    apt list | grep foo
    apt list | grep php7-
    
    apt list nginx
    apt list 'php7*'

    List all installed packages

    apt list --installed
    apt list --installed | grep <package_name>

    Write Excel with PySpark

    On the cluster, install com.crealytics:spark-excel-2.12.17-3.0.1_2.12:3.0.1_0.18.1

    Create a PySpark DataFrame

    %python
    data = [('A', "1"),
            ('B', "2"),
            ('C', "3"),
            ('D', "4")
            ]
    print(type(data))  # <class 'list'>
    df = spark.createDataFrame(data)
    print(type(df))    # <class 'pyspark.sql.dataframe.DataFrame'>
    display(df)

    Write an Excel file

    %python
    path = '/mnt/xxx/tmp/'
    filename = f'{path}output1.xlsx'
    print(f'filename = {filename}')
    df.write.format("com.crealytics.spark.excel")\
      .option("header", "true")\
      .mode("overwrite")\
      .save(filename)

    Run %fs ls '/mnt/xxx/tmp/' and you will now see the file dbfs:/mnt/xxx/tmp/output1.xlsx

    Create another DataFrame

    %python
    columns = ['Identifier', 'Value', 'Extra Discount']
    vals = [(1, 150, 0), (2, 160, 12)]
    df2 = spark.createDataFrame(vals, columns)
    df2.show()
    
    # +----------+-----+--------------+
    # |Identifier|Value|Extra Discount|
    # +----------+-----+--------------+
    # |         1|  150|             0|
    # |         2|  160|            12|
    # +----------+-----+--------------+

    Write in append mode with the data placed at cells B3 through C35; because that range is only two columns wide, the Extra Discount column is dropped.

    %python
    df2.write.format("com.crealytics.spark.excel") \
      .option("dataAddress", "'My Sheet'!B3:C35") \
      .option("header", "true") \
      .mode("append") \
      .save(filename)
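
    To verify the result, the same connector can read the file back. A minimal sketch, assuming the same spark-excel options are available on the read side; dataAddress points at the appended range:

    %python
    # Read the appended range back with the spark-excel connector.
    df_check = spark.read.format("com.crealytics.spark.excel") \
      .option("header", "true") \
      .option("dataAddress", "'My Sheet'!B3:C35") \
      .load(filename)
    df_check.show()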

    Write Excel with Pandas

    Create a pandas DataFrame

    %python
    import pandas as pd
    import openpyxl
    
    df = pd.DataFrame([[11, 21, 31], [12, 22, 32], [31, 32, 33]],
                      index=['one', 'two', 'three'], columns=['a', 'b', 'c'])
    
    print(df)
    
    #         a   b   c
    # one    11  21  31
    # two    12  22  32
    # three  31  32  33
    
    print(type(df))
    # <class 'pandas.core.frame.DataFrame'>

    Write the pandas DataFrame to a file

    %python
    path = '/tmp/'
    filename = f'{path}output1.xlsx'
    print(filename)
    with pd.ExcelWriter(filename) as writer:  
         df.to_excel(writer, sheet_name='Sheet_name_1')
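
    To confirm the contents, read the file back with pandas. A minimal sketch; index_col=0 restores the index column written above, and openpyxl (imported earlier) handles the xlsx format:

    %python
    df_read = pd.read_excel(filename, sheet_name='Sheet_name_1', index_col=0)
    print(df_read)
    
    #         a   b   c
    # one    11  21  31
    # two    12  22  32
    # three  31  32  33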