Diff of Julia/文字列 - Takuya Miyashita

#author("2019-11-01T18:50:20+09:00","default:Miyashita","Miyashita")
*Strings: notes [#j91570b9]
Many string-related APIs changed between v0.6.x and v0.7.0.
#contents

**Characters and strings [#f682c24a]
Julia has two kinds of text values: characters (Char) and strings (String).
-character~
A single character is a Char, written with single quotes.
#codeprettify{{
julia> typeof('A')
Char
julia> typeof('あ')
Char
}}
Newline (\n) and tab (\t) are also treated as characters.
#codeprettify{{
julia> typeof('\n')
Char
}}
~
-string~
A sequence of characters, stored as UTF-8 encoded bytes (cf. Vector{UInt8}). Enclosed in double quotes "".
#codeprettify{{
julia> typeof("A")
String
}}
For a string spanning multiple lines, enclose it in triple double quotes.
#codeprettify{{
julia> typeof("""Hello
       World
       """)
String
}}
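In old versions (before v0.5), an all-ASCII string was an ASCIIString and anything else a UTF8String; both have since been unified into String, as the typeof results above show.~
Strings are immutable. To change one, create a new string by copying or replacing and bind it to a variable.~
A substring can be extracted with s[n:m].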
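As a minimal sketch of the two points above (the variable names s, sub and t are illustrative, not from the original notes), slicing and replace both return new strings and leave the original untouched. The indices in s[n:m] are byte indices, so this assumes they fall on character boundaries.
#codeprettify{{
s = "COMME des GARÇONS"
sub = s[1:5]                     # slice by index range
t = replace(s, "des" => "DES")   # builds a new string
println(sub)
println(t)
println(s)                       # s itself is unchanged
}}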
~
~

**String concatenation and repetition [#cf44b94d]
To concatenate strings, use * or string(arg1, arg2, arg3).
#codeprettify{{
julia> "abc"*"def"
"abcdef"
}}
#codeprettify{{
julia> string("abc","de","f")
"abcdef"
}}
Repeat a string with ^.
#codeprettify{{
julia> "%d,"^5
"%d,%d,%d,%d,%d,"
}}
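As shown above, string concatenation and repetition in Julia resemble arithmetic: * acts as a product and ^ as a power.~

To join directory and file names, use join or joinpath.~
Julia's join takes the collection first and the delimiter second, unlike Python's delimiter.join(list). joinpath is the counterpart of Python's os.path.join and MATLAB's fullfile.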
#codeprettify{{
julia> join(["dirname","filename"],"/")
"dirname/filename"
}}
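For comparison, a small sketch of joinpath, which inserts the platform's path separator itself (the strings "dirname" and "filename" simply reuse the example above; the variable p is illustrative):
#codeprettify{{
p = joinpath("dirname", "filename")  # separator depends on the OS
println(p)
println(dirname(p))    # directory part
println(basename(p))   # file part
}}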
~
~

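**String length [#abb289cd]
Use length to get the number of characters. lastindex (formerly endof) returns the last valid byte index instead, so it differs from the length when the string contains multi-byte Unicode characters.~
length is the safe choice.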
#codeprettify{{
julia> length("COMME des GARÇONS")
17
}}
#codeprettify{{
julia> lastindex("COMME des GARÇONS") # "Ç" occupies 2 bytes, so the last index is 18
18
}}
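The difference between characters and bytes can be checked directly. This is a sketch using ncodeunits, isvalid and nextind from Base; the variable s is illustrative.
#codeprettify{{
s = "COMME des GARÇONS"
println(length(s))        # number of characters
println(ncodeunits(s))    # number of bytes (UTF-8 code units)
println(isvalid(s, 15))   # false: byte 15 is the second byte of 'Ç'
println(nextind(s, 14))   # index of the character that follows 'Ç'
}}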
~
~

**Searching, detecting, and extracting strings [#t4f9b601]
-%%search%% → findfirst, findlast, findnext~
search was removed and replaced by findfirst, findlast, and findnext.
#codeprettify{{
julia> findfirst("a","abad")
1:1
julia> findlast("a","abad")
3:3
julia> findnext("a","abad",1) # the third argument is the start index
1:1
julia> findnext("a","abad",2)
3:3
julia> findnext("a","abad",3)
3:3
julia> findnext("a","abad",4)

julia> findnext("a","abad",4) === nothing
true
}}
~
~

-%%contains%% → occursin~
Returns true (=1) or false (=0) depending on whether the substring is contained.~
Adding a dot (broadcasting) conveniently tests every element of an array.
#codeprettify{{
julia> s = ["COMME","des","GARÇONS"]
3-element Array{String,1}:
 "COMME"
 "des"
 "GARÇONS"
}} 
#codeprettify{{
julia> occursin.("O",s)
3-element BitArray{1}:
  true
 false
  true
}} 
~
~

-occursin (regular expressions)~
occursin also accepts regular expressions, but in the version used for these notes, broadcasting with "." as above fails when the pattern is a regex.
#codeprettify{{
julia> s = ["COMME","des","GARÇONS"]
3-element Array{String,1}:
 "COMME"
 "des"
 "GARÇONS"
}} 
#codeprettify{{
julia> occursin.(r"O",s)
ERROR: MethodError: no method matching length(::Regex)
}}
With a regex, use map (or broadcast with an anonymous function) instead.
#codeprettify{{
julia> map(x->occursin(r"O",x),s)
3-element Array{Bool,1}:
  true
 false
  true
}}
#codeprettify{{
julia> broadcast(x->occursin(r"O",x),s)
3-element BitArray{1}:
  true
 false
  true
}}
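Another option, sketched here and not part of the original notes, is to wrap the regex in Ref so that broadcasting treats it as a scalar and the dot syntax can still be used. Newer Julia releases may accept the bare regex in the dot call directly; treat that as something to verify on your version.
#codeprettify{{
s = ["COMME","des","GARÇONS"]
hits = occursin.(Ref(r"O"), s)   # Ref keeps the regex from being iterated
println(hits)
}}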
~
~

-match, %%ismatch%%~
Pattern matching with regular expressions.~
ismatch was removed in favor of occursin.~
#codeprettify{{
# match returns the matched portion
julia> m = match(r"[A-Z]{5}\s[a-z]{3}\s\p{Lu}{7}\s[a-z]{5}\s\d{4}$","COMME des GARÇONS since 1969")
RegexMatch("COMME des GARÇONS since 1969")
}}
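The return value m is a RegexMatch struct. Wrapping parts of the regex in () captures them so that each can be extracted.~
Note that even a numeric pattern such as (\d{n}) is captured as a string, not as a number.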
#codeprettify{{
# wrap everything except the spaces and "since" in ()
julia> m = match(r"([A-Z]{5}\s[a-z]{3}\s\p{Lu}{7})\s[a-z]{5}\s(\d{4})$","COMME des GARÇONS since 1969")
RegexMatch("COMME des GARÇONS since 1969", 1="COMME des GARÇONS", 2="1969")
}}
#codeprettify{{
# print each capture
julia> using Printf
julia> Printf.@printf("According to Wikipedia, %s was founded in %s", m.captures[1], m.captures[2])
According to Wikipedia, COMME des GARÇONS was founded in 1969
}}
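Because captures come back as strings, a numeric capture has to be converted explicitly. A minimal sketch using parse (the variable names year and brand are illustrative):
#codeprettify{{
m = match(r"([A-Z]{5}\s[a-z]{3}\s\p{Lu}{7})\s[a-z]{5}\s(\d{4})$", "COMME des GARÇONS since 1969")
brand = m[1]                        # same as m.captures[1]
year = parse(Int, m.captures[2])    # convert the captured text to an Int
println(brand)
println(year + 1)                   # arithmetic now works
}}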
~
~

**String replacement [#l18b5474]
Replacement is done with replace, which also accepts regular expressions.
#codeprettify{{
julia> replace("COMME des GARÇONS",r"[A-Z]{5}" => "")
" des GARÇONS"
}}
Case conversion is also possible by passing a function as the replacement.
#codeprettify{{
julia> replace("COMME des GARÇONS","O" => lowercase) # lowercase every O
"CoMME des GARÇoNS"
}}
#codeprettify{{
julia> replace("COMME des GARÇONS",r"[a-z]" => uppercase) # uppercase the lowercase letters
"COMME DES GARÇONS"
}}
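By default replace substitutes every match. A short sketch of limiting it with the count keyword (the replacement "0" and the variable names are illustrative):
#codeprettify{{
s = "COMME des GARÇONS"
once = replace(s, "O" => "0"; count = 1)   # replace only the first match
println(once)
}}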
~
~

**Finding array elements that contain a given string [#w6cccfe0]
For a Vector{String}, find the index of the element that contains a given string.~
Useful after reading a text file.
#codeprettify{{
# find the first element of txt::Vector{String} that contains "hoge"::String
findfirst(x->occursin("hoge", x), txt)
}}
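findfirst returns only the first hit. To get every match, the same predicate can be passed to findall or filter. A sketch with a made-up txt vector (in practice txt might come from readlines):
#codeprettify{{
txt = ["foo", "hoge1", "bar", "xhogex"]          # hypothetical sample data
first_idx = findfirst(x -> occursin("hoge", x), txt)  # index of the first match
all_idx = findall(x -> occursin("hoge", x), txt)      # indices of all matches
matched = filter(x -> occursin("hoge", x), txt)       # the matching elements themselves
println(first_idx)
println(all_idx)
println(matched)
}}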