Feeds:
Posts
Comments

Archive for the ‘Programming’ Category

I’ve spent many hours becoming (more or less) proficient in the ceremony around PostgreSQL functions;  hours that hopefully you won’t have to spend! Here is a one minute list.

  1. Functions and Stored Procedures are all Functions in PostgreSQL, so don’t waste time searching for syntax related to “stored procedures” in the PostgreSQL manual.
  2. Use the opening phrase “create or replace function …” so that you can quickly reload it while debugging.
  3. The function name may be schema qualified (eg- “my_schema.my_function”).  If it is not schema qualified, it will go into the public schema.
  4. The argument is in parentheses and is a comma separated list of names and simple PostgreSQL types like int, text, or varchar.
  5. I usually prefix a “p_” to all my parameter names to make the following code more readable.  You should too; you’ll thank me later.
  6. PostgreSQL can overload functions, so if you change the argument list, you will load a different function.  Beware of this and always remember drop a function explicitly if you didn’t intend for an overload to be floating around.
  7. The return value can be void (if your function is like a stored procedure in other SQL variants);  it can be any PostgreSQL data type like int, text, or varchar; it can be a TABLE definition if it is returning a query; or it can be a special data type like TRIGGER.
  8. Don’t forget the “AS”
  9. The text of the function will appear between double dollar sign delimiters after the AS, $$.  These are just text delimiters, there is no special voodoo in these.
  10. The text delimiter dollar signs may have other identifying characters in between, like $aa$ or $bbb$. These make different delimiters.
  11. If you mark beginning of text with one delimiter, close it with the same delimiter, like “$aa$ blah; blah; blah; $aa$”.  “$aa$ blah blah blah $bbb$” won’t work.
  12. If you need to declare local variables, include a DECLARE block first.  Local variables can be declared with any of the simple PostgreSQL types like int, text or varchar in addition to some special types.
  13. I usually prefix a “l_” to all my local variable names to make the following code more readable.  You should too; you’ll thank me later.
  14. The DECLARE block does not need its own END statement.
  15. The next block is delimited by BEGIN … END;.  Your SQL code goes in there.
  16. Assigning values to local variables?  Use “:=” operator and not “=”.  Also, the expression on the right of a “:=” should go into “()” parentheses.
  17. If your function returns anything other than void, you need to have a RETURN statement in there.
  18. After the final $$ text delimiter, include the string “language plpgsql”. There are other languages; you will probably never use them.

So here’s an example of a wrapper function which returns the time with a hello message.

CREATE OR REPLACE FUNCTION public.hello_time(p_hello_msg varchar) RETURNS varchar AS
$$
DECLARE
    l_tmp_str varchar;
BEGIN
    l_tmp_str := ( cast ( now() as varchar) );
    return p_hello_msg || ' ' || l_tmp_str;
END;
$$ LANGUAGE plpgsql;

You can execute it like this:

> select hello_time('hello');
hello 2018-08-15 12:24:34.0764339-05

Happy coding! Look for more TL; DR; guides in the future at The Solarium’s TL; DR; Guides.

Advertisements

Read Full Post »

Installing a new PostgreSQL database AND being able to connect to it can be a daunting task at times.  When you use an installer (Windows) or a package manager (Linux), the installation usually comes with all of the sensible defaults turned on and ready to go.  But, if you are dealing with a “special” custom installation, you may run into some connection issues.  Ain’t that special?

In this blog post, I will assume that you have already installed a PostgreSQL database instance on Linux.  I am working with PostgreSQL 10.4 and CentOS 7, but these tips should be general enough to help you troubleshoot on other platforms.  I am also assuming, if you’re reading this, that you have a client program like DBeaver (https://dbeaver.io/) on Windows, or command line psql on Linux running in a terminal, AND that your client cannot make a connection to your brand spanking new PostgreSQL database instance.  Without going into thousands of pages of reference material, here are four of the most common things to check up on when troubleshooting PostgreSQL connection issues.

  1. Is there network connectivity between client and server?
  2. Is a firewall on the server blocking your connections?
  3. Is PostgreSQL listening to the correct network interface?
  4. Is the client allowed by pg_hba.conf?

Network Connectivity

It sounds dumb, but the first thing to check is the network connection between the client and server machines.  Everything is plugged in, right?  Then try to ping the server from the client and vice versa.  If the machines are not ping-able, then you most likely have a network connectivity problem.  Here are some general troubleshooting ideas for network connectivity.

  • Check that you can ping other sites, like http://www.google.com. If you can, that points to a specific problem between your client and server.
  • You just installed PostgreSQL on the server; is the database server also freshly installed?  Make sure it has the correct networking parameters, and make sure the interface is configured to come up at boot time.
    • On CentOS, the network configuration is kept in a file in /etc/sysconfig/network-scripts, named something like ifcfg-ifacename.  Check that the contents of this file are correct for your network interface, and ONBOOT=yes and run systemctl restart network
  • If your server is running in a Virtual Machine guest OS and the database client is on running the host, check the virtual machine manager and make sure that networking is enabled from host to guest.
  • If you work in a secure corporate “double-secret probation” network environment, it may also be that ICMP is blocked and that ping won’t work.  Submit a ticket to your networking team to find out if that is an issue and for help troubleshooting connectivity.

Firewall Issues

A firewall on the server may be blocking communication to the port that PostgreSQL is listening on.  On the server machine, you can create custom firewall rules to allow network traffic to that port.  Unless you changed it in the postgresql.conf configuration file, the default port that PostgreSQL uses is 5432.  On CentOS, the firewall service is firewalld.  You can check to see if it is running by issuing the command

systemctl status firewalld

It will tell you if the service is disabled/inactive, stopped or running.  As a matter of fixing firewall issues, I don’t recommend stopping/disabling the firewall entirely. Certainly do not do this in production, although it can be OK on an isolated testbench network segment, like a private Hyper-V virtual network.  It is much better to add a custom firewall rule to the CentOS firewalld service for PostgreSQL

  • Look up the port number in postgresql.conf file in the PostgreSQL data directory for your database instance.  (The default value for PostgreSQL is 5432.)
  • For each host or network segment you want to allow through the firewall, run the following command
    firewall-cmd --add-rich-rule='rule family=ipv4 source address=192.168.10.100/32 port port=5432 protocol=tcp accept' --permanent
    This will allow connections from the host with the exact IP address 192.168.10.100 to make connections to the server machine on port 5432. Your parameters will vary of course.
  • Yes, the word “port” appears twice in the above command.  For historical reasons, with firewall rules on CentOS, the port is actually called the “portport”.  (Just kidding, but not about the double occurrence of the word “port” – you really do need it there twice.)
  • Don’t forget to reload the firewall after creating new rules. Run the command
    firewall-cmd --reload
    to do this.

Listener Network Interface

In the PostgreSQL postgresql.conf configuration file there are many settings, including the listen_addresses setting.

#listen_addresses = 'localhost'

PostgreSQL provides configuration files with many settings like this one:  the setting is commented out, but it is set to the default value. This particular setting means that PostgreSQL is by default only listening for connections that come over the localhost network interface; eg local connections. If your client is trying to connect from another machine, then it won’t be able to connect to PostgreSQL. To fix this, uncomment the setting and change it to

listen_addresses = '*'

With this value, PostgreSQL will listen for connections on any interface.  Like with the firewall, before the setting takes effect you need to reload the server configuration.  You can reload the server configuration without restarting the server.  If the data folder for your instance is in the location /usr/data/pgdata, you can reload configuration with the command

pg_ctl -D /usr/data/pgdata reload

(Remember to substitute the folder location of the PostgreSQL on your machine.)  If you have only one network card in your machine, it makes sense to just listen on all interfaces. If you have multiple network cards, it is best practice to listen only on the interfaces over which you expect PostgreSQL clients to connect to prevent spoofed connection attempts.  More details about this setting can be found in the manual.

pg_hba.conf

This file provides another layer of connection security based on the database user that you specified in the client.  Fortunately, if this is the problem, the client will have received and displayed an error message to the effect that the user does not exist in the pg_hba.conf file.  Pretty.  Stinking.  Obvious.

So to fix this issue, it is sufficient to add an entry to the end of this file.  The file is pretty well commented, but briefly, each entry consists of a line

host database_name user_name ip_address authentication_method

So if I had a client connecting with user name “tom_bombadil” to the database “old_forest” from a single computer with IP address 192.168.10.100 and we want to require a password, the line would be

host old_forest tom_bombadil 192.168.10.100/32 md5

There is a complete discussion of these parameters at the PostgreSQL site: https://www.postgresql.org/docs/current/static/auth-pg-hba-conf.html

Conclusion

Connecting to a brand new PostgreSQL instance can be mildly challenging if you’re installing from source or if you have non-standard configuration parameters or went around the installer in some way.  Not all of these issues necessarily involve PostgreSQL either.  Hopefully this list of common issues helped get you up and running!  If you had any other experiences that you think could be helpful, feel free to relate those in the comments!

Read Full Post »

I’ve got a new article up at The Code Project about how the internal keyword in C# can be used safely while preserving encapsulation. In a nutshell, it is about using internal interfaces instead of giving internal access directly to properties or fields, and how this can help maintain strict encapsulation while nevertheless granting special access to service layers. Give it a read and don’t forget to vote!

Read Full Post »

Currency Parsing, Regular Expressions

ASP.Net 2.0 has some very powerful client-side web page validation features, including classes that emit javascript validation code to the user’s web browser.  The CompareValidator, allows one to test whether the value entered into a text box is convertible to a given data type.  All one has to do add the CompareValidator tag to your markup,  set the properties and ASP.Net does the rest.  But don’t expect it to handle a dollar sign: use a regular expression for that!

(more…)

Read Full Post »

Pre-scoring Candidates for String Matching

Keywords: Levenshtein, String Matching, Optimization

Introduction

In many service oriented businesses, the problem comes up to search your customers for names which are published on a government watchlist. The input names on either list may contain typographical errors, so a fault insensitive matching algorithm must be run on each name pair. A model algorithm to consider for this is the Levenshtein string matching algorithm. Levenshtein is a nice algorithm to consider because it is well behaved, easy to understand and work with, and it is unbiased in the sense that it is not parametric and tuned for “English” sounding names only. The Levenshtein algorithm is (somewhat whimsically) explained in another posting here with references.

The time to compare all possible combinations of names varies linearly with the length of each input list, quadratically if they are both growing.  Typical string matching algorithms that are insensitive to typographical errors run more slowly than linear time.  Levenshtein is itself quadratic in the length of the strings it is comparing, so it makes sense to attempt to find a fast, linear-time pre-scoring algorithm that enables fast rejection of as many comparisons as possible beforehand. Furthermore, and perhaps as importantly, it is desirable to find a pre-scoring strategy that is separable over two strings being matched so that the pre-score can be calculated “offline” and stored with each string for quick comparison later.

(more…)

Read Full Post »

String Matching and Zaxxon

Keywords: Levenshtein, Wagner-Fischer, Zaxxon

It comes frequently in the service sector to check your customer names against some list provided by regulatory agencies, and the checking has to be tolerant against slightly mistyped names. So for example if “Hugo Chavez” is on the list, and “Huge Shavez” is in your list of customers, you might want to at least flag it as something to investigate further and see if it is a real match. One of the classic methods for evaluating near matches for words and names is the Levenshtein algorithm, or its closely related cousin Wagner-Fischer. But how does it work? And what does it have to do with Zaxxon?
(more…)

Read Full Post »

Keywords: Open Office, Ghostscript, Do It Yourself Convert Microsoft Word, Excel, Powerpoint to PDF

I’ve written a small program that uses Open Office to open and save different kinds of Microsoft Office files to PDF, and optionally merge them into a single output PDF file using GPL Ghostscript. I posted the code and article at the Code Project: http://www.codeproject.com/KB/java/PDFCM.aspx.

It’s a command line program, and we’re using a simplified version of it in production to do back-office conversions and merges of office files that we get from filling out forms internally and others that we get from customers. There are potentially many documents, and they can vary in size, so it is very cumbersome to cut, paste, print and scan everything to PDF (which is what our staff were doing when I started this project.)

Fortunately, it turns out that (1) one can use PRNADMIN.DLL with a Postscript Printer driver and an ActiveX IE browser to render a web page to Postscript, (2) Open Office can batch convert Microsoft Office files (and many more) to PDF, and (3) Ghostscript will merge Postscript and PDF on the command line.
(more…)

Read Full Post »

Older Posts »